Jaggies

Jaggies are visual artifacts in raster images, most frequently from aliasing, which in turn is often caused by non-linear mixing effects producing high-frequency components, or missing or poor anti-aliasing filtering prior to sampling. Jaggies are stair-like lines that appear where there should be "smooth" straight lines or curves. For example, when a nominally straight, un-aliased line steps across one pixel either horizontally or vertically, a "dogleg" occurs halfway through the line, where it crosses the threshold from one pixel to the other. Jaggies should not be confused with most compression artifacts, which are a different phenomenon. == Causes == Jaggies occur due to the "staircase effect". This is because a line represented in raster mode is approximated by a sequence of pixels. Jaggies can occur for a variety of reasons, the most common being that the output device (display monitor or printer) does not have sufficient resolution to portray a smooth line. In addition, jaggies often occur when a bit-mapped image is scaled to a higher resolution. This is one of the advantages that vector graphics have over bitmapped graphics – a vector image can be losslessly scaled to any arbitrary resolution or stretched infinitely in either axis without introducing jaggies. == Solutions == The effect of jaggies can be reduced by a graphics technique known as spatial anti-aliasing. Anti-aliasing smooths out jagged lines by surrounding them with transparent pixels to simulate the appearance of fractionally-filled pixels when viewed at a distance. The downside of anti-aliasing is that it reduces contrast – rather than sharp black/white transitions, there are shades of gray – and the resulting image can appear fuzzy. This is an inescapable trade-off: if the resolution is insufficient to display the desired detail, the output will either be jagged, fuzzy, or some combination thereof. 
While machine learning-based upscaling techniques such as DLSS can be used to infer this missing information, other types of artifacts may be introduced in the process. In real-time 3D rendering such as in video games, various anti-aliasing techniques are used to remove jaggies created by the edges of polygons and other contrasting lines. Since anti-aliasing can impose a significant performance overhead, games for home computers often allow users to choose the level and type of anti-aliasing in use in order to optimize their experience, whereas on consoles this setting is typically fixed for each title to ensure a consistent experience. While anti-aliasing is generally implemented through graphics APIs like DirectX and Vulkan, some consoles such as the Xbox 360 and PlayStation 3 are also capable of anti-aliasing to little direct performance cost by way of dedicated hardware which performs anti-aliasing on the contents of the framebuffer once it has been rendered by the GPU. Jaggies in bitmaps, such as sprites and surface materials, are most often dealt with by separate texture filtering routines, which are far easier to perform than anti-aliasing filtering. Texture filtering became ubiquitous on PCs after the introduction of 3Dfx's Voodoo GPU. == Notable uses of the term == In the 1985 game Rescue on Fractalus! for the Atari 8-bit computers, the graphics depicting the cockpit of the player's spacecraft contains two window struts, which are not anti-aliased and are therefore very "jagged". The developers made fun of this and named the in-game enemies "Jaggi", and also initially titled the game Behind Jaggi Lines!. The latter idea was scrapped by the marketing department before release.
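The trade-off described under Solutions can be made concrete with a minimal sketch of spatial anti-aliasing by supersampling. Each pixel is sampled either once at its centre, which gives hard stair-steps, or on a small sub-grid whose hits are averaged into a fractional coverage value, which gives shades of gray. The shape (`below_line`), the grid size, and all function names are hypothetical choices for illustration and are not taken from any particular graphics API.

```python
def coverage(px, py, inside, samples=1):
    """Fraction of sub-samples of pixel (px, py) that fall inside the shape."""
    hits = 0
    for sy in range(samples):
        for sx in range(samples):
            # Sub-sample positions are centred in an even samples x samples grid.
            x = px + (sx + 0.5) / samples
            y = py + (sy + 0.5) / samples
            if inside(x, y):
                hits += 1
    return hits / (samples * samples)


def rasterize(width, height, inside, samples=1):
    """Return a height x width grid of coverage values in [0, 1]."""
    return [[coverage(x, y, inside, samples) for x in range(width)]
            for y in range(height)]


def below_line(x, y):
    # Hypothetical shape: the region between the x-axis and the shallow
    # line y = 0.25 * x, chosen so the edge crosses one pixel row every
    # four columns.
    return y < 0.25 * x


def to_ascii(image, ramp=" .:*#"):
    """Map coverage values to characters, darkest for full coverage."""
    last = len(ramp) - 1
    return "\n".join("".join(ramp[round(v * last)] for v in row)
                     for row in image)


aliased = rasterize(8, 3, below_line, samples=1)   # one sample per pixel
smooth = rasterize(8, 3, below_line, samples=4)    # 16 samples per pixel

print(to_ascii(aliased))
print()
print(to_ascii(smooth))
```

With one sample per pixel every value is either 0 or 1, so the edge appears as a staircase. With 4 × 4 sub-samples the pixels along the edge take intermediate values, which is the contrast-for-smoothness exchange the text describes.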

SEMAT

SEMAT (Software Engineering Method and Theory) is an initiative to reshape software engineering such that software engineering qualifies as a rigorous discipline. The initiative was launched in December 2009 by Ivar Jacobson, Bertrand Meyer, and Richard Soley with a call for action statement and a vision statement. The initiative was envisioned as a multi-year effort for bridging the gap between the developer community and the academic community and for creating a community giving value to the whole software community. The work is now structured in four different but strongly related areas: Practice, Education, Theory, and Community. The Practice area primarily addresses practices. The Education area is concerned with all issues related to training for both the developers and the academics including students. The Theory area is primarily addressing the search for a General Theory in Software Engineering. Finally, the Community area works with setting up legal entities, creating websites and community growth. It was expected that the Practice area, the Education area and the Theory area would at some point in time integrate in a way of value to all of them: the Practice area would be a "customer" of the Theory area, and direct the research to useful results for the developer community. The Theory area would give a solid and practical platform for the Practice area. And, the Education area would communicate the results in proper ways. == Practice area == The first step was here to develop a common ground or a kernel including the essence of software engineering – things we always have, always do, always produce when developing software. The second step was envisioned to add value on top of this kernel in the form of a library of practices to be composed to become specific methods, specific for all kinds of reasons such as the preferences of the team using it, kind of software being built, etc. The first step is as of this writing just about to be concluded. 
The results are a kernel of universal elements for software development, called the Essence Kernel, and a language, called the Essence Language, for describing these elements and the elements built on top of the kernel (practices, methods, and more). Essence, including both the kernel and the language, was published as an OMG standard in beta status in July 2013 and was expected to become a formally adopted standard in early 2014. The second step has just started, and the Practice area will be divided into a number of separate but interconnected tracks: the practice (library) track and the tool track have been identified so far, and work on them has started or is about to start. The practice track is currently working on a User's Guide. == Education area == The area focuses on leveraging the work of SEMAT in software engineering education, both within academia and industry. It promotes global education based on a common ground called Essence. The area's target groups are instructors such as university professors and industrial coaches as well as their students and learning practitioners. The goal of the area is to create educational courses and course materials that are internationally viable, identify pedagogical approaches that are appropriate and effective for specific target groups, and disseminate experience and lessons learned. The area includes members from a number of universities and institutes worldwide. Most members have already been involved in leveraging aspects of SEMAT in the context of their software engineering courses. They are gathering their resources and starting a common venture towards defining a new generation of SEMAT-powered software engineering curricula. As of 2018, some studies of the use of Essence in educational settings exist. One example of the use of Essence in university education was a software engineering course carried out at the Norwegian University of Science and Technology. 
A study was conducted by introducing Essence into a project-based software engineering course, with the aim of understanding what difficulties the students faced in using Essence, and whether they considered it to have been useful. The results indicated that Essence could also be useful for novice software engineers by (1) encouraging them to look up and study new practices and methods in order to create their own, (2) encouraging them to adjust their way-of-working reflectively and in a situation-specific manner, (3) helping them structure their way of working. The findings of another study introducing students to Essence through a digital game supported these findings: the students felt that Essence will be useful to them in future, real-world projects, and that they wish to utilize it in them. == Theory area == An important part of SEMAT is that a general theory of software engineering is planned to emerge with significant benefits. A series of workshops held under the title SEMAT Workshop on a General Theory of Software Engineering (GTSE) are a key component in awareness building around general theories. In addition to community awareness building, SEMAT also aims to contribute with a specific general theory of software engineering. This theory should be solidly based on the SEMAT Essence language and kernel, and should support software engineering practitioners' goal-oriented decision making. As argued elsewhere, such support is predicated on the predictive capabilities of the theory. Thus, the SEMAT Essence should be augmented to allow the prediction of critical software engineering phenomena. The GTSE workshop series assists in the development of the SEMAT general software engineering theory by engaging a larger community in the search for, development of, and evaluation of promising theories, which may be used as a base for the SEMAT theory. == Organizational structure == === Main organization === SEMAT is chaired by Sumeet S. 
Malhotra of Tata Consultancy Services. The CEO of the organization is Ste Nadin of Fujitsu. The members of SEMAT's Executive Management Committee are Ivar Jacobson, Ste Nadin, Sumeet S. Malhotra, Paul E. McMahon, Michael Goedicke and Cecile Peraire. === Japan Chapter === The Japan Chapter was established in April 2013 and had more than 250 members as of November 2013. Member activities include carrying out seminars about SEMAT, considering the use of SEMAT Essence for integrating different requirements engineering techniques and bodies of knowledge (BoKs), and translating articles into Japanese. === Korea Chapter === The chapter was inaugurated with about 50 members in October 2013. Member activities include 2e Consulting rewriting its IT service engagement methods using the Essence kernel and uEngine Solutions developing a tool to orchestrate Essence-kernel-based practices into a project method. The Korean government has supported KAIST in conducting research on Essence. === Latin American Chapter === The SEMAT Latin American Chapter was created in August 2011 in Medellin (Colombia) by Ivar Jacobson during the Latin American Software Engineering Symposium. The Chapter has 9 Executive Committee members from Colombia, Venezuela, Peru, Brazil, Argentina, Chile, and Mexico, and is chaired by Dr. Carlos Zapata from Colombia. More than 80 people signed the initial declaration of the Chapter, and the Chapter members are now in charge of disseminating the SEMAT ideas throughout Latin America. Chapter members have participated in various Latin American conferences, including the Latin American Conference on Informatics (CLEI), the Ibero American Software Engineering and Knowledge Engineering Journeys (JIISIC), the Colombian Computing Conference (CCC), and the Chilean Computing Meeting (ECC). 
The Chapter contributed to the submission sent in response to the OMG call for proposals and currently studies didactic strategies for teaching the SEMAT kernel through games, theoretical aspects of some kernel elements, and practical representations of several software development and quality methods using the SEMAT kernel. Some of the members also translated the Essence book and some other SEMAT materials and papers into Spanish. === Russia Chapter === The Russian Chapter has about 20 members. A few universities have incorporated SEMAT into their training courses, including Moscow State University, Moscow Institute of Physics and Technology, Higher School of Economics, and Moscow State University of Economics, Statistics, and Informatics. The chapter and some commercial companies are carrying out seminars about SEMAT. The INCOSE Russian Chapter is working on an extension of SEMAT to systems engineering. EC-leasing is working on an extension of the Kernel for Software Life Cycle. The Russian Chapter took part in two conferences, Actual Problems of System and Software Engineering and SECR, with a SEMAT section and articles. Translation of the Essence book into Russian is in progress. == Practical Applications of SEMAT == Ideas developed by the SEMAT community have been applied by both industry and ac

Noisy text analytics

Noisy text analytics is a process of information extraction whose goal is to automatically extract structured or semistructured information from noisy unstructured text data. While text analytics is a growing and mature field that has great value because of the huge amounts of data being produced, processing of noisy text is gaining in importance because many common applications produce noisy text data. Noisy unstructured text data is found in informal settings such as online chat, text messages, e-mails, message boards, newsgroups, blogs, wikis and web pages. Also, text produced by processing spontaneous speech using automatic speech recognition, and printed or handwritten text using optical character recognition, contains processing noise. Text produced under such circumstances is typically highly noisy, containing spelling errors, abbreviations, non-standard words, false starts, repetitions, missing punctuation, missing letter case information, pause-filling words such as “um” and “uh”, and other texting and speech disfluencies. Such text can be seen in large amounts in contact centers, chat rooms, optical character recognition (OCR) of text documents, short message service (SMS) text, etc. Documents with historical language can also be considered noisy with respect to today's knowledge about the language. Such text contains useful historical, religious, and ancient medical knowledge. The nature of the noisy text produced in all these contexts warrants moving beyond traditional text analysis techniques. == Techniques for noisy text analysis == Missing punctuation and the use of non-standard words can often hinder standard natural language processing tools such as part-of-speech tagging and parsing. Techniques both for learning from noisy data and for processing it are only now being developed. 
== Possible sources of noisy text ==

World Wide Web: Poorly written text is found in web pages, online chat, blogs, wikis, discussion forums, and newsgroups. Most of this data is unstructured, and the style of writing is very different from, say, well-written news articles. Analysis of web data is important because it is a source for market buzz analysis, market review, trend estimation, etc. Also, because of the large amount of data, it is necessary to find efficient methods of information extraction, classification, automatic summarization and analysis of this data.

Contact centers: This is a general term for help desks, information lines and customer service centers operating in domains ranging from computer sales and support to mobile phones to apparel. On average, a person in the developed world interacts at least once a week with a contact center agent. A typical contact center agent handles over a hundred calls per day. Contact centers operate in various modes such as voice, online chat and e-mail. The contact center industry produces gigabytes of data in the form of e-mails, chat logs, voice conversation transcriptions, customer feedback, etc. The bulk of contact center data is voice conversations. Transcription of these using state-of-the-art automatic speech recognition results in text with a 30-40% word error rate. Further, even written modes of communication, such as online chat between customers and agents and interactions over e-mail, tend to be noisy. Analysis of contact center data is essential for customer relationship management, customer satisfaction analysis, call modeling, customer profiling, agent profiling, etc., and it requires sophisticated techniques to handle poorly written text.

Printed documents: Many libraries, government organizations and national defence organizations have vast repositories of hard copy documents. To retrieve and process the content of such documents, they need to be processed using optical character recognition (OCR). In addition to printed text, these documents may also contain handwritten annotations. OCRed text can be highly noisy depending on the font size, the quality of the print, etc. Word error rates can range from 2-3% to as high as 50-60%. Handwritten annotations can be particularly hard to decipher, and error rates can be quite high in their presence.

Short Messaging Service (SMS): Language usage in computer-mediated discourse, such as chats, e-mails and SMS texts, differs significantly from the standard form of the language. The urge towards shorter messages, which allow faster typing, and the need for semantic clarity shape the structure of this non-standard form, known as the texting language.
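The word error rates quoted above for speech recognition and OCR output are conventionally computed as the word-level edit distance between a reference transcript and the system output, divided by the number of words in the reference. The following is a minimal sketch of that calculation. It assumes whitespace tokenization and exact string matching, which real evaluations usually refine with case and punctuation normalization, and the function name is a hypothetical choice.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words yields an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One inserted filler word against a two-word reference.
print(word_error_rate("a b", "a um b"))  # 0.5
```

Because insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.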

Text-to-video model

A text-to-video model is a form of generative artificial intelligence that uses a natural language description as input to produce a video relevant to the input text. Advancements during the 2020s in the generation of high-quality, text-conditioned videos have largely been driven by the development of video diffusion models. == Models == There are different models, including open source models. Chinese-language input CogVideo is the earliest text-to-video model "of 9.4 billion parameters" to be developed, with its demo version of open source codes first presented on GitHub in 2022. That year, Meta Platforms released a partial text-to-video model called "Make-A-Video", and Google's Brain (later Google DeepMind) introduced Imagen Video, a text-to-video model with 3D U-Net. === 2023 === In February 2023, Runway released Gen-1 and Gen-2, among the first commercially available text-to-video and video-to-video models accessible to the public through a web interface. Gen-1, initially released as a video-to-video model, allowed users to transform existing video footage using text or image prompts. Gen-2, introduced in March 2023 and made publicly available in June 2023, added text-to-video capabilities, enabling users to generate videos from text prompts alone. In March 2023, a research paper titled "VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation" was published, presenting a novel approach to video generation. The VideoFusion model decomposes the diffusion process into two components: base noise and residual noise, which are shared across frames to ensure temporal coherence. By utilizing a pre-trained image diffusion model as a base generator, the model efficiently generated high-quality and coherent videos. Fine-tuning the pre-trained model on video data addressed the domain gap between image and video data, enhancing the model's ability to produce realistic and consistent video sequences. 
In the same month, Adobe introduced Firefly AI as part of its features. === 2024 === In January 2024, Google announced development of a text-to-video model named Lumiere, which is anticipated to integrate advanced video editing capabilities. Matthias Niessner and Lourdes Agapito at AI company Synthesia work on developing 3D neural rendering techniques that can synthesise realistic video by using 2D and 3D neural representations of shape, appearances, and motion for controllable video synthesis of avatars. In June 2024, Luma Labs launched its Dream Machine video tool. That same month, Kuaishou extended its Kling AI text-to-video model to international users. In July 2024, TikTok owner ByteDance released Jimeng AI in China through its subsidiary, Faceu Technology. In September 2024, the Chinese AI company MiniMax debuted its video-01 model, joining other established AI model companies like Zhipu AI, Baichuan, and Moonshot AI, which contribute to China's involvement in AI technology. In December 2024, Lightricks launched LTX Video as an open source model. === 2025 === Alternative approaches to text-to-video models include Google's Phenaki, Hour One, Colossyan, Runway's Gen-3 Alpha, and OpenAI's Sora. Several additional text-to-video models, such as Plug-and-Play, Text2LIVE, and TuneAVideo, have emerged. FLUX.1 developer Black Forest Labs has announced its text-to-video model SOTA. Google was preparing to launch a video generation tool named Veo for YouTube Shorts in 2025. In May 2025, Google launched the Veo 3 iteration of the model. It was noted for its audio generation capabilities, the lack of which had been a limitation of earlier text-to-video models. In July 2025, Lightricks released an update to LTX Video capable of generating clips reaching 60 seconds, and in October 2025 it released LTX-2, with audio capabilities built in. 
=== 2026 === In February 2026, ByteDance released Seedance 2.0, which was noted for its realistic generation, motion and camera control, and 15-second generation length. However, the model faced strong criticism from the Motion Picture Association over copyright infringement. After viewing a viral clip of a fight between actors Brad Pitt and Tom Cruise, Rhett Reese, co-writer of Deadpool & Wolverine and Zombieland, wrote on social media, "I hate to say it. It’s likely over for us," further stating that "In next to no time, one person is going to be able to sit at a computer and create a movie indistinguishable from what Hollywood now releases." == Architecture and training == There are several architectures that have been used to create text-to-video models. Similar to text-to-image models, these models can be trained using recurrent neural networks (RNNs) such as long short-term memory (LSTM) networks, which have been used for Pixel Transformation Models and Stochastic Video Generation Models, which aid in consistency and realism respectively. Alternatives to these include transformer models. Generative adversarial networks (GANs), variational autoencoders (VAEs), which can aid in the prediction of human motion, and diffusion models have also been used to develop the image generation aspects of the model. Text-video datasets used to train models include, but are not limited to, WebVid-10M, HDVILA-100M, CCV, ActivityNet, and Panda-70M. These datasets contain millions of original videos of interest, generated videos, captioned videos, and textual information that help train models for accuracy. Text datasets used to train models include, but are not limited to, PromptSource, DiffusionDB, and VidProM. These datasets provide the range of text inputs needed to teach models how to interpret a variety of textual prompts. 
The video generation process involves synchronizing the text inputs with video frames, ensuring alignment and consistency throughout the sequence. This predictive process is subject to a decline in quality as the length of the video increases, due to resource limitations. The Will Smith Eating Spaghetti test is a benchmark for models. == Limitations == Despite the rapid improvement in the performance of text-to-video models, a primary limitation is that they are computationally intensive, which limits their capacity to provide high-quality and lengthy outputs. Additionally, these models require a large amount of specific training data to be able to generate high-quality and coherent outputs, which raises the issue of accessibility. Moreover, models may misinterpret textual prompts, resulting in video outputs that deviate from the intended meaning. This can occur due to limitations in capturing the semantic context embedded in text, which affects the model's ability to align generated video with the user's intended message. Various models, including Make-A-Video, Imagen Video, Phenaki, CogVideo, GODIVA, and NUWA, are currently being tested and refined to enhance their alignment capabilities and overall performance in text-to-video generation. Another issue with the outputs is that text or fine details in AI-generated videos often appear garbled, a problem that stable diffusion models also struggle with. Examples include distorted hands and unreadable text. == Ethics == The deployment of text-to-video models raises ethical considerations related to content generation. These models have the potential to create inappropriate or unauthorized content, including explicit material, graphic violence, misinformation, and likenesses of real individuals without consent. Ensuring that AI-generated content complies with established standards for safe and ethical usage is essential, as content generated by these models may not always be easily identified as harmful or misleading. 
The ability of AI to recognize and filter out NSFW or copyrighted content remains an ongoing challenge, with implications for both creators and audiences. == Impacts and applications == Text-to-video models offer a broad range of applications that may benefit various fields, from educational and promotional to creative industries. These models can streamline content creation for training videos, movie previews, gaming assets, and visualizations, making it easier to generate content. During the Russo-Ukrainian war, fake videos made with artificial intelligence were created as part of a propaganda war against Ukraine and shared in social media. These included depictions of children in the Ukrainian Armed Forces, fake ads targeting children encouraging them to denounce critics of the Ukrainian government, or fictitious statements by Ukrainian President Volodymyr Zelenskyy about the country's surrender, among others. === Movies === Kaur vs Kore is the first Indian feature film made using generative AI which features dual role for the AI character of Sunny Leone, set to release in 2026. Chiranjeevi Hanuman – The Eternal is an Indian movie made entirely using Generative AI created by Vijay Subramaniam which is set for theatrical release in 2026. The movie was widely criticised by the Film makers in the Bollywood industr

Corpus of Linguistic Acceptability

Corpus of Linguistic Acceptability (CoLA) is a dataset the primary purpose of which is to serve as a benchmark for evaluating the ability of artificial neural networks, including large language models, to judge the grammatical correctness of sentences. It consists of 10,657 English sentences from published linguistics literature that were manually labeled either as grammatical or ungrammatical. == Public version == The publicly available version of CoLA contains 9,594 sentences that belong to training and development sets. It excludes 1,063 sentences reserved for a held-out test set.
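As an illustration of how binary acceptability judgments of this kind can be scored, the sketch below computes the Matthews correlation coefficient, the metric commonly reported for CoLA, from gold labels and model predictions. The labels and predictions shown are made-up toy values, not sentences or results from the corpus, and the function is a plain re-implementation for illustration rather than any official evaluation script. The split sizes come from the figures above.

```python
import math

# Split sizes as stated in the text.
PUBLIC_SENTENCES = 9_594      # training and development sets
HELD_OUT_SENTENCES = 1_063    # reserved test set
TOTAL_SENTENCES = 10_657
assert PUBLIC_SENTENCES + HELD_OUT_SENTENCES == TOTAL_SENTENCES


def matthews_corrcoef(labels, predictions):
    """Matthews correlation for binary labels (1 = grammatical, 0 = not)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the score is 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy example: hypothetical gold labels and model outputs.
gold = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
print(matthews_corrcoef(gold, predicted))
```

Unlike plain accuracy, this score stays at 0 for a model that labels every sentence as grammatical, which matters when the two classes are not equally frequent.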

Digital image correlation and tracking

Digital image correlation and tracking is an optical method that employs tracking and image registration techniques for accurate 2D and 3D measurements of changes in 2D images or 3D volumes. This method is often used to measure full-field displacement and strains, and it is widely applied in many areas of science and engineering. Compared to strain gauges and extensometers, digital image correlation methods provide finer details about deformation, due to the ability to provide both local and average data. == Overview == Digital image correlation (DIC) techniques have been increasing in popularity, especially in micro- and nano-scale mechanical testing applications due to their relative ease of implementation and use. Advances in computer technology and digital cameras have been the enabling technologies for this method and while white-light optics has been the predominant approach, DIC can be and has been extended to almost any imaging technology. The concept of using cross-correlation to measure shifts in datasets has been known for a long time, and it has been applied to digital images since at least the early 1970s. The present-day applications are almost innumerable, including image analysis, image compression, velocimetry, and strain estimation. Much early work in DIC in the field of mechanics was led by researchers at the University of South Carolina in the early 1980s and has been optimized and improved in recent years. Commonly, DIC relies on finding the maximum of the correlation array between pixel intensity array subsets on two or more corresponding images, which gives the integer translational shift between them. It is also possible to estimate shifts to a finer resolution than the resolution of the original images, which is often called "sub-pixel" registration because the measured shift is smaller than an integer pixel unit. For sub-pixel interpolation of the shift, other methods do not simply maximize the correlation coefficient. 
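The integer-shift estimate described above, locating the maximum of the correlation array between two images, can be sketched as follows. This is a minimal illustration under stated assumptions: it evaluates a mean-subtracted, normalized cross-correlation directly for every candidate shift, treats the images as periodic (wrap-around indexing), and uses small list-of-lists images. The function name and the test image are hypothetical, and practical implementations work on subsets of much larger images with faster methods.

```python
import random


def estimate_integer_shift(f, g):
    """Return ((i, j), r) maximizing the correlation of f(m+i, n+j) with g(m, n).

    f and g are equally sized 2D lists of gray values. Indices wrap around,
    which is an assumption made to keep the sketch short.
    """
    rows, cols = len(f), len(f[0])
    count = rows * cols
    f_mean = sum(sum(row) for row in f) / count
    g_mean = sum(sum(row) for row in g) / count
    df = [[v - f_mean for v in row] for row in f]
    dg = [[v - g_mean for v in row] for row in g]
    norm = (sum(v * v for row in df for v in row)
            * sum(v * v for row in dg for v in row)) ** 0.5
    if norm == 0:
        raise ValueError("images must not be constant")

    best_r, best_shift = -2.0, (0, 0)
    for i in range(rows):
        for j in range(cols):
            num = sum(df[(m + i) % rows][(n + j) % cols] * dg[m][n]
                      for m in range(rows) for n in range(cols))
            r = num / norm
            if r > best_r:
                best_r, best_shift = r, (i, j)
    return best_shift, best_r


# Hypothetical test image: random speckle, then a copy translated by (2, 3).
rng = random.Random(0)
f = [[rng.random() for _ in range(7)] for _ in range(6)]
g = [[f[(m + 2) % 6][(n + 3) % 7] for n in range(7)] for m in range(6)]

shift, peak = estimate_integer_shift(f, g)
print(shift, round(peak, 6))
```

The random image plays the role of a speckle pattern: because it has no repeating structure, the correlation has a single sharp peak at the true shift.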
An iterative approach can also be used to maximize the interpolated correlation coefficient by using non-linear optimization techniques. The non-linear optimization approach tends to be conceptually simpler and can handle large deformations more accurately, but as with most nonlinear optimization techniques, it is slower.

The two-dimensional discrete cross correlation $r_{ij}$ can be defined in several ways, one possibility being:

$$r_{ij} = \frac{\sum_{m}\sum_{n}[f(m+i,n+j)-\bar{f}][g(m,n)-\bar{g}]}{\sqrt{\sum_{m}\sum_{n}[f(m,n)-\bar{f}]^{2}\sum_{m}\sum_{n}[g(m,n)-\bar{g}]^{2}}}.$$

Here f(m, n) is the pixel intensity or the gray-scale value at a point (m, n) in the original image, g(m, n) is the gray-scale value at a point (m, n) in the translated image, and $\bar{f}$ and $\bar{g}$ are mean values of the intensity matrices f and g respectively.

However, in practical applications, the correlation array is usually computed using Fourier-transform methods, since the fast Fourier transform is a much faster method than directly computing the correlation.

$$\mathbf{F} = \mathcal{F}\{f\}, \quad \mathbf{G} = \mathcal{F}\{g\}.$$

Then taking the complex conjugate of the second result and multiplying the Fourier transforms together elementwise, we obtain the Fourier transform of the correlogram, $R$:

$$R = \mathbf{F} \circ \mathbf{G}^{*},$$

where $\circ$ is the Hadamard product (entry-wise product). It is also fairly common to normalize the magnitudes to unity at this point, which results in a variation called phase correlation. Then the cross-correlation is obtained by applying the inverse Fourier transform:

$$r = \mathcal{F}^{-1}\{R\}.$$

At this point, the coordinates of the maximum of $r_{ij}$ give the integer shift:

$$(\Delta x, \Delta y) = \arg\max_{(i,j)}\{r\}.$$

== Deformation mapping ==

For deformation mapping, the mapping function that relates the images can be derived from comparing a set of subwindow pairs over the whole images (Figure 1). The coordinates or grid points $(x_i, y_j)$ and $(x_i^{*}, y_j^{*})$ are related by the translations that occur between the two images. If the deformation is small and perpendicular to the optical axis of the camera, then the relation between $(x_i, y_j)$ and $(x_i^{*}, y_j^{*})$ can be approximated by a 2D affine transformation such as:

$$x^{*} = x + u + \frac{\partial u}{\partial x}\Delta x + \frac{\partial u}{\partial y}\Delta y,$$

$$y^{*} = y + v + \frac{\partial v}{\partial x}\Delta x + \frac{\partial v}{\partial y}\Delta y.$$

Here u and v are translations of the center of the sub-image in the X and Y directions respectively. The distances from the center of the sub-image to the point (x, y) are denoted by $\Delta x$ and $\Delta y$. Thus, the correlation coefficient $r_{ij}$ is a function of displacement components (u, v) and displacement gradients

$$\frac{\partial u}{\partial x}, \frac{\partial u}{\partial y}, \frac{\partial v}{\partial x}, \frac{\partial v}{\partial y}.$$

DIC has proven to be very effective at mapping deformation in macroscopic mechanical testing, where the application of specular markers (e.g. paint, toner powder) or surface finishes from machining and polishing provide the needed contrast to correlate images well. However, these methods for applying surface contrast do not extend to the application of free-standing thin films for several reasons. 
First, vapor deposition at normal temperatures on semiconductor-grade substrates results in mirror-finish quality films with RMS roughness typically on the order of several nanometers. No subsequent polishing or finishing steps are required, and unless electron imaging techniques that can resolve microstructural features are employed, the films do not possess enough useful surface contrast to adequately correlate images. Typically this challenge can be circumvented by applying paint that results in a random speckle pattern on the surface, but the large, turbulent forces that result from spraying or otherwise applying paint to the surface of a free-standing thin film would break the specimens. In addition, individual paint particles are on the order of micrometers in size, while the film thickness is only several hundred nanometers, which would be analogous to supporting a large boulder on a thin sheet of paper.
== Digital volume correlation ==
Digital volume correlation (DVC, sometimes called volumetric DIC) extends the 2D-DIC algorithms into three dimensions to calculate the full-field 3D deformation from a pair of 3D images. This technique is distinct from 3D-DIC, which only calculates the 3D deformation of an exterior surface using conventional optical images. The DVC algorithm is able to track full-field displacement information in the form of voxels instead of pixels. The theory is similar to that described above, except that another dimension is added: the z-dimension. The displacement is calculated from the correlation of 3D subsets of the reference and deformed volumetric images, which is analogous to the correlation of 2D subsets described above. DVC can be performed using volumetric image datasets. These images can be obtained using confocal microscopy, X-ray computed tomography, magnetic resonance imaging or other techniques.
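As a rough illustration of the subset correlation described above, the following Python sketch estimates an integer voxel shift between two volumes by applying the FFT-based correlation procedure with an n-dimensional transform. It assumes NumPy, periodic (wrap-around) boundaries, and a pure rigid translation. The function name integer_shift and the synthetic random volume are hypothetical and chosen only for illustration. The deformed volume is transformed first so that the peak location equals the displacement of the deformed volume relative to the reference. Practical DVC codes also use sub-voxel interpolation and subset deformation, which are omitted here.

```python
import numpy as np


def integer_shift(reference, deformed):
    """Estimate the integer translation of `deformed` relative to `reference`.

    Works for 2D images and 3D volumes alike; boundaries are treated as periodic.
    """
    # Subtract the means so that uniform brightness does not dominate the peak.
    ref = reference - reference.mean()
    dfm = deformed - deformed.mean()

    # Element-wise product of one spectrum with the complex conjugate of the other.
    spectrum = np.fft.fftn(dfm) * np.conj(np.fft.fftn(ref))

    # The inverse transform gives the (circular) cross-correlation array.
    correlation = np.fft.ifftn(spectrum).real

    # The peak location is the shift, modulo the array size along each axis.
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)

    # Indices past the midpoint of an axis correspond to negative shifts.
    return tuple(
        int(p) - n if p > n // 2 else int(p)
        for p, n in zip(peak, correlation.shape)
    )


# Synthetic example: a random "speckle" volume moved by a known number of voxels.
rng = np.random.default_rng(0)
volume = rng.random((16, 16, 16))
moved = np.roll(volume, shift=(2, -3, 1), axis=(0, 1, 2))
print(integer_shift(volume, moved))  # (2, -3, 1)
```

Because the transform is n-dimensional, the same function also handles the 2D case. The cost of the transforms grows with the number of voxels, which is one reason volumetric correlation is slower than its 2D counterpart.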
Similar to the other DIC techniques, the images must exhibit a distinct, high-contrast 3D "speckle pattern" to ensure accurate displacement measurement. DVC was first developed in 1999 to study the deformation of trabecular bone using X-ray computed tomography images. Since then, applications of DVC have grown to include granular materials, metals, foams, composites and biological materials. To date it has been used with images acquired by MRI, computed tomography (CT), micro-CT, confocal microscopy, and lightsheet microscopy. DVC is currently considered ideal in research for 3D quantification of local displacements, strains, and stresses in biological specimens, and it is preferred over traditional experimental methods because it is non-invasive. Two of the key challenges are improving the speed and reliability of the DVC measurement. 3D imaging techniques produce noisier images than conventional 2D optical imaging, which reduces the quality of the displacement measurement. Computational speed is restricted by the file sizes of 3D images, which are significantly larger than those of 2D images. For example, an

CLAWS (linguistics)

The Constituent Likelihood Automatic Word-tagging System (CLAWS) is a program that performs part-of-speech tagging. It was developed in the 1980s at Lancaster University by the University Centre for Computer Corpus Research on Language. It has an overall accuracy rate of 96–97%, and the latest version (CLAWS4) was used to tag around 100 million words of the British National Corpus.
== History ==
A part-of-speech tagger (POS tagger) is a piece of software that reads text in some language and assigns a part of speech, such as noun, verb, or adjective, to each word (and other token), although computational applications generally use more fine-grained POS tags such as 'noun-plural'. Developed in the early 1980s, CLAWS was built to meet the growing and changing demand for POS annotation. Originally created to add part-of-speech tags to the LOB corpus of British English, the CLAWS tagset has since been adapted to other languages as well, including Urdu and Arabic. Since its inception, CLAWS has been praised for its functionality and adaptability. Still, it is not without flaws: although it has an error rate of only 1.5% when judged on major categories, about 3.3% of ambiguities remain unresolved. Ambiguity arises in cases such as the word flies, which may be classified as either a noun or a verb. These ambiguities have motivated the successive upgrades and tagsets of CLAWS.
== Rules and processing ==
CLAWS uses a hidden Markov model to determine the likelihood of sequences of words when predicting each part-of-speech label.
=== Sample output ===
This excerpt from Bram Stoker's Dracula (1897) has been tagged using both the CLAWS C5 and C7 tagsets. This is what CLAWS output generally looks like, with the most likely part-of-speech tag following each word.
== Tagsets ==
=== CLAWS1 tagset ===
The first tagset developed for CLAWS, the CLAWS1 tagset, has 132 word tags.
In terms of form and application, the C1 tagset is similar to the Brown Corpus tags.
=== CLAWS2 tagset ===
From 1983 to 1986, updated versions leading to CLAWS2 were part of a larger attempt to handle tasks such as recognizing sentence breaks. The aim was to avoid manual pre-processing of a text before the tags were applied, and to rely instead on optional manual post-editing of the automatic annotation where needed. The CLAWS2 tagset has 166 word tags.
=== CLAWS4 tagset ===
CLAWS4 was used for the 100-million-word British National Corpus (BNC). A general-purpose grammatical tagger, it is a successor of the CLAWS1 tagger. In tagging the BNC, the many rounds of work that went into CLAWS4 focused on making the CLAWS program independent of the tagsets. For example, the BNC project used two tagset versions: "a main tagset (C5) with 62 tags with which the whole of the corpus has been tagged, and a larger (C7) tagset with 152 tags, which has been used to make a selected 'core' sample corpus of two million words." The latest version of CLAWS4 is offered by UCREL, a research center of Lancaster University.
=== CLAWS5 tagset ===
The CLAWS5 tagset, which was used for the BNC, has over 60 tags.
=== CLAWS6 tagset ===
The CLAWS6 tagset was used for the BNC sampler corpus and the COLT corpus. It has over 160 tags, including 13 determiner subtypes.
=== CLAWS7 tagset ===
The standard CLAWS7 tagset is the one currently in use. It differs from the CLAWS6 tagset only in its punctuation tags.
=== CLAWS8 tagset ===
The CLAWS8 tagset extends the C7 tagset with further distinctions in the determiner and pronoun categories, as well as 37 new auxiliary tags for forms of be, do, and have.
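The hidden Markov model approach mentioned under Rules and processing can be illustrated with a small Viterbi decoder. This is a generic textbook sketch in Python, not the CLAWS implementation: the three-tag set (DET, NOUN, VERB), the probability tables, and the function name viterbi are all invented for illustration and bear no relation to the actual CLAWS tagsets or statistics. It shows how sequence context can resolve an ambiguous word such as flies.

```python
import math

# Hypothetical toy model: these tags and probabilities are made up for the example.
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1}
TRANS_P = {  # TRANS_P[previous_tag][next_tag]
    "DET": {"NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT_P = {  # EMIT_P[tag][word]
    "DET": {"the": 1.0},
    "NOUN": {"time": 0.6, "flies": 0.4},
    "VERB": {"time": 0.3, "flies": 0.7},
}


def log_p(p, floor=1e-12):
    """Logarithm of a probability, with a small floor for unseen events."""
    return math.log(max(p, floor))


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for a non-empty list of words."""
    # Each column maps a tag to (best log-probability so far, previous tag).
    first = words[0]
    columns = [{
        t: (log_p(start_p.get(t, 0.0)) + log_p(emit_p[t].get(first, 0.0)), None)
        for t in tags
    }]
    for word in words[1:]:
        column = {}
        for t in tags:
            # Best predecessor for tag t, scored by path so far plus transition.
            score, prev = max(
                (columns[-1][p][0] + log_p(trans_p[p].get(t, 0.0)), p)
                for p in tags
            )
            column[t] = (score + log_p(emit_p[t].get(word, 0.0)), prev)
        columns.append(column)

    # Backtrack from the best final tag through the stored predecessors.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["the", "flies"], TAGS, START_P, TRANS_P, EMIT_P))   # ['DET', 'NOUN']
print(viterbi(["time", "flies"], TAGS, START_P, TRANS_P, EMIT_P))  # ['NOUN', 'VERB']
```

In this toy model the same word receives different tags depending on its neighbor, because the transition probabilities favor a noun after a determiner and a verb after a noun.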