Charles Kemp, J. Tenenbaum
Results for "Comparative grammar"
Showing 20 of ~2,531,619 results · from arXiv, Semantic Scholar, DOAJ
Xiaoyin Xi, Neeku Capak, Kate Stockwell et al.
This research seeks to benefit the software engineering community by proposing comparative separation, a novel group fairness notion for evaluating the fairness of machine learning software on comparative judgment test data. Fairness issues have attracted increasing attention as machine learning software is increasingly used for high-stakes and high-risk decisions. It is the responsibility of all software developers to make their software accountable by ensuring that machine learning software does not perform differently on different sensitive groups -- satisfying the separation criterion. However, evaluation of separation requires ground-truth labels for each test data point. This motivates our work on analyzing whether separation can be evaluated on comparative judgment test data. Instead of asking humans to provide ratings or categorical labels on each test data point, comparative judgments are made between pairs of data points, such as "A is better than B". According to the law of comparative judgment, providing such comparative judgments imposes a lower cognitive burden on humans than providing ratings or categorical labels. This work first defines the novel fairness notion of comparative separation on comparative judgment test data, along with metrics to evaluate it. Then, both theoretically and empirically, we show that in binary classification problems, comparative separation is equivalent to separation. Lastly, we analyze the number of test data points and test data pairs required to achieve the same level of statistical power in the evaluation of separation and comparative separation, respectively. This work is the first to explore fairness evaluation on comparative judgment test data. It shows the feasibility and the practical benefits of using comparative judgment test data for model evaluation.
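As a sketch of the separation criterion this abstract builds on, the per-group error-rate check can be written directly; the pairwise-agreement function below is a simplified illustrative stand-in, not the paper's definition of comparative separation:

```python
from collections import defaultdict

def group_rates(y_true, y_pred, groups):
    """True/false positive rates per sensitive group.
    Separation holds when TPR and FPR agree across groups."""
    stats = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        key = ("tp" if p else "fn") if t else ("fp" if p else "tn")
        stats[g][key] += 1
    rates = {}
    for g, s in stats.items():
        rates[g] = {
            "tpr": s["tp"] / max(s["tp"] + s["fn"], 1),
            "fpr": s["fp"] / max(s["fp"] + s["tn"], 1),
        }
    return rates

def pairwise_agreement(scores, pairs, groups):
    """Fraction of 'A better than B' judgments that the model's scores
    order correctly, broken down by the group pair involved."""
    correct, total = defaultdict(int), defaultdict(int)
    for better, worse in pairs:          # each pair: (preferred, other)
        g = (groups[better], groups[worse])
        total[g] += 1
        if scores[better] > scores[worse]:
            correct[g] += 1
    return {g: correct[g] / total[g] for g in total}
```

A comparative-separation-style evaluation would compare such pairwise statistics across group pairs, just as the separation check compares TPR/FPR across groups.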
Muneef Y. Alsawsh, Mohammed Q. Shormani
This study examines the acquisition of English irregular inflections by Yemeni learners of English as a second language (L2), utilizing a Universal Grammar (UG) approach. Within the UG approach, the study considers the Feature Reassembly Hypothesis (FRH) (Lardiere, 2008, 2009) to be part of UG, focusing on the roles of first language (L1) transfer and L2 developmental influence. It analyzes learner errors across two developmental stages. Stage 1 data reveal a dominant influence of L1 transfer, particularly in phonological and structural mismatches, while stage 2 data demonstrate increased learner sensitivity to UG properties and morphological reconfiguration toward the target language. Findings reveal that errors in irregular inflectional morphology are attributable to both interlingual and intralingual sources, with overgeneralization of L2 rules as a common developmental strategy. Statistical analysis, including a one-way ANOVA, indicates significant improvement in the production of well-formed irregular inflections from stage 1 to stage 2, underscoring learners' continued access to UG. However, persistent difficulties with consonant change, zero-morpheme, and -a plural inflections suggest that limited exposure, ineffective input modeling, and insufficient instructional quality constrain full UG access. The study concludes that while L1 transfer and L2 developmental factors influence initial stages of acquisition, appropriate linguistic input and instruction are critical for facilitating UG-driven feature reassembly in adult L2 learners.
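The one-way ANOVA mentioned above can be computed from first principles; a minimal pure-Python sketch (the sample data in the test are invented, not the study's):

```python
def one_way_anova_F(*samples):
    """F statistic for a one-way ANOVA over k independent samples:
    between-group mean square divided by within-group mean square."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n
    ss_between = sum(len(s) * (sum(s) / len(s) - grand) ** 2 for s in samples)
    ss_within = sum(sum((x - sum(s) / len(s)) ** 2 for x in s) for s in samples)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)
```

In a design like the one described, each sample would hold per-learner scores at one developmental stage, and the F statistic would be compared against the F distribution with the stated degrees of freedom.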
Ankit Vadehra, Bill Johnson, Gene Saunders et al.
Text editing can involve several iterations of revision. Incorporating an efficient Grammar Error Correction (GEC) tool in the initial correction round can significantly impact further human editing effort and final text quality. This raises an interesting question for quantifying GEC tool usability: how much effort can a GEC tool save users? We present the first large-scale dataset of post-editing (PE) time annotations and corrections for two English GEC test datasets (BEA19 and CoNLL14). We introduce Post-Editing Effort in Time (PEET) for GEC tools as a human-focused evaluation scorer that ranks any GEC tool by estimating PE time-to-correct. Using our dataset, we quantify the amount of time saved by GEC tools in text editing. Analysis of edit types indicated that determining whether a sentence needs correction, and edits such as paraphrasing and punctuation changes, had the greatest impact on PE time. Finally, comparison with human rankings shows that PEET correlates well with technical effort judgment, providing a new human-centric direction for evaluating GEC tool usability. We release our dataset and code at: https://github.com/ankitvad/PEET_Scorer.
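A PEET-style scorer, as the abstract describes it, estimates time-to-correct and ranks tools by it. A minimal sketch under the assumption of a simple additive per-edit-type time model (the edit-type names and mean times below are hypothetical, not from the released dataset):

```python
def estimate_pe_time(edits, mean_time):
    """Estimated post-editing time for one sentence: the sum of mean
    annotated times per edit type (unknown types fall back to OTHER)."""
    return sum(mean_time.get(e, mean_time["OTHER"]) for e in edits)

def rank_tools(tool_outputs, mean_time):
    """Rank GEC tools by total estimated PE time, lowest (best) first.
    tool_outputs maps tool name -> list of per-sentence edit-type lists."""
    total = lambda sents: sum(estimate_pe_time(e, mean_time) for e in sents)
    return sorted(tool_outputs, key=lambda tool: total(tool_outputs[tool]))
```

The actual PEET scorer is trained on the time annotations; this fixed-mean version only illustrates the ranking idea.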
Junyu Xie, Tengda Han, Max Bain et al.
Our objective is the automatic generation of Audio Descriptions (ADs) for edited video material, such as movies and TV series. To achieve this, we propose a two-stage framework that leverages "shots" as the fundamental units of video understanding. This includes extending temporal context to neighbouring shots and incorporating film grammar devices, such as shot scales and thread structures, to guide AD generation. Our method is compatible with both open-source and proprietary Visual-Language Models (VLMs), integrating expert knowledge from add-on modules without requiring additional training of the VLMs. We achieve state-of-the-art performance among all prior training-free approaches and even surpass fine-tuned methods on several benchmarks. To evaluate the quality of predicted ADs, we introduce a new evaluation measure -- an action score -- specifically targeted to assessing this important aspect of AD. Additionally, we propose a novel evaluation protocol that treats automatic frameworks as AD generation assistants and asks them to generate multiple candidate ADs for selection.
Dhion Meitreya Vidhiasi
This research is a forensic linguistic study that concentrates on the analysis of speech acts spoken by a companion from one of the representative offices of a ministry in Cilacap during an investigative interview between an investigator and a child victim of a sexual violence crime. The purpose of this investigation is to examine the speech of a child victim and a companion during an investigative interview that occurred at a police station in Cilacap. This investigation is qualitative in nature and is structured as a case study. Speech data were collected during the investigative interview using listening and note-taking techniques. The data were subsequently analyzed in accordance with Weigand’s (2010) dialogic speech act theory. Additionally, the function and authority of a companion in the investigative interview process are clarified by the Regulation of the Minister of Women Empowerment and Child Protection of the Republic of Indonesia (Permen PPPA) No. 2 of 2022. The analysis results indicate that the code of ethics outlined in Permen PPPA No. 2 of 2022 is contravened by the companion’s dominance of the explorative speech act and the presence of the directive speech act. This implies that the companion must be re-informed about the code of ethics outlined in the Women Empowerment and Child Protection Regulation No. 2 of 2022. The findings of this research have the potential to assist the relevant ministries in enhancing the efficacy and authority of an assistant in the interview process related to the investigation of sexual violence crimes.
Roberto Gorrieri
A subclass of nondeterministic Finite Automata generated by means of regular Grammars (GFAs, for short) is introduced. A process algebra is proposed, whose semantics maps a term to a GFA. We prove a representability theorem: for each GFA $N$, there exists a process algebraic term $p$ such that its semantics is a GFA isomorphic to $N$. Moreover, we provide a concise axiomatization of language equivalence: two GFAs $N_1$ and $N_2$ recognize the same regular language if and only if the associated terms $p_1$ and $p_2$, respectively, can be equated by means of a set of axioms comprising only 7 axioms plus 2 conditional axioms.
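To illustrate the grammar-to-automaton direction that GFAs rest on (a toy simulation, not the paper's process algebra), a right-linear grammar can be run directly as the NFA it induces, with nonterminals as states:

```python
def accepts(grammar, start, s):
    """Acceptance check for a right-linear (regular) grammar.
    Productions are tuples: ('a', 'B') for A -> aB, ('a',) for A -> a,
    and () for A -> empty. A production A -> a moves to an accepting sink."""
    states = {start}
    for ch in s:
        nxt = set()
        for A in states:
            for prod in grammar.get(A, []):
                if prod and prod[0] == ch:
                    if len(prod) == 2:        # A -> aB
                        nxt.add(prod[1])
                    else:                     # A -> a
                        nxt.add("#accept")
        states = nxt
    # accept if we reached the sink, or some current state derives empty
    return "#accept" in states or any(() in grammar.get(A, []) for A in states)
```

For example, the grammar `{"S": [("a", "S"), ("b",)]}` recognizes the language a*b.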
Hernán Ceferino Vázquez, Jorge Sanchez, Rafael Carrascosa
Automated Machine Learning (AutoML) has become increasingly popular in recent years due to its ability to reduce the amount of time and expertise required to design and develop machine learning systems. This is very important for the practice of machine learning, as it allows building strong baselines quickly, improving the efficiency of the data scientists, and reducing the time to production. However, despite the advantages of AutoML, it faces several challenges, such as defining the solution space and exploring it efficiently. Recently, some approaches have been shown to be able to do so using tree-based search algorithms and context-free grammars. In particular, GramML presents a model-free reinforcement learning approach that leverages pipeline configuration grammars and operates using Monte Carlo tree search. However, one of the limitations of GramML is that it uses default hyperparameters, limiting the search problem to finding optimal pipeline structures for the available data preprocessors and models. In this work, we propose an extension to GramML that supports larger search spaces including hyperparameter search. We evaluated the approach using an OpenML benchmark and found significant improvements compared to other state-of-the-art techniques.
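A much-simplified illustration of folding hyperparameters into the same pipeline grammar: here hyperparameter choices are ordinary productions, so structure and hyperparameters share one search space. The grammar is invented for illustration, and pure Monte Carlo sampling stands in for GramML's Monte Carlo tree search:

```python
import random

# Toy pipeline grammar: hyperparameters (<k>, <depth>) are nonterminals
# just like structural choices, so one sampler covers both.
GRAMMAR = {
    "<pipeline>": [["<scaler>", "<model>"]],
    "<scaler>":   [["standard"], ["minmax"], ["none"]],
    "<model>":    [["knn", "<k>"], ["tree", "<depth>"]],
    "<k>":        [["3"], ["5"], ["7"]],
    "<depth>":    [["2"], ["4"], ["8"]],
}

def sample(symbol="<pipeline>", rng=random):
    """Expand a nonterminal by choosing productions uniformly at random."""
    out = []
    for sym in rng.choice(GRAMMAR[symbol]):
        if sym in GRAMMAR:
            out += sample(sym, rng)
        else:
            out.append(sym)
    return out

def monte_carlo_search(evaluate, n=50, rng=random):
    """Sample n pipelines and keep the best-scoring one (a flat Monte
    Carlo stand-in for tree search with value backpropagation)."""
    best, best_score = None, float("-inf")
    for _ in range(n):
        cand = sample(rng=rng)
        score = evaluate(cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score
```

A real system would score each candidate by cross-validated pipeline performance and reuse statistics across the search tree; the sketch only shows how the grammar unifies the two search problems.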
Kamel Yamani, Marwa Naïr, Riyadh Baghdadi
In recent years, data has emerged as the new gold, serving as a powerful tool for creating intelligent systems. However, procuring high-quality data remains challenging, especially for code. To address this, we developed TinyPy Generator, a tool that generates random Python programs using a context-free grammar. The generated programs are guaranteed to be correct by construction. Our system uses custom production rules (in the Backus-Naur Form (BNF) format) to recursively generate code. This allows us to generate code with different levels of complexity, ranging from code containing only assignments to more complex code containing conditionals and loops. Our proposed tool enables effortless large-scale Python code generation, beneficial for a wide range of applications. TinyPy Generator is particularly useful in the field of machine learning, where it can generate substantial amounts of Python code for training Python language models. Additionally, researchers who are studying programming languages can utilize this tool to create datasets for their experiments, which can help validate the robustness of code interpreters or compilers. Unlike existing research, we have open-sourced our implementation. This allows customization according to user needs and extends potential usage to other languages.
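In the same spirit as the generator described above (though with invented rules, not TinyPy Generator's actual BNF), grammar-driven generation of syntactically valid Python can be sketched as recursive expansion of production rules:

```python
import random

# Miniature CFG for Python snippets: assignments, a conditional, a loop.
RULES = {
    "prog":   [["stmt"], ["stmt", "stmt"]],
    "stmt":   [["assign"], ["ifstmt"], ["loop"]],
    "assign": [["VAR", " = ", "expr", "\n"]],
    "ifstmt": [["if ", "expr", " > 0:\n    ", "assign"]],
    "loop":   [["for i in range(", "NUM", "):\n    ", "assign"]],
    "expr":   [["NUM"], ["VAR"], ["NUM", " + ", "NUM"]],
}

def generate(symbol="prog", rng=random):
    """Recursively expand a symbol; strings not in RULES are terminals."""
    if symbol == "VAR":
        return rng.choice("xyz")
    if symbol == "NUM":
        return str(rng.randint(0, 9))
    if symbol not in RULES:
        return symbol
    return "".join(generate(s, rng) for s in rng.choice(RULES[symbol]))
```

Because every production expands to well-formed syntax, every generated snippet passes `compile()`, mirroring the "correct by construction" property at the syntactic level.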
Duc-Vu Nguyen, Thang Chau Phan, Quoc-Nam Nguyen et al.
In this paper, we aimed to develop a neural parser for Vietnamese based on simplified Head-Driven Phrase Structure Grammar (HPSG). The existing corpora, VietTreebank and VnDT, had around 15% of constituency and dependency tree pairs that did not adhere to simplified HPSG rules. To address this, we randomly permuted samples from the training and development sets to make them compliant with simplified HPSG. We then modified the first simplified HPSG neural parser for the Penn Treebank by replacing its encoder with PhoBERT or XLM-RoBERTa, which can encode Vietnamese texts. We conducted experiments on our modified VietTreebank and VnDT corpora. Our extensive experiments showed that the simplified HPSG neural parser achieved a new state-of-the-art F-score of 82% for constituency parsing when using the same predicted part-of-speech (POS) tags as the self-attentive constituency parser. Additionally, it outperformed previous studies in dependency parsing with a higher Unlabeled Attachment Score (UAS). However, our parser obtained lower Labeled Attachment Score (LAS) scores, likely due to our focus on arc permutation without changing the original labels, as we did not consult a linguistic expert. Lastly, the research findings of this paper suggest that more attention from linguistic experts is needed when developing simplified HPSG treebanks for Vietnamese natural language processing.
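The UAS and LAS reported above are straightforward token-level ratios over dependency parses; as a reference sketch (not the authors' evaluation code):

```python
def uas_las(gold, pred):
    """Unlabeled/Labeled Attachment Scores for one sentence's dependency
    parse. Each parse is a list of (head_index, label) per token: UAS
    counts correct heads, LAS counts correct (head, label) pairs."""
    assert len(gold) == len(pred)
    ua = sum(g[0] == p[0] for g, p in zip(gold, pred))
    la = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return ua / n, la / n
```

This also makes the paper's result pattern concrete: permuting arcs without revising labels can raise UAS (heads improve) while leaving LAS behind (labels no longer match their new arcs).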
Atle Steinar Langekiehl
The region Østfold/Follo in Norway had three extremely rare male personal names, Gautulv, Sakulv and Sjøfar, the first of which is the topic of this article. The etymology of the Norse Gautulfr is the wolf from Götaland. A Swedish rune stone mentions Gautulv, and the first Swedish written medieval sources for the name precede the Norwegian ones, although most of the namesakes lived in Norway. The nobility figures far more prominently than other social groups in medieval sources, and in Norway, the first known Gautulvs and people with the patronym Gautulvsson undoubtedly belonged to the nobility. The name Gautulv is also present in five anthropotoponyms: one in Østfold, three in Vestfold on the opposite side of the Oslofjord and one in Trøndelag. Later, Guttul became the most commonly used form of this anthroponym, which probably went extinct in Norway when the farmer Guttul Hansen Søtland died in Trøgstad in Østfold in 1797.
Anne Mette Nyvad, Ken Ramshøj Christensen
It is sometimes argued that (certain types of) lexical frequency and constructional frequency determine how easy sentences are to process and hence, how acceptable speakers find them. Others have argued that grammatical principles interact with and often override such effects. Here, we present the results from a survey on Danish with more than 200 participants. We asked people to provide acceptability ratings of a number of sentences with varying levels of complexity, with and without extraction, including complement clauses, relative clauses, parasitic gaps, and ungrammatical sentences. We predicted structural complexity and acceptability to be negatively correlated (the more complex, the less acceptable). The results show that construction frequency and acceptability are correlated, but that zero and near-zero frequencies do not predict acceptability. However, there is indeed an even stronger inverse correlation between acceptability and structural complexity, defined as a function of independently motivated factors of syntactic structure and processing, including embedding, adjunction, extraction, and distance between filler and gap. Lexical frequency also affects acceptability, but the effects are small, and, crucially, there is no evidence in our data that ungrammatical sentences are affected by such frequency effects. Furthermore, the acceptability patterns seem to be fairly stable across participants. The results show a pattern that is consistent with an approach based on grammatical principles and processing constraints, rather than based on stochastic principles alone.
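Correlations between ordinal measures like acceptability ratings and structural complexity are typically rank-based; a pure-Python Spearman sketch (the data in the test are invented, not the survey's):

```python
def _ranks(xs):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

An inverse relationship such as the complexity/acceptability correlation described above would show up as a coefficient near -1.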
Mohammad Noori, Nihad Mahmood
This paper analyses visual discourse in Ahmed Khalid Tawfiq’s The Legend of the Late Night (2002), drawing on semiotics as a critical method to grasp the essence of the image, especially where the imagination is imbued with horror literature, since this gives rise to a kind of novelistic narrative through enjoyment of, and amazement at, the new form that springs to mind as formal camouflage with artistic and functional impact. The chosen approach is characterized by its ability to analyze literary discourse, reveal its secrets, and interrogate its symbols in order to reach its goal. Critical reading here proceeds as a simulation analysis of the conceptual procedure: analyzing the physical form of the sign within the discourse or image, studying semiotic units such as words, colours, shapes and images, linking the signs to their cultural, social and other contexts, and observing the effect of the narrative employment of two types that come together to establish a genre inspiring in its novelty and excitement: the semiotics of the image and horror literature. The research problem can be defined by a question about the encounter between method and procedure: how suitable is the semiotic approach for analyzing Arab novelistic discourse and revealing its secrets? The conclusions lie in adding to accumulated knowledge so as to enhance the energies of the different genres, and in an attempt to cross-fertilize genres capable of bearing fruit through valid hybridization where procedure and employment interact.
Huchang Liao, Zeshui Xu, E. Herrera-Viedma et al.
Marek Lichter, Jiří Malý
Abstract Urban structure conceptualisation using compact and polycentric city narratives is often performed separately. However, although both are based on different spatial grammars, they are inextricably linked. The spatially equitable distribution and accessibility of urban functions are often seen as their main contributions. This paper uses the unprecedented circumstances of the COVID-19 pandemic to further analyse the relationship between the two narratives, using the radical transformation of a retail network in a post-socialist city (Brno, Czech Republic) as an example. Based on an in-depth analysis of government measures aimed at preventing the spread of the coronavirus and their coverage in the media, operational changes among all stores in the city are quantified. A comparative spatial analysis then shows that, in addition to economic inequalities, spatial injustice was exacerbated by the position of the central government, with varying degrees of intensity depending on the type of urban structure. It is argued that the resilience potential of polycentric and compact structures is very low, especially in the absence of retail planning and reflection upon spatiality in ensuring social equity.
Francis Frydman, Philippe Mangion
The synthesis of string transformation programs from input-output examples utilizes various techniques, all based on an inductive bias that comprises a restricted set of basic operators to be combined. A new algorithm, Transduce, is proposed, which is founded on the construction of abstract transduction grammars and their generalization. We experimentally demonstrate that Transduce can learn positional transformations efficiently from one or two positive examples without inductive bias, achieving a success rate higher than the current state of the art.
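A tiny sketch of what learning a positional transformation from one or two examples can look like (not the Transduce algorithm itself): hypothesize every way the output could be located in each input, by offset from the left or the right, and keep only the descriptions consistent with all examples:

```python
def learn_positional(examples):
    """Learn a positional substring-extraction program from (input, output)
    pairs. Returns a function applying the extraction, or None when no
    single positional description fits every example."""
    cands = None
    for inp, out in examples:
        here = set()
        start = inp.find(out)
        while start != -1:
            here.add(("left", start, len(out)))                 # offset from start
            here.add(("right", len(inp) - start, len(out)))     # offset from end
            start = inp.find(out, start + 1)
        cands = here if cands is None else cands & here
    if not cands:
        return None
    side, off, ln = sorted(cands)[0]      # pick one surviving hypothesis
    def program(s):
        i = off if side == "left" else len(s) - off
        return s[i:i + ln]
    return program
```

With two date examples whose month sits at a fixed offset, the learned program generalizes to unseen inputs; when no consistent position exists, learning fails rather than guessing.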
António José Gonçalves de Freitas, Roxana Flammini
The Ancient Near East and the Eastern Mediterranean were geographical and sociopolitical scenarios with fluent and constant connectivity from the earliest times in history. Prestige goods and raw materials found their way from one side to another through extensive networks even before the emergence of the state in Egypt and Mesopotamia, integrating movements not only of goods but also of people, technologies, cultural practices, gods, languages, and ideas (Wilkinson et al. 2011; Warburton 2020). In this volume, we named them “interconnections” to precisely emphasize the relevance of exchange in the adoption, modification, or re-adaptation of foreign traces. The influence of incoming technologies and the shaping of identities in such a dynamic world, always moving, is also considered. Naturally, many diverse theoretical approaches were proposed over time to explain those interconnections, contributing to completing the never-ending panorama of relationships (e.g. Warburton 2020: 1-21). At the same time, nowadays a comprehensive amount of evidence is usually considered in explaining those interconnections, mainly material remains, textual registers, and iconography.
Jonathan Zong, Josh Pollock, Dylan Wootton et al.
We present Animated Vega-Lite, a set of extensions to Vega-Lite that model animated visualizations as time-varying data queries. In contrast to alternate approaches for specifying animated visualizations, which prize a highly expressive design space, Animated Vega-Lite prioritizes unifying animation with the language's existing abstractions for static and interactive visualizations to enable authors to smoothly move between or combine these modalities. Thus, to compose animation with static visualizations, we represent time as an encoding channel. Time encodings map a data field to animation keyframes, providing a lightweight specification for animations without interaction. To compose animation and interaction, we also represent time as an event stream; Vega-Lite selections, which provide dynamic data queries, are now driven not only by input events but by timer ticks as well. We evaluate the expressiveness of our approach through a gallery of diverse examples that demonstrate coverage over taxonomies of both interaction and animation. We also critically reflect on the conceptual affordances and limitations of our contribution by interviewing five expert developers of existing animation grammars. These reflections highlight the key motivating role of in-the-wild examples, and identify three central tradeoffs: the language design process, the types of animated transitions supported, and how the systems model keyframes.
Antonio Venturis
This study aims to show the readability limits of Italian texts at the B and C language levels, which can be used for reading assessment, and the characteristics these limits depend on. For this purpose, 184 Italian texts used in the certification exams of the KPG system were analysed in an attempt to reveal the main traits (quantitative, lexical and syntactic) of the texts that define their readability, and their correlation with the language level for which the texts are intended. The correlation analysis indicated that cognitive characteristics are more important than quantitative characteristics, with a high correlation of lexical variables with the B1 and B2 levels and of syntactic variables with the C1 and C2 levels.
Page 40 of 126581