Anatol Stefanowitsch, S. Gries
Results for "Language. Linguistic theory. Comparative grammar"
Showing 20 of ~4432693 results · from DOAJ, CrossRef, Semantic Scholar, arXiv
Thomas Hoffmann, G. Trousdale
P. Kay, C. Fillmore
E. Bates, J. Goodman
M. Polinsky, Gregory Scontras
With a growing interest in heritage languages from researchers of bilingualism and linguistic theory, the field of heritage-language studies has begun to build on its empirical foundations, moving toward a deeper understanding of the nature of language competence under unbalanced bilingualism. In furtherance of this trend, the current work synthesizes pertinent empirical observations and theoretical claims about vulnerable and robust areas of heritage language competence into early steps toward a model of heritage-language grammar. We highlight two key triggers for deviation from the relevant baseline: the quantity and quality of the input from which the heritage grammar is acquired, and the economy of online resources when operating in a less dominant language. In response to these triggers, we identify three outcomes of deviation in the heritage grammar: an avoidance of ambiguity, a resistance to irregularity, and a shrinking of structure. While we are still a ways away from a level of understanding that allows us to predict those aspects of heritage grammar that will be robust and those that will deviate from the relevant baselines, our hope is that the current work will spur the continued development of a predictive model of heritage language competence.
Normoda Doley, T. Vu
Digital platforms like Instagram have become essential venues for the negotiation of language and identity, with the interplay of global and local practices vividly illustrated through the phenomenon of code-switching. This study seeks to fill a significant gap in comparative sociolinguistics by examining the motivations for code-switching between English and local languages in India (Hinglish) and Vietnam (Vietlish). By analyzing around 280 Instagram captions and comments, the study employs a multi-scalar framework that integrates micro-level textual mechanics, meso-level interactional dynamics, and macro-level sociocultural forces. This approach is critically informed by Translanguaging and Enregisterment, explicitly treating the blends as single, fused linguistic repertoires and is also informed by Critical Discourse Analysis, the sociolinguistics of globalization, and Auer’s insights into code-switching as a conversational resource. Our core finding reveals a distinct divergence: Hinglish emerges as a well-established and standardized system featuring its own grammar and recognized social functions, while Vietlish presents itself as a more fluid and evolving repertoire, strategically employed to evoke modernity, emotional resonance, and promotional appeal through the Algorithmic Audience mechanism. Utilizing Myers-Scotton’s markedness model, the research argues that this divergence is rooted in different processes of localization. Users in both contexts exhibit sophisticated semiotic strategies when navigating the “caption-hashtag-comment ensemble,” yet their linguistic choices reflect unique socio-historical trajectories shaping their digital language ecologies. 
This study contributes a novel comparative framework for theorizing digital language practices, establishing the Stabilized versus Emergent Continuum and demonstrating that the assimilation of global linguistic resources goes beyond mere borrowing, representing a dynamic process of sociocultural reconstruction.
D. Iacono, Gloria C. Feltis
Language, a uniquely human cognitive faculty, is fundamentally characterized by its capacity for complex thoughts and structured expressions. This review examines two critical measures of linguistic performance: idea density (ID) and grammatical complexity (GC). ID quantifies the richness of information conveyed per unit of language, reflecting semantic efficiency and conceptual processing. GC, conversely, measures the structural sophistication of syntax, indicative of hierarchical organization and rule-based operations. We explore the neurobiological underpinnings of these measures, identifying key brain regions and white matter pathways involved in their generation and comprehension. This includes linking ID to a distributed network of semantic hubs, like the anterior temporal lobe and temporoparietal junction, and GC to a fronto-striatal procedural network encompassing Broca’s area and the basal ganglia. Moreover, a central theme is the integration of Chomsky’s theories of Universal Grammar (UG), which posits an innate human linguistic endowment, with their neurobiological correlates. This integration analysis bridges foundational models that first mapped syntax (Friederici’s work) to distinct neural pathways with contemporary network-based theories that view grammar as an emergent property of dynamic, inter-regional neural oscillations. Furthermore, we examine the genetic factors influencing ID and GC, including genes implicated in neurodevelopmental and neurodegenerative disorders. A comparative anatomical perspective across human and non-human primates illuminates the evolutionary trajectory of the language-ready brain. Also, we emphasize that, clinically, ID and GC serve as sensitive neurocognitive markers whose power lies in their often-dissociable profiles. For instance, the primary decline of ID in Alzheimer’s disease contrasts with the severe grammatical impairment in nonfluent aphasia, aiding in differential diagnosis. 
Importantly, as non-invasive and scalable metrics, ID and GC also provide a critical complement to gold-standard but costly biomarkers like CSF and PET. Finally, the review considers the emerging role of AI and Natural Language Processing (NLP) in automating these linguistic analyses, concluding with a necessary discussion of the critical challenges in validation, ethics, and implementation that must be addressed for these technologies to be responsibly integrated into clinical practice.
Muhammad Mubeen Shah, Dr. Muhammad Islam
The clickbait content creation strategy on social media differs from the techniques used to construct headlines in traditional print media. Social media content creators generate their income through those links, whereas newspapers are purchased in bulk. In this context, this study presents an analysis that compares the use of linguistic choices in social and print media news headlines to engage readers. The study employed a qualitative approach, and a sample of six news headlines, comprising three from Daily Jang and three from social media pages, was analysed at two levels in the light of a framework developed by Carvalho (2008). The first level explores structural organisation and layout, objects, actors, language, grammar, rhetorical devices, and other discursive strategies. The second level explores the contextual level, which deals with the understanding of the historical or political background behind a news event and its linkages with the cognition of the audience, i.e., how a specific event may impact the audience. Additionally, we have presented a semiotic analysis to enrich our results. The comparative analysis reveals that both news media editors employed linguistic choices and other discursive strategies with distinct agendas, namely, generating more social media links and establishing an ideological stance in print media. Moreover, the semiotic analysis examines how sign language aids news creators in achieving their goal of creating hype in news headlines. This study may be helpful for researchers who intend to explore media discourse. Moreover, this research may also be used to raise awareness among social media users about how social media content writers utilise clickbait content to capture their attention and influence them to read specific news stories.
References
Abastado, C. (1980). Messages de medias. Cedic.
Abba, T. S., & Musa, N. (2015). Speech act analysis of Daily Trust and The Nation newspapers headline reports on “Boko Haram” attacks. Journal of Communication and Culture, 6(1), 63–72.
Alfangca, K. Z. (2015). The transitivity elements and ideology: A newspaper headlines analysis on MH370 flight accident [Undergraduate thesis, Widya Mandala Catholic University Surabaya]. http://repository.wima.ac.id/4884/
Al-Saedi, H. T. J., & Jabber, K. W. (2020). A pragmatic study of newspaper headlines in media discourse: Iraq as a case study. International Journal of Linguistics, Literature and Translation, 3(3), 48–59.
Barman Roy, A., Chen, B., Tiwari, S., & Huang, Z. (2019). A discussion on the influence of newspaper headlines on social media. Journal of Communication Inquiry, 10(2), 28–44.
Bonyadi, A., & Samuel, M. (2013). Headlines in newspaper editorials: A contrastive study. SAGE Open, 3(2), 1–14. https://doi.org/10.1177/2158244013494863
Bouvier, G., & Machin, D. (2018). Critical discourse analysis and the challenges and opportunities of social media. Review of Communication, 18(3), 178–192.
Busri, H., & Badrih, M. (2022). Representation of linguistic characteristics in mass media. KEMBARA: Jurnal Keilmuan Bahasa, Sastra, dan Pengajarannya, 8(1), 1–14.
Caple, H. (2013). Photojournalism: A social semiotic approach. Palgrave Macmillan.
Carvalho, A. (2008). Media(ted) discourse and society. Taylor & Francis Online, 161–177.
Chiluwa, I. (2007). News headlines as pragmatic strategy in Nigerian press discourse. The International Journal of Language, Society and Culture, 27, 63–71.
Conboy, M. (2013). The language of the news. Routledge.
Crystal, D., & Davy, D. (1969). Investigating English style. Longman.
Develotte, C., & Rechniewski, E. (2001). Discourse analysis of newspaper headlines: A methodological framework for research into national representations. The Web Journal of French Media Studies, 4(1), 1–12.
Duzett, A. (2011). Media bias in strategic word choice. Medium. https://goo.gl/JHYb62
Fairclough, N. (1992). Discourse and text: Linguistic and intertextual analysis within discourse analysis. Discourse & Society, 3(2), 193–217.
Gopang, I. B., Bughio, F. A., & Pathan, H. (2018). Investigating foreign language learning anxiety among students learning English in a public sector university, Pakistan. MOJES: Malaysian Online Journal of Educational Sciences, 3(4), 27–37.
Habermas, J. (1997). Die Einbeziehung des Anderen: Studien zur politischen Theorie. Suhrkamp.
Hall, S. (1986). The problem of ideology – Marxism without guarantees. Journal of Communication Inquiry, 10(2), 28–44.
Ismail, H. M. (2016). Pragmatic and semantic potential of newspaper headlines. US-China Foreign Language, 14(11), 753–762.
Montejo, G. M., & Adriano, T. Q. (2018). A critical discourse analysis of headlines in online news portals. Journal of Advances in Humanities and Social Sciences, 4(2), 70–83.
Montgomery, M. (2007). The discourse of broadcast news: A linguistic approach. Routledge.
Ogilvy, D. (2011). Confessions of an advertising man. Ballantine Books.
Pajunen, J. (2008). Linguistic analysis of newspaper discourse in theory and practice [Technical report]. University of Tampere.
Reah, D. (2002). The language of newspapers (2nd ed.). Routledge.
Reisigl, R., & Wodak, R. (2001). Discourse and discrimination: Rhetorics of racism and antisemitism. Routledge.
Rustam, R. (2013). Pragmatic analysis of CNN headlines representing Pakistan [Unpublished Ph.D. dissertation]. University of Azad Jammu and Kashmir.
Silverman, C. (2015). Lies, damn lies, and viral content: How news websites spread (and debunk) online rumors, unverified claims, and misinformation. Tow Center for Digital Journalism.
Siposova, A. (2011). Headlines and subheadlines: Tense, modality, and register based on discourse analysis of The British Tabloid The Sun [Unpublished master’s thesis]. Masaryk University.
Taiwo, R. (2007). Language, ideology, and power relations in Nigerian newspaper headlines. Nebula, 4(1), 218–245.
Tuchman, G. (1978). Making news: A study of the construction of reality. Free Press.
Ulum, G. (2016). Newspaper ideology: A critical discourse analysis of news headlines on Syrian refugees in published newspapers. Retrieved from www.researchgate.net
Ungerer, F. (Ed.). (2000). English media texts: Past and present language and textual structure. John Benjamins.
Van Dijk, T. A. (1988). News as discourse. Lawrence Erlbaum.
Van Dijk, T. A. (2001). Critical discourse analysis. In D. Tannen, D. Schiffrin, & H. Hamilton (Eds.), Handbook of discourse analysis (pp. 352–371). Blackwell.
Keywords: CDA of social and print headlines
N. Azimova
The Arabic noun category (ism, الاسم) represents one of the core structural elements of Arabic grammar and encompasses a wide range of lexical units, including nouns, adjectives, numerals, pronouns, participles, and infinitives. This broad semantic and grammatical scope differs significantly from the Uzbek grammatical tradition, in which nouns and related categories are classified into narrower and more strictly defined parts of speech. This paper provides a comprehensive comparative analysis of the Arabic ism category and its Uzbek equivalents, demonstrating how each language conceptualizes nominality and how these conceptualizations influence syntactic behavior, morphological marking, and category formation. Special attention is given to the categories of number, possession, and gender in Uzbek nouns, as well as the historical influence of Arabic borrowings on gender marking in Uzbek. Additionally, the study analyses how semantic, morphological, and syntactic distinctions shape the functional use of nouns in both languages. The findings highlight that while Uzbek and Arabic share several fundamental grammatical concepts, their structural realizations differ significantly due to typological divergence: Arabic is a Semitic language with rich inflectional morphology, whereas Uzbek is a Turkic, agglutinative language. This contrast underscores the importance of comparative linguistic research for language acquisition, translation studies, and grammatical theory.
Katarína Džunková
The article examines the phenomenon of missionary linguistics in the Russian Empire, shaped significantly by Russian state and church policies. Beginning in the late 18th century, three main genres of missionary linguistic literature were published: grammars, dictionaries, and primers. The research encompasses material from 32 languages within the Uralic, Altaic, Chukchi-Kamchatkan, Eskimo-Aleut, and Na-Dene language families, as well as isolated languages. Missionaries most frequently created primers to teach indigenous people to read and write in their native languages, aiming for their adoption of Russian. This led missionaries to develop new alphabets based on the Russian civil script. The article addresses the problems of classifying missionary linguistic works and examines the popular linguistic theories used by Russian missionaries, particularly language affinity and the comparative method. It also explores the role of Russian missionaries, who were lay philologists and often children of Orthodox priests. The article identifies missionary grammars of 17 languages and examines the reasons for their creation, as well as the methods used to describe one language in terms of another. It concludes by discussing the difficulties in classifying this extensive missionary linguistic material.
Raquel Montero, Natalia Moskvina, Paolo Morosi et al.
Quantification has been proven to be a particularly difficult linguistic phenomenon for (Multimodal) Large Language Models (MLLMs). However, given that quantification interfaces with the logic, pragmatic, and numerical domains, the exact reasons for the poor performance are still unclear. This paper looks at three key features of human quantification shared cross-linguistically that have remained so far unexplored in the (M)LLM literature: the ordering of quantifiers into scales, the ranges of use and prototypicality, and the biases inherent in the human approximate number system. The aim is to determine how these features are encoded in the models' architecture, how they may differ from humans, and whether the results are affected by the type of model (thinking vs. instruct) and the language under investigation. Results show that although thinking models showed a high accuracy in the numerosity estimation task and in the organization of quantifiers into scales, there are still key differences between humans and LLMs across all model types, particularly in terms of ranges of use and prototypicality values. This work, thus, paves the way for addressing the nature of MLLMs as semantic and pragmatic agents, while the cross-linguistic lens can elucidate whether their abilities are robust and stable across different languages.
Andreas Madsack, Johanna Heininger, Adela Schneider et al.
One approach for multilingual data-to-text generation is to translate grammatical configurations upfront from the source language into each target language. These configurations are then used by a surface realizer and in document planning stages to generate output. In this paper, we describe a rule-based NLG implementation of this approach where the configuration is translated by Neural Machine Translation (NMT) combined with a one-time human review, and introduce a cross-language grammar dependency model to create a multilingual NLG system that generates text from the source data, scaling the generation phase without a human in the loop. Additionally, we introduce a method for human post-editing evaluation on the automatically translated text. Our evaluation on the SportSett:Basketball dataset shows that our NLG system performs well, underlining its grammatical correctness in translation tasks.
Iris Ferrazzo
Data elicitation from human participants is one of the core data collection strategies used in empirical linguistic research. The number of participants in such studies may vary considerably, ranging from a handful to crowdsourcing dimensions. Although both settings can yield extensive data, they come with many disadvantages, such as low control of participants' attention during task completion, precarious working conditions in crowdsourcing environments, and time-consuming experimental designs. For these reasons, this research aims to answer the question of whether Large Language Models (LLMs) may overcome those obstacles if included in empirical linguistic pipelines. Two reproduction case studies are conducted to clarify this matter: Cruz (2023) and Lombard et al. (2021). The two forced elicitation tasks, originally designed for human participants, are reproduced in the proposed framework with the help of OpenAI's GPT-4o-mini model. Its performance with our zero-shot prompting baseline shows the effectiveness and high versatility of LLMs, which tend to outperform human informants in linguistic tasks. The findings of the second replication further highlight the need to explore additional prompting techniques, such as Chain-of-Thought (CoT) prompting, which, in a second follow-up experiment, demonstrates closer alignment with human performance on both critical and filler items. Given the limited scale of this study, it is worthwhile to further explore the performance of LLMs in empirical linguistics and in other future applications in the humanities.
Massimo Daul, Alessio Tosolini, Claire Bowern
Automatic speech recognition (ASR) is a crucial tool for linguists aiming to perform a variety of language documentation tasks. However, modern ASR systems use data-hungry transformer architectures, rendering them generally unusable for underresourced languages. We fine-tune a wav2vec2 ASR model on Yan-nhangu, a dormant Indigenous Australian language, comparing the effects of phonemic and orthographic tokenization strategies on performance. In parallel, we explore ASR's viability as a tool in a language documentation pipeline. We find that a linguistically informed phonemic tokenization system substantially improves WER and CER compared to a baseline orthographic tokenization scheme. Finally, we show that hand-correcting the output of an ASR model is much faster than hand-transcribing audio from scratch, demonstrating that ASR can work for underresourced languages.
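The abstract above reports results in terms of WER and CER without defining them. As a minimal sketch, assuming the usual definitions (word-level and character-level edit distance divided by the reference length), the two metrics can be computed as follows. The function names and the English sample strings are illustrative assumptions and are not taken from the paper, whose exact scoring setup is not described in the abstract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits per reference word."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Placeholder strings, not data from the study.
    print(wer("the cat sat down", "the cat sat"))
    print(cer("abc", "abd"))
```

Because both rates are normalised by the reference length, a hypothesis with many insertions can score above 1.0.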
Jonathan Harrington, Michele Gubian, Pia Greca
In ongoing sound changes, a coarticulatory effect is often enhanced as the coarticulatory source that gives rise to it wanes. But quite how phonologisation and these reciprocal coarticulatory changes are connected is still poorly understood. The present study addresses this issue through an acoustic analysis of metaphony, which like umlaut has its phonetic origins in VCV coarticulation, and which was analysed in three geographically proximal varieties spoken in the so-called Lausberg area in Southern Italy. The corpus was of 35 speakers producing mostly disyllabic words with phonetically mid stem vowels and suffix vowels that varied in phonetic height. The results of functional principal components analysis applied to the stem vowels’ first two formant frequencies showed a progressively greater enhancement to the vowel stem across the three regions that was characterised by raising, diphthongisation, and then further raising and monophthongisation. Suffix erosion was quantified by counting deletions and the degree of vowel centralisation. The analysis showed a reciprocal relationship between stem enhancement and suffix erosion across, but not within, the three dialects. Overall, the results suggest that a trade-off of cues between suffix and stem vowel has progressed to different degrees between the three varieties.
Marthe OYANE METOGHO
Based on L’Etat honteux by Sony Labou Tansi, the study constructs the archetype of the African postcolonial state. The portrait that emerges is an aggregate of motifs drawn from colonial discourse. This State/state ends in post-transition novels, a space for the construction of authorial utopias. This problematizes the relevance of discourses carried by concepts that drain ideological postures.
Sonia González Cruz
In recent years, there has been a notable rise in the portrayal of LGBTQ+ characters in TV series available on online platforms. This poses a challenge for translators from a linguistic, social and cultural perspective, as they need to deal with the transference of fictional speech according to diverse identities. In this respect, translators are not only in charge of translating the fictional speech for a given audiovisual product to be either subtitled or dubbed into a different language, but they have the role of conveying and preserving LGBTQ+ characters’ identities accurately. The objective of this paper is to analyze the translation of sex-related language in TV series with LGBTQ+ representation. On the basis of a selected corpus of two different English-language TV series (Euphoria and Sex Education), this descriptive study analyzes the fictional speech of several LGBTQ+ characters and focuses on the translation of sex-related language from English into Spanish in both their dubbed and subtitled versions. The translation strategies used to render sex-related conversations when translating audiovisual fiction are discussed throughout the study in order to show different ways of facing the translation of specific sexual expressions. In this respect, the study intends to highlight the fact that all decisions made when translating fictional conversations that LGBTQ+ characters have about sex may have an influence on the representation of several topics such as sexuality, gender or identity. The study also discusses how other aspects such as the translation of inclusive language and the expression of gender identity may also affect the portrayal of LGBTQ+ characters.
Abrar Rahman, Garry Bowlin, Binit Mohanty et al.
This paper presents a comprehensive study on the tokenization techniques employed by state-of-the-art large language models (LLMs) and their implications on the cost and availability of services across different languages, especially low resource languages. The analysis considers multiple LLMs, including GPT-4 (using cl100k_base embeddings), GPT-3 (with p50k_base embeddings), and DaVinci (employing r50k_base embeddings), as well as the widely used BERT base tokenizer. The study evaluates the tokenization variability observed across these models and investigates the challenges of linguistic representation in subword tokenization. The research underscores the importance of fostering linguistically-aware development practices, especially for languages that are traditionally under-resourced. Moreover, this paper introduces case studies that highlight the real-world implications of tokenization choices, particularly in the context of electronic health record (EHR) systems. This research aims to promote generalizable Internationalization (I18N) practices in the development of AI services in this domain and beyond, with a strong emphasis on inclusivity, particularly for languages traditionally underrepresented in AI applications.
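One reason tokenization cost can differ across languages is visible before any tokenizer is applied. Byte-level BPE tokenizers start from UTF-8 bytes, and non-Latin scripts need more bytes per character. The sketch below is only a rough proxy under that assumption: it measures UTF-8 bytes per character and does not reproduce the paper's analysis or any real tokenizer's output. The helper name and sample strings are hypothetical. Actual token counts depend on the learned merges and would have to be measured with the tokenizer itself.

```python
def utf8_bytes_per_char(text):
    """Average number of UTF-8 bytes per Unicode code point in text."""
    if not text:
        raise ValueError("text must not be empty")
    return len(text.encode("utf-8")) / len(text)


if __name__ == "__main__":
    # Illustrative samples: Latin, Cyrillic, and CJK scripts.
    for sample in ["hello", "привет", "你好"]:
        print(sample, utf8_bytes_per_char(sample))
```

The byte count is an upper bound on the number of tokens a byte-level tokenizer can emit for a string, so a higher bytes-per-character ratio means a higher worst case for scripts the tokenizer's training data covered poorly.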
Hassane Kissane, Achim Schilling, Patrick Krauss
This study investigates the internal representations of verb-particle combinations within transformer-based large language models (LLMs), specifically examining how these models capture lexical and syntactic nuances at different neural network layers. Employing the BERT architecture, we analyse the representational efficacy of its layers for various verb-particle constructions such as 'agree on', 'come back', and 'give up'. Our methodology includes a detailed dataset preparation from the British National Corpus, followed by extensive model training and output analysis through techniques like multi-dimensional scaling (MDS) and generalized discrimination value (GDV) calculations. Results show that BERT's middle layers most effectively capture syntactic structures, with significant variability in representational accuracy across different verb categories. These findings challenge the conventional uniformity assumed in neural network processing of linguistic elements and suggest a complex interplay between network architecture and linguistic representation. Our research contributes to a better understanding of how deep learning models comprehend and process language, offering insights into the potential and limitations of current neural approaches to linguistic analysis. This study not only advances our knowledge in computational linguistics but also prompts further research into optimizing neural architectures for enhanced linguistic precision.
Mohammad Jalili Torkamani
Understanding and extracting the grammar of a domain-specific language (DSL) is crucial for various software engineering tasks; however, manually creating these grammars is time-intensive and error-prone. This paper presents Kajal, a novel approach that automatically infers grammar from DSL code snippets by leveraging Large Language Models (LLMs) through prompt engineering and few-shot learning. Kajal dynamically constructs input prompts, using contextual information to guide the LLM in generating the corresponding grammars, which are iteratively refined through a feedback-driven approach. Our experiments show that Kajal achieves 60% accuracy with few-shot learning and 45% without it, demonstrating the significant impact of few-shot learning on the tool's effectiveness. This approach offers a promising solution for automating DSL grammar extraction, and future work will explore using smaller, open-source LLMs and testing on larger datasets to further validate Kajal's performance.