Distributional Semantics and Linguistic Theory
Gemma Boleda
Distributional semantics provides multidimensional, graded, empirically induced word representations that successfully capture many aspects of meaning in natural languages, as shown by a large body of research in computational linguistics; yet, its impact in theoretical linguistics has so far been limited. This review provides a critical discussion of the literature on distributional semantics, with an emphasis on methods and results that are relevant for theoretical linguistics, in three areas: semantic change, polysemy and composition, and the grammar–semantics interface (specifically, the interface of semantics with syntax and with derivational morphology). The goal of this review is to foster greater cross-fertilization of theoretical and computational approaches to language as a means to advance our collective knowledge of how it works.
245 citations
en
Computer Science
Subword-Based Comparative Linguistics across 242 Languages Using Wikipedia Glottosets
Iaroslav Chelombitko, Mika Hämäläinen, Aleksey Komissarov
We present a large-scale comparative study of 242 Latin and Cyrillic-script languages using subword-based methodologies. By constructing 'glottosets' from Wikipedia lexicons, we introduce a framework for simultaneous cross-linguistic comparison via Byte-Pair Encoding (BPE). Our approach utilizes rank-based subword vectors to analyze vocabulary overlap, lexical divergence, and language similarity at scale. Evaluations demonstrate that BPE segmentation aligns with morpheme boundaries 95% better than random baseline across 15 languages (F1 = 0.34 vs 0.15). BPE vocabulary similarity correlates significantly with genetic language relatedness (Mantel r = 0.329, p < 0.001), with Romance languages forming the tightest cluster (mean distance 0.51) and cross-family pairs showing clear separation (0.82). Analysis of 26,939 cross-linguistic homographs reveals that 48.7% receive different segmentations across related languages, with variation correlating to phylogenetic distance. Our results provide quantitative macro-linguistic insights into lexical patterns across typologically diverse languages within a unified analytical framework.
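The Byte-Pair Encoding step mentioned in the abstract above can be illustrated with a toy sketch. This is the generic greedy merge-learning procedure, not the authors' pipeline; the function names (`learn_bpe`, `merge_pair`, `segment`), the tie-breaking rule, and the tiny word list are assumptions made for illustration only.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one fused symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merges from a list of words (toy version)."""
    # Each word starts as a tuple of characters, weighted by its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken lexicographically (an assumption).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(segment("lowly", merges))
```

Subword boundaries produced this way could then be compared with gold morpheme boundaries to compute a boundary-level F1, which is the kind of evaluation the abstract reports.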
Innovative Approaches in English Language Teaching: A Comparative Analysis of Traditional and Modern Methods
Amira A. Hashim
As English has become an essential international tool for communication, debate has persisted over which methodology teaches the language best. This review article surveys a range of ELT methodologies and their evolution, contrasting traditional approaches with modern methods and examining their strengths, their limitations, and the place of technology within them. Traditional approaches, such as grammar-translation, audio-lingual, and direct instruction, rest on structured learning and emphasize grammar rules and vocabulary. These methods, however, often neglect communicative competence and fail to engage learners actively, which limits their real-life applicability. In contrast, contemporary methods such as CLT, TBL and TELL focus on interaction, fluency and learner autonomy. These methods promote practical use of the language, integrate technology to enhance learning, and offer tools for personalized and adaptive learning. However, they also pose challenges, such as the digital divide and resistance to change among educators, that remain significant barriers. Equally important is the need to combine traditional and modern approaches in a balanced, context-sensitive framework that maximizes both linguistic and communicative proficiency. The paper concludes with recommendations on how best to synthesize innovative methods with established practices, emphasizing continuous professional development, investment in technology, and research into the longitudinal effects of hybrid teaching. With these factors in mind, educators can adapt to the ever-changing nature of ELT and ensure that learning is effective for students of diverse backgrounds.
Linguistic misconceptions of tautology in the English second language context among student educators
Innocent Zitha, Lutendo Nendauni
There is a misconception that tautology is a form of emphasis rather than a semantic error in English second language contexts. Consequently, it is treated as a device for emphasis that disambiguates the meaning of phrases and statements. This effect has been widely observed on different platforms of communication. Hence, the purpose of this article is to identify and evaluate common misconceptions of tautology among English student educators in their academic writing. The researchers adopted contrastive analysis theory as the theoretical point of departure. Furthermore, this article adopted a descriptive statistics design embedded in the qualitative approach. The data were collected from essay scripts written by 30 purposively selected third-year English major students enrolled for a bachelor’s degree in education at the University of Venda. The researchers read and analysed the selected essay scripts, tagged prevalent errors, and classified them into error types. The findings reported the following categories of redundancy errors: semantic redundancies, double comparatives, double superlatives, redundancy, and double negation. The major causes of redundancy errors are ascribed to fossilisation, ignorance of rule restrictions, overgeneralisation and false concepts hypothesised. Moreover, the key contribution of this article is addressing and changing the predominant misconception of this semantic error through mitigating strategies. More attention needs to be paid to this area because tautology is often considered stylish writing although it is a semantic error.
Perceptions of Korean–Korean Sign Language Grammar among Students Majoring in Korean Sign Language Education and Implications for Korean Grammar Education
Hyeoung-gil Jeon
This study examines how graduate students majoring in Korean Sign Language (KSL) Education perceive the grammatical systems of Korean and Korean Sign Language from the perspective of Korean grammar education, and explores the implications for grammar instruction. A qualitative analysis was conducted on 41 final course reports using manual coding to identify major categories and tendencies. The results show that 85.4% of the reports reflected awareness of morpho-syntactic features such as word order, particles, morphological changes, and sign units, indicating a tendency to describe KSL structure by referencing Korean grammar. In addition, 65.9% of the reports focused on visual-linguistic characteristics including simultaneity, spatiality, iconicity, and non-linearity, recognizing KSL as a natural language with a grammatical system distinct from Korean. Furthermore, 53.7% of the reports proposed educational and practical applications, such as interpreting strategies and instructional approaches grounded in actual field experience. Analysis of perceptual trends revealed complementary trajectories, with hearing students moving from theory to practice and Deaf students from experiential intuition to conceptual understanding. These findings suggest that Korean grammatical knowledge functions as a reference frame for analyzing KSL and can be extended to bilingual and multimodal approaches to grammar education that incorporate visual and spatial linguistic features.
A Comparative Study of Non-English Major Saudi Students’ Perceptions Toward Using Arabic in Teaching English as a Foreign Language
Abdullah Alshayban
In this paper, I examined gender dynamics in Saudi students' perceptions of the use of their first language (L1; Arabic) in English classrooms. Using a mixed-methods approach, I administered a questionnaire to 400 students (200 men, 200 women) that assessed their attitudes towards using L1 for vocabulary, grammar, classroom atmosphere, anxiety reduction, and teacher approachability. There are substantial gender differences: women report much more positive attitudes toward L1 use on all factors, especially for vocabulary, grammar, anxiety reduction, and teacher approachability. In the qualitative data, women value L1 because it gives reassurance and lowers anxiety, whereas men prioritize L2 exposure and L2 proficiency and hence express less of a preference for L1. Overall, while the differences in classroom atmosphere are not statistically significant, the broad patterns of the findings indicate that women are more likely to perceive that L1 use helped to reduce anxiety and to facilitate comprehension. These findings reinforce the notion that gender influences the ways in which learners view language learning strategies. This study contributes to the existing literature by providing more recent evidence from the Saudi context and by giving further consideration to the role of gender and the provision of linguistic support in EFL classrooms. Strategies for teaching men and women could be revised to take these gender-specific differences into account and better meet emotional and cognitive needs (for example, women may need to be able to use L1 to some extent, but in a strategic way), which could in turn promote engagement and make it easier to overcome barriers stemming from EFL.
MIND MAPS: THEORY AND PRACTICAL APPLICATION IN UKRAINIAN LANGUAGE CLASSES
L. Nazarevych, H. Matsiuk
The article explores the use of mind maps in teaching Ukrainian as a foreign language. Their role in developing coherent speech, particularly in acquiring grammar, vocabulary, and sentence construction, is examined. The authors argue that mind maps facilitate the systematization of knowledge, help students visualize grammatical structures, and establish logical connections between linguistic units. The article analyzes various types of mind maps tested at Ivan Puluj Ternopil National Technical University and the University of Wroclaw, including the maps «Please», «Love», «Local Case», «Walk», «My Friend», «Dmytro and Emma’s Day», and others. Their functions are outlined, such as visualizing grammatical rules, activating vocabulary, and developing coherent speech through interactive tasks on different platforms. Particular attention is given to the integration of mind maps into tandem learning and their role in intercultural communication. The methodological foundation of the study is a text-centered approach, which involves using mind maps as a tool for the gradual complication of language material. The article discusses exercises focused on changing tense forms, building narratives, conducting comparative analysis, and creatively modeling speech situations. Additionally, the study explores the potential of digital resources for creating and modifying mind maps, which helps optimize the learning process and enhance students’ motivation for independent study. The findings suggest that mind maps are effective in teaching Ukrainian as a foreign language, as they combine traditional and innovative methods, facilitate deeper learning, and encourage interactive engagement. The study’s results can be applied in the educational process of UMI (Ukrainian as a Foreign Language) to refine teaching methodologies and expand the range of pedagogical tools. 
Key words: mind maps, memory maps, text-centered approach, development of coherent speech, visualization, keywords, Ukrainian as a foreign language, Xmind.
Exploring Language Features of Male and Female Speakers in Pakistani TEDx Talks: A Corpus-based Comparative Analysis
Ravail Shaukat, Dr. Aniqa Rashid, Moushaffa Shahid
The research examines gender-based linguistic patterns in Pakistani TEDx Talks by comparing the language features of male and female presenters. The study examines how speakers of each gender employ specific speaking techniques in public discourse and assesses the applicability of Lakoff’s (1975) Deficit Model and Tannen’s Difference Model in contemporary professional communication environments. This investigation contributes to the global discourse on language and gender by highlighting shifts in formal linguistic patterns. The methodology combines quantitative methods with qualitative evaluation. A dataset of ten TEDx Talks was obtained from YouTube. The analysis software AntConc 3.4.4w produced quantitative measurements, after which the researchers interpreted the results qualitatively. The research demonstrates that Lakoff’s theory provides a basic understanding, yet fails to explain all the intricacies of gendered language behaviour in formal speech contexts. The use of hedges and other linguistic indicators does not support the assumption that their presence indicates powerlessness in male communication. Rather, rhetorical techniques and pragmatic communication strategies account for speakers’ language use in professional settings. Although Tannen’s Model provides flexible interpretations of speech style, it retains binary sex categories that do not capture how speakers adjust their communication approaches based on context. The research also examines how cultural and contextual factors, along with global conditions, influence language use in Pakistani TEDx Talks. Professional interaction requires speakers to use language strategically, rather than adhering to rules defined by gender norms. The analysis investigates gendered language and linguistic features through corpus-based research on TEDx Talks as spoken discourse.
References
Ali, A., & Shakir, A. (2024). 
Gender differences in the use of boosters in Pakistani opinion columns: A corpus-based study. Linguistic Forum - A Journal of Linguistics, 6(1), 1–15. Anjum, R. Y., Amjad, F., Yousaf, S., & Manzoor, F. (2018). Gender-based linguistic variations in Urdu language and their role in the suppression of females. Journal of Business and Social Review in Emerging Economies, 4(2), 231–248. Asghar, J., & Zahra, T. (2021). Gender-based linguistic variation in Pakistani IELTS argumentative essays: A multidimensional analysis. Pakistan Journal of Language Studies, 5(2), 45–60. Aziz, M., & Kamal, S. (2020). Gender stereotypes in the language of Pakistani newspapers. Journal of Gender and Social Issues, 19(1), 15–30. Brown, P., & Levinson, S. C. (1987). Politeness: Some universals in language usage. Cambridge University Press. Cameron, D. (2003). Gender and language ideologies. In The handbook of language and gender (pp. 447–467). Oxford University Press. Eckert, P., & McConnell-Ginet, S. (2013). Language and gender (2nd ed.). Cambridge University Press. Elmahdi, O. E. H., Balla, A. A. S., & Abdelrady, A. H. (2024). Gender variations in linguistic styles across online platforms: A thematic analysis. International Journal of Linguistics, Literature and Translation, 7(12), 62–70. https://doi.org/10.32996/ijllt Farooq, M. U., & Nawaz, S. (2021). Linguistic politeness of Pakistani English and British English speakers: A comparative study. Cogent Arts & Humanities, 8(1), 1996917. Gu, L. (2013). Language and gender: Differences and similarities. In Proceedings of the 2013 International Conference on Advances in Social Science, Humanities, and Management (pp. 248–251). Atlantis Press. Habib, A., & Jamil, S. (2022). Gendered language in Pakistani news reporting: A critical discourse analysis. Journal of Media Studies, 37(2), 88–105. Holmes, J. (1984). Hedging your bets and sitting on the fence: Some evidence for hedges as support structures. Te Reo, 27, 47–62. Holmes, J. (1992). 
An introduction to sociolinguistics. Longman. Holmes, J. (2006). Gendered talk at work. Blackwell Publishing. Holmes, J. (2013). An introduction to sociolinguistics (4th ed.). Routledge. Hymes, D. (1974). Foundations in sociolinguistics: An ethnographic approach. University of Pennsylvania Press. Jay, T. (1992). Cursing in America: A psycholinguistic study of dirty language in the courts, the movies, and daily life. John Benjamins. Johnson, R. B., Onwuegbuzie, A. J., & Turner, L. A. (2007). Toward a definition of mixed methods research. Journal of Mixed Methods Research, 1(2), 112–133. Khan, S., & Ali, R. (2020). Gendered language practices in English-medium educational institutions in Pakistan. International Journal of Linguistics and Communication, 8(1), 15–28. Labov, W. (1972). Sociolinguistic patterns. University of Pennsylvania Press. Lakoff, R. (1975). Language and woman’s place. Harper & Row. Mahmood, R., & Iqbal, Z. (2018). Use of interactive and interactional metadiscourse features in Pakistani English newspaper editorials. Journal of Language and Politics, 17(6), 812– 830. Nawaz, S., & Perveen, S. (2023). Gender-based linguistic variations in Pakistani parliamentary debates. Pakistan Journal of Language Studies, 7(1), 22–39. Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985). A comprehensive grammar of the English language. Longman. Rafi, M. S., & Yasmin, F. (2021). Gender differences in language use in Pakistani blogs: A content analysis. Journal of Digital Media & Policy, 12(3), 295–312. Shahid, A., Arshad, M., & Shaukat, S. (2024). Lexical insights into ‘embracing change’ in Pakistani TEDx talks: A corpus-based study. Policy Research Journal. Shaukat, R., Shahid, M., & Arslan, M. F. (2024). Pakistani TED Talks: A corpus-based comparative analysis of interactional metadiscourse markers across gender. Journal of Applied Linguistics and TESOL (JALT), 7(4), 611–627. Sheyholislami, J. (2001). Critical discourse analysis. 
Unpublished manuscript, Carleton University, Ottawa, ON. Siddique, M., & Iqbal, Z. (2018). Interactive and interactional metadiscourse in Pakistani English newspaper editorials: A comparative study. Journal of Language and Politics, 17(6), 812–830. Tariq, M., & Hassan, N. (2023). Gender-based differences in storytelling among Pakistani speakers: A narrative analysis. Narrative Inquiry, 33(1), 45–62. Tran, T. D., Nguyen, L. N. T., Nguyen, H. T., Dang, T. B. D., & Au, M. T. (2025). Language and gender: How societal norms influence communication and implications for language teaching. International Journal of Innovative Research and Scientific Studies, 8(1), 26– 32. http://www.ijirss.com Trudgill, P. (1972). Sex, covert prestige, and linguistic change in the urban British English of Norwich. Language in Society, 1(2), 179–195. Trudgill, P. (1974). Sociolinguistics. Penguin Books. Trudgill, P. (2000). Sociolinguistics: An introduction to language and society (4th ed.). Penguin Books. Yasmin, F., & Rafi, M. S. (2021). Gender differences in language use in Pakistani blogs: A content analysis. Journal of Digital Media & Policy, 12(3), 295–312. Yasmin, R., & Anjum, F. (2018). Gender-based linguistic variations in Urdu language and their role in suppression of females. Journal of Business and Social Review in Emerging Economies, 4(2), 231–248. Zafar, M., & Mehmood, T. (2019). Gendered language in Pakistani ESL classrooms: A sociolinguistic study. International Journal of English Linguistics, 9(5), 360–375. Zahra, T., & Asghar, J. (2021). Gender-based linguistic variation in Pakistani IELTS argumentative essays: A multidimensional analysis. Pakistan Journal of Language Studies, 5(2), 45–60. Zia, A., & Akhtar, N. (2020). Gendered language practices in English-medium educational institutions in Pakistan. International Journal of Linguistics and Communication, 8(1), 15–28. Zubair, S., & Khan, M. (2022). 
Gendered language in Pakistani news reporting: A critical discourse analysis. Journal of Media Studies, 37(2), 88–105. Zulfiqar, S., & Ahmed, N. (2023). Gender-based linguistic variations in Pakistani parliamentary debates. Pakistan Journal of Language Studies, 7(1), 22–39. Zulqarnain, M., & Saeed, A. (2020). Gender stereotypes in the language of Pakistani newspapers. Journal of Gender and Social Issues, 19(1), 15–30.
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data
Chun-Yi Kuan, Hung-yi Lee
Audio-aware large language models (ALLMs) have recently made great strides in understanding and processing audio inputs. These models are typically adapted from text-based large language models (LLMs) through additional training on audio-related tasks. This adaptation process presents two major limitations. First, ALLMs often suffer from catastrophic forgetting, where crucial textual capabilities like instruction-following are lost after training on audio data. In some cases, models may even hallucinate sounds that are not present in the input audio, raising concerns about reliability. Second, achieving cross-modal alignment between audio and language typically relies on large collections of task-specific question-answer pairs for instruction tuning, making it resource-intensive. To address these issues, previous works have leveraged the backbone LLMs to synthesize general-purpose, caption-style alignment data. In this paper, we propose a data generation framework that produces contrastive-like training data, designed to enhance ALLMs' ability to differentiate between present and absent sounds. We further extend our approach to multi-audio scenarios, enabling the model to either explain differences between audio inputs or produce unified captions that describe all inputs, thereby enhancing audio-language alignment. We refer to the entire ALLM training framework as bootstrapping audio-language alignment via synthetic data generation from backbone LLMs (BALSa). Experimental results indicate that our method effectively mitigates audio hallucinations while reliably maintaining strong performance on audio understanding and reasoning benchmarks, as well as instruction-following skills. Moreover, incorporating multi-audio training further enhances the model's comprehension and reasoning capabilities. Overall, BALSa offers an efficient and scalable approach to developing ALLMs.
Large Language Models as Proxies for Theories of Human Linguistic Cognition
Imry Ziv, Nur Lan, Emmanuel Chemla et al.
We consider the possible role of current large language models (LLMs) in the study of human linguistic cognition. We focus on the use of such models as proxies for theories of cognition that are relatively linguistically-neutral in their representations and learning but differ from current LLMs in key ways. We illustrate this potential use of LLMs as proxies for theories of cognition in the context of two kinds of questions: (a) whether the target theory accounts for the acquisition of a given pattern from a given corpus; and (b) whether the target theory makes a given typologically-attested pattern easier to acquire than another, typologically-unattested pattern. For each of the two questions we show, building on recent literature, how current LLMs can potentially be of help, but we note that at present this help is quite limited.
Retrospex: Language Agent Meets Offline Reinforcement Learning Critic
Yufei Xiang, Yiqun Shen, Yeqin Zhang et al.
Large Language Models (LLMs) possess extensive knowledge and commonsense reasoning capabilities, making them valuable for creating powerful agents. However, existing LLM agent frameworks have not fully utilized past experiences for improvement. This work introduces a new LLM-based agent framework called Retrospex, which addresses this challenge by analyzing past experiences in depth. Unlike previous approaches, Retrospex does not directly integrate experiences into the LLM's context. Instead, it combines the LLM's action likelihood with action values estimated by a Reinforcement Learning (RL) Critic, which is trained on past experiences through an offline ''retrospection'' process. Additionally, Retrospex employs a dynamic action rescoring mechanism that increases the importance of experience-based values for tasks that require more interaction with the environment. We evaluate Retrospex in ScienceWorld, ALFWorld and Webshop environments, demonstrating its advantages over strong, contemporary baselines.
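The idea of combining the LLM's action likelihood with critic-estimated action values can be sketched as a weighted rescoring. The abstract does not give the actual formula, so everything below is an assumption for illustration: the min-max normalisation, the exponential decay of the LLM weight with the step count, and the names `rescore`, `decay`, and `min_weight` are all hypothetical.

```python
def rescore(llm_logprobs, critic_values, step, decay=0.5, min_weight=0.0):
    """Blend LLM log-likelihoods with critic values for one decision step.

    Assumed scheme (not taken from the paper): the weight on the LLM score
    shrinks as the episode gets longer, so experience-based values matter
    more for tasks that need more interaction with the environment.
    """
    # Weight on the LLM term: 1.0 at step 0, decaying toward min_weight.
    w = max(min_weight, decay ** step)

    def normalise(scores):
        # Min-max scale to [0, 1] so the two score types are comparable.
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        return {a: ((v - lo) / span if span else 0.5) for a, v in scores.items()}

    p = normalise(llm_logprobs)
    q = normalise(critic_values)
    return {a: w * p[a] + (1 - w) * q[a] for a in llm_logprobs}


# Hypothetical candidate actions with made-up scores.
llm = {"open door": -0.2, "pick up key": -1.5}
critic = {"open door": 0.1, "pick up key": 0.9}

early = rescore(llm, critic, step=0)
late = rescore(llm, critic, step=3)
print(max(early, key=early.get), "|", max(late, key=late.get))
```

With these made-up numbers the LLM's preferred action wins early in the episode, while the critic's preferred action wins once the LLM weight has decayed.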
How Linguistics Learned to Stop Worrying and Love the Language Models
Richard Futrell, Kyle Mahowald
Language models can produce fluent, grammatical text. Nonetheless, some maintain that language models don't really learn language and also that, even if they did, that would not be informative for the study of human learning and processing. On the other side, there have been claims that the success of LMs obviates the need for studying linguistic theory and structure. We argue that both extremes are wrong. LMs can contribute to fundamental questions about linguistic structure, language processing, and learning. They force us to rethink arguments and ways of thinking that have been foundational in linguistics. While they do not replace linguistic structure and theory, they serve as model systems and working proofs of concept for gradient, usage-based approaches to language. We offer an optimistic take on the relationship between language models and linguistics.
Model Merging to Maintain Language-Only Performance in Developmentally Plausible Multimodal Models
Ece Takmaz, Lisa Bylinina, Jakub Dotlacil
State-of-the-art vision-and-language models consist of many parameters and learn from enormous datasets, surpassing the amounts of linguistic data that children are exposed to as they acquire a language. This paper presents our approach to the multimodal track of the BabyLM challenge addressing this discrepancy. We develop language-only and multimodal models in low-resource settings using developmentally plausible datasets, with our multimodal models outperforming previous BabyLM baselines. One finding in the multimodal language model literature is that these models tend to underperform in \textit{language-only} tasks. Therefore, we focus on maintaining language-only abilities in multimodal models. To this end, we experiment with \textit{model merging}, where we fuse the parameters of multimodal models with those of language-only models using weighted linear interpolation. Our results corroborate the findings that multimodal models underperform in language-only benchmarks that focus on grammar, and model merging with text-only models can help alleviate this problem to some extent, while maintaining multimodal performance.
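Weighted linear interpolation of parameters, as named in the abstract above, can be sketched in a few lines. This is a minimal illustration rather than the authors' code: parameters are represented as plain lists keyed by name, and the function name `merge_parameters` and the mixing weight `alpha` are assumptions.

```python
def merge_parameters(multimodal, text_only, alpha):
    """Return alpha * multimodal + (1 - alpha) * text_only for every parameter.

    Both models must share the same parameter names and shapes; here each
    parameter is a flat list of floats for simplicity.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if multimodal.keys() != text_only.keys():
        raise ValueError("models must have identical parameter names")

    merged = {}
    for name, w_mm in multimodal.items():
        w_txt = text_only[name]
        if len(w_mm) != len(w_txt):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [alpha * a + (1 - alpha) * b for a, b in zip(w_mm, w_txt)]
    return merged


# Hypothetical two-element parameter, purely for illustration.
mm = {"layer.weight": [1.0, 2.0]}
txt = {"layer.weight": [3.0, 6.0]}
print(merge_parameters(mm, txt, alpha=0.5))
```

Setting `alpha` to 1.0 recovers the multimodal model and 0.0 recovers the text-only model; intermediate values trade off the two.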
Reframing linguistic bootstrapping as joint inference using visually-grounded grammar induction models
Eva Portelance, Siva Reddy, Timothy O'Donnell
Semantic and syntactic bootstrapping posit that children use their prior knowledge of one linguistic domain, say syntactic relations, to help later acquire another, such as the meanings of new words. Empirical results supporting both theories may tempt us to believe that these are different learning strategies, where one may precede the other. Here, we argue that they are instead both contingent on a more general learning strategy for language acquisition: joint learning. Using a series of neural visually-grounded grammar induction models, we demonstrate that both syntactic and semantic bootstrapping effects are strongest when syntax and semantics are learnt simultaneously. Joint learning results in better grammar induction, realistic lexical category learning, and better interpretations of novel sentence and verb meanings. Joint learning makes language acquisition easier for learners by mutually constraining the hypotheses spaces for both syntax and semantics. Studying the dynamics of joint inference over many input sources and modalities represents an important new direction for language modeling and learning research in both cognitive sciences and AI, as it may help us explain how language can be acquired in more constrained learning settings.
3 citations
en
Computer Science
Language Game in Advertising and Its Impact on Consumers
Anna E. Bazanova, Mohamed Alsadig Hamid Musa
This article examines the concept and phenomenon of the language game, its main functions and types, and its application in commercial advertising as a way to attract the attention of consumers and promote a product. Examples of phonetic, morphological and syntactic wordplay in the texts of English-language commercial advertising are analyzed. The purpose of the article is to analyze the techniques of the language game and identify their functional features at various levels in an English-language advertising text. The following methods were used: the descriptive-analytical method, the interpretation method, and the search method. Material for analysis was selected by continuous sampling. The research material consists of English-language advertisements from various sources, such as magazines, newspapers and videos, in which a language game was identified. The study concludes that the language game in advertising texts is an important phenomenon, because its techniques and functions draw the recipient's attention to the advertisement and thereby maximize the impact on the consumer. In addition, an advertisement in which a language game is present is an indicator of a high level of the consumer’s language competence.
Language. Linguistic theory. Comparative grammar, Semantics
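La littérature dans l'enseignement du FLE: interculturalité, subjectivité et réflexivité dans le processus d'appropriation d'une langue étrangère [Literature in the teaching of French as a foreign language: interculturality, subjectivity and reflexivity in the process of appropriating a foreign language]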
Rosiane Maria Soares da Silva Xypas, Anne Godard, Simone Aubin
Language. Linguistic theory. Comparative grammar, Literature (General)
Grammatical Alliance in Malay Language: A Study of Linguistic Typology
Ratna Soraya, Mulyadi
This article is entitled Aliansi Gramatikal Na Lingua Melayu: Study of Linguistic Typology. This study of the grammatical alliance of the Malay language aims to describe (1) the basic construction of clauses, (2) the construction of complex sentences, (3) the pivot system, and (4) ultimately the grammatical alliance system. The study uses the theory of language typology proposed by Comrie (1988) as its main theory. The research data are clauses and sentences. The approach is qualitative and descriptive, aiming to capture how the external and internal properties of the object of study overlap. Typologically, data from coordinative and subordinative constructions in Malay lead to the finding that, in syntax, Malay treats S the same as A and treats P differently (S = A ≠ P). Malay thus belongs to the group of languages that work with an S/A pivot system. A system of grammatical alliance such as this suggests that Malay is syntactically a language of the nominative-accusative type. Observing the behaviour of S in intransitive clauses alongside that of A and P in transitive clauses, which indicates that S is treated like A and likewise like P, the authors conclude that Malay also tends morphologically toward the nominative-accusative type.
Comparative-Contrastive Analysis of Linguistic Resources for Corpus Analysis of Texts
A. Dmitrijev, E. S. Krupnova
In the last few decades, a scientific field known as computational linguistics has been actively developing. The paper discusses the main task of corpus linguistics, the corpus analysis of written natural-language texts, and the linguistic resources that are used to carry it out. Corpus analysis refers to a method of language research that utilizes large collections of texts or corpora to obtain statistical and linguistic data about the language. Linguistic resources such as dictionaries, thesauri, and grammatical databases greatly enhance the capability and accuracy of corpus analysis. In addition, corpus linguistics deals with the building of corpus managers that process texts, produce concordances, search for keywords and collocations, etc. The paper briefly describes the functionality of the WMatrix, WordSmith, GATE, AntConc and Sketch Engine programs and makes a comparative-contrastive analysis of their characteristics. It is concluded that the programs differ in feature set, data saving parameters, input text format and accessibility. In addition, directions for their use in research and practice are suggested. Linguistic resources can be useful for stylistic analysis of texts, studying linguistic features of an author's style, teaching a foreign language (for example, grammar or vocabulary), computer lexicography, discourse analysis and other directions. An example corpus analysis on the topic of famine during the blockade of Leningrad, carried out with the AntConc program, is given. In the course of that research, 749 fragments of memories of Leningrad citizens were collected on the basis of 15 frequency words, and a frequency dictionary of 158 words was compiled. The tools considered not only increase the accuracy of analysis, but also expand its possibilities and can be integrated into software tools for the automation of corpus analysis. The choice of the appropriate tool for a study depends on the scope and depth of text analysis.
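Two of the corpus-manager functions named in the abstract above, frequency lists and concordances, can be approximated with the Python standard library. This is a rough sketch rather than a reimplementation of AntConc or any other tool listed; the tokenisation rule, the function names, and the sample sentence are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude rule)."""
    return re.findall(r"\w+", text.lower())


def frequency_list(text, top_n=10):
    """Return the top_n most frequent tokens with their counts."""
    return Counter(tokenize(text)).most_common(top_n)


def concordance(text, keyword, window=3):
    """Return keyword-in-context hits as (left context, keyword, right context)."""
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Invented sample text, used only to exercise the functions.
sample = "Bread was scarce. We shared the bread each day."
print(frequency_list(sample, top_n=1))
for left, kw, right in concordance(sample, "bread", window=2):
    print(f"{left:>12} [{kw}] {right}")
```

Dedicated corpus tools add indexing, collocation statistics, and annotation support on top of these basic operations.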
Teaching Chinese grammar through International Chinese Language Education micro-lectures: negotiating mass and presence through multimodal pedagogic discourse
Zhigang Yu
Micro-lectures have become a prevailing resource for teaching Chinese grammar in International Chinese Language Education (ICLE). One crucial feature of these lectures is that they are inherently multimodal, and the design of their multimodal pedagogic discourse is vital for the teaching of Chinese grammar. Against this background, this paper investigates teaching Chinese grammar through multimodal pedagogic discourse in ICLE micro-lectures, focusing on the organization of complexity and abstraction of meaning for knowledge-building. Drawing on Systemic Functional Linguistics’ genre theory and ideational mass and presence, this paper views the organization as a dynamic negotiation of mass and presence across generic stages. Analyzing a representative ICLE micro-lecture on grammar, it scrutinizes the distribution of mass and presence across stages, along with multimodal pedagogic discourse features. The findings show that Presenting Scenarios, Example Extraction, and Grammar Explanation are pivotal stages for grammar instruction, each characterized by distinct mass and presence. Presenting Scenarios, featuring relatively weak mass and strong presence, employs non-technical and concrete multimodal texts to depict everyday scenarios, while Example Extraction, with similar mass but weaker presence, recontextualizes these scenarios into linguistic phenomena through non-technical linguistic text. Grammar Explanation, characterized by relatively strong mass and weak presence, distills grammatical knowledge from example sentences through technical and abstract linguistic text. Overall, the weakening of presence across the stages allows for recontextualizing scenario-based sentences as linguistic phenomena and generalizing these sentences into abstract grammatical concepts, while the strengthening of mass enables distilling meaning from example sentences and builds the complexity of grammatical concepts. The findings hold potential implications for the design of ICLE micro-lectures on Chinese grammar, which aim to facilitate the teaching of Chinese grammar through multimodal pedagogic discourse.
CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation
Nikola Ljubešić, Taja Kuzman
This paper presents a collection of highly comparable web corpora of Slovenian, Croatian, Bosnian, Montenegrin, Serbian, Macedonian, and Bulgarian, thereby covering the whole spectrum of official languages in the South Slavic language space. The collection comprises a total of 13 billion tokens of text from 26 million documents. The comparability of the corpora is ensured by a comparable crawling setup and the use of identical crawling and post-processing technology. All the corpora were linguistically annotated with the state-of-the-art CLASSLA-Stanza linguistic processing pipeline and enriched with document-level genre information via the Transformer-based multilingual X-GENRE classifier, which further enhances comparability at the level of linguistic annotation and metadata enrichment. The genre-focused analysis of the resulting corpora shows a rather consistent distribution of genres throughout the seven corpora, with variations in the most prominent genre categories being well explained by the economic strength of each language community. A comparison of the distribution of genre categories across the corpora indicates that web corpora from less developed countries primarily consist of news articles. Conversely, web corpora from economically more developed countries exhibit a smaller proportion of news content, with a greater presence of promotional and opinionated texts.