Saadi Lahlou, Annabelle Gouttebroze, Atrina Oraee
et al.
We qualitatively compared literature reviews produced with varying degrees of AI assistance. The same LLM, given the same corpus of 280 papers but different selections, produced dramatically different reviews, from mainstream and politically neutral to critical and post-colonial, though neither orientation was intended. LLM outputs always appear at first glance to be well written, well informed and thought out, but closer reading reveals gaps, biases and lack of depth. Our comparison of six versions reveals a series of pitfalls and suggests precautions necessary when using AI assistance to write a literature review. The main issues are: (1) The bias of ignorance (you do not know what you do not get) in the selection of relevant papers. (2) Alignment and digital sycophancy: commercial AI models slavishly take you further in the direction they understand you to be giving them, reinforcing biases. (3) Mainstreaming: because of their statistical nature, LLM productions tend to favor mainstream perspectives and content; in our case there was only 20% overlap between paper selections by humans and the LLM. (4) Limited capacity for creative restructuring, with vague and ambiguous statements. (5) Lack of critical perspective, coming from distant reading and political correctness. Most pitfalls can be addressed by prompting, but only if the user knows the domain well enough to detect them. There is a paradox: producing a good AI-assisted review requires expertise that comes from reading the literature, which is precisely what AI was meant to reduce. Overall, AI can improve the span and quality of the review, but the time saved is not as large as one would expect, and a press-button strategy that leaves AI to do the work is a recipe for disaster. We conclude with recommendations for those who write, or assess, such LLM-augmented reviews.
This article examines the phenomenon of synesthesia from the perspective of conceptual metaphor theory and analyzes the instances of synesthesia in the poetry of Sohrab Sepehri. Some researchers in the cognitive sciences, with Ullmann's hierarchy of the senses in mind, have regarded synesthesia as a kind of conceptual metaphor: just as, in the construction of a conceptual metaphor, a concrete conceptual domain is mapped onto an abstract conceptual domain in order to illuminate it, in synesthesia elements belonging to the weaker senses are carried into the domain of the stronger senses so that the concept is better understood. Using a descriptive-analytical method, this article attempts to look at the frequently used device of synesthesia in Sepehri's poetry from a different angle and to show the corporeality hidden beneath the abstract surface of his works. The examination of about 100 examples from Sepehri's poems shows that almost all instances of synesthesia in his poetry are conceptual metaphors that follow Ullmann's hierarchy of the senses. In other words, in his synesthesia Sepehri has always brought the abstract concept closer to concreteness and embodiment. This indicates that his poetry, despite its abstract appearance, has an embodied and thoroughly earthly core.
Indo-Iranian languages and literature, Languages and literature of Eastern Asia, Africa, Oceania
Coding is a fundamental skill required in the engineering discipline, and much work exists exploring better ways of teaching coding in the higher education context. In particular, Code Snippets (CSs) have proven to be an effective way of introducing programming language units to students. CSs are portions of source code of varying size and content. They can be used in a myriad of ways, one of which is to teach the code they contain as well as its function. To further explore the use of CSs, a pedagogical summer internship project was set up at the Warwick Manufacturing Group (WMG). The scope of the study derives from an educational standpoint. Within the evaluations made, the focus was primarily on information that provided evidence about the methodology involved in either teaching or developing teaching materials. Considering the results from a pedagogical perspective, several qualities of popular code snippet tutorials were found to benefit or hinder the learning process, including code length, interactivity, further support, and quality of explanation. These qualities are then combined and used to present a plan for the design of an effective learning resource which makes use of code snippets.
Mahtab Jamali, Paul Davidsson, Reza Khoshkangini
et al.
Context is an important factor in computer vision as it offers valuable information to clarify and analyze visual data. Utilizing the contextual information inherent in an image or a video can improve the precision and effectiveness of object detectors. For example, where recognizing an isolated object might be challenging, context information can improve comprehension of the scene. This study explores the impact of various context-based approaches to object detection. Initially, we investigate the role of context in object detection and survey it from several perspectives. We then review and discuss the most recent context-based object detection approaches and compare them. Finally, we conclude by addressing research questions and identifying gaps for further studies. More than 265 publications are included in this survey, covering different aspects of context in different categories of object detection, including general object detection, video object detection, small object detection, camouflaged object detection, zero-shot, one-shot, and few-shot object detection. This literature review presents a comprehensive overview of the latest advancements in context-based object detection, providing valuable contributions such as a thorough understanding of contextual information and effective methods for integrating various context types into object detection, thus benefiting researchers.
We introduce a new technique for repairing syntax errors in arbitrary context-free languages. This technique models syntax repair as a language intersection problem by defining a finite language that provably generates every syntactically valid repair within a given edit distance. Leveraging a theoretical connection between the Bar-Hillel construction from formal language theory and CFL reachability from program analysis, we show that repairability in a finite number of typographic edits is polylogarithmic parallel time decidable and provide an enumeration algorithm based on the Brzozowski derivative. Finally, we evaluate this algorithm and its implementation, demonstrating state-of-the-art results on a Python syntax repair benchmark.
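To make the Brzozowski derivative mentioned above concrete, the following is a minimal Python sketch for plain regular expressions. It is an illustration only, not the authors' algorithm: the abstract describes enumeration over context-free languages, while this sketch covers the simpler regular case. All class and function names (`Empty`, `Eps`, `Char`, `Alt`, `Cat`, `Star`, `nullable`, `derive`, `matches`) are hypothetical.

```python
from dataclasses import dataclass


class Regex:
    """Base class for regular-expression syntax trees (names are illustrative)."""


@dataclass(frozen=True)
class Empty(Regex):      # the empty language: matches nothing
    pass


@dataclass(frozen=True)
class Eps(Regex):        # matches only the empty string
    pass


@dataclass(frozen=True)
class Char(Regex):       # matches a single symbol
    c: str


@dataclass(frozen=True)
class Alt(Regex):        # union of two languages
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Cat(Regex):        # concatenation of two languages
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Star(Regex):       # Kleene star
    inner: Regex


def nullable(r: Regex) -> bool:
    """Return True if the language of r contains the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Char)):
        return False
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Cat):
        return nullable(r.left) and nullable(r.right)
    raise TypeError(f"unknown regex node: {r!r}")


def derive(r: Regex, a: str) -> Regex:
    """Brzozowski derivative: the language of suffixes w such that a+w is in L(r)."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Char):
        return Eps() if r.c == a else Empty()
    if isinstance(r, Alt):
        return Alt(derive(r.left, a), derive(r.right, a))
    if isinstance(r, Cat):
        first = Cat(derive(r.left, a), r.right)
        # If the left part can be empty, the symbol may also start the right part.
        return Alt(first, derive(r.right, a)) if nullable(r.left) else first
    if isinstance(r, Star):
        return Cat(derive(r.inner, a), r)
    raise TypeError(f"unknown regex node: {r!r}")


def matches(r: Regex, word: str) -> bool:
    """A word is accepted iff the derivative by all its symbols is nullable."""
    for a in word:
        r = derive(r, a)
    return nullable(r)


ab_star = Star(Cat(Char("a"), Char("b")))  # the language (ab)*
```

With this sketch, `matches(ab_star, "abab")` takes four derivatives and checks nullability of the result. The derivatives are not simplified here, so the syntax trees grow with the input; a practical implementation would normalize them.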
Jafar Isbarov, Kavsar Huseynova, Elvin Mammadov
et al.
The emergence of multilingual large language models has enabled the development of language understanding and generation systems in Azerbaijani. However, most of the production-grade systems rely on cloud solutions, such as GPT-4. While there have been several attempts to develop open foundation models for Azerbaijani, these works have not found their way into common use due to a lack of systemic benchmarking. This paper encompasses several lines of work that promote open-source foundation models for Azerbaijani. We introduce (1) a large text corpus for Azerbaijani, (2) a family of encoder-only language models trained on this dataset, (3) labeled datasets for evaluating these models, and (4) extensive evaluation that covers all major open-source models with Azerbaijani support.
Sina Bagheri Nezhad, Ameeta Agrawal, Rhitabrat Pokharel
Multilingual language models (MLLMs) are crucial for handling text across various languages, yet they often show performance disparities due to differences in resource availability and linguistic characteristics. While the impact of pre-train data percentage and model size on performance is well-known, our study reveals additional critical factors that significantly influence MLLM effectiveness. Analyzing a wide range of features, including geographical, linguistic, and resource-related aspects, we focus on the SIB-200 dataset for classification and the Flores-200 dataset for machine translation, using regression models and SHAP values across 204 languages. Our findings identify token similarity and country similarity as pivotal factors, alongside pre-train data and model size, in enhancing model performance. Token similarity facilitates cross-lingual transfer, while country similarity highlights the importance of shared cultural and linguistic contexts. These insights offer valuable guidance for developing more equitable and effective multilingual language models, particularly for underrepresented languages.
Machine Translation has made impressive progress in recent years offering close to human-level performance on many languages, but studies have primarily focused on high-resource languages with broad online presence and resources. With the help of growing Large Language Models, more and more low-resource languages achieve better results through the presence of other languages. However, studies have shown that not all low-resource languages can benefit from multilingual systems, especially those with insufficient training and evaluation data. In this paper, we revisit state-of-the-art Neural Machine Translation techniques to develop automatic translation systems between German and Bavarian. We investigate conditions of low-resource languages such as data scarcity and parameter sensitivity and focus on refined solutions that combat low-resource difficulties and creative solutions such as harnessing language similarity. Our experiment entails applying Back-translation and Transfer Learning to automatically generate more training data and achieve higher translation performance. We demonstrate noisiness in the data and present our approach to carry out text preprocessing extensively. Evaluation was conducted using combined metrics: BLEU, chrF and TER. Statistical significance results with Bonferroni correction show surprisingly high baseline systems, and that Back-translation leads to significant improvement. Furthermore, we present a qualitative analysis of translation errors and system limitations.
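As a reminder of what the Bonferroni correction mentioned above does, here is a minimal Python sketch. It assumes a list of p-values from several pairwise system comparisons; the function name `bonferroni` and the example p-values are hypothetical and are not taken from the paper.

```python
def bonferroni(p_values, alpha=0.05):
    """Return, for each p-value, whether it stays significant after
    Bonferroni correction (each test is compared against alpha / m)."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # stricter per-test threshold for m comparisons
    return [p <= threshold for p in p_values]


# Hypothetical p-values for three system comparisons (illustrative only).
example_p_values = [0.001, 0.02, 0.04]
decisions = bonferroni(example_p_values, alpha=0.05)
```

With three comparisons the per-test threshold is 0.05 / 3, so a p-value of 0.02 that would pass an uncorrected 0.05 test no longer counts as significant.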
In this paper, we present an approach for translating word embeddings from a majority language into 4 minority languages: Erzya, Moksha, Udmurt and Komi-Zyrian. Furthermore, we align these word embeddings and present a novel neural network model that is trained on English data to conduct sentiment analysis and then applied to endangered language data through the aligned word embeddings. To test our model, we annotated a small sentiment analysis corpus for the 4 endangered languages and Finnish. Our method reached at least 56% accuracy for each endangered language. The models and the sentiment corpus will be released together with this paper. Our research shows that state-of-the-art neural models can be used with endangered languages, the only requirement being a dictionary between the endangered language and a majority language.
Regular expressions in an Automata Theory and Formal Languages course are mostly treated as a theoretical topic. That is, their mathematical properties and their role in describing languages are discussed to some degree. This approach fails to capture the interest of most Computer Science students. It is a missed opportunity to engage Computer Science students, who are far more motivated by practical applications of theory. To this end, regular expressions may be discussed as the description of an easily programmed algorithm for generating words in a language. This article describes a programming-based methodology to introduce students to regular expressions in an Automata Theory and Formal Languages course. The language of instruction is FSM, which has a regular expression type, thus facilitating the study of regular expressions and of algorithms based on regular expressions.
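The idea of reading a regular expression as an algorithm that generates words can be sketched in a few lines. The example below is written in Python rather than FSM, and it is only an illustration of the general idea, not the article's methodology. The tuple encoding of regular expressions, the function name `gen_word` and the `max_reps` bound on Kleene-star repetitions are all assumptions made for this sketch.

```python
import random


def gen_word(regex, rng, max_reps=3):
    """Generate one word in the language of `regex`.

    `regex` is a nested tuple (an encoding assumed for this sketch):
      ("empty",)            the empty word
      ("sym", "a")          a single symbol
      ("union", r1, r2...)  pick one alternative at random
      ("concat", r1, r2...) generate each part and join them
      ("star", r)           repeat r between 0 and max_reps times
    """
    kind = regex[0]
    if kind == "empty":
        return ""
    if kind == "sym":
        return regex[1]
    if kind == "union":
        return gen_word(rng.choice(regex[1:]), rng, max_reps)
    if kind == "concat":
        return "".join(gen_word(r, rng, max_reps) for r in regex[1:])
    if kind == "star":
        reps = rng.randint(0, max_reps)
        return "".join(gen_word(regex[1], rng, max_reps) for _ in range(reps))
    raise ValueError(f"unknown regex kind: {kind!r}")


# a(b|c)*: an "a" followed by any number of "b" or "c" symbols.
a_then_bc_star = ("concat", ("sym", "a"),
                  ("star", ("union", ("sym", "b"), ("sym", "c"))))

rng = random.Random(0)  # fixed seed so runs are repeatable
words = [gen_word(a_then_bc_star, rng) for _ in range(50)]
```

Every generated word starts with `a` and continues with at most `max_reps` symbols drawn from `b` and `c`, which students can check against the expression by hand.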
Ainayya Almira, Anisah Rachmawati, Insi Norma Jelita
et al.
The aim of this research is to provide insight to chemistry education teachers and researchers regarding the effectiveness of the guided inquiry learning model and to provide direction for further research in this field. The research method used in this article is a Systematic Literature Review (SLR), which helps to compile and evaluate the various studies related to the guided inquiry learning model. The review presents the results of a literature survey of articles discussing the application of this model in chemistry learning, exploring its definition, application, strengths, weaknesses and effectiveness. The results show that this model can be applied to both the theoretical and the practical aspects of chemistry learning. The advantages of the guided inquiry model are that it involves students actively, increases learning independence, and gives students the opportunity to discuss and find their own answers. Students who study with this model tend to have higher learning achievements. However, there are also disadvantages, such as the time required to implement this model and obstacles in dealing with students who are not yet familiar with this approach.
Allama Muhammad Iqbal began his poetic career with Urdu odes, or amatory verses, but very soon he began writing poetry in Persian. Iqbal realized that the scope of the Urdu language was too narrow for his ideas and thoughts. He recognized that Urdu is a young, inexperienced language, spoken, written and read in a limited part of the subcontinent. Persian, on the other hand, is one of the world's old and experienced languages and is spoken, written and read across a vast part of the Muslim world. The Persian language holds the most valuable assets of poetry and prose. The Persian odes of Allama Iqbal are found in Piyam-e-mashriq and Zaboor-e-Ajam. Some Urdu odes of Iqbal are found in Bang-e-Dara and Zarb-e-kaleem, but the most important are found in his famous book of Urdu poetry, Bal-e-Jibreel. This collection of poems by Allama Iqbal is very important, because what is clearly stated in his Persian odes is what is indicated in Bal-e-Jibreel. The first part of Bal-e-Jibreel consists of ghazals. Essentially, these ghazals convey the same meaning that the Persian ghazals imply. However, the experimental writing of these ghazals and the sheer talent employed in this book mark a high point in terms of poetry.
Language. Linguistic theory. Comparative grammar, Computational linguistics. Natural language processing
Immanuel Kant, written by Lucien Goldman, is one of the most important books published during the 20th century. I explicate several of its important aspects, concentrating on its recurring theme: humanity, society, and their relationship with the universe within Kant’s philosophical thought. The relation of humans, society, and the universe is the most productive problem of modern philosophy; according to Goldman’s book, it is the most productive problem for Kant too. Goldman delineates how the relation of the human being, bourgeois society and the universe is the focal point of Kant's philosophy, a point that pervades the chapters of the book, and I clarify it as well as Kant’s own account of the problem. Finally, I recapitulate how most subsequent philosophies diverged from this problem, as well as the particular problems that followed during the 19th and early 20th centuries.
Indo-Iranian languages and literature, General Works
Accuracy of English-language Question Answering (QA) systems has improved significantly in recent years with the advent of Transformer-based models (e.g., BERT). These models are pre-trained in a self-supervised fashion with a large English text corpus and further fine-tuned with a massive English QA dataset (e.g., SQuAD). However, QA datasets on such a scale are not available for most other languages. Multi-lingual BERT-based models (mBERT) are often used to transfer knowledge from high-resource languages to low-resource languages. Since these models are pre-trained with huge text corpora containing multiple languages, they typically learn language-agnostic embeddings for tokens from different languages. However, directly training an mBERT-based QA system for low-resource languages is challenging due to the paucity of training data. In this work, we augment the QA samples of the target language using translation and transliteration into other languages and use the augmented data to fine-tune an mBERT-based QA model, which is already pre-trained in English. Experiments on the Google ChAII dataset show that fine-tuning the mBERT model with translations from the same language family boosts the question-answering performance, whereas the performance degrades in the case of cross-language families. We further show that introducing a contrastive loss between the translated question-context feature pairs during the fine-tuning process prevents such degradation with cross-lingual family translations and leads to marginal improvement. The code for this work is available at https://github.com/gokulkarthik/mucot.
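For readers unfamiliar with contrastive losses, the following Python sketch shows one common margin-based pairwise form. It is not necessarily the loss used in this work: the abstract does not specify the formulation, so the function name `contrastive_loss`, the Euclidean distance, the squared penalty and the `margin` parameter are all assumptions chosen for illustration.

```python
import math


def contrastive_loss(u, v, is_positive, margin=1.0):
    """Margin-based contrastive loss for one pair of feature vectors.

    Positive pairs (e.g. features of a sample and its translation) are
    pulled together; negative pairs are pushed apart until they are at
    least `margin` away. This is a generic textbook form, assumed here.
    """
    d = math.dist(u, v)  # Euclidean distance between the two vectors
    if is_positive:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 2-dimensional feature vectors (illustrative only).
same_pair = contrastive_loss([1.0, 0.0], [1.0, 0.0], is_positive=True)
far_negative = contrastive_loss([0.0, 0.0], [3.0, 4.0], is_positive=False)
```

A matching pair with identical features incurs no loss, and a negative pair that is already farther apart than the margin incurs none either; only close negatives and distant positives are penalized.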
Constantine Lignos, Nolan Holley, Chester Palen-Michel
et al.
In this position paper, we describe our perspective on how meaningful resources for lower-resourced languages should be developed in connection with the speakers of those languages. We first examine two massively multilingual resources in detail. We explore the contents of the names stored in Wikidata for a few lower-resourced languages and find that many of them are not in fact in the languages they claim to be and require non-trivial effort to correct. We discuss quality issues present in WikiAnn and evaluate whether it is a useful supplement to hand annotated data. We then discuss the importance of creating annotation for lower-resourced languages in a thoughtful and ethical way that includes the languages' speakers as part of the development process. We conclude with recommended guidelines for resource development.
Nick Trakakis, the contemporary philosopher of religion, in his book The End of the Philosophy of Religion, presents his idea of the relationship between metaphilosophy and the methodology of contemporary philosophy of religion in both the analytical and continental traditions. Before entering into the controversial debates and issues of the philosophy of religion, he tries to understand the methods and goals pursued in these philosophical traditions, and to analyze and critique their strengths and weaknesses. To achieve this goal, Trakakis formulates the differences and similarities between the philosophical styles of the analytical and continental traditions. Trakakis examines examples of thinkers from these two traditions and concludes that the philosophy of religion in the analytical tradition seeks to assume a particular point of view in metaphilosophy and philosophical methodology, within the narrow confines of scientifically rationalized and logical arguments, independent of the concrete and existential concerns of human life. While introducing the key analytical themes of The End of the Philosophy of Religion, this paper critiques Trakakis's views from a methodological point of view and in their relationship to metaphilosophy.
Indo-Iranian languages and literature, General Works
With the new foundation of philosophy and science, the concept of method gained central status among many scholars of the modern age. The “forerunners” of philosophy and science sought in the concept of “method” a way out of the skepticism that had emerged, during High and Late scholasticism, from the endless controversies concerning universals and the possibility of rational and philosophical knowledge of nature. As one of these founders, Francis Bacon tried to define and describe his conception of method. He makes use of the metaphors of the ship, the labyrinth, the thread, and the trial court to shed light on his account of scientific research. These metaphors enable us to reconstruct the various features of his account of method and his expectations of applying method in the realm of the natural sciences.
Indo-Iranian languages and literature, General Works
The article describes the royal cart burials excavated at the Late Harappan site of Sanauli near Delhi in the spring of 2018 on the basis of the available reports and photographs. The author then comments on these finds, dated to about 1900 bce, with the Sanauli cart burials being the first of their kind in Bronze Age India. In his opinion, several indications suggest that the Sanauli “chariots” are actually carts yoked to bulls, as in the copper sculpture of a bull-cart from the Late Harappan site of Daimabad in Maharashtra. The antennae-hilted swords associated with the burials suggest that these bull-carts are likely to have come from the BMAC or the Bactria and Margiana Archaeological Complex (c.2300–1500 bce) of southern Central Asia, from where there is iconographic evidence of bull-carts. The ultimate source of the Sanauli/BMAC bull-carts may be the early phase of the Sintashta culture in the Trans-Urals, where the chariot (defined as a horse-drawn light vehicle with two spoked wheels) was most probably invented around the late twenty-first century bce. The invention presupposes an earlier experimental phase, which started with solid-wheeled carts that could only be pulled by bulls. An intermediate phase in the development is the “proto-chariot” with cross-bar wheels, attested in a BMAC-related cylinder seal from Tepe Hissar III B in northern Iran (c.2000–1900 bce). The wooden coffins of the Sanauli royal burials provide another pointer to a possible Sintashta origin. The Sanauli finds are considered in the context of the author’s archaeological model for the prehistory of the Indo-Iranian languages, which is adjusted to meet recent justified criticism.