Transformers are emerging as the new workhorse of NLP, showing great success across tasks. Unlike LSTMs, transformers process input sequences entirely through self-attention. Previous work has suggested that self-attention has limited computational capacity for processing hierarchical structures. In this work, we mathematically investigate the computational power of self-attention to model formal languages. Across both soft and hard attention, we show strong theoretical limitations on what self-attention can compute: it cannot model periodic finite-state languages, nor hierarchical structure, unless the number of layers or heads increases with input length. These limitations seem surprising given the practical success of self-attention and the prominent role assigned to hierarchical structure in linguistics. They suggest that natural language can be approximated well with models that are too weak for the formal languages typically assumed in theoretical linguistics.
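The abstract names two kinds of language without giving examples. As an illustration chosen here (not the paper's proof), two standard textbook instances are PARITY, taken here to be binary strings with an even number of 1s (a periodic regular language decided by a two-state automaton), and Dyck-2, well-nested strings over two bracket pairs (a canonical hierarchical language that needs a stack). Both Python recognizers below are trivial sequential procedures; the limitation in the abstract concerns self-attention whose number of layers and heads stays fixed as the input grows.

```python
def in_parity(bits: str) -> bool:
    """PARITY (assumed definition): binary strings with an even number of 1s."""
    state = 0  # 0 = even number of 1s seen so far, 1 = odd
    for ch in bits:
        if ch not in "01":
            raise ValueError(f"not a binary symbol: {ch!r}")
        if ch == "1":
            state ^= 1  # each 1 flips the automaton's state
    return state == 0


def in_dyck2(s: str) -> bool:
    """Dyck-2: well-nested strings over the bracket pairs () and []."""
    matching = {")": "(", "]": "["}
    stack = []
    for ch in s:
        if ch in "([":
            stack.append(ch)  # remember each open bracket
        elif ch in matching:
            # A close bracket must match the most recent unmatched open bracket.
            if not stack or stack.pop() != matching[ch]:
                return False
        else:
            raise ValueError(f"not a bracket: {ch!r}")
    return not stack  # every open bracket must have been closed
```

The automaton for PARITY needs only one bit of state regardless of input length, while the Dyck-2 recognizer's stack can grow with nesting depth, which is what makes the language hierarchical.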
Recent advances in text mining have provided new methods for capitalizing on the voluminous natural language text data created by organizations, their employees, and their customers. Although often overlooked, decisions made during text preprocessing affect whether the content and/or style of language are captured, the statistical power of subsequent analyses, and the validity of insights derived from text mining. Past methodological articles have described the general process of obtaining and analyzing text data, but recommendations for preprocessing text data were inconsistent. Furthermore, primary studies use and report different preprocessing techniques. To address this, we conduct two complementary reviews of computational linguistics and organizational text mining research to provide empirically grounded text preprocessing decision-making recommendations that account for the type of text mining conducted (i.e., open or closed vocabulary), the research question under investigation, and the data set’s characteristics (i.e., corpus size and average document length). Notably, deviations from these recommendations will be appropriate and, at times, necessary due to the unique characteristics of one’s text data. We also provide recommendations for reporting text mining to promote transparency and reproducibility.
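The abstract stresses that preprocessing decisions determine whether the content, the style, or both survive into the analysis. A minimal Python sketch using only the standard library makes each decision an explicit flag. The function name, the regular expression, and the tiny `STOPWORDS` set are illustrative assumptions, not the authors' recommendations.

```python
import re

# A tiny, hypothetical stopword list for illustration; real projects use curated lists.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "i", "we", "my"}


def preprocess(text: str, lowercase: bool = True, remove_stopwords: bool = True,
               min_len: int = 1) -> list[str]:
    """Tokenize text; each keyword argument is one preprocessing decision."""
    if lowercase:
        text = text.lower()
    # Keep runs of ASCII letters, digits, and apostrophes; punctuation is dropped.
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    if remove_stopwords:
        # Function words often carry stylistic signal, so dropping them favors content.
        tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    return [t for t in tokens if len(t) >= min_len]
```

With the defaults, "We love the new manager, and I think the team is happier." reduces to the content words `love`, `new`, `manager`, `think`, `team`, `happier`; with `remove_stopwords=False`, all 12 tokens are kept, including the pronouns "we" and "i", which a study of linguistic style might need.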
Abstract: Libraries, as centers of knowledge dissemination, need mobile robots that can navigate dense bookshelves and dynamic pedestrian flows as part of their intelligent transformation. To address the particular challenges of library environments, this study develops an optimized DWA-RRT fusion algorithm for path planning. The method improves obstacle avoidance through node sampling, global guidance from RRT, and force-field principles that maintain safe distances. For multi-robot spatiotemporal trajectory planning, a spatiotemporal information fusion framework is constructed and combined with a gated recurrent unit-convolutional neural network to extract spatiotemporal features. In addition, an improved graph attention network is used to optimize communication. Experimental results showed that, compared with the traditional dynamic window approach, the method increased the average distances to static and dynamic obstacles to 0.61 m and 1.85 m, respectively, reduced the frequency of path adjustments, and smoothed fluctuations in linear speed. The multi-robot collaborative task completion rate reached 89.76% to 98.62%, the average travel time was 25 s to 86 s shorter than that of the comparison method, and the number of collision risks was only 1/3 to 1/5 of that of the comparison method. The method was suitable for scenarios with different crowd densities. These results suggest that collaboratively optimizing single-robot and multi-robot spatiotemporal trajectory planning can improve a robot's adaptability and safety in complex library environments, offering a promising way to balance safety and efficiency in autonomous robot navigation for smart library services and similar indoor service scenarios.
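The abstract does not give enough detail to reproduce the fused algorithm. As background only, here is a minimal Python sketch of the classic dynamic window approach (DWA) that serves as the baseline: candidate (v, w) velocity pairs are sampled from a window assumed to be precomputed by the caller from acceleration limits, each pair is forward-simulated over a short horizon, trajectories that come closer than a safety distance to an obstacle are discarded, and the rest are scored by a weighted sum of goal heading, obstacle clearance, and speed. The function names, the weights `alpha`, `beta`, `gamma`, the horizon, and `safe_dist` are illustrative assumptions; in a DWA-RRT fusion an RRT path could supply the intermediate goal, but that step is not shown.

```python
import math


def simulate(x, y, theta, v, w, dt=0.1, steps=20):
    """Forward-simulate a unicycle robot at constant (v, w); return (x, y, theta) points."""
    traj = []
    for _ in range(steps):
        theta += w * dt
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
        traj.append((x, y, theta))
    return traj


def dwa_choose(state, goal, obstacles, v_range, w_range, n=11,
               alpha=1.0, beta=0.5, gamma=0.2, safe_dist=0.3):
    """Return the admissible (v, w) pair with the best heading/clearance/speed score."""
    x, y, theta = state
    best, best_score = None, -math.inf
    for i in range(n):
        v = v_range[0] + (v_range[1] - v_range[0]) * i / (n - 1)
        for j in range(n):
            w = w_range[0] + (w_range[1] - w_range[0]) * j / (n - 1)
            traj = simulate(x, y, theta, v, w)
            # Clearance: closest approach to any obstacle along the trajectory.
            clearance = min((math.hypot(px - ox, py - oy)
                             for px, py, _ in traj for ox, oy in obstacles),
                            default=math.inf)
            if clearance < safe_dist:
                continue  # inadmissible: passes too close to an obstacle
            ex, ey, eth = traj[-1]
            # Heading term: 1 when facing the goal at the end of the horizon, 0 when facing away.
            err = math.atan2(goal[1] - ey, goal[0] - ex) - eth
            err = math.atan2(math.sin(err), math.cos(err))  # wrap to [-pi, pi]
            heading = 1.0 - abs(err) / math.pi
            # Cap clearance so open space does not dominate the score.
            score = alpha * heading + beta * min(clearance, 2.0) + gamma * v
            if score > best_score:
                best, best_score = (v, w), score
    return best
```

For example, `dwa_choose((0.0, 0.0, 0.0), (5.0, 0.0), [], (0.0, 1.0), (-1.0, 1.0))` selects full speed with no turning when the path to the goal is clear; placing an obstacle at `(1.0, 0.0)` rules that pair out.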
Computational linguistics. Natural language processing, Electronic computers. Computer science
Abstract: The semiotic analysis of traditional Dafing braids explores the cultural and social meanings that these hairstyles convey. Dafing braids are not merely hair ornaments; they are a means of visual communication that incorporates elements of cultural identity, social status, spirituality, and aesthetics. These traditional braids mark membership in the Dafing group and reflect a cultural heritage passed down from generation to generation. Braid styles vary according to life stage (marriage, initiation, mourning, etc.) or social status. Each pattern or shape conveys specific information about the person who wears it. Traditional hairstyles combine aesthetic considerations (harmony of patterns and lines) with practical functions, such as protecting the hair from the weather. The semiotics of traditional Dafing braids examines the patterns as meaningful signs within a structured system in which every detail (length, arrangement, accessories) communicates a message.
Keywords: semiotics, traditional braid, Dafing, African hairstyle
Arts in general, Computational linguistics. Natural language processing
Eunice Samwel, George Obara Nyandoro, Bwocha Nyagemi et al.
This study investigates how The Lion Guard visually constructs pride as a constructive virtue, challenging the dominant view of pride as a destructive vice. Drawing on Panofsky’s pre‑iconographic methodology, the study identifies and interprets visual markers that encode pride at the level of form before symbolic meaning emerges. A systematic analysis of selected episodes (Seasons 1–3) reveals three interrelated dimensions: (1) Self‑Image, wherein characters’ postures, coloration, and personal artifacts signal individual dignity and self‑respect; (2) Collective Existence, which depicts communal rituals and shared spaces that foster group cohesion, cultural continuity, and hierarchical stability; and (3) Existential Ecology, which links pride to the stewardship of the Pride Lands through recurring motifs such as Pride Rock, the Lion Guard’s lair, and the “Circle of Life” narrative. These visual strategies demonstrate that pride functions as a foundational moral force that sustains both personal agency and ecological balance. The findings contribute to media‑ecocritical scholarship by illustrating how animated texts can revalorize traits traditionally regarded as vices, offering a nuanced model for future analyses of ethical representation in children’s animation.
Language. Linguistic theory. Comparative grammar, Computational linguistics. Natural language processing
Abstract: Point of interest (POI) recommendation analyses user preferences through historical check‐in data. However, existing POI recommendation methods often overlook the influence of weather information and face the challenge of sparse historical data for individual users. To address these issues, this paper proposes a new paradigm, the temporal‐weather‐aware transition pattern for POI recommendation (TWTransNet), which is designed to capture user transition patterns under different times and weather conditions. Additionally, we construct a user‐POI interaction graph to alleviate the sparsity of individual users' historical data. Furthermore, when user interests are predicted by aggregating graph information, some POIs may not be suitable for visiting under the current weather conditions. To account for this, we propose an attention mechanism that filters POI neighbours during graph aggregation, taking the impact of weather and time into account. Empirical results on two real‐world datasets demonstrate the superior performance of the proposed method, with a substantial improvement of 6.91%–23.31% in prediction accuracy.
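The abstract does not specify how the weather- and time-aware filter is computed. One hedged way to picture it is masked scaled dot-product attention, sketched below in Python: a context vector (assumed to already encode the user, time, and weather) scores each neighbouring POI, POIs flagged as unsuitable under current conditions receive exactly zero weight, and the remaining embeddings are averaged by their softmax weights. The function `weather_aware_aggregate` and the boolean `suitable` mask are hypothetical; TWTransNet may instead learn soft, weather-dependent weights.

```python
import math


def weather_aware_aggregate(query, neighbours, suitable):
    """Attention-weighted mean of neighbour embeddings, skipping unsuitable POIs.

    query:      context embedding (assumed to combine user, time, and weather features)
    neighbours: list of POI embeddings, each the same length as query
    suitable:   list of booleans; False masks a POI out under current conditions
    """
    d = len(query)
    if not any(suitable):
        return [0.0] * d, [0.0] * len(neighbours)  # nothing suitable to aggregate
    # Scaled dot-product scores; masked neighbours get -inf so their weight is 0.
    scores = [sum(q * k for q, k in zip(query, emb)) / math.sqrt(d) if ok else -math.inf
              for emb, ok in zip(neighbours, suitable)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]  # math.exp(-inf) == 0.0
    total = sum(exps)
    weights = [e / total for e in exps]
    agg = [sum(w * emb[i] for w, emb in zip(weights, neighbours)) for i in range(d)]
    return agg, weights
```

Masking before the softmax, rather than zeroing weights afterwards, keeps the surviving weights normalized to one.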
Computational linguistics. Natural language processing, Computer software
Abstract: With postmodern thinking (hermeneutics, deconstruction, and other approaches), it is no longer possible to assert absolute certainty in the interpretation of things, whether of texts or of existence in general, and offering final or neutral interpretations has become pointless. Turning to the past (the past of the text or the past of its author) matters only insofar as it helps us understand the present on the basis of that earlier conception; we must then move beyond this framework toward a broader horizon, because time does not repeat itself, and by this we mean the time of writing. Moreover, the interpreter's horizon is constantly changing, and experiences, including those of readers and interpreters, are ultimately living and continuous. The advent of "writing" is the advent of "play"; the text has therefore been brought down from its height, from its sacred rank, to become an endless play of meanings.
Keywords: text, meaning, original author, reader, interpreter.
Arts in general, Computational linguistics. Natural language processing
This article explores the concept of mystical love in Arabic Muslim literature with special reference to Ibn al-‘Arabi’s ‘Tarjuman al-Ashwaq’. It highlights how Muslim literature, rooted in the Qur’an and Hadith, emphasizes truth, spirituality, and human refinement. Unlike secular literary traditions, Muslim poets and writers fuse artistic expression with moral and spiritual values, making literature a means of personality formation and social development. Within this framework, Ibn al-‘Arabi emerges as a towering figure who profoundly shaped Islamic mystical thought. Born in Andalusia, he is revered as al-Shaykh al-Akbar for his influential writings on Sufism and philosophy, including ‘Al-Futuhat al-Makkiyya’, ‘Fusus al-Hikam’, and ‘Tarjuman al-Ashwaq’. His doctrine of wahdat al-wujud (Unity of Being) interprets all existence as manifestations of the Divine. In ‘Tarjuman al-Ashwaq’, Ibn al-‘Arabi presents his mystical experiences through symbolic and allegorical love poetry. Although critics accused him of romanticism, he clarified that his verses veil profound spiritual truths expressed through the language of love. Ultimately, his conception of love transcends the personal and embodies the eternal quest for Divine proximity, shaping Muslim literary expression and the symbolic tradition of the Urdu ghazal.
Language. Linguistic theory. Comparative grammar, Computational linguistics. Natural language processing
The development of artificial intelligence (AI) has spurred growing interest both in how large language models (LLMs) may maximize an organization's opportunities and in their ethical implications: the ability of LLMs to generate human-like text raises concerns about disinformation and fake news. It is therefore crucial to develop evaluation benchmarks that take the social and ethical implications into account. A major challenge is that LLMs lack awareness of their own limitations, yet they persist in producing responses to the best of their capabilities. This often results in answers that seem plausible but are ultimately incorrect, which hinders the implementation of reliable generative AI in industry. This paper examines metrics for evaluating the performance of machine-learning models, focusing specifically on LLMs. To this end, a bibliometric analysis is used to explore and analyze the techniques and methods used to evaluate large language models. The paper also sheds light on the specific areas of focus when evaluating these models. The results show that natural language processing systems, classification of information, and computational linguistics are among the techniques used to evaluate large language models. This work paves the way for future investigations employing large language models.
We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney.
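For reference, here is a compact Python sketch of interpolated Kneser-Ney for a bigram model, the smoothing method the abstract compares against (not the hierarchical Pitman-Yor model itself). Each observed bigram count is reduced by an absolute discount D, and the freed probability mass is redistributed according to a continuation probability that counts how many distinct words precede w. The value D = 0.75 and the function names are illustrative assumptions; the discount plays a role analogous to the Pitman-Yor discount parameter.

```python
from collections import Counter, defaultdict


def train_ikn_bigram(tokens, D=0.75):
    """Return p(w, v): interpolated Kneser-Ney probability of word w after history v."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    history_count = Counter()      # c(v): how often v occurs as a bigram history
    followers = defaultdict(set)   # distinct words observed after v
    preceders = defaultdict(set)   # distinct words observed before w
    for (v, w), c in bigrams.items():
        history_count[v] += c
        followers[v].add(w)
        preceders[w].add(v)
    n_bigram_types = len(bigrams)  # number of distinct bigram types

    def p_cont(w):
        # Continuation probability: share of bigram types that end in w.
        return len(preceders[w]) / n_bigram_types

    def p(w, v):
        c_v = history_count[v]
        if c_v == 0:
            return p_cont(w)  # unseen history: use the continuation distribution alone
        discounted = max(bigrams[(v, w)] - D, 0) / c_v
        lam = D * len(followers[v]) / c_v  # probability mass freed by discounting
        return discounted + lam * p_cont(w)

    return p
```

Because 0 < D ≤ 1 and every observed bigram count is at least 1, the discounted terms and the interpolation weight together sum to one over the vocabulary for any seen history.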
The gender identification of authors in literary texts is a compelling research area at the intersection of computational linguistics and natural language processing, offering insights into historical biases and socio-cultural dynamics while enriching our understanding of literary traditions. This study is inspired by the historical context of women adopting male pseudonyms to navigate a male-dominated literary domain. By leveraging machine learning and state-of-the-art language models, we investigate the feasibility and accuracy of inferring an author’s gender from their writings. Our key contributions include (1) the creation of a large-scale, diverse dataset of literary texts spanning various literary epochs and (2) the evaluation of multiple classification models. Our experiments reveal that the best-performing model achieves an accuracy above 90%, highlighting the potential of computational methods to uncover stylistic and linguistic markers tied to gender. These findings open avenues for further research into stylistic and linguistic patterns across literary history and their relationship to authorial identity.
Abstract: This study emphasizes the centrality of education in the reformist movement of the Algerian Muslim Scholars Association in Algeria, led by figures such as Abdelhamid Ben Badis. They believed education to be the cornerstone of social awareness and vigilance against colonial influence. Ben Badis dedicated himself to spreading this message through diverse means, including mosques, schools, clubs, and newspapers, aiming to fight ignorance and superstition. His efforts inspired a generation that patiently and persistently worked to expose French colonial schemes. The ultimate goal of these scholars was to cultivate a generation capable of achieving a powerful renaissance rooted in their faith. However, they recognized the obstacles posed by colonial authorities who sought to maintain control. The Algerian Muslim Scholars Association therefore called upon all reformists to unite and resist colonial efforts that undermined their national identity, beliefs, and cultural heritage. In essence, education became a weapon of empowerment and cultural preservation in the face of colonial oppression.
Keywords: Confrontation; Algeria; Association; Abdelhamid Ben Badis; French colonialism.
Arts in general, Computational linguistics. Natural language processing
Saima Malik-Moraleda, Maya Taliaferro, S. Shannon et al.
What constitutes a language? Natural languages share features with other domains, from math to music to gesture. However, the brain mechanisms that process linguistic input are highly specialized, showing little response to diverse non-linguistic tasks. Here, we examine constructed languages (conlangs) to ask whether they draw on the same neural mechanisms as natural languages, or whether they instead pattern with domains like math and programming languages. Using individual-subject fMRI analyses, we show that understanding conlangs recruits the same brain areas as natural language comprehension. This result holds for Esperanto (n=19 speakers) and four fictional conlangs (Klingon (n=10), Na’vi (n=9), High Valyrian (n=3), and Dothraki (n=3)). These findings suggest that conlangs and natural languages share critical features that allow them to draw on the same representations and computations, implemented in the left-lateralized network of brain areas. The features that differentiate conlangs from natural languages (including recent creation by a single individual, often for an esoteric purpose, a small number of speakers, and the fact that these languages are typically learned in adulthood) do not appear to affect their reliance on the same cognitive and neural mechanisms. We argue that the critical shared feature of conlangs and natural languages is that they are symbolic systems capable of expressing an open-ended range of meanings about our outer and inner worlds.
Significance Statement: What constitutes a language has been of interest to diverse disciplines, from philosophy and linguistics to psychology, anthropology, and sociology. An empirical approach is to test whether the system in question recruits the brain system that processes natural languages. In spite of their similarity to natural languages, math and programming languages recruit a distinct brain system. Using fMRI, we test brain responses to stimuli not previously investigated, namely constructed languages (conlangs), and find that they are processed by the same brain network as natural languages. Thus, the ability of a symbolic system to express diverse meanings about the world, but not the recency, manner, and purpose of its creation or a large user base, is a defining characteristic of a language.
Abstract: The concept of the rule of law (état de droit) gives rise to a multitude of understandings, or to outright misunderstanding, among some Congolese. This situation amounts to a communication rift. The aim of this article is to clarify the notion and make it easier for the population to understand. To this end, we set up a form of communication that can explain this major question of the rule of law in the DRC, in order to close the existing rift. The research uses ethnomethodology, understood as the study of practical, common-sense reasoning in everyday situations of action, together with Jean-Luc Michel's theory of communication and Shannon's first theorem on the maximal efficiency of a (binary) code.
Keywords: rift, communication, understanding, state, law
Arts in general, Computational linguistics. Natural language processing
Abstract: This study shows that P. J. Hountondji and B. Ndoye hold that an authentic African philosophy must, while rising up against ethnophilosophical misunderstandings as well as Eurocentric and Afrocentric prejudices, meet two requirements: to think through Africa's problems and concerns without complacency, and to contribute to the construction of a true universal endowed with differences. It also shows that, for Hountondji and Ndoye, meeting the first requirement contributes to fulfilling the second. In other words, according to these Beninese and Senegalese philosophers, the philosophical intention that must unfold in African philosophy (as in all philosophy, as E. Husserl suggests) is the construction of a true universal by way of the surrounding world, or the return to the things themselves. For specialists in African philosophy, this constitutes an invitation to think Africa-world (l’Afrique-monde).
Keywords: African philosophy, surrounding world, universal, ethnophilosophy, anti-ethnophilosophy.
Arts in general, Computational linguistics. Natural language processing
Mostafa Abdou, Ana Valeria González, Mariya Toneva et al.
Neuroscientists evaluate deep neural networks for natural language processing as possible candidate models for how language is processed in the brain. These models are often trained without explicit linguistic supervision, but have been shown to learn some linguistic structure in the absence of such supervision (Manning et al., 2020), potentially calling into question the relevance of symbolic linguistic theories in modeling such cognitive processes (Warstadt and Bowman, 2020). Across two fMRI datasets, we evaluate whether language models align better with brain recordings when their attention is biased by annotations from syntactic or semantic formalisms. Using structure from dependency or minimal recursion semantic annotations, we find that alignments improve significantly for one of the datasets; for the other dataset, the results are more mixed. We present an extensive analysis of these results. Our proposed approach enables the evaluation of more targeted hypotheses about the composition of meaning in the brain, expanding the range of possible scientific inferences a neuroscientist could make, and opens up new opportunities for cross-pollination between computational neuroscience and linguistics.