Reasoning is a fundamental aspect of human intelligence that plays a crucial role in activities such as problem solving, decision making, and critical thinking. In recent years, large language models (LLMs) have made significant progress in natural language processing, and it has been observed that these models may exhibit reasoning abilities when they are sufficiently large. However, it is not yet clear to what extent LLMs are capable of reasoning. This paper provides a comprehensive overview of the current state of knowledge on reasoning in LLMs, including techniques for improving and eliciting reasoning in these models, methods and benchmarks for evaluating reasoning abilities, findings and implications of previous research in this field, and suggestions for future directions. Our aim is to provide a detailed and up-to-date review of this topic and to stimulate meaningful discussion and future work.
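One widely used technique for eliciting reasoning of the kind such surveys cover is chain-of-thought prompting. A minimal sketch follows; the exemplar question and the prompt wording are illustrative placeholders, not taken from any specific system in the survey.

```python
# Minimal sketch of a one-shot chain-of-thought prompt: a worked example with
# explicit intermediate steps is prepended so the model imitates step-by-step
# reasoning. The exemplar below is a hypothetical illustration.

def build_cot_prompt(question: str) -> str:
    """Prepend a worked example so the model produces step-by-step reasoning."""
    exemplar = (
        "Q: A shop has 3 boxes with 4 apples each. How many apples in total?\n"
        "A: Each box holds 4 apples and there are 3 boxes, so 3 * 4 = 12. "
        "The answer is 12.\n\n"
    )
    return exemplar + f"Q: {question}\nA: Let's think step by step."

prompt = build_cot_prompt("If a train travels 60 km in 1.5 hours, what is its speed?")
```

The returned string would then be sent to an LLM; the trailing cue encourages the model to emit its reasoning before the final answer.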
Abstract Large Language Models (LLMs) have successfully transformed natural language processing tasks. Yet their large size and high computational needs pose challenges for practical use, especially in resource-limited settings. Model compression has emerged as a key research area to address these challenges. This paper presents a survey of model compression techniques for LLMs. We cover methods such as quantization, pruning, and knowledge distillation, highlighting recent advancements. We also discuss benchmarking strategies and evaluation metrics crucial for assessing compressed LLMs. This survey offers valuable insights for researchers and practitioners, aiming to enhance the efficiency and real-world applicability of LLMs while laying a foundation for future advancements.
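To make one of the surveyed techniques concrete, here is a minimal sketch of symmetric per-tensor int8 quantization using the common max-abs scale; production systems add per-channel scales, calibration data, and outlier handling, none of which are shown here.

```python
import numpy as np

# Symmetric per-tensor int8 quantization sketch: map floats to [-127, 127]
# using a single scale derived from the largest absolute weight.

def quantize_int8(w: np.ndarray):
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original weights.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

The round trip introduces at most half a quantization step of error per weight, which is the storage/accuracy trade-off the survey's quantization section examines.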
Abstract The extensive deployment of smart grids has heightened the demands for communication reliability. By leveraging rapidly advancing satellite communication technologies, these systems can enhance the smart grid’s coverage and network robustness. The integration of drones with smart grids for transmission line inspections facilitates early detection and mitigation of faults and potential risks, thereby enhancing network resilience while reducing problem-solving costs. To address the inefficiencies, complexities, and high expenses associated with traditional manual inspections, a pre-clustering traveling salesman problem model is developed alongside an unmanned aerial vehicle inspection strategy based on an enhanced ant colony algorithm. This approach enables simultaneous safety inspections across critical areas, essential components, and transmission lines. The simulation results indicate that the designed method greatly improves both convergence speed and effectiveness under multiple task requirements, particularly in large-scale scenarios.
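The pre-clustering idea can be sketched as follows: group inspection waypoints into clusters first, then route each cluster separately. Note that this is a simplified stand-in, with k-means for the clustering and a nearest-neighbour heuristic in place of the paper's enhanced ant colony optimizer.

```python
import math
import random

# Simplified pre-clustering TSP sketch: k-means partitions waypoints, then a
# cheap nearest-neighbour tour routes each cluster. The paper's enhanced ant
# colony algorithm is NOT reproduced here; nn_tour is an illustrative proxy.

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            groups[i].append(p)
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return groups

def nn_tour(points):
    # Greedy nearest-neighbour tour over one cluster's waypoints.
    tour = [points[0]]
    rest = list(points[1:])
    while rest:
        nxt = min(rest, key=lambda p: math.dist(tour[-1], p))
        tour.append(nxt)
        rest.remove(nxt)
    return tour
```

In the full method, each cluster's tour would instead be optimized by the ant colony search, which is where the reported convergence-speed gains come from.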
Computational linguistics. Natural language processing, Electronic computers. Computer science
Abstract Research on object detection methods (ODM) has increased over the past decades due to their practical implementations across various sectors. The growing demand for better ODM in real-world situations has catalysed advancements in academic research and publications, making progress challenging to track. Bibliometric analysis offers an effective way to summarise these advancements efficiently and is valuable for visualising and identifying a comprehensive ODM research structure and overview. However, despite the high volume of ODM publications since 2014, bibliometric analyses in this field remain limited. Hence, this study analysed the ODM research landscape using bibliometric analysis, highlighting essential materials for initial reference and emphasising the ODM topics most commonly discussed. The bibliometric data for this study were retrieved from the Web of Science database using a configured search query, and the collected data were analysed in VOSviewer using performance analysis and science mapping. The findings reveal that “Foundational Architectural and Data Processing Tasks of Object Detection Methods” is the most prominent ODM theme that employs statistical models within the detection framework. Additionally, this study suggests integrating probabilistic inference approaches with ODM to quantify prediction uncertainties. One such approach, nonparametric predictive inference, can potentially improve detection accuracy, which is another popular theme in ODM studies. This study also identifies autonomous detection applications as one of the emerging trends within the thematic clusters. These insights guide researchers seeking to navigate the evolving ODM research areas, particularly in contributing to ODM progress toward more adaptable and efficient detection.
Computational linguistics. Natural language processing, Electronic computers. Computer science
Uttam U. Deshpande, Supriya Shanbhag, Rudragoud Patil et al.
Abstract Uncontrolled road traffic conditions are common in South Asian countries, where a majority of motorcycle accidents result from triple-riding and helmetless-driving traffic violations. Triple riding is a dangerous act that can carry serious legal consequences, and every rider should be aware of the stringent traffic safety regulations on helmet wear and triple-riding violations. Public safety can be improved by reducing the number of road accidents; to do this, offending riders must be identified and prosecuted. With little human assistance, an automated traffic monitoring system can enforce rigorous adherence to traffic laws. Current methods are effective when applied to widely used datasets, such as Kaggle and COCO, which offer a helpful research platform. However, it is difficult to obtain satisfactory detection accuracy because these datasets contain minimal triple-riding images and lack realistic traffic CCTV imagery captured from specific heights and angles. We provide a real-time solution that employs surveillance cameras placed at various angles and heights to detect two-wheelers, identify the number of riders, and recognize the vehicle involved in the traffic violation. To address challenging conditions such as occlusions and precise vehicle detection at long distances, we use the ResNet18-based DetectNet_v2 model. To reliably predict triple riding from several riders sitting on a two-wheeler and to extract license plate information, we employ a cutting-edge YOLOv8 object-detection algorithm operating on the Darknet framework. Experimental analysis shows that our proposed model achieves promising triple-rider, two-wheeler, and number-plate detection accuracies of 91.42%, 98%, and 81%, respectively, under challenging conditions.
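The post-detection logic for flagging a triple-riding violation can be sketched as rider-to-vehicle box association: count the rider boxes that overlap a detected two-wheeler box strongly enough. This is an illustrative sketch only; the detectors themselves (DetectNet_v2 and YOLOv8 in the paper) and the paper's exact association rule are out of scope, and the 0.3 threshold is an assumed value.

```python
# Boxes are (x1, y1, x2, y2) tuples in pixel coordinates.

def overlap_ratio(rider, vehicle):
    """Intersection area divided by the rider-box area (intersection-over-area)."""
    x1 = max(rider[0], vehicle[0]); y1 = max(rider[1], vehicle[1])
    x2 = min(rider[2], vehicle[2]); y2 = min(rider[3], vehicle[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = (rider[2] - rider[0]) * (rider[3] - rider[1])
    return inter / area if area else 0.0

def is_triple_riding(vehicle_box, rider_boxes, thresh=0.3):
    # Count riders sufficiently contained in the two-wheeler's box.
    riders = sum(1 for r in rider_boxes if overlap_ratio(r, vehicle_box) >= thresh)
    return riders >= 3
```

Intersection-over-area (rather than IoU) is used here because a rider box is much smaller than the vehicle box, so containment, not mutual overlap, is the relevant signal.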
Computational linguistics. Natural language processing, Electronic computers. Computer science
Recent advances in deep neural networks (DNNs) have significantly improved various audio processing applications, including speech enhancement, synthesis, and hearing-aid algorithms. DNN-based closed-loop systems have gained popularity in these applications due to their robust performance and ability to adapt to diverse conditions. Despite their effectiveness, current DNN-based closed-loop systems often suffer from sound quality degradation caused by artifacts introduced by suboptimal sampling methods. To address this challenge, we introduce dCoNNear, a novel DNN architecture designed for seamless integration into closed-loop frameworks. This architecture specifically aims to prevent the generation of spurious artifacts, most notably the tonal and aliasing artifacts that arise from non-ideal sampling layers. We demonstrate the effectiveness of dCoNNear through a proof-of-principle example within a closed-loop framework that employs biophysically realistic models of auditory processing for both normal and hearing-impaired profiles to design personalized hearing-aid algorithms. We further validate the broader applicability and artifact-free performance of dCoNNear through speech-enhancement experiments, confirming its ability to improve perceptual sound quality without introducing architecture-induced artifacts. Our results show that dCoNNear not only accurately simulates all processing stages of existing non-DNN biophysical models but also significantly improves sound quality by eliminating audible artifacts in both hearing-aid and speech-enhancement applications. This study offers a robust, perceptually transparent closed-loop processing framework for high-fidelity audio applications.
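The aliasing problem the abstract refers to can be illustrated in a few lines: naive strided decimation folds high-frequency content down to audible low frequencies, while low-pass filtering before decimation suppresses it. This sketch shows only the general phenomenon; it is not the dCoNNear architecture.

```python
import numpy as np

# A Nyquist-rate tone (+1, -1, +1, ...) aliases to DC under naive 2x
# decimation; a simple moving-average low-pass applied first removes it.

def decimate_naive(x, factor=2):
    # Keep every `factor`-th sample with no filtering: aliasing-prone.
    return x[::factor]

def decimate_antialiased(x, factor=2):
    # Moving-average low-pass before keeping every `factor`-th sample.
    kernel = np.ones(factor) / factor
    smoothed = np.convolve(x, kernel, mode="same")
    return smoothed[::factor]
```

Learned up/down-sampling layers in DNNs face the same trade-off, which is why non-ideal sampling layers introduce the tonal and aliasing artifacts the paper targets.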
In this paper, we introduce a combination of novel and exciting tasks: the solution and generation of linguistic puzzles. We focus on puzzles used in Linguistic Olympiads for high school students. We first extend the existing benchmark for the task of solving linguistic puzzles. We explore the use of Large Language Models (LLMs), including recent state-of-the-art models such as OpenAI's o1, for solving linguistic puzzles, analyzing their performance across various linguistic topics. We demonstrate that LLMs outperform humans on most puzzle types, except for those centered on writing systems and those involving understudied languages. We use the insights from the puzzle-solving experiments to guide the novel task of puzzle generation. We believe that automating puzzle generation, even for relatively simple puzzles, holds promise for expanding interest in linguistics and introducing the field to a broader audience. This finding highlights the importance of linguistic puzzle generation as a research task: such puzzles can not only promote linguistics but also support the dissemination of knowledge about rare and understudied languages.
This work investigates the optimal allocation of inference compute across three key scaling factors in video vision language models: language model size, frame count, and the number of visual tokens per frame. While prior work typically focuses on optimizing model efficiency or improving performance without considering resource constraints, we instead identify the optimal model configuration under fixed inference compute budgets. We conduct large-scale training sweeps and careful parametric modeling of task performance to identify the inference compute-optimal frontier. Our experiments reveal how task performance depends on scaling factors and finetuning data size, as well as how changes in data size shift the compute-optimal frontier. These findings translate to practical tips for selecting these scaling factors.
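The configuration search the abstract describes can be sketched as a grid search under a compute budget. The cost model (cost proportional to parameters x frames x tokens per frame) and the saturating performance function below are illustrative stand-ins, not the paper's fitted parametric curves.

```python
import itertools

# Toy compute-optimal configuration search for a video VLM: enumerate
# (model size in B params, frame count, visual tokens per frame) triples,
# keep those within the budget, and pick the best under a toy performance model.

def inference_cost(params_b, frames, tokens):
    # Assumed proportionality; real FLOP accounting is more involved.
    return params_b * frames * tokens

def toy_performance(params_b, frames, tokens):
    # Monotone with diminishing returns in every factor (illustrative only).
    return (1 - 1 / (1 + params_b)) + 0.5 * (1 - 1 / (1 + frames)) + 0.3 * (1 - 1 / (1 + tokens))

def best_config(budget, sizes=(1, 3, 7), frame_opts=(4, 8, 16), token_opts=(16, 64, 144)):
    feasible = [
        c for c in itertools.product(sizes, frame_opts, token_opts)
        if inference_cost(*c) <= budget
    ]
    return max(feasible, key=lambda c: toy_performance(*c)) if feasible else None
```

Replacing `toy_performance` with curves fitted from training sweeps, as the paper does, turns this enumeration into a trace of the compute-optimal frontier.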
Farha Nausheen, Khandakar Ahmed, M Imad Khan et al.
In recent developments, deep learning methodologies applied to Natural Language Processing (NLP) have revealed a paradox: They improve performance but demand considerable data and resources for their training. Alternatively, quantum computing exploits the principles of quantum mechanics to overcome the computational limitations of current methodologies, thereby establishing an emerging field known as quantum natural language processing (QNLP). This domain holds the potential to attain a quantum advantage in the processing of linguistic structures, surpassing classical models in both efficiency and accuracy. In this paper, we propose to categorise QNLP models based on quantum computing principles, architecture, and computational approaches. The paper surveys how quantum meets language by mapping the state of the art in this area, covering quantum encoding techniques for classical data, QNLP models for prevalent NLP tasks, and quantum optimisation techniques for hyperparameter tuning. The landscape of quantum computing approaches applied to various NLP tasks is summarised by showcasing the specific QNLP methods used, with the popularity of each method indicated by its count. From the findings, it is observed that QNLP approaches are still limited to small datasets, with only a few models explored extensively, and there is increasing interest in the application of quantum computing to natural language processing tasks.
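One of the quantum encoding techniques for classical data such surveys cover is amplitude encoding: a feature vector is normalised into the amplitudes of an n-qubit state, packing 2^n features into n qubits. The sketch below simulates only the classical preprocessing with a NumPy vector; no quantum device or SDK is involved.

```python
import numpy as np

# Amplitude-encoding sketch: pad the feature vector to a power-of-two length
# and L2-normalise it so the squared amplitudes sum to 1, as required of a
# quantum state vector.

def amplitude_encode(features):
    v = np.asarray(features, dtype=float)
    dim = 1 << (len(v) - 1).bit_length()  # next power of two
    state = np.zeros(dim)
    state[: len(v)] = v
    norm = np.linalg.norm(state)
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return state / norm

state = amplitude_encode([3.0, 4.0])
```

This exponential packing is one source of the hoped-for quantum advantage, though preparing such states efficiently on hardware is itself a nontrivial problem.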
In this paper we introduce the first effort to adapt large language models (LLMs) to the Ukrainian dialect (in our case Hutsul), a low-resource and morphologically complex dialect spoken in the Carpathian Highlands. We created a parallel corpus of 9852 dialect-to-standard Ukrainian sentence pairs and a dictionary of 7320 dialectal word mappings. We also addressed data shortage by proposing an advanced Retrieval-Augmented Generation (RAG) pipeline to generate synthetic parallel translation pairs, expanding the corpus with 52142 examples. We fine-tuned multiple open-source LLMs using LoRA and evaluated them on a standard-to-dialect translation task, also comparing with few-shot GPT-4o translation. In the absence of human annotators, we adopt a multi-metric evaluation strategy combining BLEU, chrF++, TER, and LLM-based judgment (GPT-4o). The results show that even small (7B) fine-tuned models outperform zero-shot baselines such as GPT-4o across both automatic and LLM-evaluated metrics. All data, models, and code are publicly released at: https://github.com/woters/vuyko-hutsul
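Of the automatic metrics listed, chrF is the most self-contained to illustrate. Below is a simplified, pure-Python character n-gram F-score in the spirit of chrF; the official chrF++ (e.g. via sacreBLEU) additionally mixes in word n-grams and averages over several n-gram orders, so this single-order sketch is not a drop-in replacement.

```python
from collections import Counter

# Simplified chrF-style score: character n-gram precision/recall combined
# into an F-beta score (beta=2 weights recall, as in chrF).

def char_ngrams(text, n):
    text = text.replace(" ", "")  # chrF ignores spaces by default
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chr_f(hypothesis, reference, n=3, beta=2.0):
    hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
    if not hyp or not ref:
        return 0.0
    overlap = sum((hyp & ref).values())  # clipped n-gram matches
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
```

Character-level matching is what makes chrF-style metrics comparatively robust for morphologically complex languages like Hutsul, where word-level metrics such as BLEU penalise inflectional variants harshly.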
Abstract: Since antiquity, theatre has been an artistic genre and a means of communication for transmitting cultural, moral and social values in a comic tone, addressing important subjects productively from generation to generation. This article studies the desacralization of the child in postcolonial literature. Eni questions this desacralization to denounce the ills of postcolonial society through Eni Mokube's work "Le Triomphe de Mukom". The author invites the victims to expose the perpetrators of trafficking, slavery, mistreatment and human exploitation. We examine all the forms of trafficking, mistreatment, exploitation and enslavement of children from the South-West region to the Littoral in Eni Mokube's "The Trials of Mukom". Our hypothesis revolves around the following questions: to what extent is this phenomenon of child exploitation closely linked to personal development? And what strategies and solutions can be envisaged to eradicate slavery and the phenomenon of street children, sources of insecurity, terrorism and other ills? For this analysis, we used Jean-Pierre Richard's thematic method.
Keywords: human trafficking; slavery; street children; insecurity; terrorism.
Arts in general, Computational linguistics. Natural language processing
Abstract: Wende means God in the Mooré language (one of the national languages of Burkina Faso). Among the Mossi, Wende (God) is present in every detail of life. He appears in folktales in various guises, sometimes as a human and sometimes as a spirit. This article aims to examine the status of Wende (God) in the work Contes du Burkina Faso (formerly Upper Volta). This book is a collection of Mossi folktales, gathered by Louis TAUXIER in the days of Upper Volta and published in 1986. Conducted within the framework of semiotics, this study seeks to establish a typology of the actant Wende in his human and divine manifestations.
Keywords: "Wende"; "Contes du Burkina Faso"; "semiotic status"; "actant"; "mystagogue".
Arts in general, Computational linguistics. Natural language processing
The last decade in deep learning has brought on increasingly capable systems that are deployed in a wide variety of applications. In natural language processing, the field has been transformed by a number of breakthroughs, including large language models, which are used in increasingly many user-facing applications. In order to reap the benefits of this technology and reduce potential harms, it is important to quantify the reliability of model predictions and the uncertainties that shroud their development. This thesis studies how uncertainty in natural language processing can be characterized from a linguistic, statistical and neural perspective, and how it can be reduced and quantified through the design of the experimental pipeline. We further explore uncertainty quantification in modeling by theoretically and empirically investigating the effect of inductive model biases in text classification tasks. The corresponding experiments cover three different languages (Danish, English and Finnish), a range of tasks, and a large set of different uncertainty quantification approaches. Additionally, we propose a method for calibrated sampling in natural language generation based on non-exchangeable conformal prediction, which provides tighter token sets with better coverage of the actual continuation. Lastly, we develop an approach to quantify confidence in large black-box language models using auxiliary predictors, where the confidence is predicted from the input to and generated output text of the target model alone.
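The idea behind calibrated token sets can be sketched with plain split conformal prediction: held-out calibration scores of the true token set a threshold, and the prediction set at test time includes every candidate that clears it. Note this is the standard exchangeable version for illustration; the thesis's contribution is a *non-exchangeable* variant, which reweights calibration points and is not shown here.

```python
import math

# Split conformal prediction sketch for token-level prediction sets.
# Nonconformity score: 1 - p(true token) on a held-out calibration set.

def conformal_threshold(true_token_probs, alpha=0.1):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score."""
    scores = sorted(1 - p for p in true_token_probs)
    n = len(scores)
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    return scores[k]

def prediction_set(token_probs, threshold):
    """Include every token whose nonconformity is within the threshold."""
    return {tok for tok, p in token_probs.items() if 1 - p <= threshold}
```

Under exchangeability this construction guarantees that the true next token falls in the set with probability at least 1 - alpha, which is the coverage property the thesis tightens for the non-exchangeable generation setting.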
Tingting Hong, Xiaohui Huang, Guangjian Chen et al.
Abstract Green Infrastructure (GI) has garnered increasing attention from various regions due to its potential to mitigate the urban heat island (UHI) effect, which has been exacerbated by global climate change. This study focuses on the central area of Fuzhou city, one of the “furnace” cities, and aims to explore the correlation between the GI pattern and land surface temperature (LST) in the spring and autumn seasons. The research adopts a multiscale approach, starting from the urban scale and using urban geographic spatial characteristics, multispectral remote sensing data, and morphological spatial pattern analysis (MSPA). Significant MSPA elements were tested and combined with LST to conduct a geographically weighted regression (GWR) experiment. The findings reveal that the UHI in the central area of Fuzhou city has a spatial characteristic of “high temperature in the middle and low temperature around,” which is coupled with a “central scattered and peripheral concentrated” distribution of GI. This suggests that remote sensing data can effectively be utilised for UHI inversion. Additionally, the study finds that the complexity of GI, whether from the perspective of the overall GI pattern or the classification study based on the proportion of the core area, has an impact on the alleviation of UHI in both seasons. In conclusion, this study underscores the importance of a reasonable layout of urban green infrastructure for mitigating UHI.
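Geographically weighted regression, the core statistical tool of the study, fits a separate weighted least squares at each query location, with nearby observations receiving larger kernel weights so that coefficients vary over space. The sketch below uses a Gaussian kernel with an assumed fixed bandwidth; the study's actual bandwidth selection and variables are not reproduced.

```python
import numpy as np

# Minimal GWR sketch: local weighted least squares with Gaussian distance
# weights. coords is (n, 2), X and y are length-n arrays, query is (2,).

def gwr_coefficients(coords, X, y, query, bandwidth=1.0):
    d2 = np.sum((coords - query) ** 2, axis=1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))      # Gaussian kernel weights
    Xd = np.column_stack([np.ones(len(X)), X])  # add intercept column
    W = np.diag(w)
    # Solve the weighted normal equations for the local coefficients.
    beta = np.linalg.solve(Xd.T @ W @ Xd, Xd.T @ W @ y)
    return beta  # [intercept, slope]
```

Evaluating this at a grid of query points yields the spatially varying GI-to-LST coefficients that such studies map.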
Computational linguistics. Natural language processing, Computer software
Liliane Surprise OKOME ENGOUANG Epouse NZESSEU & Mathurin OVONO EBE
Abstract: A work by Trifonia Melibea Obono, La bastarda (2016) is essentially a translation of fàŋ sociological and anthropological material into the Spanish of Equatorial Guinea. In 2020 the novel was translated from Spanish into French by Anne-Laure Bonvalot, which has increased its visibility by allowing a wider readership to enter this sub-Saharan sociocultural universe. This study is not a critique of the translation of the work as a whole, but of its titles in Spanish, La bastarda, and in French, La bâtarde. The study therefore takes the form of a contrastive titrological essay in the field of ethnotranslation studies, drawing on the horizon of expectations of reception theory. We seek to establish whether, through these two titles, the society of the novel is characteristic of the society of reference, or whether the two authors run against the current of fàŋ thought.
Keywords: translation; culture; titrology; La bastarda; Equatorial Guinea.
Arts in general, Computational linguistics. Natural language processing
Kyivan Rus’ had extensive political, economic and cultural connections with other European states. Knowledge of foreign languages, Latin in particular, was in demand to maintain these connections. The article outlines the context in which Latin literature emerged in medieval Kyiv and the spheres in which the Latin language was used. The history of one ruling family, Prince Iziaslav of Kyiv, Princess Gertruda of Kyiv, their son Prince Yaropolk and daughter-in-law Cunigunda, is preserved in texts and artefacts. Primary and secondary sources, as well as sphragistic data related to the family's international contacts with Pope Gregory VII, the Papal legates, Duke Bolesław II the Bold of Poland and King Heinrich IV of Germany, attest to the use of Latin by the Kyivan royals. The article analyses the use of Latin in Kyiv's foreign relations and in the literature of the second half of the 11th century. In that period, Latin was used in Kyivan Rus’ in literature (prayers, religious poetry and chants), in votive inscriptions, in administration (seals) and in ecclesiastic and foreign correspondence.
Discourse analysis, Computational linguistics. Natural language processing
Large Language Models (LLMs) have shown remarkable performance on various basic natural language tasks. To complete complex tasks, however, we still need a task plan to guide the LLM in generating specific solutions step by step. LLMs can generate task plans directly, but these plans may contain factual errors or be incomplete. A high-quality task plan contains correct step-by-step solutions for all situations and behavioral instructions for avoiding mistakes. To obtain one, we propose the Learning to Plan method, which involves two phases: (1) in the learning phase, the task plan is iteratively updated with new step-by-step solutions and behavioral instructions, obtained by prompting LLMs to derive them from training error feedback; (2) in the subsequent test phase, the LLM uses the learned task plan to guide its inference on the test set. We demonstrate the effectiveness of our method on five different reasoning task types (8 datasets). Further, our analysis shows that a task plan learned by one LLM can directly guide another LLM to improved performance, which reveals a new transfer learning paradigm. We release the code at \url{https://github.com/Eureka6174/LearnNLPlan}
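The two-phase loop can be sketched with stubbed components: `solve` and `propose_fix` below are hypothetical stand-ins for the LLM calls, where the real method prompts a model to turn training error feedback into new behavioural instructions.

```python
# Stubbed sketch of the Learning to Plan loop: repeatedly find a training
# example the current plan gets wrong, and grow the plan with an instruction
# derived from that error's feedback.

def learn_task_plan(train_examples, solve, propose_fix, rounds=3):
    plan = []
    for _ in range(rounds):
        errors = [ex for ex in train_examples if solve(ex, plan) != ex["answer"]]
        if not errors:
            break  # plan already solves the training set
        plan.append(propose_fix(errors[0]))  # refine plan from one error
    return plan

# Toy task: answers are x + 1, but the naive solver returns x until the plan
# contains the (hypothetical) behavioural instruction "add one".
def solve(ex, plan):
    return ex["x"] + 1 if "add one" in plan else ex["x"]

def propose_fix(error_example):
    return "add one"

plan = learn_task_plan([{"x": 1, "answer": 2}, {"x": 5, "answer": 6}], solve, propose_fix)
```

At test time the learned `plan` is simply prepended to the inference prompt, which is also why a plan learned with one LLM can transfer to another.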
Large Language Models (LLMs) are capable of successfully performing many language processing tasks zero-shot (without training data). If zero-shot LLMs can also reliably classify and explain social phenomena like persuasiveness and political ideology, then LLMs could augment the Computational Social Science (CSS) pipeline in important ways. This work provides a road map for using LLMs as CSS tools. Towards this end, we contribute a set of prompting best practices and an extensive evaluation pipeline to measure the zero-shot performance of 13 language models on 25 representative English CSS benchmarks. On taxonomic labeling tasks (classification), LLMs fail to outperform the best fine-tuned models but still achieve fair levels of agreement with humans. On free-form coding tasks (generation), LLMs produce explanations that often exceed the quality of crowdworkers' gold references. We conclude that the performance of today's LLMs can augment the CSS research pipeline in two ways: (1) serving as zero-shot data annotators on human annotation teams, and (2) bootstrapping challenging creative generation tasks (e.g., explaining the underlying attributes of a text). In summary, LLMs are poised to meaningfully participate in social science analysis in partnership with humans.
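A taxonomic labeling setup of the kind evaluated here needs two pieces: a prompt that constrains the model to a fixed label set, and a parser that maps the free-text reply back onto that taxonomy. The label names and parsing rule below are illustrative, not the paper's exact best-practice templates.

```python
# Zero-shot annotation sketch for a CSS labeling task. The label set and
# prompt wording are hypothetical examples.

LABELS = ["liberal", "conservative", "neutral"]

def build_zero_shot_prompt(text, labels=LABELS):
    options = ", ".join(labels)
    return (
        "Classify the political ideology of the following statement.\n"
        f"Answer with exactly one of: {options}.\n\n"
        f"Statement: {text}\nLabel:"
    )

def parse_label(reply, labels=LABELS):
    # Map the model's free-text reply back onto the taxonomy.
    reply = reply.strip().lower()
    for label in labels:
        if label in reply:
            return label
    return None  # reply fell outside the taxonomy; defer to a human annotator
```

Returning `None` rather than guessing reflects the paper's framing of LLMs as members of a human annotation team, where out-of-taxonomy replies are escalated rather than forced into a label.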
The age of social media is rife with memes. Understanding and detecting harmful memes pose a significant challenge due to their implicit meaning, which is not explicitly conveyed through the surface text and image. However, existing harmful meme detection approaches only recognize superficial harm-indicative signals in an end-to-end classification manner but ignore in-depth cognition of the meme text and image. In this paper, we attempt to detect harmful memes based on advanced reasoning over the interplay of multimodal information in memes. Inspired by the success of Large Language Models (LLMs) on complex reasoning, we first conduct abductive reasoning with LLMs. Then we propose a novel generative framework to learn reasonable thoughts from LLMs for better multimodal fusion and lightweight fine-tuning, which consists of two training stages: 1) distill multimodal reasoning knowledge from LLMs; and 2) fine-tune the generative framework to infer harmfulness. Extensive experiments conducted on three meme datasets demonstrate that our proposed approach outperforms state-of-the-art methods on the harmful meme detection task.