Attention is an increasingly popular mechanism used in a wide range of neural architectures. The mechanism itself has been realized in a variety of formats. However, because of the fast-paced advances in this domain, a systematic overview of attention is still missing. In this article, we define a unified model for attention architectures in natural language processing, with a focus on those designed to work with vector representations of textual data. We propose a taxonomy of attention models along four dimensions: the representation of the input, the compatibility function, the distribution function, and the multiplicity of the input and/or output. We present examples of how prior information can be exploited in attention models and discuss ongoing research efforts and open challenges in the area, providing the first extensive categorization of the vast body of literature in this exciting domain.
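Two of the taxonomy's dimensions, the compatibility function and the distribution function, can be illustrated with a minimal sketch. The snippet below is not from the survey; it assumes the simplest common choices (dot-product compatibility, softmax distribution) purely for illustration.

```python
import numpy as np

def softmax(x):
    """Distribution function: turn compatibility scores into weights."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention(query, keys, values):
    """Minimal attention in the taxonomy's terms: a dot-product
    compatibility function followed by a softmax distribution function."""
    scores = keys @ query             # compatibility function
    weights = softmax(scores)         # distribution function
    return weights @ values, weights  # weighted combination of the values

# Toy example: 3 input elements of dimension 4; the query matches element 1.
K = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.],
              [0., 0., 1., 0.]])
q = np.array([0., 1., 0., 0.])
out, w = attention(q, K, K)           # element 1 receives the largest weight
```

Swapping in other compatibility functions (additive, multiplicative, scaled) or distribution functions (sparsemax, hard attention) yields the other cells of the taxonomy.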
It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic input are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the entire visual processing by linguistic input. Specifically, we condition the batch normalization parameters of a pretrained residual network (ResNet) on a language embedding. This approach, which we call MOdulatEd RESNet (MODERN), significantly improves strong baselines on two visual question answering tasks. Our ablation study shows that modulating from the early stages of visual processing is beneficial.
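The core idea, conditioning batch normalization's scale and shift on a language embedding, can be sketched in a few lines. This is a simplified numpy illustration under my own assumptions (a single linear map predicting per-channel offsets, 2D activations instead of convolutional feature maps), not the authors' implementation.

```python
import numpy as np

def conditional_batchnorm(x, lang_emb, W_gamma, W_beta, gamma, beta, eps=1e-5):
    """Batch-normalize x (N, C), then apply a scale and shift whose values
    are offset by linear functions of the language embedding."""
    x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    g = gamma + W_gamma @ lang_emb    # language-conditioned per-channel scale
    b = beta + W_beta @ lang_emb      # language-conditioned per-channel shift
    return g * x_hat + b

rng = np.random.default_rng(0)
N, C, d = 8, 16, 32                   # batch size, channels, embedding size
x = rng.normal(size=(N, C))
emb = rng.normal(size=d)
W_g = 0.01 * rng.normal(size=(C, d))  # small init: start close to plain BN
W_b = 0.01 * rng.normal(size=(C, d))
out = conditional_batchnorm(x, emb, W_g, W_b, np.ones(C), np.zeros(C))
```

With a zero language embedding this reduces exactly to standard batch normalization, which is why a small initialization of the conditioning maps leaves the pretrained network's behavior nearly intact at the start of training.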
Accurate estimates of the burden of antimicrobial resistance (AMR) are needed to establish the magnitude of this global threat in terms of both health and cost, and to parameterise cost-effectiveness evaluations of interventions aiming to tackle the problem. This review aimed to establish the alternative methodologies used in estimating AMR burden in order to appraise the current evidence base. MEDLINE, EMBASE, Scopus, EconLit, PubMed and grey literature were searched. English-language studies evaluating the impact of AMR (from any microbe) on patient, payer/provider and economic burden published between January 2013 and December 2015 were included. Independent screening of titles/abstracts followed by full texts was performed using pre-specified criteria. A study quality score (from zero to one) was derived using the Newcastle-Ottawa and Philips checklists. Extracted study data were used to compare study methods and the resulting burden estimates, according to perspective. Monetary costs were converted into 2013 USD. Out of 5187 unique retrievals, 214 studies were included. One hundred eighty-seven studies estimated patient health burden, 75 studies estimated payer/provider burden and 11 studies estimated economic burden. 64% of included studies were single centre. The majority of studies estimating patient or payer/provider burden used regression techniques. 48% of studies estimating mortality burden found a significant impact from resistance; excess healthcare system costs ranged from non-significance to $1 billion per year, whilst economic burden ranged from $21,832 per case to over $3 trillion in GDP loss. Median quality scores (interquartile range) for patient, payer/provider and economic burden studies were 0.67 (0.56-0.67), 0.56 (0.46-0.67) and 0.53 (0.44-0.60) respectively. This review highlights the methodological assumptions and biases that can arise depending on the chosen outcome and perspective.
Currently, there is considerable variability in burden estimates, which can in turn lead to inaccurate intervention evaluations and poor policy/investment decisions. Future research should utilise the recommendations presented in this review. This systematic review is registered with PROSPERO (PROSPERO CRD42016037510).
Abhik Bhattacharjee, Tahmid Hasan, Kazi Samin Mubasshir
et al.
In this work, we introduce BanglaBERT, a BERT-based Natural Language Understanding (NLU) model pretrained in Bangla, a widely spoken yet low-resource language in the NLP literature. To pretrain BanglaBERT, we collect 27.5 GB of Bangla pretraining data (dubbed `Bangla2B+') by crawling 110 popular Bangla sites. We introduce two downstream task datasets on natural language inference and question answering and benchmark on four diverse NLU tasks covering text classification, sequence labeling, and span prediction. In the process, we bring them under the first-ever Bangla Language Understanding Benchmark (BLUB). BanglaBERT achieves state-of-the-art results outperforming multilingual and monolingual models. We are making the models, datasets, and a leaderboard publicly available at https://github.com/csebuetnlp/banglabert to advance Bangla NLP.
Badr AlKhamissi, Millicent Li, Asli Celikyilmaz
et al.
Recently, there has been a surge of interest in the NLP community in the use of pretrained Language Models (LMs) as Knowledge Bases (KBs). Researchers have shown that LMs trained on a sufficiently large (web) corpus encode a significant amount of knowledge implicitly in their parameters. The resulting LM can be probed for different kinds of knowledge, thus acting as a KB. This has a major advantage over traditional KBs in that this method requires no human supervision. In this paper, we present a set of aspects that we deem an LM should have to fully act as a KB, and review the recent literature with respect to those aspects.
Language models (LMs) are trained on collections of documents, written by individual human agents to achieve specific goals in an outside world. During training, LMs have access only to text of these documents, with no direct evidence of the internal states of the agents that produced them -- a fact often used to argue that LMs are incapable of modeling goal-directed aspects of human language production and comprehension. Can LMs trained on text learn anything at all about the relationship between language and use? I argue that LMs are models of intentional communication in a specific, narrow sense. When performing next word prediction given a textual context, an LM can infer and represent properties of an agent likely to have produced that context. These representations can in turn influence subsequent LM generation in the same way that agents' communicative intentions influence their language. I survey findings from the recent literature showing that -- even in today's non-robust and error-prone models -- LMs infer and use representations of fine-grained communicative intentions and more abstract beliefs and goals. Despite the limited nature of their training data, they can thus serve as building blocks for systems that communicate and act intentionally.
NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild. We introduce SCROLLS, a suite of tasks that require reasoning over long texts. We examine existing long-text datasets, and handpick ones where the text is naturally long, while prioritizing tasks that involve synthesizing information across the input. SCROLLS contains summarization, question answering, and natural language inference tasks, covering multiple domains, including literature, science, business, and entertainment. Initial baselines, including Longformer Encoder-Decoder, indicate that there is ample room for improvement on SCROLLS. We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
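The "unified text-to-text format" mentioned above amounts to rendering every task as one input string and one target string. A minimal sketch of such a converter is below; the field names are my own illustrative assumptions, not the benchmark's actual schema.

```python
def to_text2text(task, example):
    """Render an example as one (input, target) text pair. The field
    names used here are illustrative, not the benchmark's real schema."""
    if task == "qa":
        # Question first, then the long context, separated by a blank line.
        return example["question"] + "\n\n" + example["context"], example["answer"]
    if task == "summarization":
        return example["document"], example["summary"]
    raise ValueError(f"unknown task: {task}")

src, tgt = to_text2text("qa", {
    "question": "Who wrote the novel?",
    "context": "The novel was written by the narrator's aunt.",
    "answer": "the narrator's aunt",
})
```

The appeal of the format is that a single encoder-decoder model and a single data loader can serve every task in the suite.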
Paper in, product out: A typical chemist running a known reaction will start by finding the method described in a published paper. Mehr et al. report a software platform that uses natural language processing to translate the organic chemistry literature directly into editable code, which in turn can be compiled to drive automated synthesis of the compound in the laboratory. The synthesis procedure is intended to be universally applicable to robotic systems operating in a batch reaction architecture. The full process is demonstrated for synthesis of an analgesic as well as common oxidizing and fluorinating agents. Science, this issue p. 101
A software platform translates the organic chemistry literature into a format executable by automated laboratory apparatus. Robotic systems for chemical synthesis are growing in popularity but can be difficult to run and maintain because they lack a standard operating system and the capacity for direct access to the literature through natural language processing. Here we show an extendable chemical execution architecture that can be populated by automatically reading the literature, leading to a universal autonomous workflow. The robotic synthesis code can be corrected in natural language without any programming knowledge and, because of the standard, is hardware independent. This chemical code can then be combined with a graph describing the hardware modules and compiled into platform-specific, low-level robotic instructions for execution. We showcase automated syntheses of 12 compounds from the literature, including the analgesic lidocaine, the Dess-Martin periodinane oxidation reagent, and the fluorinating agent AlkylFluor.
Raj Sanjay Shah, Kunal Chawla, Dheeraj Eidnani
et al.
Pre-trained language models have shown impressive performance on a variety of tasks and domains. Previous research on financial language models usually employs a generic training scheme to train standard model architectures, without completely leveraging the richness of the financial data. We propose a novel domain-specific Financial LANGuage model (FLANG) which uses financial keywords and phrases for better masking, together with a span boundary objective and an in-filling objective. Additionally, the evaluation benchmarks in the field have been limited. To this end, we contribute the Financial Language Understanding Evaluation (FLUE), an open-source comprehensive suite of benchmarks for the financial domain. These include new benchmarks across 5 NLP tasks in the financial domain as well as common benchmarks used in previous research. Experiments on these benchmarks suggest that our model outperforms those in the prior literature on a variety of NLP tasks. Our models, code and benchmark data will be made publicly available on Github and Huggingface.
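The idea of "using financial keywords and phrases for better masking" can be illustrated with a toy masking policy: always mask in-domain terms, and mask the rest at the usual MLM rate. This simplifies the paper's span-boundary and in-filling objectives down to single-token masking, and the lexicon below is my own placeholder, not the model's actual keyword list.

```python
import random

FINANCIAL_TERMS = {"dividend", "equity", "liquidity"}   # illustrative lexicon

def keyword_first_mask(tokens, mask_token="[MASK]", rate=0.15, seed=0):
    """Mask every token found in the in-domain lexicon; mask the remaining
    tokens independently at `rate`, as in standard MLM pretraining."""
    rng = random.Random(seed)
    return [mask_token if tok.lower() in FINANCIAL_TERMS or rng.random() < rate
            else tok
            for tok in tokens]

masked = keyword_first_mask("The dividend yield reflects equity risk".split())
```

Biasing the mask toward domain terms forces the model to predict exactly the vocabulary that generic pretraining tends to under-train.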
Deep learning has been widely applied in computer vision, natural language processing, and audio-visual recognition. The overwhelming success of deep learning as a data processing technique has sparked the interest of the research community. Given the proliferation of Fintech in recent years, the use of deep learning in finance and banking services has become prevalent. However, a detailed survey of the applications of deep learning in finance and banking is lacking in the existing literature. This study surveys and analyzes the literature on the application of deep learning models in the key finance and banking domains to provide a systematic evaluation of the model preprocessing, input data, and model evaluation. Finally, we discuss three aspects that could affect the outcomes of financial deep learning models. This study provides academics and practitioners with insight and direction on the state-of-the-art of the application of deep learning models in finance and banking.
Allan de Barcelos Silva, M. M. Gomes, C. Costa
et al.
Natural Language Interfaces allow human-computer interaction through the translation of human intention into devices' control commands, analyzing the user's speech or gestures. This novel interaction mode arises from advances in artificial intelligence, expert systems, speech recognition, the semantic web, dialog systems, and natural language processing, bringing about the concept of the Intelligent Personal Assistant (IPA). There is currently a vast literature on this subject. However, to the best of our knowledge, there is no thorough analysis of the state of the art in the field. In this context, we present in this article a survey of the field, discussing the main trends, critical areas, and challenges of an IPA. Another contribution is the proposition of a taxonomy for IPA classification. The method used to achieve these objectives was a systematic literature review based on the population, intervention, comparison, outcome, and context (PICOC) criteria. We started from more than 3472 scientific articles published in the last six years, searched across a set of databases chosen to increase the probability of finding highly relevant articles. The review selected the 58 most significant articles, identifying challenges and open questions. We also discuss the current status, usage, security and privacy issues, types, and architectures of IPAs. We conclude that usability, security, and privacy directly affect users' confidence in adopting an IPA.
Multimodal research has predominantly focused on single-image reasoning, with limited exploration of multi-image scenarios. Recent models have sought to enhance multi-image understanding through large-scale pretraining on interleaved image-text datasets. However, most Vision-Language Models (VLMs) are trained primarily on English datasets, leading to inadequate representation of Indian languages. To address this gap, we introduce the Chitrakshara dataset series, covering 11 Indian languages sourced from Common Crawl. It comprises (1) Chitrakshara-IL, a large-scale interleaved pretraining dataset with 193M images, 30B text tokens, and 50M multilingual documents, and (2) Chitrakshara-Cap, which includes 44M image-text pairs with 733M tokens. This paper details the data collection pipeline, including curation, filtering, and processing methodologies. Additionally, we present a comprehensive quality and diversity analysis to assess the dataset's representativeness across Indic languages and its potential for developing more culturally inclusive VLMs.
Eneko Valero, Maria Ribalta i Albado, Oscar Sainz
et al.
Large Language Models (LLMs) remain heavily centered on English, with limited performance in low-resource languages. Existing adaptation approaches, such as continual pre-training, demand significant computational resources. In the case of instructed models, high-quality instruction data is also required; both resources are often inaccessible to low-resource language communities. Under these constraints, model merging offers a lightweight alternative, but its potential in low-resource contexts has not been systematically explored. In this work, we explore whether it is possible to transfer language knowledge to an instruction-tuned LLM by merging it with a language-specific base model, thereby eliminating the need for language-specific instructions and for repeated fine-tuning whenever stronger instructed variants become available. Through experiments covering four Iberian languages (Basque, Catalan, Galician, and Spanish) and two model families, we show that merging enables effective instruction-following behavior in new languages and even supports multilingual capability through the combination of multiple language-specific models. Our results indicate that model merging is a viable and efficient alternative to traditional adaptation methods for low-resource languages, achieving competitive performance while greatly reducing computational cost.
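In its simplest form, model merging is just a weighted average of parameter tensors from models that share an architecture. The sketch below shows that baseline (linear interpolation over numpy arrays standing in for checkpoint tensors); the paper's actual merging recipe may be more sophisticated, so treat this only as the minimal version of the idea.

```python
import numpy as np

def merge_models(state_dicts, weights):
    """Linearly interpolate parameter dictionaries with the given weights.
    All models must share the same architecture and parameter names."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should sum to 1"
    return {name: sum(w * sd[name] for w, sd in zip(weights, state_dicts))
            for name in state_dicts[0]}

# Toy: combine an instruction-tuned model with a language-adapted base model.
instructed = {"layer.weight": np.ones((2, 2))}
adapted = {"layer.weight": np.zeros((2, 2))}
merged = merge_models([instructed, adapted], [0.7, 0.3])
```

The appeal for low-resource settings is visible even in this toy: merging touches only the weights, so it needs no instruction data and no gradient computation, and it can be re-run cheaply whenever a stronger instructed checkpoint is released.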
Marco Antônio Calijorne Soares, Fernando Silva Parreiras
Background: Question Answering (QA) systems enable users to retrieve exact answers to questions posed in natural language. Objective: This study aims to identify QA techniques, tools and systems, as well as the metrics and indicators used to measure these approaches, and to determine how the relationship between Question Answering and natural language processing is built. Method: The method adopted was a Systematic Literature Review of studies published from 2000 to 2017. Results: 130 out of 1842 papers were identified as describing a QA approach developed and evaluated with different techniques. Conclusion: Question Answering researchers have concentrated their efforts on natural language processing, knowledge base and information retrieval paradigms. Most of the research focused on the open domain. Regarding the metrics used to evaluate the approaches, Precision and Recall are the most frequently used.
Feedback plays an important role in language learning. Feedback-seeking behavior (FSB) includes feedback monitoring and inquiry, and the diagnostic information obtained through FSB can help seekers improve their performance. Most previous studies have explored the factors that influence FSB, such as language mindsets and motivational factors. However, FSB itself has an important role to play in second language writing. Therefore, this study combines FSB with second language writing to investigate the following questions: 1. What is the role of FSB in second language writing? 2. How can FSB exert its influence in second language writing? This research selected 20 senior students from a science class to take part in a semi-structured interview and questionnaire. The results reveal that students monitor feedback unconditionally but inquire about feedback conditionally. Herein, implications for L2 writing pedagogy are provided.
This article presents an analysis of unusual job vacancies that have appeared on the labor market since 2023. Particular attention is paid to the changes in the economy, in industries, and in the sociocultural environment that have driven the emergence of new vacancies. All of the observed patterns are examined in detail by block: staff shortages, technological transformation, the growth of wellness and well-being cultures, a new level of personalized service, and niche specialization. Each block is analyzed, and the vacancies that emerged from the described changes are reviewed. The conclusions, based on the analysis of all these patterns, are that the current labor-market situation is the result of economic, industrial, and sociocultural changes, which in turn drive the use of creative strategies to attract employees.
Large language models exhibit strong multilingual capabilities despite limited exposure to non-English data. Prior studies show that English-centric large language models map multilingual content into English-aligned representations at intermediate layers and then project them back into target-language token spaces in the final layer. From this observation, we hypothesize that this cross-lingual transition is governed by a small, sparse set of dimensions that occur at consistent indices across the intermediate to final layers. Building on this insight, we introduce a simple, training-free method to identify and manipulate these dimensions, requiring as few as 50 sentences of either parallel or monolingual data. Experiments on a multilingual generation control task reveal the interpretability of these dimensions, demonstrating that interventions in these dimensions can switch the output language while preserving semantic content, and that they surpass prior neuron-based approaches at a substantially lower cost.
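One simple way to realize "identify and manipulate a sparse set of dimensions from ~50 sentences" is a mean-difference probe: rank hidden dimensions by how far their average activation differs between two languages, then shift only those dimensions at generation time. The sketch below runs on synthetic activations and is my own minimal reading of the idea, not the paper's exact procedure (which additionally tracks consistency of the indices across layers).

```python
import numpy as np

def find_language_dims(hidden_a, hidden_b, k=8):
    """Rank hidden dimensions by the absolute difference of their mean
    activations under two languages; return the top-k indices together
    with the mean-difference vector used as a steering direction."""
    diff = hidden_a.mean(axis=0) - hidden_b.mean(axis=0)
    dims = np.argsort(-np.abs(diff))[:k]
    return dims, diff

def steer(hidden, dims, diff, alpha=1.0):
    """Shift activations along the identified dimensions only."""
    out = hidden.copy()
    out[:, dims] -= alpha * diff[dims]
    return out

# Synthetic activations for 50 sentences per language, with one planted
# dimension (index 3) that strongly separates the two languages.
rng = np.random.default_rng(0)
H_en = rng.normal(size=(50, 64))
H_xx = rng.normal(size=(50, 64))
H_en[:, 3] += 5.0
dims, diff = find_language_dims(H_en, H_xx)
steered = steer(H_en, dims, diff)
```

Because the intervention touches only k of the 64 dimensions, the rest of the representation, and hence the semantic content it carries, is left untouched, which is the property the paper's generation-control experiments test.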