Oleksandr Stroyuk, Oleksandra Raievska, Sachin Kinge, et al.
High-throughput compositional screening of Cs3M2X9 compounds based on anion exchange yielded one hundred single-phase products with independently tunable compositions: M = Bi, Sb, or Bi + Sb, and X = Cl, Cl + Br, Br, Br + I, I, or Cl + Br + I.
This work examines prompt-based event argument extraction (EAE) models in depth. We explore how incorporating various types of information into the prompt affects model performance, including the trigger, other role arguments of the same event, and role arguments across multiple events in the same document. Further, we estimate the best performance that prompt-based EAE models can attain and demonstrate that such models can be further optimized from the perspective of the training objective. Experiments are carried out with three small language models and two large language models on the RAMS dataset.
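As an illustration of how the information sources above might be toggled on and off in a prompt, here is a minimal sketch. The template wording, the `<arg>` slot token, and the function name `build_eae_prompt` are hypothetical; this is not the prompt format used in the paper.

```python
from typing import Dict, List, Optional

ARG_PLACEHOLDER = "<arg>"  # hypothetical slot token for arguments still to be predicted


def build_eae_prompt(
    document: str,
    event_type: str,
    roles: List[str],
    trigger: Optional[str] = None,
    known_args: Optional[Dict[str, str]] = None,
    other_events: Optional[List[str]] = None,
) -> str:
    """Assemble an EAE prompt from optional information sources (illustrative format)."""
    known_args = known_args or {}
    parts = [f"Event type: {event_type}"]
    if trigger is not None:
        # Optional source 1: the trigger word of the target event.
        parts.append(f"Trigger: {trigger}")
    # Optional source 2: other role arguments of the same event fill their slots;
    # roles without a known argument keep the placeholder the model must fill.
    slots = [f"{role}: {known_args.get(role, ARG_PLACEHOLDER)}" for role in roles]
    parts.append("Arguments: " + "; ".join(slots))
    for summary in other_events or []:
        # Optional source 3: arguments of other events in the same document.
        parts.append(f"Related event: {summary}")
    parts.append(f"Document: {document}")
    return "\n".join(parts)
```

Comparing prompts built with and without each optional argument is one simple way to run the kind of ablation the abstract describes.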
Akseli Reunamo, Laura-Maria Peltonen, Hans Moen, et al.
This paper reports on pretraining ModernBERT encoder models in six different sizes, ranging from 51M to 475M parameters, with a focus on limited multilingualism, emphasizing languages relevant to Finland. Our models are competitive with, or superior to, existing multilingual models. They outperform monolingual models on tasks that require a context longer than 512 tokens. We present empirical results on using different data in the final stage of training. The code and models are publicly released.
Johan J. Bolhuis, Andrea Moro, Stephen Crain, et al.
Large Language Models are useless for linguistics, as they are probabilistic models that require a vast amount of data to analyse externalized strings of words. In contrast, human language is underpinned by a mind-internal computational system that recursively generates hierarchical thought structures. The language system grows with minimal external input and can readily distinguish between real language and impossible languages.
This paper addresses the challenge of using Large Language Models to transform complex sentences into sequences of logical, simplified sentences while preserving semantic and logical integrity. We propose a hybrid approach that combines advanced prompting with a multi-agent architecture to enhance the sentence simplification process. Experimental results show that our approach successfully simplified 70% of the complex sentences written for a video game design application, whereas a single-agent approach attained a 48% success rate on the same task.
Root Cause Analysis (RCA) in telecommunication networks is a critical task, yet it presents a formidable challenge for Artificial Intelligence (AI) due to its complex, graph-based reasoning requirements and the scarcity of realistic benchmarks.
We examine the role of transcription inconsistencies in the Faetar Automatic Speech Recognition benchmark, a challenging low-resource ASR benchmark. With the help of a small, hand-constructed lexicon, we find that, while inconsistencies do exist in the transcriptions, they are not the main challenge in the task. We also demonstrate that bigram word-based language modelling brings no added benefit, but that constraining decoding to a finite lexicon can be beneficial. The task remains extremely difficult.
The drafting of documents in the procurement field has progressively become more complex and diverse, driven by the need to meet legal requirements, adapt to technological advancements, and address stakeholder demands. While large language models (LLMs) show potential in document generation, most LLMs lack specialized knowledge in procurement. To address this gap, we use retrieval-augmented techniques to achieve professional document generation, ensuring accuracy and relevance in procurement documentation.
We show that ChatGPT can be used to disambiguate lemmas in two endangered languages in which it is not proficient, namely Erzya and Skolt Sami. We augment our prompt with dictionary translations of the candidate lemmas into a majority language, Finnish in our case. This dictionary-augmented generation approach achieves 50% accuracy for Skolt Sami and 41% accuracy for Erzya. On closer inspection, many of the errors were of the kind even an untrained human annotator would make.
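A minimal sketch of what a dictionary-augmented disambiguation prompt could look like. The prompt wording, the function name `build_disambiguation_prompt`, and the placeholder words and glosses are hypothetical; the paper's actual prompt is not given in the abstract.

```python
from typing import Dict, List


def build_disambiguation_prompt(word_form: str, sentence: str,
                                candidates: Dict[str, List[str]]) -> str:
    """Build a prompt listing candidate lemmas with their Finnish dictionary glosses.

    `candidates` maps each candidate lemma to its Finnish translations
    (insertion order is preserved, so the numbering is deterministic).
    """
    lines = [
        f'The word "{word_form}" appears in the sentence: "{sentence}"',
        "Which of the following lemmas is it a form of?",
    ]
    for i, (lemma, translations) in enumerate(candidates.items(), start=1):
        # The dictionary augmentation: each lemma is shown with its Finnish glosses.
        glosses = ", ".join(translations) if translations else "(no translation)"
        lines.append(f"{i}. {lemma} (Finnish: {glosses})")
    lines.append("Answer with the lemma only.")
    return "\n".join(lines)
```

For example, `build_disambiguation_prompt("WORD", "SENTENCE", {"LEMMA_A": ["GLOSS_1"]})` yields a prompt whose candidate line reads `1. LEMMA_A (Finnish: GLOSS_1)`.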
Nigel Markey, Ilyass El-Mansouri, Gaetan Rensonnet, et al.
This manuscript has now been published:
- Link to article on journal website: https://journals.sagepub.com/doi/10.1177/17407745251320806
- PubMed link: https://pubmed.ncbi.nlm.nih.gov/40013826/
The ability to predict an NLP model's accuracy on unseen, potentially out-of-distribution data is a prerequisite for trustworthiness. We present a novel model that establishes upper and lower bounds on the accuracy, without requiring gold labels for the unseen data. We achieve this by training a discriminator which predicts whether the output of a given sequence-to-sequence model is correct or not. We show across a variety of tagging, parsing, and semantic parsing tasks that the gold accuracy is reliably between the predicted upper and lower bounds, and that these bounds are remarkably close together.
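The abstract does not spell out how the bounds are derived from the discriminator. Purely as an illustrative heuristic (not the paper's method), the sketch below turns per-output discriminator scores into an accuracy interval: outputs the discriminator is confident are correct give a lower bound, and outputs it is confident are wrong cap the upper bound. The thresholds `low` and `high` and the function name `accuracy_bounds` are assumptions.

```python
from typing import Sequence, Tuple


def accuracy_bounds(p_correct: Sequence[float],
                    low: float = 0.1, high: float = 0.9) -> Tuple[float, float]:
    """Illustrative accuracy interval from discriminator scores (hypothetical heuristic).

    p_correct[i] is the discriminator's probability that output i is correct.
    Scores >= high count as surely correct (lower bound); scores <= low count
    as surely wrong (upper bound); anything in between is left undecided.
    """
    if not p_correct:
        raise ValueError("need at least one prediction")
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("require 0 <= low < high <= 1")
    n = len(p_correct)
    surely_correct = sum(1 for p in p_correct if p >= high)
    surely_wrong = sum(1 for p in p_correct if p <= low)
    lower = surely_correct / n
    upper = 1.0 - surely_wrong / n
    return lower, upper
```

With scores `[0.95, 0.92, 0.5, 0.05]`, two outputs are confidently correct and one confidently wrong, giving the interval (0.5, 0.75); the gap between the bounds shrinks as fewer outputs fall between the thresholds.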
Aqel Mashot Jafar, Kawther A Khalaph, Hussein B Al Husseini
In the double perovskites Cs2SbAgX6 (X = I, Br, or Cl), the structural, electronic, thermodynamic, thermoelectric, and optical properties have been investigated using a corrected density functional theory (DFT) method. The XRD structural study shows that the double perovskites are stable in the cubic phase. The elastic parameters reveal all structures to be very hard and ductile. The energy band profiles display indirect-band-gap semiconductor behavior for Cs2SbAgX6 with X = Cl or Br, and metallic behavior for Cs2SbAgI6. The thermoelectric transport properties, namely electrical conductivity, thermal conductivity, Seebeck coefficient, and the figure of merit ZT, were examined over the temperature range 5–1000 K. These structures exhibit high thermal conductivity with good Seebeck coefficients at room temperature. The semiconducting Cs2SbAgBr6 has an appropriate band gap and the best Seebeck coefficients; it therefore reaches the highest ZT, 0.00016 at 1000 K, making it the most suitable of these structures for thermoelectric and spintronic device applications. The optical properties show that the effective absorption region lies in the visible-ultraviolet range, so these materials are suitable for solar cell and optoelectronic applications.
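For reference, the dimensionless figure of merit has the standard definition below, where S is the Seebeck coefficient, σ the electrical conductivity, T the absolute temperature, and κ the total thermal conductivity, split into electronic (κ_e) and lattice (κ_l) contributions. The symbols are the conventional ones, not taken from the paper. Because κ sits in the denominator, a high thermal conductivity keeps ZT small even when the Seebeck coefficient is good, which is consistent with the very low ZT values reported above.

```latex
ZT = \frac{S^{2}\,\sigma\,T}{\kappa}, \qquad \kappa = \kappa_{e} + \kappa_{l}
```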
Regularities in the interaction of the complex particle SmX6³⁻ (X = F⁻, Cl⁻) with its outer-sphere shell and the external environment were studied by quantum chemistry methods. The interaction energies of the second-coordination-sphere fragments with both the complex itself and a fragment of the external environment were calculated directly. In all studied systems, the most stable particle had the composition 3M⁺∙SmX6³⁻ (M = Na, K, Rb, Cs). It was concluded that stable "complex-outer-sphere shell" particles are formed. As a first approximation, the content of each particle was assumed to be proportional to its formation energy. In all studied model systems, the content of 3M⁺∙SmX6³⁻ particles was higher than that of the others, amounting to 22-28%, while the contents of 2M⁺∙SmX6³⁻ and 4M⁺∙SmX6³⁻ particles were 18-23% and 21-24%, respectively. The obtained results are in good agreement with Raman spectroscopy data.
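The first-approximation assumption above amounts to normalizing formation energies into shares that sum to one. A minimal sketch, assuming the magnitude of each (negative, i.e. stabilizing) formation energy is what is normalized; the function name `relative_content`, the particle labels, and the energies in the usage example are hypothetical, not the paper's values.

```python
from typing import Dict


def relative_content(formation_energies: Dict[str, float]) -> Dict[str, float]:
    """Share of each particle, assumed proportional to |formation energy|."""
    # Formation energies of stable particles are negative; use their magnitudes.
    magnitudes = {name: abs(e) for name, e in formation_energies.items()}
    total = sum(magnitudes.values())
    if total == 0:
        raise ValueError("at least one non-zero formation energy is required")
    return {name: m / total for name, m in magnitudes.items()}
```

For hypothetical energies of -1.0, -2.0, and -1.0 (arbitrary units) for three particles, the shares come out as 25%, 50%, and 25%.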
Daniel Amgar, Tal Binyamin, Vladimir Uvarov, et al.
This paper reports on a mixed-cation system of RbxCs1−xPbX3 (X = Cl or Br) nanoparticles. Interestingly, an attempt to synthesize Cl- and Br-based nanoparticles with high Rb+ content was successful, even though these compositions have low tolerance factors.
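The abstract does not state which tolerance factor was used; Goldschmidt's tolerance factor is the usual choice for ABX3 perovskites, where r_A, r_B, and r_X are the ionic radii of the A-site cation, the B-site cation (here Pb2+), and the halide anion. For a mixed A site, a composition-weighted average radius is one common approximation (an assumption here, not necessarily the paper's choice). Because Rb+ is smaller than Cs+, raising x lowers r_A and hence t, which is why Rb-rich compositions have low tolerance factors.

```latex
t = \frac{r_{A} + r_{X}}{\sqrt{2}\,\left(r_{B} + r_{X}\right)}, \qquad
r_{A} \approx x\, r_{\mathrm{Rb}^{+}} + (1 - x)\, r_{\mathrm{Cs}^{+}}
```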
We propose a shared task on methodologies and algorithms for evaluating the accuracy of generated texts. Participants will measure the accuracy of basketball game summaries produced by NLG systems from basketball box score data.
Objective: To evaluate the national antenatal syphilis screening programme and provide evidence for improving screening and management strategies.
Design: National population-based surveillance.
Setting: United Kingdom (UK).
Population: All pregnant women screening positive for syphilis, 2010–2011.
Methods: Demographic, laboratory and treatment details for each pregnancy were collected from UK antenatal units (~210), along with follow-up information on all infants born to women requiring syphilis treatment in pregnancy.
Main outcome measures: Proportion of women with newly or previously diagnosed syphilis among those with positive screening tests in pregnancy; proportion requiring treatment.
Results: Overall, 77% (1425/1840) of reported pregnancies were confirmed syphilis screen-positive. Of these, 71% (1010/1425) were in women with previously diagnosed syphilis (155 requiring treatment), 26% (374/1425) with newly diagnosed syphilis (all requiring treatment) and 3% (41/1425) required treatment but the reason for treatment was unclear. Thus 40% (570/1425) required treatment overall; of these, 96% (516/537) were treated (missing data: 33/570), although for 18% (83/456), this was not until the third trimester (missing data: 60/537). Follow-up of infants born to treated women was poor, with at least a third not followed. Six infants were diagnosed with congenital syphilis; two mothers were untreated, three had delayed treatment and one had incomplete treatment (first trimester).
Conclusion: Over 2 years, among pregnant women with confirmed positive syphilis screening results in the UK, a quarter had newly diagnosed infections and 40% required treatment. Despite high uptake of treatment, antenatal syphilis management could be improved by earlier detection, earlier treatment, and stronger links between healthcare teams.
Tweetable abstract: 25% of pregnant women screening positive for syphilis in the UK were newly diagnosed and 40% needed treatment.
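The reported counts can be cross-checked with simple arithmetic: the 1425 confirmed cases are 1010 previously diagnosed + 374 newly diagnosed + 41 unclear; the 570 requiring treatment are 155 + 374 + 41; 537 is 570 minus the 33 with missing treatment data; and 456 is the 516 treated women minus 60 with missing timing data. The short sketch below reproduces the rounded percentages from the counts quoted in the Results; the variable names are illustrative.

```python
def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number, as reported in the abstract."""
    return round(100 * part / whole)


# Counts quoted in the Results section.
reported, confirmed = 1840, 1425
previous, previous_treated, new, unclear = 1010, 155, 374, 41
treated, missing_treatment, missing_timing, third_trimester = 516, 33, 60, 83

required = previous_treated + new + unclear          # women requiring treatment
with_treatment_data = required - missing_treatment   # denominator for "treated"
with_timing_data = treated - missing_timing          # denominator for timing

for label, part, whole in [
    ("confirmed screen-positive", confirmed, reported),
    ("previously diagnosed", previous, confirmed),
    ("newly diagnosed", new, confirmed),
    ("required treatment, reason unclear", unclear, confirmed),
    ("required treatment", required, confirmed),
    ("treated", treated, with_treatment_data),
    ("treated only in third trimester", third_trimester, with_timing_data),
]:
    print(f"{label}: {pct(part, whole)}% ({part}/{whole})")
```

Each printed line matches a percentage and fraction stated in the Results, for example `required treatment: 40% (570/1425)`.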
Aggression is a rampant problem in Internet communities. Anonymous forums, usually called imageboards, are notorious for aggressive and deviant behaviour even in comparison with other Internet communities. This study investigates ways of automatically detecting verbal expressions of aggression on the most popular American (4chan.org) and Russian (2ch.hk) imageboards. A set of 1,802,789 messages was used. The machine learning algorithm word2vec was applied to detect aggression. A decent result was obtained for English (88%), while the results for Russian have yet to be improved.