Results for "Latin America. Spanish America"

Showing 20 of ~366220 results · from arXiv, DOAJ, Semantic Scholar

arXiv Open Access 2025
How Does a Deep Neural Network Look at Lexical Stress in English Words?

Itai Allouche, Itay Asael, Rotem Rousso et al.

Despite their success in speech processing, neural networks often operate as black boxes, prompting the question: what informs their decisions, and how can we interpret them? This work examines this issue in the context of lexical stress. A dataset of English disyllabic words was automatically constructed from read and spontaneous speech. Several Convolutional Neural Network (CNN) architectures were trained to predict stress position from a spectrographic representation of disyllabic words lacking minimal stress pairs (e.g., initial stress WAllet, final stress exTEND), achieving up to 92% accuracy on held-out test data. Layerwise Relevance Propagation (LRP), a technique for neural network interpretability analysis, revealed that predictions for held-out minimal pairs (PROtest vs. proTEST) were most strongly influenced by information in stressed versus unstressed syllables, particularly the spectral properties of stressed vowels. However, the classifiers also attended to information throughout the word. A feature-specific relevance analysis is proposed, and its results suggest that our best-performing classifier is strongly influenced by the stressed vowel's first and second formants, with some evidence that its pitch and third formant also contribute. These results reveal deep learning's ability to acquire distributed cues to stress from naturally occurring data, extending traditional phonetic work based around highly controlled stimuli.

en cs.CL, cs.LG
arXiv Open Access 2025
Myrvold's Results on Orthogonal Triples of $10 \times 10$ Latin Squares: A SAT Investigation

Curtis Bright, Amadou Keita, Brett Stevens

Ever since E. T. Parker constructed an orthogonal pair of $10\times10$ Latin squares in 1959, an orthogonal triple of $10\times10$ Latin squares has been one of the most sought-after combinatorial designs. Despite extensive work, the existence of such an orthogonal triple remains an open problem, though some negative results are known. In 1999, W. Myrvold derived some highly restrictive constraints in the special case in which one of the Latin squares in the triple contains a $4\times4$ Latin subsquare. In particular, Myrvold showed there were twenty-eight possible cases for an orthogonal pair in such a triple, twenty of which were removed from consideration. We implement a computational approach that quickly verifies all of Myrvold's nonexistence results and in the remaining eight cases finds explicit examples of orthogonal pairs -- thus explaining for the first time why Myrvold's approach left eight cases unsolved. As a consequence, the eight remaining cases cannot be removed by a strategy of focusing on the existence of an orthogonal pair; the third square in the triple must necessarily be considered as well. Our approach uses a Boolean satisfiability (SAT) solver to derive the nonexistence of twenty of the orthogonal pair types and find explicit examples of orthogonal pairs in the eight remaining cases. To reduce the existence problem into Boolean logic we use a duality between the concepts of transversal representation and orthogonal pair and we provide a formulation of this duality in terms of a composition operation on Latin squares. Using our SAT encoding, we find transversal representations (and equivalently orthogonal pairs) in the remaining eight cases in under two hours of computing on a large computing cluster.

en math.CO, cs.DM
arXiv Open Access 2025
Evaluation of End-to-End Continuous Spanish Lipreading in Different Data Conditions

David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

Visual speech recognition remains an open research problem where different challenges must be considered by dispensing with the auditory sense, such as visual ambiguities, the inter-personal variability among speakers, and the complex modeling of silence. Nonetheless, recent remarkable results have been achieved in the field thanks to the availability of large-scale databases and the use of powerful attention mechanisms. Besides, multiple languages apart from English are nowadays a focus of interest. This paper presents noticeable advances in automatic continuous lipreading for Spanish. First, an end-to-end system based on the hybrid CTC/Attention architecture is presented. Experiments are conducted on two corpora of disparate nature, reaching state-of-the-art results that significantly improve the best performance obtained to date for both databases. In addition, a thorough ablation study is carried out, where it is studied how the different components that form the architecture influence the quality of speech recognition. Then, a rigorous error analysis is carried out to investigate the different factors that could affect the learning of the automatic system. Finally, a new Spanish lipreading benchmark is consolidated. Code and trained models are available at https://github.com/david-gimeno/evaluating-end2end-spanish-lipreading.

arXiv Open Access 2025
Historical Ink: Exploring Large Language Models for Irony Detection in 19th-Century Spanish

Kevin Cohen, Laura Manrique-Gómez, Rubén Manrique

This study explores the use of large language models (LLMs) to enhance datasets and improve irony detection in 19th-century Latin American newspapers. Two strategies were employed to evaluate the efficacy of BERT and GPT-4o models in capturing the subtle, nuanced nature of irony, through both multi-class and binary classification tasks. First, we implemented dataset enhancements focused on enriching emotional and contextual cues; however, these showed limited impact on historical language analysis. The second strategy, a semi-automated annotation process, effectively addressed class imbalance and augmented the dataset with high-quality annotations. Despite the challenges posed by the complexity of irony, this work contributes to the advancement of sentiment analysis through two key contributions: introducing a new historical Spanish dataset tagged for sentiment analysis and irony detection, and proposing a semi-automated annotation methodology where human expertise is crucial for refining LLM results, enriched by incorporating historical and cultural contexts as core features.

en cs.CL, cs.AI
DOAJ Open Access 2025
Volver a las leyes del inca y asentar el buen gobierno; a propósito del Parecer cerca de la perpetuidad y buen gobierno de los indios del Perú y aviso de lo que deben hacer los encomenderos para salvarse (1563)

German Morong-Reyes, Matthias Gloël

In 1563, amid the debate over the perpetuity of the encomiendas in Peru (1560-1570), an opinion (parecer) was sent to the president of the Council of the Indies, Juan Sarmiento (ca. 1518-1564). This document, titled Parecer cerca de la perpetuidad y buen gobierno de los indios del Perú y aviso de lo que deben hacer los encomenderos para salvarse, whose authorship is unknown, constitutes a theoretically elaborated response opposed to the reports and opinions of Dominican origin, which advocated ending the encomiendas as directly responsible for the abuse, exploitation, and misery of the Indians. This article analyzes the document in light of its context of production, in order to assess the exercise of good government with respect to the need to maintain the fueros and customs of the native population. The central hypothesis holds that this text, together with a not inconsiderable body of official, secular, and religious texts in the service of good government and the establishment of Christian order (policía cristiana) in the kingdoms of Peru, forms part of a general discourse that, in the decade from 1560 to 1570, positively highlighted Inca practices of governance while at the same time ratifying the argument about the natural inferiority of the Indians.

Archaeology, Anthropology
S2 Open Access 2024
Prevalence of Chagas disease among Latin American immigrants in non-endemic countries: an updated systematic review and meta-analysis

Gisele Nepomuceno de Andrade, P. Bosch-Nicolau, B.R. Nascimento et al.

Background: Chagas disease (CD), endemic in 21 Latin American countries, has gradually spread beyond its traditional borders due to migratory movements and has emerged as a global health concern. We conducted a systematic review and meta-analysis of available data to establish updated prevalence estimates of CD in Latin American migrants residing in non-endemic countries. Methods: A systematic search was conducted in MEDLINE/PubMed, Embase, Cochrane Library, Scopus, Web of Science, and LILACS via Virtual Health Library (Biblioteca Virtual em Saúde - BVS), including references published until November 1st, 2023. Pooled prevalence estimates and 95% confidence intervals (CI) were calculated using random effect models. Heterogeneity was assessed by the chi-square test and the I² statistic. Subgroup analyses were performed to explore potential sources of heterogeneity among studies. The study was registered in the PROSPERO database (CRD42022354237). Findings: From a total of 1474 articles screened, 51 studies were included. Studies were conducted in eight non-endemic countries (most in Spain) between 2006 and 2023 and involved 82,369 screened individuals. The estimated pooled prevalence of CD in Latin American migrants living in non-endemic countries was 3.5% (95% CI: 2.5–4.7; I²: 97.7%), considering studies in which screening was indicated simply because the person was Latin American. By subgroup, the pooled CD prevalence was 11.0% (95% CI: 7.7–15.5) in non-targeted screening (unselected population in reference centers) (27 studies); 0.8% (95% CI: 0.2–3.4) in blood donors (4 studies); 2.4% (95% CI: 1.4–4.3) among Latin American immigrants living with HIV (4 studies); and 3.7% (95% CI: 2.4–5.6) for Latin American pregnant and postpartum women (14 studies). The pooled proportion of congenital transmission was 4.4% (95% CI: 3.3–5.8). Regarding the participants' country of origin, 7964 were from Bolivia, of whom 1715 (21.5%) were diagnosed with CD, and 21,304 were from other Latin American countries, of whom 154 (0.72%) were affected. Interpretation: CD poses a significant burden of disease in Latin American immigrants in non-endemic countries, suggesting that CD is no longer a problem limited to the American continent and must be considered as a global health challenge. Funding: This study was funded by the World Heart Federation, through a research collaboration with Novartis Pharma AG.

18 citations en Medicine
arXiv Open Access 2024
PILA: A Historical-Linguistic Dataset of Proto-Italic and Latin

Stephen Bothwell, Brian DuSell, David Chiang et al.

Computational historical linguistics seeks to systematically understand processes of sound change, including during periods at which little to no formal recording of language is attested. At the same time, few computational resources exist which deeply explore phonological and morphological connections between proto-languages and their descendants. This is particularly true for the family of Italic languages. To assist historical linguists in the study of Italic sound change, we introduce the Proto-Italic to Latin (PILA) dataset, which consists of roughly 3,000 pairs of forms from Proto-Italic and Latin. We provide a detailed description of how our dataset was created and organized. Then, we exhibit PILA's value in two ways. First, we present baseline results for PILA on a pair of traditional computational historical linguistics tasks. Second, we demonstrate PILA's capability for enhancing other historical-linguistic datasets through a dataset compatibility study.

en cs.CL
arXiv Open Access 2024
LiMe: a Latin Corpus of Late Medieval Criminal Sentences

Alessandra Bassani, Beatrice Del Bo, Alfio Ferrara et al.

The Latin language has received attention from the computational linguistics research community, which has built, over the years, several valuable resources, ranging from detailed annotated corpora to sophisticated tools for linguistic analysis. With the recent advent of large language models, researchers have also started developing models capable of generating vector representations of Latin texts. The performance of such models still lags behind that of models for modern languages, given the disparity in available data. In this paper, we present the LiMe dataset, a corpus of 325 documents extracted from a series of medieval manuscripts called Libri sententiarum potestatis Mediolani, and thoroughly annotated by experts, in order to be employed for masked language modeling as well as supervised natural language processing tasks.

en cs.CL
arXiv Open Access 2024
Resolving Transcription Ambiguity in Spanish: A Hybrid Acoustic-Lexical System for Punctuation Restoration

Xiliang Zhu, Chia-Tien Chang, Shayna Gardiner et al.

Punctuation restoration is a crucial step after Automatic Speech Recognition (ASR) systems to enhance transcript readability and facilitate subsequent NLP tasks. Nevertheless, conventional lexical-based approaches are inadequate for solving the punctuation restoration task in Spanish, where ambiguity is often found between unpunctuated declaratives and questions. In this study, we propose a novel hybrid acoustic-lexical punctuation restoration system for Spanish transcription, which consolidates acoustic and lexical signals through a modular process. Our experimental results show that the proposed system can effectively improve the F1 score of question marks and overall punctuation restoration on both public and internal Spanish conversational datasets. Additionally, benchmark comparison against large language models (LLMs) indicates the superiority of our approach in accuracy, reliability and latency. Furthermore, we demonstrate that the Word Error Rate (WER) of the ASR module also benefits from our proposed system.

en cs.CL, cs.AI
arXiv Open Access 2024
Stanceosaurus 2.0: Classifying Stance Towards Russian and Spanish Misinformation

Anton Lavrouk, Ian Ligon, Tarek Naous et al.

The Stanceosaurus corpus (Zheng et al., 2022) was designed to provide high-quality, annotated, 5-way stance data extracted from Twitter, suitable for analyzing cross-cultural and cross-lingual misinformation. In the Stanceosaurus 2.0 iteration, we extend this framework to encompass Russian and Spanish. The former is of current significance due to prevalent misinformation amid escalating tensions with the West and the violent incursion into Ukraine. The latter, meanwhile, represents an enormous community that has been largely overlooked on major social media platforms. By incorporating an additional 3,874 Spanish and Russian tweets over 41 misinformation claims, our objective is to support research focused on these issues. To demonstrate the value of this data, we employed zero-shot cross-lingual transfer on multilingual BERT, yielding results on par with the initial Stanceosaurus study with a macro F1 score of 43 for both languages. This underlines the viability of stance classification as an effective tool for identifying multicultural misinformation.

en cs.CL, cs.CY
arXiv Open Access 2024
Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation

Jaione Bengoetxea, Yi-Ling Chung, Marco Guerini et al.

Counter Narratives (CNs) are non-negative textual responses to Hate Speech (HS) aiming at defusing online hatred and mitigating its spreading across media. Despite the recent increase in HS content posted online, research on automatic CN generation has been relatively scarce and predominantly focused on English. In this paper, we present CONAN-EUS, a new Basque and Spanish dataset for CN generation developed by means of Machine Translation (MT) and professional post-editing. Being a parallel corpus, also with respect to the original English CONAN, it enables novel research on multilingual and crosslingual automatic generation of CNs. Our experiments on CN generation with mT5, a multilingual encoder-decoder model, show that generation greatly benefits from training on post-edited data, as opposed to relying on silver MT data only. These results are confirmed by their correlation with a qualitative manual evaluation, demonstrating that manually revised training data remains crucial for the quality of the generated CNs. Furthermore, multilingual data augmentation improves results over monolingual settings for structurally similar languages such as English and Spanish, while being detrimental for Basque, a language isolate. Similar findings occur in zero-shot crosslingual evaluations, where model transfer (fine-tuning in English and generating in a different target language) outperforms fine-tuning mT5 on machine translated data for Spanish but not for Basque. This provides an interesting insight into the asymmetry in the multilinguality of generative models, a challenging topic which is still open to research.

en cs.CL
S2 Open Access 2023
Latin American anaphylaxis registry

E. Jares, V. Cardona, R. Gómez et al.

Background: Recent data on the clinical features, triggers, and management of anaphylaxis in Latin America are lacking. Objective: To provide updated and extended data on anaphylaxis in this region. Method: An online questionnaire was used, involving 67 allergy units from 12 Latin American countries and Spain. The data recorded included demographic information, clinical features, severity, triggering agents, and treatment. Results: Eight hundred and seventeen anaphylactic reactions were recorded. No difference in severity was found regardless of pre-existing allergy or asthma history. Drug-induced anaphylaxis (DIA) was most frequent (40.6%), followed by food-induced anaphylaxis (FIA) (32.9%) and venom-induced anaphylaxis (VIA) (12%). FIA and VIA were more common in children and adolescents. Non-steroidal anti-inflammatory drugs (NSAIDs) and beta-lactam antibiotics (BLA) were the most frequent drugs involved. Milk (61.1% of FIA) and egg (15.4% of FIA) in children, and shellfish (25.5% of FIA), fresh fruits (14.2% of FIA), and fish (11.3% of FIA) in adults were the most common FIA triggers. Fire ants were the most frequent insect triggers, and they induced more severe reactions than triggers of FIA and DIA (p < 0.0001). Epinephrine was used in 43.8% of anaphylaxis episodes. After Emergency Department treatment, epinephrine was prescribed to 13% of patients. Conclusions: Drugs (NSAIDs and BLA), foods (milk and egg in children and shellfish, fruits and fish in adults) and fire ants were the most common inducers of anaphylaxis. Epinephrine was used in less than half of the episodes, emphasizing the urgent need to improve dissemination and implementation of anaphylaxis guidelines.

15 citations en Medicine
arXiv Open Access 2023
Dating of a Latin astrolabe

Emmanuel Davoust

We have determined the most probable date for the catalog of 34 stars that was used in the construction of a Latin astrolabe originally owned by the Dominican preacher friars and presently at Musée des Arts précieux Paul-Dupuy (Toulouse, France). To this end we digitized a photograph of the rete and the rule of the astrolabe, computed the equatorial coordinates of the ends of the 34 star pointers of the rete, and produced a list of 113 reference stars taken from several lists of stars on astrolabes. We then compared the coordinates of the ends of the pointers and those of the reference stars for dates between 1400 and 1700. The most probable date for this astrolabe is 1550.

en physics.hist-ph, astro-ph.IM
arXiv Open Access 2023
Translating scientific Latin texts with artificial intelligence: the works of Euler and contemporaries

Sylvio R. Bistafa

The major hindrance in the study of earlier scientific literature is the limited availability of translations from Latin into modern languages. This is particularly true for the works of Euler, who authored about 850 manuscripts, wrote a thousand letters, and received back almost two thousand more. Translations of many of these manuscripts, books and letters have been published in various sources over the last two centuries, but many more have not yet appeared. Fortunately, nowadays, artificial intelligence (AI) translation can be used to circumvent the challenges of translating such a substantial number of texts. To validate this tool, benchmark tests have been performed to compare the performance of two popular AI translating algorithms, namely Google Translate and ChatGPT. Additional tests were carried out in translating an excerpt of a 1739 letter from Johann Bernoulli to Euler, where he announces that he was sending Euler the first part of his manuscript Hydraulica. Overall, the comparative results show that ChatGPT performed better than Google Translate not only in the benchmark tests but also in the translation of this letter, highlighting the superiority of ChatGPT as a translation tool, catering not only to general Latin practitioners but also proving beneficial for specialized Latin translators.

en math.HO, cs.CL
arXiv Open Access 2023
Sequential Estimation using Hierarchically Stratified Domains with Latin Hypercube Sampling

Sebastian Krumscheid, Per Pettersson

Quantifying the effect of uncertainties in systems where only point evaluations in the stochastic domain but no regularity conditions are available is limited to sampling-based techniques. This work presents an adaptive sequential stratification estimation method that uses Latin Hypercube Sampling within each stratum. The adaptation is achieved through a sequential hierarchical refinement of the stratification, guided by previous estimators using local (i.e., stratum-dependent) variability indicators based on generalized polynomial chaos expansions and Sobol decompositions. For a given total number of samples $N$, the corresponding hierarchically constructed sequence of Stratified Sampling estimators combined with Latin Hypercube sampling is adequately averaged to provide a final estimator with reduced variance. Numerical experiments illustrate the procedure's efficiency, indicating that it can offer a variance decay proportional to $N^{-2}$ in some cases.

en stat.ME
arXiv Open Access 2023
Parallel Corpus for Indigenous Language Translation: Spanish-Mazatec and Spanish-Mixtec

Atnafu Lambebo Tonja, Christian Maldonado-Sifuentes, David Alejandro Mendoza Castillo et al.

In this paper, we present a parallel Spanish-Mazatec and Spanish-Mixtec corpus for machine translation (MT) tasks, where Mazatec and Mixtec are two indigenous Mexican languages. We evaluated the usability of the collected corpus using three different approaches: transformer, transfer learning, and fine-tuning pre-trained multilingual MT models. Fine-tuning the Facebook M2M100-48 model outperformed the other approaches, with BLEU scores of 12.09 and 22.25 for Mazatec-Spanish and Spanish-Mazatec translations, respectively, and 16.75 and 22.15 for Mixtec-Spanish and Spanish-Mixtec translations, respectively. The findings show that the dataset size (9,799 sentences in Mazatec and 13,235 sentences in Mixtec) affects translation performance and that indigenous languages work better when used as target languages. The findings emphasize the importance of creating parallel corpora for indigenous languages and fine-tuning models for low-resource translation tasks. Future research will investigate zero-shot and few-shot learning approaches to further improve translation performance in low-resource settings. The dataset and scripts are available at https://github.com/atnafuatx/Machine-Translation-Resources.

en cs.CL
DOAJ Open Access 2022
La lucha contra la lepra y el paludismo en Michoacán durante el gobierno de Lázaro Cárdenas del Rio, 1934-1940

Mayra Berenice Espinoza Rodríguez

This paper examines the launch and development of health and prevention campaigns against infectious diseases such as leprosy and malaria in various regions of the state of Michoacán during the government of Lázaro Cárdenas del Río, 1934-1940. The records of the medical campaigns reveal the hygiene and health conditions of the population, as well as various problems that the State sought to eradicate and control in different sectors of the population in order to improve the quality of life of its inhabitants. Drawing on archival sources, the paper analyzes the efforts of the post-revolutionary authorities to teach and promote a culture of prevention and hygiene, mainly in rural areas.

History of scholarship and learning. The humanities, History (General) and history of Europe

Page 24 of 18311