Rafael Rojas, Josefina Zoraida Vázquez, Clara E. Lida et al.
-
Showing 20 of ~1742010 results · from CrossRef, DOAJ, arXiv, Semantic Scholar
Kipus
Historical archive of Kipus: Revista Andina de Letras y Estudios Culturales, 1993.
Davor Lauc, Attapol Rutherford, Weerin Wongwarawipatr
This study introduces AyutthayaAlpha, an advanced transformer-based machine learning model designed for the transliteration of Thai proper names into Latin script. Our system achieves state-of-the-art performance with 82.32% first-token accuracy and 95.24% first-three-token accuracy, while maintaining a low character error rate of 0.0047. The complexity of Thai phonology, including tonal features and vowel length distinctions, presents significant challenges for accurate transliteration, which we address through a novel two-model approach: AyutthayaAlpha-Small, based on the ByT5 architecture, and AyutthayaAlpha-VerySmall, a computationally efficient variant that unexpectedly outperforms its larger counterpart. Our research combines linguistic rules with deep learning, training on a carefully curated dataset of 1.2 million Thai-Latin name pairs, augmented through strategic upsampling to 2.7 million examples. Extensive evaluations against existing transliteration methods and human expert benchmarks demonstrate that AyutthayaAlpha not only achieves superior accuracy but also effectively captures personal and cultural preferences in name romanization. The system's practical applications extend to cross-lingual information retrieval, international data standardization, and identity verification systems, with particular relevance for government databases, academic institutions, and global business operations. This work represents a significant advance in bridging linguistic gaps between Thai and Latin scripts, while respecting the cultural and personal dimensions of name transliteration.
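The character error rate reported above is commonly computed as total edit distance divided by total reference length. The following is a minimal sketch under that assumption; the abstract does not state the exact formula the authors used, and the function names and sample name pairs are hypothetical illustrations.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into the empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: summed edit distance over summed reference length."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


# Hypothetical romanizations, for illustration only.
refs = ["Somchai", "Siriporn"]
hyps = ["Somchay", "Siriporn"]
cer = character_error_rate(refs, hyps)  # 1 edit over 15 reference characters
```

Averaging at the corpus level rather than per name keeps short names from dominating the score; a per-name average is an equally plausible convention.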
Javier Conde, Miguel González, Nina Melero et al.
The growing interest in Large Language Models (LLMs) and in particular in conversational models with which users can interact has led to the development of a large number of open-source chat LLMs. These models are evaluated on a wide range of benchmarks to assess their capabilities in answering questions or solving problems on almost any possible topic or to test their ability to reason or interpret texts. Instead, the evaluation of the knowledge that these models have of the languages has received much less attention. For example, the words that they can recognize and use in different languages. In this paper, we evaluate the knowledge that open-source chat LLMs have of Spanish words by testing a sample of words in a reference dictionary. The results show that open-source chat LLMs produce incorrect meanings for an important fraction of the words and are not able to use most of the words correctly to write sentences with context. These results show how Spanish is left behind in the open-source LLM race and highlight the need to push for linguistic fairness in conversational LLMs ensuring that they provide similar performance across languages.
Stephen Bothwell, Abigail Swenor, David Chiang
This paper describes submissions from the team Nostra Domina to the EvaLatin 2024 shared task of emotion polarity detection. Given the low-resource environment of Latin and the complexity of sentiment in rhetorical genres like poetry, we augmented the available data through automatic polarity annotation. We present two methods for doing so on the basis of the $k$-means algorithm, and we employ a variety of Latin large language models (LLMs) in a neural architecture to better capture the underlying contextual sentiment representations. Our best approach achieved the second highest macro-averaged Macro-$F_1$ score on the shared task's test set.
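For readers unfamiliar with the metric named above, macro-averaged F1 is the unweighted mean of per-class F1 scores. The sketch below illustrates that standard definition only; it is not the shared task's official scorer, and the label names are hypothetical.

```python
def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical polarity labels, for illustration only.
gold = ["positive", "negative", "positive", "negative"]
pred = ["positive", "positive", "positive", "negative"]
score = macro_f1(gold, pred)
```

Because every class contributes equally, a rare polarity class affects the score as much as a frequent one, which is why the metric suits imbalanced label sets.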
Andrés Lou, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez et al.
The Mayan languages comprise a language family with an ancient history, millions of speakers, and immense cultural value, that, nevertheless, remains severely underrepresented in terms of resources and global exposure. In this paper we develop, curate, and publicly release a set of corpora in several Mayan languages spoken in Guatemala and Southern Mexico, which we call MayanV. The datasets are parallel with Spanish, the dominant language of the region, and are taken from official native sources focused on representing informal, day-to-day, and non-domain-specific language. As such, and according to our dialectometric analysis, they differ in register from most other available resources. Additionally, we present neural machine translation models, trained on as many resources and Mayan languages as possible, and evaluated exclusively on our datasets. We observe lexical divergences between the dialects of Spanish in our resources and the more widespread written standard of Spanish, and that resources other than the ones we present do not seem to improve translation performance, indicating that many such resources may not accurately capture common, real-life language usage. The MayanV dataset is available at https://github.com/transducens/mayanv.
Marina Mayor-Rocher, Nina Melero, Elena Merino-Gómez et al.
Large Language Models (LLMs) have been extensively evaluated on their ability to answer questions on many topics and their performance on different natural language understanding tasks. Those tests are usually conducted in English, but most LLM users are not native English speakers. Therefore, it is of interest to analyze how LLMs understand other languages at different levels: from paragraphs to morphemes. In this paper, we evaluate the performance of state-of-the-art LLMs in TELEIA, a recently released benchmark with similar questions to those of Spanish exams for foreign students, covering topics such as reading comprehension, word formation, meaning and compositional semantics, and grammar. The results show that LLMs perform well at understanding Spanish but are still far from achieving the level of a native speaker in terms of grammatical competence.
Luisa Tombini Wittmann
This article analyzes part of the output of the Coletivo Mbyá-Guarani de Cinema through the notions of time and history expressed in its films (and footage), which, on the one hand, denounce a violent past that persists in the present and, on the other, strengthen the group's struggles for territory and for Good Living (nhanderekó). The documentary analysis focuses on the Collective's first films, Mokoi Tekoá, Petei Jeguatá: Duas aldeias, uma caminhada (2008) and Bicicletas de Nhanderú (2011), made when the group was building a collective filmmaking practice driven by the challenges of coloniality. To better understand this language, we draw on the concept of decolonial aesthesis, showing that this Guarani narrative not only combats Eurocentric discourse but also creates a singular way of filming and of telling its history, anchored in the relations between orality, territory, and ancestry.
Emily C. Snyder
This essay reviews the following works:
Viacheslav Shved
The article examines the foreign policy course of President Joe Biden's administration toward building an anti-Iranian coalition in the Middle East. Owing to a range of internal and external factors, this effort has become crucial. Its significance lies in analyzing one of the most important aspects of recent American foreign policy in the Middle East. Its realization would help not only to oppose Iranian expansionism in the region but also to counter a possible alliance of two dictatorial regimes, Putin's and the Ayatollahs', in the Russian-Ukrainian war, an alliance that could in the future seriously endanger the entire regional security system of the Middle East. The aim of the article is a systematic investigation of the US foreign strategy to create an anti-Iranian alliance in the Middle East, noting new features that emerged in late 2022 and early 2023. The article shows the importance of the anti-Iranian alliance as one of the pivotal fronts of the war between the global democratic camp led by the US and the global camp of dictatorships, a war whose outcome may decide the future of humanity. The methods combine general scientific and specialized methods, such as the systemic method, comparative analysis, and critical analysis of sources. The scientific novelty of the research lies in the fact that the author is the first in Ukrainian historiography to investigate the ongoing reshaping of the political and security situation in the Middle East and to offer a comprehensive analysis of one of the top-priority foreign policy issues pursued by President Joe Biden's administration in the Middle East, introducing new American, Israeli, and Arab sources into the research agenda. New causes and facets of the Middle Eastern geopolitical situation are highlighted, which indicate the importance of the research problem. The author concludes that processes in the Middle East and Ukraine are interrelated and, in particular, that the policy of J. Biden's administration in this direction enables more effective resistance to Russian aggression in Ukraine.
Luis Chiruzzo, Marvin Agüero-Torales, Gustavo Giménez-Lugo et al.
We present the first shared task for detecting and analyzing code-switching in Guarani and Spanish, GUA-SPA at IberLEF 2023. The challenge consisted of three tasks: identifying the language of a token, NER, and a novel task of classifying the way a Spanish span is used in the code-switched context. We annotated a corpus of 1500 texts extracted from news articles and tweets, around 25 thousand tokens, with the information for the tasks. Three teams took part in the evaluation phase, obtaining in general good results for Task 1, and more mixed results for Tasks 2 and 3.
Mike Sharples
John Clark was inventor of the Eureka machine to generate hexameter Latin verse. He labored for 13 years from 1832 to implement the device that could compose at random over 26 million different lines of well-formed verse. This paper proposes that Clark should be regarded as an early cognitive scientist. Clark described his machine as an illustration of a theory of "kaleidoscopic evolution" whereby the Latin verse is "conceived in the mind of the machine" then mechanically produced and displayed. We describe the background to automated generation of verse, the design and mechanics of Eureka, its reception in London in 1845 and its place in the history of language generation by machine. The article interprets Clark's theory of kaleidoscopic evolution in terms of modern cognitive science. It suggests that Clark has not been given the recognition he deserves as a pioneer of computational creativity.
Jaime Ortega-Reyna
No abstract available.
Gregor Mendel
Mendel performed his experiments from 1856 to 1863, presented his results in two meetings of the Natural Science Society in Brünn in February and March of 1865, and finally published them in the iconic paper of 1866, Versuche über Pflanzenhybriden. Two main translations to Spanish are available: the one done by Prevosti in 1977, and a more recent one by Gomez-Graciani in 2014. Both were translated from the English version of Druery and Bateson. Here, we present the first direct translation from German to Spanish of this paper. This new version corrects some errors detected in the previous versions, uses a more agile style and includes the Darwinized terms pointed out by Abbott and Fairbanks in a recent paper (Genetics, 204(2):401-405, 2016; doi: 10.1534/genetics.116.194613).
Lucía Céspedes
This paper presents a descriptive analysis of SCOPUS’ and Web of Science’s journal lists, in order to illustrate and critically assess the current presence of Latin American journals included in these mainstream databases and their working languages for publication. The latest lists of journals released by both databases as of March 2020 were analyzed in terms of journal language and country of publication. Results show Brazil clearly emerges as the regional leader, especially in WoS’ Science Citation Index Expanded and Emerging Sources Citation Index. However, this predominance of Brazilian journals does not entail a corresponding relevance of the Portuguese language. Spanish is the predominant language in mainstream Latin American journals, especially in the Social Sciences and Humanities, while journals identified as multilingual tend to associate either Spanish or Portuguese with English. The combination of Spanish and Portuguese is significantly smaller. This calls for a critical revision of the state of the Latin American scientific-editorial field as a linguistic market, as well as for further questioning the role of English as the lingua franca of academia.
Jérémie Vidal, David Cébron
The acoustic modes of a rotating fluid-filled cavity can be used to determine the effective rotation rate of a fluid (since the resonant frequencies are modified by the flows). To be accurate, this method requires a prior knowledge of the acoustic modes in rotating fluids. Contrary to the Coriolis force, centrifugal gravity has received much less attention in the experimental context. Motivated by on-going experiments in rotating ellipsoids, we study how global rotation and buoyancy modify the acoustic modes of fluid-filled ellipsoids in isothermal (or isentropic) hydrostatic equilibrium. We go beyond the standard acoustic equation, which neglects solid-body rotation and gravity, by deriving an exact wave equation for the acoustic velocity. We then solve the wave problem using a polynomial spectral method in ellipsoids, which is compared with finite-element solutions of the primitive fluid-dynamic equations. We show that the centrifugal acceleration has measurable effects on the acoustic frequencies when $M_\Omega \gtrsim 0.3$, where $M_\Omega$ is the rotational Mach number defined as the ratio of the sonic and rotational time scales. Such a regime can be reached with experiments rotating at a few tens of Hz, by replacing air with a highly compressible gas (e.g. SF$_6$ or C$_4$F$_8$).
Ian Stewart, Diyi Yang, Jacob Eisenstein
Speakers of non-English languages often adopt loanwords from English to express new or unusual concepts. While these loanwords may be borrowed unchanged, speakers may also integrate the words to fit the constraints of their native language, e.g. creating Spanish "tuitear" from English "tweet." Linguists have often considered the process of loanword integration to be more dependent on language-internal constraints, but sociolinguistic constraints such as speaker background remain only qualitatively understood. We investigate the role of social context and speaker background in Spanish speakers' use of integrated loanwords on social media. We find first that newspaper authors use the integrated forms of loanwords and native words more often than social media authors, showing that integration is associated with formal domains. In social media, we find that speaker background and expectations of formality explain loanword and native word integration, such that authors who use more Spanish and who write to a wider audience tend to use integrated verb forms more often. This study shows that loanword integration reflects not only language-internal constraints but also social expectations that vary by conversation and speaker.
J. Cortés-Sánchez
Research on business, management and accounting (BMA) in the past century has been overwhelming. Regardless of its significance, regions such as Ibero-America have been overlooked in exhaustive studies on bibliometrics in the subject of BMA. Here, a bibliometric outlook of the subject of BMA in Ibero-America using 19 variables was conducted by analyzing the ten most cited documents in BMA in each country from 1996 to 2017 using the citation database Scopus. The main findings showed a rapid increase in intellectual production led by Spain and Portugal, which also account for most of the citations. The majority of the most cited studies are behind paywalls. Institutional status (i.e., private or public) has a significant effect on AACSB accreditation. A negative concern that arises for the whole region, mainly Latin-America, is the discriminated use of a journal with predatory features.
Bernat Esquirol, Luce Prignano, Albert Díaz-Guilera et al.
People use Online Social Media to make sense of crisis events. A pandemic crisis like the Covid-19 outbreak is a complex event, involving numerous aspects of social life on multiple temporal scales. Focusing on the Spanish Twittersphere, we characterized users' activity behaviour across the different phases of the Covid-19 first wave. Firstly, we analyzed a sample of timelines of different classes of users from the Spanish Twittersphere in terms of their propensity to produce new information or to amplify information produced by others. Secondly, by performing stepwise segmented regression analysis and Bayesian switchpoint analysis, we looked for a possible behavioral footprint of the crisis in the statistics of users' activity. We observed that generic Spanish Twitter users and journalists experienced an abrupt increase in their tweeting activity between March 9 and March 14, coinciding with control measures being announced by regional and State level authorities. However, they displayed a stable proportion of retweets before and after the switching point. Politicians, by contrast, were an exception, being the only class of users that did not experience this abrupt change and instead followed completely endogenous dynamics determined by the institutional agenda. On the one hand, they did not increase their overall activity, displaying instead a slight decrease. On the other hand, in times of crisis, politicians tended to strengthen their propensity to amplify information rather than produce it.
Alberto Barbado, Víctor Fresno, Ángeles Manjarrés Riesco et al.
Nowadays, there are many applications of text mining over corpora from different languages. However, most of them are based on texts in prose, lacking applications that work with poetry texts. An example of an application of text mining in poetry is the usage of features derived from their individual words in order to capture the lexical, sublexical and interlexical meaning, and infer the General Affective Meaning (GAM) of the text. However, even though this proposal has been proved as useful for poetry in some languages, there is a lack of studies for both Spanish poetry and for highly-structured poetic compositions such as sonnets. This article presents a study over an annotated corpus of Spanish sonnets, in order to analyse if it is possible to build features from their individual words for predicting their GAM. The purpose of this is to model sonnets at an affective level. The article also analyses the relationship between the GAM of the sonnets and the content itself. For this, we consider the content from a psychological perspective, identifying with tags when a sonnet is related to a specific term. Then, we study how GAM changes according to each of those psychological terms. The corpus used contains 274 Spanish sonnets from authors of different centuries, from 15th to 19th. This corpus was annotated by different domain experts. The experts annotated the poems with affective and lexico-semantic features, as well as with domain concepts that belong to psychology. Thanks to this, the corpus of sonnets can be used in different applications, such as poetry recommender systems, personality text mining studies of the authors, or the usage of poetry for therapeutic purposes.