Andrea Gonzáles-Lombardi, César Astuhuamán-Gonzáles
During the third campaign for the extirpation of idolatries (1650) in a rural doctrina (parish) of the Archbishopric of Lima, in the central highlands of Peru, two criminal case files were drawn up against Juana Ycha and Ines Carua Chumbi, who were accused of witchcraft. To contextualize these documents in space and time, they are analyzed and compared with archaeological survey data and aerial imagery. On this basis, a spatial location and a characterization are proposed for the sacred landscapes where the officiants performed rituals to Apo Parato and Apo Quircay.
Marisa Hudspeth, Patrick J. Burns, Brendan O'Connor
Tokenization is a critical component of language model pretraining, yet standard tokenization methods often prioritize information-theoretic goals like high compression and low fertility rather than linguistic goals like morphological alignment. In fact, they have been shown to be suboptimal for morphologically rich languages, where tokenization quality directly impacts downstream performance. In this work, we investigate morphologically aware tokenization for Latin, a morphologically rich language that is medium-resource in terms of pretraining data but high-resource in terms of curated lexical resources, a distinction that is often overlooked but critical in discussions of low-resource language modeling. We find that morphologically guided tokenization improves overall performance on four downstream tasks. Performance gains are most pronounced for out-of-domain texts, highlighting our models' improved generalization ability. Our findings demonstrate the utility of linguistic resources for improving language modeling for morphologically complex languages. For low-resource languages that lack large-scale pretraining data, the development and incorporation of linguistic resources can serve as a feasible alternative for improving LM performance.
Guillem García Subies, Álvaro Barbero Jiménez, Paloma Martínez Fernández
We present a novel contribution to Spanish clinical natural language processing by introducing the largest publicly available clinical corpus, ClinText-SP, along with a state-of-the-art clinical encoder language model, RigoBERTa Clinical. Our corpus was meticulously curated from diverse open sources, including clinical cases from medical journals and annotated corpora from shared tasks, providing a rich and diverse dataset that was previously difficult to access. RigoBERTa Clinical, developed through domain-adaptive pretraining on this comprehensive dataset, significantly outperforms existing models on multiple clinical NLP benchmarks. By publicly releasing both the dataset and the model, we aim to empower the research community with robust resources that can drive further advancements in clinical NLP and ultimately contribute to improved healthcare applications.
This paper analyzes the modes of production of university audiovisual works in the Argentine Northeast (NEA) between the 1960s and the 1990s. The study recovers pioneering experiences carried out by the Universidad Nacional del Nordeste, the Universidad Nacional de Misiones, and the Sistema Provincial de Teleducación y Desarrollo (SIPTED). The article studies the modes of production by characterizing the agents involved and the networks of relations among social actors, institutions, and funding schemes. It also describes working methods and conditions and their links to the conception of specific aesthetic-narrative formats. The analytical approach seeks to move away from the centralist perspective that has characterized film historiography and contributes to building a regionalist, localized view. The corpus of analysis comprises audiovisual works from the period under study, press articles, and interviews with filmmakers from the region.
Matheus Puime Pedra, Josune Hernantes, Leire Casals
et al.
Disasters associated with climate change have become a significant concern, particularly when they affect urban areas. Assessing the resilience of these regions in order to strengthen their disaster management is crucial, especially in areas vulnerable to windstorms, one of Spain's most critical disaster types. Smart cities and machine learning (ML) offer promising solutions for managing disasters, but accurately estimating economic losses from windstorms can be difficult because of the unique characteristics of each region and limited data. This study proposes using ML classification models to enhance disaster resilience by analyzing publicly available data on windstorms in Spanish regions. This approach can help decision-makers make informed decisions about preparedness and mitigation actions, ultimately creating a more resilient urban environment that can better withstand future windstorms.
Sentiment analysis, the automated process of determining emotions or opinions expressed in text, has been explored extensively in natural language processing. However, one aspect that remains underrepresented is sentiment analysis of the Moroccan dialect, which has a unique linguistic landscape in which multiple scripts coexist. Previous work in sentiment analysis primarily targeted dialects written in Arabic script. While these efforts provided valuable insights, they may not fully capture the complexity of Moroccan web content, which blends Arabic and Latin script. Our study therefore emphasizes the importance of extending sentiment analysis to cover the entire spectrum of Moroccan linguistic diversity. Central to our research is the creation of the largest public dataset for Moroccan dialect sentiment analysis, which includes Moroccan dialect written not only in Arabic script but also in Latin letters. By assembling a diverse range of textual data, we constructed a dataset of around 20,000 manually labeled texts in Moroccan dialect, together with publicly available lists of Moroccan dialect stop words. We then conducted a comparative study of multiple machine learning models to assess their suitability for our dataset. Experiments were performed on both raw and preprocessed data to show the importance of the preprocessing step. Our model achieved 92% accuracy, and to further demonstrate its reliability we tested it on smaller publicly available Moroccan dialect datasets, with favorable results.
The article aims to study the urban experience of the working class by analysing the narrative that workers produced and published in the workers' press. Workers' literature was not only concerned with politics and ideology but also with the projections, values and visions of the urban proletariat's experience. Through the analysis of the proletarian poetic and narrative production, the aim is to delve into the everyday life, meanings, imagination, aesthetic preferences, and the popular way of life, which are fundamental dimensions of working-class culture in the first quarter of the 20th century.
Radio was one of the most impressive technologies to develop in the twentieth century, going from the noise box of 1918 to a solid system for advertising and for disseminating ideas and concepts of every kind. By the start of World War II, improvements had produced a worldwide system based on the reach of shortwave, which crossed continents. In the most powerful nations, private or public conglomerates emerged that were capable of broad worldwide broadcasting, such as the networks of the American CBS, the British BBC, or the German DW. Because Brazil was led into the war as a member of the United Nations, Brazilian radio had very close and special relations with the Americans and the British during the conflict.
This work presents a new resource for borrowing identification and analyzes the performance and errors of several models on this task. We introduce a new annotated corpus of Spanish newswire rich in unassimilated lexical borrowings -- words from one language that are introduced into another without orthographic adaptation -- and use it to evaluate how several sequence labeling models (CRF, BiLSTM-CRF, and Transformer-based models) perform. The corpus contains 370,000 tokens and is larger, more borrowing-dense, OOV-rich, and topic-varied than previous corpora available for this task. Our results show that a BiLSTM-CRF model fed with subword embeddings along with either Transformer-based embeddings pretrained on codeswitched data or a combination of contextualized word embeddings outperforms results obtained by a multilingual BERT-based model.
The weighted projection of an alternating sign matrix (ASM) was introduced by Brualdi and Dahl (2018) as a step towards characterising a generalisation of Latin squares they introduced using alternating sign hypermatrices. If $z_n = (n,\dots,2,1)$, then the weighted projection of an ASM $A$ is equal to $z_n^TA$. Brualdi and Dahl proved that the weighted projection of an $n \times n$ ASM is majorized by the vector $z_n$, and conjectured that any positive integer vector majorized by $z_n$ is the weighted projection of some ASM. The main result of this paper presents a proof of this conjecture, via monotone triangles. A relaxation of a monotone triangle, called a row-increasing triangle, is introduced. It is shown that for any row-increasing triangle $T$, there exists a monotone triangle $M$ such that each entry of $M$ occurs the same number of times as in $T$. A construction is also outlined for an ASM with given weighted projection. The relationship of the main result to existing results concerning the ASM polytope $ASM_n$ is examined, and a characterisation is given for the relationship between elements of $ASM_n$ corresponding to the same point in the permutohedron of order $n$. Finally, the limitations of the main result for characterising alternating sign hypermatrix Latin-like squares are considered.
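As a small illustration of the definitions in this abstract, the following Python sketch computes the weighted projection $z_n^TA$ of a matrix and checks majorization by $z_n$. The function names and the 3 x 3 example matrix are chosen here for illustration and do not come from the paper; the code only evaluates the definitions and does not implement the paper's construction via monotone triangles.

```python
from itertools import accumulate


def weighted_projection(asm):
    """Return z_n^T A for an n x n matrix A, where z_n = (n, ..., 2, 1)."""
    n = len(asm)
    z = list(range(n, 0, -1))  # z_n = (n, n-1, ..., 1)
    return [sum(z[i] * asm[i][j] for i in range(n)) for j in range(n)]


def is_majorized_by(x, y):
    """Check that x is majorized by y: the totals are equal, and every partial
    sum of x sorted in decreasing order is at most the matching one of y."""
    if len(x) != len(y) or sum(x) != sum(y):
        return False
    px = accumulate(sorted(x, reverse=True))
    py = accumulate(sorted(y, reverse=True))
    return all(a <= b for a, b in zip(px, py))


# Illustrative 3 x 3 alternating sign matrix (an assumption for this example).
EXAMPLE_ASM = [
    [0, 1, 0],
    [1, -1, 1],
    [0, 1, 0],
]

projection = weighted_projection(EXAMPLE_ASM)
print(projection)                              # [2, 2, 2]
print(is_majorized_by(projection, [3, 2, 1]))  # True
```

For the identity matrix the same function returns $z_n$ itself, which is the extreme case of the majorization bound.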
Planners use various terms (Hispanic, Latino, and Latinx) to refer to people of Latin American descent and from Spanish-speaking countries. Understanding the differences among these terms is relevant to planning given that Hispanics/Latin/o/a/x are a fast-growing group in the United States and terms have evolved over time and differ between users. I ask three questions: Why do planners need to better understand the ethnonyms Hispanic/Latin/o/a/x? How can planners understand the evolution of Hispanic, Latino, and Latinx terminology? Which strategies can help planners to decide which terms to use? Recommendations for practicing planners include hiring more diverse planners; developing a context-specific manual of style; organizing conversations with local organizations, their state APA chapter, or planning school about ethno-racial identity; mapping identities; and creating equity plans at the department or city level. By becoming more culturally competent, planners can plan for and with the Hispanic/Latin/o/a/x community more effectively.
Julio Vezub, Marcelo Godoy, Margarita Alvarado
et al.
The article reports on the exploration carried out in 2015 by an interdisciplinary, binational team between Valdivia and the Llifén mountain pass, as part of a research project on intercultural and territorial relations in the settlements and routes of the “Frontera de Arriba” of colonial Valdivia. The team followed the itinerary of Guillermo Cox, who in 1863 sought a river route connecting the Pacific with the Atlantic, a purpose thwarted by the Pehuenche of the Limay (Neuquén), who did not allow him to continue his journey to Carmen de Patagones (Argentina). Methodologically, an archaeological, historical, ethnographic, and cartographic counterpoint was carried out, which revealed continuities and variations in the landscape by comparing Cox's testimony with the mission archives from the turn of the eighteenth to the nineteenth century and with recent regional history. The nineteenth-century road and chronicle were the main sources for understanding the cultural, economic, and political relations between Hispano-Creoles and Mapuche on both slopes of the Andes before national boundaries were fixed. The hypothesis put forward is that the process of socio-territorial colonization was well advanced by 1863 and that the population of the “Frontera de Arriba” was very significant, which explains the Chilean state's efforts to end its autonomy.
Antonio José Alves de Oliveira, Giovana Zamboni Rossi
During the second half of the eighteenth century, the idea of a “natural frontier” occupied an important place in social thought, helping to shape important considerations in the systematic advance over the colonial world and in the formation and legitimation of colonial territories. This effort produced estrangement between subjects and landscapes that were always new or unsubmissive. In nineteenth-century Europe, Romantic thought takes the concept of frontier as the edges and limits between territories and builds mythologies of radical identity among environments, landscapes, and subjects as modern states are transformed into nation-states, imagined communities anchored in a tradition that is meant to be ancient. In this paper we sketch some research notes comparing the historical processes of the formation of Brazil's colonial space and the construction of the German national territory. Considering the concepts of frontier, territory, and landscape, we try to discern what role the “frontier” plays in these processes, how narrators enlist the natural world in their narratives, and finally how the narratives built around these concepts affect the relations between subjects and territories. We also consider the effort to close frontiers in the colonial world and at the national level as faces of the same historical process in modernity, one that reverberates in the construction of national sentiment on the one hand and, on the other, in the ambiguous bond of estrangement and otherness in the relationship with the landscape and the natural world in the colonial world.
Environmental sciences, Latin America. Spanish America
Noah Rubin, Curtis Bright, Kevin K. H. Cheung
et al.
In this paper we provide results on using integer programming (IP) and constraint programming (CP) to search for sets of mutually orthogonal Latin squares (MOLS). Both programming paradigms have previously been used successfully to search for MOLS, but IP and CP solvers have improved significantly in recent years, and data on how modern solvers perform on the MOLS problem are lacking. Using state-of-the-art solvers as black boxes, we were able to quickly find pairs of MOLS (or prove their nonexistence) in all orders up to ten. Moreover, we improve the effectiveness of the solvers by formulating an extended symmetry breaking method as well as an improvement to the straightforward CP encoding. We also analyze the effectiveness of using CP and IP solvers to search for triples of MOLS, compare our timings to those which have been previously published, and estimate the running time of using this approach to resolve the longstanding open problem of determining the existence of a triple of MOLS of order ten.
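To make the object of the search concrete, the sketch below checks by brute force whether two squares are Latin and mutually orthogonal, that is, whether superimposing them yields every ordered pair of symbols exactly once. It is a plain verifier written for illustration, not the IP or CP encoding studied in the paper; the function names and the cyclic order-3 example are assumptions made here.

```python
def is_latin_square(square):
    """True if every row and every column contains each of 0..n-1 exactly once."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def are_orthogonal(a, b):
    """True if superimposing a and b gives n*n distinct ordered pairs."""
    n = len(a)
    pairs = {(a[i][j], b[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n


def cyclic_square(n, k):
    """Square with entry (i + k*j) mod n; Latin whenever gcd(k, n) == 1."""
    return [[(i + k * j) % n for j in range(n)] for i in range(n)]


# Illustrative pair of order 3 (an assumption for this example).
A = cyclic_square(3, 1)
B = cyclic_square(3, 2)

print(is_latin_square(A) and is_latin_square(B))  # True
print(are_orthogonal(A, B))                       # True
print(are_orthogonal(A, A))                       # False
```

A checker of this kind is useful for validating candidate squares returned by a solver, independently of how the search itself was encoded.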
Emmanuel Maqueda, Javier Alvarez-Jimenez, Carlos Mena
et al.
With the advent of digital technology, it is increasingly common for crimes or legal disputes to involve some form of speech recording in which the identity of a speaker is questioned [1]. In the face of this situation, the field of forensic speaker identification has sought to shed light on the problem by quantifying how strongly a speech recording is associated with a particular person relative to a population. In this work, we explore the use of speech embeddings obtained by training a CNN with the triplet loss. In particular, we focus on the Spanish language, which has not been extensively studied. We propose extracting the embeddings from speech spectrogram samples, then explore several configurations of such spectrograms, and finally quantify the quality of the embeddings. We also show some limitations of our data setting, which is composed predominantly of male speakers. Finally, we propose two approaches to calculating the likelihood ratio given our speech embeddings, and we show that the triplet loss is a good alternative for creating speech embeddings for forensic speaker identification.
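For readers unfamiliar with the triplet loss mentioned above, a minimal sketch of its standard form for a single triplet follows. The abstract does not state the distance function or the margin, so the Euclidean distance, the margin of 0.2, and the function name are assumptions made for illustration; in practice the loss is averaged over batches of embeddings produced by the network.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss for one triplet of embedding vectors.

    The loss is zero once the anchor is closer to the positive (same speaker)
    than to the negative (different speaker) by at least `margin`.
    Distance and margin are assumptions, not values taken from the paper.
    """
    d_pos = math.dist(anchor, positive)  # anchor to same-speaker sample
    d_neg = math.dist(anchor, negative)  # anchor to different-speaker sample
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings (hypothetical values).
anchor = [0.0, 0.0]
same_speaker = [0.0, 1.0]
other_speaker = [3.0, 4.0]

# Well-separated triplet: the constraint is already satisfied.
print(triplet_loss(anchor, same_speaker, other_speaker))  # 0.0

# Roles swapped: the "positive" is far away, so the loss is positive.
print(triplet_loss(anchor, other_speaker, same_speaker) > 0)  # True
```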
This article discusses the feasibility and requirements for using Ginga as the middleware of a Digital Radio System. Ginga was adopted by Brazil and several other countries in Latin America as the standard for interactivity in Digital TV (ISDB-T International). In this article the two digital radio standards being considered for adoption by Brazil are briefly presented, a discussion about the requirements for broadcast radio interactivity is presented, and finally a prototype of a Ginga interactive application for digital radio that was transmitted and received over the air using a digital radio standard is detailed.
Alberto Barbado, María Dolores González, Débora Carrera
Text classification has improved substantially in recent years through the use of transformers. However, most research focuses on prose texts, with poetry receiving less attention, especially for the Spanish language. In this paper, we propose a semi-supervised learning approach for inferring 21 psychological categories evoked by a corpus of 4572 sonnets, along with 10 affective and lexico-semantic multiclass ones. The subset of poems used for training and evaluation includes 270 sonnets. With our approach, we achieve an AUC above 0.7 for 76% of the psychological categories, and an AUC above 0.65 for 60% of the multiclass ones. The sonnets are modelled using transformers, through sentence embeddings, along with lexico-semantic and affective features obtained from external lexicons. We find that this approach provides an AUC increase of up to 0.12 compared with using transformers alone.