Determination of the longitude difference between Baghdad and Khwarezm using a lunar eclipse (the method of Abu Rayhan al-Biruni and Abu al-Wafa al-Buzjani)
Rizoi Bakhromzod
This paper examines how, in the tenth century, medieval Iranian scholars Abu Rayhan al-Biruni and Abu al-Wafa al-Buzjani determined the difference in geographical longitude between the cities of Baghdad and Khwarezm through simultaneous observation of a lunar eclipse. Brief academic biographies of these scholars are presented, with emphasis on their contributions to mathematics and astronomy. The study discusses the importance of determining geographical coordinates - especially longitude - in the science of the 10th-11th centuries, provides an overview of the methods of coordinate determination available at the time, and highlights the problem of synchronizing remote observations prior to the advent of electronic communication. Particular attention is devoted to a detailed analysis of the method of measuring longitude differences by simultaneously observing a lunar eclipse: the necessary conditions and organization of the experiment, the instruments employed, the mathematical calculations, and error estimates are described. The longitude difference obtained by al-Biruni and al-Buzjani is compared with modern values. The conclusion discusses the significance of this method for the history of science and astronomy.
en
physics.hist-ph, astro-ph.IM
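The principle behind the eclipse method summarized in the abstract above can be illustrated numerically. A lunar eclipse is seen at the same instant from everywhere it is visible, so the difference between the local times recorded at two sites converts to longitude at 15 degrees per hour (360 degrees in 24 hours). The following is a minimal Python sketch; the function name and the sample times are hypothetical and are not taken from the paper.

```python
def longitude_difference_deg(local_time_a_hours, local_time_b_hours):
    """Convert the difference between two local times of the same event
    into degrees of longitude (Earth turns 15 degrees per hour)."""
    return (local_time_b_hours - local_time_a_hours) * 15.0


# Hypothetical example: the same eclipse phase is recorded one hour later
# in local time at the eastern site than at the western site.
print(longitude_difference_deg(21.0, 22.0))  # 15.0
```

A positive result means the second site lies east of the first.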
Poisson Inventory Models with Many Items: An Empirical Bayes Approach
Edward Anderson, Nam Ho-Nguyen, Peter Radchenko
We consider inventory decisions with many items, each of which has Poisson demand. The rate of demand for individual items is estimated on the basis of observations of past demand. The problem is to determine the items to hold in stock and the amount of each one. Our setting provides a natural framework for the application of the empirical Bayes methodology. We show how to do this in practice and demonstrate the importance of making posterior estimates of different demand levels, rather than just estimating the Poisson rate. We also address the question of when it is beneficial to separately analyse a group of items which are distinguished in some way. An example occurs when looking at inventory for a book retailer, who may find it advantageous to look separately at certain types of book (e.g. biographies). The empirical Bayes methodology is valuable when dealing with items having Poisson demand, and can be effective even with relatively small numbers of distinct items (e.g. 100). We discuss the best way to apply an empirical Bayes methodology in this context, and also show that doing this in the wrong way will reduce or eliminate the potential benefits.
Protecting De-identified Documents from Search-based Linkage Attacks
Pierre Lison, Mark Anderson
While de-identification models can help conceal the identity of the individuals mentioned in a document, they fail to address linkage risks, defined as the potential to map the de-identified text back to its source. One straightforward way to perform such linkages is to extract phrases from the de-identified document and check their presence in the original dataset. This paper presents a method to counter search-based linkage attacks while preserving the semantic integrity of the text. The method proceeds in two steps. We first construct an inverted index of the N-grams occurring in the text collection, making it possible to efficiently determine which N-grams appear in fewer than $k$ documents, either alone or in combination with other N-grams. An LLM-based rewriter is then iteratively queried to reformulate those spans until linkage is no longer possible. Experimental results on two datasets (court cases and Wikipedia biographies) show that the rewriting method can effectively prevent search-based linkages while remaining faithful to the original content. However, we also highlight that linkages remain feasible with the help of more advanced, semantics-oriented approaches.
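The first step described in the abstract above, indexing N-grams to find those that occur in fewer than $k$ documents, might be sketched as follows. This is an illustrative Python sketch under simple assumptions (whitespace tokenization, single N-grams only); the function names and sample documents are hypothetical, and it does not cover the combinations of N-grams or the LLM-based rewriting that the paper also addresses.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_inverted_index(docs, n):
    """Map each n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for gram in ngrams(text.lower().split(), n):
            index[gram].add(doc_id)
    return index


def rare_ngrams(index, k):
    """Return the n-grams that appear in fewer than k documents."""
    return {gram for gram, ids in index.items() if len(ids) < k}


# Hypothetical toy collection
docs = [
    "the court ruled in favour",
    "the court ruled against him",
    "the court adjourned early",
]
index = build_inverted_index(docs, 2)
rare = rare_ngrams(index, 2)
print(("in", "favour") in rare)   # True: occurs in one document only
print(("the", "court") in rare)   # False: occurs in all three documents
```

N-grams flagged as rare are the spans an attacker could search for to locate the source document, so they are the candidates for rewriting.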
The image of «home» in N.A. Baykov’s short story collection «By the Fire»
L. Dorofeeva, L. Gu
Home is a recurring theme in the literature of the Russian diaspora of the first wave, reflecting the state of being of Russians in a “foreign” space. The image of home in the works of N. Baykov acquires its unique features associated with both the peculiarities of his biography and personality, and with the specifics of his work. The article examines the semantics of the space of home based on several stories from N. Baykov’s collection “By the Fire”. The opposition of “one’s own”/“foreign” space and the specifics of their interaction as “home” and “homelessness” are revealed. The key theme in the stories “New Year’s Eve”, “On the Deck of a Steamship” and “The Secret of a Bottle” is emigration and the image of Russia. The authors come to the conclusion that the image of home is realized at the levels of external and internal spaces, the semantics of which is determined by the category of freedom.
Emergent Simplicities in the Living Histories of Individual Cells
Charles S. Wright, Kunaal Joshi, Rudro R. Biswas
et al.
Organisms maintain the status quo, holding key physiological variables constant to within an acceptable tolerance, and yet adapt with precision and plasticity to dynamic changes in externalities. What organizational principles ensure such exquisite yet robust control of systems-level "state variables" in complex systems with an extraordinary number of moving parts and fluctuating variables? Here we focus on these issues in the specific context of intra- and intergenerational life histories of individual bacterial cells, whose biographies are precisely charted via high-precision dynamic experiments using the SChemostat technology. We highlight intra- and intergenerational scaling laws and other "emergent simplicities" revealed by these high-precision data. In turn, these facilitate a principled route to dimensional reduction of the problem, and serve as essential building blocks for phenomenological and mechanistic theory. Parameter-free data-theory matches for multiple organisms validate theory frameworks, and explicate the systems physics of stochastic homeostasis and adaptation.
en
cond-mat.stat-mech, q-bio.CB
ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context
Victoria R. Li, Yida Chen, Naomi Saphra
While the biases of language models in production are extensively documented, the biases of their guardrails have been neglected. This paper studies how contextual information about the user influences the likelihood of an LLM to refuse to execute a request. By generating user biographies that offer ideological and demographic information, we find a number of biases in guardrail sensitivity on GPT-3.5. Younger, female, and Asian-American personas are more likely to trigger a refusal guardrail when requesting censored or illegal information. Guardrails are also sycophantic, refusing to comply with requests for a political position the user is likely to disagree with. We find that certain identity groups and seemingly innocuous information, e.g., sports fandom, can elicit changes in guardrail sensitivity similar to direct statements of political ideology. For each demographic category and even for American football team fandom, we find that ChatGPT appears to infer a likely political ideology and modify guardrail behavior accordingly.
Multilingual Hallucination Gaps in Large Language Models
Cléa Chataigner, Afaf Taïk, Golnoosh Farnadi
Large language models (LLMs) are increasingly used as alternatives to traditional search engines given their capacity to generate text that resembles human language. However, this shift is concerning, as LLMs often generate hallucinations, misleading or false information that appears highly credible. In this study, we explore the phenomenon of hallucinations across multiple languages in freeform text generation, focusing on what we call multilingual hallucination gaps. These gaps reflect differences in the frequency of hallucinated answers depending on the prompt and language used. To quantify such hallucinations, we used the FactScore metric and extended its framework to a multilingual setting. We conducted experiments using LLMs from the LLaMA, Qwen, and Aya families, generating biographies in 19 languages and comparing the results to Wikipedia pages. Our results reveal variations in hallucination rates, especially between high and low resource languages, raising important questions about LLM multilingual performance and the challenges in evaluating hallucinations in multilingual freeform text generation.
Integrative Decoding: Improve Factuality via Implicit Self-consistency
Yi Cheng, Xiao Liang, Yeyun Gong
et al.
Self-consistency-based approaches, which involve repeatedly sampling multiple outputs and selecting the most consistent one as the final response, prove to be remarkably effective in improving the factual accuracy of large language models. Nonetheless, existing methods usually have strict constraints on the task format, largely limiting their applicability. In this paper, we present Integrative Decoding (ID), to unlock the potential of self-consistency in open-ended generation tasks. ID operates by constructing a set of inputs, each prepended with a previously sampled response, and then processes them concurrently, with the next token being selected by aggregating all their corresponding predictions at each decoding step. In essence, this simple approach implicitly incorporates self-consistency in the decoding objective. Extensive evaluation shows that ID consistently enhances factuality over a wide range of language models, with substantial improvements on the TruthfulQA (+11.2%), Biographies (+15.4%) and LongFact (+8.5%) benchmarks. The performance gains amplify progressively as the number of sampled responses increases, indicating the potential of ID to scale up with repeated sampling.
Know When To Stop: A Study of Semantic Drift in Text Generation
Ava Spataru, Eric Hambro, Elena Voita
et al.
In this work, we explicitly show that modern LLMs tend to generate correct facts first, then "drift away" and generate incorrect facts later: this was occasionally observed but never properly measured. We develop a semantic drift score that measures the degree of separation between correct and incorrect facts in generated texts and confirm our hypothesis when generating Wikipedia-style biographies. This correct-then-incorrect generation pattern suggests that factual accuracy can be improved by knowing when to stop generation. Therefore, we explore the trade-off between information quantity and factual accuracy for several early stopping methods and manage to improve factuality by a large margin. We also show that reranking with semantic similarity can further improve these results, both compared to the baseline and when combined with early stopping. Finally, we try calling an external API to bring the model back to the right generation path, but do not get positive results. Overall, our methods generalize and can be applied to any long-form text generation to produce more reliable information, by balancing trade-offs between factual accuracy, information quantity and computational cost.
Boris Hrinchenko and the Opening of the Monument to Ivan Kotlyarevsky in Poltava, 1903
Bezzub Yurii, Bezzub Iryna
The article highlights the role of Boris Hrinchenko in the events related to the construction and opening ceremony of the monument to Ivan Kotlyarevsky in Poltava on August 30-31, 1903 (old style). Until now, this episode of B. Hrinchenko’s biography has not been the subject of separate research interest. The authors’ methodological approach is the systematic and critical use of available sources, primarily the materials of the periodicals of that time and the ego-documents of the participants and contemporaries of the events (memoirs, correspondence, diaries, etc.). They contain the most factual information about B. Hrinchenko’s participation in the “festival of the Ukrainian intelligentsia” in Poltava (S. Efremov’s words), include reflections of participants and interested persons, and reproduce public and political responses to events. It was found that B. Hrinchenko was the organizer of and an active participant in many events during the “Kotlyarevsky days” in Poltava in 1903. At the same time, along with examples of his organizational activities, the sources recorded episodes of his practical performance of public duties. It is shown that the cases of cooperation between B. Hrinchenko and other famous public and political actors of the older generation and representatives of the radical Ukrainian youth during the Poltava events proved the possibility of joint, coordinated actions against the government’s anti-Ukrainian measures. It is proved that the open protest of the Ukrainian intelligentsia, with the participation of B. Hrinchenko, at the meeting of the Poltava City Duma against the ban on the public use of the Ukrainian language demonstrated the national maturity of the Ukrainian intellectual forces. Together with other Ukrainian figures, B. Hrinchenko managed to make information about the Poltava events known in Ukraine and abroad through periodicals in Russia and Austria-Hungary.
Bibliography. Library science. Information resources
The Hermeneutics of the Poetic Voice in the Latest Literary Research
Joanna Dembińska-Pawelec
The article discusses the latest research on poetry performance in Polish literature. At the outset, the author situates it within the scope of sound studies as well as acoustic philology and audio anthropology. Then, the author focuses on Aleksandra Kremer’s book The Sound of Modern Polish Poetry: Performance and Recording After World War II and the research methods presented there for studying vocal performance. Kremer refers to Charles Bernstein’s concept of close listening, which she enriches with analyses using the computer program Praat, designed for scientific analysis of speech and phonetic phenomena. Kremer analyzes the sound waves of recorded poetic performances. Using this method, she examines recordings of Czesław Miłosz, Julia Hartwig, Miron Białoszewski, Wisława Szymborska, Aleksander Wat, Zbigniew Herbert, Anna Kamieńska, Anna Świrszczyńska, Tadeusz Różewicz, and Krystyna Miłobędzka. The author of the article notes that Kremer’s close listening studies have a hermeneutic character, encompassing biography, history, and culture. This extensive anthropological context, supported by sound visualization, brings us closer to capturing the phenomenon of the poet’s recorded voice.
Investigative Pattern Detection Framework for Counterterrorism
Shashika R. Muramudalige, Benjamin W. K. Hung, Rosanne Libretti
et al.
Law-enforcement investigations aimed at preventing attacks by violent extremists have become increasingly important for public safety. The problem is exacerbated by the massive data volumes that need to be scanned to identify complex behaviors of extremists and groups. Automated tools are required to extract information to respond to queries from analysts, continually scan new information, integrate it with past events, and then alert about emerging threats. We address challenges in investigative pattern detection and develop an Investigative Pattern Detection Framework for Counterterrorism (INSPECT). The framework integrates numerous computing tools that include machine learning techniques to identify behavioral indicators and graph pattern matching techniques to detect risk profiles/groups. INSPECT also automates multiple tasks for large-scale mining of detailed forensic biographies, forming knowledge networks, and querying for behavioral indicators and radicalization trajectories. INSPECT targets a human-in-the-loop mode of investigative search and has been validated and evaluated using an evolving dataset on domestic jihadism.
Functional Analytics for Document Ordering for Curriculum Development and Comprehension
Arturo N. Villanueva, Steven J. Simske
We propose multiple techniques for automatic document order generation for (1) curriculum development and for (2) creation of optimal reading order for use in learning, training, and other content-sequencing applications. Such techniques could potentially be used to improve comprehension, identify areas that need expounding, generate curricula, and improve search engine results. We advance two main techniques: The first uses document similarities through various methods. The second uses entropy against the backdrop of topics generated through Latent Dirichlet Allocation (LDA). In addition, we try the same methods on the summarized documents and compare them against the results obtained using the complete documents. Our results showed that while the document orders for our control document sets (biographies, novels, and Wikipedia articles) could not be predicted using our methods, our test documents (textbooks, courses, journal papers, dissertations) provided more reliability. We also demonstrated that summarized documents were good stand-ins for the complete documents for the purposes of ordering.
Unsupervised Text Deidentification
John X. Morris, Justin T. Chiu, Ramin Zabih
et al.
Deidentification seeks to anonymize textual data prior to distribution. Automatic deidentification primarily uses supervised named entity recognition from human-labeled data points. We propose an unsupervised deidentification method that masks words that leak personally-identifying information. The approach utilizes a specially trained reidentification model to identify individuals from redacted personal documents. Motivated by K-anonymity based privacy, we generate redactions that ensure a minimum reidentification rank for the correct profile of the document. To evaluate this approach, we consider the task of deidentifying Wikipedia Biographies, and evaluate using an adversarial reidentification metric. Compared to a set of unsupervised baselines, our approach deidentifies documents more completely while removing fewer words. Qualitatively, we see that the approach eliminates many identifying aspects that would fall outside of the common named entity based approach.
WDV: A Broad Data Verbalisation Dataset Built from Wikidata
Gabriel Amaral, Odinaldo Rodrigues, Elena Simperl
Data verbalisation is a task of great importance in the current field of natural language processing, as there is great benefit in the transformation of our abundant structured and semi-structured data into human-readable formats. Verbalising Knowledge Graph (KG) data focuses on converting interconnected triple-based claims, formed of subject, predicate, and object, into text. Although KG verbalisation datasets exist for some KGs, there are still gaps in their fitness for use in many scenarios. This is especially true for Wikidata, where available datasets either loosely couple claim sets with textual information or heavily focus on predicates around biographies, cities, and countries. To address these gaps, we propose WDV, a large KG claim verbalisation dataset built from Wikidata, with a tight coupling between triples and text, covering a wide variety of entities and predicates. We also evaluate the quality of our verbalisations through a reusable workflow for measuring human-centred fluency and adequacy scores. Our data and code are openly available in the hopes of furthering research towards KG verbalisation.
Geographies of conflict, geometries of perception: A methodological proposal for mapping the opinion of a territory
Josep María Sole Gras
Abstract
This research sets out to detect and characterize the places of urban conflict that have dominated the last three decades of development in the city of Tarragona. To do so, this essay starts from the systematized and critical observation of the media narrative of the written press to identify, delimit and examine those geographical areas where, with greater or lesser virulence, deficiencies, nostalgia, resistance to change and expectations of transformation collide. They are the places where multiple interests, often contradictory, compete to guide the urban future, spaces where the abstract designs of the real estate market and the resulting disputes, resistances and frictions take shape. They are also scenarios of error, excess, misfortune or obsolescence and abandonment. In all of them, the urban conflict can act as a motor and a catalyst for change or, on the contrary, as a sentence to eternal paralysis. Thus, the systematic analysis of the local newspaper archive and the weighting of variables such as the moment, the degree of impact, the sentiment associated with each piece of news or the relevant areas emerges as an innovative strategy to structure a plot line of the biography, geometry and geography of the main urban facts of any contemporary metropolitan reality.
Aesthetics of cities. City planning and beautifying
The Role of the Akathist by Metropolitan Tryphon (Turkestanov) “Glory to God for All Things” in Preserving the Orthodox Worldview in Russian Culture of the 20th — Early 21st Centuries
Maria R. Nenarokova, Natalya V. Nalegach
This article examines the text of the akathist by Metropolitan Tryphon (Turkestanov) “Glory to God for All Things” (1929). The role of the akathist in the preservation of Orthodox spiritual values is noted both in the era of persecution of the Church in Soviet Russia and in modern times, marked by the strengthening of the positions of mass culture. The Akathist was composed in full accordance with the canons of the genre except for its language, which causes controversy about its value in modern society, as evidenced by the monitoring of Orthodox forums. The analysis of the Akathist in the context of its author’s biography and theological views suggests that the choice of the Russian language is conditioned by Vladyka Tryphon’s desire to convey Christian consolation to those separated from the Orthodox ritual culture and its liturgical language. This hypothesis is supported by the figurative-thematic structure of the Akathist and its intertexts. The title “Glory to God for All Things” is noteworthy. It was used as a prayer formula by the Byzantine saint John Chrysostom, the Optina elder Macarius and the Petrograd New Martyr Metropolitan Veniamin. The choice of the Akathist’s title combines the spiritual experience of Byzantium, of the Russian Elder tradition, and of the years of persecution in the first half of the 20th century. The high values of meekness, Christian love, faith and hope, embodied in the text of Metropolitan Tryphon’s prayer, act as universal spiritual support for a human being in the world. It ensures the preservation and affirmation of the values of the Orthodox worldview in any historical and cultural situation unfavourable for the religious tradition.
Migrations in a Monocultural Environment in the 17th–19th Centuries (the Cases of the Middle Irtysh and Middle Tom Regions)
Тихонов Сергей Семенович
In the course of studying the ethnographic-archaeological complexes of the Ayaly Tatars in the Middle Irtysh region, as well as the ethnography and history of the Russian old-settlers living along the middle reaches of the Tom River, materials were obtained that make it possible to raise the question of studying migrations of the late medieval population within a monocultural environment. So far, two variants of such movements can be identified. The first is relocation over a short distance of 5–10 km within the territory owned by the community. This situation was traced using materials on one group of Ayaly Tatars who lived in the Bergamak yurts. Over two to three centuries they moved their yurts to a new location at least three times in succession. The second variant is resettlement onto unoccupied land within the area inhabited by a population related to the settlers in origin and culture. This was traced among the Russians of the Middle Tom region, where Russian peasants and servicemen founded several villages on vacant land in the upper reaches of the Inya River and on the Tom where it emerges from the mountains onto the plain, which made it necessary to build the Mungat ostrog (fort) to protect them. It is possible that movements of population onto vacant land, or within the territory belonging to it, also occurred in earlier times. However, there is as yet no reliable and convincing archaeological evidence of such movements dating to before the appearance of written sources in the regions under study, although such materials may yet emerge.
Social Norm Bias: Residual Harms of Fairness-Aware Algorithms
Myra Cheng, Maria De-Arteaga, Lester Mackey
et al.
Many modern machine learning algorithms mitigate bias by enforcing fairness constraints across coarsely-defined groups related to a sensitive attribute like gender or race. However, these algorithms seldom account for within-group heterogeneity and biases that may disproportionately affect some members of a group. In this work, we characterize Social Norm Bias (SNoB), a subtle but consequential type of algorithmic discrimination that may be exhibited by machine learning models, even when these systems achieve group fairness objectives. We study this issue through the lens of gender bias in occupation classification. We quantify SNoB by measuring how an algorithm's predictions are associated with conformity to inferred gender norms. When predicting if an individual belongs to a male-dominated occupation, this framework reveals that "fair" classifiers still favor biographies written in ways that align with inferred masculine norms. We compare SNoB across algorithmic fairness methods and show that it is frequently a residual bias, and post-processing approaches do not mitigate this type of bias at all.
Exploring Personality and Online Social Engagement: An Investigation of MBTI Users on Twitter
Partha Kadambi
Text-based personality prediction by computational models is an emerging field with the potential to significantly improve on key weaknesses of survey-based personality assessment. We investigate 3848 profiles from Twitter with self-labeled Myers-Briggs personality traits (MBTI) - a framework closely related to the Five Factor Model of personality - to better understand how text-based digital traces from social engagement online can be used to predict user personality traits. We leverage BERT, a state-of-the-art NLP architecture based on deep learning, to analyze various sources of text that hold most predictive power for our task. We find that biographies, statuses, and liked tweets contain significant predictive power for all dimensions of the MBTI system. We discuss our findings and their implications for the validity of the MBTI and the lexical hypothesis, a foundational theory underlying the Five Factor Model that links language use and behavior. Our results hold optimistic implications for personality psychologists, computational linguists, and other social scientists aiming to predict personality from observational text data and explore the links between language and core behavioral traits.