In this paper, we provide a comprehensive cross-country validation study of compositional mortality modeling and forecasting methods. To this end, we consider two one-to-one transformations: the cumulative distribution function and the centered log-ratio transformation from compositional data analysis. Of the two transformations, the cumulative distribution function provides a scale-free way to visualize the gender gap and cross-country heterogeneity in the probability of dying by sex and country. Drawing on age-specific period life-table death counts from 24 countries in the Human Mortality Database (2025), we assess and compare the point and interval forecast accuracy of the two transformations, using the same forecasting method. Enhancing the forecast accuracy of period life-table death counts is of significant value to demographers, who rely on such forecasts to estimate survival probabilities and life expectancy, and to actuaries, who use them to price annuities across various entry ages and maturities.
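The two transformations named above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the `radix` parameter, and the toy death counts are assumptions made for illustration, and the centered log-ratio as written requires strictly positive counts (zero counts would need separate handling).

```python
import math


def clr(composition):
    """Centered log-ratio transform of a strictly positive composition."""
    total = sum(composition)
    log_parts = [math.log(x / total) for x in composition]  # close to unit sum, then log
    mean_log = sum(log_parts) / len(log_parts)  # log of the geometric mean
    return [lp - mean_log for lp in log_parts]


def clr_inverse(z, radix=1.0):
    """Map clr coordinates back to a composition summing to `radix`."""
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [radix * e / total for e in exps]


def to_cdf(composition):
    """Cumulative distribution function of the normalised composition."""
    total = sum(composition)
    running = 0.0
    cdf = []
    for x in composition:
        running += x / total
        cdf.append(running)
    return cdf


def from_cdf(cdf, radix=1.0):
    """Recover the composition by first-differencing the CDF."""
    parts = [radix * cdf[0]]
    for i in range(1, len(cdf)):
        parts.append(radix * (cdf[i] - cdf[i - 1]))
    return parts


# Hypothetical life-table death counts over five age groups (radix 100000).
toy_deaths = [500.0, 1500.0, 8000.0, 30000.0, 60000.0]
clr_coords = clr(toy_deaths)
recovered_from_clr = clr_inverse(clr_coords, radix=100000.0)
recovered_from_cdf = from_cdf(to_cdf(toy_deaths), radix=100000.0)
```

Both maps are invertible once the radix is fixed, which is what allows a forecast made in the transformed space to be mapped back to death counts.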
The share of the world population living in cities with more than one million people rose from 11% in 1975 to 24% in 2025 (our estimates). Will this trend towards greater concentration in large cities continue or level off? We introduce two new city population datasets that use consistent city definitions across countries and over time. The first covers the world between 1975 and 2025, using satellite imagery. The second covers the U.S. between 1850 and 2020, using census microdata. We find that urban growth follows a characteristic life cycle. In the early stages of a country's urbanization process, large cities grow faster than smaller ones. At later stages, growth rates equalize across sizes. We use this life cycle to project future population concentration in large cities. Our projections suggest that 38% of the world population will be living in cities with more than one million people by 2100. This estimate is higher than the 33% implied by the well-known theory of proportional growth, but lower than the 42% obtained by extrapolating current trends.
Mikhail Salnikov, Dmitrii Korzh, Ivan Lazichny
et al.
This paper evaluates geopolitical biases in LLMs with respect to various countries through an analysis of their interpretation of historical events with conflicting national perspectives (USA, UK, USSR, and China). We introduce a novel dataset with neutral event descriptions and contrasting viewpoints from different countries. Our findings show significant geopolitical biases, with models favoring specific national narratives. Additionally, simple debiasing prompts had a limited effect in reducing these biases. Experiments with manipulated participant labels reveal models' sensitivity to attribution, sometimes amplifying biases or recognizing inconsistencies, especially with swapped labels. This work highlights national narrative biases in LLMs, challenges the effectiveness of simple debiasing methods, and offers a framework and dataset for future geopolitical bias research.
Adarsh Raghuvanshi, Hrishidev Unni, Vinayak
et al.
Science is driven by community endeavors across diverse fields and specializations, forming a complex structure that renders conventional performance evaluation methods inadequate. Using established indicators, the network-based normalized citation score, and the disruptive index, combined with the GENEPY algorithm, we evaluate the complexity rank of countries based on their breakthrough performance across 89 subfields of physical sciences, drawing on nearly 60 million articles (1900-2023). This quality-focused integrated approach reveals pronounced asymmetries: while countries such as the United States, Israel, and several in Europe sustain long-term structural advantages, emerging nations show rapid gains in later decades. A power-law relationship between aggregated breakthrough performance and countries' R&D expenditure underscores the unequal and scale-dependent nature of global science. These results demonstrate that scientific advancement arises not from uniform growth but from asymmetric complexity, offering actionable insights for policymakers and funding agencies aiming to foster sustainable, high-quality research ecosystems.
The Pierre Auger Observatory, dedicated to measuring ultra-high energy cosmic rays, has for more than two decades promoted educational and scientific outreach activities to make its results known in understandable language to diverse audiences. Among its most notable efforts are the creation of a visitor center at the Observatory site in Malargüe, Argentina, the production of brochures, posters, videos, and talks, and special actions with the community at the site of the Observatory and beyond. In addition to joint efforts, collaborators from participating countries carry out local efforts, some of them related to national initiatives in their respective regions, such as the International Cosmic Day, Masterclasses, exhibitions of artworks with the theme of astroparticles, the Night of the Stars, European Researchers' Night, summer schools and initiatives to improve the gender balance in the science community. In addition, there have been board games based on the dynamics of the observatory's work, online graphic viewers of the different events, talks, workshops, etc. In recent years, the Pierre Auger Outreach group has focused on presenting actions that directly impact the community in Malargüe. However, this time, special emphasis will be placed on highlighting the outreach efforts of Pierre Auger collaborators in various countries.
Shauna Mooney, Leontine Alkema, Emily Sonneveldt
et al.
Monitoring family planning indicators, such as modern contraceptive prevalence rate (mCPR), is essential for family planning programming. The Family Planning Estimation Tool (FPET) uses survey data to estimate and forecast family planning indicators, including mCPR, over time. However, sole reliance on large-scale surveys, carried out on average every 3-5 years, can lead to data gaps. Service statistics are a readily available data source, routinely collected in conjunction with service delivery. Various service statistics data types can be used to derive a family planning indicator called Estimated Modern Use (EMU). In a number of countries, annual rates of change in EMU have been found to be predictive of true rates of change in mCPR. However, it has been challenging to capture the varying levels of uncertainty associated with the EMU indicator across different countries and service statistics data types and to subsequently quantify this uncertainty when using EMU in FPET. We present a new approach to using EMUs in FPET to inform mCPR estimates, using annual EMU rates of change as input, and accounting for uncertainty associated with the EMU derivation process. The approach also considers additional country-type-specific uncertainty. We assess the EMU type-specific uncertainty at the country level, via a Bayesian hierarchical modelling approach. Validation results and anonymised country-level case studies highlight improved predictive performance and provide insights into the impact of including EMU data on mCPR estimates compared to using survey data alone. Together, they demonstrate that EMUs can help countries monitor progress toward their family planning goals more effectively.
Despite the need for financial data on company activities in developing countries for development research and economic analysis, such data does not exist. In this project, we develop and evaluate two Natural Language Processing (NLP) based techniques to address this issue. First, we curate a custom dataset specific to the domain of financial text data on developing countries and explore multiple approaches for information extraction. We then explore a text-to-text approach with the transformer-based T5 model with the goal of undertaking simultaneous NER and relation extraction. We find that this model is able to learn the custom text structure output data corresponding to the entities and their relations, resulting in an accuracy of 92.44\%, a precision of 68.25\% and a recall of 54.20\% from our best T5 model on the combined task. Second, we explore an approach with sequential NER and relation extraction. For the NER, we run pre-trained and fine-tuned models using SpaCy, and we develop a custom relation extraction model using SpaCy's Dependency Parser output and some heuristics to determine entity relationships \cite{spacy}. We obtain an accuracy of 84.72\%, a precision of 6.06\% and a recall of 5.57\% on this sequential task.
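The abstract reports accuracy alongside much lower precision and recall. As a hedged illustration of how the three metrics relate, the Python sketch below computes them from confusion-matrix counts. The counts are hypothetical and are not the paper's data, and the abstract does not state whether a dominance of true negatives explains the reported gap; the sketch only shows that such a gap is arithmetically possible.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # share of actual positives that are found
    return accuracy, precision, recall


# Hypothetical counts (assumption, for illustration only): true negatives
# dominate, so accuracy stays high while precision and recall are low.
accuracy, precision, recall = classification_metrics(tp=5, fp=80, fn=85, tn=830)
```

With these made-up counts, accuracy is driven almost entirely by the 830 true negatives, while precision and recall depend only on the handful of true positives.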
Karim Assi, Lakmal Meegahapola, William Droz
et al.
Smartphones enable understanding human behavior with activity recognition to support people's daily lives. Prior studies focused on using inertial sensors to detect simple activities (sitting, walking, running, etc.) and were mostly conducted in homogeneous populations within a country. However, people are more sedentary in the post-pandemic world with the prevalence of remote/hybrid work/study settings, making detecting simple activities less meaningful for context-aware applications. Hence, the understanding of (i) how multimodal smartphone sensors and machine learning models could be used to detect complex daily activities that can better inform about people's daily lives and (ii) how models generalize to unseen countries, is limited. We analyzed in-the-wild smartphone data and over 216K self-reports from 637 college students in five countries (Italy, Mongolia, UK, Denmark, Paraguay). Then, we defined a 12-class complex daily activity recognition task and evaluated the performance with different approaches. We found that even though the generic multi-country approach provided an AUROC of 0.70, the country-specific approach performed better with AUROC scores in [0.79-0.89]. We believe that research along the lines of diversity awareness is fundamental for advancing human behavior understanding through smartphones and machine learning, for more real-world utility across countries.
The formulation of a measurement theory for relativistic quantum field theory (QFT) has recently been an active area of research. In contrast to the asymptotic measurement framework that was enshrined in QED, the new proposals aim to supply a measurement framework for measurements in local spacetime regions. This paper surveys episodes in the history of quantum theory that contemporary researchers have identified as precursors to their own work and discusses how they laid the groundwork for current approaches to local measurement theory for QFT.
This paper estimates how electricity price divergence within Sweden has affected incentives to invest in photovoltaic (PV) generation between 2016 and 2022 based on a synthetic control approach. Sweden is chosen as the research subject since, together with Italy, it is the only EU country with multiple bidding zones and has faced a dramatic divergence in electricity prices between low-tariff bidding zones in Northern Sweden and high-tariff bidding zones in Southern Sweden since 2020. The results indicate that PV uptake in municipalities located north of the bidding zone border is reduced by 40.9-48% compared to their Southern counterparts. Based on these results, the creation of separate bidding zones within countries poses a threat to the expansion of PV generation and other renewables since it disincentivizes investment in areas with low electricity prices.
Responses to the global climate crisis often focus on the largest current emitters of greenhouse gases. However, analysis shows that about a third of emissions come from a collection of small emitters, each contributing one to two percent of the total additional CO$_2$ injected into the communal atmosphere. Attempts to hold global warming to less than 1.5\textcelsius{} cannot succeed without also reducing emissions from these small countries.
Jacky Chen Long Chai, Tiong-Sik Ng, Cheng-Yaw Low
et al.
Very low-resolution face recognition (VLRFR) poses unique challenges, such as tiny regions of interest and poor resolution due to extreme standoff distance or wide viewing angle of the acquisition devices. In this paper, we study principled approaches to elevate the recognizability of a face in the embedding space instead of the visual quality. We first formulate a robust learning-based face recognizability measure, namely recognizability index (RI), based on two criteria: (i) proximity of each face embedding against the unrecognizable faces cluster center and (ii) closeness of each face embedding against its positive and negative class prototypes. We then devise an index diversion loss to push the hard-to-recognize face embedding with low RI away from unrecognizable faces cluster to boost the RI, which reflects better recognizability. Additionally, a perceptibility attention mechanism is introduced to attend to the most recognizable face regions, which offers better explanatory and discriminative traits for embedding learning. Our proposed model is trained end-to-end and simultaneously serves recognizability-aware embedding learning and face quality estimation. To address VLRFR, our extensive evaluations on three challenging low-resolution datasets and face quality assessment demonstrate the superiority of the proposed model over the state-of-the-art methods.
Age-specific and sex-specific cause of death (COD) determination is becoming a very important task, particularly for low- and middle-income countries (LMICs). Therefore, consistent, openly accessible, and reproducible information may have a significant role in regulating the major causes of mortality both in premature children and in adults. The United Nations (UN) reported that 86% of deaths (48 million of 56 million deaths globally) occurred in LMICs in 2010. The major dilemma is that most deaths in such countries do not have a COD diagnosis. Despite the allocation of a large portion of resources to decrease the devastating impacts of chronic illnesses, their prevalence as well as their health and economic consequences remain staggeringly high. There are multiple levels of interventions that can help bring about significant and promising improvements in the healthcare system. Currently, Pakistan is facing a double burden of malnutrition with record-high prevalence rates of chronic diseases. Pakistan spends only a marginal share of its GDP (1.2%), versus the 5% recommended by the World Health Organization. On average, there are eight hospitals per district, with a population load per hospital of 165512.452; data management in the country is poor, and we lack a consistent local registry of all-cause mortality. This article was planned to compile data on major causes of death and disease-specific mortality rates for Pakistan and to link these factors to the socio-economic determinants of health.
Although not explicitly declared, most research rankings of countries and institutions are supposed to reveal their contribution to the advancement of knowledge. However, such advances are based on very highly cited publications with very low frequency, which can only very exceptionally be counted with statistical reliability. Percentile indicators enable calculations of the probability or frequency of such rare publications using counts of much more frequent publications; the general rule is that rankings based on the number of top 10% or 1% cited publications (Ptop 10%, Ptop 1%) will also be valid for the rare publications that push the boundaries of knowledge. Japan and its universities are exceptions, as their frequent Nobel Prizes contradict their low Ptop 10% and Ptop 1%. We explain that this occurs because, in single research fields, the singularity of percentile indicators holds only for research groups that are homogeneous in their aims and efficiency. Correct calculations for ranking countries and institutions should add the results of their homogeneous groups, instead of considering all publications as a single set. Although based on Japan, our findings have a general character. Common predictions of scientific advances based on Ptop 10% might be severalfold lower than correct calculations.
Md. Aktaruzzaman Pramanik, Md Mahbubur Rahman, ASM Iftekhar Anam
et al.
Traffic congestion research is on the rise, thanks to urbanization, economic growth, and industrialization. Developed countries invest substantial research funding in collecting traffic data using Radio Frequency Identification (RFID), loop detectors, speed sensors, high-end traffic lights, and GPS. However, these processes are expensive, infeasible, and non-scalable for developing countries with numerous non-motorized vehicles, proliferated ride-sharing services, and frequent pedestrians. This paper proposes a novel approach to collect traffic data from Google Maps' traffic layer with minimal cost. We have implemented widely used models such as Historical Averages (HA), Support Vector Regression (SVR), Support Vector Regression with Graph (SVR-Graph), and Auto-Regressive Integrated Moving Average (ARIMA) to show the efficacy of the collected traffic data in forecasting future congestion. We show that even with these simple models, we could predict traffic congestion ahead of time. We also demonstrate that the traffic patterns are significantly different between weekdays and weekends.
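Of the baselines listed above, the Historical Average is the simplest to sketch. The Python example below is one plausible reading of it, not the authors' implementation: it averages past congestion values per (weekday, time slot) pair and uses that average as the forecast. The tuple layout, the function names, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def fit_historical_average(observations):
    """Average congestion per (weekday, slot) from (weekday, slot, value) tuples."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for weekday, slot, value in observations:
        key = (weekday, slot)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def predict(model, weekday, slot, default=None):
    """Forecast congestion as the stored average; `default` if the pair was never seen."""
    return model.get((weekday, slot), default)


# Hypothetical congestion levels in [0, 1]; weekday 0 = Monday, 5 = Saturday.
history = [(0, 8, 0.8), (0, 8, 0.6), (5, 8, 0.2)]
model = fit_historical_average(history)
monday_forecast = predict(model, 0, 8)
saturday_forecast = predict(model, 5, 8)
```

Keying the average on the weekday keeps weekday and weekend patterns separate, which matters if the two differ as the abstract reports.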
Ratnabali Pal, Arif Ahmed Sekh, Samarjit Kar
et al.
The recent worldwide outbreak of the novel coronavirus (COVID-19) has opened up new challenges to the research community. Artificial intelligence (AI) driven methods can be useful to predict the parameters, risks, and effects of such an epidemic. Such predictions can be helpful to control and prevent the spread of such diseases. The main challenges of applying AI are the small volume of data and its uncertain nature. Here, we propose a shallow long short-term memory (LSTM) based neural network to predict the risk category of a country. We have used a Bayesian optimization framework to optimize and automatically design country-specific networks. The results show that the proposed pipeline outperforms state-of-the-art methods for data of 180 countries and can be a useful tool for such risk categorization. We have also experimented with the trend data and weather data combined for the prediction. The outcome shows that the weather does not have a significant role. The tool can be used to predict long-duration outbreaks of such an epidemic so that we can take preventive steps earlier.
High-quality census data are not always available in developing countries. Instead, mobile phone data are becoming a popular proxy to evaluate population density, activity and social characteristics. They offer additional advantages for infrastructure planning such as being updated in real time, including mobility information and recording visitors' activity. We combine various data sets from Senegal to evaluate mobile phone data's potential to replace insufficient census data for infrastructure planning in developing countries. As an applied case, we test their ability to predict domestic electricity consumption. We show that, contrary to common belief, average mobile phone activity does not correlate well with population density. However, it can provide better electricity consumption estimates than basic census data. More importantly, we use curve and network clustering techniques to enhance the accuracy of the predictions, to recover good population mapping potential and to reduce the collection of required data to substantially smaller samples.