Eric Sun, Kuan-hui Elaine Lin, Wan-Ling Tseng
et al.
We propose a novel approach for reconstructing annual temperatures in East Asia from 1368 to 1911, leveraging the Reconstructed East Asian Climate Historical Encoded Series (REACHES). The lack of instrumental data during this period poses significant challenges to understanding past climate conditions. REACHES digitizes historical documents from the Ming and Qing dynasties of China, converting qualitative descriptions into a four-level ordinal temperature scale. However, these index-based data are biased toward abnormal or extreme weather phenomena, leading to data gaps that likely correspond to normal conditions. To address this bias and reconstruct historical temperatures at any point within East Asia, including locations without direct historical data, we employ a three-tiered statistical framework. First, we perform kriging to interpolate temperature data across East Asia, adopting a zero-mean assumption to handle missing information. Next, we utilize the Last Millennium Ensemble (LME) reanalysis data and apply quantile mapping to calibrate the kriged REACHES data to Celsius temperature scales. Finally, we introduce a novel Bayesian data assimilation method that integrates the kriged Celsius data with LME simulations to enhance reconstruction accuracy. We model the LME data at each geographic location using a flexible nonstationary autoregressive time series model and employ regularized maximum likelihood estimation with a fused lasso penalty. The resulting dynamic distribution serves as a prior, which is refined via Kalman filtering by incorporating the kriged Celsius REACHES data to yield posterior temperature estimates. This comprehensive integration of historical documentation, contemporary climate models, and advanced statistical methods improves the accuracy of historical temperature reconstructions and provides a crucial resource for future environmental and climate studies.
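The final step of the pipeline above refines a time-varying autoregressive prior by Kalman filtering. For a single grid cell it can be sketched as a scalar filter. This is a minimal illustration, not the authors' implementation. It assumes:

- an AR(1) prior whose per-year coefficients have already been estimated (held in `ARParams`, a hypothetical container);
- a known observation-error variance `r` for the kriged Celsius data;
- missing years passed as `None`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ARParams:
    """Hypothetical per-year AR(1) coefficients for one grid cell (assumed pre-fitted)."""
    phi: float  # autoregressive coefficient for year t
    mu: float   # intercept for year t
    q: float    # innovation variance for year t


def kalman_filter(
    params: Sequence[ARParams],
    obs: Sequence[Optional[float]],
    r: float,
    m0: float,
    p0: float,
) -> list[tuple[float, float]]:
    """Scalar Kalman filter with an AR(1) state prior and direct noisy observations.

    Returns (posterior mean, posterior variance) for each year.
    Years without an observation (None) keep the prior forecast unchanged.
    """
    m, p = m0, p0
    out = []
    for theta, y in zip(params, obs):
        # Forecast step: propagate the AR(1) prior one year ahead.
        m = theta.mu + theta.phi * m
        p = theta.phi ** 2 * p + theta.q
        if y is not None:
            # Update step: weight forecast and observation by their precisions.
            k = p / (p + r)
            m = m + k * (y - m)
            p = (1.0 - k) * p
        out.append((m, p))
    return out
```

For example, `kalman_filter([ARParams(0.5, 0.0, 0.2)] * 3, [0.3, None, -0.1], r=0.5, m0=0.0, p0=1.0)` returns three (mean, variance) pairs. In the second year there is no observation, so the variance grows and the mean simply follows the AR(1) forecast.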
Large Language Models (LLMs) have shown remarkable abilities across various tasks, yet their development has predominantly centered on high-resource languages like English and Chinese, leaving low-resource languages underserved. To address this disparity, we present SeaLLMs 3, the latest iteration of the SeaLLMs model family, tailored for Southeast Asian languages. This region, characterized by its rich linguistic diversity, has lacked adequate language technology support. SeaLLMs 3 aims to bridge this gap by covering a comprehensive range of languages spoken in this region, including English, Chinese, Indonesian, Vietnamese, Thai, Tagalog, Malay, Burmese, Khmer, Lao, Tamil, and Javanese. Leveraging efficient language enhancement techniques and a specially constructed instruction tuning dataset, SeaLLMs 3 significantly reduces training costs while maintaining high performance and versatility. Our model excels in tasks such as world knowledge, mathematical reasoning, translation, and instruction following, achieving state-of-the-art performance among similarly sized models. Additionally, we prioritized safety and reliability by addressing both general and culture-specific considerations and incorporated mechanisms to reduce hallucinations. This work underscores the importance of inclusive AI, showing that advanced LLM capabilities can benefit underserved linguistic and cultural communities.
Nicolò Paternoster, Carmine Giardino, Michael Unterkalmsteiner
et al.
Context: Software startups are newly created companies with no operating history that are fast in producing cutting-edge technologies. These companies develop software under highly uncertain conditions, tackling fast-growing markets under a severe lack of resources. Therefore, software startups present a unique combination of characteristics that pose several challenges to software development activities. Objective: This study aims to structure and analyze the literature on software development in startup companies, thereby determining the potential for technology transfer and identifying software development work practices reported by practitioners and researchers. Method: We conducted a systematic mapping study, developing a classification schema, ranking the selected primary studies according to their rigor and relevance, and analyzing reported software development work practices in startups. Results: A total of 43 primary studies were identified and mapped, synthesizing the available evidence on software development in startups. Only 16 studies are entirely dedicated to software development in startups, of which 10 result in a weak contribution (advice and implications (6); lessons learned (3); tool (1)). Nineteen studies focus on managerial and organizational factors. Moreover, only 9 studies exhibit high scientific rigor and relevance. From the reviewed primary studies, 213 software engineering work practices were extracted, categorized, and analyzed. Conclusion: This mapping study provides the first systematic exploration of the state of the art in software startup research. The existing body of knowledge is limited to a few high-quality studies. Furthermore, the results indicate that software engineering work practices are chosen opportunistically, adapted, and configured to provide value under the constraints imposed by the startup context.
Inequality prevails in science. Individual inequality means that most perish quickly and only a few are successful, while gender inequality implies that there are differences in achievements for women and men. Using large-scale bibliographic data and following a computational approach, we study the evolution of individual and gender inequality for cohorts from 1970 to 2000 in the whole field of computer science as it grows and becomes a team-based science. We find that individual inequality in productivity (publications) increases over a scholar's career but is historically invariant, while individual inequality in impact (citations), albeit larger, is stable across cohorts and careers. Gender inequality prevails regarding productivity, but there is no evidence for differences in impact. We show that the Matthew Effect ties cumulative advantage to early achievements and has grown stronger over the decades, indicating the rise of a "publish or perish" imperative. Only some authors manage to reap the benefits that publishing in teams promises. The Matthew Effect then amplifies initial differences and propagates the gender gap. Women continue to fall behind because they remain at a higher risk of dropping out for reasons that have nothing to do with early-career achievements or social support. Our findings suggest that mentoring programs that help women improve their social-networking skills can reduce gender inequality.
The South and East Asian summer monsoons are globally significant meteorological features, creating a strongly seasonal pattern of precipitation. The stability of the monsoon is of extreme importance for a vast range of ecosystems and for the livelihoods of a large share of the world's population. Simulations are performed with an intermediate complexity climate model, PLASIM, to assess the future response of the monsoons to changing concentrations of aerosols and greenhouse gases. The aerosol loading consists of a mid-tropospheric warming and a surface cooling, which is applied to India, Southeast Asia and East China, both concurrently and independently. The primary effect of increased aerosol loading is a decrease in summer precipitation in the vicinity of the applied forcing, although the regional response varies significantly. The decrease in precipitation is only partially ascribable to a decrease in the precipitable water, and instead derives from a reduction of the precipitation efficiency, due to changes in the stratification of the atmosphere. When the aerosol loading is added in all regions simultaneously, precipitation in East China is most strongly affected, with a quite distinct transition to a low precipitation regime as the radiative forcing increases beyond 60 W/m^2. The response is less abrupt as we move westward, with precipitation in South India being least affected. This lower sensitivity in South India is attributed to aerosol forcing over East China. Additionally, the effect on precipitation is approximately linear with the forcing. The impact of doubling carbon dioxide levels is to increase precipitation over the regions and weaken the circulation. When the carbon dioxide and aerosol forcings are applied at the same time, the carbon dioxide forcing partially offsets the surface cooling and reduction in precipitation associated with the aerosol response.
John D. Ilee, Catherine Walsh, Alice S. Booth
et al.
The precursors to larger, biologically relevant molecules are detected throughout interstellar space, but determining the presence and properties of these molecules during planet formation requires observations of protoplanetary disks at high angular resolution and sensitivity. Here we present 0.3" observations of HC$_3$N, CH$_3$CN, and $c$-C$_3$H$_2$ in five protoplanetary disks observed as part of the Molecules with ALMA at Planet-forming Scales (MAPS) Large Program. We robustly detect all molecules in four of the disks (GM Aur, AS 209, HD 163296 and MWC 480), with tentative detections of $c$-C$_3$H$_2$ and CH$_3$CN in IM Lup. We observe a range of morphologies (central peaks, single or double rings) with no clear correlation in morphology between molecules or disks. Emission is generally compact and on scales comparable with the millimetre dust continuum. We perform both disk-integrated and radially resolved rotational diagram analyses to derive column densities and rotational temperatures. The latter reveals 5-10 times more column density in the inner 50-100 au of the disks when compared with the disk-integrated analysis. We demonstrate that CH$_3$CN originates from lower relative heights in the disks than HC$_3$N, in some cases directly tracing the disk midplane. Finally, we find good agreement between the ratio of small to large nitriles in the outer disks and in comets. Our results indicate that the protoplanetary disks studied here host significant reservoirs of large organic molecules, and that this planet- and comet-building material can be chemically similar to that in our own Solar System. This paper is part of the MAPS special issue of the Astrophysical Journal Supplement Series.
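The rotational diagram analysis mentioned above fits a straight line to level populations. Under the usual assumptions of LTE and optically thin emission, $\ln(N_u/g_u) = \ln(N_{\rm tot}/Q(T_{\rm rot})) - E_u/(k\,T_{\rm rot})$. The sketch below is a minimal least-squares fit for one spectrum, not the MAPS pipeline. It returns the rotational temperature and the intercept $\ln(N_{\rm tot}/Q)$. Recovering $N_{\rm tot}$ additionally needs the molecule's partition function $Q(T)$, which is not included here.

```python
import math
from typing import Sequence


def rotational_diagram(
    e_up_k: Sequence[float],
    n_up_over_g: Sequence[float],
) -> tuple[float, float]:
    """Fit ln(N_u/g_u) = ln(N_tot/Q) - E_u/T_rot by ordinary least squares.

    e_up_k:      upper-level energies E_u/k in kelvin
    n_up_over_g: upper-level column densities divided by degeneracy (cm^-2)
    Returns (T_rot in K, ln(N_tot/Q)); assumes LTE and optically thin lines.
    """
    xs = list(e_up_k)
    ys = [math.log(v) for v in n_up_over_g]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two transitions")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    if slope >= 0:
        # A non-negative slope has no physical rotational temperature.
        raise ValueError("non-negative slope in rotational diagram")
    return -1.0 / slope, intercept
```

A radially resolved analysis would apply the same fit independently in each radial annulus.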
Difference-in-differences (DID) is a widely used approach for drawing causal inference from observational panel data. Two common estimation strategies for DID are outcome regression and propensity score weighting. In this paper, motivated by a real application in traffic safety research, we propose a new double-robust DID estimator that hybridizes regression and propensity score weighting. We particularly focus on the case of discrete outcomes. We show that the proposed double-robust estimator possesses the desirable large-sample robustness property. We conduct a simulation study to examine its finite-sample performance and compare it with alternative methods. Our empirical results from Pennsylvania Department of Transportation data suggest that rumble strips are marginally effective in reducing vehicle crashes.
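To illustrate how regression and propensity score weighting can be combined, the sketch below implements a standard doubly robust DID estimator of the average treatment effect on the treated for two-period panel data. It is not the paper's estimator and does not address the discrete-outcome specifics the paper focuses on. It assumes the fitted propensity scores and the control-group outcome-change regression are supplied by the caller. The estimate remains consistent if either of these two nuisance models is correctly specified.

```python
from typing import Sequence


def dr_did(
    delta_y: Sequence[float],
    treated: Sequence[int],
    pscore: Sequence[float],
    m_control: Sequence[float],
) -> float:
    """Doubly robust DID estimate of the ATT for two-period panel data.

    delta_y:   outcome change Y_post - Y_pre for each unit
    treated:   1 if the unit is treated, else 0
    pscore:    fitted propensity score P(D=1 | X) for each unit, 0 < p < 1
    m_control: fitted E[delta_y | X, D=0] from an outcome regression
    """
    # Treated units get equal weight; controls are reweighted by the odds p/(1-p).
    w1 = [float(d) for d in treated]
    w0 = [p * (1 - d) / (1.0 - p) for p, d in zip(pscore, treated)]
    s1, s0 = sum(w1), sum(w0)
    if s1 == 0 or s0 == 0:
        raise ValueError("need both treated and control units")
    # Residualize outcome changes with the regression, then compare weighted means.
    resid = [dy - m for dy, m in zip(delta_y, m_control)]
    att_treated = sum(w * e for w, e in zip(w1, resid)) / s1
    att_control = sum(w * e for w, e in zip(w0, resid)) / s0
    return att_treated - att_control
```

Two special cases are worth noting:

- With a constant propensity score and `m_control` set to zero, the estimate reduces to the plain DID contrast of mean outcome changes between treated and control units.
- With a correct outcome regression, the control term vanishes whatever weights are used.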
In the singularity theory and differential topology of Morse functions and their higher-dimensional versions, fold maps, and in their application to the algebraic and differential topology of manifolds, constructing explicit fold maps and investigating their source manifolds is fundamental, important, and difficult. Motivated by studies of Kobayashi, Saeki, and others since 1990, the author has introduced surgery operations (bubbling operations) on fold maps and has explicitly shown that the homology groups of Reeb spaces of maps constructed by iterating these operations are flexible in several cases. Such operations appear to be strong tools for constructing maps and for studying manifolds precisely. More precisely, the author has also observed that the resulting groups are represented as direct sums of the original homology groups and suitable finitely generated commutative groups. The Reeb space of a map is the space of all connected components of inverse images of the map. In many simple cases, Reeb spaces inherit fundamental invariants of the manifolds, such as homology groups, and are polyhedra whose dimensions equal those of the target spaces. This paper presents a new explicit study of how the homology groups of Reeb spaces of fold maps change under these surgery operations. We present explicit changes obtained by an approach based on the elementary theory of sequences of numbers and fundamental continuous or differentiable functions.
Ilyas Bakbergenuly, David C. Hoaglin, Elena Kulinskaya
Methods for random-effects meta-analysis require an estimate of the between-study variance, $τ^2$. The performance of estimators of $τ^2$ (measured by bias and coverage) affects their usefulness in assessing heterogeneity of study-level effects, and also the performance of related estimators of the overall effect. For the effect measure log-response-ratio (LRR, also known as the logarithm of the ratio of means, RoM), we review four point estimators of $τ^2$ (the popular methods of DerSimonian-Laird (DL), restricted maximum likelihood, and Mandel and Paule (MP), and the less-familiar method of Jackson), four interval estimators for $τ^2$ (profile likelihood, Q-profile, Biggerstaff and Jackson, and Jackson), five point estimators of the overall effect (the four related to the point estimators of $τ^2$ and an estimator whose weights use only study-level sample sizes), and seven interval estimators for the overall effect (four based on the point estimators for $τ^2$, the Hartung-Knapp-Sidik-Jonkman (HKSJ) interval, a modification of HKSJ that uses the MP estimator of $τ^2$ instead of the DL estimator, and an interval based on the sample-size-weighted estimator). We obtain empirical evidence from extensive simulations of data from lognormal distributions.
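As a reference point for the estimators reviewed above, the popular DerSimonian-Laird (DL) moment estimator of $τ^2$ and its associated overall-effect estimate can be sketched as follows. The inputs are study-level effects (here, log-response-ratios) and their within-study variances, which the caller is assumed to have computed already.

```python
from typing import Sequence


def dersimonian_laird(y: Sequence[float], v: Sequence[float]) -> tuple[float, float]:
    """DerSimonian-Laird estimate of tau^2 and the matching overall effect.

    y: study-level effect estimates (e.g. log-response-ratios)
    v: their within-study variances
    Returns (tau2_hat, overall_effect_hat).
    """
    k = len(y)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / vi for vi in v]                                # inverse-variance weights
    sw = sum(w)
    y_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw          # fixed-effect mean
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, y))    # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                        # truncated at zero
    w_re = [1.0 / (vi + tau2) for vi in v]                    # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return tau2, mu
```

The truncation at zero is one source of the positive bias of DL for small $τ^2$. Such biases are among the properties the simulations compare across estimators.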
Lorenzo Cassi, Agénor Lahatte, Ismael Rafols
et al.
Science policy is increasingly shifting towards an emphasis on societal problems or grand challenges. As a result, new evaluative tools are needed to help assess not only the knowledge production side of research programmes or organisations, but also the articulation of research agendas with societal needs. In this paper, we present an exploratory investigation of science supply and societal needs on the grand challenge of obesity, an emerging health problem with enormous social costs. We illustrate a potential approach that uses topic modelling to explore: (a) how scientific publications can be used to describe existing priorities in science production; (b) how records of questions posed in the European Parliament can be used as an instance of mapping the discourse of social needs; (c) how the comparison between the two may show (mis)alignments between societal concerns and scientific outputs. While this is a technical exercise, we propose that mapping methods of this type can be useful for informing strategic planning and evaluation in funding agencies.
Bernardino Casas, Neus Català, Ramon Ferrer-i-Cancho
et al.
Here we study polysemy as a potential learning bias in vocabulary learning in children. Words of low polysemy could be preferred as they reduce the disambiguation effort for the listener. However, such a preference could be a side-effect of another bias: the preference of children for nouns, combined with the lower polysemy of nouns relative to other part-of-speech categories. Our results show that mean polysemy in children increases over time in two phases, i.e. fast growth until the 31st month followed by a slower tendency towards adult speech. In contrast, this evolution is not found in adults interacting with children. This suggests that children have a preference for non-polysemous words in the early stages of vocabulary acquisition. Interestingly, the evolutionary pattern described above weakens when controlling for syntactic category (noun, verb, adjective or adverb) but does not disappear completely, suggesting that it could result from a combination of a standalone bias for low polysemy and a preference for nouns.
In financial risk management problems, it is desirable to have accurate bounds for option prices in situations where closed-form pricing formulae do not exist. This paper proposes a unified approach for obtaining upper and lower bounds for Asian-type options, including options on VWAP. The bounds obtained are applicable in both continuous- and discrete-time frameworks for the case of time-dependent interest rates. Numerical examples are provided to illustrate the accuracy of the bounds.
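A classical example of such a bound, not the paper's unified approach, is the geometric-average lower bound for an arithmetic-average Asian call. Because the arithmetic mean of positive prices is never below their geometric mean, the closed-form geometric Asian call price bounds the arithmetic one from below. The sketch assumes Black-Scholes dynamics, a constant interest rate, and equally spaced fixings. It does not cover time-dependent rates or VWAP options. A Monte Carlo check on shared paths is included for comparison.

```python
import math
import random
from statistics import NormalDist


def geometric_asian_call(s0, k, r, sigma, t, n):
    """Closed-form price of a discretely monitored geometric-average Asian call.

    Assumes Black-Scholes dynamics, constant rate r, fixings at t/n, 2t/n, ..., t.
    By the AM-GM inequality this is a lower bound for the arithmetic Asian call.
    """
    dt = t / n
    mean_time = dt * (n + 1) / 2.0                                # average fixing time
    var_g = sigma ** 2 * dt * (n + 1) * (2 * n + 1) / (6.0 * n)   # Var[ln G]
    mu_g = math.log(s0) + (r - 0.5 * sigma ** 2) * mean_time      # E[ln G]
    sd_g = math.sqrt(var_g)
    d2 = (mu_g - math.log(k)) / sd_g
    d1 = d2 + sd_g
    nd = NormalDist()
    fwd_g = math.exp(mu_g + 0.5 * var_g)                          # E[G] under Q
    return math.exp(-r * t) * (fwd_g * nd.cdf(d1) - k * nd.cdf(d2))


def mc_asian_calls(s0, k, r, sigma, t, n, paths, seed=0):
    """Monte Carlo arithmetic and geometric Asian call prices on shared paths."""
    rng = random.Random(seed)
    dt = t / n
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    arith = geo = 0.0
    for _ in range(paths):
        log_s = math.log(s0)
        total = log_total = 0.0
        for _ in range(n):
            log_s += drift + vol * rng.gauss(0.0, 1.0)
            total += math.exp(log_s)
            log_total += log_s
        arith += max(total / n - k, 0.0)
        geo += max(math.exp(log_total / n) - k, 0.0)
    disc = math.exp(-r * t) / paths
    return arith * disc, geo * disc
```

With a single fixing (`n=1`), the geometric formula collapses to the Black-Scholes European call, which is a useful sanity check.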
Dharam Vir Lal, Andrei P. Lobanov, Sergio Jiménez-Monferrer
The Square Kilometre Array (SKA) will be operating at a time when several new large optical, X-ray and gamma-ray facilities are expected to be working. To make the SKA both competitive with and complementary to these large facilities, thorough design studies are needed, focused in particular on the imaging performance of the array. One of the crucial aspects of such studies is the choice of the array configuration, which substantially affects the resolution, rms noise, sidelobe level and dynamic range achievable with the SKA. We present here a quantitative assessment of the effect of the array configuration on the imaging performance of the SKA, introducing the spatial dynamic range (SDR) and a measure of incompleteness of the Fourier domain coverage ($Δu/u$) as prime figures of merit.
Xin-Lin Zhou, Shuang-Nan Zhang, Ding-Xiong Wang
et al.
A calibration is made for the correlation between the X-ray Variability Amplitude (XVA) and Black Hole (BH) mass. The correlation for 21 reverberation-mapped Active Galactic Nuclei (AGN) appears very tight, with an intrinsic dispersion of 0.20 dex. An intrinsic dispersion of 0.27 dex is obtained if BH masses are estimated from stellar velocity dispersions. We further test the uncertainties of mass estimates from XVAs for objects that have been observed multiple times with sufficient data quality. The results show that the XVAs derived from multiple observations vary by a factor of 3. This means that the BH mass uncertainty from a single observation is slightly worse than that of either reverberation-mapping or stellar velocity dispersion measurements; however, BH mass estimates from X-ray data alone can be more accurate if the mean XVA value from several observations is used. Applying this relation, the BH mass of RE J1034+396 is found to be $4^{+3}_{-2} \times 10^6$ $M_{\odot}$. The high end of this mass range follows the relationship between the 2$f_0$ frequencies of high-frequency QPOs and the BH masses derived from Galactic X-ray binaries. We also calculate the high-frequency constant $C= 2.37 M_\odot$ Hz$^{-1}$ from the 21 reverberation-mapped AGN. As suggested by Gierliński et al., $M_{\rm BH}=C/C_{\rm M}$, where $C_{\rm M}$ is the high-frequency variability derived from the XVA. Given the similar shape of the power-law dominated X-ray spectra of ULXs and AGN, this can be applied to BH mass estimates of ULXs. We discuss the observed QPO frequencies and BH mass estimates of the Ultra-Luminous X-ray sources M82 X-1 and NGC 5408 X-1 and favor ULXs as intermediate-mass BH systems (abridged).
We used 43,024 objects that were primarily identified as quasars in SDSS DR5 and have spectroscopic redshifts to study the luminosity dependence of quasar clustering with two different techniques. The results reveal that brighter quasars are more strongly clustered, but this dependence is weak, in agreement with the results of Porciani & Norberg (2006) and the theoretical predictions of Lidz et al. (2006).