Results for "Bibliography. Library science. Information resources"

Showing 20 of ~81,272 results · from DOAJ, arXiv

arXiv Open Access 2025
SDSS-IV MaStar: Quantification and Abatement of Interstellar Absorption in the Largest Empirical Stellar Spectral Library

Kate H. R. Rubin, Kyle B. Westfall, Claudia Maraston et al.

We assess the impact of CaII 3934,3969 and NaI 5891,5897 absorption arising in the interstellar medium (ISM) on the SDSS-IV MaNGA Stellar Library (MaStar) and produce corrected spectroscopy for 80% of the 24,162-star catalog. We model the absorption strength of these transitions as a function of stellar distance, Galactic latitude, and dust reddening based upon high-spectral resolution studies. With this model, we identify 6342 MaStar stars that have negligible ISM absorption ($W^\mathrm{ISM}$(CaII K) $<0.07$ Ang and $W^\mathrm{ISM}$(NaI 5891) $<0.05$ Ang). For 12,110 of the remaining stars, we replace their NaI D profile (and their CaII profile for effective temperatures $T_{\rm eff}>9000$ K) with a coadded spectrum of low-ISM stars with similar $T_{\rm eff}$, surface gravity, and metallicity. For 738 additional stars with $T_{\rm eff}>9000$ K, we replace these spectral regions with a matching ATLAS9-based BOSZ model. This results in a mean reduction in $W$(CaII K) ($W$(NaI D)) of $0.4-0.7$ Ang ($0.6-1.1$ Ang) for hot stars ($T_{\rm eff}>7610$ K), and a mean reduction in $W$(NaI D) of $0.1-0.2$ Ang for cooler stars. We show that interstellar absorption in simple stellar population (SSP) model spectra constructed from the original library artificially enhances $W$(CaII K) by $\gtrsim20\%$ at young ages ($<400$ Myr); dramatically enhances the strength of stellar NaI D in starbursting systems (by ${\gtrsim}50\%$); and enhances stellar NaI D in older stellar populations (${\gtrsim}10$ Gyr) by ${\gtrsim}10\%$. We provide SSP spectra constructed from the cleaned library, and discuss the implications of these effects for stellar population synthesis analyses constraining stellar age, [Na/Fe] abundance, and the initial mass function.

en astro-ph.GA
arXiv Open Access 2025
Machine Learning-Driven Predictive Resource Management in Complex Science Workflows

Tasnuva Chowdhury, Tadashi Maeno, Fatih Furkan Akman et al.

The collaborative efforts of large communities in science experiments, often comprising thousands of global members, reflect a monumental commitment to exploration and discovery. Recently, advanced and complex data processing has gained increasing importance in science experiments. Data processing workflows typically consist of multiple intricate steps, and the precise specification of resource requirements is crucial for each step to allocate optimal resources for effective processing. Estimating resource requirements in advance is challenging due to a wide range of analysis scenarios, varying skill levels among community members, and the continuously increasing spectrum of computing options. One practical approach to mitigate these challenges involves initially processing a subset of each step to measure precise resource utilization from actual processing profiles before completing the entire step. While this two-staged approach enables processing on optimal resources for most of the workflow, it has drawbacks such as initial inaccuracies leading to potential failures and suboptimal resource usage, along with overhead from waiting for initial processing completion, which is critical for fast-turnaround analyses. In this context, our study introduces a novel pipeline of machine learning models within a comprehensive workflow management system, the Production and Distributed Analysis (PanDA) system. These models employ advanced machine learning techniques to predict key resource requirements, overcoming challenges posed by limited upfront knowledge of characteristics at each step. Accurate forecasts of resource requirements enable informed and proactive decision-making in workflow management, enhancing the efficiency of handling diverse, complex workflows across heterogeneous resources.

en cs.DC, cs.AI
arXiv Open Access 2025
On the use of information fusion techniques to improve information quality: Taxonomy, opportunities and challenges

Raúl Gutiérrez, Víctor Rampérez, Horacio Paggi et al.

The information fusion field has recently been attracting a lot of interest within the scientific community, as it provides, through the combination of different sources of heterogeneous information, a fuller and/or more precise understanding of the real world than can be gained considering the above sources separately. One of the fundamental aims of computer systems, and especially decision support systems, is to assure that the quality of the information they process is high. There are many different approaches for this purpose, including information fusion. Information fusion is currently one of the most promising methods. It is particularly useful under circumstances where quality might be compromised, for example, either intrinsically due to imperfect information (vagueness, uncertainty) or because of limited resources (energy, time). In response to this goal, a wide range of research has been undertaken over recent years. To date, the literature reviews in this field have focused on problem-specific issues and have been circumscribed to certain system types. Therefore, there is no holistic and systematic knowledge of the state of the art to help establish the steps to be taken in the future. In particular, aspects like what impact different information fusion methods have on information quality, how information quality is characterised, measured and evaluated in different application domains depending on the problem data type or whether fusion is designed as a flexible process capable of adapting to changing system circumstances and their intrinsically limited resources have not been addressed. This paper aims precisely to review the literature on research into the use of information fusion techniques specifically to improve information quality, analysing the above issues in order to identify a series of challenges and research directions, which are presented in this paper.

arXiv Open Access 2024
Information-theoretic Analysis of the Gibbs Algorithm: An Individual Sample Approach

Youheng Zhu, Yuheng Bu

Recent progress has shown that the generalization error of the Gibbs algorithm can be exactly characterized using the symmetrized KL information between the learned hypothesis and the entire training dataset. However, evaluating such a characterization is cumbersome, as it involves a high-dimensional information measure. In this paper, we address this issue by considering individual sample information measures within the Gibbs algorithm. Our main contribution lies in establishing the asymptotic equivalence between the sum of symmetrized KL information between the output hypothesis and individual samples and that between the hypothesis and the entire dataset. We prove this by providing explicit expressions for the gap between these measures in the non-asymptotic regime. Additionally, we characterize the asymptotic behavior of various information measures in the context of the Gibbs algorithm, leading to tighter generalization error bounds. An illustrative example is provided to verify our theoretical results, demonstrating our analysis holds in broader settings.
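
For orientation, the central quantity can be sketched in standard information-theoretic notation (this is the usual definition, not an excerpt from the paper): with $W$ the output hypothesis and $S = (Z_1, \ldots, Z_n)$ the training set,
$$ I_{\mathrm{SKL}}(W;S) = D(P_{W,S} \,\|\, P_W \otimes P_S) + D(P_W \otimes P_S \,\|\, P_{W,S}), $$
and the individual-sample counterpart discussed above is the sum $\sum_{i=1}^{n} I_{\mathrm{SKL}}(W;Z_i)$, whose asymptotic equivalence to $I_{\mathrm{SKL}}(W;S)$ is the paper's main result.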

en cs.IT
arXiv Open Access 2024
PyMatterSim: a Python Data Analysis Library for Computer Simulations of Materials Science, Physics, Chemistry, and Beyond

Y. -C. Hu, J. Tian

Computer simulation has become one of the most important tools in scientific research in many disciplines. Benefiting from the dynamical trajectories regulated by versatile interatomic interactions, various material properties can be quantitatively characterized at the atomic scale. This greatly deepens our understanding of Nature and provides incredible insights supplementing experimental observations. Hitherto, a plethora of literature discusses the computational discoveries in studying glasses in which positional disorder is inherent in their configurations. Motivated by active research and knowledge sharing, we developed a data analysis library in Python for computational materials science research. We hope to help promote scientific progress and narrow some technical gaps for the wide communities. The toolkit mainly focuses on physical analyses of glassy properties from the open-source simulator LAMMPS. Nevertheless, the code design renders high flexibility, with functionalities extendable to other computational tools. The library provides data-driven insights for different subjects and can be incorporated into advanced machine-learning workflows. The scope of the data analysis methodologies applies not only to materials science but also to physics, chemistry, and beyond.

en cond-mat.mtrl-sci, cond-mat.soft
arXiv Open Access 2024
Unboxing Default Argument Breaking Changes in 1 + 2 Data Science Libraries

João Eduardo Montandon, Luciana Lourdes Silva, Cristiano Politowski et al.

Data Science (DS) has become a cornerstone for modern software, enabling data-driven decisions to improve companies' services. Following modern software development practices, data scientists use third-party libraries to support their tasks. As the APIs provided by these tools often require an extensive list of arguments to be set up, data scientists rely on default values to simplify their usage. It turns out that these default values can change over time, leading to a specific type of breaking change, defined as a Default Argument Breaking Change (DABC). This work reveals 93 DABCs in three Python libraries frequently used in Data Science tasks -- Scikit Learn, NumPy, and Pandas -- studying their potential impact on more than 500K client applications. We find that the occurrence of DABCs varies significantly depending on the library; 35% of Scikit Learn clients are affected, while only 0.13% of NumPy clients are impacted. The main reason for introducing DABCs is to enhance API maintainability, but they often change the function's behavior. We discuss the importance of managing DABCs in third-party DS libraries and provide insights for developers to mitigate the potential impact of these changes in their applications.
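
To make the notion of a Default Argument Breaking Change concrete, here is a minimal Python sketch with an invented library function (the names are hypothetical; the paper itself catalogues real DABCs in Scikit Learn, NumPy, and Pandas):

def normalize_v1(data, method="minmax"):
    # Hypothetical "v1.0": "minmax" is the default, so callers who pass nothing get min-max scaling.
    return _scale(data, method)

def normalize_v2(data, method="zscore"):
    # Hypothetical "v2.0": identical signature, but the default silently becomes "zscore" -- a DABC.
    return _scale(data, method)

def _scale(data, method):
    if method == "zscore":
        mean = sum(data) / len(data)
        sd = (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5
        return [(x - mean) / sd for x in data]
    lo, hi = min(data), max(data)
    return [(x - lo) / (hi - lo) for x in data]

sample = [1.0, 2.0, 3.0, 4.0]
print(normalize_v1(sample))  # [0.0, 0.333..., 0.667..., 1.0] -- old default behaviour
print(normalize_v2(sample))  # approx [-1.34, -0.45, 0.45, 1.34] -- changed, though no call site was edited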

en cs.SE
DOAJ Open Access 2022
The Effectiveness of Arabic Stemmers Using Arabized Word Removal

Hamood ALshalabi, Sabrina Tiun, Nazlia Omar et al.

Other languages have influenced Arabic because of several factors, such as geographical nearness, trade communication, past Islamic conquests, science and technology, new devices, brand names, models, and fashion. As a result of these factors, foreign words are used in Arabic text and are known as Arabised words. Arabised words affect the Arabic natural language processing (NLP) task because identifying a correct stem or root from an Arabic word becomes more difficult. Therefore, a more efficient Arabic NLP can be developed if Arabised word removal is part of a pre-processing task. In this paper, we propose an algorithm for detecting and extracting Arabised words as a pre-processing task for an Arabic stemming task. This algorithm is a combination of lexicon-based and rule-based approaches. The lexicon list has been developed based on various sources of Arabic text resources, and the rule-based algorithm has been designed to cater to Arabised words with definite articles and use pattern matching on prefixes and suffixes. To evaluate the effectiveness of the proposed Arabised word removal algorithm on the Arabic NLP task, we use Arabised word removal as part of pre-processing in Arabic stemmers. Three Arabic stemmers are used in our evaluation, namely, light stemming, condition light and ARLS, on three types of Arabic standard datasets. Comparisons were made by measuring the performance of precision, recall and IFC on the stemmers with or without our Arabised word removal pre-processing. Results show that the performance on all the stemmers improves if Arabised word removal is included as part of the stemming's pre-processing. Therefore, an efficient Arabic NLP application or task can be developed if Arabised word removal is included in the pre-processing stage for Arabic NLP application, mainly Arabic stemming.
https://dorl.net/dor/20.1001.1.20088302.2022.20.4.6.5

Information resources (General), Transportation and communications
DOAJ Open Access 2020
Revistas iberoamericanas de comunicación a través de las bases de datos Latindex, Dialnet, DOAJ, Scopus, AHCI, SSCI, REDIB, MIAR, ESCI y Google Scholar Metrics

Rafael Gonzalez-Pardo, Rafael Repiso, Jesús Arroyave-Cabrera

This paper analyses Ibero-American Communication journals, their characteristics and their presence in Latindex, Dialnet, DOAJ, Scopus, AHCI, SSCI, REDIB, MIAR, ESCI and Google Scholar Metrics (GSM). The journals indexed in these products are examined, comparing their coverage, characteristics, representation by country, publication frequency, age and output measured in articles. The aim of this work is to identify the scholarly Communication journals of the Ibero-American region and then to study their presence in the main journal databases. Substantive elements such as nationality, the nature of the publishing institutions, frequency, output and age are analysed. Most of the journals are found to come from educational institutions and to publish on a semi-annual basis. Latindex has the largest number of publications, followed by GSM. The journals belonging to SSCI and Scopus are those most widely present across the journal databases.

Bibliography. Library science. Information resources
DOAJ Open Access 2020
Eulogy for the Information Age

David Lankes

This is the text of a 2017 address to the ALIA New Librarians Symposium 8. The author takes on the concepts and language associated with an information approach to librarianship. Particular attention is given to examining the role of libraries as an answer to access issues rather than as an agent of impact. The Data, Information, Knowledge, Wisdom hierarchy is used to focus the role of librarians on impact and knowledge over data and information. This shift from industrial-scale provision of materials, to information, to knowledge requires a new approach to the study of librarians and library service. A knowledge focus – what has been referred to as the knowledge school of thought in librarianship – necessitates the use of learning theory, in particular constructivist concepts of learning.

Bibliography. Library science. Information resources
DOAJ Open Access 2020
Aplicación y evaluación del modelo MM5 para pronóstico de lluvia y temperatura en Chihuahua, México

Daniel Núñez-López, Víctor Manuel Reyes-Gómez, Óscar Alejandro Viramontes-Olivas et al.

In Chihuahua there is no local model for short-term forecasting of extreme rainfall and temperature that takes mesoscale atmospheric processes into account. The objective of this study was to adapt and test the efficiency of the MM5 model for predicting extreme temperature and rainfall conditions in Chihuahua. A forecast system was set up on a workstation that generates hourly rainfall and temperature forecast maps, up to two days ahead, for the whole state of Chihuahua on a grid with 8 km resolution. Visual tests show that MM5 is correct in more than 90% of rainfall events and can correctly estimate rainfall depth and extreme temperatures in about 80% of forecasts. For rainfall, the model may underestimate in the plains and valleys of Chihuahua and predicts correctly in the mountain region, while for maximum and minimum temperatures (measured at 15:00 and 7:00, respectively) it generally overestimates by 0.75 to 2.14 °C. Statistical tests of the model's efficiency for forecasting rainfall and extreme temperatures show significant values, so the model can be used with a high degree of confidence in Chihuahua (model efficiency values above 0.57). Keywords: Mesoscale meteorological model, local weather forecast, high resolution, model validation

Information resources (General)
arXiv Open Access 2020
Stellar Tidal Disruption Events with Abundances and Realistic Structures (STARS): Library of Fallback Rates

Jamie A. P. Law-Smith, David A. Coulter, James Guillochon et al.

We present the STARS library, a grid of tidal disruption event (TDE) simulations interpolated to provide the mass fallback rate ($dM/dt$) to the black hole for a main-sequence star of any stellar mass, stellar age, and impact parameter. We use a one-dimensional stellar evolution code to construct stars with accurate stellar structures and chemical abundances, then perform tidal disruption simulations in a three-dimensional adaptive-mesh hydrodynamics code with a Helmholtz equation of state, in unprecedented resolution: from 131 to 524 cells across the diameter of the star. The interpolated library of fallback rates is available on GitHub (https://github.com/jamielaw-smith/STARS_library) and version 1.0.0 is archived on Zenodo; one can query the library for any stellar mass, stellar age, and impact parameter. We provide new fitting formulae for important disruption quantities ($\beta_{\rm crit}$, $\Delta M$, $\dot M_{\rm peak}$, $t_{\rm peak}$, $n_\infty$) as a function of stellar mass, stellar age, and impact parameter. Each of these quantities varies significantly with stellar mass and stellar age, but we are able to reduce all of our simulations to a single relationship that depends only on stellar structure, characterized by a single parameter $\rho_c/\bar\rho$, and impact parameter $\beta$. We also find that, in general, more centrally concentrated stars have steeper $dM/dt$ rise slopes and shallower decay slopes. For the same $\Delta M$, the $dM/dt$ shape varies significantly with stellar mass, promising the potential determination of stellar properties from the TDE light curve alone. The $dM/dt$ shape depends strongly on stellar structure and to a certain extent stellar mass, meaning that fitting TDEs using this library offers a better opportunity to determine the nature of the disrupted star and the black hole.
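
As a rough illustration of the query pattern such an interpolated grid enables (a generic sketch with synthetic numbers and SciPy's RegularGridInterpolator; the actual STARS_library interface is documented in the repository linked above and may differ):

import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Synthetic axes standing in for the grid dimensions described in the abstract:
# stellar mass (solar masses), fractional stellar age, and impact parameter beta.
masses = np.linspace(0.3, 3.0, 10)
ages = np.linspace(0.0, 1.0, 5)
betas = np.linspace(0.5, 4.0, 8)

# Placeholder peak fallback rates on that grid; the real values come from the simulations.
rng = np.random.default_rng(0)
peak_mdot = rng.random((masses.size, ages.size, betas.size))

interpolator = RegularGridInterpolator((masses, ages, betas), peak_mdot)
print(interpolator([[1.0, 0.5, 1.5]]))  # interpolated value for a 1 Msun, half-age star at beta = 1.5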

en astro-ph.HE, astro-ph.SR
arXiv Open Access 2020
Resilience and elasticity of co-evolving information ecosystems

María J. Palazzi, Albert Solé-Ribalta, Violeta Calleja-Solanas et al.

Human perceptual and cognitive abilities are limited resources. Today, in the age of cheap information --cheap to produce, to manipulate, to disseminate--, this cognitive bottleneck translates into hypercompetition for visibility among actors (individuals, institutions, etc). The same social communication incentive --visibility-- pushes actors to mutualistically interact with specific memes, seeking the virality of their messages. In turn, contents are driven by selective pressure, i.e. the chances to persist and reach widely are tightly subject to changes in the communication environment. In spite of all this complexity, here we show that the underlying architecture of the users-memes interaction in information ecosystems, apparently chaotic and noisy, actually evolves towards emergent patterns, reminiscent of those found in natural ecosystems. In particular we show, through the analysis of empirical, large data streams, that communication networks are structurally elastic, i.e. fluctuating from modular to nested architecture as a response to environmental perturbations (e.g. extraordinary events). We then propose an ecology-inspired modelling framework, bringing to light the precise mechanisms causing the observed dynamical reorganisation. Finally, from numerical simulations, the model predicts --and the data confirm-- that the users' struggle for visibility induces a re-equilibration of the network towards a very constrained organisation: the emergence of self-similar nested arrangements.

en physics.soc-ph, cs.SI
arXiv Open Access 2019
Non-Stochastic Information Theory

Anshuka Rangi, Massimo Franceschetti

In an effort to develop the foundations for a non-stochastic theory of information, the notion of $\delta$-mutual information between uncertain variables is introduced as a generalization of Nair's non-stochastic information functional. Several properties of this new quantity are illustrated, and used to prove a channel coding theorem in a non-stochastic setting. Namely, it is shown that the largest $\delta$-mutual information between received and transmitted codewords over $\epsilon$-noise channels equals the $(\epsilon, \delta)$-capacity. This notion of capacity generalizes the Kolmogorov $\epsilon$-capacity to packing sets of overlap at most $\delta$, and is a variation of a previous definition proposed by one of the authors. Results are then extended to more general noise models, and to non-stochastic, memoryless, stationary channels. Finally, sufficient conditions are established for the factorization of the $\delta$-mutual information and to obtain a single letter capacity expression. Compared to previous non-stochastic approaches, the presented theory admits the possibility of decoding errors as in Shannon's probabilistic setting, while retaining a worst-case, non-stochastic character.

en cs.IT
DOAJ Open Access 2018
Appropriation of Information, Knowledge Construction and the Role of Mediator

Heloá Cristina Oliveira-DelMassa, Oswaldo Francisco Almeida Junior

Knowledge construction is a complex process in which the widespread use of the term information is not, by itself, sufficient to cover its nuances. This leads to a host of discussions on the importance of the appropriation of information, which is a key concept for understanding the mediation of information. This study aims to explore the following questions: What is the relation between the appropriation of information and knowledge construction? Would mediation be the way to link these two notions? The overall purpose of this article is to assess the links between the appropriation of information and knowledge construction. Knowledge construction, the interaction of the subject, and the appropriation of information are themes explored through bibliographical research. The results obtained could clarify the importance of, and thereby outline, the mediating posture of the information professional, as well as underline some aspects of the discussions on the characterization of information.

Bibliography. Library science. Information resources
arXiv Open Access 2018
Stability of Local Information based Centrality Measurements under Degree Preserving Randomizations

Chandni Saxena, M. N. Doja, Tanvir Ahmad

Node centrality is one of the integral measures in network analysis, with a wide range of applications from socio-economic analysis to personalized recommendation. We argue that an effective centrality measure should remain stable even under information loss or noise introduced in the network. Using six local information-based centrality metrics, we investigate the effect of varying assortativity while keeping the degree distribution unchanged, on networks with scale-free and exponential degree distributions. This model provides a novel scope for analyzing the stability of centrality metrics, which can further find many applications in social science, biology, information science, community detection and so on.
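
A minimal sketch of this kind of stability probe, using NetworkX (assumed tooling; the paper's six metrics and assortativity-tuning procedure are not reproduced, only the degree-preserving randomization idea):

import networkx as nx
import numpy as np

def semi_local_score(G):
    # Illustrative local-information score: a node's degree plus the degrees of its neighbours.
    return {n: G.degree(n) + sum(G.degree(m) for m in G[n]) for n in G}

G = nx.barabasi_albert_graph(500, 3, seed=1)                 # scale-free degree sequence
before = semi_local_score(G)

H = G.copy()
nx.double_edge_swap(H, nswap=2000, max_tries=20000, seed=1)  # rewires edges, degrees preserved
after = semi_local_score(H)

nodes = sorted(G.nodes())
corr = np.corrcoef([before[n] for n in nodes], [after[n] for n in nodes])[0, 1]
print(f"Pearson correlation of the score before/after rewiring: {corr:.3f}")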

en cs.SI, physics.soc-ph
arXiv Open Access 2018
Noisy Private Information Retrieval: On Separability of Channel Coding and Information Retrieval

Karim Banawan, Sennur Ulukus

We consider the problem of noisy private information retrieval (NPIR) from $N$ non-communicating databases, each storing the same set of $M$ messages. In this model, the answer strings are not returned through noiseless bit pipes, but rather through \emph{noisy} memoryless channels. We aim at characterizing the PIR capacity for this model as a function of the statistical information measures of the noisy channels such as entropy and mutual information. We derive a general upper bound for the retrieval rate in the form of a max-min optimization. We use the achievable schemes for the PIR problem under asymmetric traffic constraints and random coding arguments to derive a general lower bound for the retrieval rate. The upper and lower bounds match for $M=2$ and $M=3$, for any $N$, and any noisy channel. The results imply that separation between channel coding and retrieval is optimal except for adapting the traffic ratio from the databases. We refer to this as \emph{almost separation}. Next, we consider the private information retrieval problem from multiple access channels (MAC-PIR). In MAC-PIR, the database responses reach the user through a multiple access channel (MAC) that mixes the responses together in a stochastic way. We show that for the additive MAC and the conjunction/disjunction MAC, channel coding and retrieval scheme are \emph{inseparable} unlike in NPIR. We show that the retrieval scheme depends on the properties of the MAC, in particular on the linearity aspect. For both cases, we provide schemes that achieve the full capacity without any loss due to the privacy constraint, which implies that the user can exploit the nature of the channel to improve privacy. Finally, we show that the full unconstrained capacity is not always attainable by determining the capacity of the selection channel.

en cs.IT, cs.CR
arXiv Open Access 2017
Shared High Value Research Resources: The CamCAN Human Lifespan Neuroimaging Dataset Processed on the Open Science Grid

Don Krieger, Paul Shepard, Ben Zusman et al.

The CamCAN Lifespan Neuroimaging Dataset, Cambridge (UK) Centre for Ageing and Neuroscience, was acquired and processed beginning in December, 2016. The referee consensus solver deployed to the Open Science Grid was used for this task. The dataset includes demographic and screening measures, a high-resolution MRI scan of the brain, and whole-head magnetoencephalographic (MEG) recordings during eyes closed rest (560 sec), a simple task (540 sec), and passive listening/viewing (140 sec). The data were collected from 619 neurologically normal individuals, ages 18-87. The processed results from the resting recordings are completed and available online. These constitute 1.7 TBytes of data including the location within the brain (1 mm resolution), time stamp (1 msec resolution), and 80 msec time course for each of 3.7 billion validated neuroelectric events, i.e. mean 6.1 million events for each of the 619 participants. The referee consensus solver provides high yield (mean 11,000 neuroelectric currents/sec; standard deviation (sd): 3500/sec) high confidence ($p < 10^{-12}$ for each identified current) measures of the neuroelectric currents whose magnetic fields are detected in the MEG recordings. We describe the solver, the implementation of the solver deployed on the Open Science Grid, the workflow management system, the opportunistic use of high performance computing (HPC) resources to add computing capacity to the Open Science Grid reserved for this project, and our initial findings from the recently completed processing of the resting recordings. This required 14 million core hours, i.e. 40 core hours per second of data.

en q-bio.NC, cs.DC

Page 17 of 4064