Results for "Bibliography. Library science. Information resources"

Showing 20 of ~18,656,806 results · from DOAJ, CrossRef, arXiv

DOAJ Open Access 2026
Wasteback Machine: a method for quantitative measurement of the archived web

David Mahoney

Introduction. Web archives are traditionally viewed as repositories of cultural memory, yet they have been theorised as computational sources for quantitative, longitudinal analysis of the web. This paper examines their potential for mapping the structural and environmental impacts of web pages, demonstrating broader applicability for web analytics research. Method. We introduce Wasteback Machine, an open-source, extensible framework that operationalises the analytical potential of web archives. It enables reproducible, scalable measurement of page size and composition through programmatic access, structured resource extraction and mechanisms to mitigate distortions introduced during archiving and replay. Analysis. The method is demonstrated through a case study of the United Nations Climate Change (UNFCCC) homepage, performing longitudinal analyses to capture temporal dynamics in size and compositional evolution. By situating web content within socio-technical and infrastructural contexts, the approach allows consistent comparison over time while accounting for archival limitations. Results. Findings reveal trends in page growth, complexity and cumulative digital resource use. Despite their fragmentary nature, web archives provide sufficient fidelity to reconstruct historical practices and estimate relative environmental impacts. Conclusion. Wasteback Machine demonstrates that web archives function as computational infrastructures, enabling rigorous, evidence-based investigation of web evolution and the environmental footprint of digital content.
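The page-size and composition measurement the abstract describes amounts to aggregating the resource records of an archived capture. A minimal sketch of that aggregation step, with hypothetical record values (this is not Wasteback Machine's actual API or data):

```python
from collections import defaultdict

def page_composition(resources):
    """Aggregate one capture's resource records into total bytes per MIME type.

    `resources` is a list of (mime_type, size_in_bytes) pairs, e.g. as
    extracted from an archive's index for a single snapshot.
    """
    totals = defaultdict(int)
    for mime, size in resources:
        totals[mime] += size
    return dict(totals), sum(totals.values())

# Hypothetical records for one archived capture of a homepage.
snapshot = [
    ("text/html", 45_000),
    ("text/css", 120_000),
    ("application/javascript", 600_000),
    ("image/jpeg", 900_000),
    ("image/jpeg", 300_000),
]
by_type, total = page_composition(snapshot)
```

Repeating this over captures spaced across years yields the longitudinal size and composition series the paper analyses.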

Bibliography. Library science. Information resources
arXiv Open Access 2026
SKILLFOUNDRY: Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources

Shuaike Shen, Wenduo Cheng, Mingqian Ma et al.

Modern scientific ecosystems are rich in procedural knowledge across repositories, APIs, scripts, notebooks, documentation, databases, and papers, yet much of this knowledge remains fragmented across heterogeneous artifacts that agents cannot readily operationalize. This gap between abundant scientific know-how and usable agent capabilities is a key bottleneck for building effective scientific agents. We present SkillFoundry, a self-evolving framework that converts such resources into validated agent skills: reusable packages that encode task scope, inputs and outputs, execution steps, environment assumptions, provenance, and tests. SkillFoundry organizes a target domain as a domain knowledge tree, mines resources from high-value branches, extracts operational contracts, compiles them into executable skill packages, and then iteratively expands, repairs, merges, or prunes the resulting library through a closed-loop validation process. SkillFoundry produces a substantially novel and internally valid skill library, with 71.1% of mined skills differing from existing skill libraries such as SkillHub and SkillSMP. We demonstrate that these mined skills improve coding agent performance on five of the six MoSciBench datasets. We further show that SkillFoundry can design new task-specific skills on demand for concrete scientific objectives, and that the resulting skills substantially improve performance on two challenging genomics tasks: cell type annotation and the scDRS workflow. Together, these results show that automatically mined skills improve agent performance on benchmarks and domain-specific tasks, expand coverage beyond hand-crafted skill libraries, and provide a practical foundation for more capable scientific agents.

en cs.AI
arXiv Open Access 2026
ADAB: Arabic Dataset for Automated Politeness Benchmarking -- A Large-Scale Resource for Computational Sociopragmatics

Hend Al-Khalifa, Nadia Ghezaiel, Maria Bounnit et al.

The growing importance of culturally-aware natural language processing systems has led to an increasing demand for resources that capture sociopragmatic phenomena across diverse languages. Nevertheless, Arabic-language resources for politeness detection remain under-explored, despite the rich and complex politeness expressions embedded in Arabic communication. In this paper, we introduce ADAB (Arabic Politeness Dataset), a new annotated Arabic dataset collected from four online platforms, including social media, e-commerce, and customer service domains, covering Modern Standard Arabic and multiple dialects (Gulf, Egyptian, Levantine, and Maghrebi). The dataset was annotated based on Arabic linguistic traditions and pragmatic theory, resulting in three classes: polite, impolite, and neutral. It contains 10,000 samples with linguistic feature annotations across 16 politeness categories and achieves substantial inter-annotator agreement (kappa = 0.703). We benchmark 40 model configurations, including traditional machine learning, transformer-based models, and large language models. The dataset aims to support research on politeness-aware Arabic NLP.
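The inter-annotator agreement the abstract reports (kappa = 0.703) is Cohen's kappa, which can be computed for any two annotation runs. A minimal sketch with hypothetical labels (not the ADAB data):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labelling the same items."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of items where the annotators match.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement if each annotator labelled independently,
    # keeping their own label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum((ca[l] / n) * (cb[l] / n) for l in set(ca) | set(cb))
    return (observed - expected) / (1 - expected)

ann1 = ["polite", "impolite", "neutral", "polite", "neutral", "polite"]
ann2 = ["polite", "impolite", "polite", "polite", "neutral", "impolite"]
k = cohens_kappa(ann1, ann2)
```

Values above roughly 0.6 are conventionally read as substantial agreement, which is the interpretation the abstract applies to 0.703.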

en cs.CL
DOAJ Open Access 2025
Notas del equipo editorial sobre el vol. 8, núm. 2

Elizabeth Treviño, Víctor Manuel Bañuelos Aquino

This issue of the journal Bibliographica (Instituto de Investigaciones Bibliográficas, UNAM), in its Bibliographia, Monographia and Instrumenta sections, surveys objects and discourses of written culture, from incunabula and the nineteenth-century press to graffiti and fanfiction. The articles collected here address unconventional texts and media, and rethink the canon from new critical perspectives.

General bibliography, Information resources (General)
CrossRef Open Access 2024
Information Support of the Library and Information Sphere: Updating Dissertation Researches

Marina I. Kamysheva

The article reviews domestic dissertations devoted to the information support of library specialists. Using studies conducted in the late 20th and early 21st centuries, it traces the development of library and information science in the context of advancing information technology, and shows the growing demand among scientists and library practitioners for industry-specific information, both full-text and bibliographic. The possibilities for creating and delivering such information products to consumers are analysed, taking into account social changes as well as material, resource and technological conditions, which makes this topic relevant at present.

1 citation · en
DOAJ Open Access 2024
MOOC y otros dispositivos de instrucción en línea de bibliotecas universitarias argentinas a partir de la pandemia de COVID-19

Nancy Blanco, Gabriela De Pedro, Nancy Bentivegna et al.

This study investigates the participation of public and private university libraries in Argentina in the use and management of massive open online courses (MOOCs) and other online learning devices in the wake of the COVID-19 pandemic. Building on the results obtained by the research team in a previous project on the same topic, an exploratory, multi-methodological and non-experimental study will be carried out to understand the role of university libraries in the delivery of MOOCs and other online learning devices during the pandemic and post-pandemic period, with emphasis on programmes offered free and open. In addition, the extent to which the libraries under analysis are linked to university technology services will be explored, and the mechanisms for the storage, dissemination and digital preservation of MOOCs and other online learning devices will be studied.

Bibliography. Library science. Information resources
arXiv Open Access 2024
OpenChemIE: An Information Extraction Toolkit For Chemistry Literature

Vincent Fan, Yujie Qian, Alex Wang et al.

Information extraction from chemistry literature is vital for constructing up-to-date reaction databases for data-driven chemistry. Complete extraction requires combining information across text, tables, and figures, whereas prior work has mainly investigated extracting reactions from single modalities. In this paper, we present OpenChemIE to address this complex challenge and enable the extraction of reaction data at the document level. OpenChemIE approaches the problem in two steps: extracting relevant information from individual modalities and then integrating the results to obtain a final list of reactions. For the first step, we employ specialized neural models that each address a specific task for chemistry information extraction, such as parsing molecules or reactions from text or figures. We then integrate the information from these modules using chemistry-informed algorithms, allowing for the extraction of fine-grained reaction data from reaction condition and substrate scope investigations. Our machine learning models attain state-of-the-art performance when evaluated individually, and we meticulously annotate a challenging dataset of reaction schemes with R-groups to evaluate our pipeline as a whole, achieving an F1 score of 69.5%. Additionally, the reaction extraction results of OpenChemIE attain an accuracy score of 64.3% when directly compared against the Reaxys chemical database. We provide OpenChemIE freely to the public as an open-source package, as well as through a web interface.

en cs.LG, cs.CL
DOAJ Open Access 2023
Medical Data Literacy Education System in Reproducibility Crisis

KONG Xianghui, SUN Pu

[Purpose/Significance] Biomedical research is suffering from a reproducibility crisis, which has become an important issue amid the rise of the data-intensive research paradigm. As one of the most important attributes of empirical, data-driven research, reproducibility must be improved through researchers' good data practices, so effectively improving researchers' data literacy is key to resolving the crisis. However, relevant research is largely lacking. This paper aims to establish a new data literacy education system oriented to the reproducibility crisis, in order to fill the current research gap and provide a reference for implementing such education in our country. [Method/Process] First, using content analysis, the paper clarifies the relationship between the reproducibility crisis and data literacy: inappropriate data behaviour by researchers can cause serious problems in research data, methods, processes, environments and results, ultimately leading to irreproducible research. The concept of data literacy education is then redefined. Second, drawing on existing foreign research and practice, the paper builds a Reproducibility Data Literacy Education (Re-DLE) system from the perspectives of educational goals and content, subjects and objects, teaching methods, implementation strategies, and evaluation. Finally, it proposes the guarantee factors necessary for the system to operate.
[Results/Conclusions] The ultimate goal of Re-DLE is to improve research reproducibility. Its educational content framework is built on data life-cycle theory and divided into three dimensions: re-data awareness, re-data skills, and re-data ethics, each comprising clear educational objectives, subject modules and detailed instructions. Medical libraries have a wealth of teaching experience and should become the main educational body for the broader biomedical research community. Diversified training methods, teaching strategies and evaluation methods should be established; in other words, the team building of teaching librarians should be strengthened, the foundation of educational resources consolidated, educational exchanges promoted, and internal and external cooperation improved, so as to advance the Re-DLE system. The results of this paper constitute a theoretical breakthrough and provide a theoretical basis for developing and implementing such education. Owing to methodological limitations, however, this remains a qualitative study with problems still to be solved; future work should apply empirical research methods to build a more scientific and effective Re-DLE system.

Bibliography. Library science. Information resources, Agriculture
arXiv Open Access 2023
Shared Information for a Markov Chain on a Tree

Sagnik Bhattacharya, Prakash Narayan

Shared information is a measure of mutual dependence among multiple jointly distributed random variables with finite alphabets. For a Markov chain on a tree with a given joint distribution, we give a new proof of an explicit characterization of shared information. The Markov chain on a tree is shown to possess a global Markov property based on graph separation; this property plays a key role in our proofs. When the underlying joint distribution is not known, we exploit the special form of this characterization to provide a multiarmed bandit algorithm for estimating shared information, and analyze its error performance.

arXiv Open Access 2022
Lessons from Deep Learning applied to Scholarly Information Extraction: What Works, What Doesn't, and Future Directions

Raquib Bin Yousuf, Subhodip Biswas, Kulendra Kumar Kaushal et al.

Understanding key insights from full-text scholarly articles is essential, as it enables us to identify interesting trends, gain insight into research and development, and build knowledge graphs. However, some of these key insights are only available when considering the full text. Although researchers have made significant progress in information extraction from short documents, extracting scientific entities from full-text scholarly literature remains a challenging problem. This work presents an automated end-to-end research entity extractor, EneRex, which extracts technical facets such as dataset usage, objective task, and method from full-text scholarly research articles. Additionally, we extract three novel facets from full-text articles: links to source code, computing resources, and programming languages/libraries. We demonstrate how EneRex extracts key insights and trends from a large-scale dataset in the domain of computer science. We further test our pipeline on multiple datasets and find that EneRex improves upon a state-of-the-art model. We highlight how existing datasets are limited in their capacity and how EneRex may fit into an existing knowledge graph. We also present a detailed discussion with pointers for future research. Our code and data are publicly available at https://github.com/DiscoveryAnalyticsCenter/EneRex.

en cs.IR, cs.AI
arXiv Open Access 2022
Network science approach for identifying disruptive elements of an airline

Vinod Kumar Chauhan, Anna Ledwoch, Alexandra Brintrup et al.

Currently, flight delays are common and they propagate from an originating flight to connecting flights, leading to large disruptions in the overall schedule. These disruptions cause massive economic losses, affect airlines' reputations, waste passengers' time and money, and directly impact the environment. This study adopts a network science approach for solving the delay propagation problem by modeling and analyzing the flight schedules and historical operational data of an airline. We aim to determine the most disruptive airports, flights, flight-connections, and connection types in an airline network. Disruptive elements are influential or critical entities in an airline network. They are the elements that can either cause (airline schedules) or have caused (historical data) the largest disturbances in the network. An airline can improve its operations by avoiding delays caused by the most disruptive elements. The proposed network science approach for disruptive element analysis was validated using a case study of an operating airline. The analysis indicates that potential disruptive elements in a schedule of an airline are also actual disruptive elements in the historical data and they should be considered to improve operations. The airline network exhibits small-world effects and delays can propagate to any part of the network with a minimum of four delayed flights. Finally, we observed that passenger connections between flights are the most disruptive connection type. Therefore, the proposed methodology provides a tool for airlines to build robust flight schedules that reduce delays and propagation.
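The propagation mechanism the abstract describes, a delay spreading from an originating flight to connecting flights, can be approximated by counting how many downstream flights each flight can reach. A minimal sketch with a hypothetical connection graph (not the airline's data or the paper's exact centrality measures):

```python
from collections import deque

def downstream_reach(connections, flight):
    """Number of flights a delay at `flight` can reach via connections (BFS)."""
    seen = {flight}
    queue = deque([flight])
    while queue:
        f = queue.popleft()
        for nxt in connections.get(f, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) - 1  # exclude the flight itself

# Hypothetical graph: edges point from a feeding flight to the flights
# that wait on its passengers, crew, or aircraft.
connections = {
    "F1": ["F2", "F3"],
    "F2": ["F4"],
    "F3": ["F4", "F5"],
    "F4": [],
    "F5": ["F6"],
}
ranking = sorted(connections,
                 key=lambda f: downstream_reach(connections, f),
                 reverse=True)
```

Flights at the top of such a ranking are candidate disruptive elements: re-timing or buffering them limits how far a single delay can cascade.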

en physics.soc-ph, cs.CE
DOAJ Open Access 2021
Leveraging Existing Technology: Developing a Trusted Digital Repository for the U.S. Geological Survey

Vivian B. Hutchison, Tamar Norkin, Maddison L. Langseth et al.

As Federal Government agencies in the United States pivot to increase access to scientific data (Sheehan, 2016), the U.S. Geological Survey (USGS) has made substantial progress (Kriesberg et al., 2017). USGS authors are required to make federally funded data publicly available in an approved data repository (USGS, 2016b). This type of public data product, known as a USGS data release, serves as a method for publishing reviewed and approved data. In this paper, we present major milestones in the approach the USGS took to transition an existing technology platform to a Trusted Digital Repository. We describe both the technical and the non-technical actions that contributed to a successful outcome. We highlight how initial workflows revealed patterns that were later automated, and the ways in which assessments and user feedback influenced design and implementation. The paper concludes with lessons learned, such as the importance of a community of practice, application programming interface (API)-driven technologies, iterative development, and user-centered design. This paper is intended to offer a potential roadmap for organizations pursuing similar goals.

Bibliography. Library science. Information resources
arXiv Open Access 2021
A partial information decomposition for discrete and continuous variables

Kyle Schick-Poland, Abdullah Makkeh, Aaron J. Gutknecht et al.

Conceptually, partial information decomposition (PID) is concerned with separating the information contributions several sources hold about a certain target by decomposing the corresponding joint mutual information into contributions such as synergistic, redundant, or unique information. Despite PID conceptually being defined for any type of random variables, so far, PID could only be quantified for the joint mutual information of discrete systems. Recently, a quantification for PID in continuous settings for two or three source variables was introduced. Nonetheless, no ansatz has managed to both quantify PID for more than three variables and cover general measure-theoretic random variables, such as mixed discrete-continuous, or continuous random variables yet. In this work we will propose an information quantity, defining the terms of a PID, which is well-defined for any number or type of source or target random variable. This proposed quantity is tightly related to a recently developed local shared information quantity for discrete random variables based on the idea of shared exclusions. Further, we prove that this newly proposed information-measure fulfills various desirable properties, such as satisfying a set of local PID axioms, invariance under invertible transformations, differentiability with respect to the underlying probability density, and admitting a target chain rule.

en cs.IT, math.LO
arXiv Open Access 2020
YAM2: Yet another library for the $M_2$ variables using sequential quadratic programming

Chan Beom Park

The $M_2$ variables are devised to extend $M_{T2}$ by promoting transverse masses to Lorentz-invariant ones and making explicit use of on-shell mass relations. Unlike simple kinematic variables such as the invariant mass of visible particles, where the variable definitions directly provide how to calculate them, the calculation of the $M_2$ variables is undertaken by employing numerical algorithms. Essentially, the calculation of $M_2$ corresponds to solving a constrained minimization problem in mathematical optimization, and various numerical methods exist for the task. We find that the sequential quadratic programming method performs very well for the calculation of $M_2$, and its numerical performance is even better than the method implemented in the existing software package for $M_2$. As a consequence of our study, we have developed and released yet another software library, YAM2, for calculating the $M_2$ variables using several numerical algorithms.
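The calculation the abstract describes is a constrained minimization, which SQP solvers handle directly. A toy sketch using SciPy's SLSQP with a stand-in quadratic objective and a stand-in equality constraint (not YAM2's actual kinematic functions; in the real problem the objective is the larger parent mass as a function of the invisible momenta, and the constraint enforces the measured missing momentum):

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in for the kinematic objective (hypothetical, for illustration only).
def objective(v):
    x, y = v
    return (x - 1.0) ** 2 + (y - 2.0) ** 2

# Stand-in for the missing-momentum equality constraint: x + y = 2.
constraints = [{"type": "eq", "fun": lambda v: v[0] + v[1] - 2.0}]

res = minimize(objective, x0=np.zeros(2), method="SLSQP",
               constraints=constraints)
# res.x holds the constrained minimizer; res.fun the minimum value.
```

For this toy problem the Lagrange conditions give x = 0.5, y = 1.5, so the solver's answer is easy to check by hand, unlike the genuine M2 objective, which is non-smooth and motivates comparing several numerical algorithms as the paper does.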

en hep-ph, hep-ex
DOAJ Open Access 2019
La Unió Europea com a cap i garant del sistema d'avaluació i difusió de la producció científica

Caldera Serrano, Jorge

The article introduces and explains a model that could be used to assess and disseminate publicly financed scientific activity more effectively. The pyramidal model would take as its head the European Union, or an EU-dependent body created for this purpose, charged with appraising the scientific knowledge produced by the EU's member states and their administrative regions and with making this knowledge more visible. Inspired by the San Francisco Declaration, the model provides an alternative to using the journal impact factor as a quality measure by reinstating expert peer review and incorporating the information generated with public money into a repository with different quality levels.

Bibliography. Library science. Information resources, Communication. Mass media
DOAJ Open Access 2019
Library Science in the System of Sciences of Noocommunicological Cycle: Terminology Aspect

Shvetsova-Vodka Halyna

The aim of the study is to review scientific disciplines that can be considered metatheories in relation to library science, and to substantiate the qualification of library science as a noocommunicological discipline. The methods applied are the systems (structural-functional) approach and terminological and conceptual analysis. Library science is well characterised as an information and document-communication science, a constituent of the social-communication complex of sciences. Information science arose and developed simultaneously as a theory of information and a theory of communication: the information approach generated the name "informology", and the communication approach the name "communicology". Social informatics is equivalent to the theory of social information communication and can be adopted as noocommunicology. The noocommunicological approach rests on recognising the informational nature of the social communications that sustain the noosphere. Within noocommunicology one can distinguish a theory of social information, communicativistics (the science of mass communication), scientific informatics (the science of scholarly communication), and documentology as a complex of sciences about the document. Within documentology one can distinguish such complex sciences as document science (the science of document preparation), archival science, and bibliology (the science of book business, or book culture). Library science is one of the disciplines of the bibliological complex. The library as a social institution is documentological and informological, since it deals with documents that are the means of preserving and transmitting information in society. The bibliosphere is the sphere of social communication in which information is transferred in time and space through the creation, distribution, storage and use of the book as a special type of document. Results of the study:
library science is a socio-humanitarian, informological, noocommunicological, documentological and bibliological scientific discipline that investigates the library as a social institution within the bibliosphere, understood as a terminal documentary-communication system. In the system of sciences of the noocommunicological cycle, library science can be defined as the noocommunicological discipline that studies the organisation of library work connected with the accumulation, arrangement and use of knowledge fixed in documents of the book type. Further research on the terminological aspects of library science theory could be devoted to a deeper analysis of each of the aforementioned disciplines.

Bibliography. Library science. Information resources

Page 22 of 932,841