Results for "Bibliography. Library science. Information resources"

Showing 20 of ~81,272 results · from arXiv, DOAJ

DOAJ Open Access 2026
Repository renewal project: a case study from White Rose Libraries

Thom Blake, Andy Bussey, Kate Petherbridge

White Rose Libraries (WRL) is a collaboration between the university libraries of Leeds, Sheffield and York. WRL runs two shared repositories to host research outputs and electronic theses which, between them, represent one of the biggest institutional repository services in the UK. Starting in 2021, WRL undertook the repository renewal project to identify the next iteration of its repository platforms. In this case study we discuss the motivation for the project and the process that was undertaken. We describe its outcomes, chief among them a conscious decision to retain and further develop the open source EPrints platform, and highlight some of the lessons learned. We reflect that the open source research repository market has remained largely static for some time, and we intend this as a provocation for institutions and platform developers to further engage in defining the requirements for future repositories and setting a course to get there.

Bibliography. Library science. Information resources
arXiv Open Access 2025
LEAP: LLM-powered End-to-end Automatic Library for Processing Social Science Queries on Unstructured Data

Chuxuan Hu, Austin Peters, Daniel Kang

Social scientists are increasingly interested in analyzing the semantic information (e.g., emotion) of unstructured data (e.g., Tweets), where the semantic information is not natively present. Performing this analysis in a cost-efficient manner requires using machine learning (ML) models to extract the semantic information and subsequently analyze the now structured data. However, this process remains challenging for domain experts. To demonstrate the challenges in social science analytics, we collect a dataset, QUIET-ML, of 120 real-world social science queries in natural language and their ground truth answers. Existing systems struggle with these queries since (1) they require selecting and applying ML models, and (2) more than a quarter of these queries are vague, making standard tools like natural-language-to-SQL systems unsuitable. To address these issues, we develop LEAP, an end-to-end library that answers social science queries in natural language with ML. LEAP filters vague queries to ensure that the answers are deterministic and selects from internally supported and user-defined ML functions to extend the unstructured data to structured tables with the necessary annotations. LEAP further generates and executes code to respond to these natural language queries. LEAP achieves a 100% pass@3 and 92% pass@1 on QUIET-ML, with a $1.06 average end-to-end cost, of which code generation costs $0.02.
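The pass@k figures quoted above have a simple operational reading: a query counts as solved at k if any of its first k attempts produces a correct answer. A minimal sketch (the function name and data layout are illustrative, not LEAP's actual API):

```python
def pass_at_k(attempt_results, k):
    """Fraction of queries solved within the first k attempts.

    attempt_results: one list of booleans per query, one entry per
    attempt, True if that attempt produced the correct answer.
    """
    solved = sum(1 for attempts in attempt_results if any(attempts[:k]))
    return solved / len(attempt_results)

# Three hypothetical queries, up to three attempts each:
results = [[True], [False, True], [False, False, False]]
print(pass_at_k(results, 1))  # one of three solved on the first attempt
print(pass_at_k(results, 3))  # two of three solved within three attempts
```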

arXiv Open Access 2025
Are Information Retrieval Approaches Good at Harmonising Longitudinal Survey Questions in Social Science?

Wing Yan Li, Zeqiang Wang, Jon Johnson et al.

Automated detection of semantically equivalent questions in longitudinal social science surveys is crucial for long-term studies informing empirical research in the social, economic, and health sciences. Retrieving equivalent questions faces dual challenges: inconsistent representation of theoretical constructs (i.e. concept/sub-concept) across studies as well as between question and response options, and the evolution of vocabulary and structure in longitudinal text. To address these challenges, our multi-disciplinary collaboration of computer scientists and survey specialists presents a new information retrieval (IR) task of identifying concept (e.g. Housing, Job, etc.) equivalence across question and response options to harmonise longitudinal population studies. This paper investigates multiple unsupervised approaches on a survey dataset spanning 1946-2020, including probabilistic models, linear probing of language models, and pre-trained neural networks specialised for IR. We show that IR-specialised neural models achieve the highest overall performance, with other approaches performing comparably. Additionally, re-ranking the probabilistic model's results with neural models introduces only modest improvements, of at most 0.07 in F1-score. Qualitative post-hoc evaluation by survey specialists shows that models generally have low sensitivity to questions with high lexical overlap, particularly in cases where sub-concepts are mismatched. Altogether, our analysis serves to further research on harmonising longitudinal studies in social science.
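The lexical-overlap pitfall the survey specialists noted is easy to reproduce with a purely lexical scorer. A minimal bag-of-words cosine similarity (a toy stand-in for the probabilistic baselines, not one of the paper's models):

```python
from collections import Counter
from math import sqrt

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two survey questions."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm

# Different concepts (roughly Housing vs. Transport), yet a high score
# because four of five tokens overlap:
print(cosine("do you own your house", "do you own your car"))
```

Concept-aware retrieval has to push such pairs apart even though their surface forms are nearly identical, which is exactly where the evaluated models were found to struggle.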

en cs.CL, cs.IR
DOAJ Open Access 2025
Avaliação dos Atributos dos Programas de Compliance para o desenvolvimento do Sistema Blockchain no Contexto das organizações

Henrique Rodrigues Lelis, Daniel Jardim Pardini, Eloy Pereira Lemos Junior

Compliance programs have legal, administrative and technological attributes that help organizations find solutions related to strategy, management and organizational governance. In turn, blockchain has been described as a digital system with potential for use in numerous activities, as any activity whose function is to protect and transfer digital assets can be impacted by the system. However, there are criticisms and reservations regarding its adoption by organizations, especially concerning the regulatory framework, corporate governance and technological management. From this perspective, it becomes relevant to relate the attributes of compliance programs to the development of blockchain in the organizational dimension, which is the aim of this research. The gap explored is describing the implications that the attributes of compliance programs can have for the development of blockchain technology in the context of organizations. To explore the topic, a panel of experts and a Delphi round were used to structure a survey that sought evidence of whether or not compliance programs contribute to the development of blockchain. This article presents the results relating to the organizational dimension of the doctoral thesis “Attributes of Compliance Programs for the blockchain, in the context of the Dimensions of the State, Organization and Individual”, defended by the first author in the doctoral program in Information Systems and Knowledge Management at Universidade Fumec, with UNIVERSIDADE FUMEC and FAPEMIG as funding institutions.

Social sciences (General), Bibliography. Library science. Information resources
arXiv Open Access 2024
Mining Weighted Sequential Patterns in Incremental Uncertain Databases

Kashob Kumar Roy, Md Hasibul Haque Moon, Md Mahmudur Rahman et al.

Due to the rapid development of science and technology, the importance of imprecise, noisy, and uncertain data is increasing at an exponential rate. Thus, mining patterns in uncertain databases has drawn the attention of researchers. Moreover, frequent sequences of items need to be discovered from these databases to extract meaningful, high-impact knowledge. In many real cases, weights are assigned to items and patterns as a measure of importance, so that interesting sequences can be found. Hence, a weight constraint needs to be handled while mining sequential patterns. Besides, the dynamic nature of databases makes mining important information more challenging. Instead of mining patterns from scratch after each increment, incremental mining algorithms utilize previously mined information to update the result immediately. Several algorithms exist to mine frequent patterns and weighted sequences from incremental databases; however, they are confined to precise data. Therefore, in this work we develop an algorithm to mine frequent sequences in an uncertain database. Furthermore, we propose two new techniques for mining when the database is incremental. Extensive experiments have been conducted for performance evaluation, and the analysis shows the efficiency of our proposed framework.
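The paper's own data structures and pruning techniques are not reproduced here; as a rough illustration of the quantities such algorithms track, the following sketches weighted expected support under the common assumption that event probabilities are independent, scoring each sequence by its best-probability embedding of the pattern (function names and the weighting scheme are illustrative):

```python
def max_embedding_prob(seq, pattern):
    """Best probability of any embedding of `pattern` (a subsequence)
    into `seq`, a list of (item, existence-probability) events."""
    dp = [1.0] + [0.0] * len(pattern)  # dp[j]: best prob of matching pattern[:j]
    for item, p in seq:
        for j in range(len(pattern), 0, -1):  # backwards so each event is used once
            if pattern[j - 1] == item:
                dp[j] = max(dp[j], dp[j - 1] * p)
    return dp[-1]

def weighted_expected_support(db, pattern, weights):
    """Expected support summed over sequences, scaled by the pattern's
    mean item weight (one common weighting scheme in the literature)."""
    exp_sup = sum(max_embedding_prob(seq, pattern) for seq in db)
    return exp_sup * sum(weights[i] for i in pattern) / len(pattern)

# Two uncertain sequences; items carry existence probabilities.
db = [[("a", 0.9), ("b", 0.8)], [("a", 0.5), ("c", 1.0), ("b", 0.5)]]
print(weighted_expected_support(db, ["a", "b"], {"a": 1.0, "b": 2.0, "c": 0.5}))
```

An incremental variant would keep such per-pattern statistics and update them when new sequences arrive, rather than rescanning the whole database.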

en cs.DB, cs.AI
arXiv Open Access 2023
A Logarithmic Decomposition for Information

Keenan J. A. Down, Pedro A. M. Mediano

The Shannon entropy of a random variable $X$ behaves in many ways like a signed measure. Previous work has concretized this connection by defining a signed measure $\mu$ on an abstract information space $\tilde{X}$, which is taken to represent the information that $X$ contains. This construction is sufficient to derive many measure-theoretic counterparts to information quantities such as the mutual information $I(X; Y) = \mu(\tilde{X} \cap \tilde{Y})$, the joint entropy $H(X,Y) = \mu(\tilde{X} \cup \tilde{Y})$, and the conditional entropy $H(X|Y) = \mu(\tilde{X}\, \setminus \, \tilde{Y})$. We demonstrate that there exists a much finer decomposition with intuitive properties, which we call the logarithmic decomposition (LD). We show that this signed measure space has the useful property that its logarithmic atoms are easily characterised as having negative or positive entropy, while also being coherent with Yeung's $I$-measure. We demonstrate the usability of our approach by re-examining the Gács-Körner common information from this new geometric perspective and characterising it in terms of our logarithmic atoms. We then highlight that our geometric refinement can account for an entire class of information quantities, which we call logarithmically decomposable quantities.
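The quoted identities combine by ordinary inclusion-exclusion for the signed measure; for instance, the mutual information expands as (standard identities, restated here only for orientation):

$$I(X;Y) = \mu(\tilde{X} \cap \tilde{Y}) = \mu(\tilde{X}) + \mu(\tilde{Y}) - \mu(\tilde{X} \cup \tilde{Y}) = H(X) + H(Y) - H(X,Y).$$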

en cs.IT
arXiv Open Access 2023
Textual Augmentation Techniques Applied to Low Resource Machine Translation: Case of Swahili

Catherine Gitau, Vukosi Marivate

In this work we investigate the impact of applying textual data augmentation to low-resource machine translation. There has been recent interest in approaches for training systems for languages with limited resources, and one popular approach is the use of data augmentation techniques, which aim to increase the quantity of data available to train the system. In machine translation, the majority of language pairs around the world are considered low-resource because they have little parallel data available, and the quality of neural machine translation (NMT) systems depends heavily on the availability of sizable parallel corpora. We study and apply three simple data augmentation techniques popularly used in text classification tasks: synonym replacement, random insertion, and contextual data augmentation, and compare their performance with baseline neural machine translation on English-Swahili (En-Sw) datasets. We report results in BLEU, ChrF and Meteor scores. Overall, the contextual data augmentation technique shows improvements in both the $EN \rightarrow SW$ and $SW \rightarrow EN$ directions. We see potential to use these methods in neural machine translation when more extensive experiments are done with diverse datasets.
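Of the three techniques, synonym replacement is the simplest to illustrate. A minimal sketch, with a tiny hand-written synonym table standing in for a real thesaurus such as WordNet (the table and function name are illustrative only, not the paper's implementation):

```python
import random

# Toy synonym table; real pipelines draw from a thesaurus or embeddings.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "joyful"],
}

def synonym_replacement(sentence, n, rng=random):
    """Replace up to n words that have known synonyms, chosen at random."""
    words = sentence.split()
    candidates = [i for i, w in enumerate(words) if w in SYNONYMS]
    for i in rng.sample(candidates, min(n, len(candidates))):
        words[i] = rng.choice(SYNONYMS[words[i]])
    return " ".join(words)

rng = random.Random(0)
print(synonym_replacement("the quick dog is happy", 1, rng))
```

Each augmented sentence is paired with the unchanged target-side translation, increasing the effective size of the parallel corpus.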

en cs.CL
DOAJ Open Access 2023
Smart City Planning Futures Studies

Mohammadreza Hasanpour

The present study was conducted to identify the future dimensions of smart city planning research. Participants were municipal managers and urban planners with at least 15 years of experience and a master's degree or higher, selected by purposive sampling with the participation of 10 experts. Data collection tools fell into two groups: (1) review of upstream and urban planning documents in the library section, and (2) semi-structured interviews in the field section, which continued until theoretical saturation was reached. The content analysis method was used to analyze the qualitative data. To ensure validity, the interview questions were approved by 3 experienced urban planning experts and managers, 1 of whom had a master's degree and 2 of whom had doctorates. To measure reliability, the Krippendorff coefficient was used, with an overall coefficient of 84%. ATLAS.ti software was used in the content analysis section, and ScenarioWizard software was used to identify future smart city planning research scenarios. The results of factor analysis show that out of 176 available indicators (items), 33 basic themes can be identified, and 9 categories of constructive themes were obtained. Finally, 9 scenarios were identified based on the importance of all 9 factors. The results indicate that the main outcomes of realizing smart cities and e-municipality are conditions for providing services to citizens in the healthiest way, eliminating corruption, creating new job opportunities, service and transformation in the economic and commercial sectors, an increased effective presence of the private sector and an improved business environment, reduced damage to the environment, smart governance, and increased satisfaction.

Bibliography. Library science. Information resources, Communication. Mass media
DOAJ Open Access 2023
Strategi Pengembangan Sumber Daya Perpustakaan Pusat Universitas Pendidikan Indonesia Melalui Kerja Sama Perpustakaan IAIN Salatiga

Rafi Helmi Rabani, Prijana Prijana

One way problems in libraries can be overcome is through collaboration between libraries. Collaboration can be carried out by any type of library, including state university libraries. University libraries share the functions of libraries in general, and the functions included in the Tri Dharma of Higher Education: education, research, and community service. The development of library resources at the Central Library of Universitas Pendidikan Indonesia is carried out in collaboration with the Salatiga State Islamic Institute (IAIN Salatiga) Library. This research aims to find out what the cooperation looks like, how it is carried out, and what the challenges are in collaborating between the two libraries. The research method used is qualitative, with data collected through interviews with the Head of the Library Services Division of the UPI Central Library. The results show that the cooperation is in the field of information services and is bound by a memorandum of agreement with six scopes of cooperation. Several challenges were found, namely the busy schedules of library managers, permission from the head of the library, and the budget for carrying out collaboration. Several scopes could not be realized in the implementation of the cooperation. This should not be a barrier to working together, but rather a motivation for both parties, because the priority is service to users.

Bibliography. Library science. Information resources
arXiv Open Access 2022
How Do Data Science Workers Communicate Intermediate Results?

Rock Yuren Pang, Ruotong Wang, Joely Nelson et al.

Data science workers increasingly collaborate on large-scale projects before communicating insights to a broader audience in the form of visualization. While prior work has modeled how data science teams, often with distinct roles and work processes, communicate knowledge to outside stakeholders, we know little about how data science workers communicate intermediate results before delivering the final products. In this work, we contribute a nuanced description of the intermediate communication process within data science teams. By analyzing interview data from 8 self-identified data science workers, we characterized this process along four factors: the types of audience, communication goals, shared artifacts, and mode of communication. We also identified overarching challenges in the current communication process and discussed design implications that might inform better tools for facilitating intermediate communication within data science teams.

en cs.HC
arXiv Open Access 2022
The Information Bottleneck Principle in Corporate Hierarchies

Cameron Gordon

The hierarchical nature of corporate information processing is a topic of great interest in economic and management literature. Firms are characterised by a need to make complex decisions, often aggregating partial and uncertain information, which greatly exceeds the attention capacity of constituent individuals. However, the efficient transmission of these signals is still not fully understood. Recently, the information bottleneck principle has emerged as a powerful tool for understanding the transmission of relevant information through intermediate levels in a hierarchical structure. In this paper we note that the information bottleneck principle may similarly be applied directly to corporate hierarchies. In doing so we provide a bridge between organisation theory and that of rapidly expanding work in deep neural networks (DNNs), including the use of skip connections as a means of more efficient transmission of information in hierarchical organisations.

en cs.SI, econ.TH
arXiv Open Access 2021
Mutual Information for Electromagnetic Information Theory Based on Random Fields

Zhongzhichao Wan, Jieao Zhu, Zijian Zhang et al.

Traditional channel capacity based on discrete spatial dimensions mismatches continuous electromagnetic fields. For a wireless communication system in a limited region, spatial discretization may result in information loss because the continuous field cannot be perfectly recovered from the sampling points. Therefore, an electromagnetic information theory based on spatially continuous electromagnetic fields becomes necessary to reveal the fundamental theoretical capacity bound of communication systems. In this paper, we propose schemes for analyzing the performance limit between continuous transceivers. Specifically, we model the communication process between two continuous regions by random fields. Then, for the white noise model, we use the Mercer expansion to derive the mutual information between the source and the destination. For the closed-form expression, an analytic method is introduced based on autocorrelation functions with rational spectra. Moreover, the Fredholm determinant is used for general autocorrelation functions to provide a numerical calculation scheme. Further work extends the white noise model to colored noise and discusses the mutual information under it. Finally, we build an ideal model with an infinite-length source and destination which shows a strong correspondence with the time-domain model in classical information theory. The mutual information and the capacity are derived through the spatial spectral density.
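The role of the Mercer expansion can be seen schematically: expanding the source autocorrelation as $K(s,t) = \sum_k \lambda_k \phi_k(s) \phi_k(t)$ turns the continuous Gaussian model with white noise of power $\sigma^2$ into independent parallel Gaussian channels, so that the mutual information takes the standard form (stated here for orientation, not necessarily the paper's exact expression):

$$I = \frac{1}{2} \sum_k \log\left(1 + \frac{\lambda_k}{\sigma^2}\right).$$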

en cs.IT
DOAJ Open Access 2020
Machine Readable Race: Constructing Racial Information in the Third Reich

Munn Luke

This paper examines how informational processing drove new structures of racial classification in the Third Reich. The Deutsche Hollerith-Maschinen Gesellschaft mbH (Dehomag) worked closely with the government in designing and integrating punch-card informational systems. As a German subsidiary of IBM, Dehomag deployed its technology initially for a census in order to provide a more detailed racial analysis of the population. However, the racial data was not detailed enough. The Nuremberg Race Laws provided a more precise and procedural definition of Jewishness that could be rendered machine-readable. As the volume and velocity of information in the Reich increased, Dehomag's technology was adopted by other agencies like the Race and Settlement Office, culminating in the vision of a single machinic number for each citizen. Through the lens of these proto-technologies, the paper demonstrates the historical interplay between race and information. Yet if the indexing and sorting of race anticipates big-data analytics, contemporary power is more sophisticated and subtle. The complexity of modern algorithmic regimes diffuses obvious racial markers, engendering a racism without race.

Bibliography. Library science. Information resources
DOAJ Open Access 2020
Open Science for private Interests? How the Logic of Open Science Contributes to the Commercialization of Research

Manuela Fernández Pinto

Financial conflicts of interest, several cases of scientific fraud, and research limitations from strong intellectual property laws have all led to questioning the epistemic and social justice appropriateness of industry-funded research. At first sight, the ideal of Open Science, which promotes transparency, sharing, collaboration, and accountability, seems to target precisely the type of limitations uncovered in commercially-driven research. The Open Science movement, however, has primarily focused on publicly funded research, has actively encouraged liaisons with the private sector, and has also created new strategies for commercializing science. As a consequence, I argue that Open Science ends up contributing to the commercialization of science, instead of overcoming its limitations. I use the examples of research publications and citizen science to illustrate this point. Accordingly, the asymmetry between private and public science, present in the current plea to open science, ends up compromising the values of transparency, democracy, and accountability.

Bibliography. Library science. Information resources
arXiv Open Access 2019
Weakly-Private Information Retrieval

Hsuan-Yin Lin, Siddhartha Kumar, Eirik Rosnes et al.

Private information retrieval (PIR) protocols make it possible to retrieve a file from a database without disclosing any information about the identity of the file being retrieved. These protocols have been rigorously explored from an information-theoretic perspective in recent years. While existing protocols strictly impose that no information is leaked on the file's identity, this work initiates the study of the tradeoffs that can be achieved by relaxing the requirement of perfect privacy. In case the user is willing to leak some information on the identity of the retrieved file, we study how the PIR rate, as well as the upload cost and access complexity, can be improved. For the particular case of replicated servers, we propose two weakly-private information retrieval schemes based on two recent PIR protocols and a family of schemes based on partitioning. Lastly, we compare the performance of the proposed schemes.
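For orientation, the benchmark these schemes relax is well characterized: with $N$ replicated non-colluding servers and $M$ files, the capacity of perfectly private PIR is known (Sun and Jafar, 2017; a known result, not stated in this abstract) to be

$$C = \left(1 + \frac{1}{N} + \cdots + \frac{1}{N^{M-1}}\right)^{-1},$$

and weakly-private schemes aim to exceed this rate, or reduce upload cost and access complexity, by permitting bounded leakage about the file index.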

en cs.IT
arXiv Open Access 2019
Discovering Differential Features: Adversarial Learning for Information Credibility Evaluation

Lianwei Wu, Yuan Rao, Ambreen Nazir et al.

A series of deep learning approaches extract a large number of credibility features to detect fake news on the Internet. However, these extracted features still include many irrelevant and noisy features that severely restrict the performance of the approaches. In this paper, we propose a novel model based on Adversarial Networks and inspired by the Shared-Private model (ANSP), which aims at removing common, irrelevant features from the extracted features for information credibility evaluation. Specifically, ANSP involves two tasks: one prevents the binary classification of true and false information in order to capture common features, relying on adversarial networks guided by reinforcement learning. The other extracts credibility features (henceforth, private features) from multiple types of credibility information and compares them with the common features through two strategies, i.e., orthogonality constraints and KL-divergence, to make the private features more differentiated. Experiments on the two six-label datasets LIAR and Weibo demonstrate that ANSP achieves state-of-the-art performance, boosting accuracy by 2.1% and 3.1%, respectively; experiments on the four-label Twitter16 dataset further validate the robustness of the model with a 1.8% performance improvement.

en cs.CY, cs.CL
arXiv Open Access 2018
Quantifying Biases in Online Information Exposure

Dimitar Nikolov, Mounia Lalmas, Alessandro Flammini et al.

Our consumption of online information is mediated by filtering, ranking, and recommendation algorithms that introduce unintentional biases as they attempt to deliver relevant and engaging content. It has been suggested that our reliance on online technologies such as search engines and social media may limit exposure to diverse points of view and make us vulnerable to manipulation by disinformation. In this paper, we mine a massive dataset of Web traffic to quantify two kinds of bias: (i) homogeneity bias, the tendency to consume content from a narrow set of information sources, and (ii) popularity bias, the selective exposure to content from top sites. Our analysis reveals different bias levels across several widely used Web platforms. Search exposes users to a diverse set of sources, while social media traffic tends to exhibit high popularity and homogeneity bias. When we focus our analysis on traffic to news sites, we find higher levels of popularity bias, with smaller differences across applications. Overall, our results quantify the extent to which our choices of online systems confine us inside "social bubbles."
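The paper's exact bias definitions are not reproduced here; as one illustrative way such a homogeneity measure can be built, the following scores a user's traffic by one minus the normalized Shannon entropy of their source distribution (the function and its normalization are a hypothetical stand-in, not the paper's measure):

```python
from collections import Counter
from math import log2

def homogeneity_bias(visits):
    """1 minus normalized Shannon entropy of a user's traffic across
    sources: 0.0 = uniformly spread over sources, 1.0 = a single source.
    Illustrative definition only."""
    counts = Counter(visits)
    if len(counts) == 1:
        return 1.0
    total = len(visits)
    h = -sum((c / total) * log2(c / total) for c in counts.values())
    return 1.0 - h / log2(len(counts))

print(homogeneity_bias(["a", "a", "a", "a"]))  # 1.0: all traffic to one source
print(homogeneity_bias(["a", "b", "c", "d"]))  # 0.0: uniform over four sources
```

Averaging such per-user scores over a platform's traffic gives a platform-level comparison of the kind the paper reports.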

en cs.SI, cs.CY
DOAJ Open Access 2018
Reassembling the Republic of Letters – A Linked Data Approach

Jouni Tuominen, Eetu Mäkelä, Eero Hyvönen et al.

Between 1500 and 1800, a revolution in postal communication allowed ordinary men and women to scatter letters across and beyond Europe. This exchange helped knit together what contemporaries called the respublica litteraria, or Republic of Letters, a knowledge-based civil society, crucial to that era’s intellectual breakthroughs, and formative of many modern European values and institutions. To enable effective Digital Humanities research on the epistolary data distributed in different countries and collections, metadata about the letters have been aggregated, harmonised, and provided for the research community through the Early Modern Letters Online (EMLO) catalogue. This paper discusses the idea and benefits of using Linked Data as the basis for a potential future framework for EMLO, and presents our experiences with a first demonstrator implementation of such a system.

Bibliography. Library science. Information resources
arXiv Open Access 2017
DES Science Portal: Creating Science-Ready Catalogs

Angelo Fausti Neto, Luiz da Costa, Aurelio Carnero Rosell et al.

We present a novel approach for creating science-ready catalogs through a software infrastructure developed for the Dark Energy Survey (DES). We integrate the data products released by the DES Data Management and additional products created by the DES collaboration in an environment known as DES Science Portal. Each step involved in the creation of a science-ready catalog is recorded in a relational database and can be recovered at any time. We describe how the DES Science Portal automates the creation and characterization of lightweight catalogs for DES Year 1 Annual Release, and show its flexibility in creating multiple catalogs with different inputs and configurations. Finally, we discuss the advantages of this infrastructure for large surveys such as DES and the Large Synoptic Survey Telescope. The capability of creating science-ready catalogs efficiently and with full control of the inputs and configurations used is an important asset for supporting science analysis using data from large astronomical surveys.

en astro-ph.IM

Page 19 of 4064