Directed Information: Estimation, Optimization and Applications in Communications and Causality
Dor Tsur, Oron Sabag, Navin Kashyap
et al.
Directed information (DI) is an information measure that attempts to capture directionality in the flow of information from one random process to another. It is closely related to other causal influence measures, such as transfer entropy, Granger causality, and Pearl's causal framework. This monograph provides an overview of DI and its main application in information theory, namely, characterizing the capacity of channels with feedback and memory. We begin by reviewing the definitions of DI, its basic properties, and its relation to Shannon's mutual information. Next, we provide a survey of DI estimation techniques, ranging from classic plug-in estimators to modern neural-network-based estimators. Considering the application of channel capacity estimation, we describe how such estimators numerically optimize DI rate over a class of joint distributions on input and output processes. A significant part of the monograph is devoted to techniques to compute the feedback capacity of finite-state channels (FSCs). The feedback capacity of a strongly connected FSC involves the maximization of the DI rate from the channel input process to the output process. This maximization is performed over the class of causal conditioned probability input distributions. When the FSC is also unifilar, i.e., the next state is given by a time-invariant function of the current state and the new input-output symbol pair, the feedback capacity is the optimal average reward of an appropriately formulated Markov decision process (MDP). This MDP formulation has been exploited to develop several methods to compute exactly, or at least estimate closely, the feedback capacity of a unifilar FSC. This monograph describes these methods, starting from the value iteration algorithm, to Q-graph methods, and reinforcement learning algorithms that can handle large input and output alphabets.
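For orientation, the relation between DI and mutual information mentioned in the abstract can be written out using the standard definition due to Massey. The notation below ($X^n = (X_1, \dots, X_n)$, and likewise $Y^n$) is an assumption made here for illustration, and the monograph's own notation may differ.

```latex
% Directed information from X^n to Y^n (Massey's definition):
I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i ; Y_i \mid Y^{i-1})

% For comparison, the chain rule for mutual information:
I(X^n ; Y^n) = \sum_{i=1}^{n} I(X^n ; Y_i \mid Y^{i-1})

% Directed information rate, when the limit exists:
\lim_{n \to \infty} \frac{1}{n} I(X^n \to Y^n)
```

The only difference between the first two sums is that DI conditions on the inputs up to time $i$ rather than on the whole input sequence, which is what makes the measure directional.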
Harmonizing Community Science Datasets to Model Highly Pathogenic Avian Influenza (HPAI) in Birds in the Subantarctic
Richard Littauer, Kris Bubendorfer
Community science observational datasets are useful in epidemiology and ecology for modeling species distributions, but the heterogeneous nature of the data presents significant challenges for standardization, data quality assurance and control, and workflow management. In this paper, we present a data workflow for cleaning and harmonizing multiple community science datasets, which we implement in a case study using eBird, iNaturalist, GBIF, and other datasets to model the impact of highly pathogenic avian influenza in populations of birds in the subantarctic. We predict population sizes for several species whose demographics are not known, and we present new estimates of potential HPAI mortality rates for those species, based on a novel aggregated dataset of mortality rates in the subantarctic.
Cross Mutual Information
Chetan Gohil, Oliver M Cliff, James M. Shine
et al.
Mutual information (MI) is a useful information-theoretic measure to quantify the statistical dependence between two random variables: $X$ and $Y$. Often, we are interested in understanding how the dependence between $X$ and $Y$ in one set of samples compares to another. Although the dependence between $X$ and $Y$ in each set of samples can be measured separately using MI, these estimates cannot be compared directly if they are based on samples from a non-stationary distribution. Here, we propose an alternative measure for characterising how the dependence between $X$ and $Y$ as defined by one set of samples is expressed in another, \textit{cross mutual information}. We present a comprehensive set of simulation studies sampling data with $X$-$Y$ dependencies to explore this measure. Finally, we discuss how this relates to measures of model fit in linear regression, and some future applications in neuroimaging data analysis.
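As a point of reference for the baseline the abstract describes (measuring MI separately in each set of samples), here is a minimal sketch of a plug-in MI estimate for paired discrete samples. It is not the authors' cross mutual information measure; the function name `plugin_mutual_information` and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def plugin_mutual_information(xs, ys):
    """Plug-in (empirical-frequency) MI estimate, in nats, for paired discrete samples."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # empirical joint counts
    count_x = Counter(xs)         # empirical marginal counts of X
    count_y = Counter(ys)         # empirical marginal counts of Y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


# Two hypothetical sample sets: one fully dependent, one independent.
set_a = ([0, 1, 0, 1], [0, 1, 0, 1])
set_b = ([0, 0, 1, 1], [0, 1, 0, 1])
mi_a = plugin_mutual_information(*set_a)
mi_b = plugin_mutual_information(*set_b)
```

Each call only describes the dependence within its own sample set, which is the limitation that motivates comparing how the dependence found in one set is expressed in another.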
Evaluation of a Virtual Laboratory Platform in General Education on Quantum Information Science
Hongbin Song
This paper presents the findings of pedagogical research on the efficacy of a virtual laboratory platform in general education courses on quantum information science. Specifically, a virtual laboratory activity based on the Bell test has been developed using a commercially available Quantum Optical Simulation Laboratory, QLab. The experiential activity is designed to help undergraduates from diverse academic disciplines understand the counterintuitive yet foundational concept of quantum entanglement. Qualitative and quantitative evaluations conducted over three academic years using carefully designed questionnaires indicated that the virtual laboratory enabled over 80% of students to grasp the complex concepts of quantum entanglement. These results demonstrate the effectiveness of the virtual laboratory in making abstract quantum concepts accessible and engaging, regardless of students' prior knowledge of advanced mathematics or technical skills. Despite certain limitations, such as the relatively small sample sizes in the last two semesters, this study offers valuable insights and a practical framework for addressing the challenges of teaching quantum information science in undergraduate curricula, particularly within general education courses designed for both science and non-science students.
Publication Trend in DESIDOC Journal of Library and Information Technology during 2013-2017: A Scientometric Approach
M Sadik Batcha, S Roselin Jahina, Muneer Ahmad
DESIDOC Journal of Library & Information Technology (DJLIT), formerly known as DESIDOC Bulletin of Information Technology, is a peer-reviewed, open-access, bimonthly journal. This paper presents a scientometric analysis of the journal. It analyses the growth pattern of the research output published in the journal, the pattern of authorship, author productivity, and the subjects covered by the papers over the period 2013-2017. It is found that 227 papers were published during the period of study (2013-2017). The majority of articles were collaborative in nature. The journal's main subject concentration is scientometrics. Most articles (65%) were between 6 and 10 pages long. The study applied standard formulae and statistical tools to obtain the results.
On the Separability of Information in Diffusion Models
Akhil Premkumar
Diffusion models transform noise into data by injecting information that was captured in their neural network during the training phase. In this paper, we ask: \textit{what} is this information? We find that, in pixel-space diffusion models, (1) a large fraction of the total information in the neural network is committed to reconstructing small-scale perceptual details of the image, and (2) the correlations between images and their class labels are informed by the semantic content of the images, and are largely agnostic to the low-level details. We argue that these properties are intrinsically tied to the manifold structure of the data itself. Finally, we show that these facts explain the efficacy of classifier-free guidance: the guidance vector amplifies the mutual information between images and conditioning signals early in the generative process, influencing semantic structure, but tapers out as perceptual details are filled in.
en
cs.LG, cond-mat.stat-mech
Isolated, individualised, and immobilised: information behaviour in the context of academic casualisation
Rebekah Willson, Owen Stewart-Robertson, Heidi Julien
et al.
Introduction. Universities rely increasingly on contract academic staff for teaching and research activities; yet, working in precarious conditions, these staff face significant challenges in finding relevant workplace information, in engaging with colleagues, and in building their careers. This study examines contract academic staff perceptions of precarity and workplace marginalisation, focusing on the implications of situational and environmental influences on their information practices.
Method. In-depth, semi-structured interviews with 34 contract academic staff, working in various disciplines across Canadian universities, were conducted to examine their information practices.
Analysis. Interview data were analysed using reflexive thematic analysis, drawing on everyday life information seeking and information marginalisation theories.
Results. Results of the study show that 1) contract academic staff conduct their work within isolated information environments; 2) this isolation leads these staff to develop highly individualised information practices; and 3) the information activities of contract academic staff are often immobilised, due to the precarious contexts that shape their work and personal lives.
Conclusion. Precarious employment and information marginalisation are deeply entwined for contract academic staff. This results in frustration, disappointment, and uncertainty with their work and personal circumstances. Institutional challenges can seem intractable, particularly where task-related information provision (when available) cannot address systemic concerns.
Bibliography. Library science. Information resources
Linking environmental and well-being information to urban planning processes through co-creation and participation
Anna Suorsa, Anna-Maija Multas, Heidi Enwald
et al.
Bibliography. Library science. Information resources
The Asymptotic Behaviour of Information Leakage Metrics
Sophie Taylor, Praneeth Kumar Vippathalla, Justin P. Coon
Information theoretic leakage metrics quantify the amount of information about a private random variable $X$ that is leaked through a correlated revealed variable $Y$. They can be used to evaluate the privacy of a system in which an adversary, from whom we want to keep $X$ private, is given access to $Y$. Global information theoretic leakage metrics quantify the overall amount of information leaked upon observing $Y$, whilst their pointwise counterparts define leakage as a function of the particular realisation $Y=y$ that the adversary sees, and thus can be viewed as random variables. We consider an adversary who observes a large number of independent identically distributed realisations of $Y$. We formalise the essential asymptotic behaviour of an information theoretic leakage metric, considering in turn what this means for pointwise and global metrics. With the resulting requirements in mind, we take an axiomatic approach to defining a set of pointwise leakage metrics, as well as a set of global leakage metrics that are constructed from them. The global set encompasses many known measures including mutual information, Sibson mutual information, Arimoto mutual information, maximal leakage, min entropy leakage, $f$-divergence metrics, and g-leakage. We prove that both sets follow the desired asymptotic behaviour. Finally, we derive composition theorems which quantify the rate of privacy degradation as an adversary is given access to a large number of independent observations of $Y$. It is found that, for both pointwise and global metrics, privacy degrades exponentially with increasing observations for the adversary, at a rate governed by the minimum Chernoff information between distinct conditional channel distributions. This extends the work of Wu et al. (2024), who have previously found this to be true for certain known metrics, including some that fall into our more general set.
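The rate quantity named at the end of the abstract can be made concrete with the standard definition of Chernoff information between two distributions $P$ and $Q$ on a finite alphabet. Reading "minimum Chernoff information between distinct conditional channel distributions" as a minimum over pairs $x \neq x'$ is an interpretation assumed here, not a statement taken from the paper.

```latex
% Chernoff information between distributions P and Q on a finite alphabet:
C(P, Q) = -\min_{\lambda \in [0,1]} \log \sum_{y} P(y)^{\lambda} \, Q(y)^{1-\lambda}

% Assumed reading of the rate in the abstract:
\min_{x \neq x'} C\bigl(P_{Y \mid X = x},\; P_{Y \mid X = x'}\bigr)
```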
On Local Mutual-Information Privacy
Khac-Hoang Ngo, Johan Östman, Alexandre Graell i Amat
Local mutual-information privacy (LMIP) is a privacy notion that aims to quantify the reduction of uncertainty about the input data when the output of a privacy-preserving mechanism is revealed. We study the relation of LMIP with local differential privacy (LDP), the de facto standard notion of privacy in context-independent (CI) scenarios, and with local information privacy (LIP), the state-of-the-art notion for context-dependent settings. We establish explicit conversion rules, i.e., bounds on the privacy parameters for an LMIP mechanism to also satisfy LDP/LIP, and vice versa. We use our bounds to formally verify that LMIP is a weak privacy notion. We also show that uncorrelated Gaussian noise is the best-case noise in terms of CI-LMIP if both the input data and the noise are subject to an average power constraint.
The Role of Data in an Emerging Research Community:
Danielle Pollock, An Yan, Michelle Parker
et al.
Open science data benefit society by facilitating convergence across domains that are examining the same scientific problem. While cross-disciplinary data sharing and reuse is essential to the research done by convergent communities, so far little is known about the role data play in how these communities interact. An understanding of the role of data in these collaborations can help us identify and meet the needs of emerging research communities which may predict the next challenges faced by science. This paper represents an exploratory study of one emerging community, the environmental health community, examining how environmental health research groups form, collaborate, and share data. Five key insights about the role of data in emerging research communities are identified and suggestions are made for further research.
Bibliography. Library science. Information resources
Analysis of state decrees on electronic records management systems in light of archival governance
Josemar Henrique de Melo, Julianne Teixeira, Rita de Cássia São Paio de Azeredo Esteves
This article analyses the decrees of the federative entities on the use of digital media for managing administrative records, from the perspective of archival governance. It considers the advances and limitations these decrees contain and the impact of these legal frameworks on the production, preservation, and access of the digital archival records of the state public administration. It is documentary, applied research with a qualitative-quantitative approach and a descriptive character. The use of computerized means for managing administrative processes by the states is established by each federative entity in the exercise of its powers, through decrees or laws that define the state structure for producing, capturing, processing, preserving, describing, and accessing digital records. The survey of state and federal decrees was carried out in two stages: the first using the Google search engine, and the second through information requests in the Sistema Eletrônico de Informações ao Cidadão (e-SIC). The theoretical foundation was structured in light of archival governance, with emphasis on the norms, standards, and recommendations established by the Conselho Nacional de Arquivos. As a result, the analysed decrees were found to depart from archival principles, which may compromise, for example, the presumption of authenticity, the legal-evidentiary character, and the integrity of archival records, putting at risk access, public transparency, and the preservation of the state administrative memory for future generations.
Bibliography. Library science. Information resources
Building Bridges: Establishing a Dialogue Between Software Engineering Research and Computational Science
Reed Milewicz, Miranda Mundt
There has been growing interest within the computational science and engineering (CSE) community in engaging with software engineering research -- the systematic study of software systems and their development, operation, and maintenance -- to solve challenges in scientific software development. Historically, there has been little interaction between scientific computing and the field, which has held back progress. With the ranks of scientific software teams expanding to include software engineering researchers and practitioners, we can work to build bridges to software science and reap the rewards of evidence-based practice in software development.
Information-Theoretic Analysis of Minimax Excess Risk
Hassan Hafez-Kolahi, Behrad Moniri, Shohreh Kasaei
Two main concepts studied in machine learning theory are generalization gap (difference between train and test error) and excess risk (difference between test error and the minimum possible error). While information-theoretic tools have been used extensively to study the generalization gap of learning algorithms, the information-theoretic nature of excess risk has not yet been fully investigated. In this paper, some steps are taken toward this goal. We consider the frequentist problem of minimax excess risk as a zero-sum game between the algorithm designer and the world. Then, we argue that it is desirable to modify this game in a way that the order of play can be swapped. We then prove that, under some regularity conditions, if the world and designer can play randomly the duality gap is zero and the order of play can be changed. In this case, a Bayesian problem surfaces in the dual representation. This makes it possible to utilize recent information-theoretic results on minimum excess risk in Bayesian learning to provide bounds on the minimax excess risk. We demonstrate the applicability of the results by providing information theoretic insight on two important classes of problems: classification when the hypothesis space has finite VC-dimension, and regularized least squares.
Academic libraries and research data management: a literature review
Érika Rayanne Silva de Carvalho, F. Leite, P. Bertin
This article aimed to analyze the challenges faced by academic libraries in research data management, as presented in the scientific literature. To this end, a bibliographic investigation was carried out using the Library and Information Science Abstracts (LISA) database. After careful evaluation, sixteen articles were selected, which formed the basis for a descriptive model of research data management in academic libraries. The results showed that the main actions undertaken by libraries are the elaboration of the policy and of the research data management plan, the development of technological infrastructure, the processing and analysis of data, the sharing and preservation of data, and the training of researchers and librarians. However, libraries face challenges such as dealing with disciplinary differences, the lack of financial resources, raising researchers' awareness of the need to develop a research data management plan, the creation of data repositories, the definition of standards for data sharing and archiving, and the lack of training for librarians in research data management services. To overcome these challenges, it is concluded that the professional updating of librarians is essential.
1 citation
en
Political Science
Problems in the dissemination of phonograms by heritage institutions and solutions through management processes
Luis-Fernando Ramos-Simón, Ignacio Miró-Charbonnier
The management of phonograms in heritage institutions raises countless difficulties, especially from the perspective of the rights of the works' rights holders. The study describes a records management process, represented by flowcharts, aimed at resolving the main difficulties that managers face when disseminating phonogram collections. Regarding the situation of the possible rights holders of the phonograms, the study analyses in turn: a) the different types of rights over sound documents; b) the duration of protection of authors' rights; c) the duration of related rights; and d) some specific problems of copyright management in heritage institutions. Overall, the study offers a tool for addressing most of the problems that the dissemination of phonograms may raise in this setting, and demonstrates the advantages of having a copyright policy in institutions, both for the internal management of their phonogram collections and for their dissemination in today's society.
Bibliography. Library science. Information resources, Bibliography
VFSIE -- Development and Testing Framework for Federated Science Instruments
Anees Al-Najjar, Nageswara S. V. Rao, Neena Imam
et al.
Recent developments in softwarization of networked infrastructures combined with containerization of computing workflows promise unprecedented compute-anywhere-and-everywhere capabilities for federations of edge and remote computing systems and science instruments. The development and testing of software stacks that implement these capabilities over physical production federations, however, is neither practical nor cost-effective. In response, we develop a digital twin of the physical infrastructure, called the Virtual Federated Science Instrument Environment (VFSIE). This framework emulates the federation using containers and hosts connected over an emulated network, and supports the development and testing of federation stacks and workflows. We illustrate its use in a case study involving Jupyter Notebook computations and instrument control.
Research on teaching Portuguese as a second language for foreigners
Mateus Gabilanes Rodrigues, Airton José Vinholi Júnior
Reimagining Canadian Art Practices and Art Collections
J. Dufour, S. Ellis, J. Latour
The authors examine two Canadian art initiatives that librarians from Canadian universities have undertaken at individual and institutional levels. The first project addresses an in-progress artists’ biographical dictionary that focuses on an under-documented form of art practice and situates the dictionary within an evolving landscape of biographical art reference resources in Canada. The second initiative reports on a collection management project that assembles essential Canadiana print material and recontextualizes it with renewed visibility and access. These projects are supplemented with an extensive literature review by a third art librarian that parses the library and information science literature related to these two topics and focuses on Canadian scholarship, where available, as a frame of reference. Together, the three sections of this article enrich the bio-bibliographic information about, and exhibition histories of, Canadian artists while improving access to essential research publications and collections.
Classification on the Web: an analysis of Dewey Linked Data
Kazumi Tomoyose, Ana Carolina Simionato Arakaki
The availability of information on the World Wide Web facilitates its access and retrieval by users, and the knowledge and techniques of the Library and Information Science (LIS) field can be applied to this environment to help with the process. The present study is descriptive, qualitative and exploratory, based on bibliographical sources, and explores how the Classification discipline interacts with Linked Data, focusing on the analysis of Dewey Linked Data. Of the four catalogs analyzed, referred to in the literature as adhering to Dewey Linked Data, only two actually have links in their records redirecting to the system. Despite this, its presence in The Linked Open Data Cloud appears as a positive factor in its dissemination, since it boosts its visibility. It is concluded that the Classification discipline allows the thematic standardization of information resources, so that there is uniformity in the Web environment and quality retrieval of information, while promoting interoperability between data in the Linked Data context. The standardization of metadata values using classifications optimizes the representation of information and its retrieval on the Web, while also enabling the reuse of data. In addition, studies that align the area of Library and Information Science with the Semantic Web and its technologies can provide new perspectives for the area and address users' ever-changing needs, thus fulfilling the objective of the field.
1 citation
en
Computer Science