Results for "cs.DB"

Showing 20 of ~93754 results · from arXiv, DOAJ, CrossRef

arXiv Open Access 2024
DatAasee -- A Metadata-Lake as Metadata Catalog for a Virtual Data-Lake

Christian Himpe

Metadata management for distributed data sources is a long-standing but ever-growing problem. To counter this challenge in a research-data and library-oriented setting, this work constructs a data architecture derived from the data-lake: the metadata-lake. A proof-of-concept implementation of the proposed metadata aggregator is also presented and evaluated.

en cs.DB, cs.DL
arXiv Open Access 2024
Database Theory + X: Database Visualization

Eugene Wu

We draw a connection between data modeling and visualization, namely that a visualization specification defines a mapping from database constraints to visual representations of those constraints. Using this formalism, we show how many visualization design decisions are, in fact, data modeling choices and extend data visualization from single-dataset visualizations to database visualization.

en cs.DB, cs.HC
arXiv Open Access 2022
Tree edit distance for hierarchical data compatible with HMIL paradigm

Břetislav Šopík, Tomáš Strenáčik

We define an edit distance for hierarchically structured data compatible with the hierarchical multi-instance learning paradigm. An example of such data is a dataset represented in JSON format, where inner Array objects are interpreted as unordered bags of elements. We prove correct analytical properties of the defined distance.

en cs.DB, cs.LG
arXiv Open Access 2021
ConQuer-92 -- The revised report on the conceptual query language LISA-D

H. A. Proper

In this report the conceptual query language ConQuer-92 is introduced. This query language serves as the backbone of InfoAssistant's query facilities. Furthermore, this language can also be used for the specification of derivation rules (e.g. subtype defining rules) and textual constraints in InfoModeler. This report is solely concerned with a formal definition, and the explanation thereof, of ConQuer-92. The implementation of ConQuer-92 in SQL-92 will be treated in a separate report.

en cs.DB
arXiv Open Access 2021
Cost models for geo-distributed massively parallel streaming analytics

Anna-Valentini Michailidou, Anastasios Gounaris, Konstantinos Tsichlas

This report is part of the DataflowOpt project on optimization of modern dataflows and aims to introduce a data quality-aware cost model that covers the following aspects in combination: (1) heterogeneity in compute nodes, (2) geo-distribution, (3) massive parallelism, (4) complex DAGs and (5) streaming applications. Such a cost model can then be leveraged to devise cost-based optimization solutions that deal with task placement and operator configuration.

en cs.DB
arXiv Open Access 2019
Transactional Smart Contracts in Blockchain Systems

Victor Zakhary, Divyakant Agrawal, Amr El Abbadi

This paper presents TXSC, a framework that provides smart contract developers with transaction primitives. These primitives allow developers to write smart contracts without the need to reason about the anomalies that can arise due to concurrent smart contract function executions.

en cs.DB, cs.DC
arXiv Open Access 2018
The Historic Development of the Zooarchaeological Database OssoBook and the xBook Framework for Scientific Databases

Daniel Kaltenthaler, Johannes-Y. Lohrer

In this technical report, we describe the historic development of the zooarchaeological database OssoBook and the resulting framework xBook, a generic infrastructure for distributed, relational data management that is mainly designed for the needs of scientific data. We describe the concepts of the architecture and its most important features. We especially point out the Server-Client architecture, the synchronization process, the Launcher application, and the structure and features of the application.

en cs.DB
arXiv Open Access 2017
Index and Materialized View Selection in Data Warehouses

Kamel Aouiche, Jérôme Darmont

The aim of this article is to present an overview of the major families of state-of-the-art index and materialized view selection methods, and to discuss the issues and future trends in data warehouse performance optimization. We particularly focus on data mining-based heuristics we developed to reduce the selection problem complexity and target the most pertinent candidate indexes and materialized views.

en cs.DB
arXiv Open Access 2016
A Survey of RDF Data Management Systems

M. Tamer Özsu

RDF is increasingly being used to encode data for the semantic web and for data exchange. There have been a large number of works that address RDF data management. In this paper we provide an overview of these works.

en cs.DB
arXiv Open Access 2014
A Fast Minimal Infrequent Itemset Mining Algorithm

Kostyantyn Demchuk, Douglas J. Leith

A novel fast algorithm for finding quasi identifiers in large datasets is presented. Performance measurements on a broad range of datasets demonstrate substantial reductions in run-time relative to the state of the art and the scalability of the algorithm to realistically-sized datasets up to several million records.

en cs.DB
arXiv Open Access 2014
On the BDD/FC Conjecture

Tomasz Gogacz, Jerzy Marcinkowski

Bounded Derivation Depth property (BDD) and Finite Controllability (FC) are two properties of sets of datalog rules and tuple generating dependencies (known as Datalog +/- programs), which recently attracted some attention. We conjecture that the first of these properties implies the second, and support this conjecture by some evidence proving, among other results, that it holds true for all theories over binary signature.

en cs.DB
arXiv Open Access 2012
Weak Forms of Monotonicity and Coordination-Freeness

Daniel Zinn

Our earlier work titled: "Win-move is Coordination-Free (Sometimes)" has shown that the classes of queries that can be distributedly computed in a coordination-free manner form a strict hierarchy depending on the assumptions of the model for distributed computations. In this paper, we further characterize these classes by revealing a tight relationship between them and novel weakened forms of monotonicity.

en cs.DB, cs.DC
arXiv Open Access 2012
Privacy Preserving Web Query Log Publishing: A Survey on Anonymization Techniques

Amin Milani Fard

Releasing Web query logs, which contain valuable information for research or marketing, can breach the privacy of search engine users. Therefore, rendering query logs so as to limit the linking of a query to an individual, while preserving the usefulness of the data for analysis, is an important research problem. This survey provides an overview and discussion of recent studies in this direction.

en cs.DB, cs.CR
arXiv Open Access 2008
On the Probability Distribution of Superimposed Random Codes

Bernd Günther

A systematic study of the probability distribution of superimposed random codes is presented through the use of generating functions. Special attention is paid to the cases of either uniformly distributed but not necessarily independent, or non-uniform but independent, bit structures. Recommendations for optimal coding strategies are derived.

en cs.DB, cs.DM
arXiv Open Access 2008
Some results on $\mathbb{R}$-computable structures

Wesley Calvert, John E. Porter

This survey paper examines the effective model theory obtained with the BSS model of real number computation. It treats the following topics: computable ordinals, satisfaction of computable infinitary formulas, forcing as a construction technique, effective categoricity, effective topology, and relations with other models for the effective theory of uncountable structures.

en cs.DB, cs.LO

Page 3 of 4688