Results for "cs.DB"

Showing 20 of ~93751 results · from CrossRef, DOAJ, arXiv

arXiv Open Access 2024
DatAasee -- A Metadata-Lake as Metadata Catalog for a Virtual Data-Lake

Christian Himpe

Metadata management for distributed data sources is a long-standing but ever-growing problem. To counter this challenge in a research-data and library-oriented setting, this work constructs a data architecture derived from the data-lake: the metadata-lake. A proof-of-concept implementation of the proposed metadata aggregator is also presented and evaluated.

en cs.DB, cs.DL
arXiv Open Access 2023
A parallel algorithm for automated labeling of large time series

Andrey Goglachev

This article presents the PaSTiLa algorithm for automated labeling of large time series on a cluster with GPUs. The method automatically selects snippet length values based on a newly proposed criterion and enables high-performance pattern search. Experiments showed high pattern-search accuracy and an advantage over comparable methods.

en cs.DB
arXiv Open Access 2023
Eight Transaction Papers by Jim Gray

Philip A. Bernstein

This article is a summary of eight of Jim Gray's transaction papers. It was written at the invitation of Pat Helland to be a chapter of a forthcoming book in the ACM Turing Award winners' series, "Curiosity, Clarity, and Caring: How Jim Gray's Passion for Learning, Teaching, and People Changed Computing."

en cs.DB, cs.DC
arXiv Open Access 2022
Tree edit distance for hierarchical data compatible with HMIL paradigm

Břetislav Šopík, Tomáš Strenáčik

We define an edit distance for hierarchically structured data that is compatible with the hierarchical multi-instance learning (HMIL) paradigm. An example of such data is a dataset represented in JSON format where inner Array objects are interpreted as unordered bags of elements. We prove analytical properties of the defined distance.

en cs.DB, cs.LG
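
To make the "arrays as unordered bags" reading of JSON concrete, here is a minimal Python sketch. It is not the edit distance defined in the paper; it only shows equality under the bag interpretation, where element order is ignored but multiplicity still counts. The function names `canonical` and `bag_equal` are hypothetical, chosen for this illustration.

```python
import json

def canonical(value):
    """Return a hashable, order-insensitive form of a parsed JSON value."""
    if isinstance(value, dict):
        # Object keys are unique, so sorting by key never compares the values.
        return ("object", tuple(sorted((k, canonical(v)) for k, v in value.items())))
    if isinstance(value, list):
        # Treat the array as a bag: sort elements so their order is irrelevant.
        return ("bag", tuple(sorted((canonical(v) for v in value), key=repr)))
    return ("scalar", repr(value))

def bag_equal(doc_a, doc_b):
    """Compare two JSON strings, reading every array as an unordered bag."""
    return canonical(json.loads(doc_a)) == canonical(json.loads(doc_b))

a = '{"tags": ["x", "y", "x"], "id": 1}'
b = '{"id": 1, "tags": ["x", "x", "y"]}'
c = '{"id": 1, "tags": ["x", "y"]}'

same_bag = bag_equal(a, b)       # same elements, different order
different_bag = bag_equal(a, c)  # "x" appears a different number of times
```

Here `a` and `b` compare equal because only the order differs, while `a` and `c` do not, since a bag keeps track of how often each element occurs.
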
arXiv Open Access 2019
Frequent Itemset Mining using QUBO

Jonas Nüßlein

In this paper, we propose an R-step approximation to solve frequent itemset mining on quantum hardware such as quantum annealers or with QAOA. The idea is to search for the set of items where the minimal 2-item frequency is maximal. This can be represented as a maximum clique problem.

en cs.DB
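
The objective stated in the abstract (find the itemset whose least frequent pair of items is as frequent as possible) can be illustrated with a classical brute-force sketch in Python. This is not the paper's QUBO formulation or its R-step approximation; the function names, the fixed itemset size `k`, and the toy transactions are assumptions made for illustration only.

```python
from itertools import combinations

def pair_frequencies(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = {}
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts

def best_itemset(transactions, k):
    """Return the size-k itemset (k >= 2) whose minimal 2-item frequency is maximal."""
    counts = pair_frequencies(transactions)
    items = sorted({item for t in transactions for item in t})
    best, best_score = None, -1
    for candidate in combinations(items, k):
        # Score an itemset by its weakest pair of items.
        score = min(counts.get(p, 0) for p in combinations(candidate, 2))
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Toy data, invented for this example.
transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "tea"},
]
result = best_itemset(transactions, 3)
```

The exhaustive loop over all size-k candidates grows combinatorially with the number of items, which is the kind of cost that motivates recasting the search as a maximum clique problem, as the abstract describes.
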
arXiv Open Access 2019
A framework supporting imprecise queries and data

Giacomo Bergami

This technical report provides a lightweight introduction and some generic use-case scenarios motivating the definition of a database supporting uncertainty in both queries and data. It provides only the logical framework; the implementation will be provided in the final paper.

en cs.DB
arXiv Open Access 2017
Fault Tolerant Consensus Agreement Algorithm

Marius Rafailescu

Recently, a new, simple, fault-tolerant mechanism was designed for solving the commit consensus problem. It is based on replicated validation of messages sent between transaction participants and a special dispatcher validator manager node. This paper presents correctness and safety proofs and a performance analysis of this algorithm.

en cs.DB, cs.DC
arXiv Open Access 2014
NoSQL Databases

Massimo Carro

In this document, I present the main notions of NoSQL databases and compare four selected products (Riak, MongoDB, Cassandra, Neo4J) according to their capabilities with respect to consistency, availability, and partition tolerance, as well as performance. I also propose a few criteria for selecting the right tool for the right situation.

en cs.DB
arXiv Open Access 2014
A Fast Minimal Infrequent Itemset Mining Algorithm

Kostyantyn Demchuk, Douglas J. Leith

A novel fast algorithm for finding quasi identifiers in large datasets is presented. Performance measurements on a broad range of datasets demonstrate substantial reductions in run-time relative to the state of the art and the scalability of the algorithm to realistically-sized datasets up to several million records.

en cs.DB
arXiv Open Access 2014
On the BDD/FC Conjecture

Tomasz Gogacz, Jerzy Marcinkowski

The Bounded Derivation Depth property (BDD) and Finite Controllability (FC) are two properties of sets of datalog rules and tuple generating dependencies (known as Datalog +/- programs), which have recently attracted some attention. We conjecture that the first of these properties implies the second, and support this conjecture with some evidence, proving, among other results, that it holds true for all theories over a binary signature.

en cs.DB
arXiv Open Access 2012
Covering Rough Sets From a Topological Point of View

Nguyen Duc Thuan

Covering-based rough set theory is an extension of classical rough set theory. The main purpose of this paper is to study covering rough sets from a topological point of view. The relationships among upper approximations based on topological spaces are explored.

en cs.DB
arXiv Open Access 2008
An Array Algebra

Albrecht Schmidt

This is a proposal of an algebra which aims at distributed array processing. The focus lies on re-arranging and distributing array data, which may be multi-dimensional. The context of the work is scientific processing; thus, the core science operations are assumed to be taken care of in external libraries or languages. A main design driver is the desire to carry over some of the strategies of the relational algebra into the array domain.

en cs.DB
arXiv Open Access 2008
An Introduction to Knowledge Management

Sabu M. Thampi

Knowledge has lately been recognized as one of the most important assets of organizations. Managing knowledge has grown to be imperative for the success of a company. This paper presents an overview of Knowledge Management and various aspects of secure knowledge management. A case study of knowledge management activities at Tata Steel is also discussed.

en cs.DB, cs.CR
