In data-intensive real-time applications, such as smart transportation and manufacturing, ensuring data freshness is essential, as using obsolete data can lead to negative outcomes. Validity intervals serve as the standard means of specifying freshness requirements in real-time databases. In this paper, we bring attention to significant drawbacks of validity intervals that have largely gone unnoticed, introduce a new definition of data freshness, and discuss future research directions to address these limitations.
The unification of large language models (LLMs) and knowledge graphs (KGs) has emerged as a hot topic. At the LLM+KG'24 workshop, held in conjunction with VLDB 2024 in Guangzhou, China, one of the key themes explored was the data management challenges and opportunities arising from the effective interaction between LLMs and KGs. This report outlines the major directions and approaches presented by the speakers at the LLM+KG'24 workshop.
One of the most popular back-end setups for a high-performance website consists of a relational database and a cache that stores the results of executed queries. Several application frameworks support caching of database queries, but few of them handle cache invalidation correctly, resorting instead to simpler measures such as short TTL values or flushing the whole cache after any write to the database. In this paper we present a simple, correct, and efficient solution, tested in a real-world application, which allows for infinite TTLs and very fine-grained cache invalidation. The algorithm is proven correct in a concurrent environment, both theoretically and in practice.
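One common way to achieve fine-grained invalidation without TTL expiry or full flushes is generation-based versioning. The following is a minimal sketch of that idea, not the paper's actual algorithm; the class and method names are hypothetical.

```python
# Sketch of generation-based cache invalidation (hypothetical, not the
# paper's exact algorithm): each table has a version counter, and cached
# query results embed the versions of the tables they depend on. A write
# bumps the table's version, so stale entries simply stop matching --
# no flush, no short TTLs.

class VersionedQueryCache:
    def __init__(self):
        self.versions = {}   # table name -> current version
        self.cache = {}      # (sql, frozen table versions) -> result

    def _version(self, table):
        return self.versions.get(table, 0)

    def _key(self, sql, tables):
        return (sql, tuple(self._version(t) for t in tables))

    def get(self, sql, tables):
        return self.cache.get(self._key(sql, tables))

    def put(self, sql, tables, result):
        self.cache[self._key(sql, tables)] = result

    def invalidate(self, table):
        # Called on any write to `table`; old keys become unreachable.
        self.versions[table] = self._version(table) + 1

cache = VersionedQueryCache()
cache.put("SELECT * FROM users", ["users"], [("alice",)])
print(cache.get("SELECT * FROM users", ["users"]))  # -> [('alice',)]
cache.invalidate("users")
print(cache.get("SELECT * FROM users", ["users"]))  # -> None
```

A real deployment would additionally evict unreachable entries (e.g. via an LRU policy), since stale keys are never matched again but still occupy memory.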
To retrieve the best results from a database we use Top-K queries and Skyline queries, but each has problems. The former rely heavily on user preferences, which are difficult to quantify and may bias the retrieved data, while the latter tend to return too many results. In this paper, we explore three branches of research that seek to overcome these limitations: Flexible/Restricted Skylines, Skyline Ordering/Ranking, and Regret Minimization. We analyze how they work and compare them to help the reader choose the approach that best fits their use case.
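The skyline operator the abstract contrasts with top-k can be sketched in a few lines; the example data is hypothetical, and here lower values are better on every dimension.

```python
# Minimal sketch of the skyline operator (illustrative only).
# A point dominates another if it is at least as good on every
# dimension and strictly better on at least one (lower is better).

def dominates(p, q):
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    # Keep every point that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hotels as (price, distance_to_beach); no preference weights needed.
hotels = [(50, 8), (80, 2), (60, 5), (90, 9)]
print(skyline(hotels))  # -> [(50, 8), (80, 2), (60, 5)]
```

Note that three of the four hotels survive, illustrating the abstract's point that skylines can return too many results as dimensions grow.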
This report describes a technical methodology for rendering the Apache Spark execution engine adaptive. It presents engineering solutions that specifically target the adaptive reordering of predicates in data streams with evolving statistics. The resulting system extension is available as an open-source prototype. Indicative experimental results show its overhead and its sensitivity to tuning parameters.
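The core idea of adaptive predicate reordering can be sketched as follows; this is an illustrative toy, not the prototype's actual mechanism, and the pass-rate bookkeeping is a hypothetical simplification.

```python
# Sketch of adaptive predicate reordering (illustrative only): running
# pass-rate estimates decide which predicate to evaluate first, so the
# order tracks the stream's evolving statistics.

def make_adaptive_filter(preds):
    passed = [1] * len(preds)   # smoothed pass counts (Laplace-style start)
    seen = [1] * len(preds)     # smoothed evaluation counts

    def accept(row):
        # Evaluate the most selective (lowest pass-rate) predicate first,
        # so failing rows are rejected with the fewest evaluations.
        order = sorted(range(len(preds)), key=lambda i: passed[i] / seen[i])
        for i in order:
            seen[i] += 1
            if not preds[i](row):
                return False
            passed[i] += 1
        return True

    return accept

f = make_adaptive_filter([lambda r: r % 2 == 0, lambda r: r < 10])
print([r for r in range(20) if f(r)])  # -> [0, 2, 4, 6, 8]
```

The output is the same for any evaluation order; only the number of predicate evaluations changes, which is where the adaptive gain comes from.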
We propose a generic numerical measure of inconsistency of a database with respect to a set of integrity constraints. It is based on an abstract repair semantics. A particular inconsistency measure associated with cardinality-repairs is investigated, and we show that it can be computed via answer-set programs.
Keywords: integrity constraints in databases, inconsistent databases, database repairs, inconsistency measure.
We propose a generic numerical measure of the inconsistency of a database with respect to a set of integrity constraints, based on an abstract repair semantics. In particular, an inconsistency measure associated with cardinality-repairs is investigated in detail. More specifically, it is shown that it can be computed via answer-set programs, but that its computation can sometimes be intractable in data complexity. However, polynomial-time deterministic and randomized approximations are exhibited. The behavior of this measure under small updates is analyzed, yielding fixed-parameter tractability results. Furthermore, alternative inconsistency measures are proposed and discussed.
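On a tiny instance, the cardinality-repair measure can be computed by brute force; the sketch below is a hypothetical illustration (the relation, the functional dependency, and the normalization are example choices, not the paper's ASP encoding).

```python
# Brute-force sketch of a cardinality-repair inconsistency measure:
# the fraction of tuples that must be deleted to restore consistency,
# here w.r.t. the functional dependency Name -> City. Exponential in
# the database size -- hence the abstract's interest in ASP encodings
# and polynomial-time approximations.

from itertools import combinations

def consistent(db):
    # FD check: no two tuples agree on Name but differ on City.
    return all(not (a[0] == b[0] and a[1] != b[1])
               for a, b in combinations(db, 2))

def inconsistency_measure(db):
    n = len(db)
    # A cardinality repair is a maximum consistent subset.
    best = max(len(s) for r in range(n + 1)
               for s in combinations(db, r) if consistent(s))
    return (n - best) / n

db = [("alice", "paris"), ("alice", "rome"), ("bob", "oslo")]
# Deleting one of the conflicting 'alice' tuples suffices: measure 1/3.
print(inconsistency_measure(db))
```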
The integrated management of business processes and master data is increasingly considered a fundamental problem by both academia and industry. In this position paper, we focus on the foundations of the problem, arguing that contemporary approaches struggle to find a suitable equilibrium between data- and process-related aspects. We then propose db-nets, a new formal model that balances these two pillars through the marriage of colored Petri nets and relational databases. We invite the research community to build on this model, discussing its potential in modeling, formal verification, and simulation.
Marcin Korytkowski, Rafal Scherer, Pawel Staszewski, et al.
This paper presents a novel relational database architecture aimed at visual object classification and retrieval. The framework is based on the bag-of-features image representation model combined with Support Vector Machine classification, and is integrated into a Microsoft SQL Server database.
Emad Soroush, Magdalena Balazinska, Simon Krughoff, et al.
Many scientific data-intensive applications perform iterative computations on array data. There exist multiple engines specialized for array processing. These engines efficiently support various types of operations, but none includes native support for iterative processing. In this paper, we develop a model for iterative array computations and a series of optimizations. We evaluate the benefits of an optimized, native support for iterative array processing on the SciDB engine and real workloads from the astronomy domain.
A novel approach for creating ER conceptual models, along with an algorithm for transforming them into the relational model, has been developed by modifying and extending existing methods. Part of the new algorithm has previously been presented; this paper presents the remainder. One objective of this paper is to serve as a supporting document for ongoing empirical evaluations of the new approach, conducted using the cognitive engagement method with respondents drawn from different segments of the field.
George Garbis, Kostis Kyzirakos, Manolis Koubarakis
Geospatial extensions of SPARQL such as GeoSPARQL and stSPARQL have recently been defined, and corresponding geospatial RDF stores have been implemented. However, there is no widely used benchmark for evaluating geospatial RDF stores that takes into account recent advances in the state of the art in this area. In this paper, we develop a benchmark, called Geographica, which uses both real-world and synthetic data to test the functionality and performance of some prominent geospatial RDF stores.
Multilevel association rules explore the concept hierarchy at multiple levels, providing more specific information. The Apriori algorithm mines single-level association rules, and many implementations of it are available. We modify a fast Apriori implementation to develop a new algorithm for finding multilevel association rules. In this study, the performance of the new algorithm is analyzed in terms of running time in seconds.
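The level-wise counting behind multilevel association rules can be sketched as follows; the concept hierarchy, transactions, and support threshold are hypothetical examples, not the paper's dataset or algorithm.

```python
# Sketch of multilevel support counting (illustrative only): items are
# generalized to their ancestors in a concept hierarchy, and frequent
# pairs are counted per level. Patterns invisible at the leaf level can
# become frequent at a higher level.

from collections import Counter
from itertools import combinations

hierarchy = {  # child -> parent in the concept hierarchy
    "skim milk": "milk", "2% milk": "milk",
    "white bread": "bread", "wheat bread": "bread",
}

def generalize(tx, level):
    # level 0 = leaf items, level 1 = their parents.
    return frozenset(hierarchy.get(i, i) if level >= 1 else i for i in tx)

def frequent_pairs(txs, level, minsup):
    counts = Counter(pair
                     for tx in txs
                     for pair in combinations(sorted(generalize(tx, level)), 2))
    return {pair: c for pair, c in counts.items() if c >= minsup}

txs = [{"skim milk", "white bread"}, {"2% milk", "wheat bread"},
       {"skim milk", "wheat bread"}]
print(frequent_pairs(txs, level=0, minsup=2))  # -> {} (nothing at leaves)
print(frequent_pairs(txs, level=1, minsup=2))  # -> {('bread', 'milk'): 3}
```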
This document is part of original research by the authors in a bid to explore new fields for applying data mining techniques. The sample data is part of a large data set from the University of Maryland (UMD), and the paper outlines how more meaningful patterns can be discovered by preprocessing the data into OLAP cubes.
This research concerns an online forum designed and developed to improve communication among alumni and new, current, and prospective students. In this paper we present the targeted problems, the designed architecture, the technologies used in development, and the final product in detail.
In this article, we describe the XML storage system used in the WebContent project. We begin by advocating the use of an XML database to store WebContent documents, and we present two different ways of storing and querying these documents: using a centralized XML database and using a P2P XML database.
Mohammad-Reza Feizi-Derakhshi, Hasan Asil, Amir Asil
This paper proposes a multi-agent system that combines two technologies, query processing optimization and agents, and features personalized queries and adaptation to changing requirements. The system uses a new algorithm based on modeling users' long-term requirements, together with a genetic algorithm (GA) to gather users' query data. Experimental results show greater adaptability for the presented algorithm in comparison with classic algorithms.
Iterated hash functions process strings recursively, one character at a time. At each iteration, they compute a new hash value from the preceding hash value and the next character. We prove that iterated hashing can be pairwise independent, but never 3-wise independent. We show that it can be almost universal over strings much longer than the number of hash values; we bound the maximal string length given the collision probability.
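The construction studied has the following shape; this sketch shows only the recursive structure, and the multiplicative mixing step is a hypothetical placeholder, not the paper's hash family.

```python
# Sketch of an iterated hash function (construction shape only): each
# step computes a new hash value from the preceding hash value and the
# next character, so the whole string is processed one character at a time.

M = 2**16  # number of possible hash values

def iterated_hash(s, seed, mult):
    h = seed
    for ch in s:
        # The new value depends only on (previous value, next character).
        h = (h * mult + ord(ch)) % M
    return h

print(iterated_hash("hello", seed=7, mult=31))
```

Because each state depends only on the previous state and one character, any independence property must survive this chaining; the abstract's results bound exactly how much independence such chaining can preserve.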