Results for "cs.DS" - JURNALIN

arXiv Open Access 2025

FCDB (Functorial-Categorical Database): A Compositional Framework for Information Preservation and Anti-Commutativity Reduction

Jun Kawasaki

Conventional database architectures often secure local consistency by discarding information, entangling correctness with loss. We introduce the Functorial-Categorical Database (FCDb), which models data operations as morphisms in a layered functor category and establishes a Complete Preserving Family (CPF) of projections spanning content invariance (CAS), capability, and ownership, with optional observational projections for local order (B+Tree), temporal history (append-only/LSM), and adjacency (Graph). We identify a minimal kernel (F_core = Own ∘ Cap ∘ CAS) that preserves information and collapses non-commutativity to the ethical grant/revoke boundary. Under adjoint lifts and a fibred structure, operational pairs commute in the categorical limit while ownership integrity and capability constraints are maintained. The framework connects to information geometry via projection interpretations and supports empirical validation without discarding semantic, temporal, or relational entropy.

en cs.DB

CrossRef Open Access 2022

Gene-Based Association Tests Using New Polygenic Risk Scores and Incorporating Gene Expression Data

Shijia Yan, Qiuying Sha, Shuanglin Zhang

Recently, gene-based association studies have shown that integrating genome-wide association studies (GWAS) with expression quantitative trait locus (eQTL) data can boost statistical power and that the genetic liability of traits can be captured by polygenic risk scores (PRSs). In this paper, we propose a new gene-based statistical method that leverages gene-expression measurements and new PRSs to identify genes that are associated with phenotypes of interest. We used a generalized linear model to associate phenotypes with gene expression and PRSs and used a score-test statistic to test the association between phenotypes and genes. Our simulation studies show that the newly developed method has correct type I error rates and can boost statistical power compared with other methods that use either gene expression or PRS in association tests. A real-data analysis based on UK Biobank data for asthma shows that the proposed method is applicable to GWAS.

5 citations en

arXiv Open Access 2021

Deterministic Algorithms for the Hidden Subgroup Problem

Ashwin Nayak

We present deterministic algorithms for the Hidden Subgroup Problem. The first algorithm, for abelian groups, achieves the same asymptotic worst-case query complexity as the optimal randomized algorithm, namely O($\sqrt{ n}\,$), where $n$ is the order of the group. The analogous algorithm for non-abelian groups comes within a $\sqrt{ \log n}$ factor of the optimal randomized query complexity. The best known randomized algorithm for the Hidden Subgroup Problem has expected query complexity that is sensitive to the input, namely O($\sqrt{ n/m}\,$), where $m$ is the order of the hidden subgroup. In the first version of this article (arXiv:2104.14436v1 [cs.DS]), we asked if there is a deterministic algorithm whose query complexity has a similar dependence on the order of the hidden subgroup. Prompted by this question, Ye and Li (arXiv:2110.00827v1 [cs.DS]) present deterministic algorithms for abelian groups which solve the problem with O($\sqrt{ n/m }\,$ ) queries, and find the hidden subgroup with O($\sqrt{ n (\log m) / m} + \log m$) queries. Moreover, they exhibit instances which show that in general, the deterministic query complexity of the problem may be o($\sqrt{ n/m } \,$), and that of finding the entire subgroup may also be o($\sqrt{ n/m } \,$) or even $ω(\sqrt{ n/m } \,)$. We present a different deterministic algorithm for the Hidden Subgroup Problem that also has query complexity O($\sqrt{ n/m }\,$) for abelian groups. The algorithm is arguably simpler. Moreover, it works for non-abelian groups, and has query complexity O($\sqrt{ (n/m) \log (n/m) }\,$) for a large class of instances, such as those over supersolvable groups. We build on this to design deterministic algorithms to find the hidden subgroup for all abelian and some non-abelian instances, at the cost of a $\log m$ multiplicative factor increase in the query complexity.

en cs.DS, cs.CC

arXiv Open Access 2021

Single-Sample Prophet Inequalities via Greedy-Ordered Selection

Constantine Caramanis, Paul Dütting, Matthew Faw et al.

We study single-sample prophet inequalities (SSPIs), i.e., prophet inequalities where only a single sample from each prior distribution is available. Besides a direct, and optimal, SSPI for the basic single choice problem [Rubinstein et al., 2020], most existing SSPI results were obtained via an elegant, but inherently lossy, reduction to order-oblivious secretary (OOS) policies [Azar et al., 2014]. Motivated by this discrepancy, we develop an intuitive and versatile greedy-based technique that yields SSPIs directly rather than through the reduction to OOSs. Our results can be seen as generalizing and unifying a number of existing results in the area of prophet and secretary problems. Our algorithms significantly improve on the competitive guarantees for a number of interesting scenarios (including general matching with edge arrivals, bipartite matching with vertex arrivals, and certain matroids), and capture new settings (such as budget additive combinatorial auctions). Complementing our algorithmic results, we also consider mechanism design variants. Finally, we analyze the power and limitations of different SSPI approaches by providing a partial converse to the reduction from SSPI to OOS given by Azar et al.

en cs.DS, cs.GT

arXiv Open Access 2016

Lower Bounds on Time-Space Trade-Offs for Approximate Near Neighbors

Alexandr Andoni, Thijs Laarhoven, Ilya Razenshteyn et al.

We show tight lower bounds for the entire trade-off between space and query time for the Approximate Near Neighbor search problem. Our lower bounds hold in a restricted model of computation, which captures all hashing-based approaches. In particular, our lower bound matches the upper bound recently shown in [Laarhoven 2015] for the random instance on a Euclidean sphere (which we show in fact extends to the entire space $\mathbb{R}^d$ using the techniques from [Andoni, Razenshteyn 2015]). We also show tight, unconditional cell-probe lower bounds for one and two probes, improving upon the best known bounds from [Panigrahy, Talwar, Wieder 2010]. In particular, this is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than for one probe. To show the result for two probes, we establish and exploit a connection to locally-decodable codes.

en cs.DS, cs.CC

arXiv Open Access 2016

Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors

Alexandr Andoni, Thijs Laarhoven, Ilya Razenshteyn et al.

[See the paper for the full abstract.] We show tight upper and lower bounds for time-space trade-offs for the $c$-Approximate Near Neighbor Search problem. For the $d$-dimensional Euclidean space and $n$-point datasets, we develop a data structure with space $n^{1 + ρ_u + o(1)} + O(dn)$ and query time $n^{ρ_q + o(1)} + d n^{o(1)}$ for every $ρ_u, ρ_q \geq 0$ such that: \begin{equation} c^2 \sqrt{ρ_q} + (c^2 - 1) \sqrt{ρ_u} = \sqrt{2c^2 - 1}. \end{equation} This is the first data structure that achieves sublinear query time and near-linear space for every approximation factor $c > 1$, improving upon [Kapralov, PODS 2015]. The data structure is a culmination of a long line of work on the problem for all space regimes; it builds on Spherical Locality-Sensitive Filtering [Becker, Ducas, Gama, Laarhoven, SODA 2016] and data-dependent hashing [Andoni, Indyk, Nguyen, Razenshteyn, SODA 2014] [Andoni, Razenshteyn, STOC 2015]. Our matching lower bounds are of two types: conditional and unconditional. First, we prove tightness of the whole above trade-off in a restricted model of computation, which captures all known hashing-based approaches. We then show unconditional cell-probe lower bounds for one and two probes that match the above trade-off for $ρ_q = 0$, improving upon the best known lower bounds from [Panigrahy, Talwar, Wieder, FOCS 2010]. In particular, this is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than the one-probe bound. To show the result for two probes, we establish and exploit a connection to locally-decodable codes.

en cs.DS, cs.CC

arXiv Open Access 2015

Tradeoffs for nearest neighbors on the sphere

Thijs Laarhoven

We consider tradeoffs between the query and update complexities for the (approximate) nearest neighbor problem on the sphere, extending the recent spherical filters to sparse regimes and generalizing the scheme and analysis to account for different tradeoffs. In a nutshell, for the sparse regime the tradeoff between the query complexity $n^{ρ_q}$ and update complexity $n^{ρ_u}$ for data sets of size $n$ is given by the following equation in terms of the approximation factor $c$ and the exponents $ρ_q$ and $ρ_u$: $$c^2\sqrt{ρ_q}+(c^2-1)\sqrt{ρ_u}=\sqrt{2c^2-1}.$$ For small $c=1+ε$, minimizing the time for updates leads to a linear space complexity at the cost of a query time complexity $n^{1-4ε^2}$. Balancing the query and update costs leads to optimal complexities $n^{1/(2c^2-1)}$, matching bounds from [Andoni-Razenshteyn, 2015] and [Dubiner, IEEE-TIT'10] and matching the asymptotic complexities of [Andoni-Razenshteyn, STOC'15] and [Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt, NIPS'15]. A subpolynomial query time complexity $n^{o(1)}$ can be achieved at the cost of a space complexity of the order $n^{1/(4ε^2)}$, matching the bound $n^{Ω(1/ε^2)}$ of [Andoni-Indyk-Patrascu, FOCS'06] and [Panigrahy-Talwar-Wieder, FOCS'10] and improving upon results of [Indyk-Motwani, STOC'98] and [Kushilevitz-Ostrovsky-Rabani, STOC'98]. For large $c$, minimizing the update complexity results in a query complexity of $n^{2/c^2+O(1/c^4)}$, improving upon the related exponent for large $c$ of [Kapralov, PODS'15] by a factor $2$, and matching the bound $n^{Ω(1/c^2)}$ of [Panigrahy-Talwar-Wieder, FOCS'08]. Balancing the costs leads to optimal complexities $n^{1/(2c^2-1)}$, while a minimum query time complexity can be achieved with update complexity $n^{2/c^2+O(1/c^4)}$, improving upon the previous best exponents of Kapralov by a factor $2$.
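The trade-off equation in this abstract can be explored numerically. The following is a minimal sketch, not code from the paper: the function names `rho_q_from_rho_u` and `balanced_rho` are hypothetical, and the code only solves the stated equation $c^2\sqrt{ρ_q}+(c^2-1)\sqrt{ρ_u}=\sqrt{2c^2-1}$ for $ρ_q$.

```python
import math


def rho_q_from_rho_u(c: float, rho_u: float) -> float:
    """Solve c^2*sqrt(rho_q) + (c^2 - 1)*sqrt(rho_u) = sqrt(2c^2 - 1) for rho_q."""
    if c <= 1:
        raise ValueError("the approximation factor c must exceed 1")
    # Isolate sqrt(rho_q) on the left-hand side of the trade-off equation.
    root = (math.sqrt(2 * c * c - 1) - (c * c - 1) * math.sqrt(rho_u)) / (c * c)
    if root < 0:
        raise ValueError("rho_u lies beyond the end of the trade-off curve")
    return root * root


def balanced_rho(c: float) -> float:
    """Exponent at the balanced point rho_q == rho_u, i.e. 1 / (2c^2 - 1)."""
    return 1.0 / (2 * c * c - 1)
```

Setting `rho_u = 0` gives the exponent $(2c^2-1)/c^4$, which expands to $2/c^2 - 1/c^4$ for large $c$ and to roughly $1-4ε^2$ for $c = 1+ε$, in line with the regimes quoted in the abstract.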

en cs.DS, cs.CG

arXiv Open Access 2015

A Refutation of the Clique-Based P=NP Proofs of LaPlante and Tamta-Pande-Dhami

Hector A. Cardenas, Chester Holtz, Maria Janczak et al.

In this work, we critique two papers, "A Polynomial-Time Solution to the Clique Problem" by Tamta, Pande, and Dhami, and "A Polynomial-Time Algorithm For Solving Clique Problems" by LaPlante. We summarize and analyze both papers, noting that the algorithms presented in both papers are flawed. We conclude that neither author has successfully established that P = NP.

en cs.CC

S2 Open Access 2014

Improved Approximation of Maximum Vertex Coverage Problem on Bipartite Graphs

N. Apollonio, B. Simeone

14 citations en Mathematics, Computer Science

arXiv Open Access 2014

A Fast Quartet Tree Heuristic for Hierarchical Clustering

Rudi L. Cilibrasi, Paul M. B. Vitanyi

The Minimum Quartet Tree Cost problem is to construct an optimal weight tree from the $3{n \choose 4}$ weighted quartet topologies on $n$ objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can be the case that the optimal tree embeds all quartets as nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized hill climbing, for approximating the optimal weight tree, given the quartet topology weights. The method repeatedly transforms a dendrogram, with all objects involved as leaves, achieving a monotonic approximation to the exact single globally optimal tree. The problem and the solution heuristic have been extensively used for general hierarchical clustering of nontree-like (non-phylogeny) data in various domains and across domains with heterogeneous data. We also present a greatly improved heuristic, reducing the running time by a factor of order a thousand to ten thousand. All this is implemented and available, as part of the CompLearn package. We compare performance and running time of the original and improved versions with those of UPGMA, BioNJ, and NJ, as implemented in the SplitsTree package on genomic data for which the latter are optimized. Keywords: Data and knowledge visualization, Pattern matching--Clustering--Algorithms/Similarity measures, Hierarchical clustering, Global optimization, Quartet tree, Randomized hill-climbing.
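The count $3{n \choose 4}$ arises because every set of four objects admits exactly three pairings. The sketch below only enumerates those topologies; it is not the CompLearn heuristic, and the names `quartet_topologies` and `expected_count` are hypothetical.

```python
from itertools import combinations
from math import comb


def quartet_topologies(objects):
    """Yield the three pairings ab|cd, ac|bd, ad|bc for every 4-subset."""
    for a, b, c, d in combinations(objects, 4):
        yield ((a, b), (c, d))
        yield ((a, c), (b, d))
        yield ((a, d), (b, c))


def expected_count(n: int) -> int:
    """Number of quartet topologies on n objects: 3 * C(n, 4)."""
    return 3 * comb(n, 4)
```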

en cs.LG, cs.CE

S2 Open Access 2013

d-COS-R is FPT via Interval Deletion

N. Narayanaswamy, R. Subashini

A binary matrix $M$ has the Consecutive Ones Property (COP) if there exists a permutation of columns that arranges the ones consecutively in all the rows. Given a matrix, the $d$-COS-R problem is to determine if there exists a set of at most $d$ rows whose deletion results in a matrix with COP. We consider the parameterized complexity of this problem with respect to the number $d$ of rows to be deleted as the parameter. The closely related Interval Deletion problem has recently been shown to be FPT [Y. Cao and D. Marx, Interval Deletion is Fixed-Parameter Tractable, arXiv:1211.5933 [cs.DS], 2012]. In this work, we describe a recursive depth-bounded search tree algorithm in which the problems at the leaf-level are solved as instances of Interval Deletion. The running time of the algorithm is dominated by the running time of Interval Deletion, and therefore we show that $d$-COS-R is fixed-parameter tractable and has a run-time of $O^*(10^d)$.
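To make the COP and $d$-COS-R definitions concrete, here is a brute-force sketch. It is emphatically not the paper's FPT algorithm: it tries every column permutation and every row subset, so it is only usable on tiny matrices. All function names are hypothetical.

```python
from itertools import combinations, permutations


def rows_consecutive(matrix, order):
    """True if, with columns arranged in `order`, every row's ones are contiguous."""
    for row in matrix:
        positions = [pos for pos, col in enumerate(order) if row[col] == 1]
        if positions and positions[-1] - positions[0] + 1 != len(positions):
            return False
    return True


def has_cop(matrix):
    """Brute-force COP test: try every column permutation (exponential time)."""
    if not matrix:
        return True
    n_cols = len(matrix[0])
    return any(rows_consecutive(matrix, order) for order in permutations(range(n_cols)))


def min_rows_to_delete(matrix, d):
    """Smallest k <= d such that deleting some k rows yields COP, else None."""
    for k in range(d + 1):
        for deleted in combinations(range(len(matrix)), k):
            kept = [row for i, row in enumerate(matrix) if i not in deleted]
            if has_cop(kept):
                return k
    return None
```

For example, the matrix with rows `[1,1,0]`, `[0,1,1]`, `[1,0,1]` has no COP, because three columns in a line offer only two adjacent pairs, but deleting any single row restores it.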

1 citation en Computer Science, Mathematics

DOAJ Open Access 2012

Adaptive compression against a countable alphabet

Dominique Bontemps, Stephane Boucheron, Elisabeth Gassiat

This paper sheds light on universal coding with respect to classes of memoryless sources over a countable alphabet defined by an envelope function with finite and non-decreasing hazard rate. We prove that the auto-censuring (AC) code introduced by Bontemps (2011) is adaptive with respect to the collection of such classes. The analysis builds on the tight characterization of universal redundancy rate in terms of metric entropy by Haussler and Opper (1997) and on a careful analysis of the performance of the AC-coding algorithm. The latter relies on non-asymptotic bounds for maxima of samples from discrete distributions with finite and non-decreasing hazard rate.

Mathematics

DOAJ Open Access 2012

Additive tree functionals with small toll functions and subtrees of random trees

Stephan Wagner

Many parameters of trees are additive in the sense that they can be computed recursively from the sum of the branches plus a certain toll function. For instance, such parameters occur very frequently in the analysis of divide-and-conquer algorithms. Here we are interested in the situation that the toll function is small (the average over all trees of a given size $n$ decreases exponentially with $n$). We prove a general central limit theorem for random labelled trees and apply it to a number of examples. The main motivation is the study of the number of subtrees in a random labelled tree, but it also applies to classical instances such as the number of leaves.
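The recursive definition of an additive functional, $F(T) = \sum_{\text{branches } B} F(B) + \text{toll}(T)$, can be sketched in a few lines. This is an illustration only: the nested-list tree representation (a rooted tree is the list of its branches, a leaf is `[]`) and the function names are assumptions, and it ignores the labelling used in the paper.

```python
def additive_functional(tree, toll):
    """F(T) = sum of F over the root's branches + toll(T).

    A tree is represented as a list of its child subtrees; a leaf is [].
    """
    return sum(additive_functional(child, toll) for child in tree) + toll(tree)


def leaf_toll(tree):
    """Toll that is 1 on the single-node tree and 0 otherwise, so F counts leaves."""
    return 1 if not tree else 0


def unit_toll(tree):
    """Toll that is always 1, so F counts all nodes."""
    return 1
```

With `tree = [[], [[], []]]` (a root whose branches are a leaf and a node with two leaves), `additive_functional(tree, leaf_toll)` counts the three leaves and `additive_functional(tree, unit_toll)` counts the five nodes. The leaf toll vanishes on every tree with more than one node, which is one simple way a toll function can be small.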

Mathematics

DOAJ Open Access 2012

Infinite Systems of Functional Equations and Gaussian Limiting Distributions

Michael Drmota, Bernhard Gittenberger, Johannes F. Morgenbesser

In this paper infinite systems of functional equations in finitely or infinitely many random variables arising in combinatorial enumeration problems are studied. We prove sufficient conditions under which the combinatorial random variables encoded in the generating function of the system tend to a finite or infinite dimensional limiting distribution.

Mathematics

S2 Open Access 2011

Greedy Algorithms for Multi-Queue Buffer Management with Class Segregation (New Trends in Algorithms and Theory of Computation)

T. Itoh, Seiji Yoshimoto

In this paper, we focus on multi-queue buffer management in which packets of different values are segregated in different queues. Our model consists of m packet values and m queues. Recently, Al-Bawani and Souza (arXiv:1103.6049v2 [cs.DS] 30 Mar 2011) presented an online multi-queue buffer management algorithm Greedy and showed that it is 2-competitive for the general m-valued case, i.e., m packet values are 0 < v_{1} < v_{2} < ... < v_{m}, and (1+v_{1}/v_{2})-competitive for the two-valued case, i.e., two packet values are 0 < v_{1} < v_{2}. For the general m-valued case, let c_i = (v_{i} + \sum_{j=1}^{i-1} 2^{j-1} v_{i-j})/(v_{i+1} + \sum_{j=1}^{i-1}2^{j-1}v_{i-j}) for 1 \leq i \leq m-1, and let c_{m}^{*} = \max_{i} c_{i}. In this paper, we precisely analyze the competitive ratio of Greedy for the general m-valued case, and show that the algorithm Greedy is (1+c_{m}^{*})-competitive.
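The bound 1 + c_{m}^{*} is straightforward to evaluate for concrete packet values. The sketch below only computes the formula for c_i quoted in the abstract, using exact rational arithmetic; it does not simulate the Greedy algorithm, and the function name `greedy_ratio_bound` is hypothetical.

```python
from fractions import Fraction


def greedy_ratio_bound(values):
    """Return 1 + c_m^* for packet values 0 < v_1 < ... < v_m (m >= 2)."""
    v = [Fraction(x) for x in values]
    m = len(v)
    if m < 2 or v[0] <= 0 or any(a >= b for a, b in zip(v, v[1:])):
        raise ValueError("need at least two strictly increasing positive values")
    best = Fraction(0)
    for i in range(1, m):  # i is the 1-based index from the abstract, 1..m-1
        # tail = sum_{j=1}^{i-1} 2^(j-1) * v_{i-j}; v_k is v[k-1] in 0-based terms
        tail = sum((2 ** (j - 1) * v[i - j - 1] for j in range(1, i)), Fraction(0))
        best = max(best, (v[i - 1] + tail) / (v[i] + tail))
    return 1 + best
```

For two values the sum is empty, so the bound reduces to 1 + v_{1}/v_{2}, matching the two-valued case stated above. For values 1, 2, 4 the terms are c_1 = 1/2 and c_2 = 3/5, giving a bound of 8/5.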

1 citation en Computer Science, Mathematics

arXiv Open Access 2011

Greedy Algorithms for Multi-Queue Buffer Management with Class Segregation

Toshiya Itoh, Seiji Yoshimoto

In this paper, we focus on multi-queue buffer management in which packets of different values are segregated in different queues. Our model consists of m packet values and m queues. Recently, Al-Bawani and Souza (arXiv:1103.6049v2 [cs.DS] 30 Mar 2011) presented an online multi-queue buffer management algorithm Greedy and showed that it is 2-competitive for the general m-valued case, i.e., m packet values are 0 < v_{1} < v_{2} < ... < v_{m}, and (1+v_{1}/v_{2})-competitive for the two-valued case, i.e., two packet values are 0 < v_{1} < v_{2}. For the general m-valued case, let c_i = (v_{i} + \sum_{j=1}^{i-1} 2^{j-1} v_{i-j})/(v_{i+1} + \sum_{j=1}^{i-1}2^{j-1}v_{i-j}) for 1 \leq i \leq m-1, and let c_{m}^{*} = \max_{i} c_{i}. In this paper, we precisely analyze the competitive ratio of Greedy for the general m-valued case, and show that the algorithm Greedy is (1+c_{m}^{*})-competitive.

en cs.DM

S2 Open Access 2011

Greedy Algorithms for Multi-Queue Buffer Management Policies with Class Segregation

T. Itoh, Seiji Yoshimoto

en Mathematics, Computer Science

S2 Open Access 2011

HIM Trimester Program on Analysis and Numerics for High Dimensional Problems

M. Hegland, V. Pestov, I. Sloan

en

arXiv Open Access 2009

Randomised Buffer Management with Bounded Delay against Adaptive Adversary

Łukasz Jeż

We give a new analysis of the RMix algorithm by Chin et al. for the Buffer Management with Bounded Delay problem (or online scheduling of unit jobs to maximise weighted throughput). Unlike the original proof of e/(e-1)-competitiveness, the new one holds even in the adaptive-online adversary model. In fact, the proof also works for a slightly more general problem studied by Bieńkowski et al.

en cs.DS
