We consider a matching problem for time series with values in an arbitrary metric space, with the stretching penalty given by the Hellinger kernel. To optimize this matching, we introduce the Elastic Time Warping algorithm with a cubic computational complexity.
In this work, we introduce a German version of ColBERT, a late interaction multi-dense vector retrieval method, with a focus on RAG applications. We also present the main features of our package for ColBERT models, which supports both retrieval and fine-tuning workflows.
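As a rough illustration of what "late interaction" means, the sketch below computes a MaxSim-style relevance score of the kind ColBERT is generally described as using: each query token vector is matched to its most similar document token vector, and the maxima are summed. The function names and the toy vectors are hypothetical and are not taken from the authors' package; non-zero token vectors are assumed.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    # Scale a (non-zero) vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the best cosine similarity to any document token."""
    q = [_normalize(v) for v in query_vecs]
    d = [_normalize(v) for v in doc_vecs]
    total = 0.0
    for qv in q:
        # Late interaction: query and document tokens meet only at scoring time.
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total


if __name__ == "__main__":
    # Hypothetical 2-dimensional token embeddings, for illustration only.
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc = [[1.0, 0.0], [1.0, 1.0]]
    print(maxsim_score(query, doc))
```

Because documents are encoded independently of the query, their token vectors can be precomputed and indexed, which is what makes this scoring scheme attractive for retrieval.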
Cloud-based data commons, data meshes, data hubs, and other data platforms are important ways to manage, analyze and share data to accelerate research and to support reproducible research. This is an annotated glossary of some of the more common terms used in articles and discussions about these platforms.
New local ultrametricity measures for finite metric data are proposed through the viewpoint that their Vietoris-Rips corners are samples from p-adic Mumford curves endowed with a Radon measure coming from a regular differential 1-form. This is experimentally applied to the iris dataset.
We present Thistle, a fully functional vector database. Thistle is an entry into the domain of latent knowledge use in answering search queries, an ongoing research topic at both start-ups and search engine companies. We implement Thistle with several well-known algorithms, and benchmark results on the MS MARCO dataset. Results help clarify the latent knowledge domain as well as the growing Rust ML ecosystem.
Recommender systems have been the cornerstone of online retailers. Traditionally they were based on rules, relevance scores, ranking algorithms, and supervised learning algorithms, but it is now feasible to use reinforcement learning algorithms to generate meaningful recommendations. This work investigates and develops means to set up a reproducible testbed and to evaluate different state-of-the-art algorithms in a realistic environment. It comprises a proposal, literature review, methodology, results, and comments.
This article presents a summary graph showing the relationships between Information Retrieval (IR) and other related disciplines. The figure highlights the key differences between them and the conditions under which one would transition into another.
This paper aims to improve upon the generic recommendations that Reddit provides for its users. We propose a novel personalized recommender system that learns from both the presence and the content of user-subreddit interactions, using implicit and explicit signals to provide robust recommendations.
This paper presents a visual tool, AVIATOR, that integrates the progressive visual analytics paradigm into the IR evaluation process. The tool serves to speed up and facilitate the performance assessment of retrieval models, enabling result analysis through visual facilities. AVIATOR goes one step beyond the common "compute wait visualize" analytics paradigm, introducing a continuous evaluation mechanism that minimizes human and computational resource consumption.
This paper considers the problem of document ranking in information retrieval systems by Learning to Rank. We propose ConvRankNet, which combines a Siamese Convolutional Neural Network encoder with the RankNet ranking model and can be trained in an end-to-end fashion. We prove a general result justifying the linear test-time complexity of the pairwise Learning to Rank approach. Experiments on the OHSUMED dataset show that ConvRankNet systematically outperforms existing feature-based models.
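To make the pairwise setting concrete, here is a minimal sketch of the standard RankNet pair loss together with score-based ranking at test time. It is not the ConvRankNet implementation: the scores are assumed to come from some encoder that is not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def ranknet_pair_loss(s_i: float, s_j: float) -> float:
    """Cross-entropy loss for a pair in which document i should rank above document j.

    RankNet models P(i ranks above j) = sigmoid(s_i - s_j); the loss is -log of that
    probability, computed here as a numerically stable softplus(-(s_i - s_j)).
    """
    diff = s_i - s_j
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def rank_by_score(scores: Sequence[float]) -> List[int]:
    """Return document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)


if __name__ == "__main__":
    # Hypothetical per-document scores, for illustration only.
    scores = [0.2, 1.5, -0.3]
    print(rank_by_score(scores))
    print(ranknet_pair_loss(scores[1], scores[0]))
```

Although training compares pairs of documents, ranking in this sketch needs only one score per document followed by a sort, so the number of scoring calls grows linearly with the number of candidates.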
We design a recommender system for research papers based on topic modeling. The user's feedback on the results is used to make the results more relevant the next time they issue a query. The user's needs are inferred by observing how the themes for which the user shows a preference change over time.
A decentralized recommender system does not rely on a central service provider, and its users can keep ownership of their ratings. This article brings the theoretically well-studied matrix factorization method into the decentralized recommender system, where the formerly prevalent algorithms are heuristic and hence lack theoretical guarantees. Our preliminary simulation results show that this method is promising.
In this note, we show how to marginalize over the damping parameter of the PageRank equation so as to obtain a parameter-free version known as TotalRank. Our discussion is meant as a reference and intended to provide a guided tour towards an interesting result that has applications in information retrieval and classification.
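The marginalization can be sketched as follows, under assumptions that are not stated in the abstract itself: $P$ is a column-stochastic transition matrix, $v$ is a nonnegative preference vector summing to one, and the damping parameter $\alpha$ is given a uniform prior on $[0,1]$.

```latex
% PageRank written as a power series in the damping parameter \alpha
r(\alpha) = (1-\alpha) \sum_{k=0}^{\infty} \alpha^{k} P^{k} v , \qquad 0 \le \alpha < 1 .

% Marginalizing over a uniform prior on \alpha (all terms are nonnegative,
% so the sum and the integral can be exchanged)
T = \int_{0}^{1} r(\alpha)\, d\alpha
  = \sum_{k=0}^{\infty} \left( \int_{0}^{1} (1-\alpha)\,\alpha^{k}\, d\alpha \right) P^{k} v
  = \sum_{k=0}^{\infty} \frac{1}{(k+1)(k+2)}\, P^{k} v .

% The weights telescope, so T is again a probability vector
\sum_{k=0}^{\infty} \frac{1}{(k+1)(k+2)}
  = \sum_{k=0}^{\infty} \left( \frac{1}{k+1} - \frac{1}{k+2} \right) = 1 .
```

Under these assumptions the resulting vector no longer depends on $\alpha$: walks of length $k$ from $v$ are weighted by $1/((k+1)(k+2))$ instead of $(1-\alpha)\alpha^{k}$.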
The range of information accessible from mobile devices has become very wide, and the user is faced with a dilemma: an unlimited pool of information is available, yet the user is unable to find the exact information he is looking for. This is why current research aims to design Recommender Systems (RS) able to continually send information that matches the user's interests in order to reduce his navigation time. In this paper, we discuss the different approaches to recommendation.
The notion of a profile appeared in the 1970s, mainly due to the need to create custom applications that could be adapted to the user. In this paper, we discuss the different aspects of the user's profile, defining the profile, its features and its indicators of interest, and then we describe the different approaches to modelling and acquiring the user's interests.
Recent numerical results show that non-Bayesian knowledge revision may be helpful in search engine training and optimization. In order to demonstrate how basic assumptions about the physical nature (and hence the observed statistics) of retrieved documents can affect the performance of search engines, we suggest an idealized toy model with a minimal number of parameters.
In this paper we study the relationship between a query and a search engine by exploring adaptive properties of a simple search engine. We use set theory, together with words and terms, to define singleton event spaces in a search engine model, and then establish the inclusion relations between one singleton and another.
It is known that humans can easily read words in which the letters have been jumbled in a certain way. This paper examines this phenomenon by associating a distance measure with the jumbling process. Modifications to text were generated according to the Damerau-Levenshtein distance, and we checked whether users were able to read them. Graphical representations of the results are provided.
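For readers unfamiliar with the distance measure, the following sketch computes the restricted variant of the Damerau-Levenshtein distance (often called optimal string alignment), which counts insertions, deletions, substitutions, and transpositions of adjacent characters. Whether the paper uses this restricted variant or the unrestricted one is not stated here, so the choice is an assumption, and the function name is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between the first i chars of a and the first j chars of b.
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    # A jumbled word with one adjacent swap is at distance 1 from the original.
    print(osa_distance("reading", "raeding"))
```

In the restricted variant no substring is edited more than once, so it can return a larger value than the unrestricted distance on some inputs (for example "ca" versus "abc").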