Using LLMs as rerankers requires experimenting with various hyperparameters, such as prompt formats, model choice, and reformulation strategies. We introduce PyTerrier-GenRank, a PyTerrier plugin to facilitate seamless reranking experiments with LLMs, supporting popular ranking strategies like pointwise and listwise prompting. We validate our plugin through HuggingFace and OpenAI hosted endpoints.
Fairness is a popular research topic in recent years. A research topic closely related to fairness is bias and debiasing. Among different types of bias problems, position bias is one of the most widely encountered symptoms. Position bias means that recommended items on top of the recommendation list has a higher likelihood to be clicked than items on bottom of the same list. To mitigate this problem, we propose to use regularization technique to reduce the bias effect. In the experiment section, we prove that our method is superior to other modern algorithms.
We conducted a global comparative analysis of the coverage of American topics in different language versions of Wikipedia, using over 90 million Wikidata items and 40 million Wikipedia articles in 58 languages. Our study aimed to investigate whether Americanization is more or less dominant in different regions and cultures and to determine whether interest in American topics is universal.
This paper presents an open-source toolbox, MMRec for multimodal recommendation. MMRec simplifies and canonicalizes the process of implementing and comparing multimodal recommendation models. The objective of MMRec is to provide a unified and configurable arena that can minimize the effort in implementing and testing multimodal recommendation models. It enables multimodal models, ranging from traditional matrix factorization to modern graph-based algorithms, capable of fusing information from multiple modalities simultaneously. Our documentation, examples, and source code are available at \url{https://github.com/enoche/MMRec}.
This work is devoted to a certain class of probabilistic snapshots for elements of the observed data stream. We show you how one can control their probabilistic properties and we show some potential applications. Our solution can be used to store information from the observed history with limited memory. It can be used for both web server applications and Ad hoc networks and, for example, for automatic taking snapshots from video stream online of unknown size.
Resolving the contextual dependency is one of the most challenging tasks in the Conversational system. Our submission to CAsT-2021 aimed to preserve the key terms and the context in all subsequent turns and use classical Information retrieval methods. It was aimed to pull as relevant documents as possible from the corpus. We have participated in automatic track and submitted two runs in the CAsT-2021. Our submission has produced a mean NDCG@3 performance better than the median model.
Keyphrase extraction methods can provide insights into large collections of documents such as social media posts. Existing methods, however, are less suited for the real-time analysis of streaming data, because they are computationally too expensive or require restrictive constraints regarding the structure of keyphrases. We propose an efficient approach to extract keyphrases from large document collections and show that the method also performs competitively on individual documents.
Recommender systems (RecSys) have been well developed to assist user decision making. Traditional RecSys usually optimize a single objective (e.g., rating prediction errors or ranking quality) in the model. There is an emerging demand in multi-objective optimization recently in RecSys, especially in the area of multi-stakeholder and multi-task recommender systems. This article provides an overview of multi-objective recommendations, followed by the discussions with case studies. The document is considered as a supplementary material for our tutorial on multi-objective recommendations at ACM SIGKDD 2021.
Ciro Javier Diaz Penedo, Lucas Leonardo Silveira Costa
In this work we deal with the problem of grouping in headlines of the newspaper ABC (Australian Bro-adcasting Corporation) using unsupervised machine learning techniques. We present and discuss the results on the clusters found
This paper describes problems with the current way we compare the diversity of different recommendation lists in offline experiments. We illustrate the problems with a case study. We propose the Sudden Death score as a new and better way of making these comparisons.
Recently, chatbots received an increased attention from industry and diverse research communities as a dialogue-based interface providing advanced human-computer interactions. On the other hand, Open Data continues to be an important trend and a potential enabler for government transparency and citizen participation. This paper shows how these two paradigms can be combined to help non-expert users find and discover open government datasets through dialogue.
We consider recommendation systems that need to operate under wireless bandwidth constraints, measured as number of broadcast transmissions, and demonstrate a (tight for some instances) tradeoff between regret and bandwidth for two scenarios: the case of multi-armed bandit with context, and the case where there is a latent structure in the message space that we can exploit to reduce the learning phase.
The interpretation of the experimental data collected by testing systems across input datasets and model parameters is of strategic importance for system design and implementation. In particular, finding relationships between variables and detecting the latent variables affecting retrieval performance can provide designers, engineers and experimenters with useful if not necessary information about how a system is performing. This paper discusses the use of Structural Equation Modelling (SEM) in providing an in-depth explanation of evaluation results and an explanation of failures and successes of a system; in particular, we focus on the case of Information Retrieval.
We investigate exact indexing for high dimensional Lp norms based on the 1-Lipschitz property and projection operators. The orthogonal projection that satisfies the 1-Lipschitz property for the Lp norm is described. The adaptive projection defined by the first principal component is introduced.
Comments for a product or a news article are rapidly growing and became a medium of measuring quality products or services. Consequently, spammers have been emerged in this area to bias them toward their favor. In this paper, we propose an efficient spammer detection method using structural rank of author specific term-document matrices. The use of structural rank was found effective and far faster than similar methods.
Pierre Joulin, Romain Deveaud, Eric SanJuan-Ibekwe
et al.
We describe a novel, "focusable", scalable, distributed web crawler based on GNU/Linux and PostgreSQL that we designed to be easily extendible and which we have released under a GNU public licence. We also report a first use case related to an analysis of Twitter's streams about the french 2012 presidential elections and the URL's it contains.
Future Information Retrieval, especially in connection with the internet, will incorporate the content descriptions that are generated with social network extraction technologies and preferably incorporate the probability theory for assigning the semantic. Although there is an increasing interest about social network extraction, but a little of them has a significant impact to infomation retrieval. Therefore this paper proposes a model of information retrieval from the social network extraction.
In this paper we present new improvement ideas of the original PageRank algorithm. The first idea is to introduce an evaluation of the statistical reliability of the ranking score of each node based on the local graph property and the second one is to introduce the notion of the path diversity. The path diversity can be exploited to dynamically modify the increment value of each node in the random surfer model or to dynamically adapt the damping factor. We illustrate the impact of such modifications through examples and simple simulations.