Towards better explainability in the field of information retrieval, we present CREDENCE, an interactive tool capable of generating counterfactual explanations for document rankers. Embracing the unique properties of the ranking problem, we present counterfactual explanations in terms of document perturbations, query perturbations, and even other documents. Additionally, users may build and test their own perturbations, and extract insights about their query, documents, and ranker.
This paper describes our participation in the 2022 TREC Deep Learning challenge. We submitted runs to all four tasks, with a focus on the full retrieval passage task. The strategy is almost the same as in 2021: first-stage retrieval is based on SPLADE, with some added ensembling with ColBERTv2 and DocT5. For the second stage we also use the same strategy as last year, an ensemble of re-rankers trained using hard negatives selected by SPLADE. Initial result analysis shows that the strategy is still strong, but it is still unclear to us what next steps we should take.
Learned sparse document representations produced by a transformer-based neural model have been found to be attractive in both relevance effectiveness and time efficiency. This paper describes a representation sparsification scheme based on hard and soft thresholding with an inverted index approximation for faster SPLADE-based document retrieval. It provides analytical and experimental results on the impact of this learnable hybrid thresholding scheme.
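As a rough illustration of the two operations named above, the following sketch applies the textbook definitions of hard and soft thresholding to a list of term weights. It is not the learnable hybrid scheme of the paper: the function names, the fixed threshold `t`, and the example weights are all assumptions made for illustration.

```python
import math


def hard_threshold(weights, t):
    """Keep a weight unchanged if its magnitude exceeds t, otherwise zero it."""
    return [w if abs(w) > t else 0.0 for w in weights]


def soft_threshold(weights, t):
    """Shrink every weight toward zero by t; weights within t of zero become zero."""
    return [math.copysign(max(abs(w) - t, 0.0), w) for w in weights]


# Hypothetical term weights for one document (illustrative values only).
term_weights = [0.2, 1.5, 0.9]
sparse_hard = hard_threshold(term_weights, 1.0)
sparse_soft = soft_threshold(term_weights, 1.0)
```

Both variants zero out the same small weights; the difference is that hard thresholding leaves the surviving weights untouched, while soft thresholding also reduces them by `t`.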
This paper proposes a model of a multi-agent security system for searching medical information on the Internet. The advantages of using mobile agents to perform Internet searches are described. Nowadays, multi-agent systems have found application in the distribution of decisions. JADE is used to model the proposed multi-agent medical system. Finally, results obtained with mobile agents are generated that could reflect performance when working with big data. The proposed system also has a relatively high precision of 96%.
These lecture notes focus on the recent advancements in neural information retrieval, with particular emphasis on the systems and models exploiting transformer networks. These networks, originally proposed by Google in 2017, have seen a large success in many natural language processing and information retrieval tasks. While there are many fantastic textbooks on information retrieval and natural language processing as well as specialised books for a more advanced audience, these lecture notes target people aiming at developing a basic understanding of the main information retrieval techniques and approaches based on deep learning. These notes have been prepared for an IR graduate course of the MSc program in Artificial Intelligence and Data Engineering at the University of Pisa, Italy.
E-commerce recommender systems are becoming increasingly important in the current digital world. They are used to personalize user experience, help customers find what they need quickly and efficiently, and increase revenue for the business. However, there are several challenges associated with big data-based e-commerce recommender systems. These challenges include limited resources, data validity period, cold start, the long tail problem, and scalability. In this paper, we discuss these challenges and potential solutions to overcome them. We also discuss the different types of e-commerce recommender systems, along with their advantages and disadvantages. We conclude with some future research directions to improve the performance of e-commerce recommender systems.
Knowledge graphs (KGs) provide information in machine interpretable form. In cases where multiple KGs are used in the same system, that information needs to be integrated. This is usually done by automated matching systems. Most of those systems consider only 1:1 (binary) matching tasks. Thus, matching a larger number of knowledge graphs with such systems would lead to quadratic efforts. In this paper, we empirically analyze different approaches to reduce the task of multi-source matching to a linear number of executions of binary matching systems. We show that the matching order of KGs and the multi-source strategy actually matter and that near-optimal results can be achieved with linear efforts.
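The gap between quadratic and linear effort can be made concrete by counting the binary matcher executions. The sketch below compares all-pairs matching with one possible linear strategy, incremental merging, which is assumed here for illustration and is not necessarily one of the strategies the paper evaluates. The function names and the string labels standing in for knowledge graphs are hypothetical.

```python
from itertools import combinations


def all_pairs_runs(kgs):
    """Every unordered pair of KGs is matched once: n * (n - 1) / 2 executions."""
    return list(combinations(kgs, 2))


def incremental_runs(kgs):
    """Match each new KG against the running merge: n - 1 executions.

    Assumes a non-empty list; the merge itself is only simulated with a label.
    """
    runs = []
    merged = kgs[0]
    for kg in kgs[1:]:
        runs.append((merged, kg))
        merged = f"({merged}+{kg})"
    return runs


kgs = ["A", "B", "C", "D", "E"]
pairwise_count = len(all_pairs_runs(kgs))       # 5 * 4 / 2 = 10
incremental_count = len(incremental_runs(kgs))  # 5 - 1 = 4
```

In the incremental variant the order of `kgs` determines which graphs are merged first, which is the kind of ordering effect the abstract refers to.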
Research on COVID-19 has increased substantially, both in biology and in other fields. This study conducted a text analysis using an LDA topic model. We first scraped a total of 1127 articles and 5563 comments on SCMP covering COVID-19 from Jan 20 to May 19, then trained the LDA model and tuned its parameters using Cv coherence as the model evaluation method. With the optimal model, we analyze the dominant topics, the representative documents of each topic, and the inconsistency between articles and comments. Finally, three possible improvements are discussed.
Recommendation systems today exert a strong influence on consumer behavior and individual perceptions of the world. By using collaborative filtering (CF) methods to create recommendations, they generate a continuous feedback loop in which user behavior becomes magnified in the algorithmic system. Popular items get recommended more frequently, creating a bias that affects and alters user preferences. In order to visualize and compare the different biases, we will analyze the effects of recommendation systems and quantify the inequalities resulting from them.
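One common way to put a number on such inequality is the Gini coefficient over how often each item is recommended. The abstract does not say which measure is used, so the choice of the Gini coefficient, the function name, and the example counts below are all assumptions for illustration.

```python
def gini(counts):
    """Gini coefficient of non-negative counts: 0.0 means perfectly equal exposure."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted counts (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical recommendation counts per item.
equal_exposure = gini([5, 5, 5, 5])    # every item recommended equally often
skewed_exposure = gini([0, 0, 0, 10])  # one item receives all recommendations
```

A value near 0 indicates that recommendations are spread evenly across items, while values closer to 1 indicate that a few popular items dominate.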
With the increasing number of cybersecurity threats, it becomes more difficult for researchers to skim through security reports for malware analysis. There is a need to extract highly relevant sentences without having to read through entire malware reports. In this paper, we identify relevant malware behavior mentions in Advanced Persistent Threat reports. The main contribution is a first attempt at a Transformer-based approach to malware behavior analysis.
With the advance of science and technology, people are used to recording their daily life events by writing blogs, uploading social media posts, taking photos, or filming videos. Such a rich repository of personal information is useful for supporting human living assistance. The main challenge is how to store and manage personal knowledge from various sources. In this position paper, we propose a research agenda on mining personal knowledge from various sources of lifelogs, personal knowledge base construction, and information recall for assisting people in recalling their experiences.
For a long time, public health events, such as disease incidence or vaccination activity, have been monitored to keep track of the health status of the population, making it possible to evaluate the effect of public health initiatives and to decide where resources for improving public health are best spent. This thesis investigates the use of web data mining for public health monitoring, and makes contributions in the following two areas: new approaches for predicting public health events from web mined data, and novel applications of web mined data for public health monitoring.
List-wise learning-to-rank methods are generally supposed to perform better than point-wise and pair-wise ones. However, in real-world applications, state-of-the-art systems are not from the list-wise camp. In this paper, we propose a new non-linear algorithm in the list-wise framework called ListMLE, which uses the Plackett-Luce (PL) loss. Our experiments are conducted on the two largest publicly available real-world datasets, Yahoo challenge 2010 and Microsoft 30K. This is the first time that a single-model list-wise system matches or surpasses state-of-the-art systems on real-world datasets.
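For readers unfamiliar with the Plackett-Luce loss, the sketch below computes its standard form: the negative log-likelihood of the ground-truth ordering under a Plackett-Luce model of the predicted scores. It only evaluates the loss for one query and says nothing about the non-linear model the paper trains with it. The function name and the example scores are assumptions.

```python
import math


def plackett_luce_nll(scores_in_true_order):
    """Negative log-likelihood of the true ranking under the Plackett-Luce model.

    scores_in_true_order[i] is the model score of the document that the
    ground truth places at rank i (most relevant first).
    """
    loss = 0.0
    for i, score in enumerate(scores_in_true_order):
        tail = scores_in_true_order[i:]
        # Log-sum-exp over the documents not yet placed, shifted for stability.
        shift = max(tail)
        log_norm = shift + math.log(sum(math.exp(s - shift) for s in tail))
        loss += log_norm - score
    return loss


# Scores that agree with the true order give a lower loss than reversed scores.
agreeing = plackett_luce_nll([3.0, 2.0, 1.0])
reversed_scores = plackett_luce_nll([1.0, 2.0, 3.0])
```

Each term is the log-probability of picking the correct document next from those that remain, so the loss falls as the scores agree more strongly with the true ordering.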
Business entrepreneurs frequently look for ways to test business ideas without revealing too much information. Recent techniques in startup development promote the use of surveys to measure potential clients' interest. In this preliminary report, we describe the concept behind Idealize, a Shiny R application to measure the local trend strength of a potential idea. Additionally, the system might provide a relative distance to the capital city of the country. The tests were made for the United States of America, i.e., for the native English language. This report shows some of the test results obtained with this system.
Fernando S. Aguiar Neto, Arthur F. da Costa, Marcelo G. Manzato
Neighborhood-based collaborative filtering algorithms usually adopt a fixed neighborhood size for every user or item, although groups of users or items may have different lengths depending on users' preferences. In this paper, we propose an extension to a non-personalized recommender based on confidence intervals and hierarchical clustering to generate groups of users with optimal sizes. The evaluation shows that the proposed technique outperformed the traditional recommender algorithms in four publicly available datasets.
Benchmarking recommender system and matrix completion algorithms could be greatly simplified if the entire matrix were known. We built the \url{sweetrs.org} platform with $77$ candies and sweets to rank. Over $2000$ users submitted over $44000$ grades, resulting in a matrix with $28\%$ coverage. In this report, we give the full description of the environment and we benchmark the \textsc{Soft-Impute} algorithm on the dataset.
The purpose of clickbait is to make a link so appealing that people click on it. However, the content of such articles is often not related to the title, shows poor quality, and in the end leaves the reader unsatisfied. To help readers, the organizers of the clickbait challenge (http://www.clickbait-challenge.org/) asked the participants to build a machine learning model for scoring articles with respect to their "clickbaitness". In this paper we propose to solve the clickbait problem with an ensemble of Linear SVM models. Our approach was tested successfully in the challenge: it achieved an MSE of 0.036 and ranked 3rd among all the solutions to the contest.
Book review: Guerrero, Mauricio (ed.) (2014). Objetos públicos, espacios privados. Usuarios y relaciones sociales en tres centros comerciales de Santiago de Cali. Cali: Universidad Icesi, 158 pp.