Results for "cs.IR"

Showing 20 of ~218173 results · from CrossRef, arXiv, DOAJ

arXiv Open Access 2025
Preference-based learning for news headline recommendation

Alexandre Bouras, Audrey Durand, Richard Khoury

This study explores strategies for optimizing news headline recommendations through preference-based learning. Using real-world data of user interactions with French-language online news posts, we learn a headline recommender agent under a contextual bandit setting. This allows us to explore the impact of translation on engagement predictions, as well as the benefits of different interactive strategies on user engagement during data collection. Our results show that explicit exploration may not be required in the presence of noisy contexts, opening the door to simpler but efficient strategies in practice.

en cs.IR, cs.LG
arXiv Open Access 2025
Reranking with Compressed Document Representation

Hervé Déjean, Stéphane Clinchant

Reranking, the process of refining the output of a first-stage retriever, is often considered computationally expensive, especially with Large Language Models. Borrowing from recent advances in document compression for RAG, we reduce the input size by compressing documents into fixed-size embedding representations. We then teach a reranker to use compressed inputs by distillation. Although based on a billion-size model, our trained reranker using this compressed input can challenge smaller rerankers in terms of both effectiveness and efficiency, especially for long documents. Given that text compressors are still in their early development stages, we view this approach as promising.

en cs.IR
arXiv Open Access 2025
UserSimCRS v2: Simulation-Based Evaluation for Conversational Recommender Systems

Nolwenn Bernard, Krisztian Balog

Resources for simulation-based evaluation of conversational recommender systems (CRSs) are scarce. The UserSimCRS toolkit was introduced to address this gap. In this work, we present UserSimCRS v2, a significant upgrade aligning the toolkit with state-of-the-art research. Key extensions include an enhanced agenda-based user simulator, introduction of large language model-based simulators, integration for a wider range of CRSs and datasets, and new LLM-as-a-judge evaluation utilities. We demonstrate these extensions in a case study.

en cs.IR
arXiv Open Access 2024
Supplier Recommendation in Online Procurement

Victor Coscrato, Derek Bridge

Supply chain optimization is key to a healthy and profitable business. Many companies use online procurement systems to agree contracts with suppliers. It is vital that the most competitive suppliers are invited to bid for such contracts. In this work, we propose a recommender system to assist with supplier discovery in road freight online procurement. Our system is able to provide personalized supplier recommendations, taking into account customer needs and preferences. This is a novel application of recommender systems, calling for design choices that fit the unique requirements of online procurement. Our preliminary results, using real-world data, are promising.

en cs.IR, cs.LG
arXiv Open Access 2024
JungleGPT: Designing and Optimizing Compound AI Systems for E-Commerce

Sherry Ruan, Tian Zhao

LLMs have significantly advanced the e-commerce industry by powering applications such as personalized recommendations and customer service. However, most current efforts focus solely on monolithic LLMs and fall short in addressing the complexity and scale of real-world e-commerce scenarios. In this work, we present JungleGPT, the first compound AI system tailored for real-world e-commerce applications. We outline the system's design and the techniques used to optimize its performance for practical use cases, which have proven to reduce inference costs to less than 1% of what they would be with a powerful, monolithic LLM.

en cs.IR
arXiv Open Access 2024
Domain Adaptation of Multilingual Semantic Search -- Literature Review

Anna Bringmann, Anastasia Zhukova

This literature review gives an overview of current approaches to domain adaptation and to multilingual semantic search in a low-resource setting. We developed a new typology to cluster domain adaptation approaches based on the part of the dense textual information retrieval system that they adapt, focusing on how to combine them efficiently. We also explore the possibilities of combining multilingual semantic search with domain adaptation approaches for dense retrievers in a low-resource setting.

en cs.IR, cs.LG
arXiv Open Access 2020
An Improved Relevance Feedback in CBIR

Subhadip Maji, Smarajit Bose

Relevance Feedback in Content-Based Image Retrieval is a method in which feedback on retrieval performance is used to improve the retrieval itself. Prior works use feature re-weighting and classification techniques as the Relevance Feedback methods. This paper presents a novel addition to the prior methods to further improve the retrieval accuracy. In addition, the paper presents a novel idea for improving even the 0-th iteration retrieval accuracy using Relevance Feedback information.

en cs.IR, stat.ML
arXiv Open Access 2019
How to define co-occurrence in different domains of study?

Mathieu Roche

This position paper presents a comparative study of co-occurrences. Some similarities and differences in the definition exist depending on the research domain (e.g. linguistics, NLP, computer science). This paper discusses these points, and deals with the methodological aspects in order to identify co-occurrences in a multidisciplinary paradigm.

en cs.IR, cs.AI
CrossRef Open Access 2018
CLUSTER SELF-ORGANIZATION OF INTERMETALLIC SYSTEMS: METAL CLUSTERS Cs AND Cs AND THE METAL OXIDE CLUSTER CsO FOR SELF-ASSEMBLY OF THE CRYSTAL STRUCTURE (Cs)(Cs)(CsO), "Физика и химия стекла"

V. Ya. Shevchenko, V. A. Blatov, G. D. Ilyushin

A geometric and topological analysis was carried out for CsO, the metal oxide with the lowest known oxygen content, which forms from an oxygen-containing melt of metallic Cs. Special algorithms for decomposing structural graphs into cluster substructures (the ToposPro software package) were used to identify the precursor clusters of the crystal structures. The precursor clusters involved in the self-assembly of the crystal structures were identified: trioctahedral CsO clusters, octahedral Cs clusters, and tetrahedral Cs clusters. The symmetry and topological codes of the self-assembly of the crystal structures from precursor clusters were reconstructed in the form: primary chain → microlayer → microframework.

arXiv Open Access 2018
Demystifying Core Ranking in Pinterest Image Search

Linhong Zhu

Pinterest Image Search Engine helps hundreds of millions of users discover interesting content every day. This motivates us to improve the image search quality by evolving our ranking techniques. In this work, we share how we practically design and deploy various ranking pipelines into the Pinterest image search ecosystem. Specifically, we focus on introducing our novel research and study on three aspects: training data, user/image featurization and ranking models. Extensive offline and online studies compared the performance of different models and demonstrated the efficiency and effectiveness of our final launched ranking models.

en cs.IR
arXiv Open Access 2018
Towards a simplified ontology for better e-commerce search

Aliasgar Kutiyanawala, Prateek Verma, Zheng et al.

Query Understanding is a semantic search method that can classify tokens in a customer's search query to entities such as Product, Brand, etc. This method can overcome the limitations of bag-of-words methods but requires an ontology. We show that current ontologies are not optimized for search and propose a simplified ontology framework designed specifically for e-commerce search and retrieval. We also present three methods for automatically extracting product classes for the proposed ontology and compare their performance relative to each other.

en cs.IR
arXiv Open Access 2018
Information Retrieval in African Languages

Hussein Suleman

Developing Information Retrieval (IR) tools and techniques in African languages suffers from the dual problems of a lack of algorithms and very small test data collections. This affects the creation of practical IR systems and limits the ability to apply IR to address human and socio-economic problems, which is an urgent need in poor countries. This position paper presents an overview of recent and current work conducted at the University of Cape Town in this area. While many problems have been investigated at an early stage, limited dataset sizes for local African languages still persist as a significant limitation and stumbling block.

en cs.IR
arXiv Open Access 2017
TableQA: Question Answering on Tabular Data

Svitlana Vakulenko, Vadim Savenkov

Tabular data is difficult to analyze and to search through, calling for new tools and interfaces that would allow even non-tech-savvy users to gain insights from open datasets without resorting to specialized data analysis tools or even having to fully understand the dataset structure. The goal of our demonstration is to showcase answering natural language questions from tabular data, and to discuss related system configuration and model training aspects. Our prototype is publicly available and open-sourced (see https://svakulenko.ai.wu.ac.at/tableqa).

en cs.IR
arXiv Open Access 2017
Annotation based automatic action processing

Elias Kärle, Dieter Fensel

Driven largely by search engine optimization, the amount of structured data on the web is growing rapidly. The main search engine providers promise a great increase in visibility through annotation of a web page's content with the schema.org vocabulary, thus providing it as structured data. But besides its use by search engines, the data can be used in various other ways, for example for automatic processing of annotated web services or actions. In this work we present an approach to consume and process schema.org annotated data on the web and give an idea of what a best practice can look like.

en cs.IR
arXiv Open Access 2016
A Minimum Spanning Tree Representation of Anime Similarities

Canggih Puspo Wibowo

In this work, a new way to represent Japanese animation (anime) is presented. We applied a minimum spanning tree to show the relations between anime. The distance between anime is calculated through three similarity measurements, namely crew, score histogram, and topic similarities. Finally, centralities are also computed to reveal the most significant anime. The results show that the minimum spanning tree can be used to determine the similarity between anime. Furthermore, by using centrality calculations, we found some anime that are significant to others.

en cs.IR, cs.DM
arXiv Open Access 2015
Math Search for the Masses: Multimodal Search Interfaces and Appearance-Based Retrieval

Richard Zanibbi, Awelemdy Orakwue

We summarize math search engines and search interfaces produced by the Document and Pattern Recognition Lab in recent years, and in particular the min math search interface and the Tangent search engine. Source code for both systems is publicly available. "The Masses" refers to our emphasis on creating systems for mathematical non-experts, who may be looking to define unfamiliar notation, or browse documents based on the visual appearance of formulae rather than their mathematical semantics.

en cs.IR
arXiv Open Access 2015
Large Scale Discovery of Seasonal Music From User Data

Cameron Summers, Phillip Popp

The consumption history of online media content such as music and video offers a rich source of data from which to mine information. Trends in this data are of particular interest because they reflect user preferences as well as associated cultural contexts that can be exploited in systems such as recommendation or search. This paper classifies songs as seasonal using a large, real-world dataset of user listening data. Results show strong performance of classification of Christmas music with Gaussian Mixture Models.

en cs.IR, cs.MM
