Ketone-directed ortho-amidation of aryl ketones with sulfonyl azides has been achieved in water via Ir(III) catalysis. Notably, regioselective C5-amidation of chromanones was successfully carried out under the present catalytic conditions.
This study explores strategies for optimizing news headline recommendations through preference-based learning. Using real-world data on user interactions with French-language online news posts, we train a headline recommender agent in a contextual bandit setting. This allows us to examine the impact of translation on engagement predictions, as well as the benefits of different interactive strategies for user engagement during data collection. Our results show that explicit exploration may not be required in the presence of noisy contexts, opening the door to simpler yet efficient strategies in practice.
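To make the contextual bandit setting concrete, here is a minimal sketch. It assumes one ridge-regression reward model per candidate headline (a "disjoint" linear bandit), with headlines chosen epsilon-greedily. Setting `epsilon=0.0` gives a purely greedy agent with no explicit exploration. The class name, the linear reward model and the `epsilon` parameter are illustrative assumptions, not the authors' setup.

```python
import random
from typing import List, Sequence


class EpsilonGreedyLinearBandit:
    """Hypothetical disjoint linear contextual bandit: one ridge model per arm (headline)."""

    def __init__(self, n_arms: int, dim: int, epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Per-arm inverse of A = I + sum(x x^T), updated with Sherman-Morrison.
        self.a_inv: List[List[List[float]]] = [
            [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
            for _ in range(n_arms)
        ]
        # Per-arm b = sum(reward * x).
        self.b: List[List[float]] = [[0.0] * dim for _ in range(n_arms)]

    def _theta(self, arm: int) -> List[float]:
        # Ridge solution theta = A^-1 b.
        a_inv, b = self.a_inv[arm], self.b[arm]
        return [sum(a_inv[i][j] * b[j] for j in range(len(b))) for i in range(len(b))]

    def predict(self, arm: int, x: Sequence[float]) -> float:
        # Predicted engagement (e.g. click) for showing this headline in context x.
        return sum(t * xi for t, xi in zip(self._theta(arm), x))

    def select(self, x: Sequence[float]) -> int:
        # With probability epsilon pick a random headline, otherwise the best predicted one.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.b))
        scores = [self.predict(a, x) for a in range(len(self.b))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, x: Sequence[float], reward: float) -> None:
        a_inv = self.a_inv[arm]
        d = len(x)
        ax = [sum(a_inv[i][j] * x[j] for j in range(d)) for i in range(d)]  # A^-1 x
        denom = 1.0 + sum(x[i] * ax[i] for i in range(d))  # 1 + x^T A^-1 x
        # Sherman-Morrison: (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / denom (A is symmetric).
        for i in range(d):
            for j in range(d):
                a_inv[i][j] -= ax[i] * ax[j] / denom
        for i in range(d):
            self.b[arm][i] += reward * x[i]
```

A small positive `epsilon` adds uniform random exploration on top of the greedy choice. With `epsilon=0.0`, the agent only explores through whatever variation the incoming contexts happen to provide.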
Reranking, the process of refining the output of a first-stage retriever, is often considered computationally expensive, especially with Large Language Models. Borrowing from recent advances in document compression for RAG, we reduce the input size by compressing documents into fixed-size embedding representations. We then teach a reranker to use these compressed inputs through distillation. Although it is built on a billion-parameter model, our reranker trained on compressed inputs can compete with smaller rerankers in both effectiveness and efficiency, especially for long documents. Given that text compressors are still at an early stage of development, we view this approach as promising.
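Distilling a reranker typically means training the student to match a teacher's relevance distribution over the candidate documents for a query. The sketch below computes one common listwise objective: the KL divergence between teacher and student softmax distributions over the candidates' scores. Whether the authors use this exact loss is an assumption. The compressed-embedding inputs themselves are not modelled here, and the function names and `temperature` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Numerically stable log-softmax over one query's candidate scores.
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def listwise_kl(teacher_scores: Sequence[float],
                student_scores: Sequence[float],
                temperature: float = 1.0) -> float:
    """KL(teacher || student) between softmax distributions over the same candidate list."""
    log_p = log_softmax(teacher_scores, temperature)
    log_q = log_softmax(student_scores, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
```

Because the loss depends only on score differences within a list, the student (here, the reranker reading compressed inputs) is free to use a different score scale from the teacher.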
Resources for simulation-based evaluation of conversational recommender systems (CRSs) are scarce. The UserSimCRS toolkit was introduced to address this gap. In this work, we present UserSimCRS v2, a significant upgrade that aligns the toolkit with state-of-the-art research. Key extensions include an enhanced agenda-based user simulator, new large language model (LLM)-based simulators, integration of a wider range of CRSs and datasets, and new LLM-as-a-judge evaluation utilities. We demonstrate these extensions in a case study.
Supply chain optimization is key to a healthy and profitable business. Many companies use online procurement systems to agree contracts with suppliers. It is vital that the most competitive suppliers are invited to bid for such contracts. In this work, we propose a recommender system to assist with supplier discovery in road freight online procurement. Our system is able to provide personalized supplier recommendations, taking into account customer needs and preferences. This is a novel application of recommender systems, calling for design choices that fit the unique requirements of online procurement. Our preliminary results, using real-world data, are promising.
LLMs have significantly advanced the e-commerce industry by powering applications such as personalized recommendations and customer service. However, most current efforts focus solely on monolithic LLMs and fall short in addressing the complexity and scale of real-world e-commerce scenarios. In this work, we present JungleGPT, the first compound AI system tailored for real-world e-commerce applications. We outline the system's design and the techniques used to optimize its performance for practical use cases, which have proven to reduce inference costs to less than 1% of what they would be with a powerful, monolithic LLM.
This literature review gives an overview of current approaches to domain adaptation and to multilingual semantic search in low-resource settings. We develop a new typology that clusters domain adaptation approaches according to the part of a dense textual information retrieval system that they adapt, focusing on how to combine them efficiently. We also explore the possibilities of combining multilingual semantic search with domain adaptation approaches for dense retrievers in a low-resource setting.
Relevance feedback in content-based image retrieval (CBIR) is a method in which feedback on the system's retrieval results is used to improve the system itself. Prior works use feature re-weighting and classification techniques as relevance feedback methods. This paper proposes a novel extension of these methods that further improves retrieval accuracy. In addition, the paper presents a novel way to use relevance feedback information to improve the accuracy of the initial (0-th iteration) retrieval.
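Feature re-weighting can be illustrated with a classic heuristic from the CBIR literature. Each feature dimension gets a weight inversely proportional to its standard deviation across the images the user marked relevant, so that dimensions on which the relevant images agree count more in the distance. The sketch below assumes that scheme and a weighted Euclidean distance. It is not the paper's proposed extension, and the function names and `eps` smoothing term are hypothetical.

```python
import math
from statistics import pstdev
from typing import List, Sequence


def reweight(relevant: Sequence[Sequence[float]], eps: float = 1e-6) -> List[float]:
    """Weight each feature by 1 / std over the relevant images, normalized to sum to 1."""
    dim = len(relevant[0])
    raw = [1.0 / (pstdev([v[i] for v in relevant]) + eps) for i in range(dim)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_distance(a: Sequence[float], b: Sequence[float], weights: Sequence[float]) -> float:
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def rank(query: Sequence[float], database: Sequence[Sequence[float]],
         weights: Sequence[float]) -> List[int]:
    # Indices of database images, nearest first under the weighted distance.
    return sorted(range(len(database)),
                  key=lambda i: weighted_distance(query, database[i], weights))
```

After one feedback round, the next retrieval iteration calls `rank` with the weights returned by `reweight` instead of uniform weights.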
This position paper presents a comparative study of co-occurrences. The definition of co-occurrence has both similarities and differences depending on the research domain (e.g., linguistics, NLP, computer science). This paper discusses these points and addresses the methodological aspects of identifying co-occurrences within a multidisciplinary paradigm.
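In NLP, one common operational definition counts two words as co-occurring when they appear within a fixed-size window of each other. The sketch below implements that one definition with a symmetric window and unordered word pairs. The window size, pair representation and function name are illustrative assumptions. Other domains discussed in the paper may define co-occurrence differently, for example at the sentence or document level.

```python
from collections import Counter
from typing import Sequence, Tuple


def window_cooccurrences(tokens: Sequence[str], window: int = 2) -> Counter:
    """Count unordered word pairs that appear within `window` positions of each other."""
    counts: Counter = Counter()
    for i, left in enumerate(tokens):
        # Only look ahead, so each pair of positions is counted once.
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            pair: Tuple[str, str] = tuple(sorted((left, tokens[j])))
            counts[pair] += 1
    return counts
```

Changing `window` changes which pairs count as co-occurring. This is exactly the kind of definitional choice that varies across research domains.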
A geometric and topological analysis was performed for the metal oxide with the lowest known oxygen content, CsO, which forms from an oxygen-containing melt of metallic Cs. To determine the precursor clusters of the crystal structures, special algorithms for decomposing structural graphs into cluster substructures (ToposPro software package) were used. The precursor clusters involved in the self-assembly of the crystal structures were identified: trioctahedral CsO clusters, octahedral Cs clusters and tetrahedral Cs clusters. The symmetry and topology codes of the self-assembly of the crystal structures from the precursor clusters were reconstructed as the sequence primary chain → microlayer → microframework.
The Pinterest image search engine helps hundreds of millions of users discover interesting content every day. This motivates us to improve image search quality by evolving our ranking techniques. In this work, we share how we design and deploy various ranking pipelines in practice within the Pinterest image search ecosystem. Specifically, we focus on our novel research on three aspects: training data, user/image featurization and ranking models. Extensive offline and online studies compared the performance of different models and demonstrated the efficiency and effectiveness of our final launched ranking models.
Aliasgar Kutiyanawala, Prateek Verma, Zheng et al.
Query Understanding is a semantic search method that classifies tokens in a customer's search query into entities such as Product, Brand, etc. This method can overcome the limitations of bag-of-words methods but requires an ontology. We show that current ontologies are not optimized for search and propose a simplified ontology framework designed specifically for e-commerce search and retrieval. We also present three methods for automatically extracting product classes for the proposed ontology and compare their performance.
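The sketch below makes query understanding concrete by tagging query tokens with greedy longest-match lookup against a small dictionary that maps phrases to entity types such as Product and Brand. The `LEXICON` contents, the extra `Color` label and the longest-match strategy are illustrative assumptions. The paper's ontology and product-class extraction methods are not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: phrase (tuple of lowercase tokens) -> entity type.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("running", "shoes"): "Product",
    ("shoes",): "Product",
    ("nike",): "Brand",
    ("red",): "Color",
}


def tag_query(query: str, lexicon: Dict[Tuple[str, ...], str] = LEXICON) -> List[Tuple[str, str]]:
    """Greedily match the longest known phrase at each position; unknown tokens become 'Other'."""
    tokens = query.lower().split()
    max_len = max(len(k) for k in lexicon)
    tags: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in lexicon:
                tags.append((" ".join(phrase), lexicon[phrase]))
                i += n
                break
        else:  # no phrase starting at position i is in the lexicon
            tags.append((tokens[i], "Other"))
            i += 1
    return tags
```

Unlike a bag-of-words match, this keeps "running shoes" together as one Product entity instead of matching "running" and "shoes" independently.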
Developing Information Retrieval (IR) tools and techniques for African languages suffers from the dual problems of a lack of algorithms and very small test data collections. This affects the creation of practical IR systems and limits the ability to apply IR to human and socio-economic problems, which is an urgent need in poor countries. This position paper presents an overview of recent and current work conducted at the University of Cape Town in this area. While many problems have been investigated in early-stage work, the limited size of datasets for local African languages remains a significant limitation and stumbling block.
Tabular data is difficult to analyze and search, calling for new tools and interfaces that would allow even non-tech-savvy users to gain insights from open datasets without resorting to specialized data analysis tools, or even without having to fully understand the dataset structure. The goal of our demonstration is to showcase answering natural language questions from tabular data, and to discuss related system configuration and model training aspects. Our prototype is publicly available and open-sourced (see https://svakulenko.ai.wu.ac.at/tableqa).
Driven largely by search engine optimization, the amount of structured data on the web is growing rapidly. The main search engine providers promise a significant increase in visibility to pages that annotate their content with the schema.org vocabulary and thus provide it as structured data. Beyond its use by search engines, however, the data can be used in various other ways, for example for the automatic processing of annotated web services or actions. In this work, we present an approach to consuming and processing schema.org-annotated data on the web and outline what a best practice could look like.
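schema.org annotations are commonly embedded as JSON-LD inside `<script type="application/ld+json">` elements; Microdata and RDFa are the other common syntaxes. As a minimal sketch of the consuming side, the code below uses only Python's standard library to extract JSON-LD objects from an HTML page and filter them by `@type`. It ignores Microdata, RDFa and `@graph` nesting, and it is not the approach proposed in this work. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional


class JsonLdExtractor(HTMLParser):
    """Collects parsed JSON-LD objects from <script type="application/ld+json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: List[str] = []
        self.items: List[Any] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # skip malformed annotations instead of failing the whole page
            self.items.extend(parsed if isinstance(parsed, list) else [parsed])


def extract_schema_org(html: str, wanted_type: Optional[str] = None) -> List[Dict[str, Any]]:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [it for it in parser.items
            if isinstance(it, dict) and (wanted_type is None or it.get("@type") == wanted_type)]
```

A consumer interested in, for example, annotated actions or services would call `extract_schema_org(page_html, "Event")` (or another schema.org type) and process the returned dictionaries.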
In this work, a new way to represent Japanese animation (anime) is presented. We use a minimum spanning tree to show the relations between anime. The distance between anime is calculated from three similarity measures, namely crew, score histogram and topic similarity. Finally, centralities are computed to reveal the most significant anime. The results show that the minimum spanning tree can be used to identify similar anime. Furthermore, using centrality calculations, we found several anime that are significant relative to the others.
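The sketch below shows the general pipeline under stated assumptions:

- The three pairwise similarities (crew, score histogram, topic) lie in [0, 1]. They are averaged with equal weights and converted to a distance d = 1 - s.
- Kruskal's algorithm builds the minimum spanning tree.
- Degree centrality in the tree serves as a simple significance measure.

The equal weighting, the 1 - s conversion and the choice of degree centrality are illustrative assumptions. The paper may combine the measures and compute centralities differently.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple


def combined_distance(sims: Sequence[float]) -> float:
    """Average the similarity scores (assumed in [0, 1]) and convert to a distance."""
    return 1.0 - sum(sims) / len(sims)


def kruskal_mst(nodes: Sequence[str],
                dist: Dict[FrozenSet[str], float]) -> List[Tuple[str, str, float]]:
    parent = {n: n for n in nodes}

    def find(n: str) -> str:
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # All node pairs, shortest distance first.
    edges = sorted((dist[frozenset(p)], *p) for p in combinations(nodes, 2))
    tree: List[Tuple[str, str, float]] = []
    for d, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # adding this edge does not create a cycle
            parent[ra] = rb
            tree.append((a, b, d))
    return tree


def degree_centrality(nodes: Sequence[str],
                      tree: List[Tuple[str, str, float]]) -> Dict[str, float]:
    deg: Dict[str, int] = defaultdict(int)
    for a, b, _ in tree:
        deg[a] += 1
        deg[b] += 1
    norm = max(len(nodes) - 1, 1)
    return {v: deg[v] / norm for v in nodes}
```

In a tree, a high-degree node connects otherwise separate branches of similar titles, which is one simple reading of an anime being "significant to others".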
We summarize math search engines and search interfaces produced by the Document and Pattern Recognition Lab in recent years, in particular the min math search interface and the Tangent search engine. Source code for both systems is publicly available. "The Masses" refers to our emphasis on creating systems for mathematical non-experts, who may want to look up unfamiliar notation or browse documents based on the visual appearance of formulae rather than their mathematical semantics.
The consumption history of online media content such as music and video offers a rich source of data from which to mine information. Trends in this data are of particular interest because they reflect user preferences as well as associated cultural contexts that can be exploited in systems such as recommendation or search. This paper classifies songs as seasonal using a large, real-world dataset of user listening data. Results show strong classification performance for Christmas music using Gaussian Mixture Models.
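As a rough illustration of how a Gaussian Mixture Model can separate seasonal from non-seasonal songs, the sketch below uses expectation-maximization to fit a two-component, one-dimensional GMM. It works on a single hypothetical feature: the share of a song's yearly plays that fall in December. The feature, the two-component setup, the min/max initialization and the function names are assumptions for illustration. The paper's features and classification protocol are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs: Sequence[float], n_iter: int = 100,
               min_var: float = 1e-6) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 2-component 1-D GMM with EM; returns (weights, means, variances)."""
    # Deterministic initialization: components at the smallest and largest values.
    means = [min(xs), max(xs)]
    mu = sum(xs) / len(xs)
    var0 = sum((x - mu) ** 2 for x in xs) / len(xs) + min_var
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk + min_var
    return weights, means, variances


def predict_seasonal(x: float, weights: Sequence[float], means: Sequence[float],
                     variances: Sequence[float]) -> bool:
    """True if x is more likely under the higher-mean ('seasonal') component."""
    hi = 0 if means[0] > means[1] else 1
    p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return p[hi] > p[1 - hi]
```

A song whose December share is close to 1/12 falls into the low-mean component, while songs with strongly December-concentrated listening fall into the high-mean component.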