The search for prior art is crucial in patent application processing; it consists of retrieving other documents relevant to the invention described in the application. Most methods feed a search engine with keywords that are extracted by frequency-analysis methods. We propose and demonstrate a new method that relies on the way information is presented in patent claims.
Supply chain optimization is key to a healthy and profitable business. Many companies use online procurement systems to agree contracts with suppliers. It is vital that the most competitive suppliers are invited to bid for such contracts. In this work, we propose a recommender system to assist with supplier discovery in road freight online procurement. Our system is able to provide personalized supplier recommendations, taking into account customer needs and preferences. This is a novel application of recommender systems, calling for design choices that fit the unique requirements of online procurement. Our preliminary results, using real-world data, are promising.
Recommender systems (RSs) are an established technology with successful applications in social media, e-commerce, entertainment, and more. RSs are indeed key to the success of many popular apps, such as YouTube, TikTok, Xiaohongshu, Bilibili, and others. This paper explores the methodology for improving modern industrial RSs. It is written for experienced RS engineers who are diligently working to improve their key performance indicators, such as retention and duration. The experiences shared in this paper have been tested in some real industrial RSs and are likely to generalize to other RSs as well. Most of the content in this paper is industry experience without publicly available references.
This manuscript introduces an autotuned algorithm for searching nearest neighbors based on neighbor graphs and optimization metaheuristics to produce Pareto-optimal searches for quality and search speed automatically; the same strategy is also used to produce indexes that achieve a minimum quality. Our approach is described and benchmarked with other state-of-the-art similarity search methods, showing convenience and competitiveness.
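The basic search primitive behind neighbor-graph indexes can be sketched as follows. This is a minimal illustration, not the autotuned algorithm of the manuscript: the brute-force graph construction, the function names `build_knn_graph` and `greedy_search`, and the toy one-dimensional data are all assumptions made for the example. Parameters such as `k` are the kind of setting an autotuner would trade off between quality and speed.

```python
import math

def build_knn_graph(points, k):
    """Brute-force k-nearest-neighbor graph: index -> list of neighbor indices."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in others[:k]]
    return graph

def greedy_search(points, graph, query, start=0):
    """Walk the graph, always moving to the neighbor closest to the query.

    Stops at a local minimum, so the result is approximate in general.
    """
    current = start
    current_dist = math.dist(points[current], query)
    while True:
        best, best_dist = current, current_dist
        for j in graph[current]:
            d = math.dist(points[j], query)
            if d < best_dist:
                best, best_dist = j, d
        if best == current:
            return current, current_dist
        current, current_dist = best, best_dist

# Toy data (hypothetical): ten points on a line.
points = [(float(x),) for x in range(10)]
graph = build_knn_graph(points, k=2)
nearest, distance = greedy_search(points, graph, query=(6.2,))
```

A larger `k` or a beam of several candidates usually raises recall at the cost of more distance computations, which is the trade-off a Pareto-style tuner would explore.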
Elias Iosif, Klitos Christodoulou, Andreas Vlachos
The regulatory framework of cryptocurrencies (and, in general, blockchain tokens) is of paramount importance. This framework drives nearly all key decisions in the respective business areas. In this work, a computational model is proposed for quantitatively estimating the regulatory stance of countries with respect to cryptocurrencies. The estimation is conducted via web mining, utilizing web search engines. The proposed model is experimentally validated. In addition, unsupervised learning (clustering) is applied to better analyze the automatically derived estimations. Overall, very good performance is achieved by the proposed algorithmic approach.
An unprecedented IrIII[df(CF3)ppy]2(dtbbpy)PF6-catalyzed simple photochemical process for direct addition of amines and alcohols to the relatively less reactive nitrile triple bond is described herein.
Checking and confirming factual information in texts and speeches is vital to determine the veracity and correctness of factual statements. This work was previously done manually by journalists and others, but it is a time-consuming task. With advances in Information Retrieval and NLP, research on automating fact-checking is gaining attention. CLEF-2018 and CLEF-2019 organised fact-checking tasks and invited participants. This project focuses on CLEF-2019 Task-1 Check-Worthiness and performs experiments using the latest Sentence-BERT pre-trained embeddings, topic modeling, and sentiment scores. Evaluation metrics such as MAP, Mean Reciprocal Rank, Mean R-Precision, and Mean Precision@N show the improvement in results obtained with these techniques.
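The ranking metrics named above can be sketched in a few lines. This is an illustrative implementation under the assumption that each ranked list contains every candidate sentence, with 1 marking a check-worthy sentence and 0 otherwise; the function names and the toy rankings are hypothetical and not taken from the CLEF evaluation scripts.

```python
def average_precision(labels):
    """labels: 1/0 relevance of ranked items, best-ranked first."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at each relevant position
    return total / hits if hits else 0.0

def reciprocal_rank(labels):
    """1 / rank of the first relevant item, or 0 if there is none."""
    for rank, rel in enumerate(labels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def precision_at(labels, n):
    """Fraction of relevant items among the top n."""
    return sum(labels[:n]) / n

def r_precision(labels):
    """Precision at R, where R is the number of relevant items."""
    r = sum(labels)
    return sum(labels[:r]) / r if r else 0.0

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Two toy rankings (hypothetical), e.g. one per debate transcript.
rankings = [[1, 0, 1, 0], [0, 1, 0, 0]]
map_score = mean(average_precision(r) for r in rankings)
mrr_score = mean(reciprocal_rank(r) for r in rankings)
```

Averaging each per-list metric over all lists gives the "Mean" variants (MAP, MRR, Mean R-Precision, Mean Precision@N).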
In this work, we attempt to solve the Hit Song Science problem, which aims to predict which songs will become chart-topping hits. We constructed a dataset with approximately 1.8 million hit and non-hit songs and extracted their audio features using the Spotify Web API. We tested four models on our dataset. Our best model was a random forest, which was able to predict Billboard song success with 88% accuracy.
Research publication requires public datasets. In recommender systems, some datasets are widely used to compare algorithms against a (supposedly) common benchmark. The problem is that, for various reasons, these datasets are heavily preprocessed, making the comparison of results across papers difficult. This paper makes explicit the variety of preprocessing and evaluation protocols to test the robustness of a dataset (or lack of flexibility). While robustness is good for comparing results across papers, for flexible datasets we propose a method to select a preprocessing protocol and share results more transparently.
There is growing research interest in recommendation as a multi-stakeholder problem, one where the interests of multiple parties should be taken into account. This category subsumes some existing well-established areas of recommendation research including reciprocal and group recommendation, but a detailed taxonomy of different classes of multi-stakeholder recommender systems is still lacking. Fairness-aware recommendation has also grown as a research area, but its close connection with multi-stakeholder recommendation is not always recognized. In this paper, we define the most commonly observed classes of multi-stakeholder recommender systems and discuss how different fairness concerns may come into play in such systems.
The provision of multilingual event-centric temporal knowledge graphs such as EventKG enables structured access to representations of a large number of historical and contemporary events in a variety of language contexts. Timelines provide an intuitive way to facilitate an overview of events related to a query entity - i.e., an entity or an event of user interest - over a certain period of time. In this paper, we present EventKG+TL - a novel system that generates cross-lingual event timelines using EventKG and facilitates an overview of the language-specific event relevance and popularity along with the cross-lingual differences.
Tabular data is difficult to analyze and to search through, calling for new tools and interfaces that would allow even non-tech-savvy users to gain insights from open datasets without resorting to specialized data analysis tools, or even without having to fully understand the dataset structure. The goal of our demonstration is to showcase answering natural language questions from tabular data, and to discuss related system configuration and model training aspects. Our prototype is publicly available and open-sourced (see https://svakulenko.ai.wu.ac.at/tableqa).
We propose a framework for discriminative Information Retrieval (IR) atop linguistic features, trained to improve the recall of tasks such as answer candidate passage retrieval, the initial step in text-based Question Answering (QA). We formalize this as an instance of linear feature-based IR (Metzler and Croft, 2007), illustrating how a variety of knowledge discovery tasks are captured under this approach, leading to a 44% improvement in recall for candidate triage for QA.
In this work, a new way to represent Japanese animation (anime) is presented. We applied a minimum spanning tree to show the relations between anime. The distance between anime is calculated from three similarity measurements, namely crew, score histogram, and topic similarities. Finally, centralities are computed to reveal the most significant anime. The results show that the minimum spanning tree can be used to determine the similarity between anime. Furthermore, by using centrality calculations, we found some anime that are significant to others.
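The pipeline described above (pairwise distances, minimum spanning tree, centrality) can be sketched as follows. This is a minimal illustration assuming the three similarity measures have already been combined into a single distance per pair; the titles, the distance values, the use of Kruskal's algorithm, and the choice of degree centrality are assumptions for the example, not details confirmed by the abstract.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm. edges: list of (distance, u, v) tuples."""
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for dist, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two components, so keep it
            parent[root_u] = root_v
            tree.append((u, v, dist))
    return tree

def degree_centrality(nodes, tree):
    """Number of tree edges touching each node."""
    degree = {n: 0 for n in nodes}
    for u, v, _ in tree:
        degree[u] += 1
        degree[v] += 1
    return degree

# Hypothetical anime and combined distances (smaller = more similar).
nodes = ["A", "B", "C", "D"]
edges = [(0.2, "A", "B"), (0.3, "A", "C"), (0.4, "A", "D"),
         (0.9, "B", "C"), (0.8, "C", "D"), (0.7, "B", "D")]
tree = minimum_spanning_tree(nodes, edges)
centrality = degree_centrality(nodes, tree)
most_central = max(centrality, key=centrality.get)
```

In this toy graph every tree edge touches "A", so it is the node a degree-based reading would flag as most significant.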
Man-Hung Jong, Chong-Han Ri, Hyok-Chol Choe, et al.
We propose a method that uses passage scores to retrieve documents effectively in a Question Answering system. To this end, we suggest an evaluation function that considers the proximity between question terms in a passage. Using this evaluation function, we extract the document whose passages receive the highest scores as the suitable document for the question. The proposed method is very effective for document retrieval in a Korean question answering system.
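One way to picture a proximity-aware passage score is sketched below. The abstract does not give the evaluation function, so the formula here is purely an assumption for illustration: each distinct matched question term contributes 1, and each pair of different matched terms adds the inverse of their token distance. The names `passage_score` and `best_document` and the example texts are hypothetical.

```python
def passage_score(question_terms, passage_tokens):
    """Assumed scoring: matched distinct terms plus inverse-distance bonuses."""
    positions = [(i, tok) for i, tok in enumerate(passage_tokens)
                 if tok in question_terms]
    score = float(len({tok for _, tok in positions}))
    for a in range(len(positions)):
        for b in range(a + 1, len(positions)):
            (i, term_i), (j, term_j) = positions[a], positions[b]
            if term_i != term_j:
                score += 1.0 / (j - i)  # closer terms earn a larger bonus
    return score

def best_document(question_terms, documents):
    """documents: name -> list of passages (each a token list).

    A document is scored by its best passage (an assumption).
    """
    def doc_score(passages):
        return max(passage_score(question_terms, p) for p in passages)
    return max(documents, key=lambda name: doc_score(documents[name]))

question = {"capital", "korea"}
documents = {
    "doc1": ["seoul is the capital of korea".split()],
    "doc2": ["the capital city was moved long ago and korea changed".split()],
}
chosen = best_document(question, documents)
```

Both documents contain both question terms, but the terms sit closer together in `doc1`, so it receives the higher score.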
This article focuses on the description and evaluation of a new unsupervised learning method for clustering definitions in Spanish according to their semantics. Textual Energy is used as a clustering measure, and we study an adaptation of Precision and Recall to evaluate our method.
In this paper, we present a generic framework for ontology-based information retrieval. We focus on the recognition of semantic information extracted from data sources and the mapping of this knowledge into an ontology. To achieve greater scalability, we propose an approach for semantic indexing based on an entity retrieval model. In addition, we use an ontology of the public transportation domain to validate these proposals. Finally, we evaluate our system using ontology mapping and real-world data sources. Experiments show that our framework can provide meaningful search results.
In this paper, we develop a dynamic exploration/exploitation (exr/exp) strategy for contextual recommender systems (CRS). Specifically, our methods can adaptively balance the two aspects of exr/exp by automatically learning the optimal tradeoff. This consists of optimizing a utility function represented by a linearized form of the probability distributions of the rewards of the clicked and the non-clicked documents already recommended. Within an offline simulation framework, we apply our algorithms to a CRS and conduct an evaluation with real event log data. The experimental results and detailed analysis demonstrate that our algorithms outperform existing algorithms in terms of click-through rate (CTR).
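For readers unfamiliar with the exploration/exploitation tradeoff, a much simpler baseline than the learned strategy above is sketched here. It is not the authors' method: it is a plain epsilon-greedy policy whose exploration rate decays over time on a fixed schedule, and the class name, the decay schedule, and the simulated click-through rates are all assumptions made for illustration.

```python
import math
import random

class DecayingEpsilonGreedy:
    """Baseline bandit policy: explore with probability epsilon, else exploit."""

    def __init__(self, arms, min_epsilon=0.05, rng=None):
        self.counts = {arm: 0 for arm in arms}
        self.values = {arm: 0.0 for arm in arms}  # running mean reward (CTR)
        self.min_epsilon = min_epsilon
        self.rng = rng or random.Random()
        self.t = 0

    def epsilon(self):
        # Assumed schedule: explore heavily at first, then settle to a floor.
        return max(self.min_epsilon, 1.0 / math.sqrt(self.t + 1))

    def select(self):
        if self.rng.random() < self.epsilon():
            return self.rng.choice(list(self.counts))    # explore
        return max(self.values, key=self.values.get)     # exploit

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Offline simulation with hypothetical click-through rates per document.
true_ctr = {"a": 0.1, "b": 0.5, "c": 0.9}
rng = random.Random(0)
policy = DecayingEpsilonGreedy(true_ctr, rng=rng)
for _ in range(3000):
    arm = policy.select()
    clicked = 1.0 if rng.random() < true_ctr[arm] else 0.0
    policy.update(arm, clicked)
best_arm = max(policy.values, key=policy.values.get)
```

A fixed schedule like this ignores context and observed rewards when setting the exploration rate, which is the limitation that an adaptive, learned tradeoff is meant to address.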
When a user finds an interesting recommendation in a recommender system, the user may want to recall related items recommended in the past, either to reconsider them or to enjoy them again. If the system can retrieve such "recalled" items at each user's request, it should deepen the user experience. We propose a model and an algorithm for such personalized "recalling" in conventional recommender systems, which is an application of neural networks for associative memory. In our model, the "recalled" items can reflect each user's personality beyond naive similarities between items.
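The idea of associative recall can be illustrated with a classic Hopfield network, sketched below. This is a textbook construction, not the model proposed in the abstract: the encoding (one +1/-1 entry per item, one stored pattern per past recommendation context) and the function names are assumptions chosen for the example.

```python
def train_hopfield(patterns):
    """Hebbian learning: weights[i][j] sums p[i] * p[j] over stored patterns."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j]
    return weights

def recall(weights, cue, max_steps=10):
    """Synchronous updates until the state stops changing."""
    state = list(cue)
    for _ in range(max_steps):
        new_state = []
        for i, row in enumerate(weights):
            field = sum(w * s for w, s in zip(row, state))
            if field == 0:
                new_state.append(state[i])  # keep the old value on a tie
            else:
                new_state.append(1 if field > 0 else -1)
        if new_state == state:
            break
        state = new_state
    return state

# Two hypothetical stored contexts over eight items (+1 = item was included).
context_1 = [1, 1, 1, 1, -1, -1, -1, -1]
context_2 = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([context_1, context_2])

# A noisy cue: context_1 with its first item flipped.
cue = [-1, 1, 1, 1, -1, -1, -1, -1]
recalled = recall(weights, cue)
```

Starting from an incomplete or noisy cue, the network settles on the closest stored pattern, which is the sense in which past items are "recalled" from a partial reminder.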