This survey examines the most effective retrieval algorithms utilized in ad recommendation and content recommendation systems. Ad targeting algorithms rely on detailed user profiles and behavioral data to deliver personalized advertisements, thereby driving revenue through targeted placements. Conversely, organic retrieval systems aim to improve user experience by recommending content that matches user preferences. This paper compares these two applications and explains the most effective methods employed in each.
An evaluation of the existing design for the retaining wall reinforcement found that it meets current criteria. The geotechnical analysis of the retaining wall was carried out along a 900-meter package site on the Jalan Ir. Soekarno road section (Ringroad Ngawi), located between KM 183+200 and KM 184+100. The Road Landslide Handling Package 028.15 Jl. Ir. Soekarno (Ngawi) Km. 185 Cs is a road landslide remediation project carried out by the Ministry of Public Works (Kementerian PU) through the National Road Implementation Work Unit, Region II, East Java Province (Satuan Kerja Pelaksanaan Jalan Nasional Wilayah II Propinsi Jawa Timur). It is funded by the 2020 state budget (APBN) and coordinated by the Commitment-Making Officer (Pejabat Pembuat Komitmen, PPK) 2.6 East Java.
In this position paper, we discuss recent applications of simulation approaches for recommender systems tasks. In particular, we describe how they were used to analyze the problem of misinformation spreading and understand which data characteristics affect the performance of recommendation algorithms more significantly. We also present potential lines of future work where simulation methods could advance the work in the recommendation community.
A significant part of human activity today consists of searching for information online using knowledge repositories. This endeavor may be time-consuming if the individual searching is unfamiliar with the subject matter. However, experts can help individuals find relevant information by searching online on their behalf. This paper describes a theoretical framework to model the dynamic process by which requests for information arrive at a system of experts, who then answer the requests by searching for the requested information.
The thumbnail, as the first sight of a micro-video, plays a pivotal role in attracting users to click and watch. Although several pioneering efforts have jointly considered quality and representativeness when selecting the thumbnail, they rarely explore the influence of users' interests. In real scenarios, however, the better a thumbnail satisfies users, the more likely the micro-video is to be clicked. In this paper, we aim to select the thumbnail of a given micro-video that meets most users' interests. To this end, we construct a large-scale dataset of micro-video thumbnails. Finally, we evaluate several baselines on the dataset and demonstrate its effectiveness.
Many computational social science projects examine online discourse surrounding a specific trending topic. These works often involve acquiring large-scale corpora relevant to the event in question in order to analyze aspects of the response to it. Keyword searches present a precision-recall trade-off, and crowd-sourced annotations, while effective, are costly. This work aims to enable automatic and accurate ad-hoc retrieval of comments discussing a trending topic from a large corpus, using only a handful of seed news articles.
Sanjoy Dasgupta, Stefanos Poulis, Christopher Tosh
The formalism of anchor words has enabled the development of fast topic modeling algorithms with provable guarantees. In this paper, we introduce a protocol that allows users to interact with anchor words to build customized and interpretable topic models. Experimental evidence validating the usefulness of our approach is also presented.
This paper presents an evaluation and analysis of selected information retrieval models for the Bengali monolingual information retrieval task. Two models, the TF-IDF model and the Okapi BM25 model, are considered in our study. The developed IR models are tested on the FIRE ad hoc retrieval data sets released in the years 2008 to 2012, and the results obtained are reported in this paper.
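As an illustration of the second model named above, the following is a minimal sketch of Okapi BM25 scoring. It is not the implementation evaluated in the paper: the function name `bm25_scores`, the toy English documents, the parameter values `k1=1.2` and `b=0.75`, and the non-negative IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`.

    k1 and b are commonly used defaults, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF variant with "+1" inside the log so it never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy collection (whitespace-tokenized).
docs = [
    "the river bank flooded".split(),
    "the bank approved the loan".split(),
    "rain fell on the hills".split(),
]
scores = bm25_scores("bank loan".split(), docs)
```

In this toy collection the second document matches both query terms and therefore ranks first, while the third matches none and scores zero.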
Musical pieces can be modeled as complex networks. This fosters innovative ways to categorize music, paving the way towards novel applications in multimedia domains, such as music didactics, multimedia entertainment and digital music generation. Clustering these networks through their main metrics allows grouping similar musical tracks. To show the viability of the approach, we provide results on a dataset of guitar solos.
Piotr Borkowski, Krzysztof Ciesielski, Mieczysław A. Kłopotek
In this paper we propose a new document classification method that bridges discrepancies (the so-called semantic gap) between the training set and the application sets of textual data. We demonstrate its superiority over classical text classification approaches, including traditional classifier ensembles. The method combines a document categorization technique with a single classifier or a classifier ensemble (SEMCOM algorithm - Committee with Semantic Categorizer).
Newswire and Social Media are the major sources of information in our time. While the topical demographics of Western Media have been the subject of past studies, less is known about Chinese Media. In this paper, we apply event detection and tracking technology to examine the information overlap and differences between Chinese and Western Traditional Media and Social Media. Our experiments reveal a biased interest of China towards the West, which becomes particularly apparent when comparing the interest in celebrities.
Recommender systems have been successfully applied to assist decision making by producing a list of item recommendations tailored to user preferences. Traditional recommender systems focus only on optimizing utility for the end users who receive the recommendations. By contrast, multi-stakeholder recommendation attempts to generate recommendations that satisfy the needs of both the end users and other parties or stakeholders. This paper provides an overview and discussion of multi-stakeholder recommendation from the perspective of practical applications, available data sets, corresponding research challenges, and potential solutions.
In this paper we show how the performance of tweet clustering can be improved by leveraging character-based neural networks. The proposed approach overcomes the limitations caused by vocabulary explosion in word-based models and allows for seamless processing of multilingual content. Our evaluation results and code are available online at https://github.com/vendi12/tweet2vec_clustering
LinkedIn search is deeply personalized - for the same queries, different searchers expect completely different results. This paper presents our approach to achieving this by mining various data sources available in LinkedIn to infer searchers' intents (such as hiring, job seeking, etc.), as well as extending the concept of homophily to capture the searcher-result similarities on many aspects. Then, learning-to-rank (LTR) is applied to combine these signals with standard search features.
Rasmus Troelsgård, Bjørn Sand Jensen, Lars Kai Hansen
Calculating similarities between objects defined by many heterogeneous data modalities is an important challenge in many multimedia applications. We use a multi-modal topic model as a basis for defining such a similarity between objects. We propose to compare the resulting similarities from different model realizations using the non-parametric Mantel test. The approach is evaluated on a music dataset.
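The comparison step described above can be sketched as a permutation-based Mantel test on two pairwise distance matrices. This is a minimal illustration, not the authors' code: the helper names, the use of Pearson correlation as the test statistic, the one-sided p-value, the 999 permutations, and the toy one-dimensional positions standing in for two model realizations are all assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / math.sqrt(vx * vy)


def upper_triangle(m, order):
    """Flatten the strict upper triangle of m after relabeling objects by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(a, b, n_perm=999, seed=0):
    """Return (correlation, one-sided permutation p-value) for matrices a and b."""
    n = len(a)
    identity = list(range(n))
    b_flat = upper_triangle(b, identity)
    observed = pearson(upper_triangle(a, identity), b_flat)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)
        # Permute rows and columns of `a` together, keeping `b` fixed.
        if pearson(upper_triangle(a, perm), b_flat) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def distance_matrix(points):
    return [[abs(p - q) for q in points] for p in points]


# Hypothetical stand-ins for distances derived from two model realizations.
dist_a = distance_matrix([0, 1, 3, 6, 10, 15, 21, 28])
dist_b = distance_matrix([0.0, 1.2, 2.9, 6.3, 9.8, 15.4, 20.7, 28.1])
r, p_value = mantel_test(dist_a, dist_b)
```

Permuting object labels, rather than individual matrix entries, respects the dependence between entries that share an object, which is why an ordinary correlation test is not appropriate for this comparison.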
Faceted arrangement of entities and typed relations for representing different associations between entities are established tools in knowledge representation. This paper discusses a proposal that combines both tools to draw inferences along relational paths. This approach may yield new benefits for information retrieval processes, especially when modeled for heterogeneous environments in the Semantic Web. Faceted arrangement can be used as a selection tool for the semantic knowledge modeled within the knowledge representation. Typed relations between the entities of different facets can be used as restrictions for selecting them across the facets.
Collaborative filtering is amongst the most preferred techniques when implementing recommender systems. Recently, great interest has turned towards parallel and distributed implementations of collaborative filtering algorithms. This work is a survey of the parallel and distributed collaborative filtering implementations, aiming not only to provide a comprehensive presentation of the field's development, but also to offer future research orientation by highlighting the issues that need to be further developed.
In this work, we conduct a joint analysis of both Vector Space and Language Models for IR using the mathematical framework of Quantum Theory. We shed light on how both models allocate the space of density matrices. A density matrix is shown to be a general representational tool capable of leveraging capabilities of both VSM and LM representations thus paving the way for a new generation of retrieval models. We analyze the possible implications suggested by our findings.
The popular bag of words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation is unable to maintain any sequential information. We present a continuous and differentiable sequential document representation that goes beyond the bag of words assumption, and yet is efficient and effective. This representation employs smooth curves in the multinomial simplex to account for sequential information. We discuss the representation and its geometric properties and demonstrate its applicability for the task of text classification.
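One way to picture a curve in the multinomial simplex is to compute a kernel-weighted word histogram at each normalized position in the document. The sketch below illustrates that general idea only and is not the representation defined in the paper: the Gaussian kernel, the bandwidth `sigma`, the additive `smoothing` constant, and the function name `local_histogram` are assumptions.

```python
import math


def local_histogram(tokens, vocab, mu, sigma=0.2, smoothing=0.01):
    """Word distribution around normalized position mu in [0, 1].

    Each token is weighted by a Gaussian kernel centered at mu, so the
    result varies smoothly as mu moves through the document.
    """
    n = len(tokens)
    positions = [(i + 0.5) / n for i in range(n)]
    weights = [math.exp(-0.5 * ((p - mu) / sigma) ** 2) for p in positions]
    total = sum(weights)
    # Additive smoothing keeps every vocabulary word at nonzero probability.
    hist = {w: smoothing for w in vocab}
    for tok, wt in zip(tokens, weights):
        hist[tok] += wt / total
    norm = sum(hist.values())
    return {w: v / norm for w, v in hist.items()}


# Toy document whose word usage shifts from "a" to "b".
tokens = "a a a b b b".split()
vocab = ["a", "b"]
start = local_histogram(tokens, vocab, mu=0.0)
end = local_histogram(tokens, vocab, mu=1.0)
```

Evaluating `local_histogram` over a grid of `mu` values traces a path of probability vectors. A plain bag of words corresponds to the limit of a very wide kernel, where the histogram is the same at every position.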
A new fast algorithm for clustering and classification of large collections of text documents is introduced. The new algorithm employs the bipartite graph that realizes the word-document matrix of the collection. Specifically, the modularity of the bipartite graph is used as the optimization functional. Experiments with the new algorithm on a number of text collections show competitive clustering (classification) quality and record-breaking speed.
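The abstract does not state which modularity definition is optimized. As a hedged illustration, the sketch below evaluates one common bipartite form, Q = (1/m) Σ_ij (A_ij − k_i d_j / m) δ(c_i, c_j), on a toy word-document matrix. The function name, the toy matrix, and the choice of this definition are assumptions, and the code only scores a given partition rather than searching for the best one.

```python
def bipartite_modularity(matrix, row_labels, col_labels):
    """Bipartite modularity of a word-document matrix under a given partition.

    matrix[i][j] is the weight between word i and document j;
    row_labels and col_labels assign words and documents to communities.
    """
    m = sum(sum(row) for row in matrix)            # total edge weight
    row_deg = [sum(row) for row in matrix]         # word degrees k_i
    col_deg = [sum(col) for col in zip(*matrix)]   # document degrees d_j
    q = 0.0
    for i, row in enumerate(matrix):
        for j, a_ij in enumerate(row):
            if row_labels[i] == col_labels[j]:
                # Observed weight minus the weight expected under a random
                # graph with the same degrees.
                q += a_ij - row_deg[i] * col_deg[j] / m
    return q / m


# Toy matrix: words 0-1 appear only in documents 0-1, words 2-3 only in 2-3.
matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
q_blocks = bipartite_modularity(matrix, [0, 0, 1, 1], [0, 0, 1, 1])
q_single = bipartite_modularity(matrix, [0, 0, 0, 0], [0, 0, 0, 0])
```

The partition that follows the two blocks scores 0.5, while placing everything in a single community scores 0. This difference is the kind of signal a modularity-maximizing clustering algorithm exploits.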