Eden Tzanetopoulos, Daniel R. Gamelin
Results for "cs.IR"
Showing 20 of ~218185 results · from CrossRef, arXiv, DOAJ
Matan Mandelbrod, Biwei Jiang, Giald Wagner et al.
Signals are short textual or visual snippets displayed on the eBay View-Item (VI) page, providing additional, contextual information for users about the viewed item. The aim in displaying the signals is to facilitate intelligent purchases and to incentivise engagement. In this paper, we present two approaches for developing statistical models that optimally populate the VI page with signals. Both approaches were A/B tested and yielded a significant increase in business metrics.
Olena Stiazhkina
The aim of this article is to analyse the first steps taken by the Bolshevik authorities toward "liberation" and the restoration of the pre-war status quo on the territory of Ukraine. The task is to establish what practices of "liberation" the occupied people experienced; how the Bolsheviks reacted to the revival of the national idea; how re-Sovietization became the basis for restoring the Russian imperial order; and how Ukraine became a bridgehead for reinventing a model of Soviet imperialism "oriented toward Russia". In this article, re-Sovietization is treated as the mechanisms and practices (and tools) of Bolshevik power applied to Ukrainian communities that, after the war, held intentions dangerous to the Kremlin: social solidarity and the capacity to resist. The sources of the work are archival documents of central government institutions and security services, memoirs, statistical documents, and correspondence. The methodological basis is the concept of agency, which denotes the ability to choose one's own life options under any circumstances. Conclusions. Already in the first stages of their return to Ukrainian lands, the Bolsheviks resumed the repression and looting that had been widely practised before the Second World War. Persecution and repression struck at the Ukrainian national revival. Through broad campaigns against Ukrainian bourgeois nationalism and for "gratitude to the Russians for liberation", Ukrainians were consigned to oblivion both as victims and as heroes of the Second World War.
Yuefeng Zhang
This paper aims at a better understanding of matrix factorization (MF), factorization machines (FM), and their combination with deep algorithms in recommendation systems. Specifically, this paper focuses on Singular Value Decomposition (SVD) and its variants, e.g., Funk-SVD, SVD++, etc. Step-by-step formula calculations and explanatory figures are provided. In addition, we explain the DeepFM model, in which FM is assisted by deep learning. Through numerical examples, we attempt to tie the theory to real-world problems.
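As an illustration of the Funk-SVD idea mentioned in this abstract, the following is a minimal sketch and not the paper's own code. It assumes the common formulation in which each observed rating is approximated by the dot product of a user factor vector and an item factor vector, fitted by stochastic gradient descent with L2 regularization. The function names, hyperparameters, and toy ratings are all hypothetical.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rmse(ratings, P, Q):
    """Root mean squared error over the observed (user, item, rating) triples."""
    total = sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


def funk_svd(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """Fit user factors P and item factors Q on observed ratings only, using SGD.

    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Small positive initial values so the first predictions are close to zero.
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with an L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Hypothetical toy data: (user index, item index, rating).
ratings = [
    (0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 2.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0),
]

P0, Q0 = funk_svd(ratings, n_users=4, n_items=3, epochs=0)  # untrained baseline
P, Q = funk_svd(ratings, n_users=4, n_items=3)

initial_error = rmse(ratings, P0, Q0)
final_error = rmse(ratings, P, Q)
```

A missing rating for user `u` and item `i` would then be estimated as `dot(P[u], Q[i])`. Variants such as SVD++ add bias terms and implicit-feedback factors, which this sketch omits.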
Min Seok Kim
We present an effective way to predict the search query-item relationship. We combine pre-trained transformer and LSTM models, and increase model robustness using adversarial training, exponential moving average, multi-sampled dropout, and diversity-based ensembling, to tackle the extremely difficult problem of predicting against queries not seen before. All of our strategies focus on increasing the robustness of deep learning models and are applicable in any task where deep learning models are used. Applying our strategies, we achieved 10th place in the KDD Cup 2022 Product Substitution Classification task.
Olegs Verhodubs
Keyword search engines are essential elements of large information spaces. The largest information space is the Web, and keyword search engines play a crucial role there. The advent of keyword search engines provided a quantum leap in the development of the Web. Since then, the Web has continued to evolve, and keyword search systems have proven inadequate. A new quantum leap in the development of keyword search engines is needed. This quantum leap can be provided by more intelligent keyword search engines. The increased intelligence of such keyword search engines can be achieved through a combination of keyword search engines and expert systems. The paper shows how this can be done.
Jingtao Zhan, Jiaxin Mao, Yiqun Liu et al.
Although exact term matching between queries and documents is the dominant method for first-stage retrieval, we propose a different approach, called RepBERT, which represents documents and queries with fixed-length contextualized embeddings. The inner products of query and document embeddings are regarded as relevance scores. On the MS MARCO Passage Ranking task, RepBERT achieves state-of-the-art results among all initial retrieval techniques. Its efficiency is comparable to that of bag-of-words methods.
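The scoring step described in this abstract, where the inner product of a query embedding and a document embedding serves as the relevance score, can be sketched as follows. This is only an illustration under stated assumptions: the embeddings are hand-written toy vectors rather than encoder outputs, and the function names and document identifiers are hypothetical, not part of RepBERT.

```python
def inner_product(a, b):
    """Relevance score: inner product of two fixed-length embeddings."""
    return sum(x * y for x, y in zip(a, b))


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted by descending inner-product score."""
    scores = [(doc_id, inner_product(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; a real system would obtain them from a trained encoder.
query_vec = [1.0, 0.0, 2.0]
doc_vecs = {
    "d1": [1.0, 1.0, 0.0],
    "d2": [0.5, 0.0, 1.0],
    "d3": [0.0, 3.0, 0.25],
}

ranking = rank_documents(query_vec, doc_vecs)
ranked_ids = [doc_id for doc_id, _ in ranking]
```

Because document embeddings do not depend on the query, they can be computed offline and searched with a maximum inner product index, which is one plausible reason such an approach can approach bag-of-words efficiency.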
Yingjie Hu
Replicability and reproducibility (R&R) are critical for the long-term prosperity of a scientific discipline. In GIScience, researchers have discussed R&R related to different research topics and problems, such as local spatial statistics, digital earth, and metadata (Fotheringham, 2009; Goodchild, 2012; Anselin et al., 2014). This position paper proposes to further support R&R by building benchmarking frameworks in order to facilitate the replication of previous research for effective and efficient comparisons of methods and software tools developed for addressing the same or similar problems. In particular, this paper uses geoparsing, an important research problem in spatial and textual analysis, as an example to explain the value of such benchmarking frameworks.
Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant
Transformer-based models are nowadays state-of-the-art in ad-hoc Information Retrieval, but their behavior is far from being understood. Recent work has claimed that BERT does not satisfy the classical IR axioms. However, we propose to dissect the matching process of ColBERT, through the analysis of term importance and exact/soft matching patterns. Even if the traditional axioms are not formally verified, our analysis reveals that ColBERT: (i) is able to capture a notion of term importance; (ii) relies on exact matches for important terms.
Vasishtha Sriram Jayapati, Ajay Venkitaraman
Plagiarism is a commonly encountered problem in academia. While there are several tools and techniques to efficiently detect plagiarism in text, the same cannot be said about source code plagiarism. To make the existing systems more efficient, we use several information retrieval techniques to find the similarity between source code files written in Java. We later use JPlag, a string-based plagiarism detection tool used in academia, to match the plagiarized source code. In this paper, we aim to generalize on the efficiency and effectiveness of detecting plagiarism using different information retrieval models rather than using just string manipulation algorithms.
Pranjal Dhakal, Manish Munikar, Bikram Dahal
In this paper, we propose a novel one-shot template-matching algorithm to automatically capture data from business documents with an aim to minimize manual data entry. Given one annotated document, our algorithm can automatically extract similar data from other documents having the same format. Based on a set of engineered visual and textual features, our method is invariant to changes in position and value. Experiments on a dataset of 595 real invoices demonstrate 86.4% accuracy.
Shuai Zhang, Yi Tay, Lina Yao et al.
In this paper, we propose a novel sequence-aware recommendation model. Our model utilizes a self-attention mechanism to infer item-item relationships from users' historical interactions. With self-attention, it is able to estimate the relative weight of each item in user interaction trajectories to learn better representations of users' transient interests. The model is finally trained in a metric learning framework, taking both short-term and long-term intentions into consideration. Experiments on a wide range of datasets from different domains demonstrate that our approach outperforms the state-of-the-art by a wide margin.
Elham Ashoori, Terry Rudolph
There have been suggestions within the Information Retrieval (IR) community that quantum mechanics (QM) can be used to help formalise the foundations of IR. The invoked connection to QM is mathematical rather than physical. The proposed ideas are concerned with information which is encoded, processed and accessed in classical computers. However, some of the suggestions have been thoroughly muddled with questions about applying techniques of quantum information theory in IR, and it is often unclear whether or not the suggestion is to perform actual quantum information processing on the information. This paper is an attempt to provide some conceptual clarity on the emerging issues.
Habeeb Hooshmand, Joseph Martinsen, Jonathan Arauco et al.
The H-1B visa program is a very important tool for US-based businesses and educational institutes to recruit foreign talent. While the ultimate decision to certify an application lies with the United States Department of Labor, there are signals that can be used to determine whether an application is likely to be certified or denied. In this paper we first perform a data-driven exploratory analysis. We then leverage the features to train several classifiers and compare their performance. Finally, we discuss the implications of this work and future work that can be done in this area.
Thomas Krause
This master's thesis describes an algorithm for automated categorization of scientific documents using deep learning techniques and compares the results to those of existing classification algorithms. As an additional goal, a reusable API is to be developed, allowing the automation of classification tasks in existing software. A design will be proposed using a convolutional neural network as a classifier and integrating it into a REST-based API. This is then used as the basis for an actual proof-of-concept implementation, also presented in this thesis. It will be shown that the deep learning classifier provides very good results in the context of multi-class document categorization and that it is feasible to integrate such classifiers into a larger ecosystem using REST-based services.
Mauricio Uribe López
The transition from war to peace can bring about a shift in the center of gravity of violence toward deprived micro-spaces within cities, which constitute what may be called, adapting Guillermo O'Donnell's concept, urban brown zones. Situations of highly violent post-conflict and of high societal violence, which correspond to the type of cases that can be characterized as violent peace, require an urban citizen-security approach in tune with the local turn that has taken place in critical approaches to peacebuilding.
Gilad Katz, Bracha Shapira
In this technical report we present a database schema used to store Wikipedia so it can be easily used in query-intensive applications. In addition to storing the information in a way that makes it highly accessible, our schema enables users to easily formulate complex queries using information such as the anchor-text of links and their location in the page, the titles and number of redirect pages for each page and the paragraph structure of entity pages. We have successfully used the schema in domains such as recommender systems, information retrieval and sentiment analysis. In order to assist other researchers, we now make the schema and its content available online.
Mahyuddin K. M. Nasution
In this paper we study the relationship between a query and a search engine by exploring selective properties based on a simple search engine. We used set theory and utilized words and terms to define singletons and doubletons in the event spaces, and then provided their implementation to prove the existence of the shadow of a micro-cluster.
B. P. Pande, Pawan Tamta, H. S. Dhami
A language-independent stemmer has long been sought. The single N-gram tokenization technique works well; however, it often generates stems that start with intermediate characters rather than initial ones. We present a novel technique that takes the concept of N-gram stemming one step further and compare our method with an established algorithm in the field, Porter's Stemmer. Results indicate that our N-gram stemmer is not inferior to Porter's linguistic stemmer.
Arkaitz Zubiaga, Alberto Pérez García-Plaza, Víctor Fresno et al.
Since very recently, users on the social bookmarking service Delicious can stack web pages in addition to tagging them. Stacking enables users to group web pages around specific themes with the aim of recommending to others. However, users still stack a small subset of what they tag, and thus many web pages remain unstacked. This paper presents early research towards automatically clustering web pages from tags to find stacks and extend recommendations.