Results for "cs.IR" - JURNALIN

CrossRef Open Access 2026

A Versatile Method for Synthesizing Colloidal Cr³⁺-Based Fluoride Nanocrystals: Near-IR-Emitting Cs₂NaCrF₆, Na₃CrF₆, and Yb³⁺-Doped Cs₂NaCrF₆

Eden Tzanetopoulos, Daniel R. Gamelin

en


arXiv Open Access 2025

Optimal signals assignment for eBay View Item page

Matan Mandelbrod, Biwei Jiang, Giald Wagner et al.

Signals are short textual or visual snippets displayed on the eBay View-Item (VI) page, providing additional contextual information about the viewed item. The aim in displaying the signals is to facilitate intelligent purchasing and to incentivise engagement. In this paper, we present two approaches for developing statistical models that optimally populate the VI page with signals. Both approaches were A/B tested and yielded a significant increase in business metrics.

en cs.IR


CrossRef Open Access 2023

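"Liberated", Condemned and "Ungrateful": The Bolshevik "Re-Sovietization" of Ukraine and Ukrainians (1943 to the Early 1950s) [„Išlaisvinti“, pasmerkti ir „nedėkingi“: Ukrainos ir ukrainiečių bolševikinė „resovietizacija“ (1943 m.–6 deš. pradžia)]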

Olena Stiazhkina

The aim of this article is to analyze the first steps taken by the Bolshevik authorities toward "liberation" and the restoration of the pre-war status quo on the territory of Ukraine. The task is to determine which practices of "liberation" the people who had lived under occupation experienced; how the Bolsheviks reacted to the revival of the national idea; how re-Sovietization became the basis for restoring the Russian imperial order; and how Ukraine became a bridgehead for reinventing a "Russia-oriented" model of Soviet imperialism. In this article, re-Sovietization is treated as the mechanisms and practices (and tools) that the Bolshevik authorities applied to Ukrainian communities which, after the war, held intentions dangerous to the Kremlin: social solidarity and the capacity to resist. The sources are archival documents of central government institutions and security services, memoirs, statistical documents, and correspondence. The methodological basis is the concept of agency, which denotes the ability to choose one's own life options under any circumstances. Conclusions. Already in the first stages of their return to Ukrainian lands, the Bolsheviks resumed the repression and plunder that had been widely practiced before the Second World War. Persecution and repression dealt a blow to the Ukrainian national revival. Through broad campaigns against Ukrainian bourgeois nationalism and for "gratitude to the Russians for liberation", Ukrainians were consigned to oblivion both as victims and as heroes of the Second World War.

en


arXiv Open Access 2022

An Introduction to Matrix factorization and Factorization Machines in Recommendation System, and Beyond

Yuefeng Zhang

This paper aims at a better understanding of matrix factorization (MF), factorization machines (FM), and their combination with deep learning algorithms in recommendation systems. Specifically, it focuses on Singular Value Decomposition (SVD) and its variants, e.g., Funk-SVD and SVD++. Step-by-step formula calculations and explanatory figures are provided. In addition, we explain the DeepFM model, in which FM is assisted by deep learning. Through numerical examples, we attempt to tie the theory to real-world problems.
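As an illustration of the Funk-SVD idea named in this abstract, the following is a minimal sketch, not code from the paper. It assumes the common formulation in which each observed rating is approximated by the dot product of a user factor vector and an item factor vector, learned by stochastic gradient descent with L2 regularization. The function names (`funk_svd`, `predict`, `rmse`) and all hyperparameter values are hypothetical choices made for this example.

```python
import random


def funk_svd(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples.

    Assumed formulation: rating(u, i) ~ dot(P[u], Q[i]), fitted by SGD on the
    observed entries only, with L2 regularization on both factor vectors.
    """
    rng = random.Random(seed)
    # Small random initialization so that the factors can break symmetry.
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the regularized squared error.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating of item i by user u."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    """Root-mean-square error over the given rating triples."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5
```

Calling `funk_svd` on a handful of toy triples and comparing `rmse` before and after training shows the fit improving; `predict` can then be used for user-item pairs that were never observed.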

en cs.IR, cs.LG


arXiv Open Access 2022

Predicting Query-Item Relationship using Adversarial Training and Robust Modeling Techniques

Min Seok Kim

We present an effective way to predict the relationship between a search query and an item. We combine pre-trained transformer and LSTM models, and increase model robustness using adversarial training, exponential moving average, multi-sampled dropout, and a diversity-based ensemble, to tackle the extremely difficult problem of predicting against queries not seen before. All of our strategies focus on increasing the robustness of deep learning models and are applicable to any task where deep learning models are used. Applying our strategies, we achieved 10th place in the KDD Cup 2022 Product Substitution Classification task.
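One of the robustness techniques listed in this abstract, the exponential moving average, can be sketched in a few lines. The abstract does not say how the author applied it, so this example assumes the usual practice of keeping a smoothed "shadow" copy of the model weights that is updated after each training step. The class name `WeightEMA` and the plain-dict representation of parameters are hypothetical.

```python
class WeightEMA:
    """Keep an exponential moving average (shadow copy) of named parameters.

    Assumption: parameters are plain floats stored in a dict; a real model
    would hold tensors, but the update rule is the same.
    """

    def __init__(self, params, decay=0.999):
        self.decay = decay
        # Start the shadow copy at the current parameter values.
        self.shadow = dict(params)

    def update(self, params):
        """Blend the latest parameter values into the shadow copy."""
        for name, value in params.items():
            self.shadow[name] = self.decay * self.shadow[name] + (1.0 - self.decay) * value
```

At evaluation time the shadow values would typically be used in place of the raw weights, since they average out step-to-step noise from training.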

en cs.IR, cs.AI


arXiv Open Access 2020

Keyword Search Engine Enriched by Expert System Features

Olegs Verhodubs

Keyword search engines are essential elements of large information spaces. The largest information space is the Web, and keyword search engines play a crucial role there. The advent of keyword search engines provided a quantum leap in the development of the Web. Since then, the Web has continued to evolve, and keyword search systems have proven inadequate. A new quantum leap in the development of keyword search engines is needed. This leap can be provided by more intelligent keyword search engines, whose increased intelligence can be achieved by combining keyword search engines with expert systems. The paper shows how this can be done.

en cs.IR


arXiv Open Access 2020

RepBERT: Contextualized Text Embeddings for First-Stage Retrieval

Jingtao Zhan, Jiaxin Mao, Yiqun Liu et al.

Although exact term matching between queries and documents is the dominant method for first-stage retrieval, we propose a different approach, called RepBERT, which represents documents and queries with fixed-length contextualized embeddings. The inner products of query and document embeddings are regarded as relevance scores. On the MS MARCO Passage Ranking task, RepBERT achieves state-of-the-art results among all initial retrieval techniques. Its efficiency is comparable to that of bag-of-words methods.
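The scoring step described in this abstract, where the inner product of a query embedding and a document embedding serves as the relevance score, can be sketched as follows. This is only an illustration: the vectors are hand-written toy values rather than BERT outputs, and the function names `inner_product` and `rank_documents` are hypothetical.

```python
def inner_product(query_vec, doc_vec):
    """Relevance score: inner product of two equal-length embeddings."""
    return sum(q * d for q, d in zip(query_vec, doc_vec))


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least relevant.

    doc_vecs maps a document id to its fixed-length embedding. In a real
    system these embeddings would be precomputed by the encoder and indexed.
    """
    scored = [(doc_id, inner_product(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because document embeddings do not depend on the query, they can be computed once offline, which is what makes this kind of scoring practical as a first retrieval stage.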

en cs.IR


arXiv Open Access 2020

Building benchmarking frameworks for supporting replicability and reproducibility: spatial and textual analysis as an example

Yingjie Hu

Replicability and reproducibility (R&R) are critical for the long-term prosperity of a scientific discipline. In GIScience, researchers have discussed R&R related to different research topics and problems, such as local spatial statistics, digital earth, and metadata (Fotheringham, 2009; Goodchild, 2012; Anselin et al., 2014). This position paper proposes to further support R&R by building benchmarking frameworks that facilitate the replication of previous research, enabling effective and efficient comparisons of methods and software tools developed for the same or similar problems. In particular, this paper uses geoparsing, an important research problem in spatial and textual analysis, as an example to explain the value of such benchmarking frameworks.

en cs.IR


arXiv Open Access 2020

A White Box Analysis of ColBERT

Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant

Transformer-based models are nowadays state-of-the-art in ad-hoc Information Retrieval, but their behavior is far from being understood. Recent work has claimed that BERT does not satisfy the classical IR axioms. We instead propose to dissect the matching process of ColBERT through the analysis of term importance and exact/soft matching patterns. Even if the traditional axioms are not formally verified, our analysis reveals that ColBERT (i) is able to capture a notion of term importance and (ii) relies on exact matches for important terms.

en cs.IR


arXiv Open Access 2019

A Comparison of Information Retrieval Techniques for Detecting Source Code Plagiarism

Vasishtha Sriram Jayapati, Ajay Venkitaraman

Plagiarism is a commonly encountered problem in academia. While there are several tools and techniques to efficiently detect plagiarism in text, the same cannot be said of source code plagiarism. To make existing systems more efficient, we use several information retrieval techniques to find the similarity between source code files written in Java. We then use JPlag, a string-based plagiarism detection tool used in academia, to match the plagiarized source code. In this paper, we aim to generalize about the efficiency and effectiveness of detecting plagiarism using different information retrieval models rather than string manipulation algorithms alone.

en cs.IR


arXiv Open Access 2019

One-Shot Template Matching for Automatic Document Data Capture

Pranjal Dhakal, Manish Munikar, Bikram Dahal

In this paper, we propose a novel one-shot template-matching algorithm to automatically capture data from business documents with an aim to minimize manual data entry. Given one annotated document, our algorithm can automatically extract similar data from other documents having the same format. Based on a set of engineered visual and textual features, our method is invariant to changes in position and value. Experiments on a dataset of 595 real invoices demonstrate 86.4% accuracy.

en cs.IR, cs.CL


arXiv Open Access 2018

Next Item Recommendation with Self-Attention

Shuai Zhang, Yi Tay, Lina Yao et al.

In this paper, we propose a novel sequence-aware recommendation model. Our model uses a self-attention mechanism to infer item-item relationships from a user's historical interactions. With self-attention, it is able to estimate the relative weight of each item in a user's interaction trajectory and so learn better representations of the user's transient interests. The model is finally trained in a metric learning framework, taking both short-term and long-term intentions into consideration. Experiments on a wide range of datasets from different domains demonstrate that our approach outperforms the state-of-the-art by a wide margin.
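The way self-attention assigns relative weights to the items in an interaction history can be sketched with plain scaled dot-product attention. This is a simplified illustration, not the paper's model: it assumes the item embeddings act directly as queries, keys and values, with no learned projections, and the toy embeddings and function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(items):
    """Scaled dot-product self-attention over a sequence of item embeddings.

    Returns (weights, outputs): weights[i][j] is how much item i attends to
    item j, and outputs[i] is the attention-weighted mix of all embeddings.
    """
    dim = len(items[0])
    weights, outputs = [], []
    for query in items:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in items]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(row[j] * items[j][f] for j in range(len(items))) for f in range(dim)])
    return weights, outputs
```

In this sketch, items with similar embeddings receive larger mutual weights, which is the mechanism that lets a sequence model emphasize the parts of a user's recent history that belong together.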

en cs.IR


arXiv Open Access 2018

Commentary on Quantum-Inspired Information Retrieval

Elham Ashoori, Terry Rudolph

There have been suggestions within the Information Retrieval (IR) community that quantum mechanics (QM) can be used to help formalise the foundations of IR. The invoked connection to QM is mathematical rather than physical. The proposed ideas are concerned with information which is encoded, processed and accessed in classical computers. However, some of the suggestions have been thoroughly muddled with questions about applying techniques of quantum information theory in IR, and it is often unclear whether or not the suggestion is to perform actual quantum information processing on the information. This paper is an attempt to provide some conceptual clarity on the emerging issues.

en cs.IR, quant-ph


arXiv Open Access 2018

An Exploration of H-1B Visa Applications in the United States

Habeeb Hooshmand, Joseph Martinsen, Jonathan Arauco et al.

The H-1B visa program is a very important tool for US-based businesses and educational institutes to recruit foreign talent. While the ultimate decision to certify an application lies with the United States Department of Labor, there are signals that can be used to determine whether an application is likely to be certified or denied. In this paper we first perform a data-driven exploratory analysis. We then leverage the features to train several classifiers and compare their performance. Finally, we discuss the implications of this work and future work that can be done in this area.

en cs.IR


arXiv Open Access 2017

Towards the Improvement of Automated Scientific Document Categorization by Deep Learning

Thomas Krause

This master's thesis describes an algorithm for automated categorization of scientific documents using deep learning techniques and compares its results to those of existing classification algorithms. As an additional goal, a reusable API is to be developed that allows classification tasks to be automated in existing software. A design is proposed that uses a convolutional neural network as a classifier and integrates it into a REST-based API. This design is then used as the basis for a proof-of-concept implementation, which is also presented in this thesis. It is shown that the deep learning classifier provides very good results in the context of multi-class document categorization and that it is feasible to integrate such classifiers into a larger ecosystem using REST-based services.

en cs.IR, cs.CL


CrossRef Open Access 2016

Transition to Peace and Urban Brown Areas [Transición hacia la paz y zonas marrones urbanas]

Mauricio Uribe López

The transition from war to peace can entail a shift in the center of gravity of violence toward depressed micro-spaces within cities, which constitute what may be called, adapting Guillermo O'Donnell's concept, urban brown areas. Highly violent post-conflict situations and situations of high societal violence, which correspond to the type of cases that can be characterized as violent peace, require an urban citizen-security approach that is in tune with the local turn taken in critical approaches to peacebuilding.

en


arXiv Open Access 2015

Enabling Complex Wikipedia Queries - Technical Report

Gilad Katz, Bracha Shapira

In this technical report we present a database schema used to store Wikipedia so it can be easily used in query-intensive applications. In addition to storing the information in a way that makes it highly accessible, our schema enables users to easily formulate complex queries using information such as the anchor-text of links and their location in the page, the titles and number of redirect pages for each page and the paragraph structure of entity pages. We have successfully used the schema in domains such as recommender systems, information retrieval and sentiment analysis. In order to assist other researchers, we now make the schema and its content available online.

en cs.IR


arXiv Open Access 2013

Simple Search Engine Model: Selective Properties

Mahyuddin K. M. Nasution

In this paper we study the relationship between a query and a search engine by exploring the selective properties of a simple search engine. We use set theory, with words and terms defining singletons and doubletons in the event spaces, and then provide their implementation to prove the existence of the shadow of a micro-cluster.

en cs.IR


arXiv Open Access 2013

Generation, Implementation and Appraisal of an N-gram based Stemming Algorithm

B. P. Pande, Pawan Tamta, H. S. Dhami

A language-independent stemmer has long been sought. The single N-gram tokenization technique works well; however, it often generates stems that start with intermediate characters rather than initial ones. We present a novel technique that takes the concept of N-gram stemming one step further and compare our method with an established algorithm in the field, Porter's Stemmer. Results indicate that our N-gram stemmer is not inferior to Porter's linguistic stemmer.
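The limitation mentioned in this abstract, that N-gram tokenization tends to produce candidates starting at intermediate characters, is easy to see in a small sketch. This shows plain character N-gram tokenization only and is not the authors' stemming algorithm; the function names are hypothetical.

```python
def char_ngrams(word, n):
    """All contiguous character n-grams of a word, in left-to-right order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def initial_ngram(word, n):
    """The single n-gram that starts at the first character of the word."""
    return word[:n]
```

For a word such as "stemming" with n = 4, only the first of the five n-grams begins at the initial character; the remaining four begin mid-word, which is the behavior the abstract describes as a weakness of the single N-gram approach.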

en cs.IR, cs.CL


arXiv Open Access 2013

Stacking from Tags: Clustering Bookmarks around a Theme

Arkaitz Zubiaga, Alberto Pérez García-Plaza, Víctor Fresno et al.

Users of the social bookmarking service Delicious have recently gained the ability to stack web pages in addition to tagging them. Stacking enables users to group web pages around specific themes with the aim of recommending them to others. However, users still stack only a small subset of what they tag, and thus many web pages remain unstacked. This paper presents early research towards automatically clustering web pages from tags to find stacks and extend recommendations.

en cs.IR

