arXiv Open Access 2023

Fast Slate Policy Optimization: Going Beyond Plackett-Luce

Otmane Sakhi, David Rohde, Nicolas Chopin

Abstract

An increasingly important building block of large-scale machine learning systems is based on returning slates: ordered lists of items given a query. Applications of this technology include search, information retrieval, and recommender systems. When the action space is large, decision systems are restricted to a particular structure to complete online queries quickly. This paper addresses the optimization of these large-scale decision systems given an arbitrary reward function. We cast this learning problem in a policy optimization framework and propose a new class of policies, born from a novel relaxation of decision functions. This results in a simple yet efficient learning algorithm that scales to massive action spaces. We compare our method to the commonly adopted Plackett-Luce policy class and demonstrate the effectiveness of our approach on problems with action space sizes on the order of millions.
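
For readers unfamiliar with the Plackett-Luce policy class that the abstract names as the baseline, the following is a minimal sketch of how such a policy samples a slate: items are drawn one position at a time, without replacement, with probability proportional to the exponential of their score. This illustrates the baseline only, not the method the paper proposes. The function name `sample_plackett_luce_slate`, its parameters, and the example scores are assumptions made for illustration.

```python
import math
import random


def sample_plackett_luce_slate(scores, slate_size, rng):
    """Sample an ordered slate of distinct item indices.

    Hypothetical helper: `scores` are unnormalized log-weights, one per item.
    """
    if not 0 < slate_size <= len(scores):
        raise ValueError("slate_size must be between 1 and len(scores)")

    remaining = list(range(len(scores)))
    slate = []
    for _ in range(slate_size):
        # Subtract the largest remaining score before exponentiating
        # so that math.exp cannot overflow.
        top = max(scores[i] for i in remaining)
        weights = [math.exp(scores[i] - top) for i in remaining]
        # Draw the next position in proportion to the softmax weights
        # of the items that have not been placed yet.
        chosen = rng.choices(remaining, weights=weights, k=1)[0]
        slate.append(chosen)
        remaining.remove(chosen)
    return slate


rng = random.Random(0)
example_slate = sample_plackett_luce_slate([2.0, 0.5, 1.0, -1.0], 3, rng)
```

Each position in this naive sampler requires a pass over all remaining items, so the cost of drawing one slate grows with both the slate size and the number of items.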

Topics & Keywords

cs.LG cs.IR stat.ML

Authors (3)

Otmane Sakhi

David Rohde

Nicolas Chopin

Citation Format

Sakhi, O., Rohde, D., & Chopin, N. (2023). Fast Slate Policy Optimization: Going Beyond Plackett-Luce. https://arxiv.org/abs/2308.01566

Journal Information

Year Published: 2023
Language: en
Database Source: arXiv
Access: Open Access ✓