arXiv Open Access 2023

MARRS: Multimodal Reference Resolution System

Halim Cagri Ates, Shruti Bhargava, Site Li, Jiarui Lu, Siddhardha Maddula, +13 more

Abstract

Successfully handling context is essential for any dialog understanding task. This context may be conversational (relying on previous user queries or system responses), visual (relying on what the user sees, for example, on their screen), or background (based on signals such as a ringing alarm or playing music). In this work, we present an overview of MARRS, or Multimodal Reference Resolution System, an on-device framework within a Natural Language Understanding system, responsible for handling conversational, visual, and background context. In particular, we present different machine learning models to enable handling contextual queries; specifically, one to enable reference resolution, and one to handle context via query rewriting. We also describe how these models complement each other to form a unified, coherent, lightweight system that can understand context while preserving user privacy.


Authors (18)

Halim Cagri Ates
Shruti Bhargava
Site Li
Jiarui Lu
Siddhardha Maddula
Joel Ruben Antony Moniz
Anil Kumar Nalamalapu
Roman Hoang Nguyen
Melis Ozyildirim
Alkesh Patel
Dhivya Piraviperumal
Vincent Renkens
Ankit Samal
Thy Tran
Bo-Hsiang Tseng
Hong Yu
Yuan Zhang
Rong Zou

Citation Format

Ates, H.C., Bhargava, S., Li, S., Lu, J., Maddula, S., Moniz, J.R.A. et al. (2023). MARRS: Multimodal Reference Resolution System. https://arxiv.org/abs/2311.01650

Journal Information
Publication Year
2023
Language
en
Source Database
arXiv
Access
Open Access ✓