arXiv Open Access 2025

Towards Understanding the Use of MLLM-Enabled Applications for Visual Interpretation by Blind and Low Vision People

Ricardo E. Gonzalez Penuela, Ruiying Hu, Sharon Lin, Tanisha Shende, Shiri Azenkot

Abstract

Blind and Low Vision (BLV) people have adopted AI-powered visual interpretation applications to address their daily needs. While these applications have been helpful, prior work has found that users remain unsatisfied by their frequent errors. Recently, multimodal large language models (MLLMs) have been integrated into visual interpretation applications, and they show promise for more descriptive visual interpretations. However, it is still unknown how this advancement has changed people's use of these applications. To address this gap, we conducted a two-week diary study in which 20 BLV people used an MLLM-enabled visual interpretation application we developed, and we collected 553 entries. In this paper, we report a preliminary analysis of 60 diary entries from 6 participants. We found that participants considered the application's visual interpretations trustworthy (mean 3.75 out of 5) and satisfying (mean 4.15 out of 5). Moreover, participants trusted our application in high-stakes scenarios, such as receiving medical dosage advice. We discuss our plan to complete our analysis to inform the design of future MLLM-enabled visual interpretation systems.

Authors (5)

Ricardo E. Gonzalez Penuela
Ruiying Hu
Sharon Lin
Tanisha Shende
Shiri Azenkot

Citation Format

Penuela, R.E.G., Hu, R., Lin, S., Shende, T., Azenkot, S. (2025). Towards Understanding the Use of MLLM-Enabled Applications for Visual Interpretation by Blind and Low Vision People. https://arxiv.org/abs/2503.05899

Journal Information
Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access