SARCH: Multimodal Search for Archaeological Archives
Abstract
In this paper, we describe a multi-modal search system designed to search old archaeological books and reports. This corpus is digitally available as scanned PDFs, but varies widely in the quality of scans. Our pipeline, designed for multi-modal archaeological documents, extracts and indexes text, images (classified into maps, photos, layouts, and others), and tables. We evaluated different retrieval strategies, including keyword-based search, embedding-based models, and a hybrid approach that selects optimal results from both modalities. We report and analyze our preliminary results and discuss future work in this exciting vertical.
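The abstract does not say how the hybrid approach chooses between keyword-based and embedding-based results. As a minimal sketch only, the example below merges two ranked lists with reciprocal rank fusion, a common fusion technique that is swapped in here as an assumption and is not the authors' stated method. The function name, the constant `k=60`, and the document ids are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a value
    taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical ranked outputs of the two retrievers for one query.
keyword_hits = ["report_12_p4", "report_03_map2", "report_07_table1"]
embedding_hits = ["report_03_map2", "report_21_photo5", "report_12_p4"]

fused = reciprocal_rank_fusion([keyword_hits, embedding_hits])
print(fused)
```

Items that both retrievers return rise to the top, while items found by only one retriever are still kept, which is one plausible reading of combining results from both strategies.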
Authors (8)
Nivedita Sinha
Bharati Khanijo
Sanskar Singh
Priyansh Mahant
Ashutosh Roy
Saubhagya Singh Bhadouria
Arpan Jain
Maya Ramanath
Quick Access
- Year of Publication: 2025
- Language: en
- Source Database: arXiv
- Access: Open Access