arXiv Open Access 2025

Best Practices and Considerations for Child Speech Corpus Collection and Curation in Educational, Clinical, and Forensic Scenarios

John Hansen, Satwik Dutta, Ellen Grand

Abstract

A child's spoken ability continues to change until adulthood. Until 7-8 years of age, speech sound development and language structure evolve rapidly. This dynamic shift in spoken communication skills, together with data privacy concerns, makes it challenging to curate technology-ready speech corpora for children. This study aims to bridge this gap by providing researchers and practitioners with best practices and considerations for developing such a corpus based on an intended goal. Although primarily focused on educational goals, applications of child speech data have spread to other fields, including clinical and forensic settings. Motivated by this goal, we describe the WHO, WHAT, WHEN, and WHERE of data collection, drawing on prior collection efforts and our own experience and knowledge. We also provide a guide for establishing collaboration and trust and for navigating the human subjects research protocol. The study concludes with guidelines for corpus quality checks, triage, and annotation.

Authors (3)

John Hansen

Satwik Dutta

Ellen Grand

Citation Format

Hansen, J., Dutta, S., Grand, E. (2025). Best Practices and Considerations for Child Speech Corpus Collection and Curation in Educational, Clinical, and Forensic Scenarios. https://arxiv.org/abs/2507.12870

Journal Information
Publication year: 2025
Language: en
Source database: arXiv
Access: Open Access