Semantic Scholar · Open Access · 2023 · 3 citations

Neural Speech Synthesis for Austrian Dialects with Standard German Grapheme-to-Phoneme Conversion and Dialect Embeddings

Lorenz Gutscher, Michael Pucher, Victor García

Abstract

For languages where extensive audio data and text transcriptions are available, text-to-speech (TTS) systems have showcased the ability to generate speech that closely resembles natural human speech. However, the development of TTS systems for dialects and language varieties poses challenges such as limited data availability and strong regional variations. This paper presents a TTS system tailored for under-resourced language varieties spoken in Austrian regions. The system is built upon the FastSpeech 2 architecture and includes modifications to incorporate dialect embeddings for training and inference. It is demonstrated that employing dialect embeddings and a standard German grapheme-to-phoneme conversion is effective in modeling language varieties and provides means to shift a person’s spoken variety from one to another. This allows for the generation of regional standards for dialect speakers or the generation of dialect speech with the voice of a standard speaker. The findings unveil new possibilities and applications in other multilingual contexts where shared characteristics within the language or dialect embedding space can be leveraged.

Citation (APA)

Gutscher, L., Pucher, M., & García, V. (2023). Neural Speech Synthesis for Austrian Dialects with Standard German Grapheme-to-Phoneme Conversion and Dialect Embeddings. https://doi.org/10.21437/sigul.2023-15

Journal Information

Publication Year: 2023
Language: en
Total Citations: 3
Source Database: Semantic Scholar
DOI: 10.21437/sigul.2023-15
Access: Open Access ✓