arXiv Open Access 2024

NPU-NTU System for Voice Privacy 2024 Challenge

Jixun Yao, Nikita Kuzmin, Qing Wang, Pengcheng Guo, Ziqian Ning, and 4 others

Abstract

Speaker anonymization is an effective privacy protection solution that conceals the speaker's identity while preserving the linguistic content and paralinguistic information of the original speech. To establish a fair benchmark and facilitate comparison of speaker anonymization systems, the VoicePrivacy Challenge (VPC) was held in 2020 and 2022, with a new edition planned for 2024. In this paper, we describe our speaker anonymization system for VPC 2024. Our system employs a disentangled neural codec architecture and a serial disentanglement strategy that progressively separates the global speaker identity from the time-variant linguistic content and paralinguistic information. We introduce multiple distillation methods to disentangle linguistic content, speaker identity, and emotion: semantic distillation, supervised speaker distillation, and frame-level emotion distillation. Building on these distillations, we anonymize the original speaker identity using a weighted sum of a set of candidate speaker identities and a randomly generated speaker identity. Our system achieves the best trade-off between privacy protection and emotion preservation in VPC 2024.
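The anonymization step above mixes a set of candidate speaker identities with a randomly generated one. The abstract does not specify the weights, the embedding size, or how the random identity is sampled, so the Python snippet below is only a minimal sketch under stated assumptions: speaker identities are fixed-length embedding vectors, the random identity is drawn from a standard Gaussian, every vector is L2-normalized, and `anonymize_speaker` and its parameters are hypothetical names, not the authors' code.

```python
import math
import random
from typing import List, Optional, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def anonymize_speaker(
    candidates: Sequence[Sequence[float]],
    candidate_weights: Sequence[float],
    random_weight: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Hypothetical pseudo-speaker: weighted sum of candidate speaker embeddings
    plus a weighted, randomly generated speaker embedding."""
    if not candidates or len(candidates) != len(candidate_weights):
        raise ValueError("need at least one candidate and one weight per candidate")
    dim = len(candidates[0])
    if any(len(emb) != dim for emb in candidates):
        raise ValueError("all candidate embeddings must have the same dimension")
    rng = rng if rng is not None else random.Random()

    # Assumption: the random identity is a unit-normalized standard Gaussian vector.
    random_identity = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])

    # Weighted sum of the (normalized) candidate identities.
    mixed = [0.0] * dim
    for weight, emb in zip(candidate_weights, candidates):
        for i, x in enumerate(l2_normalize(emb)):
            mixed[i] += weight * x

    # Add the weighted random identity, then re-normalize (also an assumption).
    for i in range(dim):
        mixed[i] += random_weight * random_identity[i]
    return l2_normalize(mixed)


# Usage: mix two toy 4-dimensional "speaker embeddings" with a random identity.
pool = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
pseudo = anonymize_speaker(pool, [0.5, 0.5], random_weight=0.3, rng=random.Random(0))
print([round(x, 3) for x in pseudo])
```

Normalizing each term keeps a single candidate from dominating the mixture through its magnitude alone, and the random component makes the pseudo-speaker harder to map back to any one candidate. Whether the actual system normalizes its embeddings, and how it chooses the weights, is not stated in the abstract.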


Authors (9)

Jixun Yao
Nikita Kuzmin
Qing Wang
Pengcheng Guo
Ziqian Ning
Dake Guo
Kong Aik Lee
Eng-Siong Chng
Lei Xie

Citation Format

Yao, J., Kuzmin, N., Wang, Q., Guo, P., Ning, Z., Guo, D. et al. (2024). NPU-NTU System for Voice Privacy 2024 Challenge. https://arxiv.org/abs/2409.04173

Journal Information
Year Published: 2024
Language: English
Source Database: arXiv
Access: Open Access