arXiv Open Access 2025

Privacy Disclosure of Similarity Rank in Speech and Language Processing

Tom Bäckström, Mohammad Hassan Vali, My Nguyen, Silas Rech

Abstract

Speaker, author, and other biometric identification applications often compare a sample's similarity to a database of templates to determine the identity. Because data may be noisy and similarity measures can be inaccurate, such a comparison may not reliably rank the true identity as the most similar. Still, even a similarity rank based on an inaccurate similarity measure can disclose private information about the true identity. We propose a methodology for quantifying the privacy disclosure of such a similarity rank by estimating its probability distribution. It is based on determining the histogram of the similarity rank of the true speaker or, when data is scarce, modeling the histogram with the beta-binomial distribution. We express the disclosure in terms of entropy (bits), such that the disclosures from independent features are additive. Our experiments demonstrate that all tested speaker and author characterizations contain personally identifying information (PII) that can aid in identification, with embeddings from speaker recognition algorithms containing the most information, followed by phone embeddings, linguistic embeddings, and fundamental frequency. Our initial experiments show that the disclosure of PII increases with the length of test samples but is bounded by the length of database templates. The proposed metric, similarity rank disclosure, offers a way to compare the disclosure of PII between biometric features and to merge them to aid identification. It can thus aid in the holistic evaluation of threats to privacy in speech and other biometric technologies.
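
The histogram-based estimate described in the abstract can be illustrated with a minimal sketch. It assumes that ranks are integers from 0 (most similar) to N-1 for a database of N templates, and that disclosure is measured as the gap between the entropy of a uniform rank, log2(N), and the entropy of the observed rank histogram. That definition and the function names below are illustrative assumptions, not the authors' exact formulation, and the beta-binomial model for scarce data is not covered.

```python
import math
from collections import Counter


def rank_histogram(ranks, num_templates):
    """Empirical probability of each rank 0..num_templates-1 (0 = most similar)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 0 or r >= num_templates for r in ranks):
        raise ValueError("every rank must lie in [0, num_templates)")
    counts = Counter(ranks)
    return [counts.get(k, 0) / len(ranks) for k in range(num_templates)]


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability bins contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def disclosure_bits(ranks, num_templates):
    """Assumed measure: log2(N) minus the entropy of the rank histogram.

    A uniform rank (uninformative feature) gives 0 bits; a true speaker
    who always lands on the same rank gives log2(N) bits.
    """
    probs = rank_histogram(ranks, num_templates)
    return math.log2(num_templates) - entropy_bits(probs)
```

With 8 templates, a feature that always places the true speaker at rank 0 yields 3 bits under this assumed measure, while ranks spread evenly over all 8 positions yield 0 bits.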

Citation Format

Bäckström, T., Vali, M.H., Nguyen, M., Rech, S. (2025). Privacy Disclosure of Similarity Rank in Speech and Language Processing. https://arxiv.org/abs/2508.05250

Journal Information
Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access