arXiv Open Access 2025

Drum-to-Vocal Percussion Sound Conversion and Its Evaluation Methodology

Rinka Nobukawa, Makito Kitamura, Tomohiko Nakamura, Shinnosuke Takamichi, Hiroshi Saruwatari

Abstract

This paper defines the novel task of drum-to-vocal percussion (VP) sound conversion. VP imitates percussion instruments through human vocalization and is frequently employed in contemporary a cappella music. It exhibits acoustic properties distinct from speech and singing (e.g., aperiodicity, noisy transients, and the absence of linguistic structure), making conventional speech or singing synthesis methods unsuitable. We thus formulate VP synthesis as a timbre transfer problem from drum sounds, leveraging their rhythmic and timbral correspondence. To support this formulation, we define three requirements for successful conversion: rhythmic fidelity, timbral consistency, and naturalness as VP. We also propose corresponding subjective evaluation criteria. We implement two baseline conversion methods using a neural audio synthesizer, the real-time audio variational autoencoder (RAVE), with and without vector quantization (VQ). Subjective experiments show that both methods produce plausible VP outputs, with the VQ-based RAVE model yielding more consistent conversion.
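To make the vector quantization (VQ) step mentioned in the abstract concrete, here is a minimal sketch of nearest-neighbor quantization of latent frames. It is an illustration only, not the authors' implementation or RAVE's actual code: the function name `quantize`, the toy codebook, the toy latents, and the use of Euclidean distance are all assumptions made for this example.

```python
import numpy as np


def quantize(latents: np.ndarray, codebook: np.ndarray):
    """Replace each latent frame with its nearest codebook vector.

    latents:  (T, D) continuous latent frames (hypothetical encoder output)
    codebook: (K, D) code vectors (hypothetical; learned in a real model)
    Returns the quantized frames (T, D) and the chosen code indices (T,).
    """
    # Squared Euclidean distance between every frame and every code: (T, K)
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # Index of the closest code for each frame
    indices = dists.argmin(axis=1)
    return codebook[indices], indices


# Toy values chosen for illustration only (assumption, not from the paper)
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
latents = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]])

quantized, indices = quantize(latents, codebook)
print(indices)
print(quantized)
```

Restricting the latent sequence to a finite set of codes like this is one plausible reason a VQ-based model could convert more consistently, although the abstract itself reports only the subjective result and does not give the mechanism.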

Citation Format

Nobukawa, R., Kitamura, M., Nakamura, T., Takamichi, S., Saruwatari, H. (2025). Drum-to-Vocal Percussion Sound Conversion and Its Evaluation Methodology. https://arxiv.org/abs/2509.16862

Journal Information
Year of Publication: 2025
Language: en
Source Database: arXiv
Access: Open Access