arXiv Open Access 2025

EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation

Tianheng Zhu Yinfeng Yu Liejun Wang Fuchun Sun Wendong Zheng

Lihat Sumber

Abstrak

This paper presents EGSTalker, a real-time audio-driven talking head generation framework based on 3D Gaussian Splatting (3DGS). Designed to enhance both speed and visual fidelity, EGSTalker requires only 3-5 minutes of training video to synthesize high-quality facial animations. The framework comprises two key stages: static Gaussian initialization and audio-driven deformation. In the first stage, a multi-resolution hash triplane and a Kolmogorov-Arnold Network (KAN) are used to extract spatial features and construct a compact 3D Gaussian representation. In the second stage, we propose an Efficient Spatial-Audio Attention (ESAA) module to fuse audio and spatial cues, while KAN predicts the corresponding Gaussian deformations. Extensive experiments demonstrate that EGSTalker achieves rendering quality and lip-sync accuracy comparable to state-of-the-art methods, while significantly outperforming them in inference speed. These results highlight EGSTalker's potential for real-time multimedia applications.

Topik & Kata Kunci

cs.SD cs.AI eess.AS

Penulis (5)

Tianheng Zhu

Yinfeng Yu

Liejun Wang

Fuchun Sun

Wendong Zheng

Format Sitasi

APA MLA BibTeX

Zhu, T., Yu, Y., Wang, L., Sun, F., Zheng, W. (2025). EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation. https://arxiv.org/abs/2510.08587

Akses Cepat

Lihat di Sumber

Informasi Jurnal

Tahun Terbit: 2025
Bahasa: en
Sumber Database: arXiv
Akses: Open Access ✓