arXiv Open Access 2022

Skating-Mixer: Long-Term Sport Audio-Visual Modeling with MLPs

Jingfei Xia, Mingchen Zhuge, Tiantian Geng, Shun Fan, Yuantai Wei, +2 others

Abstract

Figure skating scoring is challenging because it requires judging both the skaters' technical moves and their coordination with the background music. Most learning-based methods handle it poorly for two reasons: 1) each move in figure skating changes quickly, so simply applying traditional frame sampling loses a lot of valuable information, especially in videos 3 to 5 minutes long; 2) prior methods rarely consider the critical audio-visual relationship in their models. For these reasons, we introduce a novel architecture named Skating-Mixer. It extends the MLP framework to a multimodal setting and effectively learns long-term representations through our designed memory recurrent unit (MRU). In addition to the model, we collected a high-quality audio-visual dataset, FS1000, which contains over 1000 videos covering 8 types of programs with 7 different rating metrics, surpassing other datasets in both quantity and diversity. Experiments show that the proposed method achieves state-of-the-art results on all major metrics on the public Fis-V dataset and our FS1000 dataset. We also include an analysis applying our method to recent competitions at the Beijing 2022 Winter Olympic Games, demonstrating that our method has strong applicability.

Authors (7)

Jingfei Xia
Mingchen Zhuge
Tiantian Geng
Shun Fan
Yuantai Wei
Zhenyu He
Feng Zheng

Citation Format

Xia, J., Zhuge, M., Geng, T., Fan, S., Wei, Y., He, Z. et al. (2022). Skating-Mixer: Long-Term Sport Audio-Visual Modeling with MLPs. https://arxiv.org/abs/2203.03990

Journal Information
Publication Year: 2022
Language: en
Source Database: arXiv
Access: Open Access