The Creative Musical Achievement of AI Systems Compared to Music Students: A Replication of the Study by Schreiber et al. (2024)
Abstract
Although AI systems have progressed significantly over the last two years in generating cultural products such as literature, poems, or music, it remains unclear whether the aesthetic quality of these products increases in tandem with the performance of the underlying large language models (LLMs). We replicated the study by Schreiber et al. (2024) to test whether the creative performance of selected LLMs in the musical domain had improved over the past two years. In an online rating experiment based on a melody continuation paradigm, 75 melodic continuations generated by the AI systems Qwen 2 (Version 72B Instruct), Llama 3 (Version 70B Instruct), and ChatGPT (Version 4) were compared to 23 solutions composed by humans. The aesthetic quality of the sound examples was evaluated by N = 54 listeners (music students) using four criteria (convincing, logical and meaningful, interesting, and liking). As the first main finding, human-based creative solutions outperformed all three AI systems on all four dependent variables (large effect sizes, 1.11 ≤ dz ≤ 2.51), confirming the finding by Schreiber et al. (2024). As the second main finding, listeners discriminated between AI- and human-based solutions with a mean (and meaningful) sensitivity of d’ = 1.09. We conclude that merely increasing the volume of training of the AI systems does not guarantee a corresponding improvement in the creative musical output produced under controlled conditions.
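For readers unfamiliar with the two statistics reported above, the following is a minimal sketch of how such values are commonly computed. It assumes the standard equal-variance signal detection formula for d’ (z-transformed hit rate minus z-transformed false alarm rate) and the usual definition of dz for paired data (mean of the paired differences divided by their standard deviation). The abstract does not state the exact computation used in the study, and the function names and all input values below are hypothetical illustrations, not data from the article.

```python
from statistics import NormalDist, mean, stdev


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Discrimination sensitivity d' = z(H) - z(F).

    Assumption: equal-variance Gaussian signal detection model.
    Both rates must lie strictly between 0 and 1, otherwise
    inv_cdf raises statistics.StatisticsError.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


def cohens_dz(condition_a: list[float], condition_b: list[float]) -> float:
    """Paired-samples effect size dz = mean(diff) / sd(diff)."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    return mean(diffs) / stdev(diffs)  # stdev uses the sample (n - 1) formula


# Hypothetical rates: "human" responses to human-made vs. AI-made items.
sensitivity = d_prime(hit_rate=0.70, false_alarm_rate=0.30)

# Hypothetical per-listener mean ratings for the two sources.
human_ratings = [4.0, 5.0, 3.0, 4.0]
ai_ratings = [3.0, 3.0, 2.0, 2.0]
effect = cohens_dz(human_ratings, ai_ratings)

print(f"d' = {sensitivity:.2f}, dz = {effect:.2f}")
```

A d’ of 0 would indicate that listeners cannot tell the two sources apart, so larger positive values indicate better discrimination.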
Authors (4)
Nicholas Meier
Kilian Sander
Anton Schreiber
Reinhard Kopiez
Quick Access
- Year of Publication: 2025
- Source Database: DOAJ
- DOI: 10.5964/jbdgm.221
- Access: Open Access ✓