Many algorithms exist for detecting music similarity, yet real-world discussions of music plagiarism are often grounded in audience perception. We therefore conduct a study examining the key criteria underlying human perception of music plagiarism, focusing on three musical features commonly used in similarity analysis: melody, rhythm, and chord progression. After identifying the key features and levels of variation humans use in perceiving musical similarity, we propose an LLM-as-a-judge framework that applies a systematic, step-by-step approach, drawing on modules that extract these high-level attributes.
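As a loose illustration of such a framework, the sketch below chains feature-extraction modules into a step-by-step judging prompt. The function names, note representation, and prompt wording are all assumptions for illustration; the paper's actual modules and prompts are not specified in the abstract.

```python
# Hypothetical sketch of an LLM-as-a-judge plagiarism pipeline.
# The feature extractors and prompt are illustrative, not the authors' code.
from typing import Callable, List, Tuple

Note = Tuple[int, float]  # (MIDI pitch, duration in beats)

def melody_contour(notes: List[Note]) -> List[int]:
    """Direction of pitch movement: +1 up, -1 down, 0 repeated."""
    pitches = [p for p, _ in notes]
    return [(b > a) - (b < a) for a, b in zip(pitches, pitches[1:])]

def rhythm_profile(notes: List[Note]) -> List[float]:
    """Sequence of note durations in beats."""
    return [d for _, d in notes]

def judge_pair(a: List[Note], b: List[Note], llm: Callable[[str], str]) -> str:
    """Ask an LLM to compare extracted attributes step by step."""
    prompt = (
        "You are judging a potential case of music plagiarism.\n"
        f"Melody contours: {melody_contour(a)} vs {melody_contour(b)}\n"
        f"Rhythm profiles: {rhythm_profile(a)} vs {rhythm_profile(b)}\n"
        "Step 1: rate melodic similarity. Step 2: rate rhythmic similarity.\n"
        "Step 3: give an overall verdict with reasons."
    )
    return llm(prompt)
```

A chord-progression module would slot in the same way, and `llm` can be any text-completion callable.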
RESEARCH OBJECTIVE: The article is an attempt to show the process, indicated in the title, that plays a crucial role in learners’ development: introducing individuals and social groups to the world of music and drawing value from engagement with it.
THE RESEARCH PROBLEM AND METHODS: The central research question is how musical perception brings learners closer to audiation. The study is based on an in-depth analysis of the literature, which serves to identify themes that illuminate both the theoretical and practical dimensions of this issue from the perspective of music education.
THE PROCESS OF ARGUMENTATION: The discussion begins with an outline of the praxis-aesthetic model, which remains dominant in contemporary music education. This introduction provides the basis for examining selected aspects of music reception and for clarifying the practical applications of musical perception and audiation.
RESEARCH RESULTS: Active engagement with music in educational settings shapes learners’ preferences and promotes readiness for perception, as well as for the ideas that music conveys. It evokes emotions and subjective experiences, while also supporting practice through immersion in the structure of a musical work, grasping the composer’s language, and understanding the intentions of the performer. Auditory perception is necessary for audiation, understood as musical thinking, which allows listeners to grasp the meaning of the musical language. A listener who can recognize, assimilate, and imitate physically present sounds is prepared to develop audiational skills, that is, the ability to assign meaning to them.
CONCLUSIONS, RECOMMENDATIONS AND APPLICABLE VALUE OF RESEARCH: Introducing learners to the world of music requires consideration of both the values inherent in music itself and learners’ readiness to learn about its diverse expressions. The praxis-aesthetic perspective adopted here provides a scaffold for teachers and learners to deepen their reflections on the theoretical and practical contexts of musical perception and audiation.
Jacob Obrecht’s Missa Scaramella survives as a unicum in two partbooks (altus and bassus) in Kraków’s Biblioteka Jagiellońska. A reconstruction of the mass has recently been published by Fabrice Fitch (in collaboration with Philipp Weller and Paul Kolb). To ‘verify’ the results of that reconstruction, this review will look into claims made about the original notation of the cantus firmus (Scaramella va alla guerra) and compare it with another, independently conceived, reconstruction of the Missa Scaramella by Marc Busnel.
Aims: This project investigates whether mother and baby music sessions in the inpatient perinatal mental health setting can effectively reduce maternal anxiety, improve mood, and strengthen the mother–baby bond. It aims to review existing literature and reflect on the practical implementation of a pilot music group within a mother and baby unit (MBU).
Many music AI models learn a mapping between music content and human-defined labels. However, many annotations, such as chords, can be naturally expressed within the music modality itself, e.g., as sequences of symbolic notes. This observation enables both understanding tasks (e.g., chord recognition) and conditional generation tasks (e.g., chord-conditioned melody generation) to be unified under a music-for-music sequence modeling paradigm. In this work, we propose parameter-efficient solutions for a variety of symbolic music-for-music tasks. The high-level idea is that (1) we utilize a pretrained Language Model (LM) for both the reference and the target sequence, and (2) we link these two LMs via a lightweight adapter. Experiments show that our method achieves superior performance across tasks such as chord recognition, melody generation, and drum track generation. All demos, code, and model weights are publicly available.
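A minimal sketch of the core idea, two frozen LMs linked by a small trainable adapter, is given below in PyTorch. The layer sizes, bottleneck width, and the stand-in transformer encoders are assumptions; the paper's actual backbone models and injection point are not detailed in the abstract.

```python
# Sketch (assumed shapes, not the paper's implementation) of linking two
# frozen language models with a lightweight trainable adapter.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck MLP mapping reference-LM states into the target LM's space."""
    def __init__(self, d_ref: int, d_tgt: int, d_bottleneck: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_ref, d_bottleneck),
            nn.GELU(),
            nn.Linear(d_bottleneck, d_tgt),
        )

    def forward(self, h_ref: torch.Tensor) -> torch.Tensor:
        return self.net(h_ref)

# Stand-ins for the frozen pretrained LMs (any sequence model would do here).
ref_lm = nn.TransformerEncoder(nn.TransformerEncoderLayer(256, 4, batch_first=True), 2)
tgt_lm = nn.TransformerEncoder(nn.TransformerEncoderLayer(512, 8, batch_first=True), 2)
for p in list(ref_lm.parameters()) + list(tgt_lm.parameters()):
    p.requires_grad = False  # only the adapter is trained

adapter = Adapter(d_ref=256, d_tgt=512)
ref_states = ref_lm(torch.randn(1, 32, 256))       # e.g. chord-token states
conditioning = adapter(ref_states)                 # projected into target space
melody_states = tgt_lm(torch.randn(1, 32, 512) + conditioning)
```

Only the adapter's parameters receive gradients, which is what makes the approach parameter-efficient.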
Jaza Syed, Ivan Meresman Higgs, Ondřej Cífka, et al.
Automatic lyrics transcription (ALT) remains a challenging task in the field of music information retrieval, despite great advances in automatic speech recognition (ASR) brought about by transformer-based architectures in recent years. One of the major challenges in ALT is the high amplitude of interfering audio signals relative to conventional ASR due to musical accompaniment. Recent advances in music source separation have enabled automatic extraction of high-quality separated vocals, which could potentially improve ALT performance. However, the effect of source separation has not been systematically investigated in order to establish best practices for its use. This work examines the impact of source separation on ALT using Whisper, a state-of-the-art open source ASR model. We evaluate Whisper's performance on original audio, separated vocals, and vocal stems across short-form and long-form transcription tasks. For short-form, we suggest a concatenation method that results in a consistent reduction in Word Error Rate (WER). For long-form, we propose an algorithm using source separation as a vocal activity detector to derive segment boundaries, which results in a consistent reduction in WER relative to Whisper's native long-form algorithm. Our approach achieves state-of-the-art results for an open source system on the Jam-ALT long-form ALT benchmark, without any training or fine-tuning. We also publish MUSDB-ALT, the first dataset of long-form lyric transcripts following the Jam-ALT guidelines for which vocal stems are publicly available.
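The long-form idea, using the energy of the separated vocal stem as a vocal activity detector to derive segment boundaries, can be sketched as below. The energy threshold, frame sizes, and file paths are illustrative assumptions, not the paper's exact algorithm; `whisper.load_model` and `model.transcribe` are the standard openai-whisper calls.

```python
# Hedged sketch: separated vocals as a vocal-activity detector for deriving
# segment boundaries before Whisper transcription.
import numpy as np
import librosa
import whisper

vocals, sr = librosa.load("vocals.wav", sr=16000)  # pre-separated vocal stem
frame, hop = 1024, 512
rms = librosa.feature.rms(y=vocals, frame_length=frame, hop_length=hop)[0]
active = rms > 0.02 * rms.max()                    # simple energy gate

# Collapse the frame-level mask into (start, end) segments in seconds.
segments, start = [], None
for i, a in enumerate(np.append(active, False)):
    t = i * hop / sr
    if a and start is None:
        start = t
    elif not a and start is not None:
        segments.append((start, t))
        start = None

model = whisper.load_model("base")
for s, e in segments:
    chunk = vocals[int(s * sr):int(e * sr)]
    result = model.transcribe(chunk, language="en")
    print(f"[{s:.1f}-{e:.1f}s] {result['text']}")
```

In practice the segments would be merged or padded to reasonable lengths; this sketch only shows how separation-derived boundaries replace Whisper's native long-form chunking.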
Aims: The role of arts and music in supporting subjective wellbeing (SWB) is increasingly recognised. Robust evidence is needed to support policy and practice. This article reports on the first of four reviews of Culture, Sport and Wellbeing (CSW) commissioned by the Economic and Social Research Council (ESRC)-funded What Works Centre for Wellbeing (https://whatworkswellbeing.org/). Objective: To identify SWB outcomes for music and singing in adults. Methods: Comprehensive literature searches were conducted in the PsychInfo, Medline, ERIC, Arts and Humanities, Social Science and Science Citation Indexes, Scopus, PILOTS and CINAHL databases. From 5,397 records identified, 61 relevant records were assessed using the GRADE and CERQual schema. Results: A wide range of wellbeing measures was used, with no consistency in how SWB was measured across the studies. A wide range of activities was reported, most commonly music listening and regular group singing. Music has been associated with reduced anxiety in young adults, with enhanced mood and purpose in adults, and with mental wellbeing, quality of life, self-awareness and coping in people with diagnosed health conditions. Music and singing have been shown to be effective in enhancing morale and reducing the risk of depression in older people. Few studies address SWB in people with dementia. While there are a few studies of music with marginalised communities, participants in community choirs tend to be female, white and relatively well educated. Research challenges include recruiting participants whose baseline wellbeing scores are low enough to register significant or noteworthy change following a music or singing intervention. Conclusions: There is reliable evidence for positive effects of music and singing on wellbeing in adults. There remains a need for research with sub-groups at greater risk of lower levels of wellbeing, and on the processes by which wellbeing outcomes are, or are not, achieved.
As a result of the new aesthetics and compositional techniques of musical modernism around and after 1900, the expectations of concert and opera audiences and the actual output of the musical avant-garde increasingly drifted apart. Scandal concerts multiplied, and heated debates were carried out in the feuilleton. Musicians reacted to this discourse not only by establishing their own communities of interest and presentation platforms for new music; composers also took a growing interest in communicating their music to audiences in such a way that it would be understood. Introductory lectures, essays, interviews, program notes, and conversations increasingly accompanied performances or new releases of new music. Musicological research has so far used these composers' statements largely as sources of information about the respective works. It has hardly reflected on the fact that these utterances are (1) communicative acts that presuppose a real or imagined audience and (2) embedded in a discursive framework to which they respond directly or indirectly. In this article, I discuss these two aspects using examples from the composer Alban Berg. The first is his Wozzeck lecture, which he first delivered in 1929 on the occasion of the Oldenburg premiere of the work and repeated several times in the following years at other performance venues. In the lecture, Berg uses more than fifty sound examples on the piano to refer primarily to recurring chords and harmonies. Clearly discernible here is the performative strategy of conveying to the audience a specific sonority of non-tonally bound chordal combinations through concrete, sensual auditory impressions. Berg also emphasizes the compositional regularities in Wozzeck, which can be read as a reaction to the press's repeated reproach that modern music does not follow any rules. Similar considerations are present in the radio dialogue "What is atonal?", broadcast on Radio Wien on April 23, 1930, as a fictitious dialogue between Berg and the journalist Julius Bistron. A detailed analysis of the dialogue script shows that Bistron acted as a representative of various audience segments throughout the dialogue. The last section of the article suggests how theories of communication, media, and performance can be used to examine such lectures and dialogues, and how this deepens our understanding of these communicative acts within the highly controversial field of new music in the twentieth century.
Muriel T. Zaatar, Kenda Alhakim, Mohammad Enayeh, et al.
Music is a universal language that can elicit profound emotional and cognitive responses. In this literature review, we explore the intricate relationship between music and the brain, from how it is decoded by the nervous system to its therapeutic potential in various disorders. Music engages a diverse network of brain regions and circuits, including sensory-motor processing, cognitive, memory, and emotional components. Music-induced brain network oscillations occur in specific frequency bands, and listening to one's preferred music can grant easier access to these brain functions. Moreover, music training can bring about structural and functional changes in the brain, and studies have shown its positive effects on social bonding, cognitive abilities, and language processing. We also discuss how music therapy can be used to retrain impaired brain circuits in different disorders. Understanding how music affects the brain can open up new avenues for music-based interventions in healthcare, education, and wellbeing.
We introduce a project that revives two pieces of 15th-century Korean court music, Chihwapyeong and Chwipunghyeong, composed upon the poem Songs of the Dragon Flying to Heaven. Among the earliest examples of Jeongganbo, a Korean musical notation system, the surviving versions consist only of a rudimentary melody. Our research team, commissioned by the National Gugak (Korean Traditional Music) Center, aimed to transform this old melody into a performable arrangement for a six-part ensemble. Using Jeongganbo data acquired through bespoke optical music recognition, we trained a BERT-like masked language model and an encoder-decoder transformer model. We also propose an encoding scheme that strictly follows the structure of Jeongganbo and denotes note durations as positions. The resulting machine-transformed versions of Chihwapyeong and Chwipunghyeong were evaluated by experts and performed by the Court Music Orchestra of the National Gugak Center. Our work demonstrates that generative models can successfully be applied to traditional music with limited training data when combined with careful design.
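A toy illustration of the position-based idea, encoding where a note falls within each Jeonggan cell instead of an explicit duration token, is sketched below. The token format, sub-position granularity, and pitch names are assumptions; the authors' actual scheme is more elaborate.

```python
# Toy sketch (assumed token format, not the authors' exact scheme) of encoding
# notes by position within Jeonggan cells; duration is implicit until the
# next onset, mirroring how Jeongganbo notates time by grid position.
from dataclasses import dataclass

@dataclass
class GanNote:
    pitch: str      # a yulmyeong pitch name, e.g. "hwang", "tae", "jung"
    jeonggan: int   # index of the cell (beat) containing the onset
    subpos: int     # position within the cell, e.g. thirds: 0, 1, 2

def encode(notes):
    """One token triple per note: cell index, intra-cell position, pitch."""
    tokens = []
    for n in sorted(notes, key=lambda n: (n.jeonggan, n.subpos)):
        tokens += [f"gan:{n.jeonggan}", f"pos:{n.subpos}", f"pitch:{n.pitch}"]
    return tokens

print(encode([GanNote("hwang", 0, 0), GanNote("tae", 0, 2), GanNote("jung", 1, 0)]))
```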
In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. Constructing such a translation scheme depends on a benchmark corpus with a comprehensive data infrastructure, and the assembly of a large-scale cross-modal dataset in particular presents major challenges. In this paper, we present the MOSA (Music mOtion with Semantic Annotation) dataset, which contains high-quality 3-D motion capture data, aligned audio recordings, and note-by-note semantic annotations of pitch, beat, phrase, dynamics, articulation, and harmony for 742 professional music performances by 23 professional musicians, comprising more than 30 hours and 570K notes of data. To our knowledge, this is the largest cross-modal music dataset with note-level annotations to date. To demonstrate the usage of the MOSA dataset, we present several innovative cross-modal music information retrieval (MIR) and musical content generation tasks, including the detection of beats, downbeats, phrases, and expressive content from audio, video, and motion data, and the generation of musicians' body motion from given music audio. The dataset and code are available alongside this publication (https://github.com/yufenhuang/MOSA-Music-mOtion-and-Semantic-Annotation-dataset).
Gabriel Souza, Flavio Figueiredo, Alexei Machado, et al.
In recent years, deep learning has achieved formidable results in creative computing. In music, one viable approach to generation is the Transformer-based model. However, while Transformer models are popular for music generation, they often rely on annotated structural information. In this work, we ask whether off-the-shelf Music Transformer models perform just as well on structural similarity metrics using only unannotated MIDI information. We show that a slight tweak to the most common representation yields small but significant improvements. We also argue that searching for better unannotated musical representations is more cost-effective than producing large amounts of curated, annotated data.
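For context, the sketch below shows a minimal version of the common unannotated event-based MIDI representation (note-on / note-off / time-shift tokens) that such work builds on. The abstract does not specify the paper's particular tweak, so none is shown here; the token names and time resolution are assumptions.

```python
# Minimal version of the common unannotated MIDI event encoding.
# Token names and 10 ms time resolution are illustrative assumptions.
def to_events(notes, shift_ms=10):
    """notes: list of (start_s, end_s, midi_pitch). Returns event tokens."""
    points = sorted([(s, "on", p) for s, e, p in notes] +
                    [(e, "off", p) for s, e, p in notes])
    events, clock = [], 0.0
    for t, kind, pitch in points:
        gap = int(round((t - clock) * 1000 / shift_ms))
        if gap > 0:
            events.append(f"time_shift:{gap * shift_ms}ms")
        events.append(f"note_{kind}:{pitch}")
        clock = t
    return events

# Two quarter notes at 120 BPM: C4 then E4.
print(to_events([(0.0, 0.5, 60), (0.5, 1.0, 64)]))
```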
Much of Western classical music relies on instruments based on acoustic resonance, which produce harmonic or quasi-harmonic sounds. In contrast, since the mid-twentieth century, popular music has increasingly been produced in recording studios, where it is not bound by the constraints of harmonic sounds. In this study, we use modified MPEG-7 features to explore and characterise the evolution of noise and inharmonicity in popular music since 1961. We place this evolution in the context of other broad categories of music, including Western classical piano music, orchestral music, and musique concrète. We introduce new features that distinguish between inharmonicity caused by noise and that resulting from interactions between discrete partials. Our analysis reveals that the history of popular music since 1961 can be divided into three phases. From 1961 to 1972, inharmonicity in popular music, initially only slightly higher than in orchestral music, increased significantly. Between 1972 and 1986, this rise in inharmonicity was accompanied by an increase in noise, but since 1986, both inharmonicity and noise have moderately decreased. In recent years (up to 2020), popular music has remained much more inharmonic than popular music from the 1960s or orchestral music involving acoustic resonance instruments. However, it has become less noisy, with noise levels comparable to those of orchestral music. We relate these trends to the evolution of music production techniques. In particular, the use of multi-tracking may explain the higher inharmonicity in popular music compared to orchestral music. We illustrate these trends with analyses of key artists and tracks.
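As a rough sketch of the kind of feature involved, the snippet below computes a simple harmonicity ratio in the spirit of the MPEG-7 AudioHarmonicity descriptor: the fraction of spectral energy lying near integer multiples of an estimated fundamental. The tolerance band, file path, and single-f0 assumption are illustrative; the paper's modified features, which further separate noise from partial interactions, are more elaborate.

```python
# Rough harmonicity-ratio sketch (illustrative, not the paper's features):
# share of spectral energy within 3% bands around harmonics of the median f0.
import numpy as np
import librosa

y, sr = librosa.load("track.wav", sr=22050, mono=True)
f0 = np.nanmedian(librosa.yin(y, fmin=50, fmax=1000, sr=sr))
S = np.abs(librosa.stft(y, n_fft=4096))
freqs = librosa.fft_frequencies(sr=sr, n_fft=4096)

harmonic = np.zeros(len(freqs), dtype=bool)
for k in range(1, int(freqs[-1] // f0) + 1):
    harmonic |= np.abs(freqs - k * f0) < 0.03 * k * f0  # 3% tolerance band

harmonicity = S[harmonic].sum() / S.sum()
print(f"estimated f0 = {f0:.1f} Hz, harmonic energy ratio = {harmonicity:.2f}")
# A simple inharmonicity proxy would be 1 - harmonicity.
```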
Music Genre Classification (MGC) automatically categorizes music into genres based on musical attributes and features extracted from audio files. This is a crucial problem in music information retrieval, as it provides a way to organize and analyse large collections of music files. MGC can be performed with conventional machine learning algorithms such as SVMs, k-nearest neighbours, decision trees, and shallow neural networks, which learn to recognize the musical features and attributes that distinguish genres. The literature shows that conventional machine learning algorithms underperform deep learning algorithms such as CNNs and RNNs in many applications. Hence, this work adopts a CNN to classify music genres. Performance is evaluated using metrics such as accuracy, precision, recall, and F1-score, and the impact of different features and algorithms on MGC performance is studied and compared. MGC has applications in areas such as automated music recommendation systems, music education, and music production. The CNN achieves an accuracy of 83% on the MGC task.
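A minimal sketch of such a CNN classifier over mel-spectrogram inputs is shown below. The layer sizes, input shape, and the 10-genre output (as in the common GTZAN benchmark) are assumptions, not the paper's architecture.

```python
# Minimal CNN genre classifier over mel-spectrograms (assumed architecture).
import torch
import torch.nn as nn

class GenreCNN(nn.Module):
    def __init__(self, n_genres: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),   # fixed-size summary of the map
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_genres)

    def forward(self, x):                    # x: (batch, 1, n_mels, frames)
        return self.classifier(self.features(x).flatten(1))

model = GenreCNN()
logits = model(torch.randn(8, 1, 128, 431))  # batch of 8 mel-spectrograms
print(logits.shape)                          # torch.Size([8, 10])
```

Training with cross-entropy loss and reporting accuracy, precision, recall, and F1 on a held-out split follows the standard supervised recipe.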
Hsu-Ming Teo, Karen Pearlman, Malcolm Choat, et al.
Creative practitioners inside academia are often tasked with explaining how their embodied practices constitute research. The Peribiophoty project reverses this paradigm to ask: how does academic research constitute embodied practice? By considering the personal and intellectual contexts (peri), surrounding academics and their biographies (bio), through audio-visual representation (photy), we investigate how academic thinking is embodied thinking. The notion that “traditional” research only involves the brain is challenged by the audio-visual representations of thoughts and ideas embedded in objects, experience, time, and interactions. Peribiophoty makes its propositions about academic thinking through embodied presence and rhythmic juxtapositions of gesture, things, place, text on screen and voice. It evokes the narrative pasts and selves of the project’s literature, history, and digital games scholars as substantively entangled with their ongoing research programs and demonstrates that their academic research is necessarily an embodied and embedded practice.