Results for "Literature on music"

Showing 20 of ~1754550 results · from CrossRef, arXiv, DOAJ

arXiv Open Access 2025
The Shape of Surprise: Structured Uncertainty and Co-Creativity in AI Music Tools

Eric Browne

Randomness plays a pivotal yet paradoxical role in computational music creativity: it can spark novelty, but unchecked chance risks incoherence. This paper presents a thematic review of contemporary AI music systems, examining how designers incorporate randomness and uncertainty into creative practice. I draw on the concept of structured uncertainty to analyse how stochastic processes are constrained within musical and interactive frameworks. Through a comparative analysis of six systems - Musika (Pasini and Schlüter, 2022), MIDI-DDSP (Wu et al., 2021), Melody RNN (Magenta Project), RAVE (Caillon and Esling, 2021), Wekinator (Fiebrink and Cook, 2010), and Somax 2 (Borg, 2019) - we identify recurring design patterns that support musical coherence, user control, and co-creativity. To my knowledge, this is the first thematic review examining randomness in AI music through structured uncertainty, offering practical insights for designers and artists aiming to support expressive, collaborative, or improvisational interactions.

arXiv Open Access 2025
A Study on the Data Distribution Gap in Music Emotion Recognition

Joann Ching, Gerhard Widmer

Music Emotion Recognition (MER) is a task deeply connected to human perception, relying heavily on subjective annotations collected from contributors. Prior studies tend to focus on specific musical styles rather than incorporating a diverse range of genres, such as rock and classical, within a single framework. In this paper, we address the task of recognizing emotion from audio content by investigating five datasets with dimensional emotion annotations -- EmoMusic, DEAM, PMEmo, WTC, and WCMED -- which span various musical styles. We demonstrate the problem of out-of-distribution generalization in a systematic experiment. By closely looking at multiple data and feature sets, we provide insight into genre-emotion relationships in existing data and examine potential genre dominance and dataset biases in certain feature representations. Based on these experiments, we arrive at a simple yet effective framework that combines embeddings extracted from the Jukebox model with chroma features and demonstrate how, alongside a combination of several diverse training sets, this permits us to train models with substantially improved cross-dataset generalization capabilities.

en cs.SD, cs.LG
DOAJ Open Access 2025
LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging

Charilaos Papaioannou, Emmanouil Benetos, Alexandros Potamianos

We introduce Label-Combination Prototypical Networks (LC-Protonets) to address the problem of multi-label few-shot classification, where a model must generalize to new classes based on only a few available examples. Extending Prototypical Networks, LC-Protonets generate one prototype per label combination, derived from the power set of labels present in the limited training items, rather than one prototype per label. Our method is applied to automatic audio tagging across diverse music datasets, covering various cultures and including both modern and traditional music, and is evaluated against existing approaches in the literature. The results demonstrate a significant performance improvement in almost all domains and training setups when using LC-Protonets for multi-label classification. In addition to training a few-shot learning model from scratch, we explore the use of a pre-trained model, obtained via supervised learning, to embed items in the feature space. Fine-tuning improves the generalization ability of all methods, yet LC-Protonets achieve high-level performance even without fine-tuning, in contrast to the comparative approaches. We finally analyze the scalability of the proposed method, providing detailed quantitative metrics from our experiments. The implementation and experimental setup are made publicly available, offering a benchmark for future research.
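As a rough illustration of the label-combination idea described in this abstract, the sketch below builds one prototype per non-empty label combination found in a small support set. It assumes that a prototype is the mean embedding of every support item whose label set contains that combination; the function names, toy embeddings, and labels are hypothetical and are not taken from the authors' released implementation.

```python
from itertools import combinations


def label_combinations(labels):
    """Return every non-empty subset of a label set as a sorted tuple."""
    ordered = sorted(labels)
    return [
        combo
        for size in range(1, len(ordered) + 1)
        for combo in combinations(ordered, size)
    ]


def lc_prototypes(support):
    """Map each label combination to the mean embedding of its member items.

    `support` is a list of (embedding, labels) pairs, where `embedding` is a
    list of floats and `labels` is a set of strings. An item is a member of a
    combination when its label set contains every label in that combination
    (an assumption made for this sketch).
    """
    members = {}
    for embedding, labels in support:
        for combo in label_combinations(labels):
            members.setdefault(combo, []).append(embedding)
    return {
        combo: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for combo, vectors in members.items()
    }


# Toy support set: two items with 2-D embeddings (hypothetical values).
support = [
    ([1.0, 0.0], {"flute", "folk"}),
    ([0.0, 1.0], {"folk"}),
]
prototypes = lc_prototypes(support)
```

Because the number of combinations grows exponentially with the number of labels per item, a sketch like this only suits items with few labels, which is one reason scalability is worth analysing separately.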

Electrical engineering. Electronics. Nuclear engineering
DOAJ Open Access 2025
Groove into ageing: Exploring the effects of rhythmic exercise on the well-being of older adults

Kelvin Tan Cheng Kian, Sonia Chang

As the global population of older adults is increasing, it is increasingly important to address the well-being of this demographic. This paper presents a review of the literature on the efficacy of rhythmic exercise programmes that incorporate elements of physical movement and for enhancing the well-being of healthy older adults. Sixteen studies were identified for in-depth review. A systematic review of the literature following the PRISMA guidelines revealed promising outcomes across multiple domains. The results revealed that rhythmic exercise programmes demonstrate significant improvements in physical capabilities, cognitive functioning, psychological well-being, social connections, and physiological parameters. Gaps and limitations in the research to date, such as a lack of studies on the social benefits, limited variation in the types of exercise studied, skewed gender ratios, age-related differences, and the impact of music types and preferences, highlight avenues for future investigation. By addressing these gaps, future research can provide a more nuanced understanding of the effectiveness of rhythmic exercise programmes and inform the development of tailored interventions to meet the diverse needs of older adult populations.

arXiv Open Access 2024
Automatic Detection of Moral Values in Music Lyrics

Vjosa Preniqi, Iacopo Ghinassi, Julia Ive et al.

Moral values play a fundamental role in how we evaluate information, make decisions, and form judgements around important social issues. The possibility to extract morality rapidly from lyrics enables a deeper understanding of our music-listening behaviours. Building on the Moral Foundations Theory (MFT), we tasked a set of transformer-based language models (BERT) fine-tuned on 2,721 synthetic lyrics generated by a large language model (GPT-4) to detect moral values in 200 real music lyrics annotated by two experts. We evaluate their predictive capabilities against a series of baselines including out-of-domain (BERT fine-tuned on MFT-annotated social media texts) and zero-shot (GPT-4) classification. The proposed models yielded the best accuracy across experiments, with an average F1 weighted score of 0.8. This performance is, on average, 5% higher than out-of-domain and zero-shot models. When examining precision in binary classification, the proposed models perform on average 12% higher than the baselines. Our approach contributes to annotation-free and effective lyrics morality learning, and provides useful insights into the knowledge distillation of LLMs regarding moral expression in music, and the potential impact of these technologies on the creative industries and musical culture.

arXiv Open Access 2024
Exploring Diverse Sounds: Identifying Outliers in a Music Corpus

Le Cai, Sam Ferguson, Gengfa Fang et al.

Existing research on music recommendation systems primarily focuses on recommending similar music, thereby often neglecting diverse and distinctive musical recordings. Musical outliers can provide valuable insights due to the inherent diversity of music itself. In this paper, we explore music outliers, investigating their potential usefulness for music discovery and recommendation systems. We argue that not all outliers should be treated as noise, as they can offer interesting perspectives and contribute to a richer understanding of an artist's work. We introduce the concept of 'Genuine' music outliers and provide a definition for them. These genuine outliers can reveal unique aspects of an artist's repertoire and hold the potential to enhance music discovery by exposing listeners to novel and diverse musical experiences.

en cs.SD, cs.IR
arXiv Open Access 2024
A Diffusion-Based Generative Equalizer for Music Restoration

Eloi Moliner, Maija Turunen, Filip Elvander et al.

This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of improvements. This research broadens the concept of bandwidth extension to generative equalization, a novel task that, to the best of our knowledge, has not been explicitly addressed in previous studies. BABE-2 is built around an optimization algorithm utilizing priors from diffusion models, which are trained or fine-tuned using a curated set of high-quality music tracks. The algorithm simultaneously performs two critical tasks: estimation of the filter degradation magnitude response and hallucination of the restored audio. The proposed method is objectively evaluated on historical piano recordings, showing an enhancement over the prior version. The method yields similarly impressive results in rejuvenating the works of renowned vocalists Enrico Caruso and Nellie Melba. This research represents an advancement in the practical restoration of historical music.

en eess.AS, cs.SD
arXiv Open Access 2024
MIRFLEX: Music Information Retrieval Feature Library for Extraction

Anuradha Chopra, Abhinaba Roy, Dorien Herremans

This paper introduces an extendable modular system that compiles a range of music feature extraction models to aid music information retrieval research. The features include musical elements like key, downbeats, and genre, as well as audio characteristics like instrument recognition, vocals/instrumental classification, and vocals gender detection. The integrated models are state-of-the-art or latest open-source. The features can be extracted as latent or post-processed labels, enabling integration into music applications such as generative music, recommendation, and playlist generation. The modular design allows easy integration of newly developed systems, making it a good benchmarking and comparison tool. This versatile toolkit supports the research community in developing innovative solutions by providing concrete musical features.

en cs.SD, cs.AI
arXiv Open Access 2024
Long-Form Text-to-Music Generation with Adaptive Prompts: A Case Study in Tabletop Role-Playing Games Soundtracks

Felipe Marra, Lucas N. Ferreira

This paper investigates the capabilities of text-to-audio music generation models in producing long-form music with prompts that change over time, focusing on soundtrack generation for Tabletop Role-Playing Games (TRPGs). We introduce Babel Bardo, a system that uses Large Language Models (LLMs) to transform speech transcriptions into music descriptions for controlling a text-to-music model. Four versions of Babel Bardo were compared in two TRPG campaigns: a baseline using direct speech transcriptions, and three LLM-based versions with varying approaches to music description generation. Evaluations considered audio quality, story alignment, and transition smoothness. Results indicate that detailed music descriptions improve audio quality while maintaining consistency across consecutive descriptions enhances story alignment and transition smoothness.

en cs.SD, cs.AI
arXiv Open Access 2024
ChordSync: Conformer-Based Alignment of Chord Annotations to Music Audio

Andrea Poltronieri, Valentina Presutti, Martín Rocamora

In the Western music tradition, chords are the main constituent components of harmony, a fundamental dimension of music. Despite its relevance for several Music Information Retrieval (MIR) tasks, chord-annotated audio datasets are limited and need more diversity. One way to improve those resources is to leverage the large number of chord annotations available online, but this requires aligning them with music audio. However, existing audio-to-score alignment techniques, which typically rely on Dynamic Time Warping (DTW), fail to address this challenge, as they require weakly aligned data for precise synchronisation. In this paper, we introduce ChordSync, a novel conformer-based model designed to seamlessly align chord annotations with audio, eliminating the need for weak alignment. We also provide a pre-trained model and a user-friendly library, enabling users to synchronise chord annotations with audio tracks effortlessly. In this way, ChordSync creates opportunities for harnessing crowd-sourced chord data for MIR, especially in audio chord estimation, thereby facilitating the generation of novel datasets. Additionally, our system extends its utility to music education, enhancing music learning experiences by providing accurately aligned annotations, thus enabling learners to engage in synchronised musical practices.

en cs.SD, cs.LG
arXiv Open Access 2024
Synthesising Handwritten Music with GANs: A Comprehensive Evaluation of CycleWGAN, ProGAN, and DCGAN

Elona Shatri, Kalikidhar Palavala, George Fazekas

The generation of handwritten music sheets is a crucial step toward enhancing Optical Music Recognition (OMR) systems, which rely on large and diverse datasets for optimal performance. However, handwritten music sheets, often found in archives, present challenges for digitisation due to their fragility, varied handwriting styles, and image quality. This paper addresses the data scarcity problem by applying Generative Adversarial Networks (GANs) to synthesise realistic handwritten music sheets. We provide a comprehensive evaluation of three GAN models - DCGAN, ProGAN, and CycleWGAN - comparing their ability to generate diverse and high-quality handwritten music images. The proposed CycleWGAN model, which enhances style transfer and training stability, significantly outperforms DCGAN and ProGAN in both qualitative and quantitative evaluations. CycleWGAN achieves superior performance, with an FID score of 41.87, an IS of 2.29, and a KID of 0.05, making it a promising solution for improving OMR systems.

en cs.CV, cs.AI
arXiv Open Access 2024
Hierarchical Generative Modeling of Melodic Vocal Contours in Hindustani Classical Music

Nithya Shikarpur, Krishna Maneesha Dendukuri, Yusong Wu et al.

Hindustani music is a performance-driven oral tradition that exhibits the rendition of rich melodic patterns. In this paper, we focus on generative modeling of singers' vocal melodies extracted from audio recordings, as the voice is musically prominent within the tradition. Prior generative work in Hindustani music models melodies as coarse discrete symbols, which fails to capture the rich expressive melodic intricacies of singing. Thus, we propose to use a finely quantized pitch contour as an intermediate representation for hierarchical audio modeling. We propose GaMaDHaNi, a modular two-level hierarchy, consisting of a generative model on pitch contours, and a pitch contour to audio synthesis model. We compare our approach to non-hierarchical audio models and hierarchical models that use a self-supervised intermediate representation, through a listening test and qualitative analysis. We also evaluate the audio model's ability to faithfully represent the pitch contour input using the Pearson correlation coefficient. By using pitch contours as an intermediate representation, we show that our model may be better equipped to listen and respond to musicians in a human-AI collaborative setting by highlighting two potential interaction use cases: (1) primed generation, and (2) coarse pitch conditioning.
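The Pearson-correlation check mentioned in this abstract can be sketched as follows. This is only a minimal illustration, assuming the conditioning pitch contour and the contour re-extracted from the generated audio are already aligned frame by frame; the function name and the toy pitch values in Hz are hypothetical, and the authors' actual evaluation pipeline may differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical frame-aligned pitch values in Hz.
conditioning = [220.0, 233.1, 246.9, 261.6]
resynthesised = [221.0, 232.0, 248.0, 260.0]
r = pearson(conditioning, resynthesised)
```

A value of `r` close to 1 would suggest that the synthesised audio follows the shape of the input contour, although correlation alone does not detect a constant pitch offset.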

en cs.SD, cs.AI
DOAJ Open Access 2024
Electromagnetic informed data model considerations for near-field DOA and range estimates

Zohreh Ebadi, Amir Masoud Molaei, Muhammad Ali Babar Abbasi et al.

Localizing sources in the near-field is one of the emerging challenges for array signal processing, which has received a great deal of attention in recent years. The development of accurate localization algorithms requires the definition of a reliable model of the received signal that takes into account all wavefront characteristics, such as angle, range, and polarization, as well as electromagnetic effects, such as mutual coupling between antennas and the amplitude and phase behaviour of electromagnetic wavefronts. A system model that considers the electromagnetic-informed wave behaviour effects, independent of the type of receiver antennas, array structure, degree of correlation of source signals and other electromagnetic effects, is considered an “exact model” in the literature. However, due to the mathematical complexity of this modeling approach, simplifications using several approximations are conventionally used. For instance, the phase of the exact model is approximated using the Fresnel approximation, while the magnitude of the exact model is simplified by assuming equal distances between the source and all elements in the array. In this work, we evaluate the accuracy of a localization algorithm, the multiple signal classification (MUSIC), using the exact and approximated models in the near-field region. Through a series of simulations, we demonstrate that the localization algorithm designed based on the electromagnetic-informed exact model outperforms the one designed using the approximated model. We also show that considering electromagnetic factors in the system model through the exact model results in a 13% improvement in the direction of arrival (DOA) root mean square error (RMSE) and a 57.7% improvement in range RMSE at signal-to-noise ratio (SNR) of 15 dB.

Medicine, Science
DOAJ Open Access 2024
Node attribute analysis for cultural data analytics: a case study on Italian XX–XXI century music

Michele Coscia

Cultural data analytics aims to use analytic methods to explore cultural expressions, for instance art, literature, dance, and music. What cultural expressions have in common is that they have multiple qualitatively different facets that interact with each other in non-trivial and non-learnable ways. To support this observation, we use the Italian music record industry from 1902 to 2024 as a case study. In this scenario, a possible research objective could be to discuss the relationships between different music genres as they are performed by different bands. Estimating genre similarity by counting the number of records each band published performing a given genre is not enough, because it assumes bands operate independently from each other. In reality, bands share members and have complex relationships. These relationships cannot be automatically learned, both because we lack the data behind their creation and because they are established in a serendipitous way between artists, without following consistent patterns. However, we can map them in a complex network. We can then use the counts of band records with a given genre as a node attribute in a band network. In this paper we show how recently developed techniques for node attribute analysis are a natural choice to analyze such attributes. Alternative network analysis techniques focus on analyzing nodes, rather than node attributes, ending up either being inapplicable in this scenario, or requiring the creation of more complex n-partite high order structures that can be less intuitive. By using node attribute analysis techniques, we show that we are able to describe which music genres concentrate or spread out in this network, which time periods show a balance of exploration-versus-exploitation, which Italian regions correlate more with which music genres, and a new approach to classify clusters of coherent music genres or eras of activity by the distance on this network between genres or years.

Applied mathematics. Quantitative methods
DOAJ Open Access 2024
Fazil Say e la Musica Turca Moderna e Contemporanea

Nicoletta Leone

At the beginning of the twentieth century, the new government of Mustafa Kemal Atatürk took an interest in bringing innovation to the Turkish music scene, initiating a new genre of Turkish music intended to fuse traditional elements with the Western compositional style. It is in this light that the music of Fazil Say (1970-) must be analysed: his compositions are windows onto the history and traditions of his country, and in each of them he finds new ways of evoking sounds and settings, with and without the presence of actual traditional Turkish instruments. This work explores the history of Turkish music from the twentieth century to the present and the contribution of Fazil Say as one of the leading exponents of his country's music, through an analysis of the form and structure of two of his compositions for violin that fully encapsulate his compositional style: the first Sonata for violin and piano and Cleopatra for solo violin.

Literature on music, Musical instruction and study
DOAJ Open Access 2024
Discourse Analysis of the Historically Audible: A Cultural-Historical Approach to Sound Recordings from Colonial Contexts

Mèhèza Kalibani

In 1877, the invention of the phonograph enabled a new hearing practice that created a bridge between spaces and times. For the first time in human history, it was possible to record sound and replay it independently of its original source. The phonograph was used worldwide, including in regions where people were considered “primitive” according to the Western ideologies of the time. Today, these early recordings of non-European musical traditions are stored in European archives. From early on, they have been studied by scholars or used for cultural projects in museums. A look at the collections of many historical sound archives clearly shows that the colonial era (for Germany especially from 1900 to 1914) was the golden age of this collecting practice. Thus, these sound recordings embody a certain “sensibility” due to their relation to colonialism. In fact, a considerable part of the recordings was produced under hegemonic power relations. But how are they heard today, what stories and discourses do they transmit, and how do we deal with them? Close listening, listening to history, collective listening, and listening to the silences are some of the theoretical-methodological approaches developed in recent years in the context of dealing with historical sound recordings. In this article, I will introduce these approaches and highlight their advantages and gaps. In order to bridge the gaps thus identified, I will introduce “discourse analysis of the historically audible,” a cultural-historical approach to historically sensitive sound recordings. Discourse analysis of the historically audible is a cultural-historical approach to sound recordings from colonial contexts which can facilitate the past-present dialogue between former colonized people and former colonizers.

Music and books on Music, Literature on music
arXiv Open Access 2023
Leveraging Negative Signals with Self-Attention for Sequential Music Recommendation

Pavan Seshadri, Peter Knees

Music streaming services heavily rely on their recommendation engines to continuously provide content to their consumers. Sequential recommendation consequently has seen considerable attention in current literature, where state-of-the-art approaches focus on self-attentive models leveraging contextual information such as long and short-term user history and item features; however, most of these studies focus on long-form content domains (retail, movie, etc.) rather than short-form, such as music. Additionally, many do not explore incorporating negative session-level feedback during training. In this study, we investigate the use of transformer-based self-attentive architectures to learn implicit session-level information for sequential music recommendation. We additionally propose a contrastive learning task to incorporate negative feedback (e.g., skipped tracks) to promote positive hits and penalize negative hits. This task is formulated as a simple loss term that can be incorporated into a variety of deep learning architectures for sequential recommendation. Our experiments show that this results in consistent performance gains over the baseline architectures ignoring negative user feedback.

en cs.IR, cs.LG
arXiv Open Access 2023
The Music Meta Ontology: a flexible semantic model for the interoperability of music metadata

Jacopo de Berardinis, Valentina Anita Carriero, Albert Meroño-Peñuela et al.

The semantic description of music metadata is a key requirement for the creation of music datasets that can be aligned, integrated, and accessed for information retrieval and knowledge discovery. It is nonetheless an open challenge due to the complexity of musical concepts arising from different genres, styles, and periods -- standing to benefit from a lingua franca to accommodate various stakeholders (musicologists, librarians, data engineers, etc.). To initiate this transition, we introduce the Music Meta ontology, a rich and flexible semantic model to describe music metadata related to artists, compositions, performances, recordings, and links. We follow eXtreme Design methodologies and best practices for data engineering, to reflect the perspectives and the requirements of various stakeholders into the design of the model, while leveraging ontology design patterns and accounting for provenance at different levels (claims, links). After presenting the main features of Music Meta, we provide a first evaluation of the model, alignments to other schema (Music Ontology, DOREMUS, Wikidata), and support for data transformation.

en cs.IR, cs.AI

Page 8 of 87728