Results for "Music and books on Music"

arXiv Open Access 2025

The Shape of Surprise: Structured Uncertainty and Co-Creativity in AI Music Tools

Eric Browne

Randomness plays a pivotal yet paradoxical role in computational music creativity: it can spark novelty, but unchecked chance risks incoherence. This paper presents a thematic review of contemporary AI music systems, examining how designers incorporate randomness and uncertainty into creative practice. I draw on the concept of structured uncertainty to analyse how stochastic processes are constrained within musical and interactive frameworks. Through a comparative analysis of six systems - Musika (Pasini and Schlüter, 2022), MIDI-DDSP (Wu et al., 2021), Melody RNN (Magenta Project), RAVE (Caillon and Esling, 2021), Wekinator (Fiebrink and Cook, 2010), and Somax 2 (Borg, 2019) - I identify recurring design patterns that support musical coherence, user control, and co-creativity. To my knowledge, this is the first thematic review examining randomness in AI music through structured uncertainty, offering practical insights for designers and artists aiming to support expressive, collaborative, or improvisational interactions.

en cs.SD


arXiv Open Access 2025

A Study on the Data Distribution Gap in Music Emotion Recognition

Joann Ching, Gerhard Widmer

Music Emotion Recognition (MER) is a task deeply connected to human perception, relying heavily on subjective annotations collected from contributors. Prior studies tend to focus on specific musical styles rather than incorporating a diverse range of genres, such as rock and classical, within a single framework. In this paper, we address the task of recognizing emotion from audio content by investigating five datasets with dimensional emotion annotations -- EmoMusic, DEAM, PMEmo, WTC, and WCMED -- which span various musical styles. We demonstrate the problem of out-of-distribution generalization in a systematic experiment. By closely looking at multiple data and feature sets, we provide insight into genre-emotion relationships in existing data and examine potential genre dominance and dataset biases in certain feature representations. Based on these experiments, we arrive at a simple yet effective framework that combines embeddings extracted from the Jukebox model with chroma features and demonstrate how, alongside a combination of several diverse training sets, this permits us to train models with substantially improved cross-dataset generalization capabilities.

en cs.SD, cs.LG


arXiv Open Access 2025

Ethics Statements in AI Music Papers: The Effective and the Ineffective

Julia Barnett, Patrick O'Reilly, Jason Brent Smith et al.

While research in AI methods for music generation and analysis has grown in scope and impact, AI researchers' engagement with the ethical consequences of this work has not kept pace. To encourage such engagement, many publication venues have introduced optional or required ethics statements for AI research papers. Though some authors use these ethics statements to critically engage with the broader implications of their research, we find that the majority of ethics statements in the AI music literature do not appear to be effectively utilized for this purpose. In this work, we conduct a review of ethics statements across ISMIR, NIME, and selected prominent works in AI music from the past five years. We then offer suggestions for both audio conferences and researchers for engaging with ethics statements in ways that foster meaningful reflection rather than formulaic compliance.

en cs.CY, cs.SD


arXiv Open Access 2024

Exploring Diverse Sounds: Identifying Outliers in a Music Corpus

Le Cai, Sam Ferguson, Gengfa Fang et al.

Existing research on music recommendation systems primarily focuses on recommending similar music, thereby often neglecting diverse and distinctive musical recordings. Musical outliers can provide valuable insights due to the inherent diversity of music itself. In this paper, we explore music outliers, investigating their potential usefulness for music discovery and recommendation systems. We argue that not all outliers should be treated as noise, as they can offer interesting perspectives and contribute to a richer understanding of an artist's work. We introduce the concept of 'Genuine' music outliers and provide a definition for them. These genuine outliers can reveal unique aspects of an artist's repertoire and hold the potential to enhance music discovery by exposing listeners to novel and diverse musical experiences.

en cs.SD, cs.IR


arXiv Open Access 2024

Motifs, Phrases, and Beyond: The Modelling of Structure in Symbolic Music Generation

Keshav Bhandari, Simon Colton

Modelling musical structure is vital yet challenging for artificial intelligence systems that generate symbolic music compositions. This literature review dissects the evolution of techniques for incorporating coherent structure, from symbolic approaches to foundational and transformative deep learning methods that harness the power of computation and data across a wide variety of training paradigms. In the later stages, we review an emerging technique which we refer to as "sub-task decomposition" that involves decomposing music generation into separate high-level structural planning and content creation stages. Such systems incorporate some form of musical knowledge or neuro-symbolic methods by extracting melodic skeletons or structural templates to guide the generation. Progress is evident in capturing motifs and repetitions across all three eras reviewed, yet modelling the nuanced development of themes across extended compositions in the style of human composers remains difficult. We outline several key future directions to realize the synergistic benefits of combining approaches from all eras examined.

en cs.SD, cs.LG


arXiv Open Access 2024

A Diffusion-Based Generative Equalizer for Music Restoration

Eloi Moliner, Maija Turunen, Filip Elvander et al.

This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of improvements. This research broadens the concept of bandwidth extension to generative equalization, a novel task that, to the best of our knowledge, has not been explicitly addressed in previous studies. BABE-2 is built around an optimization algorithm utilizing priors from diffusion models, which are trained or fine-tuned using a curated set of high-quality music tracks. The algorithm simultaneously performs two critical tasks: estimation of the filter degradation magnitude response and hallucination of the restored audio. The proposed method is objectively evaluated on historical piano recordings, showing an enhancement over the prior version. The method yields similarly impressive results in rejuvenating the works of renowned vocalists Enrico Caruso and Nellie Melba. This research represents an advancement in the practical restoration of historical music.

en eess.AS, cs.SD


arXiv Open Access 2024

MIRFLEX: Music Information Retrieval Feature Library for Extraction

Anuradha Chopra, Abhinaba Roy, Dorien Herremans

This paper introduces an extendable modular system that compiles a range of music feature extraction models to aid music information retrieval research. The features include musical elements like key, downbeats, and genre, as well as audio characteristics like instrument recognition, vocals/instrumental classification, and vocals gender detection. The integrated models are state-of-the-art or latest open-source. The features can be extracted as latent or post-processed labels, enabling integration into music applications such as generative music, recommendation, and playlist generation. The modular design allows easy integration of newly developed systems, making it a good benchmarking and comparison tool. This versatile toolkit supports the research community in developing innovative solutions by providing concrete musical features.

en cs.SD, cs.AI


arXiv Open Access 2024

Long-Form Text-to-Music Generation with Adaptive Prompts: A Case Study in Tabletop Role-Playing Games Soundtracks

Felipe Marra, Lucas N. Ferreira

This paper investigates the capabilities of text-to-audio music generation models in producing long-form music with prompts that change over time, focusing on soundtrack generation for Tabletop Role-Playing Games (TRPGs). We introduce Babel Bardo, a system that uses Large Language Models (LLMs) to transform speech transcriptions into music descriptions for controlling a text-to-music model. Four versions of Babel Bardo were compared in two TRPG campaigns: a baseline using direct speech transcriptions, and three LLM-based versions with varying approaches to music description generation. Evaluations considered audio quality, story alignment, and transition smoothness. Results indicate that detailed music descriptions improve audio quality while maintaining consistency across consecutive descriptions enhances story alignment and transition smoothness.

en cs.SD, cs.AI


arXiv Open Access 2024

ChordSync: Conformer-Based Alignment of Chord Annotations to Music Audio

Andrea Poltronieri, Valentina Presutti, Martín Rocamora

In the Western music tradition, chords are the main constituent components of harmony, a fundamental dimension of music. Despite its relevance for several Music Information Retrieval (MIR) tasks, chord-annotated audio datasets are limited and need more diversity. One way to improve those resources is to leverage the large number of chord annotations available online, but this requires aligning them with music audio. However, existing audio-to-score alignment techniques, which typically rely on Dynamic Time Warping (DTW), fail to address this challenge, as they require weakly aligned data for precise synchronisation. In this paper, we introduce ChordSync, a novel conformer-based model designed to seamlessly align chord annotations with audio, eliminating the need for weak alignment. We also provide a pre-trained model and a user-friendly library, enabling users to synchronise chord annotations with audio tracks effortlessly. In this way, ChordSync creates opportunities for harnessing crowd-sourced chord data for MIR, especially in audio chord estimation, thereby facilitating the generation of novel datasets. Additionally, our system extends its utility to music education, enhancing music learning experiences by providing accurately aligned annotations, thus enabling learners to engage in synchronised musical practices.

en cs.SD, cs.LG


arXiv Open Access 2024

Synthesising Handwritten Music with GANs: A Comprehensive Evaluation of CycleWGAN, ProGAN, and DCGAN

Elona Shatri, Kalikidhar Palavala, George Fazekas

The generation of handwritten music sheets is a crucial step toward enhancing Optical Music Recognition (OMR) systems, which rely on large and diverse datasets for optimal performance. However, handwritten music sheets, often found in archives, present challenges for digitisation due to their fragility, varied handwriting styles, and image quality. This paper addresses the data scarcity problem by applying Generative Adversarial Networks (GANs) to synthesise realistic handwritten music sheets. We provide a comprehensive evaluation of three GAN models - DCGAN, ProGAN, and CycleWGAN - comparing their ability to generate diverse and high-quality handwritten music images. The proposed CycleWGAN model, which enhances style transfer and training stability, significantly outperforms DCGAN and ProGAN in both qualitative and quantitative evaluations. CycleWGAN achieves superior performance, with an FID score of 41.87, an IS of 2.29, and a KID of 0.05, making it a promising solution for improving OMR systems.

en cs.CV, cs.AI


arXiv Open Access 2024

Hierarchical Generative Modeling of Melodic Vocal Contours in Hindustani Classical Music

Nithya Shikarpur, Krishna Maneesha Dendukuri, Yusong Wu et al.

Hindustani music is a performance-driven oral tradition that exhibits the rendition of rich melodic patterns. In this paper, we focus on generative modeling of singers' vocal melodies extracted from audio recordings, as the voice is musically prominent within the tradition. Prior generative work in Hindustani music models melodies as coarse discrete symbols, which fails to capture the rich expressive melodic intricacies of singing. Thus, we propose to use a finely quantized pitch contour as an intermediate representation for hierarchical audio modeling. We propose GaMaDHaNi, a modular two-level hierarchy consisting of a generative model on pitch contours and a pitch-contour-to-audio synthesis model. We compare our approach to non-hierarchical audio models and hierarchical models that use a self-supervised intermediate representation, through a listening test and qualitative analysis. We also evaluate the audio model's ability to faithfully represent the pitch contour input using the Pearson correlation coefficient. By using pitch contours as an intermediate representation, we show that our model may be better equipped to listen and respond to musicians in a human-AI collaborative setting by highlighting two potential interaction use cases: (1) primed generation, and (2) coarse pitch conditioning.

en cs.SD, cs.AI


CrossRef Open Access 2023

Songs materialising as music: medieval monophony in song books and music manuscripts

OLIVER HUCK

This survey of the mise-en-page of manuscripts that include medieval monophonic song focuses on complex multigraphic written artefacts presenting music on staves. Comparing the formatting of thirteenth-century French chansonniers and fifteenth-century collections of monophonic songs (BnF fr. 9346 and BnF fr. 12744), there are obvious differences in the mise-en-page. But when, where and why did the changes in the production of manuscripts and the materialisation of songs take place? This article proposes a distinction between entirely pre-ruled ‘“full” music manuscripts’, ‘music manuscripts’ employing pre-ruling and ‘manuscripts with music’ where the staves were drawn only after the text has been written. Moreover, ‘songbooks’ mainly interested in lyrics can be distinguished from ‘song books’ focusing on the music. The interrelation of production process, content and manuscript type is discussed using the example of the conductus In hoc ortus occidente. The emergence, interrelation and particularities of layouts are discussed for vernacular thirteenth- or fourteenth-century songbooks with Dutch, English/Anglo-Norman, French, Galego-Portuguese, German, Italian and Occitan texts. The two-column layout is found in songbooks all over Europe (except for Italian laudari). This article examines models such as rolls, libelli and Dominican liturgical books, particularities of layouts such as different strophic page layouts and the separation of verses in some troubadour chansonniers and Galego-Portuguese cancionieros, as well as the dissemination in German-speaking regions through minstrel schools. Comparing French, German and Italian song books of monophonic song as well as lais/Leich and/or polyphony reveals differences in the production process of Italian ‘“full” music manuscripts’ (BAV Rossi 215/I-OST, I-REas and I-Fl Mediceo Palatino 87), German ‘music manuscripts’ (A-Wn 2701, A-Wn 2777 and CZ-Pu XI E 9) and French ‘manuscripts with music’ (BnF fr. 146 and the Machaut-collections).

en


arXiv Open Access 2023

The Music Meta Ontology: a flexible semantic model for the interoperability of music metadata

Jacopo de Berardinis, Valentina Anita Carriero, Albert Meroño-Peñuela et al.

The semantic description of music metadata is a key requirement for the creation of music datasets that can be aligned, integrated, and accessed for information retrieval and knowledge discovery. It is nonetheless an open challenge due to the complexity of musical concepts arising from different genres, styles, and periods -- standing to benefit from a lingua franca to accommodate various stakeholders (musicologists, librarians, data engineers, etc.). To initiate this transition, we introduce the Music Meta ontology, a rich and flexible semantic model to describe music metadata related to artists, compositions, performances, recordings, and links. We follow eXtreme Design methodologies and best practices for data engineering, to reflect the perspectives and the requirements of various stakeholders into the design of the model, while leveraging ontology design patterns and accounting for provenance at different levels (claims, links). After presenting the main features of Music Meta, we provide a first evaluation of the model, alignments to other schema (Music Ontology, DOREMUS, Wikidata), and support for data transformation.

en cs.IR, cs.AI


arXiv Open Access 2023

Efficient Supervised Training of Audio Transformers for Music Representation Learning

Pablo Alonso-Jiménez, Xavier Serra, Dmitry Bogdanov

In this work, we address music representation learning using convolution-free transformers. We build on top of existing spectrogram-based audio transformers such as AST and train our models on a supervised task using patchout training similar to PaSST. In contrast to previous works, we study how specific design decisions affect downstream music tagging tasks instead of focusing on the training task. We assess the impact of initializing the models with different pre-trained weights, using various input audio segment lengths, using learned representations from different blocks and tokens of the transformer for downstream tasks, and applying patchout at inference to speed up feature extraction. We find that 1) initializing the model from ImageNet or AudioSet weights and using longer input segments are beneficial both for the training and downstream tasks, 2) the best representations for the considered downstream tasks are located in the middle blocks of the transformer, and 3) using patchout at inference allows faster processing than our convolutional baselines while maintaining superior performance. The resulting models, MAEST, are publicly available and obtain the best performance among open models in music tagging tasks.

en cs.SD, eess.AS


DOAJ Open Access 2022

Political Intrigues of the Court of Florence as a Prerequisite for Giulio Caccini’s Opera The Abduction of Cephalus

Alena D. Verin-Galitskaya

The beginnings of opera history are usually associated with Euridice by Ottavio Rinuccini and Jacopo Peri as the first staged and preserved example of the genre. Not many people know that Euridice was by no means the main event during the wedding celebrations in honour of Maria de’ Medici and King Henry IV of France in 1600, to which the opera was timed. The audience was much more drawn to the opera Il rapimento di Cefalo (The Abduction of Cephalus) by Giulio Caccini, a direct rival of Peri. As the music has not completely survived, we know about Il rapimento di Cefalo mainly from the reviews of contemporaries. Historical materials allow us to recreate the genesis of the opera, which is inseparable from the history of opera as a genre. The reported study focuses on personal ambitions, court intrigues, and the rivalry between the Florentines and Emilio de Cavalieri. It also explores similar other factors without which the genre of opera would have taken a different historical path. Besides, the article describes the political and cultural landscape at the court of Ferdinando de’ Medici. The history of Caccini’s opera is analyzed against the general backdrop of Florentine musical art of the last quarter of the 16th century.

Music


DOAJ Open Access 2022

Arkihuolesi kaikki heitä? Vielä kerran kysymyksestä cis vai a Leevi Madetojan Joululaulussa, op. 20b n:o 5

Sasha Mäkilä

The melody of the well-known Christmas song Joululaulu op. 20b n:o 5 “Arkihuolesi kaikki heitä” (1916) by Leevi Madetoja has rarely been questioned, even though there are two later autographic variants, one from Suvivesper (1924) and the other from the composer’s own arrangement of the song for mixed choir (1944). The editor of Leevi Madetoja’s Solo Songs and Duets (Fennica Gehrman, 2013), Kimmo Tammivaara, made the controversial decision to alter the melody in bar 14 of Joululaulu to make it correspond with the later versions. Which one of the variants is “right”? Is it even possible to find a definite answer to this question? From the music philological point of view the answer is clear. Looking at the later variants, there is no reason to doubt the original melody. The question of why Madetoja’s arrangement for mixed choir differs from the original is more complex, and the answer might lie in the composer’s alcoholism and life circumstances in the 1940s. The question of the composer’s ability to work during his final years raises philosophical and ethical issues, with necessary implications for the new critical editions of his works. The composer’s “final intentions” may not necessarily have been his best intentions.

Music, Arts in general


DOAJ Open Access 2022

Standards for Building and Developing a Responsive E-Training Environment Based on a Game-Based Incentive Strategy

وليد يوسف محمد, طارق على الجبروني, محمد محمود زين الدين et al.

The research problem is a lack of clarity about the standards for designing and developing responsive electronic training environments based on a gamification strategy, and about the performance indicators associated with those standards. This research therefore develops a list of criteria covering the conditions and specifications for using a gamification strategy in responsive electronic training environments, and for using those environments technically and educationally in ways that suit trainees' characteristics and their different learning styles, in order to achieve quality and excellence in this type of environment and the desired goal of its use. The list was divided into two main fields: educational standards, comprising seven standards, and technical standards, comprising twelve standards. It included 176 indicators for verifying the performance of those standards. The list was judged by specialists in educational technology to reach a final form that can be applied to responsive electronic training environments based on a gamification strategy, and the research arrived at a set of recommendations and suggestions.

Music, Fine Arts


DOAJ Open Access 2021

Music Technology Tools – A Therapist-in-a-box?

Kjetil Høyer Jonassen

The purpose of this paper is to contribute to the discussion of technology in music therapy and public health, focusing on the human–computer interaction and the cocreation of mental health. Foundational theory explaining the possible therapeutic dynamics that can occur when engaged in digital technology is presented, along with two case vignettes that illustrate how adolescents interact with digital music technology to promote mental health and wellbeing. The discussion includes reflections concerning actor-network theory, agency, and affordance-theory, and it argues that the iPad should be considered a valuable co-agent in the agent-network functioning to promote adolescents’ mental health.

Music, Psychology


arXiv Open Access 2021

The music box operad: Random generation of musical phrases from patterns

Samuele Giraudo

We introduce the notion of multi-patterns, a combinatorial abstraction of polyphonic musical phrases. The interest of this approach in encoding musical phrases lies in the fact that it becomes possible to compose multi-patterns in order to produce new ones. This composition is parameterized by a monoid structure on the scale degrees. This embeds the set of the musical phrases into an algebraic framework since the set of the multi-patterns is endowed with the structure of an operad. Operads are algebraic structures offering a formalization and an abstraction of the notion of operators and their compositions. Seeing musical phrases as operators allows us to perform computations on phrases and admits applications in generative music. Indeed, given a set of initial multi-patterns, we propose various algorithms to randomly generate a new and longer phrase emulating the style suggested by the inputted multi-patterns. The designed algorithms use types of grammars working with operads and colored operads, known as bud generating systems.

en cs.SD, eess.AS


DOAJ Open Access 2020

Temporality and performing style in Luciano Berio’s Sequenza XI for guitar

Diego Castro-Magas

Luciano Berio’s Sequenza XI stands as a major work in the 20th-century guitar repertoire. Although it has been extensively analysed in previous scholarship, its performance practice has not been addressed. In this paper, I set out to fill this gap by analysing fifteen commercial recordings of the work, and furthermore seek to demonstrate how different categories of musical time-experience (after Utz, 2017) can be used as a frame for discussing performers’ interpretative decisions. I propose a formal analysis of the work and provide time-based measurements of the score, which I then compare to the performed sound structures. The results allow me to map the main trends in the performance practice of Sequenza XI in commercial recordings produced between 1993 and 2016, as well as to relate these trends to the discourses of the performers involved.

Music and books on Music, Music

