Results for "Music"

Showing 20 of ~1,058,345 results · from CrossRef, arXiv, DOAJ, Semantic Scholar

arXiv Open Access 2025
Universal Music Representations? Evaluating Foundation Models on World Music Corpora

Charilaos Papaioannou, Emmanouil Benetos, Alexandros Potamianos

Foundation models have revolutionized music information retrieval, but questions remain about their ability to generalize across diverse musical traditions. This paper presents a comprehensive evaluation of five state-of-the-art audio foundation models across six musical corpora spanning Western popular, Greek, Turkish, and Indian classical traditions. We employ three complementary methodologies to investigate these models' cross-cultural capabilities: probing to assess inherent representations, targeted supervised fine-tuning of 1-2 layers, and multi-label few-shot learning for low-resource scenarios. Our analysis shows varying cross-cultural generalization, with larger models typically outperforming on non-Western music, though results decline for culturally distant traditions. Notably, our approaches achieve state-of-the-art performance on five out of six evaluated datasets, demonstrating the effectiveness of foundation models for world music understanding. We also find that our targeted fine-tuning approach does not consistently outperform probing across all settings, suggesting foundation models already encode substantial musical knowledge. Our evaluation framework and benchmarking results contribute to understanding how far current models are from achieving universal music representations while establishing metrics for future progress.

en cs.SD, cs.IR
arXiv Open Access 2025
YNote: A Novel Music Notation for Fine-Tuning LLMs in Music Generation

Shao-Chien Lu, Chen-Chen Yeh, Hui-Lin Cho et al.

The field of music generation using Large Language Models (LLMs) is evolving rapidly, yet existing music notation systems, such as MIDI, ABC Notation, and MusicXML, remain too complex for effective fine-tuning of LLMs. These formats are difficult for both machines and humans to interpret due to their variability and intricate structure. To address these challenges, we introduce YNote, a simplified music notation system that uses only four characters to represent a note and its pitch. YNote's fixed format ensures consistency, making it easy to read and more suitable for fine-tuning LLMs. In our experiments, we fine-tuned GPT-2 (124M) on a YNote-encoded dataset and achieved BLEU and ROUGE scores of 0.883 and 0.766, respectively. With just two notes as prompts, the model was able to generate coherent and stylistically relevant music. We believe YNote offers a practical alternative to existing music notations for machine learning applications and has the potential to significantly enhance the quality of music generation using LLMs.

en cs.SD, cs.AI
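The abstract does not specify YNote's actual four-character syntax, but the advantage of a fixed-format notation for LLM fine-tuning — every note tokenizes identically — can be illustrated with a hypothetical fixed-width encoding of (pitch, octave, duration). This is not the paper's encoding, only a sketch of the idea:

```python
# A hypothetical fixed-width note encoding (NOT the paper's actual YNote
# syntax): each note becomes exactly four characters --
# pitch letter, accidental (or '_' placeholder), octave digit, duration code.
def encode_note(pitch: str, octave: int, duration: str) -> str:
    """pitch like 'C' or 'F#'; duration code like 'q' (quarter), 'h' (half)."""
    letter = pitch[0]
    accidental = pitch[1] if len(pitch) > 1 else "_"
    return f"{letter}{accidental}{octave}{duration}"

melody = [("C", 4, "q"), ("E", 4, "q"), ("G", 4, "h"), ("F#", 4, "e")]
tokens = [encode_note(*n) for n in melody]
print(" ".join(tokens))
```

Because every token has the same length and character layout, a sequence model sees a far more regular input distribution than with variable-length formats like ABC or MusicXML.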
arXiv Open Access 2025
Cross-Modal Learning for Music-to-Music-Video Description Generation

Zhuoyuan Mao, Mengjie Zhao, Qiyu Wu et al.

Music-to-music-video generation is a challenging task due to the intrinsic differences between the music and video modalities. The advent of powerful text-to-video diffusion models has opened a promising pathway for music-video (MV) generation by first addressing the music-to-MV description task and subsequently leveraging these models for video generation. In this study, we focus on the MV description generation task and propose a comprehensive pipeline encompassing training data construction and multimodal model fine-tuning. We fine-tune existing pre-trained multimodal models on our newly constructed music-to-MV description dataset based on the Music4All dataset, which integrates both musical and visual information. Our experimental results demonstrate that music representations can be effectively mapped to textual domains, enabling the generation of meaningful MV descriptions directly from music inputs. We also identify key components in the dataset construction pipeline that critically impact the quality of MV descriptions and highlight specific musical attributes that warrant greater focus for improved MV description generation.

en cs.SD, cs.AI
arXiv Open Access 2025
Multi-Agent Semantic-Emotion-Aligned Music-to-Image Generation with Music-Derived Captions

Junchang Shi, Gang Li

When people listen to music, they often experience rich visual imagery. We aim to externalize this inner imagery by generating images conditioned on music. We propose MESA-MIG, a multi-agent semantic- and emotion-aligned framework that first produces structured music captions and then refines them with cooperating agents specializing in scene, motion, style, color, and composition. In parallel, a Valence-Arousal (VA) regression head predicts continuous affective states from music, while a CLIP-based visual VA head estimates emotions from images. These components jointly enforce semantic and emotional alignment between music and synthesized images. Experiments on curated music-image pairs show that MESA-MIG outperforms caption-only and single-agent baselines in aesthetic quality, semantic consistency, and VA alignment, and achieves competitive emotion regression performance compared with state-of-the-art music and image emotion models.

en cs.MM
arXiv Open Access 2024
Intelligent Text-Conditioned Music Generation

Zhouyao Xie, Nikhil Yadala, Xinyi Chen et al.

CLIP (Contrastive Language-Image Pre-Training) is a multimodal neural network trained on (text, image) pairs to predict the most relevant text caption given an image. It has been used extensively in image generation by connecting its output with a generative model such as VQGAN, with the most notable example being OpenAI's DALL-E 2. In this project, we apply a similar approach to bridge the gap between natural language and music. Our model is split into two steps: first, we train a CLIP-like model on pairs of text and music with a contrastive loss to align a piece of music with its most probable text caption. Then, we combine the alignment model with a music decoder to generate music. To the best of our knowledge, this is the first attempt at text-conditioned deep music generation. Our experiments show that it is possible to train the text-music alignment model using contrastive loss and train a decoder to generate music from text prompts.

en cs.MM, cs.SD
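The CLIP-style contrastive training described in the abstract above pairs each caption with its music clip and pushes matching pairs together in a shared embedding space. Below is a minimal NumPy sketch of the symmetric InfoNCE objective such models typically use; the paper's actual encoders, batch size, and temperature are not specified, so all values here are illustrative.

```python
import numpy as np

def clip_style_loss(text_emb, music_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (text, music) embeddings."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    m = music_emb / np.linalg.norm(music_emb, axis=1, keepdims=True)
    logits = t @ m.T / temperature           # pairwise cosine similarities
    labels = np.arange(len(logits))          # matching pairs lie on the diagonal
    # log-softmax over rows, in both text->music and music->text directions
    lp_tm = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    lp_mt = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return -(lp_tm[labels, labels].mean() + lp_mt[labels, labels].mean()) / 2

rng = np.random.default_rng(1)
text = rng.normal(size=(8, 64))
aligned = clip_style_loss(text, text + 0.01 * rng.normal(size=(8, 64)))
mismatched = clip_style_loss(text, rng.normal(size=(8, 64)))
print(aligned, mismatched)
```

Well-aligned pairs put all the softmax mass on the diagonal, so the loss for the aligned batch should be far below the mismatched batch's loss of roughly log(batch size).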
arXiv Open Access 2024
A Survey of Foundation Models for Music Understanding

Wenjun Li, Ying Cai, Ziyang Wu et al.

Music is essential in daily life, fulfilling emotional and entertainment needs and connecting us personally, socially, and culturally. A better understanding of music can enhance our emotions, cognitive skills, and cultural connections. The rapid advancement of artificial intelligence (AI) has introduced new ways to analyze music, aiming to replicate human understanding of music and provide related services. Whereas traditional models focused on audio features and simple tasks, the recent large language models (LLMs) and foundation models (FMs), which excel across many fields by integrating semantic information and demonstrating strong reasoning abilities, can capture complex musical features and patterns, integrate music with language, and incorporate rich musical, emotional, and psychological knowledge. They therefore have the potential to handle complex music understanding tasks from a semantic perspective, producing outputs closer to human perception. This work is, to the best of our knowledge, one of the early reviews of the intersection of AI techniques and music understanding. We investigated, analyzed, and tested recent large-scale music foundation models with respect to their music comprehension abilities. We also discussed their limitations and proposed possible future directions, offering insights for researchers in this field.

en cs.SD, cs.AI
arXiv Open Access 2024
Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

Baisen Wang, Le Zhuo, Zhaokai Wang et al.

Multimodal music generation aims to produce music from diverse input modalities, including text, videos, and images. Existing methods use a common embedding space for multimodal fusion. Despite their effectiveness in other modalities, their application to multimodal music generation faces challenges of data scarcity, weak cross-modal alignment, and limited controllability. This paper addresses these issues by using explicit bridges of text and music for multimodal alignment. We introduce a novel method named Visuals Music Bridge (VMB). Specifically, a Multimodal Music Description Model converts visual inputs into detailed textual descriptions to provide the text bridge, while a Dual-track Music Retrieval module combines broad and targeted retrieval strategies to provide the music bridge and enable user control. Finally, we design an Explicitly Conditioned Music Generation framework to generate music based on the two bridges. We conduct experiments on video-to-music, image-to-music, text-to-music, and controllable music generation tasks, along with experiments on controllability. The results demonstrate that VMB significantly enhances music quality, modality alignment, and customization compared to previous methods. VMB sets a new standard for interpretable and expressive multimodal music generation with applications in various multimedia fields. Demos and code are available at https://github.com/wbs2788/VMB.

en cs.CV, cs.MM
arXiv Open Access 2024
MUSIC-lite: Efficient MUSIC using Approximate Computing: An OFDM Radar Case Study

Rajat Bhattacharjya, Arnab Sarkar, Biswadip Maity et al.

Multiple Signal Classification (MUSIC) is a widely used Direction of Arrival (DoA)/Angle of Arrival (AoA) estimation algorithm applied to various application domains such as autonomous driving, medical imaging, and astronomy. However, MUSIC is computationally expensive and challenging to implement in low-power hardware, requiring exploration of trade-offs between accuracy, cost, and power. We present MUSIC-lite, which exploits approximate computing to generate a design space exploring accuracy-area-power trade-offs. This is specifically applied to the computationally intensive singular value decomposition (SVD) component of the MUSIC algorithm in an orthogonal frequency-division multiplexing (OFDM) radar use case. MUSIC-lite incorporates approximate adders into the iterative CORDIC algorithm that is used for hardware implementation of MUSIC, generating interesting accuracy-area-power trade-offs. Our experiments demonstrate MUSIC-lite's ability to save an average of 17.25% on-chip area and 19.4% power with a minimal 0.14% error for efficient MUSIC implementations.

en cs.AR, eess.SP
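For reference, the classic MUSIC estimator that MUSIC-lite approximates works by eigendecomposing the sample covariance and scanning steering vectors against the noise subspace. Below is a minimal NumPy sketch for a half-wavelength uniform linear array; it illustrates the algorithm itself, not MUSIC-lite's CORDIC-based approximate hardware SVD, and all simulation parameters are illustrative.

```python
import numpy as np

def music_spectrum(snapshots, n_sources, n_antennas, angles_deg):
    """Classic MUSIC: project steering vectors onto the noise subspace."""
    R = snapshots @ snapshots.conj().T / snapshots.shape[1]  # sample covariance
    eigvals, eigvecs = np.linalg.eigh(R)                     # ascending eigenvalues
    En = eigvecs[:, : n_antennas - n_sources]                # noise subspace
    spectrum = []
    for theta in np.deg2rad(angles_deg):
        # steering vector for a ULA with lambda/2 element spacing
        a = np.exp(1j * np.pi * np.arange(n_antennas) * np.sin(theta))
        spectrum.append(1.0 / np.abs(a.conj() @ En @ En.conj().T @ a))
    return np.array(spectrum)

# Simulate one source at +20 degrees hitting an 8-element array.
rng = np.random.default_rng(2)
n_ant, n_snap, true_deg = 8, 200, 20.0
a_true = np.exp(1j * np.pi * np.arange(n_ant) * np.sin(np.deg2rad(true_deg)))
s = rng.normal(size=n_snap) + 1j * rng.normal(size=n_snap)
noise = 0.1 * (rng.normal(size=(n_ant, n_snap)) + 1j * rng.normal(size=(n_ant, n_snap)))
X = np.outer(a_true, s) + noise
grid = np.arange(-90, 91)
est = grid[np.argmax(music_spectrum(X, 1, n_ant, grid))]
print(est)
```

The eigendecomposition (or equivalently the SVD of the snapshot matrix) dominates the cost, which is why MUSIC-lite targets that stage with approximate adders inside CORDIC.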
arXiv Open Access 2024
Language Models for Music Medicine Generation

Emmanouil Nikolakakis, Joann Ching, Emmanouil Karystinaios et al.

Music therapy has been shown in recent years to provide multiple health benefits related to emotional wellness. In turn, maintaining a healthy emotional state has proven to be effective for patients undergoing treatment, such as Parkinson's patients or patients suffering from stress and anxiety. We propose fine-tuning MusicGen, a music-generating transformer model, to create short musical clips that assist patients in transitioning from negative to desired emotional states. Using low-rank decomposition fine-tuning on the MTG-Jamendo Dataset with emotion tags, we generate 30-second clips that adhere to the iso principle, guiding patients through intermediate states in the valence-arousal circumplex. The generated music is evaluated using a music emotion recognition model to ensure alignment with intended emotions. By concatenating these clips, we produce a 15-minute "music medicine" resembling a music therapy session. Our approach is the first model to leverage Language Models to generate music medicine. Ultimately, the output is intended to be used as a temporary relief between music therapy sessions with a board-certified therapist.

en cs.SD, eess.AS
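The iso principle mentioned in the abstract above — meeting the listener at their current emotional state and stepping gradually toward the desired one — can be sketched as a straight-line walk through the valence-arousal plane, with each waypoint conditioning one generated clip. The coordinates and step count below are illustrative, not taken from the paper.

```python
import numpy as np

def iso_trajectory(start_va, target_va, n_clips):
    """Evenly spaced (valence, arousal) waypoints from start to target state."""
    start = np.asarray(start_va, dtype=float)
    target = np.asarray(target_va, dtype=float)
    steps = np.linspace(0.0, 1.0, n_clips)
    return [tuple(start + t * (target - start)) for t in steps]

# e.g. sad/low-energy (-0.7, -0.4) stepped toward calm/positive (0.6, 0.2)
traj = iso_trajectory((-0.7, -0.4), (0.6, 0.2), 5)
for v, a in traj:
    print(f"valence={v:+.2f}  arousal={a:+.2f}")
```

Each waypoint would be handed to the conditioned generator as an emotion target, and the emotion recognition model described in the abstract would then check that each generated clip actually lands near its waypoint.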
arXiv Open Access 2024
Foundation Models for Music: A Survey

Yinghao Ma, Anders Øland, Anton Ragni et al.

In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning representation learning, generative learning, and multimodal learning. We first contextualise the significance of music in various industries and trace the evolution of AI in music. By delineating the modalities targeted by foundation models, we find that many music representations remain underexplored in FM development. Emphasis is then placed on the limited versatility of previous methods across diverse music applications, along with the potential of FMs in music understanding, generation, and medical application. By comprehensively exploring the details of the model pre-training paradigm, architectural choices, tokenisation, fine-tuning methodologies, and controllability, we highlight important topics that deserve thorough exploration, such as instruction tuning and in-context learning, scaling laws and emergent abilities, and long-sequence modelling. A dedicated section presents insights into music agents, accompanied by a thorough analysis of the datasets and evaluations essential for pre-training and downstream tasks. Finally, by underscoring the vital importance of ethical considerations, we advocate that future research on FMs for music focus more on issues such as interpretability, transparency, human responsibility, and copyright. The paper offers insights into future challenges and trends for FMs in music, aiming to shape the trajectory of human-AI collaboration in the music realm.

en cs.SD, cs.AI
DOAJ Open Access 2024
Playing music together: Exploring the impact of a classical music ensemble on adolescents' life skills self-perception.

Anna Bussu, Marta Mangiarulo

This paper explored the effectiveness of ensemble performance on the development of adolescents' life skills. An explorative qualitative study investigated young musicians' self-perception about the benefits and challenges of learning and playing music together. A convenience sampling technique was adopted for interviewing 15 adolescents (12-18 years old) who participated in a long-term music education programme led by a charity in the North-West of England. The data were analysed using NVivo, employing a thematic analysis approach. Two main themes emerged from the analyses: (1) the main benefits of playing and learning in an ensemble: the development of music and life skills; (2) the challenges experienced by the musicians learning in the ensemble. The findings suggest that participants were conscious of the positive effects of playing in an ensemble on their lives. This extended beyond merely learning a musical instrument, i.e. acquiring music skills. In particular, young musicians recognised they had developed greater self-confidence and cognitive skills such as critical thinking and self-awareness. Primarily, they developed effective communication and interpersonal skills. At the same time, these young musicians recognised they had to face challenges related to the process of learning music in an ensemble, such as managing emotions of frustration and adapting to different music learning styles and techniques. Finally, suggestions are made for the implementation and evaluation of future projects to explore the impact and effectiveness of classical music programmes, with a particular emphasis on ensemble-based initiatives and their influence on life skills.

Medicine, Science
DOAJ Open Access 2024
Research progress on the application of five-element music therapy in perioperative patients

QIU Jingyi (邱静怡), RONG Mingmei (戎明梅)

Interventions targeting the psychological status and related symptoms of perioperative patients can accelerate recovery. Five-element music therapy, as a non-pharmacological intervention, has gained increasing attention in perioperative patient care due to its drug-free nature, lack of side effects, strong feasibility, and significant therapeutic effects. This article provides an overview of the principles of five-element music therapy, as well as intervention strategies and application effects during the perioperative period, and discusses directions for future research, aiming to provide references and guidance for further clinical practice.

DOAJ Open Access 2024
Mindfulness's moderating role applied on online SEL education

Chun-Heng Ho, Hang-qin Zhang, Juan Li et al.

Introduction: Mild to moderate depression, anxiety, and stress imbalances are prevalent emotional issues among college students and are primary factors leading to deficiencies in social-emotional skills within this population. Without timely intervention, these mild to moderate emotional issues may escalate into more severe conditions. Social-Emotional Learning (SEL) programs are effective for building social-emotional skills. However, current research on SEL programs has not adequately addressed high-quality teacher-student interaction for students with emotional problems. To tackle this issue, this study proposes a curriculum approach that integrates mindfulness with rhythmic music and evaluates students' emotional changes after taking the combined curriculum. Methods: This study adopted a pre-post experimental design. Two hundred and ninety-four firefighting university students participated in a one-semester "online mindfulness combined with music rhythm SEL course". The study used the Beck Anxiety Inventory, the Center for Epidemiologic Studies Depression Scale, and the Perceived Stress Scale to measure participants' anxiety, depression, and stress levels before and after the course, and used participants' self-reflection reports to explore patterns of emotional transformation. Results: The findings indicate that: (1) eighth-note, quarter-note, and sixteenth-note rhythmic music significantly improve the emotional wellbeing of students with depression, anxiety, and stress imbalances, respectively. (2) The degree of emotional improvement has a certain impact on academic performance. (3) Students with anxiety require more instructional support focused on attention concentration during the early phases of the course; students with depression should not be scheduled for social-skills learning modules in the short term and need long-term instructional guidance; individuals experiencing stress imbalances require attention to their personal music preferences and benefit from additional listening activities and exercise. Discussion: These findings help teachers accurately identify emotional changes among students with emotional problems and manage the patterns of these emotional transitions, thereby providing effective instructional support and promoting high-quality interactions between teachers and students.

arXiv Open Access 2023
In-depth analysis of music structure as a text network

Ping-Rui Tsai, Yen-Ting Chou, Nathan-Christopher Wang et al.

Music, enchanting and poetic, permeates every corner of human civilization. Although music is familiar to everyone, our understanding of its essence remains limited, and there is still no universally accepted scientific description. This is primarily due to music being regarded as a product of both reason and emotion, making it difficult to define. In this article, we focus on the fundamental elements of music and construct an evolutionary network from the perspective of music as a natural language, aligning with the statistical characteristics of texts. Through this approach, we aim to comprehend the structural differences in music across different periods, enabling a more scientific exploration of music. Relying on the advantages of structuralism, we can concentrate on the relationships and order between the physical elements of music, rather than getting entangled in the blurred boundaries of science and philosophy. The scientific framework we present not only conforms to past conclusions in music, but also serves as a bridge that connects music to natural language processing and knowledge graphs.

en cs.SD, cs.AI
arXiv Open Access 2023
Knowledge-based Multimodal Music Similarity

Andrea Poltronieri

Music similarity is an essential aspect of music retrieval, recommendation systems, and music analysis. Moreover, similarity is of vital interest for music experts, as it allows studying analogies and influences among composers and historical periods. Current approaches to musical similarity rely mainly on symbolic content, which can be expensive to produce and is not always readily available. Conversely, approaches using audio signals typically fail to provide any insight about the reasons behind the observed similarity. This research addresses the limitations of current approaches by focusing on the study of musical similarity using both symbolic and audio content. The aim of this research is to develop a fully explainable and interpretable system that can provide end-users with more control and understanding of music similarity and classification systems.

en cs.SD, cs.AI
DOAJ Open Access 2023
The Warsaw Autumn International Music Festival — Overcoming the Boundaries between East and West

Nikolskaya Irina I.

In the late 1940s and early 1950s, a huge gap appeared between the development of the musical cultures of Western and Eastern Europe, and it was Poland that initiated bridging it. The emergence of the Warsaw Autumn Festival of contemporary music, which became the largest of its kind, was therefore not coincidental. The article is devoted to studying this cultural phenomenon, a study of significant scholarly novelty because the Warsaw Autumn has not yet received detailed scholarly analysis in Russian musicology. According to the organizers of the new festival, the young Polish composers Tadeusz Baird and Kazimierz Serocki, Warsaw was to become a center of contemporary music no less important than the avant-garde music festivals in Darmstadt, Donaueschingen, Cologne, or Milan. However, the purpose of the Warsaw Autumn was more ambitious: to provide a complete aesthetic and stylistic picture of modern music, not just avant-garde music, as was the case in Western European countries. The repertoire policy of the festival covered several areas: avant-garde music, more traditional music, the classics of the 20th century, and the promotion of Polish music. The festival laid claim to being a reliable display of contemporary music in the world, reacted to changes in global music, and soon became the largest music arena of the 20th century. It was attended by the most prominent composers of East and West. For the socialist countries, it became a true "window on Europe" and a platform for mastering new techniques of composition. Without the Warsaw Autumn festival and its profound influence on composers of socialist countries, it would be utterly impossible to imagine the development of musical art of the entire region. This article focuses on the early period of the festival, from its establishment in 1956 to the early 1980s.

Arts in general
arXiv Open Access 2022
Bi-Sampling Approach to Classify Music Mood leveraging Raga-Rasa Association in Indian Classical Music

Mohan Rao B C, Vinayak Arkachaari, Harsha M N et al.

The impact of music on the mood or emotion of the listener is a well-researched area in human psychology and behavioral science. In Indian classical music, ragas are the melodic structures that define the various styles and forms of the music. Each raga has been found to evoke a specific emotion in the listener. With the advent of advanced audio signal processing capabilities and the application of machine learning, intelligent music classifiers and recommenders have received increased attention, especially in 'Music as a Service' cloud applications. This paper explores a novel framework that leverages the raga-rasa association in Indian classical music to build an intelligent classifier, and its application in a music recommendation system based on the user's current mood and the mood they aspire to be in.

en cs.SD, cs.AI
DOAJ Open Access 2022
Rhythmic Relating: Bidirectional Support for Social Timing in Autism Therapies

Stuart Daniel, Dawn Wimpory et al.

We propose Rhythmic Relating for autism: a system of supports for friends, therapists, parents, and educators; a system which aims to augment bidirectional communication and complement existing therapeutic approaches. We begin by summarizing the developmental significance of social timing and the social-motor-synchrony challenges observed in early autism. Meta-analyses confirm the early primacy of such challenges yet note the lack of focused therapies. We identify core relational parameters in support of social-motor-synchrony and systematize these using the communicative musicality constructs: pulse, quality, and narrative. Rhythmic Relating aims to augment the clarity, contiguity, and pulse-beat of spontaneous behavior by recruiting rhythmic supports (cues, accents, turbulence) and relatable vitality, facilitating the predictive flow and just-ahead-in-time planning needed for good-enough social timing. From here, we describe possibilities for playful therapeutic interaction, small-step co-regulation, and layered sensorimotor integration. Lastly, we include several clinical case examples demonstrating the use of Rhythmic Relating within four different therapeutic approaches (Dance Movement Therapy, Improvisational Music Therapy, Play Therapy, and Musical Interaction Therapy). These clinical case examples are introduced here and several more are included in the Supplementary Material (Examples of Rhythmic Relating in Practice). A suite of pilot intervention studies is proposed to assess the efficacy of combining Rhythmic Relating with different therapeutic approaches in playful work with individuals with autism. Further experimental hypotheses are outlined, designed to clarify the significance of certain key features of the Rhythmic Relating approach.

Page 24 of 52,918