Results for "Music"

Showing 20 of ~496,987 results · from arXiv, CrossRef, DOAJ

arXiv Open Access 2025
MusicAIR: A Multimodal AI Music Generation Framework Powered by an Algorithm-Driven Core

Callie C. Liao, Duoduo Liao, Ellie L. Zhang

Recent advances in generative AI have made music generation a prominent research focus. However, many neural-based models rely on large datasets, raising concerns about copyright infringement and high computational costs. In contrast, we propose MusicAIR, an innovative multimodal AI music generation framework powered by a novel algorithm-driven symbolic music core, effectively mitigating copyright infringement risks. The music core algorithms connect critical lyrical and rhythmic information to automatically derive musical features, creating a complete, coherent melodic score solely from the lyrics. The MusicAIR framework facilitates music generation from lyrics, text, and images. The generated score adheres to established principles of music theory, lyrical structure, and rhythmic conventions. We developed Generate AI Music (GenAIM), a web tool using MusicAIR for lyric-to-song, text-to-music, and image-to-music generation. In our experiments, we evaluated AI-generated music scores produced by the system using both standard music metrics and an innovative analysis that compares these compositions with original works. The system achieves an average key confidence of 85%, outperforming human composers at 79%, and aligns closely with established music theory standards, demonstrating its ability to generate diverse, human-like compositions. As a co-pilot tool, GenAIM can serve as a reliable music composition assistant and a potential educational composition tutor, while lowering the entry barrier for aspiring musicians and making a significant contribution to AI for music generation.
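The abstract reports an average "key confidence" of 85% without defining how it is computed. A common baseline for this kind of metric is Krumhansl-Schmuckler key finding: correlate a piece's pitch-class distribution against the 24 rotated Krumhansl-Kessler key profiles and take the best correlation as the confidence. The sketch below is illustrative only and not the paper's actual method:

```python
import numpy as np

# Krumhansl-Kessler key profiles: perceived stability of each pitch class
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def estimate_key(pc_hist):
    """Return ((tonic, mode), confidence) for a 12-bin pitch-class histogram."""
    best = ("", "", -2.0)
    for mode, profile in (("major", MAJOR), ("minor", MINOR)):
        for shift in range(12):
            # correlate the histogram with the profile rotated to each tonic
            r = np.corrcoef(np.roll(profile, shift), pc_hist)[0, 1]
            if r > best[2]:
                best = (NOTES[shift], mode, r)
    return (best[0], best[1]), best[2]
```

Feeding the C-major profile itself back in, `estimate_key(MAJOR)` identifies C major with confidence 1.0, since the histogram matches the profile exactly.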

en cs.SD, cs.AI
arXiv Open Access 2025
Learning Music Audio Representations With Limited Data

Christos Plachouras, Emmanouil Benetos, Johan Pauwels

Large deep-learning models for music, including those focused on learning general-purpose music audio representations, are often assumed to require substantial training data to achieve high performance. If true, this would pose challenges in scenarios where audio data or annotations are scarce, such as for underrepresented music traditions, non-popular genres, and personalized music creation and listening. Understanding how these models behave in limited-data scenarios could be crucial for developing techniques to tackle them. In this work, we investigate the behavior of several music audio representation models under limited-data learning regimes. We consider music models with various architectures, training paradigms, and input durations, and train them on data collections ranging from 5 to 8,000 minutes long. We evaluate the learned representations on various music information retrieval tasks and analyze their robustness to noise. We show that, under certain conditions, representations from limited-data and even random models perform comparably to ones from large-dataset models, though handcrafted features outperform all learned representations in some tasks.
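A standard way such learned representations are evaluated on MIR tasks is a linear probe: freeze the features and fit only a linear classifier on top, so the score reflects the representation rather than the classifier. The probe-on-frozen-features protocol is standard; the closed-form ridge solver and function name below are illustrative, not the paper's code:

```python
import numpy as np

def probe_accuracy(train_x, train_y, test_x, test_y, n_classes, reg=1e-3):
    """Fit a ridge-regularized linear probe on frozen features via one-hot
    least squares, then report classification accuracy on held-out data."""
    d = train_x.shape[1]
    onehot = np.eye(n_classes)[train_y]
    # closed-form ridge regression: (X^T X + reg*I) W = X^T Y
    w = np.linalg.solve(train_x.T @ train_x + reg * np.eye(d), train_x.T @ onehot)
    pred = (test_x @ w).argmax(axis=1)
    return float((pred == test_y).mean())
```

On linearly separable toy features the probe reaches perfect accuracy; on real frozen embeddings the gap from a fully fine-tuned model is exactly what limited-data studies like this one measure.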

en cs.SD, cs.LG
arXiv Open Access 2025
Mozualization: Crafting Music and Visual Representation with Multimodal AI

Wanfang Xu, Lixiang Zhao, Haiwen Song et al.

In this work, we introduce Mozualization, a music generation and editing tool that creates multi-style embedded music by integrating diverse inputs, such as keywords, images, and sound clips (e.g., segments from various pieces of music or even a playful cat's meow). Our work is inspired by the ways people express their emotions -- writing mood-descriptive poems or articles, creating drawings with warm or cool tones, or listening to sad or uplifting music. Building on this concept, we developed a tool that transforms these emotional expressions into a cohesive and expressive song, allowing users to seamlessly incorporate their unique preferences and inspirations. To evaluate the tool and, more importantly, gather insights for its improvement, we conducted a user study involving nine music enthusiasts. The study assessed user experience, engagement, and the impact of interacting with and listening to the generated music.

en cs.HC, cs.AI
arXiv Open Access 2025
MV-Crafter: An Intelligent System for Music-guided Video Generation

Chuer Chen, Shengqi Dang, Yuqi Liu et al.

Music videos, as a prevalent form of multimedia entertainment, deliver engaging audio-visual experiences to audiences and have gained immense popularity among singers and fans. Creators can express their interpretations of music naturally through visual elements. However, creating a music video demands proficiency in script design, video shooting, and music-video synchronization, posing significant challenges for non-professionals. Previous work has proposed automated music video generation frameworks, but these suffer from complex input requirements and poor output quality. In response, we present MV-Crafter, a system capable of producing high-quality music videos with synchronized music-video rhythm and style. Our approach involves three technical modules that simulate the human creation process: a script generation module, a video generation module, and a music-video synchronization module. MV-Crafter leverages a large language model to generate scripts that consider the musical semantics. To address the challenge of synchronizing short video clips with music of varying lengths, we propose a dynamic beat matching algorithm and a visual envelope-induced warping method to ensure precise, monotonic music-video synchronization. Besides, we design a user-friendly interface that simplifies the creation process with intuitive editing features. Extensive experiments demonstrate that MV-Crafter provides an effective solution for improving the quality of generated music videos.
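The abstract does not detail the dynamic beat matching algorithm, but its core constraint (snap video cut points to music beats while keeping the alignment monotonic) can be sketched in a few lines. `snap_cuts_to_beats` below is a hypothetical illustration of that constraint, not MV-Crafter's method:

```python
import bisect

def snap_cuts_to_beats(cuts, beats):
    """Snap each video cut time (seconds) to a beat time, never reusing or
    moving backwards past a previously assigned beat (monotonic alignment)."""
    out, floor = [], 0
    for t in cuts:
        i = bisect.bisect_left(beats, t, lo=floor)
        # prefer the closer of the two neighbouring beats, respecting the floor
        if i > floor and (i == len(beats) or t - beats[i - 1] <= beats[i] - t):
            i -= 1
        i = min(i, len(beats) - 1)  # cuts past the last beat clip to it
        out.append(beats[i])
        floor = i + 1
    return out
```

For beats at whole seconds, cuts at 0.9 s, 1.1 s, and 3.6 s land on beats 1, 2, and 4: the second cut is pushed forward to beat 2 because beat 1 was already taken, preserving monotonic order.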

en cs.HC, cs.MM
arXiv Open Access 2025
Advancing the Foundation Model for Music Understanding

Yi Jiang, Wei Wang, Xianwen Guo et al.

The field of Music Information Retrieval (MIR) is fragmented, with specialized models excelling at isolated tasks. In this work, we challenge this paradigm by introducing a unified foundation model named MuFun for holistic music understanding. Our model features a novel architecture that jointly processes instrumental and lyrical content, and is trained on a large-scale dataset covering diverse tasks such as genre classification, music tagging, and question answering. To facilitate robust evaluation, we also propose a new benchmark for multi-faceted music understanding called MuCUE (Music Comprehensive Understanding Evaluation). Experiments show our model significantly outperforms existing audio large language models across the MuCUE tasks, demonstrating its state-of-the-art effectiveness and generalization ability.

en cs.SD, cs.AI
arXiv Open Access 2025
musicolors: Bridging Sound and Visuals For Synesthetic Creative Musical Experience

ChungHa Lee, Jin-Hyuk Hong

Music visualization is an important medium that enables synesthetic experiences and creative inspiration. However, previous research has focused mainly on technical and theoretical aspects, overlooking users' everyday interaction with music visualizations. This gap highlights the pressing need for research on how music visualization influences users in synesthetic creative experiences and where such experiences are heading. Thus, we developed musicolors, a web-based, real-time music visualization library. Additionally, we conducted a qualitative user study with composers, developers, and listeners to explore how they use musicolors to appreciate music, draw inspiration, and craft music-visual interactions. The results show that musicolors offers users rich value through sketching musical ideas, integrating visualizations with other systems or platforms, and synesthetic listening. Based on these findings, we also provide guidelines for future music visualizations to offer a more interactive and creative experience.

en cs.HC, cs.MM
DOAJ Open Access 2025
INVESTIGATING LOW ATTENTION SPAN IN KINDERGARTEN 3: A CASE STUDY AT SCHOOL XYZ [PENYELIDIKAN RENDAHNYA RENTANG PERHATIAN DI TAMAN KANAK-KANAK 3: SEBUAH STUDI KASUS DI SEKOLAH XYZ]

Tania Theresia Manalu, Yonathan Winardi

The attention span of early childhood children is a critical factor in forming their developmental foundation, encompassing physical, cognitive, social, and emotional growth. Informal interviews with classroom teachers reveal that students struggle to maintain attention during instruction, often displaying behaviors such as restlessness and lack of focus immediately after the lesson begins. This thesis aims to investigate the impact of internal and external factors on the attention span of Kindergarten 3 students at Sekolah XYZ, focusing on regular classroom settings and specialized classes (music and art). The research employs a case study method combining observations, interviews, and documentation to provide a comprehensive understanding of the issue. The findings identify limited direct exploration within the classroom and the type of curriculum currently used as reasons for the low attention span of Kindergarten 3 students. The study highlights the novelty of tactile learning approaches for enhancing attention span. To extend these findings, future research should consider comparative studies across the Yayasan XYZ network, investigating the impact of direct exploration and different educational approaches on the attention and learning outcomes of students in specialist classes.

Education, Education (General)
DOAJ Open Access 2025
Psychological intervention for depression, anxiety, and quality of life in patients undergoing hemodialysis

Choirunnisa Aprilia Setyo Putri, Kuswantoro Rusca Putra, Lilik Supriati

Background: Psychological issues in chronic kidney disease patients undergoing hemodialysis are three times more common than in other patients. Depression and anxiety in these patients are the variables most strongly related to decreased quality of life.
Objectives: This systematic review aims to determine the effectiveness of psychological interventions in reducing depression and anxiety and improving quality of life in patients undergoing hemodialysis.
Methods: Articles published between 2020 and 2025 were identified in the ProQuest, PubMed, and Science Direct databases using the keywords "Psychological Intervention AND Hemodialysis AND Depression OR Anxiety OR Quality of Life". The PICO framework was applied: patients undergoing hemodialysis, psychological interventions compared with other interventions, and outcomes of depression, anxiety, and quality of life. Selection followed the PRISMA method, and articles were appraised with the Joanna Briggs Institute (JBI) checklist.
Results: Sixteen randomized controlled trials were selected, involving 1,422 chronic kidney disease patients undergoing hemodialysis, aged 18-80 years. Fifteen types of therapy demonstrated effectiveness among participants: cognitive behavioral intervention, cognitive behavioral group therapy, cognitive behavioral intervention combined with a resilience model, multifaceted education, positive thinking training, resilience training, spiritual care, mindfulness meditation with progressive muscle relaxation, recreational therapy, emotional disclosure writing, self-management programs, live music, acupressure, aromatherapy massage, and virtual reality exercise.
Conclusions: Various psychological therapies can serve as effective and efficient intervention options to reduce depression and anxiety and to improve quality of life for chronic kidney disease patients undergoing hemodialysis.

Gynecology and obstetrics
arXiv Open Access 2024
Advancing Music Therapy: Integrating Eastern Five-Element Music Theory and Western Techniques with AI in the Novel Five-Element Harmony System

Yubo Zhou, Weizhen Bian, Kaitai Zhang et al.

In traditional medical practices, music therapy has proven effective in treating various psychological and physiological ailments. In Eastern traditions particularly, Five-Element Music Therapy (FEMT), rooted in traditional Chinese medicine, carries profound cultural significance and a unique therapeutic philosophy. With the rapid advancement of information technology and artificial intelligence, applying these modern technologies to FEMT could enhance the personalization and cultural relevance of the therapy and potentially improve therapeutic outcomes. In this article, we develop, for the first time, a music therapy system that puts Five-Element theory into practice, integrating advanced information technology and artificial intelligence with FEMT to enhance personalized music therapy. As traditional music therapy predominantly follows Western methodologies, the unique aspects of Eastern practices, specifically the Five-Element theory from traditional Chinese medicine, deserve consideration. This system aims to bridge this gap by using computational technologies to provide a more personalized, culturally relevant, and therapeutically effective music therapy experience.

en cs.HC, cs.AI
arXiv Open Access 2024
SoundSignature: What Type of Music Do You Like?

Brandon James Carone, Pablo Ripollés

SoundSignature is a music application that integrates a custom OpenAI Assistant to analyze users' favorite songs. The system incorporates state-of-the-art Music Information Retrieval (MIR) Python packages to combine extracted acoustic/musical features with the assistant's extensive knowledge of the artists and bands. Capitalizing on this combined knowledge, SoundSignature leverages semantic audio and principles from the emerging Internet of Sounds (IoS) ecosystem, integrating MIR with AI to provide users with personalized insights into the acoustic properties of their music, akin to a musical preference personality report. Users can then interact with the chatbot to explore deeper inquiries about the acoustic analyses performed and how they relate to their musical taste. This interactivity transforms the application, acting not only as an informative resource about familiar and/or favorite songs, but also as an educational platform that enables users to deepen their understanding of musical features, music theory, acoustic properties commonly used in signal processing, and the artists behind the music. Beyond general usability, the application also incorporates several well-established open-source musician-specific tools, such as a chord recognition algorithm (CREMA), a source separation algorithm (DEMUCS), and an audio-to-MIDI converter (basic-pitch). These features allow users without coding skills to access advanced, open-source music processing algorithms simply by interacting with the chatbot (e.g., "Can you give me the stems of this song?"). In this paper, we highlight the application's innovative features and educational potential, and present findings from a pilot user study that evaluates its efficacy and usability.

en cs.SD, cs.AI
arXiv Open Access 2024
Improving Controllability and Editability for Pretrained Text-to-Music Generation Models

Yixiao Zhang

The field of AI-assisted music creation has made significant strides, yet existing systems often struggle to meet the demands of iterative and nuanced music production. These challenges include providing sufficient control over the generated content and allowing for flexible, precise edits. This thesis tackles these issues by introducing a series of advancements that progressively build upon each other, enhancing the controllability and editability of text-to-music generation models. First, we introduce Loop Copilot, a system that tries to address the need for iterative refinement in music creation. Loop Copilot leverages a large language model (LLM) to coordinate multiple specialised AI models, enabling users to generate and refine music interactively through a conversational interface. Central to this system is the Global Attribute Table, which records and maintains key musical attributes throughout the iterative process, ensuring that modifications at any stage preserve the overall coherence of the music. While Loop Copilot excels in orchestrating the music creation process, it does not directly address the need for detailed edits to the generated content. To overcome this limitation, MusicMagus is presented as a further solution for editing AI-generated music. MusicMagus introduces a zero-shot text-to-music editing approach that allows for the modification of specific musical attributes, such as genre, mood, and instrumentation, without the need for retraining. By manipulating the latent space within pre-trained diffusion models, MusicMagus ensures that these edits are stylistically coherent and that non-targeted attributes remain unchanged. This system is particularly effective in maintaining the structural integrity of the music during edits, but it encounters challenges with more complex and real-world audio scenarios. ...

en cs.SD, eess.AS
DOAJ Open Access 2024
Heterophony as a Way of Organizing of the Musical Syntax

Iulian RUSU

In this article we intend to present some aspects related to the origin of the concept of heterophony and the theoretical concerns of some Romanian and foreign composers on this subject. As a practical application model, we present an analysis of a musical text based on the model proposed by Teodor Tutuianu in his book Eterofonii in partituri Bachiene, the book underlying the Spectromorphy course that the author, as a professor, held at the National University of Music in Bucharest.

arXiv Open Access 2023
DISCO-10M: A Large-Scale Music Dataset

Luca A. Lanzendörfer, Florian Grötschla, Emil Funke et al.

Music datasets play a crucial role in advancing research in machine learning for music. However, existing music datasets suffer from limited size, accessibility, and lack of audio resources. To address these shortcomings, we present DISCO-10M, a novel and extensive music dataset that surpasses the largest previously available music dataset by an order of magnitude. To ensure high-quality data, we implement a multi-stage filtering process. This process incorporates similarities based on textual descriptions and audio embeddings. Moreover, we provide precomputed CLAP embeddings alongside DISCO-10M, facilitating direct application on various downstream tasks. These embeddings enable efficient exploration of machine learning applications on the provided data. With DISCO-10M, we aim to democratize and facilitate new research to help advance the development of novel machine learning models for music.
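Precomputed embeddings like the CLAP vectors shipped with DISCO-10M are typically consumed via cosine similarity, e.g. for nearest-neighbour retrieval over the dataset. A minimal sketch of that downstream pattern (array names are placeholders; loading the real DISCO-10M embeddings is out of scope here):

```python
import numpy as np

def nearest_tracks(query_emb, track_embs, k=3):
    """Return indices of the k tracks most cosine-similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    sims = t @ q  # cosine similarity, since both sides are unit-normalized
    return np.argsort(-sims)[:k].tolist()
```

The same few lines cover text-to-audio search when the query vector comes from CLAP's text encoder, since both modalities share one embedding space.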

en cs.SD, cs.LG
DOAJ Open Access 2023
A Stimulus Set of 40 Popular Music Drum Patterns with Perceived Complexity Measures

Olivier Senn, Florian Hoesl, Rafael Jerjen et al.

This study presents an audio stimulus set of 40 drum patterns from Western popular music with empirical measurements of perceived complexity. The audio stimuli are meticulous reconstructions of drum patterns found in commercial recordings; they are based on careful transcriptions (carried out by professional musicians), drum stroke loudness information, and highly precise onset timing measurements. The 40 stimuli are a subset selected from a previously published larger corpus of reconstructed Western popular music drum patterns (Lucerne Groove Research Library). The patterns were selected according to two criteria: a) they only feature the bass drum, snare drum, and one or more cymbals, and b) they plausibly cover the complexity range of the corpus. Perceived stimulus complexity was measured in a listening experiment using a pairwise comparison design with 220 participants (4,400 trials). In each trial, participants were presented with two stimuli, and they stated which of the two sounded more complex to them. The comparison data then served to calculate complexity estimates using the Bradley–Terry probability model. The complexity estimates have an intuitive interpretation: they allow calculation of the probability that one pattern is considered more complex than another pattern in a pairwise comparison. To our knowledge, this is the first set of naturalistic music stimuli with meaningful perceived complexity estimates. The drum pattern stimuli and complexity measurements can be used for listening experiments in music psychology. The stimuli will further allow measures and models of drum pattern complexity to be assessed.
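Bradley-Terry strengths like those described above can be fit directly from a pairwise win-count matrix with the classic minorization-maximization (MM) iteration. A minimal sketch under the stated model (the function name and convergence settings are illustrative, not the study's code):

```python
import numpy as np

def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths from a pairwise win-count matrix.
    wins[i, j] = number of trials in which stimulus i beat stimulus j
    (here: was judged more complex than j)."""
    n = wins.shape[0]
    p = np.ones(n)
    games = wins + wins.T  # comparisons played per pair
    for _ in range(iters):
        new_p = np.empty(n)
        for i in range(n):
            # MM update: p_i <- W_i / sum_j games_ij / (p_i + p_j)
            denom = sum(games[i, j] / (p[i] + p[j]) for j in range(n) if j != i)
            new_p[i] = wins[i].sum() / denom if denom > 0 else p[i]
        p = new_p / new_p.sum()  # fix the scale (strengths are identifiable only up to a constant)
    return p
```

With `p` in hand, the intuitive interpretation the authors mention is direct: the probability that stimulus i is judged more complex than j in a pairwise comparison is `p[i] / (p[i] + p[j])`.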

Music, Psychology
DOAJ Open Access 2023
Survey of Research on Automatic Music Annotation and Classification Methods

ZHANG Rulin, WANG Hailong, LIU Lin, PEI Dongmei

Music is one of the most popular forms of art and entertainment, and it is an artistic language for expressing or conveying people's feelings. However, with the rapid growth of digital music, it is very difficult to manage and filter music using shallow information alone. As an effective means of organizing massive music collections and enriching music information, automatic music annotation can overcome the semantic gap in music information retrieval, make music more intuitive in its expression, and promote in-depth research on music information retrieval tasks such as music classification, music recommendation, and instrument identification. Current automatic music annotation mainly focuses on two problems: feature extraction and model selection. Combined with the current research focus, this paper expounds the relevant background of automatic music annotation. It systematically surveys the audio feature representations and feature extraction methods used in the field, with quantitative and qualitative analysis of each extraction method. It summarizes the related research results, focusing on the differences between model families from the perspectives of machine learning and deep learning. The commonly used datasets and performance evaluation metrics are introduced, the characteristics of the datasets are summarized, and the evaluation metrics are classified and analyzed. Finally, the difficulties and challenges faced by research on automatic music annotation are pointed out, and future directions are discussed.
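As a concrete instance of the hand-crafted audio features such surveys cover, the spectral centroid (the magnitude-weighted mean frequency of a frame, a rough correlate of perceived brightness) takes only a few lines. A self-contained numpy sketch, independent of any particular MIR library:

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of one audio frame, in Hz."""
    mag = np.abs(np.fft.rfft(frame))            # magnitude spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)  # bin centre frequencies
    return float((freqs * mag).sum() / (mag.sum() + 1e-12))
```

For a pure 440 Hz sine the centroid sits at 440 Hz, since all spectral energy falls in that bin; brighter, noisier timbres pull it upward.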

Electronic computers. Computer science
arXiv Open Access 2022
musicaiz: A Python Library for Symbolic Music Generation, Analysis and Visualization

Carlos Hernandez-Olivan, Jose R. Beltran

In this article, we present musicaiz, an object-oriented library for analyzing, generating and evaluating symbolic music. The submodules of the package allow the user to create symbolic music data from scratch, build algorithms to analyze symbolic music, encode MIDI data as tokens to train deep learning sequence models, modify existing music data and evaluate music generation systems. The evaluation submodule builds on previous work to objectively measure music generation systems and to be able to reproduce the results of music generation models. The library is publicly available online. We encourage the community to contribute and provide feedback.

en cs.SD, cs.MM
DOAJ Open Access 2022
Párizsi járvány és művészvilág [Paris Epidemic and Art World]

Marta-Adrienne Elekes

When reading music history writings and textbooks a while ago, we quickly skipped over statements such as "in that year cholera was raging in the city" or "everybody who had the possibility moved to the countryside, away from the epidemic". Today, however, with a global pandemic having hit us as well, we find ourselves caught up in these findings and read every detail with special attention. It is a fact that in the spring of 1832, Paris suffered a widespread cholera epidemic, resulting in 18,400 deaths in the city and nearly 100,000 in France. How did this affect contemporary artistic life? We try to analyze this, quoting from the testimonies of writers and poets, and focusing on the living conditions of some significant musicians, highlighting events of their careers from those times.

Music and books on Music, Arts in general
arXiv Open Access 2021
Visualizing Ensemble Predictions of Music Mood

Zelin Ye, Min Chen

Music mood classification has been a challenging problem in comparison with other music classification problems (e.g., genre, composer, or period). One solution for addressing this challenge is to use an ensemble of machine learning models. In this paper, we show that visualization techniques can effectively convey the most popular prediction as well as the uncertainty at different music sections along the temporal axis, while enabling the analysis of individual ML models in conjunction with their application to different musical data. In addition to traditional visual designs, such as the stacked line graph, ThemeRiver, and pixel-based visualization, we introduce a new variant of ThemeRiver, called the "dual-flux ThemeRiver", which allows viewers to observe and measure the most popular prediction more easily than the stacked line graph and ThemeRiver. Together with pixel-based visualization, dual-flux ThemeRiver plots can also assist in model-development workflows, in addition to annotating music using ensemble model predictions.
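The quantities a dual-flux ThemeRiver is designed to make legible (which mood prediction is most popular at each music section, and how strongly the ensemble agrees there) are straightforward to compute before any drawing happens. A sketch with numpy; the visual encoding itself is left to a plotting library, and this is not the paper's code:

```python
import numpy as np

def ensemble_summary(preds):
    """preds: (n_models, n_sections) array of integer mood labels.
    Returns (modal_label, agreement) per section - the per-section
    quantities a dual-flux-style view puts front and centre."""
    n_models, n_sections = preds.shape
    modal = np.empty(n_sections, dtype=int)
    agree = np.empty(n_sections)
    for t in range(n_sections):
        labels, counts = np.unique(preds[:, t], return_counts=True)
        k = counts.argmax()
        modal[t] = labels[k]               # most popular prediction
        agree[t] = counts[k] / n_models    # fraction of models agreeing
    return modal, agree
```

Low `agree` values mark the uncertain sections along the temporal axis where inspecting individual models (e.g. with the pixel-based view) is most worthwhile.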

en eess.AS, cs.AI
arXiv Open Access 2021
LoopNet: Musical Loop Synthesis Conditioned On Intuitive Musical Parameters

Pritish Chandna, António Ramires, Xavier Serra et al.

Loops, seamlessly repeatable musical segments, are a cornerstone of modern music production. Contemporary artists often mix and match various sampled or pre-recorded loops based on musical criteria such as rhythm, harmony and timbral texture to create compositions. Taking such criteria into account, we present LoopNet, a feed-forward generative model for creating loops conditioned on intuitive parameters. We leverage Music Information Retrieval (MIR) models as well as a large collection of public loop samples in our study and use the Wave-U-Net architecture to map control parameters to audio. We also evaluate the quality of the generated audio and propose intuitive controls for composers to map the ideas in their minds to an audio loop.

en cs.SD, cs.LG

Page 35 of 24,850