Results for "Music"

Showing 20 of ~1,059,084 results · from CrossRef, arXiv, DOAJ, Semantic Scholar

arXiv Open Access 2025
Large Language Models' Internal Perception of Symbolic Music

Andrew Shin, Kunitake Kaneko

Large language models (LLMs) excel at modeling relationships between strings in natural language and have shown promise in extending to other symbolic domains like coding or mathematics. However, the extent to which they implicitly model symbolic music remains underexplored. This paper investigates how LLMs represent musical concepts by generating symbolic music data from textual prompts describing combinations of genres and styles, and evaluating their utility through recognition and generation tasks. We produce a dataset of LLM-generated MIDI files without relying on explicit musical training. We then train neural networks entirely on this LLM-generated MIDI dataset and perform genre and style classification as well as melody completion, benchmarking their performance against established models. Our results demonstrate that LLMs can infer rudimentary musical structures and temporal relationships from text, highlighting both their potential to implicitly encode musical patterns and their limitations due to a lack of explicit musical context, shedding light on their generative capabilities for symbolic music.
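A minimal sketch of the kind of pipeline this abstract implies — prompting a chat LLM for notes in a simple symbolic format, then rendering the parsed output to MIDI with pretty_midi. The note format and values are illustrative assumptions, not the authors' exact setup:

```python
# Hypothetical pipeline: ask a chat LLM for (pitch, start, duration) triples
# describing a genre/style prompt, then write them to a MIDI file.
import pretty_midi

def notes_to_midi(triples, path="llm_piece.mid"):
    pm = pretty_midi.PrettyMIDI()
    piano = pretty_midi.Instrument(program=0)  # acoustic grand piano
    for pitch, start, dur in triples:
        piano.notes.append(pretty_midi.Note(velocity=90, pitch=pitch,
                                            start=start, end=start + dur))
    pm.instruments.append(piano)
    pm.write(path)

# Stand-in for parsed LLM output to a prompt like "a short jazz phrase in F"
notes_to_midi([(65, 0.0, 0.5), (69, 0.5, 0.5), (72, 1.0, 1.0)])
```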

en cs.CL, cs.AI
arXiv Open Access 2025
Revisiting MUSIC: A Finite-Precision Perspective

Yiming Fang, Li Chen, Ang Chen et al.

The high computational complexity of the multiple signal classification (MUSIC) algorithm is mainly caused by subspace decomposition and spectrum search, especially in real-time applications or with massive sensor arrays. In this paper, we propose a low-complexity MUSIC algorithm from a finite-precision arithmetic perspective. First, we analyze the computational bottlenecks of the classic low-complexity randomized unitary-based MUSIC (RU-MUSIC), formulating this computational issue as an inner product problem. Then, a mixed-precision method is introduced to address this problem. Specifically, this method partitions the summations in inner products into blocks, where intra-block computations use low-precision arithmetic and inter-block sums use high-precision arithmetic. To further improve computational accuracy, we develop an adaptive-precision method that supports adaptive block sizes and multiple precision levels. Finally, simulation results show that the proposed finite-precision MUSIC design achieves direction-of-arrival (DOA) estimation performance similar to that of full-precision arithmetic while reducing the computational cost by more than 50%.
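A minimal sketch of the blockwise mixed-precision inner product described here — low precision inside each block, high precision when accumulating across blocks. Block size and dtypes are illustrative assumptions:

```python
import numpy as np

def mixed_precision_dot(x, y, block=64):
    # Intra-block sums in low precision, inter-block accumulation in high precision.
    total = np.float64(0.0)
    for start in range(0, len(x), block):
        xb = x[start:start + block].astype(np.float16)
        yb = y[start:start + block].astype(np.float16)
        total += np.float64(np.dot(xb, yb))  # promote each block's partial sum
    return total

x, y = np.random.randn(1024), np.random.randn(1024)
print(mixed_precision_dot(x, y), np.dot(x, y))  # close, at reduced intra-block cost
```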

en eess.SP
arXiv Open Access 2025
Are Expressions for Music Emotions the Same Across Cultures?

Elif Celen, Pol van Rijn, Harin Lee et al.

Music evokes profound emotions, yet the universality of emotional descriptors across languages remains debated. A key challenge in cross-cultural research on music emotion is biased stimulus selection and manual curation of taxonomies, predominantly relying on Western music and languages. To address this, we propose a balanced experimental design with nine online experiments in Brazil, the US, and South Korea, involving N=672 participants. First, we sample a balanced set of popular music from these countries. Using an open-ended tagging pipeline, we then gather emotion terms to create culture-specific taxonomies. Finally, using these bottom-up taxonomies, participants rate the emotions of each song. This allows us to map emotional similarities within and across cultures. Results show consistency in high-arousal, high-valence emotions but greater variability in others. Notably, machine translations were often inadequate to capture music-specific meanings. These findings together highlight the need for a domain-sensitive, open-ended, bottom-up emotion elicitation approach to reduce cultural biases in emotion research.

en cs.CL, cs.HC
arXiv Open Access 2025
Exploring GPT's Ability as a Judge in Music Understanding

Kun Fang, Ziyu Wang, Gus Xia et al.

Recent progress in text-based Large Language Models (LLMs) and their extended ability to process multi-modal sensory data have led us to explore their applicability in addressing music information retrieval (MIR) challenges. In this paper, we use a systematic prompt engineering approach for LLMs to solve MIR problems. We convert the music data to symbolic inputs and evaluate LLMs' ability to detect annotation errors in three key MIR tasks: beat tracking, chord extraction, and key estimation. A concept augmentation method is proposed to evaluate LLMs' music reasoning consistency with the music concepts provided in the prompts. Our experiments tested the MIR capabilities of Generative Pre-trained Transformers (GPT). Results show that GPT achieves an error-detection accuracy of 65.20%, 64.80%, and 59.72% in beat tracking, chord extraction, and key estimation tasks, respectively, all exceeding the random baseline. Moreover, we observe a positive correlation between GPT's error-detection accuracy and the amount of concept information provided. The current findings based on symbolic music input provide a solid ground for future LLM-based MIR research.

en cs.IR, cs.SD
arXiv Open Access 2025
Mamba-Diffusion Model with Learnable Wavelet for Controllable Symbolic Music Generation

Jincheng Zhang, György Fazekas, Charalampos Saitis

The recent surge in the popularity of diffusion models for image synthesis has attracted new attention to their potential for generation tasks in other domains. However, their applications to symbolic music generation remain largely under-explored because symbolic music is typically represented as sequences of discrete events and standard diffusion models are not well-suited for discrete data. We represent symbolic music as image-like pianorolls, facilitating the use of diffusion models for the generation of symbolic music. Moreover, this study introduces a novel diffusion model that incorporates our proposed Transformer-Mamba block and learnable wavelet transform. Classifier-free guidance is utilised to generate symbolic music with target chords. Our evaluation shows that our method achieves compelling results in terms of music quality and controllability, outperforming the strong baseline in pianoroll generation. Our code is available at https://github.com/jinchengzhanggg/proffusion.
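A minimal sketch of the pianoroll encoding this approach relies on — note events become a binary time x pitch matrix, so image-style diffusion models apply directly. The note format and grid resolution are illustrative:

```python
import numpy as np

def to_pianoroll(notes, steps_per_beat=4, n_pitches=128, total_beats=8):
    # notes: list of (midi_pitch, start_beat, end_beat) tuples
    roll = np.zeros((total_beats * steps_per_beat, n_pitches), dtype=np.float32)
    for pitch, start, end in notes:
        t0 = int(round(start * steps_per_beat))
        t1 = max(t0 + 1, int(round(end * steps_per_beat)))
        roll[t0:t1, pitch] = 1.0  # note is "on" for its duration
    return roll

# C major triad held for two beats, then a single melody note
roll = to_pianoroll([(60, 0, 2), (64, 0, 2), (67, 0, 2), (72, 2, 3)])
print(roll.shape)  # (32, 128): time steps x MIDI pitches, a one-channel "image"
```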

en cs.SD, cs.AI
DOAJ Open Access 2025
Instrument sound classification using a music-based feature extraction model inspired by Mozart's Turkish March pattern

Mengmeng Chen, Diying Tang, Yu Xiang et al.

In the era of advanced artificial intelligence (AI) models, the intersection of music and pattern recognition has garnered significant interest. This study investigates the application of music-inspired features for the classification of instrument sounds. A novel feature extraction model, based on the harmonic patterns observed in Mozart's Turkish March, is proposed to enhance the detection and classification of sounds. A dataset comprising over 40,000 sound samples from 28 distinct musical instruments was utilised for evaluating the proposed approach. The feature engineering (FE) model employed in this study consists of three distinct phases: feature extraction, feature selection, and classification. During the feature extraction phase, a multilevel discrete wavelet transform (MDWT) was combined with the Turkish March pattern (TurkMarchPat) to capture a comprehensive set of features. In the subsequent feature selection phase, neighbourhood component analysis (NCA) was applied to identify the most discriminative features, which were then input into a k-nearest neighbours (kNN) classifier for sound classification. The results demonstrated the effectiveness of the proposed TurkMarchPat-based FE model, achieving a classification accuracy of 97.87% on the instrument sound dataset. These findings suggest that the application of harmonic patterns, such as those derived from Mozart's Turkish March, offers a promising approach to sound classification, demonstrating both the robustness and efficiency of the model. The proposed method holds potential for advancing the field of acoustic pattern recognition and could be extended to other domains requiring high-performance sound classification.
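A minimal sketch of the three-phase pipeline described above — multilevel DWT features, NCA-based selection/projection, and a kNN classifier. The wavelet, level, statistics, and hyperparameters are illustrative assumptions (the paper's TurkMarchPat step is not reproduced here):

```python
import numpy as np
import pywt
from sklearn.neighbors import NeighborhoodComponentsAnalysis, KNeighborsClassifier
from sklearn.pipeline import Pipeline

def mdwt_features(signal, wavelet="db4", level=5):
    # Multilevel DWT: summary statistics from each sub-band.
    feats = []
    for band in pywt.wavedec(signal, wavelet, level=level):
        feats += [band.mean(), band.std(), np.abs(band).max()]
    return np.array(feats)

# Placeholder data standing in for instrument recordings and labels
X_raw = np.random.randn(200, 4096)
y = np.random.randint(0, 4, size=200)

X = np.stack([mdwt_features(s) for s in X_raw])
clf = Pipeline([
    ("nca", NeighborhoodComponentsAnalysis(n_components=8, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
clf.fit(X, y)
print(clf.score(X, y))
```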

Engineering (General). Civil engineering (General)
DOAJ Open Access 2025
Randomised active controlled trial examining effects of aerobic exercise, cognitive and music interventions on depression, balance and mobility in schizophrenia

Razieh Khanmohammadi, Hasan Mirali, Hasan Mohammadzadeh et al.

Schizophrenia significantly impairs daily functioning, requiring innovative, cost-effective treatments beyond standard antipsychotics and cognitive interventions. This study examined the individual and combined effects of cognitive, music, and aerobic exercise interventions on depression, balance, and mobility in patients with schizophrenia and severe depression. Eighty-four male patients with schizophrenia and severe depression from an inpatient psychiatric centre participated in a 12-week, single-blind, randomised active-controlled trial. Participants were systematically assigned to one of seven equal groups (n = 12 each): aerobic exercise (AerG), cognitive rehabilitation/treatment-as-usual (CogG), music intervention (MusG), aerobic exercise + music intervention (A&MG), aerobic exercise + cognitive intervention (A&CG), cognitive intervention + music intervention (C&MG), and a comprehensive combination of all three modalities (ACMG). Each intervention was delivered over 60 min, three times weekly for 12 weeks. The study employed the Beck Depression Inventory Short Form, Stork Balance Test, and modified Timed Up and Go Test to assess improvements in depression, balance, and mobility. Statistical analyses were conducted using paired t-tests for within-group comparisons and ANCOVA with Bonferroni post hoc tests for between-group differences, with significance set at p ≤ 0.05. Results showed significant improvements in depression, balance, and mobility across all treatment groups. The CogG group outperformed both AerG and MusG in all outcomes, establishing it as the gold-standard comparator. A&CG yielded greater benefits than other single- or dual-modality groups, while the multimodal ACMG group demonstrated the most substantial improvements across all measures. These findings highlight the practical value of incorporating multimodal interventions into standard care to improve both mental health and physical function, offering a scalable, cost-effective approach to addressing the diverse needs of patients with schizophrenia and severe depression. Implementing such interventions in psychiatric care settings could lead to more comprehensive and effective treatment strategies for improving patient outcomes.

Medicine, Science
DOAJ Open Access 2025
Still Here

Kristine Gustavsen Madsø, Inger Hilde Nordhus

Following the completion of a clinical research project assessing music therapy for home-dwelling people with dementia and their significant other, we developed an interactive photo exhibition, “Still here,” that was launched on World Alzheimer's Day, September 21st, 2021, in the city center of Bergen. We aimed to communicate to the local citizens how music may facilitate experiences of positive identity and relational mutuality for people with dementia and their significant others. Stereotypes about loss and decline still characterize the mainstream discourse of dementia. Myths, misconceptions, and stigma are associated with dementia and have a significant impact on the people who live with the condition. We aimed to portray a different narrative of dementia to the public, a narrative of resilience and creative capacity—and, in so doing, to knowingly recuperate and communicate the idea of psychological resilience in contradistinction to the neoliberal constructions of this concept. We intended to reflect a perspective closer to how individuals describe or experience living with dementia. This paper tells the story of how the exhibition was developed to disseminate the findings of our research project, elaborating on the challenges current narratives on dementia present in our field and how the photographs became narrators of another story of dementia.

Arts in general, Language and Literature
arXiv Open Access 2024
Analyzing Musical Characteristics of National Anthems in Relation to Global Indices

S M Rakib Hasan, Aakar Dhakal, Ms. Ayesha Siddiqua et al.

Music plays a huge part in shaping people's psychology and behavioral patterns. This paper investigates the connection between national anthems and different global indices using computational music analysis and statistical correlation analysis. We analyze national anthem musical data to determine whether certain musical characteristics are associated with peace, happiness, suicide rate, crime rate, etc. To achieve this, we collect national anthems from 169 countries and use computational music analysis techniques to extract pitch, tempo, beat, and other pertinent audio features. We then compare these musical characteristics with data on different global indices to ascertain whether a significant correlation exists. Our findings indicate that there may be a correlation between the musical characteristics of national anthems and the indices we investigated. The implications of our findings for music psychology and policymakers interested in promoting social well-being are discussed. This paper emphasizes the potential of musical data analysis in social research and offers a novel perspective on the relationship between music and social indices. The source code and data are made open-access for reproducibility and future research endeavors. It can be accessed at http://bit.ly/na_code.
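A minimal sketch of this kind of analysis — extracting tempo and a spectral feature with librosa, then correlating against an index. File names and index values are placeholders:

```python
import numpy as np
import librosa
from scipy.stats import pearsonr

def anthem_features(path):
    y, sr = librosa.load(path, sr=22050)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)            # estimated BPM
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
    return float(tempo), float(centroid)

# Placeholder file list and happiness-index values, for illustration only
paths = ["anthem_a.wav", "anthem_b.wav", "anthem_c.wav"]
happiness = np.array([7.2, 6.1, 5.4])

tempos = np.array([anthem_features(p)[0] for p in paths])
r, p = pearsonr(tempos, happiness)
print(f"tempo vs. happiness: r={r:.2f}, p={p:.3f}")
```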

en cs.SD, cs.AI
arXiv Open Access 2024
Unrolled Creative Adversarial Network For Generating Novel Musical Pieces

Pratik Nag

Music generation has emerged as a significant topic in artificial intelligence and machine learning. While recurrent neural networks (RNNs) have been widely employed for sequence generation, generative adversarial networks (GANs) remain relatively underexplored in this domain. This paper presents two systems based on adversarial networks for music generation. The first system learns a set of music pieces without differentiating between styles, while the second system focuses on learning and deviating from specific composers' styles to create innovative music. By extending the Creative Adversarial Networks (CAN) framework to the music domain, this work introduces unrolled CAN to address mode collapse, evaluating both GAN and CAN in terms of creativity and variation.

en cs.SD, cs.LG
arXiv Open Access 2024
Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey

Dinh-Viet-Toan Le, Louis Bigo, Mikaela Keller et al.

Several adaptations of the Transformer model have been developed in various domains since its breakthrough in Natural Language Processing (NLP). This trend has spread into the field of Music Information Retrieval (MIR), including studies processing music data. However, the practice of leveraging NLP tools for symbolic music data is not novel in MIR. Music has frequently been compared to language, as the two share several similarities, including sequential representations. These analogies are also reflected in similar tasks across MIR and NLP. This survey reviews NLP methods applied to symbolic music generation and information retrieval along two axes. We first give an overview of representations of symbolic music adapted from natural-language sequential representations; such representations are designed with the specificities of symbolic music in mind. These representations are then processed by models, possibly originally developed for text and adapted for symbolic music, that are trained on various tasks. We describe these models, in particular deep learning models, through different prisms, highlighting music-specialized mechanisms. We finally discuss the effective use of NLP tools for symbolic music data, including technical issues regarding NLP methods and fundamental differences between text and music, which may open several doors for further research into more effectively adapting NLP tools to symbolic MIR.

en cs.IR, cs.AI
arXiv Open Access 2024
Text2midi: Generating Symbolic Music from Captions

Keshav Bhandari, Abhinaba Roy, Kyra Wang et al.

This paper introduces text2midi, an end-to-end model to generate MIDI files from textual descriptions. Leveraging the growing popularity of multimodal generative approaches, text2midi capitalizes on the extensive availability of textual data and the success of large language models (LLMs). Our end-to-end system harnesses the power of LLMs to generate symbolic music in the form of MIDI files. Specifically, we utilize a pretrained LLM encoder to process captions, which then condition an autoregressive transformer decoder to produce MIDI sequences that accurately reflect the provided descriptions. This intuitive and user-friendly method significantly streamlines the music creation process by allowing users to generate music pieces from text prompts. We conduct comprehensive empirical evaluations, incorporating both automated and human studies, which show that our model generates high-quality MIDI files that are indeed controllable by text captions, including music-theory terms such as chords, keys, and tempo. We release the code and music samples on our demo page (https://github.com/AMAAI-Lab/Text2midi) for users to interact with text2midi.
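A minimal sketch of the conditioning pattern described — a frozen pretrained text encoder whose hidden states condition an autoregressive decoder over MIDI-event tokens via cross-attention. The encoder choice, vocabulary size, and dimensions are illustrative assumptions, not the authors' exact architecture:

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, T5EncoderModel

class CaptionToMidi(nn.Module):
    def __init__(self, midi_vocab=512, d_model=512, layers=4):
        super().__init__()
        self.text_enc = T5EncoderModel.from_pretrained("t5-small")  # frozen caption encoder
        for p in self.text_enc.parameters():
            p.requires_grad = False
        self.embed = nn.Embedding(midi_vocab, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=layers)
        self.head = nn.Linear(d_model, midi_vocab)

    def forward(self, caption_ids, midi_tokens):
        memory = self.text_enc(input_ids=caption_ids).last_hidden_state
        tgt = self.embed(midi_tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        out = self.decoder(tgt, memory, tgt_mask=mask)  # cross-attend to the caption
        return self.head(out)                           # next-token logits

tok = AutoTokenizer.from_pretrained("t5-small")
ids = tok("a slow waltz in C major", return_tensors="pt").input_ids
logits = CaptionToMidi()(ids, torch.randint(0, 512, (1, 16)))
print(logits.shape)  # (1, 16, 512)
```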

en cs.SD, cs.AI
arXiv Open Access 2024
Generating Symbolic Music from Natural Language Prompts using an LLM-Enhanced Dataset

Weihan Xu, Julian McAuley, Taylor Berg-Kirkpatrick et al.

Recent years have seen many audio-domain text-to-music generation models that rely on large amounts of text-audio pairs for training. However, symbolic-domain controllable music generation has lagged behind, partly due to the lack of a large-scale symbolic music dataset with extensive metadata and captions. In this work, we present MetaScore, a new dataset consisting of 963K musical scores paired with rich metadata, including free-form user-annotated tags, collected from an online music forum. To approach text-to-music generation, we employ a pretrained large language model (LLM) to generate pseudo-natural-language captions for music from its metadata tags. With the LLM-enhanced MetaScore, we train a text-conditioned music generation model that learns to generate symbolic music from the pseudo captions, allowing control of instruments, genre, composer, complexity, and other free-form music descriptors. In addition, we train a tag-conditioned system that supports a predefined set of tags available in MetaScore. Our experimental results show that both the proposed text-to-music and tags-to-music models outperform a baseline text-to-music model in a listening test, and our models achieve performance comparable to the concurrent work Text2MIDI, which also supports free-form text input. Moreover, the text-to-music system offers a more natural interface than the tags-to-music model, as it allows users to provide free-form natural language prompts.
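A minimal sketch of the captioning step described above, with a plain template standing in for the LLM prompt the authors use; the field names are illustrative:

```python
def tags_to_prompt(meta):
    # Build an LLM prompt that asks for a pseudo-natural-language caption.
    tags = ", ".join(meta["tags"])
    return (f"Write one sentence describing a piece of music. "
            f"Composer: {meta['composer']}. Genre: {meta['genre']}. Tags: {tags}.")

meta = {"composer": "anonymous", "genre": "folk", "tags": ["upbeat", "guitar", "easy"]}
print(tags_to_prompt(meta))  # sent to a pretrained LLM; its reply becomes the caption
```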

en cs.SD, eess.AS
arXiv Open Access 2024
Benchmarking Sub-Genre Classification For Mainstage Dance Music

Hongzhi Shu, Xinglin Li, Hongyu Jiang et al.

Music classification, a cornerstone of music information retrieval, supports a wide array of applications. To address the lack of comprehensive datasets and effective methods for sub-genre classification in mainstage dance music, we introduce a novel benchmark featuring a new dataset and baseline. Our dataset expands the scope of sub-genres to reflect the diversity of recent mainstage live sets performed by leading DJs at global music festivals, capturing the vibrant and rapidly evolving electronic dance music (EDM) scene that engages millions of fans worldwide. We employ a continuous soft labeling approach to accommodate tracks blending multiple sub-genres, preserving their inherent complexity. Experiments demonstrate that even state-of-the-art multimodal large language models (MLLMs) struggle with this task, while our specialized baseline models achieve high accuracy. This benchmark supports applications such as music recommendation, DJ set curation, and interactive multimedia systems, with video demos provided. Our code and data are all open-sourced at https://github.com/Gariscat/housex-v2.git.
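A minimal sketch of the continuous soft-labeling idea — training against a probability distribution over sub-genres instead of a single hard label. Genre names and weights are illustrative:

```python
import torch
import torch.nn.functional as F

# A track blending two sub-genres gets a soft target, not a one-hot label.
genres = ["big_room", "future_house", "techno", "trance"]
soft_target = torch.tensor([[0.6, 0.4, 0.0, 0.0]])  # 60/40 blend of two sub-genres

logits = torch.randn(1, len(genres), requires_grad=True)  # placeholder model output
# Cross-entropy against a distribution: -sum(target * log_softmax(logits))
loss = -(soft_target * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
loss.backward()
print(loss.item())
```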

en cs.SD, cs.AI
DOAJ Open Access 2024
Le musicien ethnographe à l’épreuve de ses émotions

Sacha Thiébaud

This article offers a methodological reflection on how respondents can confront the researcher with his or her own emotions. For the purposes of my study, I joined a punk band as a guitarist. In this context, the sanctions imposed on my instrumental movements revealed a set of shared feeling rules. I show how doing emotion work in line with the musicians' expectations gave me access to the intimacy of an underground band, and how this emotional normalization became a heuristic vantage point that enabled me to better understand the role of affects within regional punk culture.

Sociology (General)
arXiv Open Access 2023
CoCoFormer: A controllable feature-rich polyphonic music generation method

Jiuyang Zhou, Tengfei Niu, Hong Zhu et al.

This paper explores methods for modeling polyphonic music sequences. Given the great potential of Transformer models in music generation, controllable music generation is receiving increasing attention. For polyphonic music, current controllable generation research focuses on controlling the generation of chords but lacks precise control over the texture of choral music. This paper proposes the Condition Choir Transformer (CoCoFormer), which controls the output of the model through fine-grained chord and rhythm inputs. A self-supervised method improves the loss function, and the model is jointly trained on conditionally controlled and unconditional inputs. To alleviate the lack of diversity in generated samples caused by teacher-forcing training, an adversarial training method is added. CoCoFormer enhances model performance with explicit and implicit chord and rhythm inputs. Experiments show that CoCoFormer achieves better results than current models: given a specified polyphonic music texture, it can also generate the same melody in a variety of ways.

en cs.SD, cs.AI
arXiv Open Access 2023
LLark: A Multimodal Instruction-Following Language Model for Music

Josh Gardner, Simon Durand, Daniel Stoller et al.

Music has a unique and complex structure which is challenging for both expert humans and existing AI systems to understand, and presents unique challenges relative to other forms of audio. We present LLark, an instruction-tuned multimodal model for music understanding. We detail our process for dataset creation, which involves augmenting the annotations of diverse open-source music datasets and converting them to a unified instruction-tuning format. We propose a multimodal architecture for LLark, integrating a pretrained generative model for music with a pretrained language model. In evaluations on three types of tasks (music understanding, captioning, reasoning), we show that LLark matches or outperforms existing baselines in music understanding, and that humans show a high degree of agreement with its responses in captioning and reasoning tasks. LLark is trained entirely from open-source music data and models, and we make our training code available along with the release of this paper. Additional results and audio examples are at https://bit.ly/llark, and our source code is available at https://github.com/spotify-research/llark.

en cs.SD, cs.LG
arXiv Open Access 2023
Music Playlist Title Generation Using Artist Information

Haven Kim, SeungHeon Doh, Junwon Lee et al.

Automatically generating titles (captions) for music playlists given a set of tracks is of significant interest in music streaming services, as customized playlists are widely used in personalized music recommendation, and well-composed titles attract users and help their music discovery. We present an encoder-decoder model that generates a playlist title from a sequence of music tracks. While previous work takes track IDs as tokenized input for playlist title generation, we use the artist IDs corresponding to the tracks to mitigate the long-tail distribution of tracks in the playlist dataset. We also introduce a chronological data split method to deal with newly released tracks in real-world scenarios. Comparing track IDs and artist IDs as input sequences, we show that the artist-based approach significantly enhances performance in terms of word overlap, semantic relevance, and diversity.

en cs.IR, cs.CL
arXiv Open Access 2023
VampNet: Music Generation via Masked Acoustic Token Modeling

Hugo Flores Garcia, Prem Seetharaman, Rithesh Kumar et al.

We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. We use a variable masking schedule during training which allows us to sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. VampNet is non-autoregressive, leveraging a bidirectional transformer architecture that attends to all tokens in a forward pass. With just 36 sampling passes, VampNet can generate coherent high-fidelity musical waveforms. We show that by prompting VampNet in various ways, we can apply it to tasks like music compression, inpainting, outpainting, continuation, and looping with variation (vamping). Appropriately prompted, VampNet is capable of maintaining style, genre, instrumentation, and other high-level aspects of the music. This flexible prompting capability makes VampNet a powerful music co-creation tool. Code and audio samples are available online.
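A minimal sketch of the masked-token training step described — mask a variable fraction of an acoustic-token sequence, then predict the masked positions with a bidirectional transformer in a single forward pass. Vocabulary size, masking range, and the model are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, mask_id, d = 1024, 1024, 256
embed = nn.Embedding(vocab + 1, d)  # +1 slot for the MASK token
enc = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d, nhead=8, batch_first=True), num_layers=4)
head = nn.Linear(d, vocab)

tokens = torch.randint(0, vocab, (2, 128))     # codec tokens for a batch of clips
ratio = 0.2 + 0.6 * torch.rand(1).item()       # variable masking schedule (20-80%)
mask = torch.rand(tokens.shape) < ratio
inp = tokens.masked_fill(mask, mask_id)

logits = head(enc(embed(inp)))                 # bidirectional: sees all positions at once
loss = F.cross_entropy(logits[mask], tokens[mask])
print(f"mask ratio {ratio:.2f}, loss {loss.item():.2f}")
```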

en cs.SD, cs.AI

Page 46 of 52,955