Results for "Music"

Showing 20 of ~1,058,343 results · from CrossRef, arXiv, DOAJ, Semantic Scholar

arXiv Open Access 2026
Music as an Asset Class

Sasha Stoikov, Aadityaa Singla, Umu Cetin et al.

In the streaming era, music revenues distributed to rights holders have become more transparent. However, it is not yet clear how to quantify the risk and return characteristics of music royalty assets, as is done with equities. In this paper, we fit three discounted cashflow models to transactions on the Royalty Exchange platform. We use our best model to backtest the one-year and five-year performance of music royalty assets, after transaction costs. We find that Life of Rights (LOR) music assets had risk and return characteristics comparable to stocks in the S&P 500 when held over 5 years. Since the performance of stocks and that of music assets are likely to be uncorrelated, this result may help investors assess this asset class within the context of a more traditional stock and bond portfolio.

en q-fin.PR, q-fin.PM
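The abstract above does not spell out its three discounted cashflow (DCF) models. As a rough illustration of the general technique only, the sketch below values a royalty stream whose annual payments decay geometrically. The function names, the constant decay rate `annual_decay`, and the constant `discount_rate` are hypothetical assumptions, not the paper's fitted models.

```python
def royalty_dcf_value(first_year_cashflow: float,
                      annual_decay: float,
                      discount_rate: float,
                      years: int) -> float:
    """Present value of a royalty stream over a finite horizon.

    Hypothetical model: the cashflow in year t (t = 1..years) is
    first_year_cashflow * (1 - annual_decay) ** (t - 1), and it is
    discounted by (1 + discount_rate) ** t.
    """
    value = 0.0
    for t in range(1, years + 1):
        cashflow = first_year_cashflow * (1 - annual_decay) ** (t - 1)
        value += cashflow / (1 + discount_rate) ** t
    return value


def perpetuity_value(first_year_cashflow: float,
                     annual_decay: float,
                     discount_rate: float) -> float:
    """Closed form of the same sum with no end date: C / (r + d).

    Valid only when discount_rate + annual_decay > 0. Using it as a stand-in
    for a long-lived ("life of rights") asset is an assumption of this sketch.
    """
    if discount_rate + annual_decay <= 0:
        raise ValueError("the infinite sum diverges unless r + d > 0")
    return first_year_cashflow / (discount_rate + annual_decay)


if __name__ == "__main__":
    # Hypothetical asset: $1,000 of first-year royalties, 10% annual decay, 8% discount rate.
    ten_year = royalty_dcf_value(1000.0, 0.10, 0.08, 10)
    unbounded = perpetuity_value(1000.0, 0.10, 0.08)
    print(f"10-year term: {ten_year:.2f}, unbounded term: {unbounded:.2f}")
```

Treating a Life of Rights asset as an unbounded stream is a simplification chosen here: real copyright terms end, and the sketch ignores transaction costs, which the paper explicitly accounts for.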
DOAJ Open Access 2026
Musical Instruments As Metaphor In Adeola’s Ni Ile Wa (In Our Land)

Hameed Olutoba Lawal

Dance serves as a potent metaphor for Taiye Adeola’s Ni Ile Wa, embodying cultural identity, resistance, and transformation. This study explores the symbolic and narrative functions of dance within the text, and examines how movement and rhythm reflect societal tensions, personal struggles, and communal aspirations. Drawing on theories of performance, embodiment, and postcolonial aesthetics, this paper argues that dance in Ni Ile Wa is not merely an artistic expression, but a language through which characters negotiate power, memory, and belonging. Ultimately, the work positions dance as a dynamic force that bridges the past and present, reinforcing cultural heritage while enabling new forms of self-expression.

arXiv Open Access 2025
Towards an AI Musician: Synthesizing Sheet Music Problems for Musical Reasoning

Zhilin Wang, Zhe Yang, Yun Luo et al.

Enhancing the ability of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) to interpret sheet music is a crucial step toward building AI musicians. However, current research lacks both evaluation benchmarks and training data for sheet music reasoning. Inspired by mathematics, where simple operations yield infinite verifiable problems, we introduce a novel approach that treats core music theory rules, such as those governing beats and intervals, as programmatic functions to systematically synthesize a vast and diverse corpus of sheet music reasoning problems. This approach allows us to introduce a data synthesis framework that generates verifiable sheet music questions in both textual and visual modalities, leading to the Synthetic Sheet Music Reasoning Benchmark (SSMR-Bench) and a complementary training set. Evaluation results on SSMR-Bench highlight the key role reasoning plays in interpreting sheet music, while also pointing out the ongoing challenges in understanding sheet music in a visual format. By leveraging synthetic data for RLVR, all models show significant improvements on the SSMR-Bench. Additionally, they also demonstrate considerable advancements on previously established human-crafted benchmarks, such as MusicTheoryBench and the music subset of MMMU. Finally, our results show that the enhanced reasoning ability can also facilitate music composition.

en cs.CL
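To make the idea of "music theory rules as programmatic functions" concrete, here is a minimal sketch, assuming a single interval-naming rule and natural notes only. The names `interval_name` and `synthesize_interval_question` are hypothetical and are not taken from the SSMR-Bench framework; the point is that because the answer is computed by the rule itself, every generated question is automatically verifiable.

```python
import random

# Semitones above C for the seven natural notes (12-tone equal temperament).
NATURAL_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Interval names indexed by ascending semitone distance (0..12). Because the
# sketch only draws natural notes, these names agree with the spelled interval,
# except that 6 semitones is reported simply as "tritone".
INTERVAL_NAMES = [
    "unison", "minor 2nd", "major 2nd", "minor 3rd", "major 3rd",
    "perfect 4th", "tritone", "perfect 5th", "minor 6th", "major 6th",
    "minor 7th", "major 7th", "octave",
]


def midi_number(letter: str, octave: int) -> int:
    """MIDI note number of a natural note, using the convention C4 = 60."""
    return 12 * (octave + 1) + NATURAL_SEMITONES[letter]


def interval_name(lower: int, upper: int) -> str:
    """The music theory rule as a function: name of an ascending interval."""
    distance = upper - lower
    if not 0 <= distance <= 12:
        raise ValueError("this sketch handles ascending intervals up to an octave")
    return INTERVAL_NAMES[distance]


def synthesize_interval_question(rng: random.Random) -> tuple[str, str]:
    """Generate a (question, answer) pair; the answer comes from the rule itself."""
    low_letter = rng.choice(list(NATURAL_SEMITONES))
    high_letter = rng.choice(list(NATURAL_SEMITONES))
    low = midi_number(low_letter, 4)
    # Put the upper note in octave 5 when needed so the interval stays ascending.
    high_octave = 4 if NATURAL_SEMITONES[high_letter] >= NATURAL_SEMITONES[low_letter] else 5
    high = midi_number(high_letter, high_octave)
    question = f"What is the interval from {low_letter}4 up to {high_letter}{high_octave}?"
    return question, interval_name(low, high)


def is_correct(expected: str, candidate: str) -> bool:
    """Verification reduces to comparing a candidate with the rule-computed answer."""
    return candidate.strip().lower() == expected.lower()


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        question, answer = synthesize_interval_question(rng)
        print(question, "->", answer)
```

Adding further rules (beat counting, key signatures, and so on) as separate functions would multiply the space of generated questions in the same way; which rules the authors actually encode is described only at a high level in the abstract.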
arXiv Open Access 2025
Discovering "Words" in Music: Unsupervised Learning of Compositional Sparse Code for Symbolic Music

Tianle Wang, Sirui Zhang, Xinyi Tong et al.

This paper presents an unsupervised machine learning algorithm that identifies recurring patterns, referred to as "music-words", from symbolic music data. These patterns are fundamental to musical structure and reflect the cognitive processes involved in composition. However, extracting these patterns remains challenging because of the inherent semantic ambiguity in musical interpretation. We formulate the task of music-word discovery as a statistical optimization problem and propose a two-stage Expectation-Maximization (EM)-based learning framework: 1. developing a music-word dictionary; 2. reconstructing the music data. When evaluated against human expert annotations, the algorithm achieved an Intersection over Union (IoU) score of 0.61. Our findings indicate that minimizing code length effectively addresses semantic ambiguity, suggesting that human optimization of encoding systems shapes musical semantics. This approach enables computers to extract "basic building blocks" from music data, facilitating structural analysis and sparse encoding. The method has two primary applications. First, in AI music, it supports downstream tasks such as music generation, classification, style transfer, and improvisation. Second, in musicology, it provides a tool for analyzing compositional patterns and offers insights into the principle of minimal encoding across diverse musical styles and composers.

en cs.SD, cs.CV
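The abstract reports an Intersection over Union (IoU) score of 0.61 against expert annotations but does not define how segments are compared. The sketch below assumes one simple definition: IoU over the sets of note positions covered by the predicted and the annotated segments. The helper names and the half-open `(start, end)` segment format are assumptions for illustration.

```python
def covered_positions(segments: list[tuple[int, int]]) -> set[int]:
    """Positions (e.g. note indices) covered by half-open [start, end) segments."""
    positions: set[int] = set()
    for start, end in segments:
        positions.update(range(start, end))
    return positions


def segment_iou(predicted: list[tuple[int, int]],
                reference: list[tuple[int, int]]) -> float:
    """Intersection over Union of the positions covered by two segmentations."""
    pred = covered_positions(predicted)
    ref = covered_positions(reference)
    union = pred | ref
    if not union:
        return 1.0  # both empty: treated as perfect agreement (a convention chosen here)
    return len(pred & ref) / len(union)


if __name__ == "__main__":
    # Hypothetical example: the algorithm marks notes 0-3 and 8-11 as one pattern,
    # while an annotator marked notes 2-5 and 8-11.
    print(segment_iou([(0, 4), (8, 12)], [(2, 6), (8, 12)]))  # prints 0.6
```

A stricter variant would first match individual pattern occurrences and then average per-occurrence IoU; which variant the authors used is not stated in the abstract.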
arXiv Open Access 2025
CompLex: Music Theory Lexicon Constructed by Autonomous Agents for Automatic Music Generation

Zhejing Hu, Yan Liu, Gong Chen et al.

Generative artificial intelligence in music has made significant strides, yet it still falls short of the substantial achievements seen in natural language processing, primarily due to the limited availability of music data. Knowledge-informed approaches have been shown to enhance the performance of music generation models, even when only a few pieces of musical knowledge are integrated. This paper seeks to leverage comprehensive music theory in AI-driven music generation tasks, such as algorithmic composition and style transfer, which traditionally require significant manual effort with existing techniques. We introduce a novel automatic music lexicon construction model that generates a lexicon, named CompLex, comprising 37,432 items derived from just 9 manually input category keywords and 5 sentence prompt templates. A new multi-agent algorithm is proposed to automatically detect and mitigate hallucinations. CompLex demonstrates impressive performance improvements across three state-of-the-art text-to-music generation models, encompassing both symbolic and audio-based methods. Furthermore, we evaluate CompLex in terms of completeness, accuracy, non-redundancy, and executability, confirming that it possesses the key characteristics of an effective lexicon.

en cs.SD, cs.AI
DOAJ Open Access 2025
Vibing the Young Consumer to Wellness: Exploring Lo-Fi Music Consumption Through the Positive Design Lens

Melanie Pius Dsouza, Ankitha Shetty, Sara Ellen D’Souza et al.

The consumption of lo-fi music as a wellness and productivity-inducing product has become increasingly popular among young consumers in recent years. This pioneering article explores emerging evidence on lo-fi music consumption for young consumer wellness, using the positive design framework as a lens, and envisions an extensive future research agenda. Following a systematic approach to reviewing the literature, modeled on scoping review methodology, a thematic analysis of the literature is conducted, and theories from multiple disciplines support the arguments. Key research gaps and current trends are identified, and a curated definition of the “lofi product” is provided. The study enhances the positive design framework of Desmet and Pohlmeyer with significant contributions from the themes generated, providing product strategists with a framework to design products that optimize young consumers’ wellness. The findings reveal that consumption of the “lofi product” may intensify positive affect, accelerate goal attainment, and improve health and performance while fostering the development of character strengths in young consumers. Intentionally designing products for young consumers using the proposed framework may also result in similar wellness outcomes. This study could empower marketers to leverage the lofi product effectively in their marketing strategies. Consultation with industry experts informs the future research directions proposed. This study highlights a pressing need for robust scientific investigation and academic discussion.

History of scholarship and learning. The humanities, Social Sciences
arXiv Open Access 2024
MusicScore: A Dataset for Music Score Modeling and Generation

Yuheng Lin, Zheqi Dai, Qiuqiang Kong

Music scores are written representations of music and contain rich information about musical components. The visual information on music scores includes notes, rests, staff lines, clefs, dynamics, and articulations. This visual information in music scores contains more semantic information than audio and symbolic representations of music. Previous music score datasets have limited sizes and are mainly designed for optical music recognition (OMR). There is a lack of research on creating a large-scale benchmark dataset for music modeling and generation. In this work, we propose MusicScore, a large-scale music score dataset collected and processed from the International Music Score Library Project (IMSLP). MusicScore consists of image-text pairs, where the image is a page of a music score and the text is the metadata of the music. The metadata of MusicScore is extracted from the general information section of the IMSLP pages. The metadata includes rich information about the composer, instrument, piece style, and genre of the music pieces. MusicScore is curated into small, medium, and large scales of 400, 14k, and 200k image-text pairs with varying diversity, respectively. We build a score generation system based on a UNet diffusion model to generate visually readable music scores conditioned on text descriptions to benchmark the MusicScore dataset for music score generation. MusicScore is released to the public at https://huggingface.co/datasets/ZheqiDAI/MusicScore.

en cs.MM, cs.GR
arXiv Open Access 2024
The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models

Jiajia Li, Lu Yang, Mingni Tang et al.

Benchmarks play a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-related capabilities of LLMs. ZIQI-Eval encompasses a wide range of questions, covering 10 major categories and 56 subcategories, resulting in over 14,000 meticulously curated data entries. Using ZIQI-Eval, we conduct a comprehensive evaluation of 16 LLMs to analyze their performance in the domain of music. Results indicate that all LLMs perform poorly on the ZIQI-Eval benchmark, suggesting significant room for improvement in their musical capabilities. With ZIQI-Eval, we aim to provide a standardized and robust evaluation framework that facilitates a comprehensive assessment of LLMs' music-related abilities. The dataset is available on GitHub (https://github.com/zcli-charlie/ZIQI-Eval) and HuggingFace (https://huggingface.co/datasets/MYTH-Lab/ZIQI-Eval).

en cs.SD, cs.AI
arXiv Open Access 2024
Music Consistency Models

Zhengcong Fei, Mingyuan Fan, Junshi Huang

Consistency models have exhibited remarkable capabilities in facilitating efficient image and video generation, enabling synthesis with minimal sampling steps. They have proven advantageous in mitigating the computational burdens associated with diffusion models. Nevertheless, the application of consistency models to music generation remains largely unexplored. To address this gap, we present Music Consistency Models (MusicCM), which leverage the concept of consistency models to efficiently synthesize mel-spectrograms for music clips, maintaining high quality while minimizing the number of sampling steps. Building upon existing text-to-music diffusion models, the MusicCM model incorporates consistency distillation and adversarial discriminator training. Moreover, we find it beneficial to generate extended coherent music by incorporating multiple diffusion processes with shared constraints. Experimental results demonstrate the effectiveness of our model in terms of computational efficiency, fidelity, and naturalness. Notably, MusicCM achieves seamless music synthesis with a mere four sampling steps, which takes only one second per minute of music clip, showcasing its potential for real-time application.

en cs.SD, cs.AI
arXiv Open Access 2024
DAIRHuM: A Platform for Directly Aligning AI Representations with Human Musical Judgments applied to Carnatic Music

Prashanth Thattai Ravikumar

Quantifying and aligning music AI model representations with human behavior is an important challenge in the field of MIR. This paper presents a platform for exploring the Direct alignment between AI music model Representations and Human Musical judgments (DAIRHuM). It is designed to enable musicians and experimentalists to label similarities in a dataset of music recordings, and examine a pre-trained model's alignment with their labels using quantitative scores and visual plots. DAIRHuM is applied to analyze alignment between NSynth representations, and a rhythmic duet between two percussionists in a Carnatic quartet ensemble, an example of a genre where annotated data is scarce and assessing alignment is non-trivial. The results demonstrate significant findings on model alignment with human judgments of rhythmic harmony, while highlighting key differences in rhythm perception and music similarity judgments specific to Carnatic music. This work is among the first efforts to enable users to explore human-AI model alignment in Carnatic music and advance MIR research in Indian music while dealing with data scarcity and cultural specificity. The development of this platform provides greater accessibility to music AI tools for under-represented genres.

en cs.SD, cs.AI
DOAJ Open Access 2024
Living Water: a look at sufficiency in Cameroon

Blick Bassy

In this interview Blick Bassy, a long-term campaigner for the preservation of water resources, talks about his vision of sufficiency. Many people in Cameroon who live without permanent access to tap-water intuitively manage water resources frugally. Their approach is rooted in a more direct relationship to water which, being visible, is almost a living thing. But in Cameroon and other African countries, water plays a vital spiritual and cultural role too. Ultimately, raising awareness about the scarcity of water resources can be achieved not only by focusing on water’s universal character as a common foundation for all, but also by using the art of music to shape new perceptions about water, as demonstrated by Blick Bassy.

Social Sciences
DOAJ Open Access 2024
Serbian folk costumes in textbooks of the first cycle of education

Vasilijević Danijela N., Sudzilovski Danijela M.

Amid complex geopolitical changes worldwide, national narratives have become an indispensable research topic across various scientific fields. Acknowledging the various systemic, complex, and mutually conditioned approaches to development, as well as the complexity of the concept of national identity, this paper examines from a pedagogical perspective the representation, character, and development of the concept of the Serbian national costume, which is regarded as an indispensable example of the Serbian national habitus. The paper addresses two questions: how the Serbian national costume is presented in the textbooks of the first cycle of education, and whether their approach to constructing a comprehensive idea of national identity is formal or substantive. The research focuses on 24 textbooks of Science and Social Studies and of Music (from three publishers) for the first four grades of primary school. The findings show that the Serbian national costume is presented insufficiently, unsystematically, formally, and often incorrectly in the analyzed textbooks at the representational level; visual representations dominate over textual descriptions, questions, tasks, and requests; no development of the concept's content dimensions in terms of intensity and extent was observed; and there is no horizontal or vertical systematic correlation of content.

History (General) and history of Europe, Social sciences (General)
DOAJ Open Access 2024
Iran’s Shahriaran Rastakhize Opera: An Anthropological Interpretation

Alireza Ghobadi

Mirzadeh Eshghi (1894–1924) was an innovative and patriotic Iranian poet. Mirzadeh is noted in the history of Iranian literature for pioneering a literary revolution and creating a new literary style. This qualitative study contemplates one of his literary innovations titled Iran’s Shahriaran Rastakhiz Opera, scrutinizing historical documents and employing ethnographic techniques. It expresses Mirzadeh’s concerns about the damage inflicted on the material and nonmaterial culture of Iranians after the Achaemenid era. This opera form utilizes four different genres of Iranian classical music, with six singers performing poems about cultural and social changes. Mirzadeh’s opera is a very attractive tool for inculcating social and cultural awareness, especially regarding Iran’s national and cultural identity. This study probes the diverse sociocultural and political functions accomplished by this dramatic work of art and simultaneously examines its problems.

Fine Arts
arXiv Open Access 2023
Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning

Shansong Liu, Atin Sakkeer Hussain, Chenshuo Sun et al.

Text-to-music generation (T2M-Gen) faces a major obstacle due to the scarcity of large-scale publicly available music datasets with natural language captions. To address this, we propose the Music Understanding LLaMA (MU-LLaMA), capable of answering music-related questions and generating captions for music files. Our model utilizes audio representations from a pretrained MERT model to extract music features. However, obtaining a suitable dataset for training the MU-LLaMA model remains challenging, as existing publicly accessible audio question answering datasets lack the necessary depth for open-ended music question answering. To fill this gap, we present a methodology for generating question-answer pairs from existing audio captioning datasets and introduce the MusicQA Dataset designed for answering open-ended music-related questions. The experiments demonstrate that the proposed MU-LLaMA model, trained on our designed MusicQA dataset, achieves outstanding performance in both music question answering and music caption generation across various metrics, outperforming current state-of-the-art (SOTA) models in both fields and offering a promising advancement in the T2M-Gen research field.

en cs.SD, cs.AI
arXiv Open Access 2023
Music Mode: Transforming Robot Movement into Music Increases Likability and Perceived Intelligence

Catie Cuan, Emre Fisher, Allison Okamura et al.

As robots enter everyday spaces like offices, the sounds they create affect how they are perceived. We present Music Mode, a novel mapping between a robot's joint motions and sounds, programmed by artists and engineers to make the robot generate music as it moves. Two experiments were designed to characterize the effect of this musical augmentation on human users. In the first experiment, a robot performed three tasks while playing three different sound mappings. Results showed that participants observing the robot perceived it as safer, more animate, more intelligent, more anthropomorphic, and more likable when playing the Music Mode Orchestra software. To test whether the results of the first experiment were due to the Music Mode algorithm rather than to music alone, we conducted a second experiment. Here the robot performed the same three tasks while a participant observed via video, but the Orchestra music was either linked to its movement or random. Participants rated the robots as more intelligent when the music was linked to the movement. Robots using Music Mode logged approximately two hundred hours of operation while navigating, wiping tables, and sorting trash, and bystander comments made during this operating time served as an embedded case study. This paper makes both designerly and engineering contributions: (1) an interdisciplinary choreographic, musical, and coding design process to develop a real-world robot sound feature, (2) a technical implementation for movement-based sound generation, and (3) two experiments and an embedded case study of robots running this feature during daily work activities that resulted in increased likability and perceived intelligence of the robot.

en cs.RO
arXiv Open Access 2023
ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models

Pengfei Zhu, Chao Pang, Yekun Chai et al.

In recent years, the burgeoning interest in diffusion models has led to significant advances in image and speech generation. Nevertheless, the direct synthesis of music waveforms from unrestricted textual prompts remains a relatively underexplored domain. In response to this lacuna, this paper introduces a pioneering contribution in the form of a text-to-waveform music generation model, underpinned by the utilization of diffusion models. Our methodology hinges on the innovative incorporation of free-form textual prompts as conditional factors to guide the waveform generation process within the diffusion model framework. Addressing the challenge of limited text-music parallel data, we undertake the creation of a dataset by harnessing web resources, a task facilitated by weak supervision techniques. Furthermore, a rigorous empirical inquiry is undertaken to contrast the efficacy of two distinct prompt formats for text conditioning, namely, music tags and unconstrained textual descriptions. The outcomes of this comparative analysis affirm the superior performance of our proposed model in terms of enhancing text-music relevance. Finally, our work culminates in a demonstrative exhibition of the excellent capabilities of our model in text-to-music generation. We further demonstrate that our generated music in the waveform domain outperforms previous works by a large margin in terms of diversity, quality, and text-music relevance.

en cs.SD, cs.AI
arXiv Open Access 2023
CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval

Shangda Wu, Dingyao Yu, Xu Tan et al.

We introduce CLaMP: Contrastive Language-Music Pre-training, which learns cross-modal representations between natural language and symbolic music using a music encoder and a text encoder trained jointly with a contrastive loss. To pre-train CLaMP, we collected a large dataset of 1.4 million music-text pairs. CLaMP employs text dropout as a data augmentation technique, and bar patching to represent music data efficiently, reducing sequence length to less than 10%. In addition, we developed a masked music model pre-training objective to enhance the music encoder's comprehension of musical context and structure. CLaMP integrates textual information to enable semantic search and zero-shot classification for symbolic music, surpassing the capabilities of previous models. To support the evaluation of semantic search and music classification, we publicly release WikiMusicText (WikiMT), a dataset of 1010 lead sheets in ABC notation, each accompanied by a title, artist, genre, and description. Compared with state-of-the-art models that require fine-tuning, zero-shot CLaMP demonstrated comparable or superior performance on score-oriented datasets. Our models and code are available at https://github.com/microsoft/muzic/tree/main/clamp.

en cs.SD, cs.IR
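The abstract says CLaMP trains its music and text encoders jointly with a contrastive loss. A common form of such a loss is the symmetric, CLIP-style InfoNCE objective sketched below; treating it as CLaMP's exact loss, and the default temperature of 0.07, are assumptions. The function takes already-computed embeddings, so the encoders themselves are out of scope.

```python
import math


def _normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def _log_softmax_at(logits: list[float], index: int) -> float:
    """log(softmax(logits)[index]), computed stably by subtracting the max."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return logits[index] - log_sum


def symmetric_contrastive_loss(music_emb: list[list[float]],
                               text_emb: list[list[float]],
                               temperature: float = 0.07) -> float:
    """CLIP-style InfoNCE loss over a batch of paired embeddings.

    Pair i (music_emb[i], text_emb[i]) is the positive; every other pairing in
    the batch serves as a negative.
    """
    if len(music_emb) != len(text_emb) or not music_emb:
        raise ValueError("need a non-empty batch with equally many music and text embeddings")
    music = [_normalize(v) for v in music_emb]
    text = [_normalize(v) for v in text_emb]
    n = len(music)
    # logits[i][j]: cosine similarity of music i and text j, divided by the temperature.
    logits = [[sum(a * b for a, b in zip(music[i], text[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Music-to-text: each row should concentrate its probability on the diagonal.
    music_to_text = -sum(_log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-music: the same requirement applied to each column.
    text_to_music = -sum(_log_softmax_at([logits[j][i] for j in range(n)], i)
                         for i in range(n)) / n
    return (music_to_text + text_to_music) / 2
```

With correctly paired embeddings the loss is small, and it grows when texts are paired with the wrong music; minimizing it pulls matching music and text representations together, which is what enables the semantic search and zero-shot classification described above.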
arXiv Open Access 2023
Multi-Genre Music Transformer -- Composing Full Length Musical Piece

Abhinav Kaushal Keshari

In the task of generating music, the artistic factor plays a big role and poses a great challenge for AI. Previous work using adversarial training to produce new music pieces and modeling the compatibility of musical variety (beats, tempo, musical stems) demonstrated strong examples of learning this task, though it was limited to generating mashups or learning features from tempo and key distributions to produce similar patterns. The Compound Word Transformer represented music generation as a sequence generation task involving musical events defined by compound words. These musical events give a more accurate description of note progression, chord changes, harmony, and the artistic factor. The objective of this project is to implement a Multi-Genre Transformer that learns to produce music pieces through a more adaptive learning process on a more challenging task, in which the genre or form of the composition is also considered. We built a multi-genre compound word dataset and implemented a linear transformer trained on this dataset. We call this model the Multi-Genre Transformer; it was able to generate full-length new musical pieces that are diverse and comparable to the original tracks. The model trains 2-5 times faster than the other models discussed.

en cs.SD, cs.LG
DOAJ Open Access 2023
ROSES, TOMATO CHUTNEY AND RISING SUN: ON VISIBILITY OF THREE FESTIVALS IN BULGARIA

Svetlana D. HRISTOVA-VLADI

Objectives. This study focuses on the visibility of three local festivals in Bulgaria: the Rose Festival in Kazanlak, July Morning at Kamen bryag, and the Festival of Peppers, Tomatoes, Traditional Foods, and Crafts in Kurtovo Konare. The research on festive visibility has been deconstructed into three components of analysis: story, local imagery, and photogenicity (colors, photographic visuals). Material and methods. These include participant observation, in-depth interviews, analysis of visuals (website and media visuals as well as photographs taken by the researcher), and desk research of scientific literature and online media outlets. Results. The researcher conducted fieldwork as a participant observer, interviewer, photographer, and visual analyst of festive events. It was found that the Rose Festival promotes pink symbols as prevalent elements of its cultural-historical branding, encompassing Thracian heritage and rose farming. July Morning has been commodified into fragmented celebrations taking place at the peripheral moment between 30 June and 1 July. This has obscured the sense of community and the sense of place associated with the original phenomenon. Local farmers’ aesthetics and diligence play a central role in the publicity of Kurtovo Konare Fest: their agrarian knowledge and their willingness to participate actively in social life, upskill, and exchange know-how with fellow farmers. Conclusions. The three local celebrations represent collections of sensations, colors, imagined experiences, memories, visitors’ expectations, a sense of community, and an awakened sense of place. The optics of the Rose Festival in Kazanlak comprise contrasting messages: the pink aesthetics represents beauty and the traditional means of local livelihood; however, the flashy pink ambience somewhat mutes the demands of the rose farmers, as seen in pieces of critical journalism.
The July Morning Festival has been largely deterritorialized from its original place into dispersed celebrations that do not reproduce the original code of conduct. In the locality of Kamen bryag, however, the scent of wild nature and sea salt still reunites a few generations of like-minded people, mostly admirers of rock music and camping. At the heart of the optics of Kurtovo Konare Fest are the village producers, eager to raise their voices in defense of their production and to generate a distinctive local ethos.

Geography. Anthropology. Recreation, Anthropology

Page 22 of 52918