Tao Li, G. Tzanetakis
Results for "Music"
Showing 20 of ~1,058,345 results · from CrossRef, arXiv, DOAJ, Semantic Scholar
Don Redmond
Number theory, an abstract branch of mathematics that deals with relationships between whole numbers, has provided highly useful answers to numerous real-world problems. The author briefly reviews earlier uses of number theory and then examines recent applications to music, cryptography, and error-correction codes.
Christopher Small
G. Gorn
Irmak Bukey, Zhepei Wang, Chris Donahue et al.
Music captioning, or the task of generating a natural language description of music, is useful for both music understanding and controllable music generation. Training captioning models, however, typically requires high-quality music caption data, which is scarce compared to metadata (e.g., genre, mood, etc.). As a result, it is common to use large language models (LLMs) to synthesize captions from metadata to generate training data for captioning models, though this process imposes a fixed stylization and entangles factual information with natural language style. As a more direct approach, we propose metadata-based captioning. We train a metadata prediction model to infer detailed music metadata from audio and then convert it into expressive captions via pre-trained LLMs at inference time. Compared to a strong end-to-end baseline trained on LLM-generated captions derived from metadata, our method: (1) achieves comparable performance with less training time than end-to-end captioners, (2) offers the flexibility to easily change stylization post-training, enabling output captions to be tailored to specific stylistic and quality requirements, and (3) can be prompted with audio and partial metadata to enable powerful metadata imputation or in-filling, a common task for organizing music data.
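The two-stage pipeline described above (audio to predicted metadata, then metadata to an LLM-stylized caption at inference time) can be sketched roughly as follows; every function, model, and value in this sketch is a hypothetical placeholder rather than the authors' code.

```python
# Illustrative two-stage captioning pipeline; all names here are hypothetical placeholders.

def predict_metadata(audio_path: str) -> dict:
    """Stand-in for a trained metadata prediction model (audio -> tags)."""
    # A real system would run an audio encoder here; we return a fixed example instead.
    return {"genre": "jazz", "mood": "mellow", "tempo_bpm": 92,
            "instruments": ["piano", "upright bass"]}

def metadata_to_caption(metadata: dict, style: str = "concise") -> str:
    """Stand-in for prompting a pre-trained LLM at inference time."""
    prompt = f"Write a {style} natural-language music caption from this metadata: {metadata}"
    # A real system would send `prompt` to an LLM; here we just return the prompt itself.
    return prompt

caption = metadata_to_caption(predict_metadata("track.wav"),
                              style="verbose caption aimed at playlist curators")
print(caption)
```

Because the metadata prediction model and the stylizing LLM are decoupled, the caption style can be changed after training simply by changing the prompt, which is the flexibility the abstract highlights.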
Xingjian Diao, Chunhui Zhang, Tingxuan Wu et al.
Music performances are representative scenarios for audio-visual modeling. Unlike common scenarios with sparse audio, music performances continuously involve dense audio signals throughout. While existing multimodal learning methods for audio-video QA demonstrate impressive capabilities in general scenarios, they are incapable of dealing with fundamental problems within music performances: they underexplore the interaction between the multimodal signals in a performance and fail to consider the distinctive characteristics of instruments and music. Therefore, existing methods tend to answer questions about musical performances inaccurately. To bridge these research gaps, (i) given the intricate multimodal interconnectivity inherent to music data, our primary backbone is designed to incorporate multimodal interactions within the context of music; (ii) to enable the model to learn music characteristics, we annotate and release rhythmic and music sources in current music datasets; (iii) for time-aware audio-visual modeling, we align the model's music predictions with the temporal dimension. Our experiments show state-of-the-art results on the Music AVQA datasets. Our code is available at https://github.com/xid32/Amuse.
Seonghyeon Go
As a result of continuous advances in Music Information Retrieval (MIR) technology, generating and distributing music has become more diverse and accessible. In this context, interest in music intellectual property protection is increasing to safeguard individual music copyrights. In this work, we propose a system for detecting music plagiarism by combining various MIR technologies. We developed a music segment transcription system that extracts musically meaningful segments from audio recordings to detect plagiarism across different musical formats. With this system, we compute similarity scores based on multiple musical features that can be evaluated through comprehensive musical analysis. Our approach demonstrated promising results in music plagiarism detection experiments, and the proposed method can be applied to real-world music scenarios. We also collected a Similar Music Pair (SMP) dataset for musical similarity research using real-world cases. The dataset is publicly available.
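The idea of scoring similarity over multiple musical features can be illustrated with a minimal sketch; the feature names, weights, and cosine-similarity combination below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

# Minimal sketch of combining per-feature similarity scores into one plagiarism score.
# Feature names and weights are illustrative assumptions, not the paper's exact design.

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def similarity_score(features_a: dict, features_b: dict, weights: dict) -> float:
    """Weighted average of per-feature similarities (melody, harmony, rhythm, ...)."""
    total, weight_sum = 0.0, 0.0
    for name, w in weights.items():
        total += w * cosine_similarity(features_a[name], features_b[name])
        weight_sum += w
    return total / weight_sum

# Fake feature vectors for two segments; segment B is a slightly perturbed copy of A.
rng = np.random.default_rng(0)
feats_a = {k: rng.normal(size=64) for k in ("melody", "harmony", "rhythm")}
feats_b = {k: v + 0.1 * rng.normal(size=64) for k, v in feats_a.items()}
print(similarity_score(feats_a, feats_b, {"melody": 0.5, "harmony": 0.3, "rhythm": 0.2}))
```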
Daniel Chenyu Lin, Michael Freeman, John Thickstun
Music language models (Music LMs), like vision language models, leverage multimodal representations to answer natural language queries about musical audio recordings. Although Music LMs are reportedly improving, we find that current evaluations fail to capture whether their answers are correct. Specifically, for all Music LMs that we examine, widely-used evaluation metrics such as BLEU, METEOR, and BERTScore fail to measure anything beyond linguistic fluency of the model's responses. To measure the true performance of Music LMs, we propose (1) a better general-purpose evaluation metric for Music LMs adapted to the music domain and (2) a factual evaluation framework to quantify the correctness of a Music LM's responses. Our framework is agnostic to the modality of the question-answering model and could be generalized to quantify performance in other open-ended question-answering domains. We use open datasets in our experiments and will release all code on publication.
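As a toy illustration of why n-gram overlap metrics reward fluency rather than correctness, the snippet below (with invented reference and candidate sentences, using NLTK's BLEU implementation) shows a factually wrong but fluent answer scoring close to a correct one.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Invented example sentences: the "wrong" answer contradicts the reference on key and tempo
# yet shares most of its n-grams, so surface overlap metrics barely penalize it.
reference = "the piece is in a minor key at a slow tempo".split()
correct   = "the piece is in a minor key at a slow tempo".split()
wrong     = "the piece is in a major key at a fast tempo".split()

smooth = SmoothingFunction().method1
print("correct:", sentence_bleu([reference], correct, smoothing_function=smooth))
print("wrong:  ", sentence_bleu([reference], wrong, smoothing_function=smooth))
```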
Sreyan Ghosh, Arushi Goel, Lasha Koroshinadze et al.
We introduce Music Flamingo, a novel large audio-language model designed to advance music (including song) understanding in foundational audio models. While audio-language research has progressed rapidly, music remains challenging due to its dynamic, layered, and information-dense nature. Progress has been further limited by the difficulty of scaling open audio understanding models, primarily because of the scarcity of high-quality music data and annotations. As a result, prior models are restricted to producing short, high-level captions, answering only surface-level questions, and showing limited generalization across diverse musical cultures. To address these challenges, we curate MF-Skills, a large-scale dataset labeled through a multi-stage pipeline that yields rich captions and question-answer pairs covering harmony, structure, timbre, lyrics, and cultural context. We fine-tune an enhanced Audio Flamingo 3 backbone on MF-Skills and further strengthen multiple skills relevant to music understanding. To improve the model's reasoning abilities, we introduce a post-training recipe: we first cold-start with MF-Think, a novel chain-of-thought dataset grounded in music theory, followed by GRPO-based reinforcement learning with custom rewards. Music Flamingo achieves state-of-the-art results across 10+ benchmarks for music understanding and reasoning, establishing itself as a generalist and musically intelligent audio-language model. Beyond strong empirical results, Music Flamingo sets a new standard for advanced music understanding by demonstrating how models can move from surface-level recognition toward layered, human-like perception of songs. We believe this work provides both a benchmark and a foundation for the community to build the next generation of models that engage with music as meaningfully as humans do.
Rebecca Lepping, Benjamin J. Hess, Jasmine M. Taylor et al.
Objectives/Goals: Research supports the use of music to improve the care and well-being of adults living with dementia; however, the practice and implementation of music in elder care communities is not regulated. The goal of this qualitative study was to survey elder care communities in Northeast Kansas to determine the use of music with people living with dementia. Methods/Study Population: We interviewed staff (n = 10) at five elder care communities in the Kansas City Metro area and observed musical activities and artifacts in shared living spaces within each community. Interview questions included details of the frequency and purpose of using music, who determined which music to use, and any effects, positive or negative, the interviewee believed to be associated with the use of music. Musical events, such as visiting musicians or music therapists leading group sing-alongs, were observed at two communities, and staff-led music-related activities were observed at two others. Results/Anticipated Results: Music was used in some way at each of the five communities. Each location had recorded music available to residents in the shared living spaces, and most had a piano in the main lounge area. During the sing-along and music-related activities, residents were observed singing along to songs from memory, engaging with one another and the group leader, and smiling. Staff employed by each community varied in their level of musical training and experience, from none to a full-time music therapist in residence. Staff interviewed said they believed music was helpful to aid memory recall, reduce anxiety, and engage interest. Interestingly, a music therapist at one site also described how music during mealtimes created too much of a distraction for residents and interfered with dietary care. Discussion/Significance of Impact: It is clear from both the staff interviews and direct observations of musical activities that music is important to consider for people living with dementia in care communities. Guidelines for implementation and minimum standards would be helpful to ensure all care community residents can experience the benefits highlighted by staff in this study.
Najmeh Pourgholamali, Zeynab Sharifi, Ali Salehi et al.
Background: Stress is defined as a state of mental or emotional strain. Given the calming effects of music and the emergence of a new genre known as three-dimensional (3D) music, this study aimed to evaluate the effectiveness of 3D music therapy in reducing dental anxiety in 6–7-year-old children. Methods and Materials: This applied, quasi-experimental study was conducted with a pre-test/post-test design. The sample was selected from 6–7-year-old children admitted to a private dental clinic in Rafsanjan, Iran. Prior to receiving any treatment, patients were evaluated for dental anxiety using the Spence Children's Anxiety Scale. Children with relatively high or high levels of dental anxiety were included in the study. A total of 60 children were enrolled and randomly divided into three equal groups (n=20): control, regular music, and 3D music. All participants underwent the same dental treatment during their second session, and pre- and post-treatment anxiety levels were measured for all. Intra- and inter-group comparisons were conducted. A p-value < 0.05 was considered statistically significant. Results: The three groups were homogeneous with respect to potential confounding variables such as the child's age, parental age, education level, and occupation. According to the Kruskal-Wallis test, there was no statistically significant difference between the study groups in terms of baseline anxiety levels (P=0.883). However, after the intervention, the mean anxiety score increased in the control group (2.90 ± 1.12 versus 3.45 ± 1.32), while it decreased in both the regular music (2.90 ± 1.45 versus 2.70 ± 1.45) and 3D music groups (3.05 ± 1.23 versus 2.95 ± 1.31). Conclusion: Given the reduction in anxiety levels observed in both the regular music and 3D music groups, the use of 3D music therapy can be recommended as an effective method to alleviate dental anxiety in pediatric patients.
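For readers unfamiliar with the baseline comparison reported above, the sketch below shows how a Kruskal-Wallis test across three groups is typically run; the scores are synthetic and are not the study's data.

```python
from scipy import stats

# Synthetic pre-treatment anxiety scores for illustration only (not the study's data):
# three groups of 20 children each.
control       = [3, 2, 4, 3, 2, 3, 4, 2, 3, 3, 2, 4, 3, 2, 3, 4, 3, 2, 3, 3]
regular_music = [3, 3, 2, 4, 2, 3, 3, 2, 4, 3, 2, 3, 3, 4, 2, 3, 3, 2, 3, 4]
music_3d      = [2, 3, 3, 4, 3, 2, 3, 4, 3, 2, 3, 3, 2, 4, 3, 3, 2, 3, 4, 3]

# Kruskal-Wallis H-test: are the three baseline distributions plausibly the same?
stat, p = stats.kruskal(control, regular_music, music_3d)
print(f"H = {stat:.3f}, p = {p:.3f}")  # p > 0.05 would indicate comparable baselines
```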
Mayank Sanganeria, Rohan Gala
Recent AI-driven step-function advances in several longstanding problems in music technology are opening up new avenues to create the next generation of music education tools. Creating personalized, engaging, and effective learning experiences are continuously evolving challenges in music education. Here we present two case studies using such advances in music technology to address these challenges. In our first case study we showcase an application that uses Automatic Chord Recognition to generate personalized exercises from audio tracks, connecting traditional ear training with real-world musical contexts. In the second case study we prototype adaptive piano method books that use Automatic Music Transcription to generate exercises at different skill levels while retaining a close connection to musical interests. These applications demonstrate how recent AI developments can democratize access to high-quality music education and promote rich interaction with music in the age of generative AI. We hope this work inspires other efforts in the community, aimed at removing barriers to access to high-quality music education and fostering human participation in musical expression.
WeiHsiang Liao, Yuhta Takida, Yukara Ikemiya et al.
We demonstrate the efficacy of using intermediate representations from a single foundation model to enhance various music downstream tasks. We introduce SoniDo, a music foundation model (MFM) designed to extract hierarchical features from target music samples. By leveraging hierarchical intermediate features, SoniDo constrains the information granularity, leading to improved performance across various downstream tasks including both understanding and generative tasks. We specifically evaluated this approach on representative tasks such as music tagging, music transcription, music source separation, and music mixing. Our results reveal that the features extracted from foundation models provide valuable enhancements in training downstream task models. This highlights the capability of using features extracted from music foundation models as a booster for downstream tasks. Our approach not only benefits existing task-specific models but also supports music downstream tasks constrained by data scarcity. This paves the way for more effective and accessible music processing solutions.
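The general pattern of reusing hierarchical intermediate features from a pre-trained model for downstream tasks can be sketched generically in PyTorch; the toy encoder, pooling, and tagging head below are stand-ins and do not reflect SoniDo's actual architecture.

```python
import torch
import torch.nn as nn

# Generic sketch (not SoniDo itself): collect intermediate activations from a frozen
# encoder with forward hooks and feed them to a small downstream head (e.g. tagging).

encoder = nn.Sequential(  # stand-in for a pre-trained music foundation model
    nn.Conv1d(1, 16, 9, stride=4), nn.ReLU(),
    nn.Conv1d(16, 32, 9, stride=4), nn.ReLU(),
    nn.Conv1d(32, 64, 9, stride=4), nn.ReLU(),
)
for p in encoder.parameters():
    p.requires_grad = False  # keep the foundation model frozen

features = {}
def save(name):
    def hook(_module, _inp, out):
        features[name] = out.mean(dim=-1)  # pool each hierarchy level over time
    return hook

encoder[1].register_forward_hook(save("low"))
encoder[3].register_forward_hook(save("mid"))
encoder[5].register_forward_hook(save("high"))

head = nn.Linear(16 + 32 + 64, 10)  # downstream head, e.g. 10 tags

audio = torch.randn(2, 1, 16000)    # batch of 2 one-second mono clips (fake data)
encoder(audio)                      # hooks populate `features`
pooled = torch.cat([features["low"], features["mid"], features["high"]], dim=1)
print(head(pooled).shape)           # torch.Size([2, 10])
```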
Meng Chen, Mohammad Mohammadi, Siros Izadpanah
This study aimed to comprehensively analyze the effect of Language Learning through Music on the academic achievement (AA), creative thinking (CT), and self-esteem (SE) of English as a Foreign Language (EFL) learners. With the rapid progress of technology, there has been a growing interest in exploring innovative teaching methods that not only enhance learning outcomes but also actively engage students in the language learning process. However, the specific impact of technology-enhanced language learning through music (TELLTM) on these language learning outcomes has received limited attention in previous research. In 2023, a sample of 360 male elementary-level language learners was selected using a multiple-stage cluster sampling (MSCS) technique. The participants' homogeneity was assessed through the Oxford Quick Placement Test (OQPT), administered following a random sampling procedure. Data collection involved the administration of three questionnaires: the academic achievement questionnaire, the self-esteem questionnaire, and the creative thinking questionnaire. The findings of the study, analyzed through descriptive and inferential statistics, revealed a significant positive impact of TELLTM on the AA, CT, and SE of EFL learners. These results have important implications for educators, curriculum developers, and policymakers, providing valuable insights into the incorporation of TELLTM into English language instruction. The use of these three questionnaires provided valuable insights into the effectiveness of TELLTM in enhancing various aspects of language learning. These findings underscore the importance of incorporating music into language instruction and offer practical guidance for educators seeking to improve their teaching practices.
J. Nattiez
Christos Garoufis, Athanasia Zlatintsi, Petros Maragos
In this paper, we study whether music source separation can be used as a pre-training strategy for music representation learning, targeted at music classification tasks. To this end, we first pre-train U-Net networks under various music source separation objectives, such as the isolation of vocal or instrumental sources from a musical piece; afterwards, we attach a classification network to the pre-trained U-Net and jointly finetune the whole network. The features learned by the separation network are also propagated to the tail network through a convolutional feature adaptation module. Experimental results in two widely used and publicly available datasets indicate that pre-training the U-Nets with a music source separation objective can improve performance compared to both training the whole network from scratch and using the tail network as a standalone in two music classification tasks, music auto-tagging and music genre classification. We also show that our proposed framework can be successfully integrated into both convolutional and Transformer-based backends, highlighting its modularity.
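A conceptual sketch of the two-phase recipe (separation pre-training, then joint finetuning with a classification head) is given below; the toy networks, losses, and data are illustrative stand-ins, not the paper's U-Net or feature adaptation module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Phase 1: pre-train a separator on a source-isolation objective.
# Phase 2: attach a classification head and finetune everything jointly.

separator = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())  # toy masking network
classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))

mix, vocals = torch.rand(4, 1, 128, 128), torch.rand(4, 1, 128, 128)    # fake spectrograms
labels = torch.randint(0, 10, (4,))                                     # fake genre labels

# Phase 1: the predicted mask applied to the mixture should approximate the target source.
sep_opt = torch.optim.Adam(separator.parameters(), lr=1e-3)
sep_opt.zero_grad()
F.l1_loss(separator(mix) * mix, vocals).backward()
sep_opt.step()

# Phase 2: reuse the pre-trained early layers and finetune jointly on classification.
joint_opt = torch.optim.Adam(list(separator.parameters()) + list(classifier.parameters()), lr=1e-4)
joint_opt.zero_grad()
features = separator[1](separator[0](mix))          # intermediate separator features
F.cross_entropy(classifier(features), labels).backward()
joint_opt.step()
```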
Stefan Lattner, Javier Nistal
Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible in human perception. Under high compression rates, such codecs may introduce a variety of impairments in the audio signal. Many works have tackled the problem of audio enhancement and compression artifact removal using deep learning techniques. However, only a few works tackle the restoration of heavily compressed audio signals in the musical domain. In such a scenario, there is no unique solution for the restoration of the original signal. Therefore, in this study, we test a stochastic generator of a Generative Adversarial Network (GAN) architecture for this task. Such a stochastic generator, conditioned on highly compressed musical audio signals, could one day generate outputs indistinguishable from high-quality releases. Therefore, the present study may yield insights into more efficient musical data storage and transmission. We train stochastic and deterministic generators on MP3-compressed audio signals at 16, 32, and 64 kbit/s. We perform an extensive evaluation of the different experiments utilizing objective metrics and listening tests. We find that the models can improve the quality of the audio signals over the MP3 versions for 16 and 32 kbit/s and that the stochastic generators are capable of generating outputs that are closer to the original signals than those of the deterministic generators.
Zhihuan Kuang, Shi Zong, Jianbing Zhang et al.
In this paper, we consider a novel research problem: music-to-text synaesthesia. Different from the classical music tagging problem that classifies a music recording into pre-defined categories, music-to-text synaesthesia aims to generate descriptive texts from music recordings with the same sentiment for further understanding. As existing music-related datasets do not contain the semantic descriptions on music recordings, we collect a new dataset that contains 1,955 aligned pairs of classical music recordings and text descriptions. Based on this, we build a computational model to generate sentences that can describe the content of the music recording. To tackle the highly non-discriminative classical music, we design a group topology-preservation loss, which considers more samples as a group reference and preserves the relative topology among different samples. Extensive experimental results qualitatively and quantitatively demonstrate the effectiveness of our proposed model over five heuristics or pre-trained competitive methods and their variants on our collected dataset.
Lubov Ivanovna Gubareva, Andrey Gorgon'yevich Soloviev, Helen Victorovna Agarkova et al.
The study involved 38 students of a music school and a music college who had successfully learned to play the piano. The control group consisted of 48 secondary school and university students who did not practice music. It was found that, in the process of learning to play the piano, a sense of mode and harmony and a sense of pitch, as well as the elaboration and originality of creative thinking, develop to a greater degree, especially among boys. This results in higher occupational success and social adjustment for male instrumentalists compared with women. The method of computerized chronoreflexometry showed that the physiological basis of musical talent and creative thinking is the higher lability and excitability of the central nervous system, together with a higher level of activation and reliability of its functioning.
Alena D. Verin-Galitskaya
The beginnings of opera history are usually associated with Euridice by Ottavio Rinuccini and Jacopo Peri as the first staged and preserved example of the genre. Not many people know that Euridice was by no means the main event of the wedding celebrations in honour of Maria de’ Medici and King Henry IV of France in 1600, with which the opera was timed to coincide. The audience was much more drawn to the opera Il rapimento di Cefalo (The Abduction of Cephalus) by Giulio Caccini, a direct rival of Peri. As the music has not completely survived, we know about Il rapimento di Cefalo mainly from the reviews of contemporaries. Historical materials allow us to recreate the genesis of the opera, which is inseparable from the history of opera as a genre. The reported study focuses on personal ambitions, court intrigues, and the rivalry between the Florentines and Emilio de Cavalieri. It also explores other similar factors without which the genre of opera would have taken a different historical path. Besides, the article describes the political and cultural landscape at the court of Ferdinando de’ Medici. The history of Caccini’s opera is analyzed against the general backdrop of Florentine musical art of the last quarter of the 16th century.
Page 19 of 52918