Results for "Music and books on Music"

Showing 20 of ~889,071 results · from DOAJ, arXiv, CrossRef

DOAJ Open Access 2026
Social Music as a Prescription for Maintaining Wellness

Joanne Loewy, Jon Batiste

Growing attention highlights the potential of music as a social prescription for enhancing wellness. Explicating music’s function in communities can lead to healthy outcomes. During the COVID-19 pandemic, communities tended to isolate, increasingly avoiding in-person interaction. Progress in music-based interventions highlights the potential of live music to improve our sense of community, and the need for social prescribing to develop and maintain community brings us together. We propose a model that highlights how music serves as a social modality for maintaining wellness. Writing as a musician and a music therapist, we analyze historical contexts across time and show how, through humanity’s struggles, we have relied on music’s integrative elements to unite us, moving toward resilience as a quest to survive. Integrating and expanding upon these trajectories, we elucidate ways in which live engagement in music can strengthen performance, health, and wellness. Music’s capacity to treat the social aspects of humanity changes the way humanity works within community, including our sense of connection and togetherness, which feeds a social willingness to form intimate relationships. In contexts where healthcare systems prioritize symptom management, we must recognize that the larger picture includes how we socialize; Social Music is therefore presented here as an inclusive model of care.

Music, Psychology
arXiv Open Access 2026
Video-based Music Generation

Serkan Sulun

As the volume of video content on the internet grows rapidly, finding a suitable soundtrack remains a significant challenge. This thesis presents EMSYNC (EMotion and SYNChronization), a fast, free, and automatic solution that generates music tailored to the input video, enabling content creators to enhance their productions without composing or licensing music. Our model creates music that is emotionally and rhythmically synchronized with the video. A core component of EMSYNC is a novel video emotion classifier. By leveraging pretrained deep neural networks for feature extraction and keeping them frozen while training only fusion layers, we reduce computational complexity while improving accuracy. We show the generalization abilities of our method by obtaining state-of-the-art results on Ekman-6 and MovieNet. Another key contribution is a large-scale, emotion-labeled MIDI dataset for affective music generation. We then present an emotion-based MIDI generator, the first to condition on continuous emotional values rather than discrete categories, enabling nuanced music generation aligned with complex emotional content. To enhance temporal synchronization, we introduce a novel temporal boundary conditioning method, called "boundary offset encodings," aligning musical chords with scene changes. Combining video emotion classification, emotion-based music generation, and temporal boundary conditioning, EMSYNC emerges as a fully automatic video-based music generator. User studies show that it consistently outperforms existing methods in terms of music richness, emotional alignment, temporal synchronization, and overall preference, setting a new state-of-the-art in video-based music generation.

en cs.LG, cs.AI
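
The EMSYNC abstract above describes keeping pretrained feature extractors frozen and training only fusion layers. A minimal PyTorch sketch of that general strategy follows; the module names, dimensions, and six-class output (suggesting Ekman-6) are illustrative assumptions, not the authors' actual architecture.

```python
# Hypothetical sketch of the frozen-backbone strategy: pretrained extractors
# stay frozen, only lightweight fusion layers are trained.
import torch
import torch.nn as nn

class FrozenFusionClassifier(nn.Module):
    def __init__(self, backbones, feat_dims, num_classes=6):
        super().__init__()
        self.backbones = nn.ModuleList(backbones)
        # Freeze every pretrained extractor: no gradients, eval-mode behavior.
        for bb in self.backbones:
            bb.eval()
            for p in bb.parameters():
                p.requires_grad = False
        # Only this fusion head receives gradient updates.
        self.fusion = nn.Sequential(
            nn.Linear(sum(feat_dims), 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, inputs):
        with torch.no_grad():  # frozen features: cheap forward, no backprop
            feats = [bb(x) for bb, x in zip(self.backbones, inputs)]
        return self.fusion(torch.cat(feats, dim=-1))

# Toy usage with stand-in "pretrained" extractors.
visual = nn.Linear(512, 128)   # placeholder for a pretrained video backbone
audio = nn.Linear(128, 64)     # placeholder for a pretrained audio backbone
model = FrozenFusionClassifier([visual, audio], feat_dims=[128, 64])
logits = model([torch.randn(2, 512), torch.randn(2, 128)])
print(logits.shape)  # torch.Size([2, 6]) -- e.g. six emotion classes
```
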
arXiv Open Access 2025
Multi-Agent Semantic-Emotion-Aligned Music-to-Image Generation with Music-Derived Captions

Junchang Shi, Gang Li

When people listen to music, they often experience rich visual imagery. We aim to externalize this inner imagery by generating images conditioned on music. We propose MESA-MIG, a multi-agent, semantic- and emotion-aligned framework that first produces structured music captions and then refines them with cooperating agents specializing in scene, motion, style, color, and composition. In parallel, a Valence-Arousal regression head predicts continuous affective states from music, while a CLIP-based visual VA head estimates emotions from images. These components jointly enforce semantic and emotional alignment between music and synthesized images. Experiments on curated music-image pairs show that MESA-MIG outperforms caption-only and single-agent baselines in aesthetic quality, semantic consistency, and VA alignment, and achieves competitive emotion-regression performance compared with state-of-the-art music and image emotion models.

en cs.MM
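
As a rough illustration of the joint semantic and emotional alignment the abstract describes, the sketch below combines a cosine-similarity term over embeddings with an MSE term over continuous valence-arousal (VA) predictions. All tensors are placeholders; the weighting and loss forms are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def alignment_loss(music_emb, image_emb, music_va, image_va, w_va=1.0):
    # Semantic term: pull music and image embeddings together.
    semantic = 1 - F.cosine_similarity(music_emb, image_emb, dim=-1).mean()
    # Affective term: match continuous (valence, arousal) predictions.
    affective = F.mse_loss(image_va, music_va)
    return semantic + w_va * affective

music_emb = torch.randn(4, 512)             # placeholder music embeddings
image_emb = torch.randn(4, 512)             # placeholder CLIP image embeddings
music_va = torch.tensor([[0.6, -0.2]] * 4)  # VA predicted from audio
image_va = torch.tensor([[0.4,  0.1]] * 4)  # VA estimated from the image
print(alignment_loss(music_emb, image_emb, music_va, image_va).item())
```
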
arXiv Open Access 2025
Can Impressions of Music be Extracted from Thumbnail Images?

Takashi Harada, Takehiro Motomitsu, Katsuhiko Hayashi et al.

In recent years, there has been a notable increase in research on machine learning models for music retrieval and generation systems capable of taking natural-language sentences as inputs. However, large-scale publicly available datasets consisting of music data and corresponding natural-language descriptions, known as music captions, remain scarce. In particular, non-musical information, such as suitable situations for listening to a track and the emotions elicited upon listening, is crucial for describing music, yet it is underrepresented in existing music caption datasets because of the challenges of extracting it directly from music data. To address this issue, we propose a method for generating music caption data that incorporates non-musical aspects inferred from music thumbnail images, and we validate the effectiveness of our approach through human evaluations. Additionally, we create a dataset of approximately 360,000 captions containing non-musical aspects. Leveraging this dataset, we train a music retrieval model and demonstrate its effectiveness on music retrieval tasks.

en cs.CL, cs.CV
arXiv Open Access 2025
Decoding Selective Auditory Attention to Musical Elements in Ecologically Valid Music Listening

Taketo Akama, Zhuohao Zhang, Tsukasa Nagashima et al.

Art has long played a profound role in shaping human emotion, cognition, and behavior. While visual arts such as painting and architecture have been studied through eye tracking, revealing distinct gaze patterns between experts and novices, analogous methods for auditory art forms remain underdeveloped. Music, despite being a pervasive component of modern life and culture, still lacks objective tools to quantify listeners' attention and perceptual focus during natural listening experiences. To our knowledge, this is the first attempt to decode selective attention to musical elements using naturalistic, studio-produced songs and a lightweight consumer-grade EEG device with only four electrodes. By analyzing neural responses during realistic, real-world music listening, we test whether decoding is feasible under conditions that minimize participant burden and preserve the authenticity of the musical experience. Our contributions are fourfold: (i) decoding attention to music in real studio-produced songs, (ii) demonstrating feasibility with a four-channel consumer EEG device, (iii) providing insights for music attention decoding, and (iv) demonstrating improved model performance over prior work. Our findings suggest that musical attention can be decoded not only for novel songs but also across new subjects, with performance improvements over existing approaches under our tested conditions. These findings show that consumer-grade devices can reliably capture the relevant signals and that neural decoding of music could be feasible in real-world settings, paving the way for applications in education, personalized music technologies, and therapeutic interventions.

en q-bio.NC, cs.LG
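
The abstract does not specify the decoding pipeline, but attention decoding from a few-channel EEG is commonly framed as classification over spectral features. A generic, hypothetical sketch (band powers per channel feeding a logistic regression; not the authors' method):

```python
import numpy as np
from scipy.signal import welch
from sklearn.linear_model import LogisticRegression

FS = 256          # assumed sampling rate (Hz)
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(epoch):
    """epoch: (4, n_samples) EEG window -> (4 * n_bands,) feature vector."""
    freqs, psd = welch(epoch, fs=FS, nperseg=FS)
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[:, mask].mean(axis=1))  # mean power per channel
    return np.concatenate(feats)

# Toy data: 100 four-channel epochs, binary label (e.g. vocals vs. drums).
rng = np.random.default_rng(0)
X = np.stack([band_powers(rng.standard_normal((4, FS * 2))) for _ in range(100)])
y = rng.integers(0, 2, size=100)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy data
```
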
arXiv Open Access 2025
MusicEval: A Generative Music Dataset with Expert Ratings for Automatic Text-to-Music Evaluation

Cheng Liu, Hui Wang, Jinghua Zhao et al.

The technology for generating music from textual descriptions has seen rapid advancements. However, evaluating text-to-music (TTM) systems remains a significant challenge, primarily due to the difficulty of balancing performance and cost with existing objective and subjective evaluation methods. In this paper, we propose an automatic assessment task for TTM models to align with human perception. To address the TTM evaluation challenges posed by the professional requirements of music evaluation and the complexity of the relationship between text and music, we collect MusicEval, the first generative music assessment dataset. This dataset contains 2,748 music clips generated by 31 advanced and widely used models in response to 384 text prompts, along with 13,740 ratings from 14 music experts. Furthermore, we design a CLAP-based assessment model built on this dataset, and our experimental results validate the feasibility of the proposed task, providing a valuable reference for future development in TTM evaluation. The dataset is available at https://www.aishelltech.com/AISHELL_7A.

en cs.SD, eess.AS
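
The abstract names a CLAP-based assessment model without detailing it. One plausible shape, sketched below under stated assumptions, is a small regression head over frozen text and audio embeddings trained against the expert ratings; the random tensors stand in for CLAP outputs.

```python
import torch
import torch.nn as nn

class TTMScorer(nn.Module):
    def __init__(self, emb_dim=512):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * emb_dim, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, text_emb, audio_emb):
        # Concatenate prompt and clip embeddings; predict a scalar rating.
        return self.head(torch.cat([text_emb, audio_emb], dim=-1)).squeeze(-1)

scorer = TTMScorer()
text_emb = torch.randn(4, 512)    # placeholder for CLAP text embeddings
audio_emb = torch.randn(4, 512)   # placeholder for CLAP audio embeddings
ratings = torch.tensor([3.5, 4.0, 2.5, 4.5])   # expert mean-opinion scores
loss = nn.functional.mse_loss(scorer(text_emb, audio_emb), ratings)
loss.backward()  # train the head to align with human perception
```
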
DOAJ Open Access 2024
Musical Performance in the Context of the Development of Contemporary Musical Art

Olena BATOVSKA, Natalia GREBENUK, Sergii KOSTOGRYZ et al.

The study aims to determine the specifics of musical performance in the context of the development of contemporary music, considering various musical genres. Analysis, comparison, calculation of Cronbach's alpha coefficient, and Fisher's criterion were used to achieve this goal. It has been established that the most characteristic elements of classical music are an academic approach to interpretation and the embodiment of artistic and aesthetic components. It has been shown that the characteristic features of contemporary music performance are improvisation (α=0.837), emotional expressiveness (α=0.823), and non-standard note combinations (α=0.819). The practical significance of the work lies in the possibility of using the established features of musical genres for their qualitative interpretation during the educational process.
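
For reference, the reliability figures quoted above are Cronbach's alpha values, computed from per-item and total-score variances. A standard implementation:

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_respondents, k_items) matrix of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Toy example: 5 respondents rating 4 questionnaire items.
scores = [[4, 5, 4, 5], [3, 4, 3, 4], [5, 5, 4, 5], [2, 3, 2, 3], [4, 4, 5, 4]]
print(round(cronbach_alpha(scores), 3))  # ~0.932 for this toy data
```
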

arXiv Open Access 2024
Proceedings of the 6th International Workshop on Reading Music Systems

Jorge Calvo-Zaragoza, Alexander Pacha, Elona Shatri

The International Workshop on Reading Music Systems (WoRMS) is a workshop that aims to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners who could benefit from such systems, such as librarians or musicologists. Relevant topics of interest for the workshop include, but are not limited to: music reading systems; optical music recognition; datasets and performance evaluation; image processing on music scores; writer identification; authoring, editing, storing, and presentation systems for music scores; multi-modal systems; novel input methods for producing written music; web-based Music Information Retrieval services; applications and projects; and use cases related to written music. These are the proceedings of the 6th International Workshop on Reading Music Systems, held online on November 22nd, 2024.

en cs.CV, cs.IR
arXiv Open Access 2024
Large Language Models: From Notes to Musical Form

Lilac Atassi

While many aspects of learning-based automated music generation are under active research, musical form remains under-researched. In particular, recent methods based on deep learning generate music that, at the largest time scale, lacks any structure: in practice, music longer than one minute generated by such models is either unpleasantly repetitive or directionless. Adapting a recent music generation model, this paper proposes a novel method to generate music with form. The experimental results show that the proposed method can generate 2.5-minute-long music that is considered as pleasant as the music used to train the model. The paper first reviews a recent music generation method based on language models (the transformer architecture), discusses why learning musical form with such models is infeasible, and then presents our proposed method and experiments.

en cs.SD, eess.AS
DOAJ Open Access 2023
A Change of Plans and A New Venue of Possibility

Jordan Alan Fogle, Laurie Scott

The sensory-friendly concert (SFC) represents an increasingly popular effort toward engaging the autism community in live music performances by promoting inclusive practices and offering specialized accommodations to counter what many consider the rigidity of concert etiquette. The authors explore academic and historical perspectives on SFCs and seek to highlight best practices for the design and facilitation of inclusive community music events in live and virtual settings. Drawing upon the experience of adapting a planned in-person protocol to the virtual setting, the authors explore benefits that extend far beyond the autism community. In addition to providing an environment in which self-expression, diversity, and community are celebrated, SFCs can serve as a transition-oriented therapeutic intervention aimed at promoting progress toward goals related to independent living and musical participation in the broader society, including school and community ensembles.

Music, Psychology
arXiv Open Access 2023
Knowledge-based Multimodal Music Similarity

Andrea Poltronieri

Music similarity is an essential aspect of music retrieval, recommendation systems, and music analysis. Moreover, similarity is of vital interest for music experts, as it allows studying analogies and influences among composers and historical periods. Current approaches to musical similarity rely mainly on symbolic content, which can be expensive to produce and is not always readily available. Conversely, approaches using audio signals typically fail to provide any insight about the reasons behind the observed similarity. This research addresses the limitations of current approaches by focusing on the study of musical similarity using both symbolic and audio content. The aim of this research is to develop a fully explainable and interpretable system that can provide end-users with more control and understanding of music similarity and classification systems.

en cs.SD, cs.AI
arXiv Open Access 2023
Anticipatory Music Transformer

John Thickstun, David Hall, Chris Donahue et al.

We introduce anticipation: a method for constructing a controllable generative model of a temporal point process (the event process) conditioned asynchronously on realizations of a second, correlated process (the control process). We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence. This work is motivated by problems arising in the control of symbolic music generation. We focus on infilling control tasks, whereby the controls are a subset of the events themselves, and conditional generation completes a sequence of events given the fixed control events. We train anticipatory infilling models on the large and diverse Lakh MIDI dataset. These models match the performance of autoregressive models for prompted music generation, with the additional capability of performing infilling control tasks, including accompaniment. Human evaluators report that, over 20-second clips, an anticipatory model produces accompaniments with musicality similar even to that of human-composed music.

en cs.SD, cs.LG
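
The core construction, interleaving two time-stamped streams so that control tokens surface alongside the events they annotate, can be sketched in a few lines. This toy version simply merges the streams in chronological order; the paper's anticipation mechanism additionally offsets controls relative to event times, which this sketch omits.

```python
def interleave(events, controls):
    """events, controls: lists of (time, token), each sorted by time."""
    out, j = [], 0
    for t, tok in events:
        # Emit any control whose time has been reached before the next event.
        while j < len(controls) and controls[j][0] <= t:
            out.append(("control", controls[j][1]))
            j += 1
        out.append(("event", tok))
    out.extend(("control", tok) for _, tok in controls[j:])
    return out

events = [(0.0, "C4"), (1.0, "E4"), (2.0, "G4")]
controls = [(0.5, "chord:Cmaj"), (2.0, "chord:G7")]
print(interleave(events, controls))
```
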
arXiv Open Access 2023
Generating Rhythm Game Music with Jukebox

Nicholas Yan

Music has always been thought of as a "human" endeavor -- when praising a piece of music, we emphasize the composer's creativity and the emotions the music invokes. Because music also relies heavily on patterns and repetition, in the form of recurring melodic themes and chord progressions, artificial intelligence has increasingly been able to replicate music in a human-like fashion. This research investigated the capability of Jukebox, an open-source, commercially available neural network, to accurately replicate two genres of music often found in rhythm games: artcore and orchestral. A Google Colab notebook provided the computational resources necessary to sample and extend a total of sixteen piano arrangements across both genres. A survey containing selected samples was distributed to a local youth orchestra to gauge people's perceptions of the musicality of AI- and human-generated music. Although respondents preferred the human-generated music, Jukebox's relatively high rating showed that it was somewhat capable of mimicking the styles of both genres. Despite the limitations of working only with raw audio and a relatively small sample size, Jukebox shows promise for the future of AI as a collaborative tool in music production.

en cs.SD, eess.AS
arXiv Open Access 2023
Musical Form Generation

Lilac Atassi

While recent generative models can produce engaging music, their utility is limited. The variation in the music is often left to chance, resulting in compositions that lack structure. Pieces extending beyond a minute can become incoherent or repetitive. This paper introduces an approach for generating structured, arbitrarily long musical pieces. Central to this approach is the creation of musical segments using a conditional generative model, with transitions between these segments. The generation of prompts that determine the high-level composition is distinct from the creation of finer, lower-level details. A large language model is then used to suggest the musical form.

en cs.SD, cs.LG
DOAJ Open Access 2022
Vocal Depersonalization in Scat Singing

Luiza ZAN, Stela DRĂGULIN

The purpose of this paper is to question the amount of personal investment in exploring the voice as an impersonal sound in scat singing. Jazz singers and jazz voice teachers follow vocal practices that aim to control and distort the vocal timbre, to master microtonal intervals, and to push and eventually overcome the voice’s limits. In scat singing, the boundaries of gender are subdued to the impulse of improvisation; thus, even though the timbre is a biological and physical memory, influenced by the singer’s culture and experiences, the gender encoding can be reshaped inside the licks and patterns of the improvisation section. The current paper aims to show that scat singing is the neutral ground where aspects of the voice (voice gender, vocal timbre, technique, individual materiality, experimentation) can blend and disappear into one another. Keywords: scat, improvisation, jazz, vocalist

arXiv Open Access 2022
Video Background Music Generation: Dataset, Method and Evaluation

Le Zhuo, Zhaokai Wang, Baisen Wang et al.

Music is essential when editing videos, but selecting music manually is difficult and time-consuming. Thus, we seek to automatically generate background music tracks given video input. This is a challenging task since it requires music-video datasets, efficient architectures for video-to-music generation, and reasonable metrics, none of which currently exist. To close this gap, we introduce a complete recipe including dataset, benchmark model, and evaluation metric for video background music generation. We present SymMV, a video and symbolic music dataset with various musical annotations. To the best of our knowledge, it is the first video-music dataset with rich musical annotations. We also propose a benchmark video background music generation framework named V-MusProd, which utilizes music priors of chords, melody, and accompaniment along with video-music relations of semantic, color, and motion features. To address the lack of objective metrics for video-music correspondence, we design a retrieval-based metric VMCP built upon a powerful video-music representation learning model. Experiments show that with our dataset, V-MusProd outperforms the state-of-the-art method in both music quality and correspondence with videos. We believe our dataset, benchmark model, and evaluation metric will boost the development of video background music generation. Our dataset and code are available at https://github.com/zhuole1025/SymMV.

en cs.CV, cs.MM
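
The abstract's retrieval-based metric VMCP is built on a learned video-music representation model; its exact definition isn't given here, but retrieval metrics of this flavor typically score how highly each video ranks its ground-truth music clip. A generic sketch:

```python
import numpy as np

def recall_at_k(video_embs, music_embs, k=5):
    """Row i of each matrix is the embedding of the i-th ground-truth pair."""
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    m = music_embs / np.linalg.norm(music_embs, axis=1, keepdims=True)
    sims = v @ m.T                       # cosine similarity matrix
    ranks = (-sims).argsort(axis=1)      # best-matching music per video
    hits = [(i in ranks[i, :k]) for i in range(len(v))]
    return float(np.mean(hits))

rng = np.random.default_rng(1)
v, m = rng.standard_normal((100, 64)), rng.standard_normal((100, 64))
print(recall_at_k(v, m, k=5))  # ~0.05 for random embeddings
```
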
arXiv Open Access 2022
On the Role of Visual Context in Enriching Music Representations

Kleanthis Avramidis, Shanti Stewart, Shrikanth Narayanan

Human perception and experience of music is highly context-dependent. Contextual variability contributes to differences in how we interpret and interact with music, challenging the design of robust models for information retrieval. Incorporating multimodal context from diverse sources provides a promising approach toward modeling this variability. Music presented in media such as movies and music videos provide rich multimodal context that modulates underlying human experiences. However, such context modeling is underexplored, as it requires large amounts of multimodal data along with relevant annotations. Self-supervised learning can help address these challenges by automatically extracting rich, high-level correspondences between different modalities, hence alleviating the need for fine-grained annotations at scale. In this study, we propose VCMR -- Video-Conditioned Music Representations, a contrastive learning framework that learns music representations from audio and the accompanying music videos. The contextual visual information enhances representations of music audio, as evaluated on the downstream task of music tagging. Experimental results show that the proposed framework can contribute additive robustness to audio representations and indicates to what extent musical elements are affected or determined by visual context.

en cs.SD, cs.MM
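
Self-supervised audio-video alignment of the kind this abstract describes is often trained with an InfoNCE-style contrastive objective over batches of paired clips. A minimal sketch under that assumption (not necessarily the authors' exact loss):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(audio_emb, video_emb, temperature=0.07):
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    logits = a @ v.T / temperature           # pairwise similarities
    targets = torch.arange(len(a))           # positives on the diagonal
    # Symmetric cross-entropy: audio-to-video and video-to-audio retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

audio = torch.randn(8, 256, requires_grad=True)   # placeholder audio encodings
video = torch.randn(8, 256)                       # placeholder video encodings
print(contrastive_loss(audio, video).item())
```
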
DOAJ Open Access 2021
Unfit for Subjection: Mental Illness, Mental Health, and the University Undercommons

Sarah Hankins

This colloquy, by graduate-student-led collective Project Spectrum, attempts to map out existing discussions around inclusion and equity in music academia, with a specific focus on identifying and analyzing the structures in academia that work against minoritized and historically excluded scholars. Sarah Hankins shares thoughts on mental illness, arguing that it is a gap in our discourse. Hankins asks us to bear witness to experiences of those who boldly declare that they are “unfit” for the pipeline—“unfit” to survive the pipeline, to have access to the pipeline, and for the so-called promises at the end of the pipeline. Following the work of Black studies, queer of color critique, Black radicalism, Afropessimism, and especially the writings of Stefano Harney and Fred Moten, Hankins’s intervention in this colloquy demands pause in academia’s system of perpetual motion.

Music and books on Music
DOAJ Open Access 2021
Evaluation of an E-Learning Tool for Augmented Acoustics in Music Education

Neva Klanjscek, Lisa David, Matthias Frank

Augmented Practice Room is an e-learning tool, developed by the project team, that allows music students to practice in different acoustical environments while remaining physically in their classroom or at home. Music teachers and students from violin, cello, piano, clarinet, guitar, and pop-singing classes collaborated in testing it for a semester, giving the authors continuous feedback. In this exploratory phase, we used methods such as group discussion and a semi-structured diary, with the aim of gathering as many different perspectives and reactions from participants as possible. Analysis of the collected data showed that the tool was generally perceived positively and considered useful. In particular, the results converged into a four-dimensional model that describes the tool's impact on practice: musical expressiveness, level of attention or arousal, instrument-specific technical issues, and emotional state.

Music, Psychology

Page 25 of 44,454