Existing research presents a mixed picture of music therapists’ preparedness to work effectively with LGBTQIA+ clients, highlighting deficits in training and a lack of insight on the part of some music therapists into what LGBTQIA+ inclusive practice actually entails. Alongside this, there is a growing literature on clinical practice with and interventions for LGBTQIA+ clients; however, there is an absence of research exploring directly with members of LGBTQIA+ communities their experiences and perceptions of music therapy. The current study aims both to expand the limited literature exploring music therapists’ preparedness to work with LGBTQIA+ clients and to begin to explore LGBTQIA+ people’s perceptions of music therapy. It does so through the novel, creative method of story completion (SC): participants were given two (of four) “story stems” based on a hypothetical implied first therapy session involving a trans or queer client or therapist and asked to complete them. Forty-six participants (20 trainee/qualified music therapists [nine of whom identified as LGBTQIA+]; 23 LGBTQIA+ people; three who provided no demographic data) wrote a total of 87 stories. Reflexive thematic analysis was used to develop three themes: 1) disclosure in therapy is important for the therapeutic relationship and the client; 2) effective therapists are non-judgmental and inclusive; and 3) shared identity matters. The analysis suggests a lack of knowledge of LGBTQIA+ communities and inclusive practice on the part of straight and cisgender music therapists, alongside an aspirational commitment to an open and non-judgmental approach. The stories written by LGBTQIA+ participants recognised the potential for prejudicial treatment; these participants framed openness as an ethical imperative.
Acknowledgements
We would like to thank all the participants for their contributions to this research.
Funding
There was no funding for this research.
Sarmistha Sarna Gomasta, Mahmood Jasim, Hossein Hadisi
et al.
Data videos have become a prominent vessel for communicating data to broad audiences, and a common object of study in information visualization. Many of these videos include music, yet the impact of music on how people experience data videos remains largely unexplored. We conducted a preregistered study into the effect of music across three dimensions: persuasion, engagement, and emotion. We showed online participants an existing data video (1) without any music, (2) with its generic default music, and (3) with custom music designed by a professional composer. We found that the default music helped make the data video more persuasive. However, the effects of custom music were more mixed, and we did not find that music increased engagement. In addition, and contrary to our expectations, our participants reported more intense emotions without music. Our study contributes new insights into the intersection of music and data visualization and is a first step toward guiding designers in creating impactful data-driven narratives.
Sample-based music—characterized by the adoption of extant audio fragments (sampling) in its creation process—plays an essential role in contemporary popular music, fostering inter-generational connections between creators that have resulted in a rich and diverse sonic landscape. The selection, manipulation, and adoption of samples heavily impact the genre, mood, texture, and ultimately the distinctive identity of a new musical composition. One could call the samples “cultural genes” of sorts, continually incorporated into new music to contribute to its characteristics. One can then ask how this process has taken place over time and shaped the history of contemporary popular music, which is what we study in this work. We specifically study the evolution of sample-based music between 1980 and 2019, taking a cue from the citation network analysis of academic literature, which intuitively follows a similar dynamic of the flow of ideas and material from old works into new ones. First, the community structure in the artist–sample network is identified, and its relationship to distinctive musical styles and flavors is verified. A longitudinal analysis of the passing down of musical styles is then performed based on similarity between communities of distinct eras, identifying continuous temporal developments in music as well as instances of the revival of styles dormant across multiple generations, akin to “genetic atavism.” This study demonstrates the complex nature of cultural evolution using a network framework that is also generally applicable to other creative enterprises.
This study explores strategies to expand early-music dissemination beyond traditional audiences, focusing on festivals. It analyses attendance barriers, engagement practices and policies through literature, interviews and case studies. Findings highlight cross-disciplinary collaboration, digital tools and education, offering recommendations to enhance participation while preserving artistic integrity across multiple areas of action.
To systematically interpret the Tang Dynasty brocade with a lion and music-playing pattern on a purple background, now treasured in the Shosoin repository in Japan, and to explain its theoretical significance and artistic value, the background and pattern of the brocade fragments were comprehensively analyzed through literature research, field study, and other methods. It was concluded that the brocade was produced during the flourishing period of the Tang Dynasty. The pattern is based on lions, plants, and figures. The lion-centered design features, on each side, three music-playing figures wearing Tang Dynasty round-necked, wide-sleeved robes and Hu-style clothing, with Futou official-style head scarves. The figures on the lion's left hold three instruments (the Pipa, the waist drum, and the Bili), while those on the right play three others (the Paixiao, copper cymbals, and the vertical Konghou). The periphery is connected by peony scroll-grass patterns, organized in a longitudinal two-dimensional continuous form. The pattern of this fragment differs from brocade patterns on other subjects, and the overall composition shows a rare and unique form for the Tang Dynasty, reflecting the high artistic skill of Tang brocade in rendering complex patterns on realistic subjects. The realistic theme of this brocade pattern is important evidence for reconstructing the social life of the Tang Dynasty, and the research results will provide a reference for the genealogy of Tang Dynasty patterns in China.
Until the 1970s, women in the Novi Pazar region partied separately from men. On those occasions, in accordance with the customs of this environment, they could only be cheered by women who played the def [tambourine]. The instrument accompanied singing and dancing during posedak/poijelo [evening visit to friends], džumbus [party], and krna [a party where a girl prepares for marriage by painting parts of her body]. This form of music disappeared over time, but in the last twenty years it has been revived, and today it is an essential element of so-called bachelorette parties. Contemporary scientific findings indicate that the contents of the culture in which the student grows up have a significant place in teaching. Since women's playing of the def in the Novi Pazar region is little known to the public and has not been evaluated as teaching content to date, we decided to analyze the structure of the def and the ways and forms of music-making in the context of customs, and to point out its methodological applicability through a theoretical analysis of the relevant literature. The aim of the work is to improve the teaching of musical culture in Serbia and to preserve this element of intangible culture. The descriptive method and the content analysis technique were used in the research. The paper establishes that the singing and dancing of women in the Novi Pazar area accompanied by the def is methodically applicable content in the area of listening to music in all grades of primary school, while in the areas of Man and Music and Musical Instruments it is applicable in the fifth grade of elementary school. In accordance with the goals of the program and modern trends (content integration and intercultural education), teachers were given specific guidelines for processing this musical content.
Namely, the singing and dancing of women accompanied by the def can be integrated within the subject of Music (between areas), but also beyond it, with the contents of Nature and Society (the way of life of people and their customs in the region where the student lives) and Language (folk songs). Also, by covering this content in areas where the majority of Serbs and/or other national minorities live, students, in addition to mastering musical issues, would get to know each other and be guided toward understanding the culture of Bosniaks living in Serbia, which leads to intercultural education. We hope that with this work we will improve the teaching of music in Serbia and contribute to the preservation of this form of music.
Music-induced painting is a unique artistic practice in which visual artworks are created under the influence of music. Evaluating whether a painting faithfully reflects the music that inspired it poses a challenging perceptual assessment task. Existing methods primarily rely on emotion recognition models to assess the similarity between music and painting, but such models introduce considerable noise and overlook broader perceptual cues beyond emotion. To address these limitations, we propose a novel framework for music-induced painting assessment that directly models perceptual coherence between music and visual art. We introduce MPD, the first large-scale dataset of music–painting pairs annotated by domain experts based on perceptual coherence. To better handle ambiguous cases, we further collect pairwise preference annotations. Building on this dataset, we present MPJudge, a model that integrates music features into a visual encoder via a modulation-based fusion mechanism. To learn effectively from ambiguous cases, we adopt Direct Preference Optimization for training. Extensive experiments demonstrate that our method outperforms existing approaches. Qualitative results further show that our model more accurately identifies music-relevant regions in paintings.
Mehul Agarwal, Gauri Agarwal, Santiago Benoit
et al.
Music is a deeply personal experience and our aim is to enhance this with a fully-automated pipeline for personalized music video generation. Our work allows listeners to not just be consumers but co-creators in the music video generation process by creating personalized, consistent and context-driven visuals based on lyrics, rhythm and emotion in the music. The pipeline combines multimodal translation and generation techniques and utilizes low-rank adaptation on listeners' images to create immersive music videos that reflect both the music and the individual. To ensure the ethical use of users' identity, we also introduce CHARCHA (patent pending), a facial identity verification protocol that protects people against unauthorized use of their face while at the same time collecting authorized images from users for personalizing their videos. This paper thus provides a secure and innovative framework for creating deeply personalized music videos.
Our study investigated the effect of music on the experience of viewing art, examining the factors which create a sense of connectivity between the two forms. We worked with 138 participants, using multiple-choice and open-ended questions. For the latter, we performed both a qualitative analysis and a sentiment analysis using text mining. We investigated the relationship between the user experience and the emotions in the artwork and music. We found that, besides emotion, theme, story, and to a lesser extent music tempo were factors which helped form connections between artwork and music. Overall, participants rated the music as helpful in developing an appreciation of the art. We propose guidelines for using music to enhance the experience of viewing art, and we propose directions for future research.
Learning music theory has practical benefits for musicians, helping them write, perform, understand, and express music better, and also for non-musicians, improving critical thinking, analytical math skills, and music appreciation. However, current external tools for learning music theory through writing when human instruction is unavailable are either limited in feedback, lacking a written modality, or assume strong prior familiarity with music theory concepts. In this paper, we describe Maestoso, an educational tool for novice learners to learn music theory through sketching practice of quizzed music structures. Maestoso recognizes students' sketched input of quizzed concepts using existing sketch and gesture recognition techniques, then generates instructor-emulated feedback. Our evaluations demonstrate that Maestoso performs reasonably well at recognizing music structure elements and that novice students can comfortably grasp introductory music theory in a single session.
The term “differentiable digital signal processing” describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music and speech synthesis. We catalogue applications to tasks including music performance rendering, sound matching, and voice transformation, discussing the motivations for and implications of the use of this methodology. This is accompanied by an overview of digital signal processing operations that have been implemented differentiably, which is further supported by a web book containing practical advice on differentiable synthesiser programming (https://intro2ddsp.github.io/). Finally, we highlight open challenges, including optimisation pathologies, robustness to real-world conditions, and design trade-offs, and discuss directions for future research.
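The mechanism the survey describes, backpropagating loss gradients through a signal processor, can be illustrated with a toy stand-in for the autograd frameworks and richer processors used in practice: a single gain stage whose one parameter is fitted by a hand-derived gradient. This sketch is purely illustrative and not drawn from any system in the survey.

```python
# Minimal sketch of the differentiable-DSP idea: a DSP operation (here, a
# simple gain) whose parameter is fitted by gradient descent. Real systems
# use an autograd framework and far richer processors; the gradient here is
# derived by hand for clarity.

def apply_gain(g, x):
    """The 'signal processor': scale each sample by gain g."""
    return [g * xi for xi in x]

def loss_and_grad(g, x, target):
    """Mean squared error between processed and target signals, plus its
    analytic gradient dL/dg = (2/N) * sum((g*x_i - t_i) * x_i)."""
    n = len(x)
    y = apply_gain(g, x)
    loss = sum((yi - ti) ** 2 for yi, ti in zip(y, target)) / n
    grad = 2.0 * sum((yi - ti) * xi for yi, ti, xi in zip(y, target, x)) / n
    return loss, grad

# Fit the gain so that apply_gain(g, x) matches a target made with g = 0.5.
x = [0.1, -0.4, 0.9, 0.3]
target = [0.5 * xi for xi in x]
g = 2.0
for _ in range(200):
    loss, grad = loss_and_grad(g, x, target)
    g -= 0.5 * grad  # plain gradient descent step

print(round(g, 3))  # g converges toward the true gain 0.5
```

The same pattern, differentiating a loss with respect to processor parameters, underlies sound matching and performance rendering, only with filters, oscillators, and synthesisers in place of the gain.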
While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that this ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning of LLaMA2 on a text-compatible music representation, ABC notation, with music treated as a second language. ChatMusician can understand and generate music with a pure text tokenizer, without any external multi-modal neural structures or tokenizers. Interestingly, endowing musical abilities does not harm language abilities; the model even achieves a slightly higher MMLU score. Our model is capable of composing well-structured, full-length music conditioned on texts, chords, melodies, motifs, musical forms, etc., surpassing the GPT-4 baseline. On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 in a zero-shot setting by a noticeable margin. Our work reveals that LLMs can be excellent compressors for music, but significant territory remains to be conquered. We release our 4B-token music-language corpus MusicPile, the collected MusicTheoryBench, code, model, and demo on GitHub.
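The key enabler above is that ABC notation is plain text, so an ordinary tokenizer and ordinary string handling suffice. A minimal illustration with a toy tune of our own (not taken from the paper's corpus):

```python
# A tiny ABC tune: header fields (index, title, meter, key) followed by the
# music body. This example tune is hypothetical, written for illustration.
abc_tune = """X:1
T:Toy Scale
M:4/4
K:C
C D E F | G A B c |]"""

# Because the representation is just text, simple string operations apply:
body = abc_tune.splitlines()[-1]          # the music line after the headers
bars = [b.strip() for b in body.rstrip("|]").split("|") if b.strip()]
print(len(bars))  # number of bars in the tune
```

This textual form is why a standard LLM tokenizer can consume and emit music without any audio front end.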
This paper explores a simple extension of diffusion-based rectified flow Transformers for text-to-music generation, termed FluxMusic. Building on the design of the advanced Flux\footnote{https://github.com/black-forest-labs/flux} model, we transfer it into a latent VAE space of mel-spectrograms. The model first applies a sequence of independent attention layers to the double text-music stream, followed by a stacked single music stream for denoised patch prediction. We employ multiple pre-trained text encoders to sufficiently capture caption semantics and to allow inference flexibility. Coarse textual information, in conjunction with time-step embeddings, is used in a modulation mechanism, while fine-grained textual details are concatenated with the music patch sequence as input. Through an in-depth study, we demonstrate that rectified flow training with an optimized architecture significantly outperforms established diffusion methods for the text-to-music task, as evidenced by various automatic metrics and human preference evaluations. Our experimental data, code, and model weights are made publicly available at: \url{https://github.com/feizc/FluxMusic}.
Content creators often use music to enhance their videos, from soundtracks in movies to background music in video blogs and social media content. However, identifying the best music for a video can be a difficult and time-consuming task. To address this challenge, we propose a novel framework for automatically retrieving a matching music clip for a given video, and vice versa. Our approach leverages annotated music labels, as well as the inherent artistic correspondence between visual and music elements. Distinct from previous cross-modal music retrieval works, our method combines both self-supervised and supervised training objectives. We use self-supervised and label-supervised contrastive learning to train a joint embedding space between music and video. We show the effectiveness of our approach by using music genre labels for the supervised training component, and our framework can be generalized to other music annotations (e.g., emotion, instrument, etc.). Furthermore, our method enables fine-grained control over how much the retrieval process focuses on self-supervised vs. label information at inference time. We evaluate the learned embeddings through a variety of video-to-music and music-to-video retrieval tasks. Our experiments show that the proposed approach successfully combines self-supervised and supervised objectives and is effective for controllable music-video retrieval.
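The self-supervised component described above can be sketched with an in-batch contrastive objective. The abstract does not spell out the exact loss, so the InfoNCE-style form below is an assumption, and the embeddings are toy values: each video is pulled toward its paired music clip and pushed away from the other clips in the batch.

```python
# Hypothetical sketch of a symmetric in-batch contrastive (InfoNCE-style)
# objective for aligning video and music embeddings in a joint space. The
# real framework adds a label-supervised term and learned encoders.
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(video_embs, music_embs, temperature=0.1):
    """Each video i is a positive pair with music i; all other music
    clips in the batch act as negatives."""
    n = len(video_embs)
    total = 0.0
    for i in range(n):
        logits = [dot(video_embs[i], m) / temperature for m in music_embs]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        total += -(logits[i] - log_denom)  # -log softmax of the positive
    return total / n

# Toy batch: matched pairs are more similar than mismatched ones, so the
# loss is low; shuffling the music breaks the pairing and raises it.
videos = [[1.0, 0.0], [0.0, 1.0]]
music = [[0.9, 0.1], [0.1, 0.9]]
aligned = info_nce(videos, music)
shuffled = info_nce(videos, music[::-1])
assert aligned < shuffled
```

A supervised variant of the same loss can treat clips sharing a genre label as additional positives, which is one way the two objectives combine in a single embedding space.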
AI systems for high-quality music generation typically rely on extremely large musical datasets to train the AI models. This creates barriers to generating music beyond the genres represented in dominant datasets, such as Western Classical music or pop music. We undertook a four-month international research project, summarised in this paper, to explore the eXplainable AI (XAI) challenges and opportunities associated with reducing barriers to using marginalised genres of music with AI models. The XAI opportunities identified included improving the transparency and control of AI models, explaining the ethics and bias of AI models, fine-tuning large models with small datasets to reduce bias, and explaining style-transfer opportunities with AI models. Participants in the research emphasised that, whilst it is hard to work with small datasets such as marginalised music, such approaches strengthen the cultural representation of underrepresented cultures and contribute to addressing issues of bias in deep learning models. We are now building on this project to bring together a global International Responsible AI Music community and invite people to join our network.
Kristina Matrosova, Lilian Marey, Guillaume Salha-Galvan
et al.
This paper examines the influence of recommender systems on local music representation, discussing prior findings from an empirical study on the LFM-2b public dataset. This prior study argued that different recommender systems exhibit algorithmic biases shifting music consumption either towards or against local content. However, LFM-2b users do not reflect the diverse audience of music streaming services. To assess the robustness of this study's conclusions, we conduct a comparative analysis using proprietary listening data from a global music streaming service, which we publicly release alongside this paper. We observe significant differences in local music consumption patterns between our dataset and LFM-2b, suggesting that caution should be exercised when drawing conclusions on local music based solely on LFM-2b. Moreover, we show that the algorithmic biases exhibited in the original work vary in our dataset, and that several unexplored model parameters can significantly influence these biases and affect the study's conclusion on both datasets. Finally, we discuss the complexity of accurately labeling local music, emphasizing the risk of misleading conclusions due to unreliable, biased, or incomplete labels. To encourage further research and ensure reproducibility, we have publicly shared our dataset and code.
The majority of recent progress in Optical Music Recognition (OMR) has been achieved with Deep Learning methods, especially models following the end-to-end paradigm, reading input images and producing a linear sequence of tokens. Unfortunately, many music scores, especially piano music, cannot be easily converted to a linear sequence. This has led OMR researchers to use custom linearized encodings instead of broadly accepted structured formats for music notation. Their diversity makes it difficult to compare the performance of OMR systems directly. To bring recent OMR model progress closer to useful results: (a) We define a sequential format called Linearized MusicXML, which allows training an end-to-end model directly while maintaining close cohesion and compatibility with the industry-standard MusicXML format. (b) We create dev and test sets for benchmarking typeset OMR with MusicXML ground truth, based on the OpenScore Lieder corpus. They contain 1,438 and 1,493 pianoform systems, respectively, each with an image from IMSLP. (c) We train and fine-tune an end-to-end model to serve as a baseline on the dataset and employ the TEDn metric to evaluate it. We also test our model against the recently published synthetic pianoform dataset GrandStaff and surpass the state-of-the-art results.
This paper delves into the intersection of computational theory and music, examining the concept of undecidability and its significant, yet overlooked, implications within the realm of modern music composition and production. It posits that undecidability, a principle traditionally associated with theoretical computer science, extends its relevance to the music industry. The study adopts a multidimensional approach, focusing on five key areas: (1) the Turing completeness of Ableton, a widely used digital audio workstation, (2) the undecidability of satisfiability in sound creation utilizing an array of effects, (3) the undecidability of constraints on polymeters in musical compositions, (4) the undecidability of satisfiability in just intonation harmony constraints, and (5) the undecidability of "new ordering systems". In addition to providing theoretical proof for these assertions, the paper elucidates the practical relevance of these concepts for practitioners outside the field of theoretical computer science. The ultimate aim is to foster a new understanding of undecidability in music, highlighting its broader applicability and potential to influence contemporary computer-assisted (and traditional) music making.
Music has been commonly recognized as a means of expressing emotions. In this sense, an intense debate emerges from the need to verbalize musical emotions. This concern seems highly relevant today, considering the exponential growth of natural language processing using deep learning models, where it is possible to prompt semantic propositions to generate music automatically. This scoping review aims to analyze and discuss the possibilities of music generation conditioned by emotions. To address this topic, we propose a historical perspective that encompasses the different disciplines and methods contributing to it. In detail, we review the two main paradigms adopted in automatic music generation: rule-based and machine-learning models. Of note are the deep learning architectures that aim to generate high-fidelity music from textual descriptions. These models raise fundamental questions about the expressivity of music, including whether emotions can be represented with words or expressed through them. We conclude that, by overcoming the limitations and ambiguity of language in expressing emotions through music, the use of deep learning with natural language has the potential to impact the creative industries by providing powerful tools to prompt and generate new musical works.
In this chapter, I will explore ‘black portraitures’ in Teju Cole’s writings, photos, and art history lessons, while ‘following’ his journeying (geographic, literary, photographic, digital) in his photo essays and criticism Known and Strange Things (2016) and Blind Spot (2016), in his novel Open City (2011), and in his latest essay collection Black Paper (2021). I intend to study his poetics, his aesthetics, and his ethical stance, particularly in relation to his re-formulation of postcolonial paradigms. Intersecting trajectories with works by Caryl Phillips (The European Tribe, 1987) and Johny Pitts (Afropean: Notes from Black Europe, 2019) will also be considered.