Results for "Musical instruction and study"
Showing 20 of ~6,493,215 results · from DOAJ, arXiv, Semantic Scholar, CrossRef
Katharina Ockl, Inga Glogger-Frey
Anna Kruspe
Large Language Models (LLMs) reflect the biases in their training data and, by extension, those of the people who created this training data. Detecting, analyzing, and mitigating such biases is becoming a focus of research. One type of bias that has so far been understudied is geocultural bias, which can be caused by an imbalance in the representation of different geographic regions and cultures in the training data, but also by value judgments contained therein. In this paper, we take a first step towards analyzing musical biases in LLMs, particularly ChatGPT and Mixtral. We conduct two experiments. In the first, we prompt LLMs to provide lists of the "Top 100" musical contributors of various categories and analyze their countries of origin. In the second experiment, we ask the LLMs to numerically rate various aspects of the musical cultures of different countries. Our results indicate a strong preference of the LLMs for Western music cultures in both experiments.
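The first experiment's country-of-origin analysis reduces to tallying label frequencies over the model's "Top 100" lists; a minimal sketch (the function name and country labels are illustrative, not from the paper):

```python
from collections import Counter

def origin_distribution(artist_countries):
    """artist_countries: list of country labels, one per entry in a model's
    'Top 100' list. Returns each country's share, so regional representation
    can be compared across models and categories."""
    counts = Counter(artist_countries)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()}
```

A skew toward a handful of Western labels in the returned shares would be the kind of imbalance the paper reports.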
Or Tal, Felix Kreuk, Yossi Adi
Recent progress in text-to-music generation has enabled models to synthesize high-quality musical segments, full compositions, and even respond to fine-grained control signals, e.g. chord progressions. State-of-the-art (SOTA) systems differ significantly in many dimensions, such as training datasets, modeling paradigms, and architectural choices. This diversity complicates efforts to evaluate models fairly and identify which design choices influence performance the most. While factors like data and architecture are important, in this study we focus exclusively on the modeling paradigm. We conduct a systematic empirical analysis to isolate its effects, offering insights into associated trade-offs and emergent behaviors that can guide future text-to-music generation systems. Specifically, we compare the two arguably most common modeling paradigms: auto-regressive decoding and conditional flow-matching. We conduct a controlled comparison by training all models from scratch using identical datasets, training configurations, and similar backbone architectures. Performance is evaluated across multiple axes, including generation quality, robustness to inference configurations, scalability, adherence to both textual and temporally aligned conditioning, and editing capabilities in the form of audio inpainting. This comparative study sheds light on distinct strengths and limitations of each paradigm, providing actionable insights that can inform future architectural and training decisions in the evolving landscape of text-to-music generation. Audio sampled examples are available at: https://huggingface.co/spaces/ortal1602/ARvsFM
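Conditional flow matching, one of the two paradigms compared, trains a network to regress the velocity of a probability path from noise to data; a minimal sketch using linear interpolation paths (the `model` callable, shapes, and loss form are generic illustrations, not this paper's system):

```python
import numpy as np

def flow_matching_loss(model, x1, rng):
    """Conditional flow matching with linear paths x_t = (1-t)*x0 + t*x1.
    The regression target is the constant path velocity x1 - x0."""
    x0 = rng.standard_normal(x1.shape)          # noise endpoint
    t = rng.uniform(size=(x1.shape[0], 1))      # one time per sample
    xt = (1.0 - t) * x0 + t * x1                # point on the path
    target = x1 - x0                            # velocity to regress
    pred = model(xt, t)
    return float(np.mean((pred - target) ** 2))
```

An auto-regressive baseline would instead factorize the sequence token by token; this loss is the per-batch objective a flow-matching variant minimizes.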
Jingde Cheng, Gennaro Notomista
This paper proposes a novel control framework for robotic swarms capable of turning a musical input into a painting. The approach connects the two artistic domains, music and painting, leveraging their respective connections to fundamental emotions. The robotic units of the swarm are controlled in a coordinated fashion using a heterogeneous coverage policy, while the robots continuously release traces of color in the environment. The results of extensive simulations, performed starting from different musical inputs and with different color equipment, are reported. Finally, the proposed framework has been implemented on real robots equipped with LED lights and capable of light-painting.
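Coverage policies of this kind are commonly built on Lloyd-style iterations toward density-weighted Voronoi centroids; a hypothetical one-step sketch on a discretized environment (the paper's actual heterogeneous policy is not specified here, and the density would be driven by the musical input):

```python
import numpy as np

def lloyd_step(positions, grid, density):
    """One coverage-control step: assign each grid point to its nearest
    robot (discrete Voronoi cells), then move each robot to the
    density-weighted centroid of its cell."""
    # pairwise distances: grid points (n, 2) vs robot positions (m, 2)
    d = np.linalg.norm(grid[:, None, :] - positions[None, :, :], axis=2)
    owner = d.argmin(axis=1)                    # Voronoi assignment
    new_pos = positions.copy()
    for i in range(len(positions)):
        mask = owner == i
        w = density[mask]
        if w.sum() > 0:
            new_pos[i] = (grid[mask] * w[:, None]).sum(axis=0) / w.sum()
    return new_pos
```

Iterating this step drives the swarm toward a configuration that covers high-density (musically salient) regions more densely.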
Yuanchao Li, Azalea Gui, Dimitra Emmanouilidou et al.
The complex nature of musical emotion introduces inherent bias in both recognition and generation, particularly when relying on a single audio encoder, emotion classifier, or evaluation metric. In this work, we conduct a study on Music Emotion Recognition (MER) and Emotional Music Generation (EMG), employing diverse audio encoders alongside Frechet Audio Distance (FAD), a reference-free evaluation metric. Our study begins with a benchmark evaluation of MER, highlighting the limitations of using a single audio encoder and the disparities observed across different measurements. We then propose assessing MER performance using FAD derived from multiple encoders to provide a more objective measure of musical emotion. Furthermore, we introduce an enhanced EMG approach designed to improve both the variability and prominence of generated musical emotion, thereby enhancing its realism. Additionally, we investigate the differences in realism between the emotions conveyed in real and synthetic music, comparing our EMG model against two baseline models. Experimental results underscore the issue of emotion bias in both MER and EMG and demonstrate the potential of using FAD and diverse audio encoders to evaluate musical emotion more objectively and effectively.
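FAD compares the Gaussian statistics (mean and covariance) of two embedding sets via the Fréchet distance; a minimal numpy-only sketch of the standard closed form (not the paper's code, and the PSD square-root helper is an implementation choice to avoid scipy):

```python
import numpy as np

def _sqrtm_psd(a):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)             # guard tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(mu1, cov1, mu2, cov2):
    """Frechet distance between N(mu1, cov1) and N(mu2, cov2):
    ||mu1 - mu2||^2 + tr(cov1 + cov2 - 2 (cov1 cov2)^{1/2})."""
    diff = mu1 - mu2
    s1 = _sqrtm_psd(cov1)
    # tr sqrt(C1 C2) == tr sqrt(sqrt(C1) C2 sqrt(C1)); the inner matrix is PSD
    tr_cross = np.trace(_sqrtm_psd(s1 @ cov2 @ s1))
    return float(diff @ diff + np.trace(cov1 + cov2) - 2.0 * tr_cross)
```

In the paper's setting, `mu`/`cov` would be computed from embeddings of real versus generated music under each of the diverse audio encoders.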
Shaoxiong Ji, Pinzhen Chen
Instruction tuning a large language model with multiple languages can prepare it for multilingual downstream tasks. Nonetheless, it is yet to be determined whether having a handful of languages is sufficient, or whether the benefits increase with the inclusion of more. By fine-tuning large multilingual models on 1 to 52 languages, we present a case study on BLOOM to understand three pertinent factors affecting performance: the number of languages, language exposure, and similarity between training and test languages. Overall we found that 1) expanding language coverage in multilingual instruction tuning proves to be beneficial; 2) accuracy often improves significantly if the test language appears in the instruction mixture; 3) languages' genetic features correlate with cross-lingual transfer more strongly than the mere number of languages, though different languages benefit to varying degrees.
Hyeonseok Moon, Jaehyung Seo, Seungyoon Lee et al.
One of the key strengths of Large Language Models (LLMs) is their ability to interact with humans by generating appropriate responses to given instructions. This ability, known as instruction-following capability, has established a foundation for the use of LLMs across various fields and serves as a crucial metric for evaluating their performance. While numerous evaluation benchmarks have been developed, most focus solely on clear and coherent instructions. However, we have noted that LLMs can become easily distracted by instruction-formatted statements, which may lead to an oversight of their instruction comprehension skills. To address this issue, we introduce the Intention of Instruction (IoInst) benchmark. This benchmark evaluates LLMs' capacity to remain focused and understand instructions without being misled by extraneous instructions. The primary objective of this benchmark is to identify the appropriate instruction that accurately guides the generation of a given context. Our findings suggest that even recently introduced state-of-the-art models still lack instruction understanding capability. Along with the proposition of IoInst in this study, we also present broad analyses of several strategies potentially applicable to IoInst.
Wenjun Li, Ying Cai, Ziyang Wu et al.
Music is essential in daily life, fulfilling emotional and entertainment needs, and connecting us personally, socially, and culturally. A better understanding of music can enhance our emotions, cognitive skills, and cultural connections. The rapid advancement of artificial intelligence (AI) has introduced new ways to analyze music, aiming to replicate human understanding of music and provide related services. While traditional models focused on audio features and simple tasks, the recent development of large language models (LLMs) and foundation models (FMs), which excel in various fields by integrating semantic information and demonstrating strong reasoning abilities, makes it possible to capture complex musical features and patterns, integrate music with language, and incorporate rich musical, emotional, and psychological knowledge. They therefore have the potential to handle complex music understanding tasks from a semantic perspective, producing outputs closer to human perception. This work is, to the best of our knowledge, one of the early reviews of the intersection of AI techniques and music understanding. We investigated, analyzed, and tested recent large-scale music foundation models with respect to their music comprehension abilities. We also discussed their limitations and proposed possible future directions, offering insights for researchers in this field.
Yikai Qian, Tianle Wang, Xinyi Tong et al.
In addressing the challenge of interpretability and generalizability of artificial music intelligence, this paper introduces a novel symbolic representation that amalgamates both explicit and implicit musical information across diverse traditions and granularities. Utilizing a hierarchical and-or graph representation, the model employs nodes and edges to encapsulate a broad spectrum of musical elements, including structures, textures, rhythms, and harmonies. This hierarchical approach expands the representability across various scales of music. This representation serves as the foundation for an energy-based model, uniquely tailored to learn musical concepts through a flexible algorithm framework relying on the minimax entropy principle. Utilizing an adapted Metropolis-Hastings sampling technique, the model enables fine-grained control over music generation. A comprehensive empirical evaluation, contrasting this novel approach with existing methodologies, manifests considerable advancements in interpretability and controllability. This study marks a substantial contribution to the fields of music analysis, composition, and computational musicology.
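Sampling from an energy-based model with Metropolis-Hastings, which the paper adapts for fine-grained control over generation, can be sketched generically as follows (a symmetric proposal is assumed; this is the textbook algorithm, not the paper's adapted variant for and-or graphs):

```python
import math
import random

def metropolis_hastings(energy, init, propose, steps=1000):
    """Draw samples from p(x) proportional to exp(-energy(x)) using a
    symmetric proposal distribution."""
    x, e = init, energy(init)
    samples = []
    for _ in range(steps):
        cand = propose(x)
        e_cand = energy(cand)
        # accept with probability min(1, exp(e - e_cand))
        if e_cand <= e or random.random() < math.exp(e - e_cand):
            x, e = cand, e_cand
        samples.append(x)
    return samples
```

In the paper's setting, `x` would be a configuration of the hierarchical and-or graph and `propose` a structure-preserving local move, which is where the adaptation lies.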
Dziadek Magdalena
The aim of the paper is to situate the most important open-air venues where music was performed on the map of interwar Warsaw. This includes venues in city parks, restaurant tea gardens, streets and squares where mass celebrations and demonstrations took place, as well as the courtyards of tenement houses frequented by street players and singers. In addition to live music, that coming from the radio, gramophone records, as well as megaphones installed in parks and streets has been taken into account. On the basis of press reports, taken mainly from the Kurier Warszawski (Warsaw's largest daily newspaper), as well as works of fiction and diaries, the repertoire of works performed in the open air has been reconstructed. The organisers and performers of concerts, open-air shows, and street marches during which music was performed have also been listed. The material is divided into contexts connected with the everyday life of Warsaw's bourgeoisie and working class (during which music, functioning independently or as part of theatrical, cabaret and film performances, functioned as entertainment) and those belonging to the official-public or ceremonial sphere (celebrations of religious and national holidays, military parades, spontaneous demonstrations of supporters of various political groups). Special emphasis has been placed on recreating the social and political context of the latter type of events, which highlight the role of interwar Warsaw as the capital of the state.
Lorenzo Bianconi, Antonio Caroccia, Giovanni Giuriati et al.
This paper illustrates the important contribution of Italian university musicology to the training of future music educators, emphasizing the formative role of the history of music (or musics) not only for future musicians or musicologists, but for the training of all students. It also highlights some problematic aspects of the current regulatory framework for the teaching of music history.
Sandy Manolios, Catholijn M. Jonker, Cynthia C. S. Liem
The present work is part of a research line seeking to uncover the mysteries of what lies behind people's musical preferences in order to provide better music recommendations. More specifically, it takes the angle of personal values. Personal values are what we as people strive for, and are a popular tool in marketing research to understand customer preferences for certain types of product. Therefore, it makes sense to explore their usefulness in the music domain. Based on previous qualitative work using Means-End theory, we designed a survey in an attempt to more quantitatively approach the relationship between personal values and musical preferences. We support our approach with a simulation study as a tool to improve the experimental procedure and decisions.
Gabriela Barenboim, Luigi Del Debbio, Johannes Hirn et al.
We use Google's MusicVAE, a Variational Auto-Encoder with a 512-dimensional latent space to represent a few bars of music, and organize the latent dimensions according to their relevance in describing music. We find that, on average, most latent neurons remain silent when fed real music tracks: we call these "noise" neurons. The remaining few dozen latent neurons that do fire are called "music neurons". We ask which neurons carry the musical information and what kind of musical information they encode, namely something that can be identified as pitch, rhythm or melody. We find that most of the information about pitch and rhythm is encoded in the first few music neurons: the neural network has thus constructed a couple of variables that non-linearly encode many human-defined variables used to describe pitch and rhythm. The concept of melody only seems to show up in independent neurons for longer sequences of music.
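One simple way to separate active "music neurons" from silent "noise neurons" is to threshold the per-dimension activation variance over a corpus of real tracks; a hypothetical sketch (the paper's exact relevance criterion may differ, and the threshold is an assumption):

```python
import numpy as np

def split_music_neurons(latents, threshold=1e-2):
    """latents: (n_tracks, n_dims) array of latent activations for real
    tracks. Dimensions whose activations vary across tracks count as
    'music neurons'; near-constant dimensions count as 'noise neurons'."""
    variance = latents.var(axis=0)
    music = np.where(variance > threshold)[0]
    noise = np.where(variance <= threshold)[0]
    return music, noise
```

Sorting the music-neuron indices by variance would then give the "first few" most relevant dimensions the abstract refers to.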
Serap Bastepe-Gray, Lavinia Wainwright, Diane C. Lanham et al.
Playing musical instruments may have positive effects on motor, emotional, and cognitive deficits in patients with Parkinson’s disease (PD). This pilot study examined the feasibility of a six-week nontraditional guitar instruction program for individuals with PD. Twenty-six participants with idiopathic PD (Age: 67.22 ± 8.07; 17 males) were randomly assigned to two groups (intervention first or 6 weeks of usual care control exposure) with stepwise exposure to the guitar intervention condition with cross-over at six weeks. Outcomes were assessed at baseline, 6, 12, and 18 weeks. Twenty-four participants completed the study. Combined analysis of the groups showed significant BDI-II improvement immediately after intervention completion (3.04 points, 95% CI [−5.2, −0.9], p=0.04). PDQ-39 total quality of life scores improved by 5.19 points (95% CI [−9.4, −1.0]) from baseline to immediately post-intervention at trend significance (corrected p=0.07). For Group 1 (exposed to the intervention first), MDS-UPDRS total scores improved by a mean of 8.04 points (95% CI [−12.4, −3.7], p=0.004) and remained improved at 12 weeks by 10.37 points (95% CI [−14.7, −6.0], p<0.001). This group also had significant improvements in mood and depression at weeks 6 and 12, remaining significant at week 18 (BDI-II: 3.75, 95% CI [−5.8, −1.7], p=0.004; NeuroQoL-depression: 10.6, 95% CI [−4.9, −1.4], p=0.004), and in anxiety at week 6 and week 18 (NeuroQoL; 4.42, 95% CI [−6.8, −2.1], p=0.004; 3.58, 95% CI [−5.9, −1.2], p=0.02, respectively). We found clinically and statistically significant improvements in mood/anxiety after 6 weeks of group guitar classes in individuals with PD. Group guitar classes can be a feasible intervention in PD and may improve mood, anxiety, and quality of life.
Simon Colton, Maria Teresa Llano, Rose Hepworth et al.
During 2015 and early 2016, the cultural application of Computational Creativity research and practice took a big leap forward, with a project where multiple computational systems were used to provide advice and material for a new musical theatre production. Billed as the world's first 'computer musical... conceived by computer and substantially crafted by computer', Beyond The Fence was staged in the Arts Theatre in London's West End during February and March of 2016. Various computational approaches to analytical and generative sub-projects were used to bring about the musical, and these efforts were recorded in two 1-hour documentary films made by Wingspan Productions, which were aired on SkyArts under the title Computer Says Show. We provide details here of the project conception and execution, including details of the systems which took on some of the creative responsibility in writing the musical, and the contributions they made. We also provide details of the impact of the project, including a perspective from the two (human) writers with overall control of the creative aspects of the musical.
Yuka Hashizume, Li Li, Tomoki Toda
The criteria for measuring music similarity are important for developing a flexible music recommendation system. Some data-driven methods have been proposed to calculate music similarity from only music signals, such as metric learning based on a triplet loss using tag information on each musical piece. However, the resulting music similarity metric usually captures the entire piece of music, i.e., the mixing of various instrumental sound sources, limiting the capability of the music recommendation system, e.g., it is difficult to search for a musical piece containing similar drum sounds. Towards the development of a more flexible music recommendation system, we propose a music similarity calculation method that focuses on individual instrumental sound sources in a musical piece. By fully exploiting the potential of data-driven methods for our proposed method, we employ weakly supervised metric learning on individual instrumental sound source signals without using any tag information, where positive and negative samples in a triplet loss are defined by whether or not they are from the same musical piece. Furthermore, assuming that each instrumental sound source is not always available in practice, we also investigate the effects of using instrumental sound source separation to obtain each source in the proposed method. Experimental results have shown that (1) unique similarity metrics can be learned for individual instrumental sound sources, (2) similarity metrics learned using some instrumental sound sources can lead to more accurate results than those learned using the entire musical piece, (3) performance degraded when learning with the separated instrumental sounds, and (4) similarity metrics learned by the proposed method produced results that correspond well to human perception.
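The triplet loss with same-piece positives can be sketched as follows (a per-triplet version on plain embedding vectors; the function name and margin value are illustrative, and the anchor/positive pair would be two source signals from the same musical piece):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss on embedding vectors: pull the same-piece pair
    (anchor, positive) together, push the different-piece negative away."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

Defining positives purely by piece membership is what makes the scheme weakly supervised: no tag information is needed, only knowing which piece each source signal came from.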
Ennio Stipčević
This study is the first attempt at a comprehensive insight into the "popular music culture" of the Baroque period in Croatia. The introductory reflection is followed by a descriptive survey of the existing bibliography of the corpus of preserved musical sources. The organization follows the guidelines of St. Bonaventure (14th century), who distinguishes four types of writers of texts: scriptor, compilator, commentator, and auctor. The final part of the study offers some preliminary conclusions, leaving the way open for further research and discussion.
Fernando Emboaba
As an integral and inherent part of musicological activity, the recovery and reconstitution of manuscripts is one that holds great fascination for the researcher, fueled by the discovery of diverse historical perspectives and the dissemination of an often forgotten past. With this in mind, we turn to the reconstitution and edition of a manuscript by the composer Florêncio José Ferreira Coutinho – a composer active in several brotherhoods while concurrently pursuing a military career in the Regimento dos Dragões in the last quarter of the 18th century and the beginning of the 19th century in Vila Rica, present-day Ouro Preto (State of Minas Gerais) – belonging to the score collection of the Museu da Inconfidência, Anexo III – Casa do Pilar, entitled O pecador - Para Sagrada Comunhão. In short, in this study we aim, through an exploratory methodology and documentary procedure, to retrace the entire process of reconstituting the manuscript, from our first contact with the document to its editing and analysis, noting the musicological decisions we faced up to its conclusion.
Andrea Valenti, Stefano Berti, Davide Bacciu
The polyphonic nature of music makes the application of deep learning to music modelling a challenging task. On the other hand, the Transformer architecture seems to be a good fit for this kind of data. In this work, we present Calliope, a novel autoencoder model based on Transformers for the efficient modelling of multi-track sequences of polyphonic music. The experiments show that our model is able to improve the state of the art on musical sequence reconstruction and generation, with remarkably good results especially on long sequences.
Page 19 of 324661