G. Kramer
Results for "Music"
Showing 20 of ~1,058,343 results · from CrossRef, DOAJ, arXiv, Semantic Scholar
Zach Evans, CJ Carr, Josiah Taylor et al.
Generating long-form 44.1kHz stereo audio from text prompts can be computationally demanding. Further, most previous works do not address the fact that music and sound effects naturally vary in duration. Our research focuses on the efficient generation of long-form, variable-length stereo music and sounds at 44.1kHz using text prompts with a generative model. Stable Audio is based on latent diffusion, with its latent defined by a fully-convolutional variational autoencoder. It is conditioned on text prompts as well as timing embeddings, allowing fine control over both the content and length of the generated music and sounds. Stable Audio can render stereo signals of up to 95 sec at 44.1kHz in 8 sec on an A100 GPU. Despite its compute efficiency and fast inference, it ranks among the best on two public text-to-music and text-to-audio benchmarks and, unlike state-of-the-art models, can generate music with structure and stereo sounds.
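The "timing embeddings" the abstract mentions can be illustrated with a minimal sketch: encode a clip's start time and total duration as bounded sinusoidal features that could be concatenated with text-prompt embeddings. The function name, dimensionality, and normalization below are assumptions made for illustration, not Stable Audio's actual implementation.

```python
import math

def timing_embedding(start_sec, total_sec, dim=16, max_sec=95.0):
    """Hypothetical sketch: map (start, duration) in seconds to a
    fixed-size feature vector via sinusoids at doubling frequencies."""
    feats = []
    for value in (start_sec / max_sec, total_sec / max_sec):
        for k in range(dim // 4):
            freq = 2.0 ** k
            feats.append(math.sin(math.pi * freq * value))
            feats.append(math.cos(math.pi * freq * value))
    return feats

# e.g. a 30-second clip starting at t = 0
emb = timing_embedding(0.0, 30.0)
```

In the paper's setup, features of this kind condition the diffusion model alongside the text prompt, which is what enables variable-length generation.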
Grosvenor W. Cooper, Leonard B. Meyer
Midhun T. Augustine
This paper presents a new approach to algorithmic composition, called predictive controlled music (PCM), which combines model predictive control (MPC) with music generation. PCM uses dynamic models to predict and optimize the music generation process, where musical notes are computed in a manner similar to an MPC problem by optimizing a performance measure. A feedforward neural network-based assessment function is used to evaluate the generated musical score, which serves as the objective function of the PCM optimization problem. Furthermore, a recurrent neural network model is employed to capture the relationships among the variables in the musical notes, and this model is then used to define the constraints in the PCM. Similar to MPC, the proposed PCM computes musical notes in a receding-horizon manner, leading to feedback controlled prediction. Numerical examples are presented to illustrate the PCM generation method.
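The receding-horizon idea behind PCM can be sketched in a few lines: enumerate candidate continuations over a short horizon, score them with an assessment function, commit only the first note, then re-optimize. The toy objective below (prefer small melodic steps, penalize immediate repetition) is a stand-in for the paper's neural assessment function; all names, pitch sets, and constants are illustrative.

```python
import itertools

PITCHES = [60, 62, 64, 65, 67]  # a C-major fragment (MIDI note numbers)

def assess(seq):
    """Toy stand-in for the paper's neural assessment function:
    prefer small melodic steps, penalize immediate repetition."""
    steps = sum(abs(a - b) for a, b in zip(seq, seq[1:]))
    repeats = sum(a == b for a, b in zip(seq, seq[1:]))
    return -(steps + 10 * repeats)

def pcm_generate(length=8, horizon=3, start=60):
    melody = [start]
    for _ in range(length - 1):
        # exhaustively optimize over the horizon (an MPC solver in the paper)
        best = max(itertools.product(PITCHES, repeat=horizon),
                   key=lambda cand: assess(melody + list(cand)))
        melody.append(best[0])  # commit only the first note, then re-plan
    return melody

melody = pcm_generate()
print(melody)
```

Exhaustive enumeration is only feasible for this tiny pitch set and horizon; the paper instead poses the step as an MPC-style optimization with RNN-derived constraints.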
Rakhat-Bi Abdyssagin, Bob Coecke
We initiate the development of a new language and theory for quantum music, which we call Quantum Concept Music (QCM). This new music formalism is based on Categorical Quantum Mechanics (CQM) and, more specifically, its diagrammatic incarnation Quantum Picturalism (QPict), which is heavily based on the ZX-calculus; indeed, QCM is naturally inherited from CQM/QPict. At its heart is the explicit notational representation of the relations that exist within and between the key concepts of music composition, performance, and automation. QCM also enables one to translate quantum phenomena directly into music compositions in a manner that is at once intuitive, rigorous, and mechanical. Following this pattern, we propose a score for musicians interacting like a Bell pair under measurement, and outline examples of how it could be performed live. While most Western classical music notation has relied heavily on linear representations of music, which do not always adequately capture its nature, our approach is distinct in highlighting the fundamental relational dimension of music. In addition, this quantum-based technique not only shapes the music at the level of composition, but also has a direct impact on live performance, and provides a new template for automating music, e.g. in the context of AI generation. Altogether, we initiate the creation of a new music formalism that is powerful and efficient in capturing the interactive nature of music, in terms of both internal and external interactions, and that goes beyond the boundaries of Western classical music notation, allowing its use across many genres and directions.
Daniela Hřebíčková
Igor Lugo, Martha G. Alatriste-Contreras
This study takes a theoretical approach to exploring the applicability of a 2D cellular automaton based on melodic and harmonic intervals to random arrays of musical notes. The aim was to explore alternative uses of a cellular automaton in a musical context to better understand musical creativity. We used complex-systems and humanities approaches as a framework for capturing the essence of creating music from rules of music theory. Findings suggest that such rules matter for generating large-scale patterns of organized notes. Our formulation therefore provides a novel approach to understanding and replicating aspects of musical creativity.
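A toy version of such an automaton might look like the following: each cell holds a MIDI pitch, keeps its note if enough neighbours form consonant intervals with it, and otherwise adopts a random neighbour's pitch. The interval set, threshold, and update rule are assumptions made for this example, not the authors' exact formulation.

```python
import random

# Semitone interval classes treated as "consonant" for this sketch
CONSONANT = {0, 3, 4, 5, 7, 8, 9}

def step(grid):
    """One synchronous update of the toy interval-based automaton
    on a toroidal (wrap-around) grid of MIDI pitches."""
    n = len(grid)
    new = [row[:] for row in grid]
    for i in range(n):
        for j in range(n):
            neigh = [grid[(i + di) % n][(j + dj) % n]
                     for di in (-1, 0, 1) for dj in (-1, 0, 1)
                     if (di, dj) != (0, 0)]
            consonant = sum(abs(grid[i][j] - p) % 12 in CONSONANT
                            for p in neigh)
            if consonant < 4:  # too dissonant: adopt a neighbour's pitch
                new[i][j] = random.choice(neigh)
    return new

random.seed(0)
grid = [[random.randint(48, 72) for _ in range(8)] for _ in range(8)]
for _ in range(10):
    grid = step(grid)
```

Iterating the rule tends to smooth a random pitch field toward locally consonant patches, which is the kind of large-scale organization the abstract refers to.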
Ziya Zhou, Yuhang Wu, Zhiyue Wu et al.
Symbolic music, akin to language, can be encoded in discrete symbols. Recent research has extended the application of large language models (LLMs) such as GPT-4 and Llama2 to the symbolic music domain, including understanding and generation. Yet scant research explores the details of how these LLMs perform on advanced music understanding and conditioned generation, especially from the multi-step reasoning perspective, which is a critical aspect of the conditioned, editable, and interactive human-computer co-creation process. This study conducts a thorough investigation of LLMs' capabilities and limitations in symbolic music processing. We identify that current LLMs exhibit poor performance in song-level multi-step music reasoning, and typically fail to leverage learned music knowledge when addressing complex musical tasks. An analysis of LLMs' responses distinctly highlights their pros and cons. Our findings suggest that advanced musical capability is not intrinsically acquired by LLMs, and that future research should focus more on bridging the gap between music knowledge and reasoning, to improve the co-creation experience for musicians.
SeungHeon Doh, Jongpil Lee, Dasaem Jeong et al.
Word embedding has become an essential means for text-based information retrieval. Typically, word embeddings are learned from large quantities of general and unstructured text data. However, in the domain of music, the word embedding may have difficulty understanding musical contexts or recognizing music-related entities like artists and tracks. To address this issue, we propose a new approach called Musical Word Embedding (MWE), which involves learning from various types of texts, including both everyday and music-related vocabulary. We integrate MWE into an audio-word joint representation framework for tagging and retrieving music, using words like tag, artist, and track that have different levels of musical specificity. Our experiments show that using a more specific musical word like track results in better retrieval performance, while using a less specific term like tag leads to better tagging performance. To balance this compromise, we suggest multi-prototype training that uses words with different levels of musical specificity jointly. We evaluate both word embedding and audio-word joint embedding on four tasks (tag rank prediction, music tagging, query-by-tag, and query-by-track) across two datasets (Million Song Dataset and MTG-Jamendo). Our findings show that the suggested MWE is more efficient and robust than the conventional word embedding.
Hao-Wen Dong
Generative AI has been transforming the way we interact with technology and consume content. In the next decade, AI technology will reshape how we create audio content in various media, including music, theater, films, games, podcasts, and short videos. In this dissertation, I introduce the three main directions of my research centered around generative AI for music and audio: 1) multitrack music generation, 2) assistive music creation tools, and 3) multimodal learning for audio and music. Through my research, I aim to answer two fundamental questions: 1) How can AI help professionals or amateurs create music and audio content? 2) Can AI learn to create music in a way similar to how humans learn music? My long-term goal is to lower the barrier to entry for music composition and democratize audio content creation.
Lilac Atassi
Recent music generation methods based on transformers have a context window of up to a minute. The music generated by these methods is largely unstructured beyond the context window, and with a longer context window, learning long-scale structure from musical data becomes a prohibitively challenging problem. This paper proposes integrating a text-to-music model with a large language model to generate music with form, and discusses solutions to the challenges of such integration. The experimental results show that the proposed method can generate 2.5-minute-long music that is highly structured, strongly organized, and cohesive.
Sylvain Charlebois, Janet Music, H. P. Vasantha Rupasinghe
<i>Purpose:</i> A diet rich in fruits and vegetables is vital for long-term health and wellness. Yet the consumption of fruits and vegetables remains low in some regions. <i>Methodology:</i> This exploratory quantitative study used a web-based survey instrument to probe the likelihood of consumption by Canadian consumers. Canadians who had lived in the country for 12 months or more and were 18 years of age or older were surveyed. Care was taken to obtain a representative sample from all Canadian regions. <i>Findings:</i> Barriers to produce consumption include cost (39.5%), lack of knowledge and preparation skills (38.5%), and confusion surrounding health benefits (6.3%). There is further confusion surrounding the nutrition of frozen vs. fresh vegetables. Finally, respondents were concerned about pesticide residue on imported produce (63.4%). <i>Originality:</i> Despite evidence that fruits and vegetables can mitigate disease, and although promotion of fruit and vegetable consumption has been a key policy area for the Canadian government, consumers still fail to integrate sufficient fruits and vegetables into their diets. To our knowledge, this is the only study probing consumers on their fresh produce intake in the Canadian context. Public awareness and education about the regular consumption of fruits and vegetables and their nutritional value and health-promoting benefits could increase consumption across many Canadian regions and demographics.
ماجدة محمد ماضي, إيمان جمال غزي, رانيا محمود بركات et al.
This experimental study aimed to use beads in making separate clothing accessories as a small industry. Beads have been part of the culture of every era around the world, used to adorn the human body with crowns, clothing, bags, and more. Human interest in self-adornment has continued through the ages to the present day, with many unfamiliar materials used to make clothing accessories that add beauty and sparkle and give a new creative form; beads in particular were used here to raise the artistic and aesthetic value of these accessories. Small industries play a vital role in Egyptian economic development and help make Egyptian families significantly productive. Three bags, three foot bands, and five crowns were produced using cloth, beads, metal wire, stones, and other materials. After production, these accessories were presented to a panel of twenty specialist professors, who gave their opinions by responding to a questionnaire of three main axes and fourteen questions. The results agreed on the value of using beads to make separate clothing accessories as small industries.
Tilen Žele, Jasmina Markovič-Božič, Polona Mušič et al.
Nidhal Jebabli, Mariem Khlifi, Nejmeddine Ouerghi et al.
Both music and endpoint knowledge of exercise have been shown to independently influence exercise performance. However, whether these factors work as synergists or counteract one another during exercise is unknown. The purpose of this study was to determine the single and combined effects of listening to preferred music and types of endpoint knowledge on repeated countermovement jump (CMJ) test performance. Twenty-four (n = 24) current or previously competitive basketball players underwent CMJ testing under the following endpoint knowledge conditions: (1) unknown/no knowledge, (2) knowledge of the number of jumps, and (3) knowledge of exercise duration. For each of these, participants listened to either their preferred music or no music for the duration of testing. For the exercise portion, participants completed repeated CMJs and were encouraged to jump as high as possible, with jump height, contact time, and flight time as outcomes. Rating of perceived exertion (RPE) and feeling scale were measured before and after exercise. The results showed that, regardless of knowledge type, preferred music resulted in a significant decrease in both contact time and flight time (F ≥ 10.4, <i>p</i> ≤ 0.004, and η<sub>p</sub><sup>2</sup> ≥ 0.35), and a significant improvement in jump height (F = 11.36, <i>p</i> = 0.001, and η<sub>p</sub><sup>2</sup> = 0.09) and feeling scale ratings (F = 36.9, <i>p</i> < 0.001, and η<sub>p</sub><sup>2</sup> = 0.66) compared to the no-music condition, while RPE was not significantly affected. Regardless of the presence of music, knowledge of the number of jumps and of duration resulted in lower contact time (<i>p</i> < 0.001, 0.9 < d < 1.56) versus the unknown condition during CMJs. Moreover, a significant decrease in RPE was found with prior knowledge of the number of jumps (<i>p</i> = 0.005; d = 0.72) and of duration (<i>p</i> = 0.045; d = 0.63) compared to the unknown condition. However, feeling scale ratings were not significantly affected.
Moreover, no significant interactions were found for any parameter. Overall, the data suggest that listening to music and endpoint knowledge each alter exercise responses in basketball players, but they do not interact with one another.
Qihan Wang, Anique Tahir, Zeyad Alghamdi et al.
Depression has emerged as a significant mental health concern due to a variety of factors, reflecting broader societal and individual challenges. In the digital era, social media has become an important platform for individuals navigating depression, enabling them to express their emotional and mental states through various mediums, notably music. Their music preferences, manifested through sharing practices, inadvertently offer a glimpse into their psychological and emotional landscapes. This work studies the differences in music preferences between individuals diagnosed with depression and non-diagnosed individuals, exploring numerous facets of music, including musical features, lyrics, and musical networks. The music preferences of individuals with depression, revealed through music sharing on social media, show notable differences in musical features and in the topics and language use of lyrics compared to non-depressed individuals. We find that network information enhances understanding of music listening patterns. The results highlight a potential echo-chamber effect, in which depressed individuals' musical choices may inadvertently perpetuate depressive moods and emotions. In sum, this study underscores the significance of examining music's various aspects to grasp its relationship with mental health, offering insights for personalized music interventions and recommendation algorithms that could benefit individuals with depression.
Kexin Zhu, Xulong Zhang, Jianzong Wang et al.
Music Emotion Recognition (MER) involves the automatic identification of emotional elements within music tracks, and it has garnered significant attention due to its broad applicability in Music Information Retrieval. It can also serve as an upstream task for many other human-related tasks such as emotional music generation and music recommendation. According to existing psychology research, music emotion is determined by multiple factors such as the timbre, velocity, and structure of the music, and incorporating multiple factors in MER helps achieve more interpretable and finer-grained methods. However, most prior works were uni-domain and showed weak consistency between arousal modeling performance and valence modeling performance. Against this background, we designed a multi-domain emotion modeling method for instrumental music that combines symbolic analysis and acoustic analysis. At the same time, because music data are scarce and difficult to label, our multi-domain approach can make full use of limited data. Our approach was implemented and assessed on the publicly available piano dataset EMOPIA, yielding a notable improvement over our baseline model with a 2.4% increase in overall accuracy and establishing state-of-the-art performance.
Shih-Lun Wu, Chris Donahue, Shinji Watanabe et al.
Text-to-music generation models are now capable of generating high-quality music audio in broad styles. However, text control is primarily suitable for the manipulation of global musical attributes like genre, mood, and tempo, and is less suitable for precise control over time-varying attributes such as the positions of beats in time or the changing dynamics of the music. We propose Music ControlNet, a diffusion-based music generation model that offers multiple precise, time-varying controls over generated audio. To imbue text-to-music models with time-varying control, we propose an approach analogous to pixel-wise control of the image-domain ControlNet method. Specifically, we extract controls from training audio yielding paired data, and fine-tune a diffusion-based conditional generative model over audio spectrograms given melody, dynamics, and rhythm controls. While the image-domain Uni-ControlNet method already allows generation with any subset of controls, we devise a new strategy to allow creators to input controls that are only partially specified in time. We evaluate both on controls extracted from audio and controls we expect creators to provide, demonstrating that we can generate realistic music that corresponds to control inputs in both settings. While few comparable music generation models exist, we benchmark against MusicGen, a recent model that accepts text and melody input, and show that our model generates music that is 49% more faithful to input melodies despite having 35x fewer parameters, training on 11x less data, and enabling two additional forms of time-varying control. Sound examples can be found at https://MusicControlNet.github.io/web/.
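One of the time-varying controls the abstract names, dynamics, can be illustrated by extracting frame-wise RMS energy from audio; a spectrogram diffusion model could then be conditioned on a curve like this. The frame and hop sizes below are arbitrary choices for the example, not the paper's settings.

```python
import math

def rms_dynamics(samples, frame=1024, hop=512):
    """Frame-wise RMS energy: one scalar per hop, a simple
    time-varying 'dynamics' control extracted from training audio."""
    controls = []
    for start in range(0, len(samples) - frame + 1, hop):
        window = samples[start:start + frame]
        controls.append(math.sqrt(sum(x * x for x in window) / frame))
    return controls

# usage: a 1-second 440 Hz sine at 16 kHz has near-constant RMS ≈ 1/√2
sr = 16000
audio = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
dyn = rms_dynamics(audio)
```

Extracting such controls from training audio automatically yields the paired (audio, control) data the abstract describes, with no manual annotation.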
Alexander Park, Kyung-Hyun Suh
This study identified the relationship between preoccupation with devotional songs and spiritual well-being of religious individuals, and examined the mediating effect of intrinsic religiosity on preoccupation with devotional songs and spiritual well-being, moderated by the emotionally adaptive functions of music. The participants were 427 male and female Korean religious individuals. PROCESS Macro 3.5 Model 7 was used to analyze the moderated mediating effects. The results revealed that preoccupation with devotional songs was positively correlated with the emotionally adaptive functions of music, religiosity, and spiritual well-being, whereas emotionally adaptive functions of music were not significantly correlated with intrinsic religiosity. Intrinsic religiosity was positively correlated with spiritual well-being, whereas extrinsic social religiosity was not. In a moderated mediating model, there was a significant interaction effect of preoccupation with devotional songs and the emotionally adaptive functions of music; however, intrinsic religiosity could mediate the relationship between preoccupation with devotional songs and spiritual well-being, regardless of the level of emotionally adaptive functions of music. These findings suggest that, although there may be a slight difference depending on the level of use of emotionally adaptive functions of music, preoccupation with devotional songs can promote intrinsic religiosity and lead to the spiritual well-being of religious individuals.
Peining Zhang, Junliang Guo, Linli Xu et al.
We consider the novel task of automatically generating text descriptions of music. Compared with other well-established text generation tasks such as image captioning, the scarcity of well-paired music and text datasets makes this a much more challenging task. In this paper, we exploit crowd-sourced music comments to construct a new dataset and propose a sequence-to-sequence model to generate text descriptions of music. More concretely, we use dilated convolutional layers as the basic component of the encoder and a memory-based recurrent neural network as the decoder. To enhance the authenticity and thematicity of the generated texts, we further propose fine-tuning the model with a discriminator as well as a novel topic evaluator. To measure the quality of the generated texts, we also propose two new evaluation metrics, which align better with human evaluation than traditional metrics such as BLEU. Experimental results verify that our model is capable of generating fluent and meaningful comments that retain the thematic and content information of the original music.
Page 21 of 52,918