Evolving music theory for emerging musical languages
Emmanuel Deruty
This chapter reconsiders the concept of pitch in contemporary popular music (CPM), particularly in electronic contexts where traditional assumptions may fail. Drawing on phenomenological and inductive methods, it argues that pitch is not an ontologically objective property but a perceptual construct shaped by listeners and conditions. Analyses of quasi-harmonic tones reveal that a single tone can convey multiple pitches, giving rise to tonal fission. The perception of pitch may also be multistable, varying for the same listener over time. In this framework, the tuning system may emerge from a tone's internal structure. A parallel with the coastline paradox supports a model of pitch grounded in perceptual variability, challenging inherited theoretical norms.
Insights on Harmonic Tones from a Generative Music Experiment
Emmanuel Deruty, Maarten Grachten
The ultimate purpose of generative music AI is music production. The studio-lab, a social form within the art-science branch of cross-disciplinarity, is a way to advance music production with AI music models. During a studio-lab experiment involving researchers, music producers, and an AI music model that generates bass-like audio, it was observed that the producers used the model's output to convey two or more pitches with a single harmonic complex tone. This in turn revealed that the model had learned to generate structured and coherent simultaneous melodic lines using monophonic sequences of harmonic complex tones. These findings prompt a reconsideration of the long-standing debate on whether humans can perceive harmonics as distinct pitches, and they highlight how generative AI can not only enhance musical creativity but also contribute to a deeper understanding of music.
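The idea that one harmonic complex tone can carry several pitches rests on how such a tone is built. Its partials sit at integer multiples of the fundamental, and some of them fall close to other scale degrees. The sketch below is an illustration only, not the authors' model or analysis. It lists the partials of an ideal harmonic tone and the nearest 12-TET note for each one. The 55 Hz fundamental, the A4 = 440 Hz reference and the MIDI octave convention (C4 = 60) are arbitrary choices.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def harmonic_partials(f0_hz, n_partials):
    """Frequencies of the first n partials of an ideal harmonic complex tone (k * f0)."""
    return [k * f0_hz for k in range(1, n_partials + 1)]


def nearest_12tet(freq_hz, a4_hz=440.0):
    """Nearest 12-TET note name (MIDI convention, C4 = 60) and deviation in cents."""
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    nearest = round(midi)
    cents = 100 * (midi - nearest)
    return NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1), cents


# A bass-range tone on A1 (55 Hz): partials 3, 5 and 7 land near E, C# and G.
for k, f in enumerate(harmonic_partials(55.0, 8), start=1):
    name, cents = nearest_12tet(f)
    print(f"partial {k}: {f:6.1f} Hz ~ {name:>3} ({cents:+.1f} cents)")
```

Whether a listener actually hears such a partial as a separate pitch is the perceptual question the abstract raises. The arithmetic only shows where the candidates lie.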
Generating Piano Music with Transformers: A Comparative Study of Scale, Data, and Metrics
Jonathan Lehmkuhl, Ábel Ilyés-Kun, Nico Bremes
et al.
Although a variety of transformers have been proposed for symbolic music generation in recent years, there are still few comprehensive studies of how specific design choices affect the quality of the generated music. In this work, we systematically compare different datasets, model architectures, model sizes, and training strategies for the task of symbolic piano music generation. To support model development and evaluation, we examine a range of quantitative metrics and analyze how well they correlate with human judgments collected through listening studies. Our best-performing model, a 950M-parameter transformer trained on 80K MIDI files from diverse genres, produces outputs that are often rated as human-composed in a Turing-style listening survey.
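The abstract checks how well automatic metrics agree with listener judgments but does not name the correlation measure. A rank correlation is one common choice for this task. The sketch below is a plain-Python Spearman correlation, with tied values given average ranks. Treating Spearman as the measure used in the study is an assumption.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(metric_scores, human_ratings):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(metric_scores), average_ranks(human_ratings))
```

Apply it to one metric's scores and the mean listener ratings for the same set of generated pieces. A value near 1 means the metric orders the pieces the way listeners do.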
Detecting Musical Deepfakes
Nick Sunday
The proliferation of Text-to-Music (TTM) platforms has democratized music creation, enabling users to effortlessly generate high-quality compositions. However, this innovation also presents new challenges to musicians and the broader music industry. This study investigates the detection of AI-generated songs using the FakeMusicCaps dataset by classifying audio as either deepfake or human-made. To simulate real-world adversarial conditions, tempo stretching and pitch shifting were applied to the dataset. Mel spectrograms were generated from the modified audio and then used to train and evaluate a convolutional neural network. In addition to presenting technical results, this work explores the ethical and societal implications of TTM platforms, arguing that carefully designed detection systems are essential both to protecting artists and to unlocking the positive potential of generative AI in music.
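The pipeline above applies tempo stretching and pitch shifting, then computes mel spectrograms. It relies on a few small conversions, sketched below:

- the mel-scale mapping that places the triangular filters of a mel spectrogram;
- the frequency ratio produced by a pitch shift;
- the duration change produced by a time stretch.

The sketch assumes the HTK mel formula (other variants exist, e.g. Slaney's) and a stretch convention where a rate above 1 speeds the audio up. It is not the study's code.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min_hz, f_max_hz):
    """n_mels + 2 frequencies, equally spaced on the mel scale.

    Consecutive triples (lower, center, upper) define the triangular filters
    that turn a magnitude spectrum into a mel spectrogram.
    """
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


def pitch_shift_factor(semitones):
    """Frequency ratio applied by a pitch shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def stretched_duration(seconds, rate):
    """Duration after time stretching (assumed convention: rate > 1 speeds up).

    Time stretching changes duration but not pitch.
    """
    return seconds / rate


edges = mel_band_edges(40, 0.0, 8000.0)
print(len(edges), round(edges[1], 1), round(pitch_shift_factor(2), 4))
```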
Ekaterinburg. Cultural history. Author’s essays for the anniversary of the capital of the Urals
Maria S. Frolova
In 2021-2023, three volumes of essays on the development of the cultural sphere of the capital of the Urals were published in Yekaterinburg. The publication was initiated by the Department of Culture of the Yekaterinburg Administration. Across 864 pages, drawing on archival materials and unique historical and contemporary photographs, the volumes present the “spirit of the development of the arts”: music, theater, and cinema in Volume 1; sculpture, painting, and architecture in Volume 2; and literature, art education, and the educational system in Volume 3. The chosen genre, the essay, is original and productive. The texts take stock of, and record, the achievements of Yekaterinburg/Sverdlovsk culture. The tercentenary of Yekaterinburg (a city that can be scientifically categorized as a regional or peripheral capital), celebrated in 2023, is an occasion for reflection and further planning. These richly illustrated gift editions offer deep and original analysis of the development of the cultural sphere. The authors are leading academic researchers and staff of Yekaterinburg's largest cultural institutions: the Sverdlovsk Regional Museum of Local Lore, UrFU (Ural Federal University) named after the first President of Russia B. N. Yeltsin, the Museum of the History of Yekaterinburg, and the Sverdlovsk Music School named after P. I. Tchaikovsky. Using the general scientific critical method together with synthesis and analysis, this review gives a brief overview of all three volumes, characterizes the merits of the publication, and offers criticism.
Sociology (General), Urban groups. The city. Urban sociology
The Transformation of Traditional Values in the Modern Era: The Performance of Sebahat Akkiraz's 'Yeşil İpek'
Anıl Erkılınç, Sertan Demir, Mehtap Uçar Tören
Concepts such as folk, traditionality, modernism, and postmodernism have been debated throughout history by various disciplines in different contexts. Their meanings have continually changed and transformed with the dynamics of each period and with value systems, and no definitive conclusions have been reached. Research shows that the concept of folk has been evaluated from different perspectives in fields such as folklore, sociology, and cultural anthropology, and it highlights the complexity of the conflict between tradition and modernism in processes of social transformation. Attempts to understand modernism have generally focused on its fundamental principles and effects. Perceiving tradition as the complete opposite of modernism is thought to cause certain problems. The two concepts need not conflict; indeed, they can be considered directly related and as contributing to each other's development. Considering their historical development separately is not a sound approach. Furthermore, postmodernism has been observed to offer a local, subjective, and polyphonic perspective by opposing universalism and progressivist ideals. The work Yeşil İpek, the focus of this study, with vocals by Sabahat Akkiraz and instrumental arrangements by Bedük, exemplifies the postmodern transformation of Turkish folk music. By synthesizing traditional and modern musical elements, the work presents a performance style that preserves the richness of tradition while making use of today's technology. The research was conducted with a qualitative research design in order to understand the transformation of traditional values in the modern era. Literature from articles in academic journals, book chapters, and online databases was grouped under specific headings and examined through different disciplines and approaches.
GraphMuse: A Library for Symbolic Music Graph Processing
Emmanouil Karystinaios, Gerhard Widmer
Graph Neural Networks (GNNs) have recently gained traction in symbolic music tasks, yet a lack of a unified framework impedes progress. Addressing this gap, we present GraphMuse, a graph processing framework and library that facilitates efficient music graph processing and GNN training for symbolic music tasks. Central to our contribution is a new neighbor sampling technique specifically targeted toward meaningful behavior in musical scores. Additionally, GraphMuse integrates hierarchical modeling elements that augment the expressivity and capabilities of graph networks for musical tasks. Experiments with two specific musical prediction tasks -- pitch spelling and cadence detection -- demonstrate significant performance improvement over previous methods. Our hope is that GraphMuse will lead to a boost in, and standardization of, symbolic music processing based on graph representations. The library is available at https://github.com/manoskary/graphmuse
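The abstract does not describe GraphMuse's own API. The sketch below shows, in plain Python, one common way a score is turned into a graph for a GNN. Notes become nodes, and typed edges link notes that:

- start together;
- follow each other;
- start while another note is sounding.

The `Note` record and the three relation names are assumptions for illustration, not GraphMuse's data structures or its neighbor sampling.

```python
from collections import namedtuple

# Hypothetical note record; onsets and durations in beats.
Note = namedtuple("Note", "onset duration pitch")


def build_note_graph(notes):
    """Return directed, typed edges (src, dst, relation) between note indices."""
    edges = []
    for i, a in enumerate(notes):
        a_end = a.onset + a.duration
        for j, b in enumerate(notes):
            if i == j:
                continue
            if b.onset == a.onset:
                edges.append((i, j, "onset"))        # notes start together
            elif b.onset == a_end:
                edges.append((i, j, "consecutive"))  # b starts as a ends
            elif a.onset < b.onset < a_end:
                edges.append((i, j, "during"))       # b starts while a sounds
    return edges


# A half-note C4 under quarter notes E4 and G4, followed by F4.
notes = [Note(0, 2, 60), Note(0, 1, 64), Note(1, 1, 67), Note(2, 1, 65)]
for edge in build_note_graph(notes):
    print(edge)
```

The exact onset comparisons are safe with integer or `fractions.Fraction` beat positions; floats would need a tolerance. The quadratic loop is for clarity only. A practical implementation would index notes by onset.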
SMUG-Explain: A Framework for Symbolic Music Graph Explanations
Emmanouil Karystinaios, Francesco Foscarin, Gerhard Widmer
In this work, we present Score MUsic Graph (SMUG)-Explain, a framework for generating and visualizing explanations of graph neural networks applied to arbitrary prediction tasks on musical scores. Our system allows the user to visualize the contribution of input notes (and note features) to the network output, directly in the context of the musical score. We provide an interactive interface based on the music notation engraving library Verovio. We showcase the usage of SMUG-Explain on the task of cadence detection in classical music. All code is available at https://github.com/manoskary/SMUG-Explain.
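The abstract does not name the attribution method behind SMUG-Explain. As a neutral illustration of what a note's "contribution" to the output can mean, the sketch below uses occlusion (feature ablation), one simple attribution technique. Each note's feature is replaced by a baseline value and the resulting drop in output is recorded. The toy model, the baseline of 0.0 and the single scalar feature per note are all assumptions.

```python
def occlusion_attribution(model, note_features, baseline=0.0):
    """Score each note by how much the output drops when its feature is replaced by a baseline."""
    full_output = model(note_features)
    scores = []
    for i in range(len(note_features)):
        occluded = list(note_features)
        occluded[i] = baseline  # remove note i's evidence, keep the rest
        scores.append(full_output - model(occluded))
    return scores


def toy_cadence_model(features):
    # Hypothetical stand-in for a GNN output: a weighted sum over three notes.
    weights = [2.0, 0.5, 0.0]
    return sum(w * x for w, x in zip(weights, features))


print(occlusion_attribution(toy_cadence_model, [1.0, 4.0, 3.0]))  # [2.0, 2.0, 0.0]
```

The third note gets zero attribution because the toy model ignores it. In a score view, such per-note scores could be mapped to note colors.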
Experience of a virtual music pedagogy practicum in times of COVID-19
Francisca Carrasco Lavado, Jazmín Sarita Pérez Serey
The objective of this research was to develop a virtual initial-practicum experience for students of Pedagogy in Music Education (PEM), linked to a Virtual Musical Initiation Program (PIMV), during the COVID pandemic. The study uses a quantitative methodology with a quasi-experimental design at a descriptive level, applied longitudinally. During two years of the Covid-19 pandemic, the course Initial Practicum II of Pedagogy in Music (PEM) had to be delivered virtually through a Virtual Musical Initiation Program (PIMV). Two surveys were administered. One was given to music students in Initial Practicum II who took part as tutors in the PIMV; the other was given to the parents of the children who took part in the initiation program. The results show that the PIMV was well received by the student teachers: 72% of them view it positively and 36% consider it a sufficient practical activity. The parents and guardians also rated the program well in both years, with 95.5% rating it good or very good in year 1 and 100% in year 2. Overall, the virtual practicum experience is valued as positive, and it motivated the university students to highlight on their CVs that they completed Initial Practicum II through the virtual Musical Initiation Program.
Music and books on Music, Musical instruction and study
A PERFORMANCE INTERPRETATION OF THE VIENNESE CLASSICS BY THE EXAMPLE OF FANTASIA FOR PIANO, CHORUS AND ORCHESTRA IN C MINOR, OP. 80 BY LUDWIG VAN BEETHOVEN
Nataliya BYELIK-ZOLOTAROVA, Natalya ZOLOTARYOVA, Viacheslav BOIKO
et al.
The relevance of the study lies in the need to address the significance, uniqueness, and means of performance of one of the masterpieces of the Viennese Classical School: Beethoven's Fantasia in C minor, Op. 80, for mixed chorus, piano, and orchestra. The aim of this publication was to study the problems of performance interpretation of the legacy of the Viennese Classical School using the example of L. van Beethoven’s Fantasia for piano, soloists, mixed chorus, and orchestra. The research methods were: creating an information background; comparative analysis and structuring of information; identifying the categories that underlie the problem; and generalizing the data obtained. Audio and video recordings of the work by prominent performers served as material, together with literature on the stylistic and compositional climate of the Viennese Classical era and on the features of the work's genres. The results revealed the interrelationship of all components of the problem studied. They demonstrated the inseparability of genre, musical form, instrumentation, manner of performance, the historical period, and the stylistic orientation prevailing within it. The study identified the main categories of the topic and established their dependence on temporal, stylistic, individual, and psychological (performers’ personalities) contexts. These findings constitute a theoretical and methodological contribution to art studies, the history of performance, and music pedagogy.
In conclusion, the study of the performance interpretation of the Viennese classics, using Beethoven's Fantasia as an example, established the work's universality in genre, style, and compositional technique. It also examined:

- the process by which the piano becomes a solo concert instrument;
- the role of improvisation in a large-scale synthetic genre;
- the development of a unified performance concept by the conductor, pianist, and choirmaster;
- the diversity of artistic and psychological types of performance that still preserve the composer's main idea.

The topic offers wide prospects for future research thanks to its scope, its multi-vector nature, its connections with a wide range of musical subjects, and its sensitivity to individual interpretive styles in different epochs. This is underscored by the value of the classical heritage and the need to preserve and popularize it.
Language-Guided Music Recommendation for Video via Prompt Analogies
Daniel McKee, Justin Salamon, Josef Sivic
et al.
We propose a method to recommend music for an input video while allowing a user to guide music selection with free-form natural language. A key challenge of this problem setting is that existing music video datasets provide the needed (video, music) training pairs but lack text descriptions of the music. This work addresses this challenge with the following three contributions. First, we propose a text-synthesis approach that relies on an analogy-based prompting procedure to generate natural language music descriptions from a large-scale language model (BLOOM-176B) given pre-trained music tagger outputs and a small number of human text descriptions. Second, we use these synthesized music descriptions to train a new trimodal model, which fuses text and video input representations to query music samples. For training, we introduce a text dropout regularization mechanism which we show is critical to model performance. Our model design allows the retrieved music audio to agree with both input modalities by matching the visual style depicted in the video and the musical genre, mood, or instrumentation described in the natural language query. Third, to evaluate our approach, we collect a testing dataset for our problem by annotating a subset of 4k clips from the YT8M-MusicVideo dataset with natural language music descriptions, which we make publicly available. We show that our approach can match or exceed the performance of prior methods on video-to-music retrieval while significantly improving retrieval accuracy when using text guidance.
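The abstract credits text dropout regularization with much of the model's performance but does not define it. The sketch below makes three assumptions, none of which is the paper's trimodal architecture:

- text dropout drops the whole text embedding with some probability during training, so the model also learns to retrieve from video alone;
- the two query embeddings are fused by averaging;
- music is retrieved by cosine similarity.

```python
import math
import random


def fuse_query(video_emb, text_emb, p_text_drop, rng, training=True):
    """Combine video and text query embeddings by averaging (assumed fusion).

    During training the text embedding is dropped with probability p_text_drop,
    so the model also sees video-only queries.
    """
    if text_emb is None or (training and rng.random() < p_text_drop):
        return list(video_emb)
    return [(v + t) / 2 for v, t in zip(video_emb, text_emb)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query, music_bank):
    """Indices of music embeddings sorted by cosine similarity to the query, best first."""
    return sorted(range(len(music_bank)), key=lambda i: cosine(query, music_bank[i]), reverse=True)


rng = random.Random(0)
query = fuse_query([1.0, 0.0], [0.0, 1.0], p_text_drop=0.3, rng=rng, training=False)
print(retrieve(query, [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]))
```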
V2Meow: Meowing to the Visual Beat via Video-to-Music Generation
Kun Su, Judith Yue Li, Qingqing Huang
et al.
Video-to-music generation demands both a temporally localized high-quality listening experience and globally aligned video-acoustic signatures. While recent music generation models excel at the former through advanced audio codecs, the exploration of video-acoustic signatures has been confined to specific visual scenarios. In contrast, our research confronts the challenge of learning globally aligned signatures between video and music directly from paired music and videos, without explicitly modeling domain-specific rhythmic or semantic relationships. We propose V2Meow, a video-to-music generation system capable of producing high-quality music audio for a diverse range of video input types using a multi-stage autoregressive model. Trained on 5k hours of music audio clips paired with video frames mined from in-the-wild music videos, V2Meow is competitive with previous domain-specific models when evaluated in a zero-shot manner. It synthesizes high-fidelity music audio waveforms solely by conditioning on pre-trained general-purpose visual features extracted from video frames, with optional style control via text prompts. Through both qualitative and quantitative evaluations, we demonstrate that our model outperforms various existing music generation systems in terms of visual-audio correspondence and audio quality. Music samples are available at tinyurl.com/v2meow.
Experimental Analysis on Detection of Emotions by Facial Recognition using Different Convolution Layers
S. Kezia, E. Grace Mary Kanaga, H. Eugene Kingsley
et al.
Emotions play a crucial part in social, physical, and mental life. Hence, identifying emotions by recognizing the user's facial expression is of great importance. The resulting information can support various recommendation systems, for example for music, products, or books. In this paper, emotions are detected from facial data, and the results of a CNN model and a DCNN model are compared. The dataset used was FER-2013, which includes 28,000 annotated images in the training set, 3,500 in the validation set, and 3,500 in the test set. In FER-2013, each image is assigned to one of seven emotions: happy, sad, angry, afraid, surprise, disgust, and neutral. The chosen parameters are applied to three models that differ in their convolution layers, and the results show that accuracy improves as the number of layers grows.
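Changing the convolution layers changes how far the feature maps shrink before classification. The helper below applies the standard output-size formula to a 48x48 input, which is the FER-2013 image size. The layer stack is hypothetical, since the abstract does not list the three architectures.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_size, layers):
    """Apply (kernel, padding, stride) layers in order and return the size after each one."""
    sizes = []
    size = input_size
    for kernel, padding, stride in layers:
        size = conv_output_size(size, kernel, padding, stride)
        sizes.append(size)
    return sizes


# Hypothetical block: a 3x3 convolution with padding 1, then 2x2 max-pooling with stride 2.
BLOCK = [(3, 1, 1), (2, 0, 2)]
print(trace_shapes(48, BLOCK * 3))  # [48, 24, 24, 12, 12, 6]
```

With this particular block, each added conv-plus-pool stage halves the spatial size. A deeper stack therefore ends with smaller maps but more layers of feature extraction.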
Implementation Of Project Based Learning (PjBL) Method On Music Learning In Junior High School Regina Pacis Surakarta
Antonius Edi Nugroho, Stefani Ekky Puspa Dewi
The teaching methods used in online classes offer ways for students to keep understanding the material presented, to develop appreciation, and to innovate while taking part in teaching and learning activities. Project-based learning is a method that uses a project or activity as the medium of the learning process. It focuses on student activity: students come to understand a concept or principle by investigating a problem in depth and finding relevant solutions. The method makes students active, independent learners who can apply the knowledge they already have and practice various thinking skills, attitudes, and concrete skills. For complex problems, learning requires investigation, collaboration, and experimentation in carrying out a project, as well as the integration of several subjects. Teachers at Regina Pacis Junior High School Surakarta use project-based learning; they combined three subjects into a single project. This study uses a qualitative, descriptive method. The research subjects were music teachers at Regina Pacis Junior High School, Surakarta. Data were collected through observation, interviews, and documentation. Data analysis proceeded in three stages: data reduction, data presentation, and drawing conclusions. The material collected from interviews and observations (documentation, pictures, photos, videos, field notes, personal notes, and other documents) was studied and examined, then reduced to an abstraction.
Music, Musical instruction and study
Flat Latent Manifolds for Human-machine Co-creation of Music
Nutan Chen, Djalel Benbouzid, Francesco Ferroni
et al.
The use of machine learning in artistic music generation leads to controversial discussions of the quality of art, for which objective quantification is nonsensical. We therefore consider a music-generating algorithm as a counterpart to a human musician, in a setting where reciprocal interplay is meant to lead to new experiences, both for the musician and the audience. To obtain this behaviour, we resort to the framework of recurrent Variational Auto-Encoders (VAE) and learn to generate music, seeded by a human musician. In the learned model, we generate novel musical sequences by interpolation in latent space. However, standard VAEs do not guarantee any form of smoothness in their latent representation, which translates into abrupt changes in the generated music sequences. To overcome these limitations, we regularise the decoder and endow the latent space with a flat Riemannian manifold, i.e., a manifold that is isometric to the Euclidean space. As a result, linearly interpolating in the latent space yields realistic and smooth musical changes that fit the type of machine-musician interactions we aim for. We provide empirical evidence for our method via a set of experiments on music datasets, and we deploy our model for an interactive jam session with a professional drummer. The live performance provides qualitative evidence that the latent representation can be intuitively interpreted and exploited by the drummer to drive the interplay. Beyond the musical application, our approach showcases an instance of human-centred design of machine-learning models, driven by interpretability and the interaction with the end user.
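The key property above is that, in a flat latent space, a straight line between two latent codes decodes into an evenly paced change. The sketch below interpolates linearly between latent codes and compares path lengths before and after decoding. It uses a toy decoder whose Jacobian is constant, which is the simplest case of a flat pull-back metric (Euclidean up to a scale factor). The decoder is a stand-in assumption; the paper's recurrent VAE and its regularisation are not reproduced.

```python
import math


def interpolate(z_a, z_b, n_steps):
    """n_steps latent codes on the straight line from z_a to z_b (endpoints included)."""
    return [
        [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]
        for t in (i / (n_steps - 1) for i in range(n_steps))
    ]


def path_length(points):
    """Sum of Euclidean distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def toy_flat_decoder(z, scale=2.0):
    # Hypothetical decoder with a constant Jacobian (scale * I): its pull-back metric
    # is scale**2 * I, i.e. flat, so decoded distances are latent distances times scale.
    return [scale * v for v in z]


latent_path = interpolate([0.0, 0.0], [3.0, 4.0], 5)
decoded = [toy_flat_decoder(z) for z in latent_path]
print(path_length(latent_path), path_length(decoded))  # lengths differ only by the factor 2
```

With a non-flat decoder, equal latent steps could map to very unequal output steps. These uneven jumps are the abrupt changes the flat-manifold regularisation is meant to avoid.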
A Book Recommendation System Using Decision Tree-based Fuzzy Logic for E-Commerce Sites
M. F. Adak, Metehan Uçar
Recommendation systems are developed to advise users on the most suitable products, especially on e-commerce sites, movie streaming platforms, and music listening platforms. The growing number of data-driven studies has led to different methods being applied in recommendation systems. This study proposes a fuzzy-logic-based product recommendation system for users who want to buy books on e-commerce sites. Clustering is performed with unsupervised learning on the “also bought/viewed” book data. A decision tree model is built from the data set, and the rules of the fuzzy model used in the study are created from this decision tree. Tests on real data show that successful results are obtained when decision trees and fuzzy models are used together. Fuzzy models usually do not require data; instead, the designer must know the parameters and their effects when building the model. However, determining rules is difficult in complex and challenging models such as the one in this study. The successful results obtained here show that rules can be created quickly and accurately with the help of a method such as decision trees.
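The abstract derives fuzzy rules from a decision tree but gives neither the rules nor the membership functions. The sketch below shows how tree paths can be written as fuzzy rules and evaluated with zero-order Sugeno inference, one common inference scheme; the study's own scheme may differ. The feature names, fuzzy sets and leaf scores are all hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets on inputs normalised to [0, 1].
FUZZY_SETS = {"low": (-1.0, 0.0, 1.0), "high": (0.0, 1.0, 2.0)}

# Hypothetical rules, as if read off root-to-leaf paths of a decision tree:
# (antecedents, recommendation score of the leaf).
RULES = [
    ({"also_bought": "high", "also_viewed": "high"}, 0.9),
    ({"also_bought": "high", "also_viewed": "low"}, 0.6),
    ({"also_bought": "low"}, 0.1),
]


def recommend_score(inputs, rules=RULES, sets=FUZZY_SETS):
    """Zero-order Sugeno inference: min for AND, weighted average of rule outputs."""
    num = den = 0.0
    for antecedents, output in rules:
        strength = min(tri(inputs[var], *sets[label]) for var, label in antecedents.items())
        num += strength * output
        den += strength
    return num / den if den > 0 else 0.0


print(recommend_score({"also_bought": 0.8, "also_viewed": 0.4}))
```

Because a tree path is a conjunction of conditions, it maps directly onto an AND rule. This is why the tree can spare the designer from writing rules by hand.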
A Review on Multiple Data Source Based Recommendation Systems
Debashish Roy, F. Shirazi
Recommender systems are used by content or item providers. The content could be video, music, etc. For example, Netflix recommends movies or TV shows, and Amazon recommends books and other items. Most content providers use their platform to collect data from their users and then use the collected data to design a recommender system. However, recommendations are more useful if a recommender system uses multiple data sources. Both Matrix Factorization (MF) and Deep Neural Network (DNN) models are used to design multiple data source-based recommenders. This paper reviews various approaches that use multiple data sources to design recommender systems using MF and DNN models.
Computer Science
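For the MF side of the review above, the sketch below trains a plain single-source matrix factorization with stochastic gradient descent on a toy rating list. The multi-source methods the review covers extend this model with side data, which is not shown. The hyperparameters, the global-mean term and the data are illustrative choices, not taken from the paper.

```python
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.01, epochs=2000, seed=0):
    """Factorize ratings as r_ui ~ mu + p_u . q_i with SGD and L2 regularization."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + sum(a * b for a, b in zip(P[u], Q[i])))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]  # update both factors from the old values
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q


def predict(model, u, i):
    mu, P, Q = model
    return mu + sum(a * b for a, b in zip(P[u], Q[i]))


# Toy (user, item, rating) triples; user 2 has not rated item 0.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 1.0), (2, 1, 2.0)]
model = train_mf(ratings, n_users=3, n_items=2)
print(round(predict(model, 2, 0), 2))  # estimate for the missing entry
```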
Exploration Of Lampung Traditional Music In Efforts To Preserve Culture By Kulit Tipis Community In Bandar Lampung
Birgita Iyona Yulita, Bagus Susetyo, Irfanda Riski Harmono Sejati
Traditional music is one of the elements of culture that maintain cultural continuity, dynamics, and identity, and it is a vehicle for transmitting cultural values from generation to generation. This study aims to identify and describe the exploration of traditional Lampung music as an effort by the Kulit Tipis community in Bandar Lampung to preserve musical culture. The research used qualitative methods. Data were collected through literature study, observation, interviews, and documentation studies. The results show that the Kulit Tipis community in Bandar Lampung preserves traditional Lampung music by:

(1) combining traditional Lampung instruments with modern instruments or with traditional instruments from other regions;
(2) making musical compositions from Lampung folk songs or contemporary pop songs;
(3) performing musical explorations at festivals, regular band events, and competitions in and outside Lampung;
(4) creating video recordings of the community's performances and uploading them to social media such as YouTube.
Music, Musical instruction and study
Árpád Ódry’s Art of Performance
Dorka Porogi
In 1896, critics were already noting that the second-year academy student Árpád Ódry possessed the rare talent of being “able to speak”. Sándor Hevesi, the young journalist who first recognized Ódry’s talent, later became director of the National Theatre. My research focuses on how Ódry’s methods represented an acting ideal and a new school at the beginning of the 20th century by introducing natural (but not naturalistic) performance methods. These methods have had a lasting effect to the present day. Ódry’s natural performance methods, however, can only be achieved with the special techniques of the acting profession.
Music and books on Music, Arts in general
Semi-Supervised Music Tagging Transformer
Minz Won, Keunwoo Choi, Xavier Serra
We present the Music Tagging Transformer, trained with a semi-supervised approach. The proposed model captures local acoustic characteristics in shallow convolutional layers, then temporally summarizes the sequence of extracted features using stacked self-attention layers. Through a careful model assessment, we first show that the proposed architecture outperforms previous state-of-the-art music tagging models based on convolutional neural networks under a supervised scheme. The Music Tagging Transformer is further improved by noisy student training, a semi-supervised approach that leverages both labeled and unlabeled data combined with data augmentation. To the best of our knowledge, this is the first attempt to utilize the entire audio of the Million Song Dataset.
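Noisy student training, as summarised above, alternates two steps. A teacher labels unlabeled data, then a student is trained with added noise on the labeled and pseudo-labeled data together. The toy sketch below keeps only that loop. A one-dimensional nearest-centroid classifier stands in for the transformer, and Gaussian input jitter stands in for data augmentation; both are assumptions. Real music tagging is multi-label and operates on audio.

```python
import random


def fit_centroids(xs, ys):
    """Nearest-centroid 'model': the mean feature value of each label."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def classify(centroids, x):
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def noisy_student(labeled_x, labeled_y, unlabeled_x, rounds=3, noise_std=0.1, seed=0):
    rng = random.Random(seed)
    teacher = fit_centroids(labeled_x, labeled_y)  # teacher trained on labeled data only
    for _ in range(rounds):
        # 1. The teacher labels clean (un-noised) unlabeled inputs.
        pseudo_y = [classify(teacher, x) for x in unlabeled_x]
        # 2. The student trains on labeled + pseudo-labeled data with input noise,
        #    a stand-in for data augmentation.
        xs = labeled_x + unlabeled_x
        ys = labeled_y + pseudo_y
        noisy_xs = [x + rng.gauss(0.0, noise_std) for x in xs]
        # 3. The student becomes the teacher for the next round.
        teacher = fit_centroids(noisy_xs, ys)
    return teacher


model = noisy_student([0.0, 1.0], ["quiet", "loud"], [0.1, 0.2, 0.9, 0.8])
print(classify(model, 0.05), classify(model, 0.95))
```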