G. Conole
Results for "Musical instruction and study"
Showing 20 of ~6504211 results · from CrossRef, DOAJ, Semantic Scholar, arXiv
Jijie Li, Li Du, Hanyu Zhao et al.
Large Language Models (LLMs) demonstrate strong performance in real-world applications, yet existing open-source instruction datasets often concentrate on narrow domains, such as mathematics or coding, limiting generalization and widening the gap with proprietary models. To bridge this gap, we introduce Infinity-Instruct, a high-quality instruction dataset designed to enhance both the foundational and chat capabilities of LLMs through a two-phase pipeline. In Phase 1, we curate 7.4M high-quality foundational instructions (InfInstruct-F-7.4M) from over 100M samples using hybrid data selection techniques. In Phase 2, we synthesize 1.5M high-quality chat instructions (InfInstruct-G-1.5M) through a two-stage process involving instruction selection, evolution, and diagnostic filtering. We empirically evaluate Infinity-Instruct by fine-tuning several open-source models, including Mistral, LLaMA, Qwen, and Yi, and observe substantial performance gains across both foundational and instruction-following benchmarks, consistently surpassing the official instruction-tuned counterparts. Notably, InfInstruct-LLaMA3.1-70B outperforms GPT-4-0314 by 8.6% on instruction-following tasks while achieving comparable foundational performance. These results underscore the synergy between foundational and chat training and offer new insights into holistic LLM development. Our dataset (https://huggingface.co/datasets/BAAI/Infinity-Instruct) and code (https://gitee.com/li-touch/infinity-instruct) have been publicly released.
Jiaye Tan, Haonan Luo, Linfeng Song et al.
Low-latency symbolic music generation is essential for real-time improvisation and human-AI co-creation. Existing transformer-based models, however, face a trade-off between inference speed and musical quality. Traditional acceleration techniques such as embedding pooling significantly degrade quality, while recently proposed Byte Pair Encoding (BPE) methods, though effective on single-track piano data, suffer large performance drops in multi-track settings, as our analysis reveals. We propose Attribute-Specialized Key-Value Head Sharing (AS-KVHS), adapted to music's structured symbolic representation, which achieves an inference speedup of about 30% with only a negligible quality drop (about 0.4%) in objective evaluations and slight improvements in subjective listening tests. Our main contributions are (1) the first systematic study of BPE's generalizability in multi-track symbolic music, and (2) the introduction of AS-KVHS for low-latency symbolic music generation. Beyond these, we also release SAGE-Music, an open-source benchmark that matches or surpasses state-of-the-art models in generation quality.
Karan Patel, Yu-Zheng Lin, Gaurangi Raul et al.
This full paper describes LLM-assisted instruction integrated with a virtual cybersecurity lab platform. The digital transformation of Fourth Industrial Revolution (4IR) systems is reshaping workforce needs and widening skill gaps, especially among older workers. With rising emphasis on robotics, automation, AI, and security, re-skilling and up-skilling are essential. Generative AI can help build this workforce by acting as an instructional assistant that supports skill acquisition during experiential learning. We present a generative AI instructional assistant integrated into a previously developed experiential learning platform. The assistant employs a zero-shot OCR-LLM pipeline within the legacy Cybersecurity Labs-as-a-Service (CLaaS) platform (2015). Text is extracted from slide images using Tesseract OCR, and simplified instructions are then generated by a general-purpose LLM, enabling real-time instructional support with minimal infrastructure. The system was evaluated in a live university course, where student feedback (n=42) averaged 7.83/10, indicating strong perceived usefulness. In a comparative study, multimodal LLMs that directly interpret slide images performed better on visually dense slides, but the OCR-LLM pipeline provided comparable pedagogical value on text-centric slides with much lower computational overhead and cost. This work demonstrates that a lightweight, easily integrable pipeline can effectively extend legacy platforms with modern generative AI, offering scalable enhancements to student comprehension in technical education.
Gwendal Le Vaillant, Yannick Molle
Efficiently retrieving specific instrument timbres from audio mixtures remains a challenge in digital music production. This paper introduces a contrastive learning framework for musical instrument retrieval, enabling direct querying of instrument databases using a single model for both single- and multi-instrument sounds. We propose techniques to generate realistic positive/negative pairs of sounds for virtual musical instruments, such as samplers and synthesizers, addressing limitations in common audio data augmentation methods. The first experiment focuses on instrument retrieval from a dataset of 3,884 instruments, using single-instrument audio as input. Contrastive approaches are competitive with previous works based on classification pre-training. The second experiment considers multi-instrument retrieval with a mixture of instruments as audio input. In this case, the proposed contrastive framework outperforms related works, achieving 81.7% top-1 and 95.7% top-5 accuracies for three-instrument mixtures.
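The abstract reports top-1 and top-5 retrieval accuracy. As a minimal sketch, assuming that instruments and queries have already been mapped to embedding vectors by a trained encoder (the toy vectors below are made up and stand in for the paper's model), retrieval can rank database entries by cosine similarity, and top-k accuracy counts how often the true instrument appears among the first k results. The function names are hypothetical, and the sketch uses one target per query; how the paper scores three-instrument mixtures is not described here.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (assumed non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_instruments(query: list[float], database: dict[str, list[float]]) -> list[str]:
    """Instrument IDs sorted from most to least similar to the query embedding."""
    return sorted(database, key=lambda name: cosine(query, database[name]), reverse=True)

def top_k_accuracy(queries: list[tuple[list[float], str]],
                   database: dict[str, list[float]], k: int) -> float:
    """Fraction of queries whose true instrument is among the k best matches."""
    hits = sum(1 for embedding, target in queries
               if target in rank_instruments(embedding, database)[:k])
    return hits / len(queries)

# Toy usage with hypothetical 3-dimensional embeddings.
db = {"piano": [1.0, 0.0, 0.0], "bass": [0.0, 1.0, 0.0], "pad": [0.0, 0.0, 1.0]}
qs = [([0.9, 0.1, 0.0], "piano"), ([0.6, 0.5, 0.0], "bass")]
print(top_k_accuracy(qs, db, k=1), top_k_accuracy(qs, db, k=2))
```

In the toy data, the second query lies closer to "piano" than to its true label "bass", so it is counted as a miss at k=1 and as a hit at k=2.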
Yongjae Kim, Seongchan Park
This study explores the extent to which national music preferences reflect underlying cultural values. We collected long-term popular music data from YouTube Music Charts across 62 countries, encompassing both Western and non-Western regions, and extracted audio embeddings using the CLAP model. To complement these quantitative representations, we generated semantic captions for each track using LP-MusicCaps and GPT-based summarization. Countries were clustered based on contrastive embeddings that highlight deviations from global musical norms. The resulting clusters were projected into a two-dimensional space via t-SNE for visualization and evaluated against cultural zones defined by the World Values Survey (WVS). Statistical analyses, including MANOVA and chi-squared tests, confirmed that music-based clusters exhibit significant alignment with established cultural groupings. Furthermore, residual analysis revealed consistent patterns of overrepresentation, suggesting non-random associations between specific clusters and cultural zones. These findings indicate that national-level music preferences encode meaningful cultural signals and can serve as a proxy for understanding global cultural boundaries.
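The abstract relies on chi-squared tests and a residual analysis of over-representation. As a minimal sketch, assuming a contingency table that counts countries by music cluster (rows) and WVS cultural zone (columns), the code below computes the Pearson chi-squared statistic together with Pearson residuals (O - E) / sqrt(E), where positive values indicate over-representation. The counts are made up, and the paper may have used adjusted standardized residuals instead. To obtain a p-value, a library routine such as `scipy.stats.chi2_contingency` can be used, although it applies Yates' continuity correction by default on 2x2 tables.

```python
import math

def chi_squared_with_residuals(table: list[list[int]]) -> tuple[float, list[list[float]]]:
    """Pearson chi-squared statistic and (O - E) / sqrt(E) residuals.

    Rows: clusters; columns: cultural zones (hypothetical layout).
    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # independence model
            chi2 += (observed - expected) ** 2 / expected
            residual_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(residual_row)
    return chi2, residuals

# Made-up counts: two clusters that each fall almost entirely into one zone.
stat, res = chi_squared_with_residuals([[8, 2], [1, 9]])
print(round(stat, 3), [[round(r, 2) for r in row] for row in res])
```

The test degrees of freedom are (rows - 1) x (columns - 1).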
Katherine Bombardieri
The large research study on which this paper draws examined how early-career teachers (ECTs) in New South Wales (NSW) are prepared to teach composition and musical creativity. Drawing on data collected through semi-structured interviews within a Constructivist Grounded Theory (CGT) research design, this article explores the theme of Musical Identity as it arose in the larger study. Through discussions with ECTs and with composers who have experience teaching composition in NSW secondary schools, this study examined and compared these groups' personal definitions of composition and their perceptions of composer/creator identity, and explored how these definitions and identities shape the way each group approaches composition instruction in NSW secondary schools. These discussions revealed that participants held widely varying definitions of composing and of composer identity, stemming from composition's strong Western Art Music (WAM) connotations, and that these ambiguities were barriers to teaching composition effectively. Literature in this area indicates that composition is neglected, and little understood pedagogically, in initial teacher education (ITE) programs in Australia, Canada, Finland, the United Kingdom, and the United States. Because music education activities should be facilitated through processes and paradigms that reflect those of students, music teachers' personal definitions of composition need to be broadened through their ITE experiences. To democratise the act of creating music, syllabus terminology and requirements in NSW secondary schools should be revised to reflect these broader definitions of music creation.
Brenda Letícia dos Santos, Paulo Eduardo de Barros Veiga, Marcos Vinícius Miranda dos Santos
The Instituição Aparecido Savegnago, located in Sertãozinho in the interior of São Paulo state, serves about 170 students a year free of charge, contributing to the music education of children in situations of social vulnerability. This article presents the pedagogical structure of the Institution's bowed-string teaching, with emphasis on two aspects: the social impact it generates in the local and regional community, and the effectiveness of the musical training it gives children and young people. For string instrument teaching, the guiding methodology is the Suzuki Philosophy, taught by teachers accredited by the Suzuki Association of the Americas (SAA). In this context, the article discusses this teaching model with regard to its structure and the development of well-being, seeking to assess how effectively the project mitigates the effects of social inequality and the precarious state of music education in the municipality of Sertãozinho and the surrounding region.
Ruben Ciranni, Giorgio Mariani, Michele Mancusi et al.
We present COCOLA (Coherence-Oriented Contrastive Learning for Audio), a contrastive learning method for musical audio representations that captures the harmonic and rhythmic coherence between samples. Our method operates at the level of the stems that make up music tracks and can take as input features obtained via Harmonic-Percussive Separation (HPS). COCOLA enables the objective evaluation of generative models for music accompaniment generation, which are difficult to benchmark with established metrics. To this end, we evaluate recent music accompaniment generation models, demonstrating the effectiveness of the proposed method. We release the model checkpoints trained on public datasets containing separate stems (MUSDB18-HQ, MoisesDB, Slakh2100, and CocoChorales).
Hanyu Zhao, Li Du, Yiming Ju et al.
With the availability of various instruction datasets, a pivotal challenge is how to effectively select and integrate these instructions to fine-tune large language models (LLMs). Previous research mainly focuses on selecting individual high-quality instructions. However, these works overlooked the joint interactions and dependencies between different categories of instructions, leading to suboptimal selection strategies. Moreover, the nature of these interaction patterns remains largely unexplored, let alone how to optimize the instruction set with regard to them. To fill these gaps, in this paper we: (1) systematically investigate interaction and dependency patterns between different categories of instructions; and (2) optimize the instruction set with respect to these interaction patterns using a linear programming-based method, and optimize the SFT learning schema using curriculum learning guided by an instruction dependency taxonomy. Experimental results across different LLMs demonstrate improved performance over strong baselines on widely adopted benchmarks.
Moody Ivan
Igor Stravinsky’s philosophical and religious trajectory included transformative encounters with Catholic theologians and philosophers in the Paris of the 1920s and 1930s. The most important of these was Jacques Maritain, whose neo-Thomist philosophy of art was significant for Stravinsky, in particular through its application in the life and work of fellow Russian émigré composer Arthur Lourié. This article examines the relationship between Stravinsky and Maritain in terms of the larger philosophical and creative context of the period, also touching on the work of Lourié and Manuel de Falla, and discussing its ramifications in the work of Stravinsky himself.
Zihan Zhang, Meng Fang, Ling Chen et al.
Continual learning (CL) is a paradigm that aims to replicate the human ability to learn and accumulate knowledge continually, without forgetting previous knowledge, and to transfer it to new tasks. Instruction tuning (IT), a more recent approach, involves fine-tuning models to make them more adaptable to solving NLP tasks in general. However, it remains unclear how instruction tuning works in the context of CL tasks. We formulate this challenging yet practical problem as Continual Instruction Tuning (CIT). In this work, we establish a CIT benchmark consisting of learning and evaluation protocols. We curate two long dialogue task streams of different types, InstrDialog and InstrDialog++, to study various CL methods systematically. Our experiments show that existing CL methods do not effectively leverage the rich natural language instructions, and that fine-tuning an instruction-tuned model sequentially can yield similar or better results. We further explore different aspects that might affect the learning of CIT. We hope this benchmark will facilitate more research in this direction.
Emmanouil Karystinaios, Francesco Foscarin, Florent Jacquemard et al.
This paper focuses on the nominal durations of musical events (notes and rests) in a symbolic musical score, and on how to handle them conveniently in computer applications. We propose using a temporal unit that is directly related to the graphical symbols in musical scores and pair it with a set of operations that cover typical computations in music applications. We formalize both this time unit and the more commonly used approach in a single mathematical framework, as semirings: algebraic structures that enable an abstract description of algorithms and processing pipelines. We then discuss some practical use cases and highlight when our system can make such pipelines more efficient in terms of the data types used and the number of computations.
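To make nominal durations concrete, here is a minimal sketch that represents each note value as an exact fraction of a whole note, so that dots and tuplets introduce no rounding error. Sequential events add their durations, and simultaneous voices take the maximum. The choice of the whole note as the unit, the function names, and the pairing of "+" with max are all illustrative assumptions; this pairing is the kind of structure that semirings abstract, but it is not claimed to be the paper's own formalization.

```python
from fractions import Fraction

def note_value(denominator: int, dots: int = 0, tuplet: tuple[int, int] = (1, 1)) -> Fraction:
    """Nominal duration as an exact fraction of a whole note (hypothetical helper).

    denominator: 1 = whole, 2 = half, 4 = quarter, 8 = eighth, ...
    dots: each dot adds half of the value added by the previous one.
    tuplet: (actual, normal), e.g. (3, 2) means 3 notes in the time of 2.
    """
    added = Fraction(1, denominator)
    total = added
    for _ in range(dots):
        added /= 2
        total += added
    actual, normal = tuplet
    return total * Fraction(normal, actual)

def sequence(durations: list[Fraction]) -> Fraction:
    """Events played one after another: durations add."""
    return sum(durations, Fraction(0))

def parallel(durations: list[Fraction]) -> Fraction:
    """Voices played simultaneously: the longest one determines the span."""
    return max(durations, default=Fraction(0))

# One 4/4 bar with two voices.
triplet_eighth = note_value(8, tuplet=(3, 2))  # 1/12 of a whole note
voice_a = sequence([note_value(4, dots=1), note_value(8),
                    triplet_eighth, triplet_eighth, triplet_eighth, note_value(4)])
voice_b = sequence([note_value(2), note_value(2)])
bar = parallel([voice_a, voice_b])
print(triplet_eighth, voice_a, bar)
```

Exact rationals remove any need to pick a ticks-per-quarter resolution in advance. The trade-off is that fraction arithmetic costs more than integer arithmetic, which relates to the efficiency questions the abstract raises.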
Renze Lou, Kai Zhang, Jian Xie et al.
In the realm of large language models (LLMs), enhancing instruction-following capability often involves curating expansive training data. This is achieved through two primary schemes: i) Scaling-Inputs: amplifying the number of (input, output) pairs per task instruction, aiming for better instruction adherence. ii) Scaling Input-Free Tasks: increasing the number of tasks, each composed of an (instruction, output) pair (without requiring a separate input). However, LLMs trained under Scaling-Inputs tend to be overly sensitive to inputs, leading to misinterpretation of or non-compliance with instructions. Conversely, Scaling Input-Free Tasks demands a substantial number of tasks but is less effective at instruction following when dealing with instances from Scaling-Inputs. This work introduces MUFFIN, a new scheme of instruction-following dataset curation. Specifically, we automatically Scale Tasks per Input by diversifying these tasks with various input facets. Experimental results across four zero-shot benchmarks, spanning both the Scaling-Inputs and Scaling Input-Free Tasks schemes, reveal that LLMs at various scales trained on MUFFIN generally demonstrate superior instruction-following capabilities compared to those trained on the two aforementioned schemes.
Kharisma Indah Dwi Putri Herdian, Rina Maryanti
This study aims to teach Titi Laras Damina to students at Kartika Senior High School. The research was conducted with a sample of 10 grade-11 social studies students in three stages: (i) a pre-test; (ii) theoretical and practical instruction using the Direct Instruction method; and (iii) a post-test. The average theoretical score was 71 on the pre-test and 96.5 on the post-test. The comparison of pre-test and post-test scores showed no significant increase after the lessons. The average N-Gain value was 34%, below the 40% threshold, indicating that learning Titi Laras Damina was theoretically ineffective for grade-11 social studies students. This is because the difference between pre-test and post-test scores was not large: the students already had sufficient knowledge of Titi Laras theory. In practice, however, there was a significant improvement. Before the lessons, students could not name the notes in each laras (scale). After learning with the Direct Instruction method, their musicality increased and they could name the notes in each laras. From this research, it is hoped that students can master Titi Laras Damina in both theory and practice.
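The N-Gain figure is conventionally computed as a normalized gain, g = (post - pre) / (max - pre), which is usually reported as a percentage. The following minimal sketch assumes a maximum score of 100 and uses made-up scores rather than the study's data. The abstract does not say whether it averages per-student gains or computes a single gain from the class means. The sketch includes both variants because they can give different results.

```python
def normalized_gain(pre: float, post: float, max_score: float = 100.0) -> float:
    """Actual improvement divided by the maximum possible improvement."""
    if pre >= max_score:
        raise ValueError("pre-test score leaves no room for improvement")
    return (post - pre) / (max_score - pre)

def mean_individual_gain(pre_scores: list[float], post_scores: list[float],
                         max_score: float = 100.0) -> float:
    """Average of each student's own normalized gain."""
    gains = [normalized_gain(a, b, max_score) for a, b in zip(pre_scores, post_scores)]
    return sum(gains) / len(gains)

def gain_of_means(pre_scores: list[float], post_scores: list[float],
                  max_score: float = 100.0) -> float:
    """Normalized gain computed once from the class-average scores."""
    pre_mean = sum(pre_scores) / len(pre_scores)
    post_mean = sum(post_scores) / len(post_scores)
    return normalized_gain(pre_mean, post_mean, max_score)

# Hypothetical class of two students.
pre, post = [50, 90], [60, 95]
print(f"{mean_individual_gain(pre, post):.0%}", f"{gain_of_means(pre, post):.0%}")
```

A student who starts near the maximum has little room to improve, so that student's individual gain can dominate the average. This is one reason the two variants differ.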
Monika Messner
This study analyzes the interplay of semiotic modes employed by a teacher and music students in a chamber music lesson for instructing, learning, and discussing. In particular, it describes how specific higher-level actions are accomplished through the mutual contextualization of talk and further audible and visible semiotic resources, such as gesture, gaze, material objects, vocalizing, and music. The focus lies on modal complexity, i.e., how different modes cohere to build action, and on modal intensity, i.e., the importance of specific modes related to their useful modal reaches. This study also attends to the linking and coherent coordination of interactional turns by the participants to achieve a mutual understanding of musical ideas and concepts. The rich multimodal texture of instructional, negotiation, and discussion actions in chamber music lessons stresses the role of multimodality and multimodal coherence in investigating music and pedagogy from an interactional perspective.
Bagus Wicaksono, Suharto Suharto
The purpose of this study was to identify and describe the music used as social-emotional therapy for children with intellectual disabilities at YPAC Semarang, and how it is applied. This research uses a qualitative approach. Data were collected through observation, interviews, and documentation. Data validity was checked through triangulation of data sources, adequacy of references, and extended participation. Data analysis followed an interactive model involving data reduction, data display, and conclusion drawing/verification. The results showed that the music used as a medium for social-emotional therapy falls into two types: music whose sound source is the body and music whose sound source is a musical instrument. Each type is further divided into unpitched and pitched music. Music is applied as a medium of social-emotional therapy in several stages. In the first stage, students are not given musical material directly. The next stage addresses the psychomotor domain, and in the following stage, once the students are considered capable, the teacher begins to teach them simple musical instruments.
Music and Dance is one of the academic disciplines in the Colleges of Education curriculum in Ghana designed to equip trained teachers to work effectively in Early Childhood Education Centers and Basic Schools. Because of its nature, it plays a dual role: it is a course of study, and it also serves as entertainment during other school programmes, where student music groups perform to grace the occasion. However, the study of music seems to be a bane for students of Nusrat Jahan Ahmadiyya College of Education, Wa. They are ambivalent about receiving music instruction, probably because of their religious and cultural inclinations. Drawing on the theory of the perception and emotion of music, the author describes how Muslim and Christian students respond to music. Data were collected through interviews and participant observation. The study found that Christian students embrace all forms of music, whereas Muslim students frown on art music and the playing of Western musical instruments. Muslim students do, however, welcome and join Christian students in performing traditional music, and they enjoy recorded Ghanaian contemporary music. The discourse concludes that, because of Muslim students' perspectives on music, forming and organizing music groups on campus has become burdensome.
Yui Suzukida
Adult second language (L2) learning often exhibits great variability in its rate and outcome. Although research shows that learning trajectories are partly shaped by social and contextual factors (e.g. Larson-Hall, 2008), learner factors also play an important role: some enhance L2 pronunciation learning by helping learners notice and process input efficiently, while others may impede it by impairing attention control or slowing down L2 input processing. Therefore, to provide effective instruction and help students improve their L2 pronunciation proficiency, language teachers benefit from understanding the differential impact of learner characteristics on L2 learning and applying that understanding to their instruction and learning activities. The aim of the current article is to review existing studies that have explored individual differences (IDs) in relation to L2 pronunciation acquisition and to present implications for effective L2 pronunciation teaching. The article begins with an introduction to the paradigm shift in L2 pronunciation research and the conceptual framework of IDs proposed by Dörnyei (2009). This is followed by a summary of the processes involved in L2 pronunciation learning. The third section focuses on the characteristics of four IDs that have been found to influence the development of L2 pronunciation: foreign language learning aptitude (e.g. Saito and Hanzawa, 2016), musical aptitude (e.g. Milovanov et al., 2010), L2 learning motivation (e.g. Moyer, 1999), and anxiety (e.g. Baran-Łucarz, 2016). Based on the discussion in the third section, the last section offers various applications of ID research findings to L2 pronunciation instruction (e.g. instructional approaches, feedback, and pronunciation syllabi) for successful L2 pronunciation teaching.
Shambhavi Sharma
This paper presents a comparative study of two classifiers built for speech emotion recognition. Perceiving a person's feelings has long been an intriguing task. Feelings can be expressed through facial expressions, speech, actions, and so forth, and speech is the most widely used form of communication. Speech is an elaborate form of communication that carries many kinds of information, such as the gist of the message, the speaker's tone, the language used, background noise, any musical sound, and emotions. Speech emotion recognition is becoming mainstream with the advancement of Voice User Interface technology, which lets computers interact with humans by applying speech analysis to understand a person's instructions and carry out the required tasks and commands. Every utterance carries some emotion, but recognizing that emotion is a complex research problem, mainly because the way emotions are perceived from audio differs from person to person. I created two models for speech emotion recognition, using Mel Frequency Cepstral Coefficients (MFCCs) to extract features from the audio files. The first model, built with a Multi-Layer Perceptron (MLP) classifier, achieved an accuracy of 57.29 percent. The second model, built with Long Short-Term Memory (LSTM), achieved an accuracy of 92.88 percent. I used the RAVDESS dataset for classification.
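MFCC extraction maps the spectrum onto the mel scale before applying a logarithm and a discrete cosine transform. As a small illustration of that first step, the sketch below assumes the widely used HTK-style conversion m = 2595 * log10(1 + f / 700). Other mel formulas exist, and the abstract does not say which tool or settings were used. The code spaces filterbank edge frequencies evenly on the mel scale. The 26 filters and the 0 to 8000 Hz range in the usage line are illustrative choices, not the author's settings.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Frequency in Hz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_edges(n_filters: int, f_min: float, f_max: float) -> list[float]:
    """n_filters + 2 frequencies (Hz) equally spaced on the mel scale.

    Consecutive triples (left, center, right) define the triangular filters of a
    mel filterbank; filter energies are then log-compressed and passed through a
    DCT to obtain MFCCs.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]

edges = mel_filter_edges(26, 0.0, 8000.0)
print([round(f) for f in edges[:4]], [round(f) for f in edges[-3:]])
```

The edges are closely spaced at low frequencies and widely spaced at high ones. This mirrors the finer pitch resolution of human hearing in the low range, which is the motivation for using mel-based features on speech.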