Understanding shifts in creative work will help guide AI’s impact on the media ecosystem

The capabilities of a new class of tools, colloquially known as generative artificial intelligence (AI), are a topic of much debate. One prominent application thus far is the production of high-quality artistic media for visual arts, concept art, music, and literature, as well as video and animation. For example, diffusion models can synthesize high-quality images (1), and large language models (LLMs) can produce sensible-sounding and impressive prose and verse in a wide range of contexts (2). The generative capabilities of these tools are likely to fundamentally alter the creative processes by which creators formulate ideas and put them into production. As creativity is reimagined, so too may be many sectors of society. Understanding the impact of generative AI—and making policy decisions around it—requires new interdisciplinary scientific inquiry into culture, economics, law, algorithms, and the interaction of technology and creativity.
Multi-label classification methods are increasingly required by modern applications, such as protein function classification, music categorization, and semantic scene classification. This article introduces the task of multi-label classification, organizes the sparse related literature into a structured presentation, and presents comparative experimental results for several multi-label classification methods. It also contributes the definition of concepts for quantifying the multi-label nature of a data set.
Saadi Lahlou, Annabelle Gouttebroze, Atrina Oraee
et al.
We qualitatively compared literature reviews produced with varying degrees of AI assistance. The same LLM, given the same corpus of 280 papers but different selections from it, produced dramatically different reviews, from mainstream and politically neutral to critical and post-colonial, though neither orientation was intended. LLM outputs always appear at first glance to be well written, well informed, and well thought out, but closer reading reveals gaps, biases, and a lack of depth. Our comparison of six versions reveals a series of pitfalls and suggests precautions necessary when using AI assistance to produce a literature review. The main issues are: (1) the bias of ignorance (you do not know what you do not get) in the selection of relevant papers; (2) alignment and digital sycophancy: commercial AI models slavishly push you further in the direction they infer from you, reinforcing biases; (3) mainstreaming: because of their statistical nature, LLM productions tend to favor mainstream perspectives and content; in our case there was only 20% overlap between the paper selections made by humans and by the LLM; (4) limited capacity for creative restructuring, with vague and ambiguous statements; (5) lack of critical perspective, stemming from distant reading and political correctness. Most pitfalls can be addressed by prompting, but only if the user knows the domain well enough to detect them. There is a paradox: producing a good AI-assisted review requires expertise that comes from reading the literature, which is precisely what AI was meant to reduce. Overall, AI can improve the span and quality of the review, but the time savings are not as large as one might expect, and a press-button strategy that leaves the AI to do the work is a recipe for disaster. We conclude with recommendations for those who write, or assess, such LLM-augmented reviews.
Live music provides a uniquely rich setting for studying creativity and interaction due to its spontaneous nature. The pursuit of live music agents--intelligent systems supporting real-time music performance and interaction--has captivated researchers across HCI, AI, and computer music for decades, and recent advancements in AI suggest unprecedented opportunities to evolve their design. However, the interdisciplinary nature of music has led to fragmented development across research communities, hindering effective communication and collaborative progress. In this work, we bring together perspectives from these diverse fields to map the current landscape of live music agents. Based on our analysis of 184 systems across both academic literature and video, we develop a comprehensive design space that categorizes dimensions spanning usage contexts, interactions, technologies, and ecosystems. By highlighting trends and gaps in live music agents, our design space offers researchers, designers, and musicians a structured lens to understand existing systems and shape future directions in real-time human-AI music co-creation. We release our annotated systems as a living artifact at https://live-music-agents.github.io.
Music captioning, or the task of generating a natural language description of music, is useful for both music understanding and controllable music generation. Training captioning models, however, typically requires high-quality music caption data, which is scarce compared to metadata (e.g., genre, mood, etc.). As a result, it is common to use large language models (LLMs) to synthesize captions from metadata to generate training data for captioning models, though this process imposes a fixed stylization and entangles factual information with natural language style. As a more direct approach, we propose metadata-based captioning. We train a metadata prediction model to infer detailed music metadata from audio and then convert it into expressive captions via pre-trained LLMs at inference time. Compared to a strong end-to-end baseline trained on LLM-generated captions derived from metadata, our method: (1) achieves comparable performance with less training time than end-to-end captioners, (2) offers the flexibility to easily change stylization post-training, enabling output captions to be tailored to specific stylistic and quality requirements, and (3) can be prompted with audio and partial metadata to enable powerful metadata imputation or in-filling--a common task for organizing music data.
Philip Hardie, Andrew Darley, Rosemarie Derwin
et al.
Abstract Background Generative Artificial Intelligence (Gen AI) is a type of artificial intelligence that can learn from and mimic large amounts of data to create content such as text, images, music, videos, code, and more, based on inputs or prompts. Gen AI technologies are being increasingly integrated into healthcare education, including the field of nursing, where they are utilised to support a range of pedagogical activities. Purpose This scoping review examined and described the application of Gen AI as a teaching, learning and assessment strategy in nursing education and examined the ethical implications of and attitudes towards its implementation. Methods We conducted a scoping review using a combination of methodological approaches, including Arksey and O’Malley’s 5-step framework, the PRISMA-ScR guidelines, and JBI evidence synthesis methods, and searched five databases: EMBASE (Elsevier), Web of Science Core (Clarivate), CINAHL & Medline (EBSCO), Applied Social Science Index and Abstracts, and ERIC (ProQuest). A wide search of grey literature was also conducted. Literature published in English between January 1st 2014 and July 1st 2025 was included in the review. Results Of the 1,251 articles retrieved, we identified 103 articles for inclusion in the review. There were 44 discussion/opinion/conference papers and 59 empirical research papers. Gen AI has predominantly been used for content creation, simulation, personalised learning, tutoring, skill development and assessment. Students and educators describe mixed attitudes towards the implementation of Gen AI, with several ethical concerns regarding the application of Gen AI in nursing education evident, including privacy, transparency, bias, and accountability issues. Conclusion While there is growing openness to Gen AI, substantial work remains to address ethical and educational challenges.
Recommendations for educational practice and curriculum development include a need for clear policies and guidelines to ensure the ethical use of Gen AI resources by educators and students. Further research is needed to understand long-term effects and promote responsible implementation within the context of nursing education.
Music performances are representative scenarios for audio-visual modeling. Unlike common scenarios with sparse audio, music performances continuously involve dense audio signals throughout. While existing multimodal learning methods for audio-video QA demonstrate impressive capabilities in general scenarios, they are incapable of dealing with fundamental problems within music performances: they underexplore the interaction between the multimodal signals in performance and fail to consider the distinctive characteristics of instruments and music. Consequently, existing methods tend to answer questions about musical performances inaccurately. To bridge the above research gaps, (i) given the intricate multimodal interconnectivity inherent to music data, our primary backbone is designed to incorporate multimodal interactions within the context of music; (ii) to enable the model to learn music characteristics, we annotate and release rhythmic and music sources in the current music datasets; (iii) for time-aware audio-visual modeling, we align the model's music predictions with the temporal dimension. Our experiments achieve state-of-the-art results on the Music AVQA datasets. Our code is available at https://github.com/xid32/Amuse.
Mahtab Jamali, Paul Davidsson, Reza Khoshkangini
et al.
Context is an important factor in computer vision as it offers valuable information to clarify and analyze visual data. Utilizing the contextual information inherent in an image or a video can improve the precision and effectiveness of object detectors. For example, where recognizing an isolated object might be challenging, context information can improve comprehension of the scene. This study explores the impact of various context-based approaches to object detection. Initially, we investigate the role of context in object detection and survey it from several perspectives. We then review and discuss the most recent context-based object detection approaches and compare them. Finally, we conclude by addressing research questions and identifying gaps for further studies. More than 265 publications are included in this survey, covering different aspects of context in different categories of object detection, including general object detection, video object detection, small object detection, camouflaged object detection, zero-shot, one-shot, and few-shot object detection. This literature review presents a comprehensive overview of the latest advancements in context-based object detection, providing valuable contributions such as a thorough understanding of contextual information and effective methods for integrating various context types into object detection, thus benefiting researchers.
As a result of continuous advances in Music Information Retrieval (MIR) technology, generating and distributing music has become more diverse and accessible. In this context, interest in music intellectual property protection is increasing to safeguard individual music copyrights. In this work, we propose a system for detecting music plagiarism by combining various MIR technologies. We developed a music segment transcription system that extracts musically meaningful segments from audio recordings to detect plagiarism across different musical formats. With this system, we compute similarity scores based on multiple musical features that can be evaluated through comprehensive musical analysis. Our approach demonstrated promising results in music plagiarism detection experiments, and the proposed method can be applied to real-world music scenarios. We also collected a Similar Music Pair (SMP) dataset for musical similarity research using real-world cases. The dataset is publicly available.
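The abstract's notion of a feature-based similarity score can be illustrated with a minimal sketch: cosine similarity over averaged 12-bin chroma vectors, a common MIR representation. This is not the proposed system's actual scoring method, and the chroma profiles below are invented for the example.

```python
import numpy as np

def chroma_cosine_similarity(a, b):
    """Cosine similarity between two averaged 12-bin chroma vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical averaged chroma profiles (12 pitch classes, C through B).
segment_a = np.eye(12)[0] + 0.5 * np.eye(12)[4] + 0.3 * np.eye(12)[7]  # C-E-G emphasis
segment_b = np.eye(12)[0] + 0.4 * np.eye(12)[4] + 0.3 * np.eye(12)[7]  # near-identical
print(chroma_cosine_similarity(segment_a, segment_b))  # close to 1.0
```

A real plagiarism detector would combine several such per-feature scores (melody, harmony, rhythm) rather than rely on any single one.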
Daniel Chenyu Lin, Michael Freeman, John Thickstun
Music language models (Music LMs), like vision language models, leverage multimodal representations to answer natural language queries about musical audio recordings. Although Music LMs are reportedly improving, we find that current evaluations fail to capture whether their answers are correct. Specifically, for all Music LMs that we examine, widely-used evaluation metrics such as BLEU, METEOR, and BERTScore fail to measure anything beyond linguistic fluency of the model's responses. To measure the true performance of Music LMs, we propose (1) a better general-purpose evaluation metric for Music LMs adapted to the music domain and (2) a factual evaluation framework to quantify the correctness of a Music LM's responses. Our framework is agnostic to the modality of the question-answering model and could be generalized to quantify performance in other open-ended question-answering domains. We use open datasets in our experiments and will release all code on publication.
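The failure mode described above, where n-gram metrics reward fluency rather than correctness, can be sketched with a toy unigram-precision score (the core of BLEU-1). This is an illustrative simplification, not the paper's proposed metric; the captions are invented for the example.

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Fraction of candidate tokens also found in the reference (counts clipped)."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum(min(n, ref[tok]) for tok, n in cand.items())
    return overlap / max(sum(cand.values()), 1)

reference = "a solo piano piece in a slow tempo"
wrong_but_fluent = "a solo guitar piece in a fast tempo"
# The candidate misidentifies both the instrument and the tempo,
# yet still scores 0.75 because most of its tokens match.
print(unigram_precision(wrong_but_fluent, reference))  # 0.75
```

Full BLEU adds higher-order n-grams and a brevity penalty, but the underlying issue is the same: lexical overlap does not measure factual correctness.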
The article presents the fragmentary surviving information about the two sons of the composer and musician Wacław Raszek (1764–1837). The elder, Ludwik Wacław (1816–44), probably received his musical education in Warsaw. In the 1830s he worked as a chorister and pianist and composed several dances (a print of one of them survives). His budding career was cut short by an illness from which he died at only 28. Of the younger son, Adam (1824–43), all that is known of his connection with music is that he advertised his services as a piano tuner. These scraps of knowledge help sketch, at least in outline, the careers of the younger generation of the Raszek family.
Harmonizing the elements of an artistic expression is a major factor in the relationship between literary and pictorial language. Graphic art gains particular significance when, viewed as a pictorial language, it borrows certain characteristics of literature, such as forging a relationship between forms of expression and genres. Forms of expression refer to topics adopted from figures of speech such as imagery, metaphor, and metonymy, while the pictorial narrative can be interpreted in different genres, including epic, lyric, dramatic, and didactic themes, which appear to be common ground between literature and graphics. The present research aims to elucidate the features of graphics as a literary work and asks how verbal and pictorial forms are harmonized. A descriptive-analytical approach is therefore taken: written sources and pictorial documents were randomly selected and qualitatively analyzed to elucidate the communicative common ground between them, in the form of film posters, because the relationship between graphics and literature is evident in the narrative aspects of this genre of posters. Moreover, in this context, the delivery of meaning in various genres and forms of expression heavily affects the elucidation of common components between them, as if graphic art begins to articulate as a literary work. The significance of this connection lies in understanding how a relationship is forged between literature and graphics by identifying their common devices, building the right model between them, and clarifying how a literary text and a pictorial context are related. Thus, this research mainly argues that one of the shared viewpoints between graphics and literature, and their common origin in creating an artistic expression, can be found in the meaning-conveying language and various forms of expression.
While the nature of literature and art and their relationship have been studied from several perspectives in many artistic contexts, such as cinema, music, and painting, this research focuses on the effectiveness of graphic art as a pictorial language that uses certain features of literature, aiming to achieve a sustainable model for finding the manifestations of literature in graphic art. This model provides a methodical structure and presents those manifestations. Furthermore, since it is vital to clarify the relationship between literature and graphic art in terms of the structure of subject narration in literary genres and devices, the case study focuses exclusively on film posters, because film posters are among the best tools for depicting the verbal narrative of events and themes, which can be interpreted from both verbal (literary) and pictorial (graphic) aspects. The target pictures were therefore randomly selected from a statistical population of pictorial documents from "One Hundred + Five Years of Film Adverts and Film Posters in Iran" in a way that allows a more ideal analysis of the materials and demonstrates the research objectives more accurately.
Although breastfeeding is extremely beneficial to the health of women and infants, breastfeeding rates are not at the desired levels. The literature includes medical and physical difficulties that can lead to early discontinuation of breastfeeding. However, studies examining the impact of women's emotional experiences on the breastfeeding process are rather limited. Dysphoric milk ejection reflex (DMER) is characterised by dysphoria that occurs during milk release and lasts for several minutes. Symptoms include sudden and unpleasant feelings of anxiety, sadness, irritability or panic. The exact cause of DMER is not known. Studies suggest that the sudden drop in dopamine at the start of lactation causes a short-term dopamine deficiency in women, which can lead to dysphoria. Women experiencing DMER are known to develop a negative experience of breastfeeding due to these uncomfortable feelings, and some may stop breastfeeding or feel compelled to continue breastfeeding despite the discomfort. Although there is no medically proven treatment, it has been suggested that various non-pharmacological methods such as distraction, lifestyle changes, music and aromatherapy may be effective. As DMER has only recently been recognised, the literature is limited. The aim of this review is to present the current literature on DMER.
Recent AI-driven step-function advances in several longstanding problems in music technology are opening up new avenues to create the next generation of music education tools. Creating personalized, engaging, and effective learning experiences is a continuously evolving challenge in music education. Here we present two case studies using such advances in music technology to address this challenge. In our first case study we showcase an application that uses Automatic Chord Recognition to generate personalized exercises from audio tracks, connecting traditional ear training with real-world musical contexts. In the second case study we prototype adaptive piano method books that use Automatic Music Transcription to generate exercises at different skill levels while retaining a close connection to musical interests. These applications demonstrate how recent AI developments can democratize access to high-quality music education and promote rich interaction with music in the age of generative AI. We hope this work inspires other efforts in the community, aimed at removing barriers to access to high-quality music education and fostering human participation in musical expression.
WeiHsiang Liao, Yuhta Takida, Yukara Ikemiya
et al.
We demonstrate the efficacy of using intermediate representations from a single foundation model to enhance various music downstream tasks. We introduce SoniDo, a music foundation model (MFM) designed to extract hierarchical features from target music samples. By leveraging hierarchical intermediate features, SoniDo constrains the information granularity, leading to improved performance across various downstream tasks including both understanding and generative tasks. We specifically evaluated this approach on representative tasks such as music tagging, music transcription, music source separation, and music mixing. Our results reveal that the features extracted from foundation models provide valuable enhancements in training downstream task models. This highlights the capability of using features extracted from music foundation models as a booster for downstream tasks. Our approach not only benefits existing task-specific models but also supports music downstream tasks constrained by data scarcity. This paves the way for more effective and accessible music processing solutions.
We present the first version of DDMD (Digital Drug Music Detector), a binary classifier that distinguishes digital drug music from normal music. In the literature, digital drug music is primarily explored regarding its psychological, neurological, or social impact. However, despite numerous studies on using machine learning in Music Information Retrieval (MIR), including music genre classification, digital drug music has not been considered in this field. In this study, we initially collected a dataset of 3,176 audio files divided into two classes (1,676 digital drugs and 1,500 non-digital drugs). We extracted machine learning features, including MFCCs, chroma, spectral contrast, and frequency analysis metrics (mean and standard deviation of detected frequencies). Using a Random Forest classifier, we achieved an accuracy of 93%. Finally, we developed a web application to deploy the model, enabling end users to detect digital drug music.
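The frequency-analysis features mentioned above can be sketched minimally with an FFT-based dominant-frequency estimate; the actual DDMD pipeline also uses MFCCs, chroma, and spectral contrast, typically extracted with a library such as librosa, and feeds the statistics into a Random Forest. The synthetic tone below stands in for real audio.

```python
import numpy as np

def dominant_frequency(signal, sr):
    """Return the frequency (Hz) of the strongest bin in the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float(freqs[np.argmax(spectrum)])

sr = 8000
t = np.arange(sr) / sr                 # one second of samples -> 1 Hz bin resolution
tone = np.sin(2 * np.pi * 440 * t)     # synthetic 440 Hz sine
print(dominant_frequency(tone, sr))    # 440.0
```

On real recordings one would frame the signal, estimate a dominant frequency per frame, and aggregate the mean and standard deviation across frames, matching the "mean and standard deviation of detected frequencies" features described in the abstract.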
Contrary to the phenomena predicted by advertising research relevant to the elaboration likelihood model, content ads emphasizing non-message elements possess the potential to attract consumers with high elaboration likelihood. This research aims to address the following two questions: (1) why do such consumers develop favorable attitudes toward content ads depicting non-message elements? and (2) do their attitudes toward content ads result in product consumption? In proposing the hypotheses, the current research introduces the novel concept of content reproducibility, defined as the degree to which an ad authentically reflects the original product content. Studies 1 and 2 demonstrate that when the elaboration likelihood is high, consumers form more favorable attitudes toward ads with content reproducibility than toward those without it. This is because elements reproducing content (e.g., characters, storylines, settings, colors, and music) in an ad can serve as arguments. Study 2 indicates that when the elaboration likelihood is high, content reproducibility positively influences purchasing intentions. Our findings theoretically contribute to the literature on the elaboration likelihood model, the content business, and the marketing mix. Additionally, this research offers managerial insights to assist marketers in enhancing ad attitudes and purchasing intentions.