Results for "cs.SD"

Showing 20 of ~139106 results · from arXiv, CrossRef

arXiv Open Access 2025
Privacy-Aware Ambient Audio Sensing for Healthy Indoor Spaces

Bhawana Chhaglani

Indoor airborne transmission poses a significant health risk, yet current monitoring solutions are invasive, costly, or fail to address it directly. My research explores the untapped potential of ambient audio sensing to estimate key transmission risk factors such as ventilation, aerosol emissions, and occupant distribution non-invasively and in real time. I develop privacy-preserving systems that leverage existing microphones to monitor the whole spectrum of indoor air quality, which can have a significant effect on an individual's health. This work lays the foundation for privacy-aware airborne risk monitoring using everyday devices.

arXiv Open Access 2024
Towards the Synthesis of Non-speech Vocalizations

Enjamamul Hoq, Ifeoma Nwogu

In this report, we focus on the unconditional generation of infant cry sounds using the DiffWave framework, which has shown great promise in generating high-quality audio from noise. We use two distinct datasets of infant cries: the Baby Chillanto and the deBarbaro cry dataset. These datasets are used to train the DiffWave model to generate new cry sounds that maintain high fidelity and diversity. The focus here is on DiffWave's capability to handle the unconditional generation task.

en cs.SD, cs.LG
arXiv Open Access 2024
Maximum Likelihood Estimation of the Direction of Sound In A Reverberant Noisy Environment

Mohamed F. Mansour

We describe a new method for estimating the direction of sound in a reverberant environment from basic principles of sound propagation. The method utilizes SNR-adaptive features from time-delay and energy of the directional components after acoustic wave decomposition of the observed sound field to estimate the line-of-sight direction under noisy and reverberant conditions. The effectiveness of the approach is established with measured data of different microphone array configurations under various usage scenarios.

en cs.SD, cs.LG
arXiv Open Access 2023
Adjoint-Based Identification of Sound Sources for Sound Reinforcement and Source Localization

Mathias Lemke, Lewin Stein

The identification of sound sources is a common problem in acoustics. Different parameters are sought, among these are signal and position of the sources. We present an adjoint-based approach for sound source identification, which employs computational aeroacoustic techniques. Two different applications are presented as a proof-of-concept: optimization of a sound reinforcement setup and the localization of (moving) sound sources.

en cs.SD, eess.AS
arXiv Open Access 2023
Better speech synthesis through scaling

James Betker

In recent years, the field of image generation has been revolutionized by the application of autoregressive transformers and DDPMs. These approaches model the process of image generation as step-wise probabilistic processes and leverage large amounts of compute and data to learn the image distribution. This methodology of improving performance need not be confined to images. This paper describes a way to apply advances in the image generative domain to speech synthesis. The result is TorToise -- an expressive, multi-voice text-to-speech system. All model code and trained weights have been open-sourced at https://github.com/neonbjb/tortoise-tts.

en cs.SD, cs.CL
arXiv Open Access 2023
Modulation Graphs in Popular Music

Jason I. Brown, Ian George

In this paper, graph theory is used to explore the musical notion of tonal modulation, in theory and application. We define (pivot) modulation graphs based on the common scales used in popular music. Properties and parameters of these graphs are discussed. We also investigate modulation graphs for the canon of Lennon-McCartney songs in the works of The Beatles. Our approach may provide composers with mathematical insights into pivot modulation.

en cs.SD
arXiv Open Access 2022
Binaural Audio Rendering in the Spherical Harmonic Domain: A Summary of the Mathematics and its Pitfalls

Jens Ahrens

The present document reviews the mathematics behind binaural rendering of sound fields that are available as spherical harmonic expansion coefficients. This process is also known as binaural ambisonic decoding. We highlight that the details entail some amount of peculiarity, so that one has to be well aware of the precise definitions that are chosen for some of the involved quantities to obtain a consistent formulation. We also discuss what sets of definitions produce ambisonic signals that are compatible with the most common software tools that are available.

en cs.SD, eess.AS
arXiv Open Access 2020
Melody Classification based on Performance Event Vector and BRNN

Jinyue Guo, Aozhi Liu, Jing Xiao

We proposed a model for the Conference of Music and Technology (CSMT2020) data challenge of melody classification. Our model used the Performance Event Vector as the input sequence to build a Bidirectional RNN network for classification. The model achieved a satisfying performance on the development dataset and Wikifonia dataset. We also discussed the effect of several hyper-parameters, and created multiple prediction outputs for the evaluation dataset.

en cs.SD, cs.IR
arXiv Open Access 2020
Data-driven audio recognition: a supervised dictionary approach

Imad Rida

Machine hearing is an emerging area. Motivated by the need for a principled framework across domain applications for machine listening, we propose a generic and data-driven representation learning approach. To this end, a novel and efficient supervised dictionary learning method is presented. Experiments are performed on both computational auditory scene (East Anglia and Rouen) and synthetic music chord recognition datasets. The results show that our method is capable of reaching state-of-the-art hand-crafted features for both applications.

en cs.SD, eess.AS
arXiv Open Access 2019
The Sounds of Music : Science of Musical Scales III -- Indian Classical

Sushan Konar

In the previous articles of this series, we have discussed the development of musical scales, particularly that of the heptatonic scale, which forms the basis of Western classical music today. In this last article, we take a look at the basic structure of scales used in Indian classical music and how different 'raga's are generated through the simple process of scale shifting.

en cs.SD, eess.AS
arXiv Open Access 2019
Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics

Thomas Drugman, Abeer Alwan

This paper focuses on the problem of pitch tracking in noisy conditions. A method using harmonic information in the residual signal is presented. The proposed criterion is used both for pitch estimation and for determining the voicing segments of speech. In the experiments, the method is compared to six state-of-the-art pitch trackers on the Keele and CSTR databases. The proposed technique is shown to be particularly robust to additive noise, leading to a significant improvement in adverse conditions.

en cs.SD, cs.CL
arXiv Open Access 2018
Vocal melody extraction using patch-based CNN

Li Su

A patch-based convolutional neural network (CNN) model presented in this paper for vocal melody extraction in polyphonic music is inspired by object detection in image processing. The input of the model is a novel time-frequency representation which enhances the pitch contours and suppresses the harmonic components of a signal. This succinct data representation and the patch-based CNN model enable an efficient training process with limited labeled data. Experiments on various datasets show excellent speed and competitive accuracy compared to other deep learning approaches.

en cs.SD, eess.AS
arXiv Open Access 2018
Generating Music using an LSTM Network

Nikhil Kotecha, Paul Young

A model of music needs to have the ability to recall past details and have a clear, coherent understanding of musical structure. Detailed in the paper is a neural network architecture that predicts and generates polyphonic music aligned with musical rules. The probabilistic model presented is a Bi-axial LSTM trained with a kernel reminiscent of a convolutional kernel. When analyzed quantitatively and qualitatively, this approach performs well in composing polyphonic music. Link to the code is provided.

en cs.SD, cs.LG
arXiv Open Access 2017
Understanding MIDI: A Painless Tutorial on Midi Format

H. M. de Oliveira, R. C. de Oliveira

A short overview demystifying the MIDI audio format is presented. The goal is to explain the file structure and how the instructions are used to produce a music signal, both for monophonic and for polyphonic signals.

en cs.SD, eess.AS
arXiv Open Access 2017
Talking Condition Identification Using Second-Order Hidden Markov Models

Ismail Shahin

This work focuses on enhancing the performance of text-dependent and speaker-dependent talking condition identification systems using second-order hidden Markov models (HMM2s). Our results show that the talking condition identification performance based on HMM2s has been improved significantly compared to first-order hidden Markov models (HMM1s). Our talking conditions in this work are neutral, shouted, loud, angry, happy, and fear.

en cs.SD
arXiv Open Access 2017
Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries

Yu-Hsuan Wang, Cheng-Tao Chung, Hung-yi Lee

In this paper we analyze the gate activation signals inside the gated recurrent neural networks, and find the temporal structure of such signals is highly correlated with the phoneme boundaries. This correlation is further verified by a set of experiments for phoneme segmentation, in which better results compared to standard approaches were obtained.

en cs.SD, cs.CL
arXiv Open Access 2016
Guitar Solos as Networks

Stefano Ferretti

This paper presents an approach to model melodies (and music pieces in general) as networks. Notes of a melody can be seen as nodes of a network that are connected whenever these are played in sequence. This creates a directed graph. By using complex network theory, it is possible to extract some main metrics, typical of networks, that characterize the piece. Using this framework, we provide an analysis on a set of guitar solos performed by main musicians. The results of this study indicate that this model can have an impact on multimedia applications such as music classification, identification, and automatic music generation.
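
The construction described in this abstract (notes as nodes, an edge whenever two notes are played in sequence) can be illustrated with a minimal sketch. It is not the author's code: the function names, the example melody, and the choice to weight each edge by how often the transition occurs are all assumptions made for illustration.

```python
from collections import Counter

def melody_to_graph(notes):
    """Build a directed graph from a note sequence.

    Each distinct pair of consecutive notes (src, dst) is an edge.
    Weighting edges by transition count is an assumption of this sketch.
    """
    return Counter(zip(notes, notes[1:]))

def degrees(edges):
    """Return weighted in-degree and out-degree per note."""
    in_deg, out_deg = Counter(), Counter()
    for (src, dst), weight in edges.items():
        out_deg[src] += weight
        in_deg[dst] += weight
    return in_deg, out_deg

# Hypothetical eight-note phrase, used only to exercise the functions.
melody = ["E4", "G4", "A4", "G4", "E4", "G4", "A4", "C5"]
edges = melody_to_graph(melody)
in_deg, out_deg = degrees(edges)

for (src, dst), weight in sorted(edges.items()):
    print(f"{src} -> {dst}: {weight}")
```

Degree counts are only one simple network metric; the abstract does not say which metrics the paper extracts, so this choice is likewise illustrative.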

arXiv Open Access 2016
Getting Closer to the Essence of Music: The Con Espressione Manifesto

Gerhard Widmer

This text offers a personal and very subjective view on the current situation of Music Information Research (MIR). Motivated by the desire to build systems with a somewhat deeper understanding of music than the ones we currently have, I try to sketch a number of challenges for the next decade of MIR research, grouped around six simple truths about music that are probably generally agreed on, but often ignored in everyday research.

arXiv Open Access 2013
Deep Scattering Spectrum

Joakim Andén, Stéphane Mallat

A scattering transform defines a locally translation invariant representation which is stable to time-warping deformations. It extends MFCC representations by computing modulation spectrum coefficients of multiple orders, through cascades of wavelet convolutions and modulus operators. Second-order scattering coefficients characterize transient phenomena such as attacks and amplitude modulation. A frequency transposition invariant representation is obtained by applying a scattering transform along log-frequency. State-of-the-art classification results are obtained for musical genre and phone classification on GTZAN and TIMIT databases, respectively.

en cs.SD, cs.IT
arXiv Open Access 2013
A Simple Method to Produce Algorithmic MIDI Music based on Randomness, Simple Probabilities and Multi-Threading

Yannis Tzitzikas

This paper introduces a simple method for producing multichannel MIDI music that is based on randomness and simple probabilities. One distinctive feature of the method is that it produces and sends in parallel to the sound card more than one unsynchronized channel by exploiting the multi-threading capabilities of general purpose programming languages. As a consequence, the derived sound offers a quite "full" and "unpredictable" acoustic experience to the listener. Subsequently the paper reports the results of an evaluation with users. The results were very surprising: the majority of users responded that they could tolerate this music on various occasions.

en cs.SD

Page 3 of 6956