Results for "eess.AS"

Showing 20 of ~72 results · from CrossRef, arXiv

arXiv Open Access 2025
LIWhiz: A Non-Intrusive Lyric Intelligibility Prediction System for the Cadenza Challenge

Ram C. M. C. Shekar, Iván López-Espejo

We present LIWhiz, a non-intrusive lyric intelligibility prediction system submitted to the ICASSP 2026 Cadenza Challenge. LIWhiz leverages Whisper for robust feature extraction and a trainable back-end for score prediction. Tested on the Cadenza Lyric Intelligibility Prediction (CLIP) evaluation set, LIWhiz achieves a root mean square error (RMSE) of 27.07%, a 22.4% relative RMSE reduction over the STOI-based baseline, yielding a substantial improvement in normalized cross-correlation.

en eess.AS, cs.SD
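
The LIWhiz abstract gives an RMSE of 27.07% and a 22.4% relative RMSE reduction over the STOI-based baseline. The sketch below shows how the two figures relate: relative reduction is (baseline - system) / baseline, so inverting it recovers the baseline RMSE implied by the rounded numbers. That implied baseline (about 34.88%) is derived arithmetic, not a value stated in the abstract, and the helper names are hypothetical.

```python
def relative_reduction(baseline_rmse: float, system_rmse: float) -> float:
    """Fraction by which system_rmse improves on baseline_rmse."""
    return (baseline_rmse - system_rmse) / baseline_rmse


def implied_baseline(system_rmse: float, reduction: float) -> float:
    """Invert relative_reduction: baseline = system / (1 - reduction)."""
    return system_rmse / (1.0 - reduction)


# Figures from the LIWhiz abstract (RMSE in percentage points).
baseline = implied_baseline(27.07, 0.224)
print(f"implied baseline RMSE: {baseline:.2f}%")  # implied baseline RMSE: 34.88%
print(f"check: {relative_reduction(baseline, 27.07):.3f}")  # check: 0.224
```
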
arXiv Open Access 2024
Augmenting Polish Automatic Speech Recognition System With Synthetic Data

Łukasz Bondaruk, Jakub Kubiak, Mateusz Czyżnikiewicz

This paper presents a system developed for submission to Poleval 2024, Task 3: Polish Automatic Speech Recognition Challenge. We describe a Voicebox-based speech synthesis pipeline and use it to augment Conformer and Whisper speech recognition models with synthetic data. We show that adding synthetic speech to the training data significantly improves results. We also present the final results our models achieved in the competition.

en eess.AS, cs.SD
arXiv Open Access 2024
Reduce Computational Complexity for Continuous Wavelet Transform in Acoustic Recognition Using Hop Size

Dang Thoai Phan

In recent years, the continuous wavelet transform (CWT) has been employed as a spectral feature extractor for acoustic recognition tasks in conjunction with machine learning and deep learning models. However, applying the CWT to each individual audio sample is computationally intensive. This paper proposes an approach that applies the CWT to a subset of samples, spaced according to a specified hop size. Experimental results demonstrate that this method significantly reduces computational costs while maintaining the robust performance of the trained models.

en eess.AS, eess.SP
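
The hop-size idea can be sketched as evaluating the CWT only at sample positions 0, hop, 2·hop, ..., so the number of wavelet inner products drops by roughly a factor of hop. The abstract does not name the wavelet or the implementation; the complex Morlet wavelet (w0 = 6), the scale list, and the function names below are assumptions for illustration only.

```python
import numpy as np


def morlet(scale: float, w0: float = 6.0) -> np.ndarray:
    """Sampled complex Morlet wavelet at a given scale (in samples)."""
    half = int(np.ceil(4 * scale))  # support of +-4 envelope widths
    t = np.arange(-half, half + 1) / scale
    return np.exp(1j * w0 * t) * np.exp(-0.5 * t**2) / np.sqrt(scale)


def cwt_hop(x, scales, hop: int = 1) -> np.ndarray:
    """CWT evaluated only at sample positions 0, hop, 2*hop, ...

    Cost grows with len(x) / hop instead of len(x).
    """
    x = np.asarray(x, dtype=float)
    centers = np.arange(0, len(x), hop)
    out = np.empty((len(scales), len(centers)), dtype=complex)
    for i, s in enumerate(scales):
        psi_conj = np.conj(morlet(s))
        half = len(psi_conj) // 2
        padded = np.pad(x, half)  # zero-pad so every window is complete
        for j, n in enumerate(centers):
            # padded[n : n + 2*half + 1] covers original samples n-half .. n+half
            out[i, j] = np.dot(padded[n:n + len(psi_conj)], psi_conj)
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
scales = [2, 4, 8, 16]
full = cwt_hop(x, scales, hop=1)
fast = cwt_hop(x, scales, hop=8)
print(full.shape, fast.shape)  # (4, 1024) (4, 128)
print(np.allclose(full[:, ::8], fast))  # True
```

With hop = 8 the coefficients are identical to every eighth column of the full transform; whether a downstream model keeps its accuracy at a given hop is the empirical question the paper addresses.
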
CrossRef Open Access 2023
Analysis of Service Quality on the Use of a Blended Learning Application Using the EESS Model

Elsia Miranda Mildad Tatumang, Ahmatang Ahmatang

Research aim: This study aims to test the effect of the quality of Moodle Borneo E-Learning (BEL), comprising Service Quality, Student Quality, and Lecturer Quality, on Perceived Satisfaction, Perceived Usefulness, and Benefits among active BEL users, based on the E-learning System Success (EESS) evaluation model.
Design/Method/Approach: This research uses a quantitative approach with non-probability quota sampling. The sample size was determined with the Hair formula, giving a sample of 280 University of Borneo students who had used BEL. The data were analyzed with SEM (Structural Equation Modeling) using SmartPLS.
Research findings: Service Quality, Student Quality, and Lecturer Quality had a positive and significant effect on satisfaction and usefulness, and Satisfaction and Usefulness in turn had a positive and significant effect on benefits.
Theoretical contribution/Originality: This research is expected to provide insight and information to researchers and academics on analyzing the service quality of the BEL application using the EESS model.
Practitioner/Policy implication: The results of this study serve as input for LP3M at Borneo Tarakan University, so that it can improve the quality of BEL and students can learn online with BEL more comfortably.
Research limitation: This study evaluates the quality of the BEL application only from the perspective of students at the University of Borneo Tarakan. It also focuses on the EESS conceptual model, covering only social factors, namely Service Quality, Learner Quality, and Instructor Quality.

arXiv Open Access 2023
ASPED: An Audio Dataset for Detecting Pedestrians

Pavan Seshadri, Chaeyeon Han, Bon-Woo Koo et al.

We introduce the new audio analysis task of pedestrian detection and present a new large-scale dataset for this task. While the preliminary results prove the viability of using audio approaches for pedestrian detection, they also show that this challenging task cannot be easily solved with standard approaches.

en eess.AS, cs.SD
arXiv Open Access 2023
InstrumentGen: Generating Sample-Based Musical Instruments From Text

Shahan Nercessian, Johannes Imort

We introduce the text-to-instrument task, which aims at generating sample-based musical instruments based on textual prompts. Accordingly, we propose InstrumentGen, a model that extends a text-prompted generative audio framework to condition on instrument family, source type, pitch (across an 88-key spectrum), velocity, and a joint text/audio embedding. Furthermore, we present a differentiable loss function to evaluate the intra-instrument timbral consistency of sample-based instruments. Our results establish a foundational text-to-instrument baseline, extending research in the domain of automatic sample-based instrument generation.

en eess.AS, cs.LG
arXiv Open Access 2023
Spatial sampling and beamforming for spherical microphone arrays

Boaz Rafaely

Spherical microphone arrays have recently been studied for spatial sound recording, speech communication, and sound field analysis for room acoustics and noise control. Complementary theoretical studies have advanced spatial sampling and beamforming methods. This paper reviews recent results in spatial sampling that facilitate a wide range of spherical array configurations, from a single rigid sphere to free positioning of microphones. The paper then presents an overview of beamforming methods recently presented for spherical arrays, from the widely used delay-and-sum and Dolph-Chebyshev, to the more advanced optimal methods, typically performed in the spherical harmonics domain.
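
As a point of reference for the delay-and-sum method mentioned above, here is a minimal frequency-domain sketch for microphones placed on a sphere. It assumes an open (free-field) sphere, ignoring the rigid-sphere scattering that the spherical harmonics methods in the review account for; the Fibonacci layout, 4.2 cm radius, 2 kHz frequency, 343 m/s speed of sound, and all function names are hypothetical choices.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed


def fibonacci_sphere(num_mics: int, radius: float) -> np.ndarray:
    """Nearly uniform microphone positions on a sphere, shape (num_mics, 3)."""
    i = np.arange(num_mics) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / num_mics)
    azimuth = np.pi * (1.0 + np.sqrt(5.0)) * i
    return radius * np.stack([np.sin(polar) * np.cos(azimuth),
                              np.sin(polar) * np.sin(azimuth),
                              np.cos(polar)], axis=1)


def plane_wave(mics, direction, freq):
    """Free-field pressure of a unit plane wave arriving from `direction`."""
    k = 2.0 * np.pi * freq / SPEED_OF_SOUND
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    return np.exp(1j * k * mics @ direction)


def delay_and_sum(mics, look_dir, pressures, freq):
    """Phase-align every channel towards look_dir and average."""
    weights = plane_wave(mics, look_dir, freq) / len(mics)
    return np.vdot(weights, pressures)  # sum(conj(weights) * pressures)


mics = fibonacci_sphere(32, 0.042)
p = plane_wave(mics, [0.0, 0.0, 1.0], 2000.0)  # source straight above
on_axis = abs(delay_and_sum(mics, [0.0, 0.0, 1.0], p, 2000.0))
off_axis = abs(delay_and_sum(mics, [0.0, 0.0, -1.0], p, 2000.0))
print(on_axis, off_axis)
```

Steering at the source gives unit gain because the phases cancel exactly; steering elsewhere leaves residual phases that partly cancel in the average.
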

arXiv Open Access 2023
An investigation into the adaptability of a diffusion-based TTS model

Haolin Chen, Philip N. Garner

Given the recent success of diffusion in producing natural-sounding synthetic speech, we investigate how diffusion can be used in speaker adaptive TTS. Taking cues from more traditional adaptation approaches, we show that adaptation can be included in a diffusion pipeline using conditional layer normalization with a step embedding. However, we show experimentally that, whilst the approach has merit, such adaptation alone cannot approach the performance of Transformer-based techniques. In a second experiment, we show that diffusion can be optimally combined with Transformer, with the latter taking the bulk of the adaptation load and the former contributing to improved naturalness.

en eess.AS, cs.SD
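
Conditional layer normalization can be sketched as a layer norm whose scale and shift are predicted from a conditioning vector. The abstract does not specify the conditioning details; here it is assumed to be a speaker embedding concatenated with a sinusoidal diffusion-step embedding, mapped by linear projections. All names, shapes, and the embedding formula are hypothetical.

```python
import numpy as np


def step_embedding(step: int, dim: int) -> np.ndarray:
    """Sinusoidal embedding of the diffusion step (dim assumed even)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = step * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


def conditional_layer_norm(h, cond, w_gamma, b_gamma, w_beta, b_beta, eps=1e-5):
    """Layer norm whose scale and shift are predicted from `cond`.

    h:    (frames, channels) hidden features
    cond: (cond_dim,) conditioning vector, e.g. speaker + step embedding
    """
    mean = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    normed = (h - mean) / np.sqrt(var + eps)
    gamma = cond @ w_gamma + b_gamma  # (channels,)
    beta = cond @ w_beta + b_beta     # (channels,)
    return gamma * normed + beta


rng = np.random.default_rng(0)
frames, channels, spk_dim, step_dim = 50, 8, 16, 16
speaker = rng.standard_normal(spk_dim)  # hypothetical speaker embedding
cond = np.concatenate([speaker, step_embedding(step=10, dim=step_dim)])
h = rng.standard_normal((frames, channels))
w_gamma = 0.01 * rng.standard_normal((cond.size, channels))
w_beta = 0.01 * rng.standard_normal((cond.size, channels))
out = conditional_layer_norm(h, cond, w_gamma, np.ones(channels),
                             w_beta, np.zeros(channels))
print(out.shape)  # (50, 8)
```

With zero projection weights, unit gamma bias, and zero beta bias, the layer reduces to plain layer normalization, so adaptation acts only through the learned projections.
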
arXiv Open Access 2022
A light-weight full-band speech enhancement model

Qinwen Hu, Zhongshu Hou, Xiaohuai Le et al.

Deep neural network based full-band speech enhancement systems face the challenges of high computational demand and an imbalanced frequency distribution. In this paper, a lightweight full-band model is proposed with two dedicated strategies: a learnable spectral compression mapping for more effective compression of high-band spectral information, and a multi-head attention mechanism for more effective modeling of the global spectral pattern. Experiments validate the efficacy of the proposed strategies and show that the proposed model achieves competitive performance with only 0.89M parameters.

en eess.AS, cs.SD
arXiv Open Access 2022
diaLogic: Non-Invasive Speaker-Focused Data Acquisition for Team Behavior Modeling

Ryan Duke, Alex Doboli

This paper presents the diaLogic system, a Human-In-A-Loop system for modeling the behavior of teams while they solve open-ended problems. Team behavior is modeled through hypotheses extracted from features computed from acquired voice data. These features include speaker interactions, speaker emotions, fundamental frequencies, and the corresponding text and clauses. Hypotheses about invariant and differentiated situations are found based on the similarities and dissimilarities of team behavior over time. To fully automate data acquisition, the diaLogic system runs within an intuitive, user-friendly GUI. Experiments demonstrate the performance of the system on a broad set of cases featuring team behavior during problem solving.

en eess.AS
arXiv Open Access 2022
SG-VAD: Stochastic Gates Based Speech Activity Detection

Jonathan Svirsky, Ofir Lindenbaum

We propose a novel voice activity detection (VAD) model for low-resource environments. Our key idea is to model VAD as a denoising task and to construct a network designed to identify nuisance features for a speech classification task. We train the model to identify irrelevant features while simultaneously predicting the type of speech event. Our model contains only 7.8K parameters, outperforms previously proposed methods on the AVA-Speech evaluation set, and provides comparable results on the HAVIC dataset. We present its architecture, experimental results, and an ablation study of the model's components. The code and models are available at https://www.github.com/jsvir/vad.

en eess.AS, cs.SD
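
The abstract does not spell out the gate formulation. A common stochastic-gate construction, assumed here, is z = clip(mu + eps, 0, 1) with eps ~ N(0, sigma^2) during training, plus a penalty on the expected number of open gates, sum Phi(mu / sigma), which pushes gates on nuisance features toward zero. The gate means, sigma = 0.5, feature shapes, and function names are hypothetical.

```python
import math

import numpy as np


def sample_gates(mu, sigma=0.5, rng=None, train=True):
    """Stochastic gates z = clip(mu + eps, 0, 1), eps ~ N(0, sigma^2) in training."""
    mu = np.asarray(mu, dtype=float)
    if train:
        rng = np.random.default_rng() if rng is None else rng
        eps = rng.normal(0.0, sigma, size=mu.shape)
    else:
        eps = np.zeros_like(mu)  # deterministic gates at inference
    return np.clip(mu + eps, 0.0, 1.0)


def expected_open_gates(mu, sigma=0.5):
    """Sum of P(z_d > 0) = Phi(mu_d / sigma); penalising this pushes gates shut."""
    return sum(0.5 * (1.0 + math.erf(m / (sigma * math.sqrt(2.0))))
               for m in np.ravel(mu))


rng = np.random.default_rng(0)
features = rng.standard_normal((100, 6))         # (frames, features), hypothetical
mu = np.array([0.9, 0.8, 0.1, -0.5, 0.7, -1.0])  # hypothetical learned gate means
gated = features * sample_gates(mu, rng=rng)     # gated copy of the features
print(sample_gates(mu, train=False))
print(expected_open_gates(mu))
```

At inference the noise is dropped, so gates with negative means close completely and the corresponding features are removed.
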
arXiv Open Access 2022
Isolation performance metrics for personal sound zone reproduction systems

Yue Qiao, Léo Guadagnin, Edgar Choueiri

Two isolation performance metrics, Inter-Zone Isolation (IZI) and Inter-Program Isolation (IPI), are introduced for evaluating Personal Sound Zone (PSZ) systems. Compared to the commonly used Acoustic Contrast metric, IZI and IPI are generalized for multichannel audio, and quantify the isolation of sound zones and of audio programs, respectively. The two metrics are shown to be generally non-interchangeable and suitable for different scenarios, such as generating dark zones (IZI) or minimizing audio-on-audio interference (IPI). Furthermore, two examples based on free-field simulations demonstrate the use of IZI and IPI in evaluating PSZ performance in different rendering modes and PSZ robustness.
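
The abstract does not define IZI or IPI, so only the baseline it compares against is sketched: Acoustic Contrast, under its common single-frequency definition as the ratio (in dB) of mean-square pressure over bright-zone control points to that over dark-zone control points. The pressure values and function name are hypothetical.

```python
import numpy as np


def acoustic_contrast_db(p_bright, p_dark) -> float:
    """10*log10 of mean-square pressure in the bright zone over the dark zone."""
    bright = np.mean(np.abs(np.asarray(p_bright)) ** 2)
    dark = np.mean(np.abs(np.asarray(p_dark)) ** 2)
    return float(10.0 * np.log10(bright / dark))


# Hypothetical complex pressures at control points (single frequency).
p_bright = np.array([1.0 + 0.0j, 0.0 + 1.0j, -1.0 + 0.0j])
p_dark = 0.1 * np.exp(1j * np.array([0.3, 1.2, 2.0, 2.9]))
print(round(acoustic_contrast_db(p_bright, p_dark), 6))  # 20.0
```

Because it is a single energy ratio per frequency, this metric says nothing about which program leaks where in multichannel playback, which is the gap the proposed metrics target.
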

arXiv Open Access 2021
Generalized Time Domain Velocity Vector

Srđan Kitić, Jérôme Daniel

We introduce and analyze the Generalized Time Domain Velocity Vector (GTVV), an extension of the previously presented acoustic multipath footprint extracted from Ambisonic recordings. GTVV is better adapted to adverse acoustic conditions and enables efficient parameter estimation of multiple plane-wave components in the recorded multichannel mixture. Experiments on simulated data confirm the predicted theoretical advantages of these new spatio-temporal features.

en eess.AS, cs.SD
arXiv Open Access 2021
A study of the robustness of raw waveform based speaker embeddings under mismatched conditions

Ge Zhu, Frank Cwitkowitz, Zhiyao Duan

In this paper, we conduct a cross-dataset study of parametric and non-parametric raw-waveform-based speaker embeddings through speaker verification experiments. In general, we observe more significant performance degradation in these raw-waveform systems than in spectral-based systems. We then propose two strategies to improve the performance of raw-waveform-based systems on cross-dataset tests. The first is to change the real-valued filters into analytic filters to ensure shift invariance. The second is to apply variational dropout to non-parametric filters to prevent them from overfitting irrelevant nuisance features.

en eess.AS, cs.SD
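
Why analytic filters help with shift invariance can be illustrated with a fixed filter. The paper's filters are learned; here a hypothetical Gabor filter stands in. Its complex version (Gaussian envelope times a complex exponential) is approximately analytic, so for a narrowband input the magnitude of its output stays nearly constant over time, while the real filter's output oscillates with the input phase. Center frequency, width, and function names are assumptions.

```python
import numpy as np


def gabor_filters(center_freq: float, width: float, half_len: int):
    """Real Gabor filter and its (approximately analytic) complex counterpart.

    center_freq is in radians per sample; width is the Gaussian std in samples.
    """
    t = np.arange(-half_len, half_len + 1)
    envelope = np.exp(-0.5 * (t / width) ** 2)
    real_filter = envelope * np.cos(center_freq * t)
    analytic_filter = envelope * np.exp(1j * center_freq * t)  # real part == real_filter
    return real_filter, analytic_filter


def ripple(y) -> float:
    """Relative variation of the output magnitude over time (0 means shift-invariant)."""
    mag = np.abs(y)
    return float(mag.std() / mag.mean())


w0 = 0.3 * np.pi
x = np.cos(w0 * np.arange(2000))  # test tone
h_real, h_analytic = gabor_filters(w0, width=20.0, half_len=80)
y_real = np.convolve(x, h_real, mode="valid")
y_analytic = np.convolve(x, h_analytic, mode="valid")
print(ripple(y_real), ripple(y_analytic))
```

Shifting the input only rotates the phase of the analytic output, leaving its magnitude unchanged, which is the property the first strategy relies on.
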
arXiv Open Access 2021
Modelling of the Fender Bassman 5F6-A Tone Stack

Steven Fenton

This paper outlines a procedure for effectively modelling a complex analogue filter circuit. The Fender Bassman 5F6-A tone stack is a circuit commonly employed in guitar amplifiers to shape the tonal characteristics of the amplifier output. At first glance this circuit may look rather simple; however, the controls are not orthogonal, which results in complicated filter coefficients as the controls are varied. This in turn can make the circuit difficult to analyse without mathematical emulation tools such as PSPICE or MATLAB. First, the circuit is described, a method of analysis is proposed, and general expressions for the continuous-time coefficients are given. A MATLAB model is then produced and its frequency responses are shown.

en eess.AS, cs.SD
arXiv Open Access 2020
Source coding of audio signals with a generative model

Roy Fejgin, Janusz Klejsa, Lars Villemoes et al.

We consider source coding of audio signals with the help of a generative model. We use a construction where a waveform is first quantized, yielding a finite bitrate representation. The waveform is then reconstructed by random sampling from a model conditioned on the quantized waveform. The proposed coding scheme is theoretically analyzed. Using SampleRNN as the generative model, we demonstrate that the proposed coding structure provides performance competitive with state-of-the-art source coding tools for specific categories of audio signals.

en eess.AS
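
The coding structure described above can be sketched with a toy stand-in: the encoder transmits uniform quantization indices, and the decoder draws a random waveform consistent with them. Here the "generative model" is simply a uniform distribution within each quantization cell, a placeholder for the learned conditional model (SampleRNN in the paper); the step size, test tone, and function names are hypothetical.

```python
import numpy as np


def encode(x, step: float) -> np.ndarray:
    """Uniform scalar quantisation: the integer indices are what gets transmitted."""
    return np.round(np.asarray(x, dtype=float) / step).astype(np.int64)


def decode(indices, step: float, rng) -> np.ndarray:
    """Reconstruct by sampling a waveform consistent with the indices.

    Stand-in 'generative model': uniform within each quantisation cell.
    A learned conditional model would replace this distribution.
    """
    centers = np.asarray(indices) * step
    return centers + rng.uniform(-step / 2, step / 2, size=centers.shape)


rng = np.random.default_rng(0)
t = np.arange(800) / 8000.0
x = 0.5 * np.sin(2 * np.pi * 440.0 * t)
step = 0.05
idx = encode(x, step)
x_hat = decode(idx, step, rng)
print(np.max(np.abs(x_hat - x)) <= step + 1e-12)  # True: both lie in the same cell
```

Decoding is random, so two decodes of the same bitstream differ; a good learned model concentrates those samples on plausible audio rather than spreading them uniformly.
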

Page 2 of 4