Results for "Acoustics. Sound"

Showing 20 of ~308,866 results · from DOAJ, arXiv, CrossRef

DOAJ Open Access 2025
Comparative study on physiological and metabolic responses of crayfish to ozone micro-nano bubbles and ultrasound washing

Xinyan Tong, Lingxiang Bao, Hanbin Lin et al.

Maintaining the well-being of living aquatic products before processing is crucial to guaranteeing the best food quality. This study investigated the different effects of ozone micro-nano bubble washing (OMW) and ultrasound washing (UW) on the well-being of living crayfish, one of the most economically valuable freshwater aquatic products in China. OMW was found to be a gentler and more effective alternative to traditional UW that reduced physical damage, leading to enhanced living vitality, an increased intestinal content evacuation rate, and improved surface cleanliness of crayfish before they were processed as aquatic food. These improvements were attributed to the milder physical force generated by bubble contraction and collapse compared with the cavitation and mechanical effects of UW. In addition, the strong cavitation, mechanical, and thermal effects of UW impaired the antioxidant system of crayfish, leading to severe hepatopancreas damage, as indicated by substantially elevated activities of alanine aminotransferase and aspartate aminotransferase. Moreover, OMW significantly lowered lactate dehydrogenase activity and reduced lactic acid accumulation owing to decreased oxidative stress, thereby preventing acidosis in crayfish. These findings demonstrate that OMW is a milder yet more effective approach to improving the well-being of living aquatic products, maintaining their vitality and quality prior to processing into foods, and suggest its great potential as a pretreatment technology in the aquatic food industry.

Chemistry, Acoustics. Sound
DOAJ Open Access 2025
Sodium carboxymethyl cellulose coating pretreatment combined with multi-frequency ultrasound assisted vacuum far-infrared drying: An emerging approach to enhance drying characteristics, physicochemical properties, and sensory attributes of Cornus officinalis

Zepeng Zang, Xiaopeng Huang, Guojun Ma et al.

To enhance the drying efficiency and improve the sensory quality of Cornus officinalis, this study investigated the effects of sodium carboxymethyl cellulose (CMC-Na) coating combined with multi-frequency ultrasound assisted vacuum far-infrared (MFUS-VFIR) drying on its drying characteristics, physicochemical properties, and sensory attributes. Application of multi-frequency ultrasound (MFUS) during VFIR dehydration shortened the drying time by 12.12–39.39 % and increased the average drying rate by 15.38–69.23 % compared with VFIR alone. Physicochemical analyses revealed that the (MFUS-VFIR)-20/28/40 kHz treatment yielded dried products with higher retention of total phenolics, natural bioactive compounds, organic acids, total carotenoids, ascorbic acid, soluble solids, and total flavonoids, along with superior color quality. Under these conditions, antioxidant capacity increased by 10.89–23.68 %, 14.41–25.91 %, and 7.10–58.32 %, respectively, relative to dual-frequency ultrasound treatments. Scanning electron microscopy showed that MFUS treatment produced distinct honeycomb-like pores with larger apertures than single-frequency ultrasound (SFUS) treatment, indicating reduced surface cracking and expanded micro-channels for mass transfer, thereby lowering mass transfer resistance. The overall sensory acceptability of (MFUS-VFIR)-20/28/40 kHz dried products reached 8.50, representing a 41.67 % and 13.33–30.77 % improvement over VFIR and SFUS-VFIR samples, respectively (P < 0.05), with lower bitterness and off-flavor scores. Principal component analysis (PCA), hierarchical cluster analysis (HCA), and correlation network heat mapping revealed that MFUS-treated samples clustered closely in multidimensional quality space and exhibited significant positive correlations with antioxidant activity, physicochemical quality, and flavor retention. Notably, the energy consumption of the (MFUS-VFIR)-20/28/40 kHz treatment was 88.68 kW·h·kg⁻¹, slightly higher than that of the control and SFUS-VFIR treatments. These findings provide a scientific basis and technical reference for quality optimization, energy-efficient drying, and high-value utilization of Cornus officinalis.

Chemistry, Acoustics. Sound
arXiv Open Access 2025
Improving Deep Learning-based Respiratory Sound Analysis with Frequency Selection and Attention Mechanism

Nouhaila Fraihi, Ouassim Karrakchou, Mounir Ghogho

Accurate classification of respiratory sounds requires deep learning models that effectively capture fine-grained acoustic features and long-range temporal dependencies. Convolutional Neural Networks (CNNs) are well-suited for extracting local time-frequency patterns but are limited in modeling global context. In contrast, transformer-based models can capture long-range dependencies, albeit with higher computational demands. To address these limitations, we propose a compact CNN-Temporal Self-Attention (CNN-TSA) network that integrates lightweight self-attention into an efficient CNN backbone. Central to our approach is a Frequency Band Selection (FBS) module that suppresses noisy and non-informative frequency regions, substantially improving accuracy and reducing FLOPs by up to 50%. We also introduce age-specific models to enhance robustness across diverse patient groups. Evaluated on the SPRSound-2022/2023 and ICBHI-2017 lung sound datasets, CNN-TSA with FBS sets new benchmarks on SPRSound and achieves state-of-the-art performance on ICBHI, all with a significantly smaller computational footprint. Furthermore, integrating FBS into an existing transformer baseline yields a new record on ICBHI, confirming FBS as an effective drop-in enhancement. These results demonstrate that our framework enables reliable, real-time respiratory sound analysis suitable for deployment in resource-constrained settings.
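The frequency-band-selection idea described in this abstract can be illustrated with a toy, energy-based band mask; this is a hand-rolled stand-in for intuition only, not the paper's learned FBS module, and the function name and `keep_ratio` parameter are illustrative assumptions:

```python
import numpy as np

def frequency_band_selection(spec, keep_ratio=0.5):
    """Toy sketch of frequency band selection: keep the spectrogram rows
    (frequency bins) with the highest average energy and zero out the rest,
    mimicking the suppression of noisy, non-informative frequency regions.
    `spec` is a (n_freq, n_time) magnitude spectrogram."""
    n_freq = spec.shape[0]
    n_keep = max(1, int(n_freq * keep_ratio))
    band_energy = spec.mean(axis=1)            # average energy per frequency band
    keep = np.argsort(band_energy)[-n_keep:]   # indices of the strongest bands
    mask = np.zeros(n_freq, dtype=bool)
    mask[keep] = True
    return spec * mask[:, None], mask

# Example: 8 bands of weak noise, with band 2 carrying a strong signal
rng = np.random.default_rng(0)
spec = rng.random((8, 100)) * 0.1
spec[2] += 5.0
selected, mask = frequency_band_selection(spec, keep_ratio=0.25)
```

Dropping masked rows before the CNN is also what reduces FLOPs: downstream convolutions only process the retained bands.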

en cs.SD, cs.LG
arXiv Open Access 2025
Passive Acoustic Monitoring of Noisy Coral Reefs

Hari Vishnu, Yuen Min Too, Mandar Chitre et al.

Passive acoustic monitoring offers the potential to enable long-term, spatially extensive assessments of coral reefs. To explore this approach, we deployed underwater acoustic recorders at ten coral reef sites around Singapore waters over two years. To mitigate the persistent biological noise masking the low-frequency reef soundscape, we trained a convolutional neural network denoiser. Analysis of the acoustic data reveals distinct morning and evening choruses. Though the correlation with environmental variables was obscured in the low-frequency part of the noisy recordings, the denoised data showed correlations of acoustic activity indices such as sound pressure level and acoustic complexity index with diver-based assessments of reef health such as live coral richness and cover, and algal cover. Furthermore, the shrimp snap rate, computed from the high-frequency acoustic band, is robustly correlated with the reef parameters, both temporally and spatially. This study demonstrates that passive acoustics holds valuable information that can help with reef monitoring, provided the data is effectively denoised and interpreted. This methodology can be extended to other marine environments where acoustic monitoring is hindered by persistent noise.
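The two acoustic activity indices named in this abstract have standard, easily computed forms; the sketch below is a minimal version (function names are ours, SPL uses the underwater 1 µPa reference, and the ACI follows one common normalised-difference formulation rather than any specific toolbox):

```python
import numpy as np

def sound_pressure_level(x, p_ref=1e-6):
    """RMS sound pressure level in dB re 1 µPa (underwater convention),
    for a calibrated pressure signal x in pascals."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(rms / p_ref)

def acoustic_complexity_index(spec):
    """Acoustic complexity index over a (n_freq, n_time) magnitude
    spectrogram: per frequency band, the summed absolute intensity change
    between adjacent time frames, normalised by total band intensity,
    then summed over bands. Constant sounds score 0; fluctuating
    biological sounds score high."""
    diffs = np.abs(np.diff(spec, axis=1)).sum(axis=1)
    totals = spec.sum(axis=1)
    return float((diffs / totals).sum())
```

A steady 1 Pa RMS tone gives 120 dB re 1 µPa, and a perfectly flat spectrogram gives an ACI of zero, which is why the index helps separate biological choruses from constant background noise.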

en cs.SD
arXiv Open Access 2025
SonicRadiation: A Hybrid Numerical Solution for Sound Radiation without Ghost Cells

Xutong Jin, Guoping Wang, Sheng Li

Interactive synthesis of physical sound effects is crucial in digital media production. Sound radiation simulation, a key component of physically based sound synthesis, has posed challenges in the context of complex object boundaries. Previous methods, such as the ghost-cell-based finite-difference time-domain (FDTD) wave solver, have struggled to address these challenges, producing large errors and failures at complex boundaries because of the limitations of ghost cells. We present SonicRadiation, a hybrid numerical solution capable of handling complex and dynamic object boundaries in sound radiation simulation without relying on ghost cells. We derive a consistent formulation to connect the physical quantities on grid cells in FDTD with the boundary elements in the time-domain boundary element method (TDBEM). Building on this, we propose a boundary grid synchronization strategy to seamlessly integrate TDBEM with FDTD while maintaining high numerical accuracy. Our method combines the accuracy of TDBEM in the near field with the efficiency of FDTD in the far field. Experimental results demonstrate the superiority of our method over previous approaches in terms of accuracy and efficiency, particularly in complex scenes, further validating its effectiveness.

en cs.SD, cs.GR
arXiv Open Access 2025
Drum-to-Vocal Percussion Sound Conversion and Its Evaluation Methodology

Rinka Nobukawa, Makito Kitamura, Tomohiko Nakamura et al.

This paper defines the novel task of drum-to-vocal percussion (VP) sound conversion. VP imitates percussion instruments through human vocalization and is frequently employed in contemporary a cappella music. It exhibits acoustic properties distinct from speech and singing (e.g., aperiodicity, noisy transients, and the absence of linguistic structure), making conventional speech or singing synthesis methods unsuitable. We thus formulate VP synthesis as a timbre transfer problem from drum sounds, leveraging their rhythmic and timbral correspondence. To support this formulation, we define three requirements for successful conversion: rhythmic fidelity, timbral consistency, and naturalness as VP. We also propose corresponding subjective evaluation criteria. We implement two baseline conversion methods using a neural audio synthesizer, the real-time audio variational autoencoder (RAVE), with and without vector quantization (VQ). Subjective experiments show that both methods produce plausible VP outputs, with the VQ-based RAVE model yielding more consistent conversion.

en cs.SD, eess.AS
DOAJ Open Access 2024
Ultrasound assisted extraction and liposome encapsulation of olive leaves and orange peels: How to transform biomass waste into valuable resources with antimicrobial activity

Giuliana Prevete, Loïc G. Carvalho, Maria del Carmen Razola-Diaz et al.

Every year, millions of tons of by-products and waste from olive and orange processing are produced by agri-food industries, triggering environmental and economic problems worldwide. From the perspective of a circular economy model, olive leaves and orange peels can be valorized into valuable products due to the presence of bioactive compounds such as polyphenols, which exhibit beneficial effects on human health. The aqueous extracts of olive leaves and orange peels, rich in phenolic compounds, were prepared by ultrasound-assisted extraction. Both extracts were characterized in terms of extraction yield, total phenolic content, and antioxidant capacity; the polyphenolic profiles were investigated in depth by HPLC-MS analysis. Each extract was loaded into liposomes composed of a natural phospholipid, 1,2-dioleoyl-sn-glycero-3-phosphocholine, and cholesterol, prepared according to the thin-layer evaporation method coupled with sonication. The antimicrobial activity of the extracts, free and loaded in liposomes, was investigated according to the broth macrodilution method against different strains of potentially pathogenic bacterial species: Staphylococcus aureus (NCIMB 9518), Bacillus subtilis (ATCC 6051), and Enterococcus faecalis (NCIMB 775) as Gram-positive, and Escherichia coli (NCIMB 13302), Pseudomonas aeruginosa (NCIMB 9904), and Klebsiella oxytoca (NCIMB 12259) as Gram-negative. The encapsulation of olive leaf extract in liposomes enhanced its antibacterial activity against S. aureus by an order of magnitude.

Chemistry, Acoustics. Sound
arXiv Open Access 2024
Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms

Iran R. Roman, Christopher Ick, Sivan Ding et al.

Sound event localization and detection (SELD) is an important task in machine listening. Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels. SELD data is simulated by convolving spatially localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs require manual collection in specific rooms. We present SpatialScaper, a library for SELD data simulation and augmentation. Compared to existing tools, SpatialScaper emulates virtual rooms via parameters such as size and wall absorption. This allows for parameterized placement (including movement) of foreground and background sound sources. SpatialScaper also includes data augmentation pipelines that can be applied to existing SELD data. As a case study, we use SpatialScaper to add rooms to the DCASE SELD data. Training a model with our data led to progressive performance improvements as a direct function of acoustic diversity. These results show that SpatialScaper is valuable for training robust SELD models.
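The core simulation step this abstract describes, convolving an RIR with a dry waveform to place an event in a soundscape, can be sketched in a few lines. This single-channel toy is an illustrative stand-in, not SpatialScaper's API (which handles multichannel spatial RIRs and moving sources):

```python
import numpy as np

def place_event(soundscape, event, rir, onset):
    """Spatialise a dry sound event by convolving it with a room impulse
    response, then mix it into the soundscape at the given onset sample.
    Anything extending past the end of the soundscape is truncated."""
    wet = np.convolve(event, rir)          # 'wet' event carrying room reflections
    out = soundscape.copy()
    end = min(len(out), onset + len(wet))
    out[onset:end] += wet[:end - onset]
    return out

# Example: a unit-impulse event through an RIR with a direct path
# plus one echo two samples later at half gain
soundscape = np.zeros(16)
event = np.array([1.0])
rir = np.array([1.0, 0.0, 0.5])
mixed = place_event(soundscape, event, rir, onset=4)
```

Because the onset sample and the RIR's source position are both known at simulation time, the same call yields the strong spatio-temporal labels that SELD training needs.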

en eess.AS, cs.LG
arXiv Open Access 2024
Video-Guided Foley Sound Generation with Multimodal Controls

Ziyang Chen, Prem Seetharaman, Bryan Russell et al.

Generating sound effects for videos often requires creating artistic sound effects that diverge significantly from real-life sources and flexible control in the sound design. To address this problem, we introduce MultiFoley, a model designed for video-guided sound generation that supports multimodal conditioning through text, audio, and video. Given a silent video and a text prompt, MultiFoley allows users to create clean sounds (e.g., skateboard wheels spinning without wind noise) or more whimsical sounds (e.g., making a lion's roar sound like a cat's meow). MultiFoley also allows users to choose reference audio from sound effects (SFX) libraries or partial videos for conditioning. A key novelty of our model lies in its joint training on both internet video datasets with low-quality audio and professional SFX recordings, enabling high-quality, full-bandwidth (48kHz) audio generation. Through automated evaluations and human studies, we demonstrate that MultiFoley successfully generates synchronized high-quality sounds across varied conditional inputs and outperforms existing methods. Please see our project page for video results: https://ificl.github.io/MultiFoley/

en cs.CV, cs.MM
arXiv Open Access 2024
SonicVisionLM: Playing Sound with Vision Language Models

Zhifeng Xie, Shengye Yu, Qile He et al.

There has been a growing interest in the task of generating sound for silent videos, primarily because of its practicality in streamlining video post-production. However, existing methods for video-sound generation attempt to directly create sound from visual representations, which can be challenging due to the difficulty of aligning visual representations with audio representations. In this paper, we present SonicVisionLM, a novel framework aimed at generating a wide range of sound effects by leveraging vision-language models (VLMs). Instead of generating audio directly from video, we use the capabilities of powerful VLMs. When provided with a silent video, our approach first identifies events within the video using a VLM to suggest possible sounds that match the video content. This shift in approach transforms the challenging task of aligning image and audio into the more well-studied sub-problems of aligning image-to-text and text-to-audio through popular diffusion models. To improve the quality of audio recommendations with LLMs, we have collected an extensive dataset that maps text descriptions to specific sound effects and developed a time-controlled audio adapter. Our approach surpasses current state-of-the-art methods for converting video to audio, enhancing synchronization with the visuals and improving alignment between audio and video components. Project page: https://yusiissy.github.io/SonicVisionLM.github.io/

en cs.MM, cs.SD
DOAJ Open Access 2023
Robust direct acoustic impedance control using two microphones for mixed feedforward-feedback controller

Maxime Volery, Xinxin Guo, Hervé Lissek

This paper presents an acoustic impedance control architecture for an electroacoustic absorber combining both feedforward and feedback microphone-based strategies on a current-driven loudspeaker. Feedforward systems enable good performance for direct impedance control. However, inaccuracies in the required actuator model can lead to a loss of passivity, which can cause unstable behaviour. The feedback contribution allows the absorber to better handle model errors and still achieve an accurate impedance, preserving passivity. Numerical and experimental studies were conducted to compare this new architecture against a state-of-the-art feedforward control method.

Acoustics in engineering. Acoustical engineering, Acoustics. Sound
DOAJ Open Access 2023
Effects of ultrasound on the immunoreactivity of amandin, an allergen in apricot kernels during debitterizing

Fei-Fei Long, Xue-hui Fan, Qing-An Zhang

In this paper, the effects of ultrasound time, power, and temperature on the immunoreactivity of the allergenic amandin in apricot kernels were investigated by western blotting during ultrasonically accelerated debitterizing. The influence of ultrasound on the structure of amandin was also analyzed by SDS-PAGE, circular dichroism spectroscopy, extrinsic fluorescence spectroscopy, surface hydrophobicity, and zeta potential determination. The results indicate that ultrasound could significantly reduce the immunoreactivity of amandin during ultrasonically accelerated debitterizing; the optimal ultrasound condition (60 min, 300 W, 55 °C, and 59 kHz) decreased the immunoreactivity to 15.61%, which might be attributed to ultrasound-induced changes in the protein subunits, secondary and tertiary structure, and molecular aggregation state. In summary, ultrasound could not only accelerate debitterizing but also significantly decrease the immunoreactivity of apricot kernels, demonstrating the feasibility of ultrasound in the practical processing of apricot kernels.

Chemistry, Acoustics. Sound
DOAJ Open Access 2023
Assessment of vibrational-translational relaxation dynamics of methane isotopologues in a wet-nitrogen matrix through QEPAS

Mariagrazia Olivieri, Marilena Giglio, Stefano Dello Russo et al.

Here we report on a study of the non-radiative relaxation dynamics of 12CH4 and 13CH4 in wet nitrogen-based matrices using the quartz-enhanced photoacoustic spectroscopy (QEPAS) technique. The dependence of the QEPAS signal on pressure at fixed matrix composition, and on H2O concentration at fixed pressure, was investigated. We demonstrated that QEPAS measurements can be used to retrieve both the effective relaxation rate in the matrix and the V-T relaxation rates associated with collisions with nitrogen and water vapor. No significant differences in measured relaxation rates were observed between the two isotopologues.

Physics, Acoustics. Sound
DOAJ Open Access 2023
Sonication-assisted liquid phase exfoliation of two-dimensional CrTe3 under inert conditions

Kevin Synnatschke, Narine Moses Badlyan, Angelika Wrzesińska et al.

Liquid phase exfoliation (LPE) has been used for the successful fabrication of nanosheets from a large number of van der Waals materials. While this allows the study of fundamental changes in material properties associated with reduced dimensions, it also changes the chemistry of many materials due to a significant increase of the effective surface area, often accompanied by enhanced reactivity and accelerated oxidation. To prevent material decomposition, LPE and processing in an inert atmosphere have been developed, which enables the preparation of pristine nanomaterials and the systematic study of compositional changes over time under different storage conditions. Here, we demonstrate the inert exfoliation of the oxidation-sensitive van der Waals crystal CrTe3. The pristine nanomaterial was purified and size-selected by centrifugation; nanosheet dimensions in the fractions were quantified by atomic force microscopy and studied by Raman spectroscopy, X-ray photoelectron spectroscopy (XPS), energy-dispersive X-ray spectroscopy (EDX), and optical spectroscopic measurements. We find a dependence of the relative intensities of the CrTe3 Raman modes on the propagation direction of the incident light, which prevents a correlation of the Raman spectral profile with the nanosheet dimensions. XPS and EDX reveal that the contribution of surface oxides to the spectra is reduced after exfoliation compared with the bulk material. Furthermore, the decomposition mechanism of the nanosheets was studied by time-dependent extinction measurements in which water was titrated into initially dry solvents; the results suggest that water plays a significant role in the material decomposition.

Chemistry, Acoustics. Sound
arXiv Open Access 2023
Two vs. Four-Channel Sound Event Localization and Detection

Julia Wilkins, Magdalena Fuentes, Luca Bondi et al.

Sound event localization and detection (SELD) systems estimate both the direction-of-arrival (DOA) and the class of sound sources over time. In the DCASE 2022 SELD Challenge (Task 3), models are designed to operate in a 4-channel setting. While this is beneficial for furthering the development of SELD systems with multichannel recording setups such as first-order Ambisonics (FOA), most consumer electronics devices are rarely able to record with more than two channels. For this reason, in this work we investigate the performance of the DCASE 2022 SELD baseline model using three audio input representations: FOA, binaural, and stereo. We perform a novel comparative analysis illustrating the effect of these audio input representations on SELD performance. Crucially, we show that binaural and stereo (i.e., 2-channel) audio-based SELD models are still able to localize and detect sound sources laterally quite well, despite overall performance degrading as less audio information is provided. Further, we segment our analysis by scenes containing varying degrees of sound source polyphony to better understand the effect of audio input representation on localization and detection performance as scene conditions become increasingly complex.
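For intuition about why lateral information survives the channel reduction: one crude way to fold a 4-channel FOA recording down to stereo is a mid-side-style decode that keeps only the omni (W) and left-right figure-of-eight (Y) components. This is an illustrative sketch with made-up gains, not the conversion used in the paper; real Ambisonics decoders use calibrated weights and normalisation conventions:

```python
import numpy as np

def foa_to_stereo(foa):
    """Naive mid-side-style stereo decode of first-order Ambisonics in
    ACN channel order (W, Y, Z, X). W acts as the 'mid' signal and Y as
    the left-right 'side' signal; Z (up-down) and X (front-back) are
    discarded, which is why height and front/back cues are lost.
    `foa` has shape (4, n_samples); returns (2, n_samples)."""
    w, y = foa[0], foa[1]
    left = w + 0.5 * y    # hedged gain of 0.5, chosen for illustration
    right = w - 0.5 * y
    return np.stack([left, right])

# Example: a source on the left shows up louder in the left channel
foa = np.zeros((4, 3))
foa[0] = 1.0   # omni component
foa[1] = 1.0   # positive Y: energy arriving from the left
left, right = foa_to_stereo(foa)
```

Because only Z and X are discarded, left-right (lateral) cues are preserved, matching the paper's observation that 2-channel models still localize laterally quite well.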

en cs.SD, eess.AS
arXiv Open Access 2023
Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance

June-Woo Kim, Chihyeon Yoon, Miika Toikkanen et al.

Deep generative models have emerged as a promising approach in the medical image domain to address data scarcity. However, their use for sequential data like respiratory sounds is less explored. In this work, we propose a straightforward approach to augment imbalanced respiratory sound data using an audio diffusion model as a conditional neural vocoder. We also demonstrate a simple yet effective adversarial fine-tuning method to align features between the synthetic and real respiratory sound samples to improve respiratory sound classification performance. Our experimental results on the ICBHI dataset demonstrate that the proposed adversarial fine-tuning is effective, while only using the conventional augmentation method shows performance degradation. Moreover, our method outperforms the baseline by 2.24% on the ICBHI Score and improves the accuracy of the minority classes up to 26.58%. For the supplementary material, we provide the code at https://github.com/kaen2891/adversarial_fine-tuning_using_generated_respiratory_sound.

en cs.SD, cs.LG
arXiv Open Access 2023
Sound reconstruction from human brain activity via a generative model with brain-like auditory features

Jong-Yun Park, Mitsuaki Tsukamoto, Misato Tanaka et al.

The successful reconstruction of perceptual experiences from human brain activity has provided insights into the neural representations of sensory experiences. However, reconstructing arbitrary sounds has been avoided due to the complexity of temporal sequences in sounds and the limited resolution of neuroimaging modalities. To overcome these challenges, leveraging the hierarchical nature of brain auditory processing could provide a path toward reconstructing arbitrary sounds. Previous studies have indicated a hierarchical homology between the human auditory system and deep neural network (DNN) models. Furthermore, advancements in audio-generative models make it possible to transform compressed representations back into high-resolution sounds. In this study, we introduce a novel sound reconstruction method that combines brain decoding of auditory features with an audio-generative model. Using fMRI responses to natural sounds, we found that the hierarchical sound features of a DNN model could be decoded better than spectrotemporal features. We then reconstructed the sound using an audio transformer that disentangled compressed temporal information in the decoded DNN features. Our method achieves unconstrained sound reconstruction, capturing perceptual content and quality, and generalizes to sound categories not included in the training dataset. Reconstructions from different auditory regions remain similar to the actual sounds, highlighting the distributed nature of auditory representations. To test whether the reconstructions mirrored actual subjective perceptual experiences, we performed an experiment involving selective auditory attention to one of two overlapping sounds. The results tended to resemble the attended sound more than the unattended one. These findings demonstrate that our proposed model provides a means to externalize experienced auditory contents from human brain activity.

en cs.SD, cs.HC
arXiv Open Access 2023
VIFS: An End-to-End Variational Inference for Foley Sound Synthesis

Junhyeok Lee, Hyeonuk Nam, Yong-Hwa Park

The goal of DCASE 2023 Challenge Task 7 is to generate various sound clips for Foley sound synthesis (FSS) via a "category-to-sound" approach. A "category" is expressed by a single index, while the corresponding "sound" covers diverse and different sound examples. To generate diverse sounds for a given category, we adopt VITS, a text-to-speech (TTS) model with variational inference. In addition, we apply various techniques from speech synthesis, including PhaseAug and Avocodo. Unlike TTS models, which generate short pronunciations from phonemes and speaker identity, the category-to-sound problem requires generating diverse sounds from just a category index. To compensate for this difference while maintaining consistency within each audio clip, we heavily modified the prior encoder to enhance consistency with the posterior latent variables. This introduced an additional Gaussian on the prior encoder, which promotes variance within the category. With these modifications, we propose VIFS, variational inference for end-to-end Foley sound synthesis, which generates diverse, high-quality sounds.

en eess.AS, cs.AI
arXiv Open Access 2023
DiffSED: Sound Event Detection with Denoising Diffusion

Swapnil Bhosale, Sauradip Nag, Diptesh Kanojia et al.

Sound Event Detection (SED) aims to predict the temporal boundaries of all the events of interest and their class labels, given an unconstrained audio sample. Taking either the split-and-classify (i.e., frame-level) strategy or the more principled event-level modeling approach, all existing methods consider the SED problem from the discriminative learning perspective. In this work, we reformulate the SED problem from a generative learning perspective. Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process, conditioned on a target audio sample. During training, our model learns to reverse the noising process by converting noisy latent queries to their ground-truth versions within the elegant Transformer decoder framework. Doing so enables the model to generate accurate event boundaries from even noisy queries during inference. Extensive experiments on the Urban-SED and EPIC-Sounds datasets demonstrate that our model significantly outperforms existing alternatives, with over 40 % faster convergence in training.
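The forward (noising) half of such a denoising diffusion process over normalised event boundaries can be sketched with a standard DDPM-style closed form, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. The schedule values and function name below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def noise_boundaries(boundaries, t, n_steps=100, beta_max=0.02, rng=None):
    """Forward diffusion step applied directly to normalised event
    boundaries (onset, offset) in [0, 1]. With a linear beta schedule,
    alpha_bar_t = prod_{s<=t}(1 - beta_s), and the noised sample is
    drawn in closed form without iterating over steps."""
    rng = rng if rng is not None else np.random.default_rng(0)
    betas = np.linspace(1e-4, beta_max, n_steps)
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(boundaries.shape)
    return np.sqrt(alpha_bar) * boundaries + np.sqrt(1.0 - alpha_bar) * eps

# Early in the schedule the boundaries are barely perturbed;
# by the final step they are close to pure Gaussian noise.
x0 = np.array([0.2, 0.8])           # normalised (onset, offset) of one event
x_early = noise_boundaries(x0, t=0)
```

Training then teaches the Transformer decoder the reverse direction: map a noised boundary pair back toward its ground-truth version, conditioned on the audio.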

en cs.SD, cs.LG

Page 33 of 15444