Mapping glucose-induced hemodynamics in white fat depots with label-free optoacoustics
Nikolina-Alexia Fasoula, Nikoletta Katsouli, Michael Kallmayer
et al.
Subcutaneous adipose tissue (SAT) hemodynamics is an indicator of cardiometabolic health. Herein, we demonstrate a non-invasive approach for imaging SAT hemodynamics in humans using multispectral optoacoustic tomography (MSOT). We evaluated different SAT depots in individuals with low (< 24 kg/m²) and high (≥ 24 kg/m²) BMI, with each group consisting of 8 participants, during oral glucose challenges. Our results indicate a significant decrease in glucose-induced hyperemic responses within SAT for individuals with higher BMI, at 60 min postprandially. MSOT also revealed that abdominal SAT exhibited a more active hemodynamic status compared to femoral SAT in both groups when compared to baseline measurements. MSOT readouts were further validated against longitudinal blood tests of triglycerides, glucose, lactate, and cholesterol. We introduce MSOT as a new method for studying SAT hemodynamics across multiple depots in a single test, providing invaluable insights into SAT physiology related to BMI fluctuations and general cardiometabolic health.
Physics, Acoustics. Sound
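Hemodynamic readouts such as tissue oxygen saturation are commonly derived in optoacoustics by linear spectral unmixing of multiwavelength absorption into oxy- and deoxyhemoglobin components. A minimal least-squares sketch (the coefficient values are illustrative placeholders, not tabulated hemoglobin spectra, and this is not the authors' MSOT pipeline):

```python
import numpy as np

# Illustrative (made-up) absorption coefficients of [HbO2, Hb]
# at four wavelengths -- placeholders, not tabulated spectra.
E = np.array([
    [0.52, 1.40],   # 750 nm
    [0.82, 0.82],   # 800 nm
    [1.06, 0.78],   # 850 nm
    [1.20, 0.75],   # 900 nm
])

def unmix(spectrum):
    """Least-squares unmixing of a per-pixel absorption spectrum
    into [HbO2, Hb] concentrations (arbitrary units)."""
    c, *_ = np.linalg.lstsq(E, spectrum, rcond=None)
    return c

# Synthetic pixel: 70% oxygenated blood.
true_c = np.array([0.7, 0.3])
measured = E @ true_c
c = unmix(measured)
so2 = c[0] / c.sum()   # oxygen saturation estimate
```

With noise-free synthetic data the unmixing recovers the simulated saturation exactly; real multispectral data would add noise and wavelength-dependent fluence corrections.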
A modified adaptive Kalman filter algorithm for the distributed underwater multi-target passive tracking system
Xuefei Ma, Jiaxin Ma, Zexu Ma
et al.
A modified adaptive Kalman filter (AKF) algorithm is proposed to make underwater multi-target tracking with uncertain measurement noise reliable. The proposed AKF algorithm has three core components: an adaptive fading factor, measurement noise covariance adjustment, and an adaptive weighting factor, which together allow the unknown measurement noise and the state vector to be estimated with good accuracy and robustness. Practical trial data verify the algorithm and show it to be superior to all the traditional algorithms considered in this Letter: it reduces the estimated position RMSEs by at least 10.29% and the estimated velocity RMSEs by at least 52.57%.
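The adaptive fading factor mentioned above can be illustrated with a minimal Kalman filter in which the predicted covariance is inflated by a factor lam > 1, letting the filter down-weight a stale model under mismatched noise. The constant-velocity model, noise values, and factor below are illustrative assumptions, not the Letter's algorithm:

```python
import numpy as np

def akf_step(x, P, z, F, H, Q, R, lam=1.0):
    """One predict/update cycle of a Kalman filter with a scalar
    fading factor `lam` (>1 inflates the predicted covariance so
    the filter stays responsive under uncertain noise)."""
    # Predict (fading factor applied to the a-priori covariance)
    x_pred = F @ x
    P_pred = lam * (F @ P @ F.T) + Q
    # Update
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    innov = z - H @ x_pred
    x_new = x_pred + K @ innov
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new, innov

# Constant-velocity model, position-only measurements.
dt = 1.0
F = np.array([[1, dt], [0, 1]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[1.0]])

rng = np.random.default_rng(0)
x, P = np.array([0.0, 1.0]), np.eye(2)
for t in range(1, 50):
    z = np.array([t * 1.0 + rng.normal(0, 1.0)])  # true position = t
    x, P, _ = akf_step(x, P, z, F, H, Q, R, lam=1.02)
```

The full AKF additionally adapts R from the innovation sequence and weights multiple targets; this sketch shows only the fading-factor mechanism.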
Enhanced degradation of lindane in water by sulfite-assisted ultrasonic (SF/US) process: The critical role of generated aqueous electrons
Alin Xia, Haitao Yu, Longyuan Tan
et al.
Persistent pesticides pose significant environmental and health risks due to their strong resistance to conventional degradation methods. This study investigates the degradation of lindane (LND), an organochlorine pesticide, using a sulfite-assisted ultrasonic (SF/US) process, focusing on the critical role of aqueous electrons (eaq–) in reductive dechlorination. Aqueous electrons were indirectly identified as the primary reactive species in the SF/US system for pollutant degradation, providing insights into US-induced reduction mechanisms. The SF/US system significantly enhanced LND removal, achieving 99.4 % ± 1.0 % degradation within 100 min, compared to 88.9 % ± 1.5 % under ultrasound alone. Kinetic analysis showed that sulfite addition nearly doubled the reaction rate constant (from 0.022 to 0.041 min−1), confirming that eaq– drive LND degradation more efficiently than hydroxyl radicals (HO•). Scavenging experiments further demonstrated that nitrate strongly inhibited degradation, while tert-butanol (TBA) had minimal effect, verifying that eaq–, rather than HO•, dominate the process. The efficiency of SF/US was influenced by various factors, with optimal removal achieved at 200 kHz, oxygen-depleted conditions, and pH 10. The degradation pathway primarily involved sequential reductive dechlorination of LND, progressing through pentachlorocyclohexene, tetrachlorocyclohexadiene, and trichlorobenzene intermediates before ultimately forming non-toxic aromatic derivatives such as hydroquinone and phenol. These findings highlight SF/US as a novel and highly efficient strategy for the remediation of chlorinated pesticides in water treatment.
Chemistry, Acoustics. Sound
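Taking the reported pseudo-first-order rate constants at face value, the removals quoted above follow from C(t) = C0·exp(−kt). A quick consistency check (a sketch, not the study's analysis):

```python
import math

def removal(k, t):
    """Percent removal after time t (min) under pseudo-first-order
    kinetics, C(t) = C0 * exp(-k t)."""
    return 100.0 * (1.0 - math.exp(-k * t))

us_only = removal(0.022, 100)  # ultrasound alone
sf_us = removal(0.041, 100)    # sulfite-assisted ultrasound
```

k = 0.022 min−1 reproduces the 88.9 % ultrasound-only figure at 100 min; k = 0.041 min−1 gives about 98.3 %, close to the reported 99.4 % ± 1.0 %.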
XAI-Driven Spectral Analysis of Cough Sounds for Respiratory Disease Characterization
Patricia Amado-Caballero, Luis Miguel San-José-Revuelta, María Dolores Aguilar-García
et al.
This paper proposes an eXplainable Artificial Intelligence (XAI)-driven methodology to enhance the understanding of cough sound analysis for respiratory disease management. We employ occlusion maps to highlight relevant spectral regions in cough spectrograms processed by a Convolutional Neural Network (CNN). Subsequently, spectral analysis of spectrograms weighted by these occlusion maps reveals significant differences between disease groups, particularly in patients with COPD, where cough patterns appear more variable in the identified spectral regions of interest. This contrasts with the lack of significant differences observed when analyzing raw spectrograms. The proposed approach extracts and analyzes several spectral features, demonstrating the potential of XAI techniques to uncover disease-specific acoustic signatures and improve the diagnostic capabilities of cough sound analysis by providing more interpretable results.
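Occlusion mapping of a spectrogram, as used above, can be sketched by sliding a masked patch over the input and recording the drop in the model's score. The patch size and the stand-in scoring function below are placeholders for the paper's trained CNN:

```python
import numpy as np

def occlusion_map(spec, score_fn, patch=8, baseline=0.0):
    """Occlusion sensitivity: mask one patch at a time with
    `baseline` and record the drop in the model's score. Large
    drops mark spectral regions the model relies on."""
    H, W = spec.shape
    ref = score_fn(spec)
    heat = np.zeros_like(spec)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            occ = spec.copy()
            occ[i:i+patch, j:j+patch] = baseline
            heat[i:i+patch, j:j+patch] = ref - score_fn(occ)
    return heat

# Stand-in "model": scores by energy in a fixed band (rows 16-24).
score = lambda s: s[16:24, :].sum()

spec = np.random.default_rng(1).random((64, 64))
heat = occlusion_map(spec, score, patch=8)
```

With this toy scorer the heat map is positive exactly over the band the scorer attends to, mirroring how the paper's maps highlight disease-relevant spectral regions.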
The iNaturalist Sounds Dataset
Mustafa Chasmai, Alexander Shepard, Subhransu Maji
et al.
We present the iNaturalist Sounds Dataset (iNatSounds), a collection of 230,000 audio files capturing sounds from over 5,500 species, contributed by more than 27,000 recordists worldwide. The dataset encompasses sounds from birds, mammals, insects, reptiles, and amphibians, with audio and species labels derived from observations submitted to iNaturalist, a global citizen science platform. Each recording in the dataset varies in length and includes a single species annotation. We benchmark multiple backbone architectures, comparing multiclass classification objectives with multilabel objectives. Despite weak labeling, we demonstrate that iNatSounds serves as a useful pretraining resource by benchmarking it on strongly labeled downstream evaluation datasets. The dataset is available as a single, freely accessible archive, promoting accessibility and research in this important domain. We envision models trained on this data powering next-generation public engagement applications, and assisting biologists, ecologists, and land use managers in processing large audio collections, thereby contributing to the understanding of species compositions in diverse soundscapes.
Environmental Sound Deepfake Detection Challenge: An Overview
Han Yin, Yang Xiao, Rohan Kumar Das
et al.
Recent progress in audio generation models has made it possible to create highly realistic and immersive soundscapes, which are now widely used in film and virtual-reality-related applications. However, these audio generators also raise concerns about potential misuse, such as producing deceptive audio for fabricated videos or spreading misleading information. Therefore, it is essential to develop effective methods for detecting fake environmental sounds. Existing datasets for environmental sound deepfake detection (ESDD) remain limited in both scale and the diversity of sound categories they cover. To address this gap, we introduced EnvSDD, the first large-scale curated dataset designed for ESDD. Based on EnvSDD, we launched the ESDD Challenge, recognized as one of the ICASSP 2026 Grand Challenges. This paper presents an overview of the ESDD Challenge, including a detailed analysis of the challenge results.
BASSA: New software tool reveals hidden details in visualisation of low‐frequency animal sounds
Benjamin A. Jancovich, Tracey L. Rogers
The study of animal sounds in biology and ecology relies heavily upon time-frequency (TF) visualisation, most commonly using the short-time Fourier transform (STFT) spectrogram. This method, however, has an inherent bias towards either temporal or spectral details that can lead to misinterpretation of complex animal sounds. An ideal TF visualisation should accurately convey the structure of the sound in terms of both frequency and time; however, the STFT often cannot meet this requirement. We evaluate the accuracy of four TF visualisation methods (superlet transform [SLT], continuous wavelet transform [CWT] and two STFTs) using a synthetic test signal. We then apply these methods to visualise sounds of the Chagos blue whale, Asian elephant, southern cassowary, eastern whipbird, mulloway fish and the American crocodile. We show that the SLT visualises the test signal with 18.48%–28.08% less error than the other methods. A comparison between our visualisations of animal sounds and their literature descriptions indicates that the STFT's bias may have caused misinterpretations in describing pygmy blue whale songs and elephant rumbles. We suggest that use of the SLT to visualise low-frequency animal sounds may prevent such misinterpretations. Finally, we employ the SLT to develop 'BASSA', an open-source GUI software application for the Windows platform that offers a no-code, user-friendly tool for analysing short-duration recordings of low-frequency animal sounds. The SLT visualises low-frequency animal sounds with improved accuracy, in a user-friendly format, minimising the risk of misinterpretation while requiring less technical expertise than the STFT. Using this method could propel advances in acoustics-driven studies of animal communication, vocal production methods, phonation and species identification.
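The STFT bias described above follows from its fixed time-frequency resolution tradeoff: a longer analysis window narrows the frequency-bin spacing but widens the temporal smear, and vice versa. A minimal illustration (the sample rate and window lengths are arbitrary examples):

```python
def stft_resolution(fs, n_window):
    """Nominal STFT resolutions: frequency-bin spacing (Hz) and
    analysis-window duration (s). Their product is fixed at 1, so
    improving one necessarily degrades the other."""
    df = fs / n_window   # Hz between frequency bins
    dt = n_window / fs   # seconds per analysis window
    return df, dt

fs = 8000
df_long, dt_long = stft_resolution(fs, 4096)   # fine frequency, coarse time
df_short, dt_short = stft_resolution(fs, 256)  # fine time, coarse frequency
```

Adaptive methods such as the superlet and wavelet transforms sidestep this single fixed tradeoff by varying resolution across frequency, which is why they can outperform any one STFT configuration on signals mixing fast and slow structure.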
Embedding-Space Diffusion for Zero-Shot Environmental Sound Classification
Ysobel Sims, Alexandre Mendes, Stephan Chalup
Zero-shot learning enables models to generalise to unseen classes by leveraging semantic information, bridging the gap between training and testing sets with non-overlapping classes. While much research has focused on zero-shot learning in computer vision, the application of these methods to environmental audio remains underexplored, with poor performance in existing studies. Generative methods, which have demonstrated success in computer vision, are notably absent from zero-shot environmental sound classification studies. To address this gap, this work investigates generative methods for zero-shot learning in environmental audio. Two successful generative models from computer vision are adapted: a cross-aligned and distribution-aligned variational autoencoder (CADA-VAE) and a leveraging invariant side generative adversarial network (LisGAN). Additionally, we introduce a novel diffusion model conditioned on class auxiliary data. Synthetic embeddings generated by the diffusion model are combined with seen class embeddings to train a classifier. Experiments are conducted on five environmental audio datasets, ESC-50, ARCA23K-FSD, FSC22, UrbanSound8k and TAU Urban Acoustics 2019, and one music classification dataset, GTZAN. Results show that the diffusion model outperforms all baseline methods on average across the six audio datasets. This work establishes the diffusion model as a promising approach for zero-shot learning and introduces the first benchmark of generative methods for zero-shot environmental sound classification, providing a foundation for future research.
3D Room Geometry Inference from Multichannel Room Impulse Response using Deep Neural Network
Inmo Yeon, Jung-Woo Choi
Room geometry inference (RGI) aims at estimating room shapes from measured room impulse responses (RIRs) and has received much attention for its importance in environment-aware audio rendering and virtual acoustic representation of a real venue. Many estimation models utilizing time difference of arrival (TDoA) or time of arrival (ToA) information in RIRs have been proposed. However, an estimation model should be able to handle more general features and complex relations between reflections to cope with various room shapes and uncertainties such as an unknown number of walls. In this study, we propose a deep neural network that can estimate various room shapes without prior assumptions on the shape or number of walls. The proposed model consists of three sub-networks: feature extraction, parameter estimation, and evaluation networks, which extract key features from RIRs, estimate parameters, and evaluate the confidence of the estimated parameters, respectively. The network is trained on about 40,000 RIRs simulated in rooms of different shapes using a single source and a spherical microphone array, and tested on rooms of unseen shapes and dimensions. The proposed algorithm achieves almost perfect accuracy in finding the true number of walls and shows negligible errors in room shapes.
Enhancement of sonochemical production of hydroxyl radicals from pulsed cylindrically converging ultrasound waves
Cherie C.Y. Wong, Jason L. Raymond, Lillian N. Usadi
et al.
Sonochemistry is the use of ultrasound to generate highly reactive radical species through the inertial collapse of a gas/vapour cavity and is a green alternative for hydrogen production, wastewater treatment, and chemical synthesis and modification. Yet, current sonochemical reactors are often limited by their design, resulting in low efficacy, low yields, and slow reaction kinetics. Here, we constructed a novel sonochemical reactor that uses cylindrically converging ultrasound waves to create an intense localised region of high acoustic pressure amplitude (15 MPa peak-to-peak) capable of spontaneously nucleating cavitation. Using a novel dosimetry technique, we determined the effect of acoustic parameters on the yield of hydroxyl radicals (HO·), the HO· production rate, and ultimately the sonochemical efficiency (SE) of our reactor. Our reactor design had a significantly higher HO· production rate and SE than conventional reactors reported in the literature.
Chemistry, Acoustics. Sound
Effect of ultrasound on keratin valorization from chicken feather waste: Process optimization and keratin characterization
Xiaojie Qin, Chuan Yang, Yujie Guo
et al.
Chicken feather (CF) is one of the main poultry byproducts, produced in large amounts globally. However, the robust chemical nature of chicken feathers has limited their wide-scale utilization and valorization. This study proposes a strategy for keratin regeneration from chicken feather that combines ultrasound with cysteine (Cys) reduction. First, the effect of ultrasound on feather degradation and keratin properties was systematically explored on the basis of Cys reduction. Results showed that feather dissolution was significantly improved by increasing both ultrasonic time and power, with time having the greater impact on keratin yield. However, treatment times over 4 h led to a decrease in keratin yield, producing more soluble peptides, > 9.7 % of which were < 0.5 kDa. Meanwhile, prolonged treatment decreased the thermal stability of keratin, with weight loss occurring at lower temperatures, and reduced its content of amino acids such as Ser, Pro, and Gly. Conversely, increasing only the ultrasonic power caused no remarkable damage to the chemical structure or thermal stability of the regenerated keratin, while keratin solubility was notably promoted, reaching 745.72 mg·g−1 in 0.1 M NaOH solution (400 W, 4 h). The keratin regenerated under optimal conditions (130 W, 2.7 h, and 15 % Cys) possessed better solubility without obvious damage to its chemical structure, thermal stability, or amino acid composition. The study shows that ultrasound physically improves CF degradation and keratin solubility without damaging the protein's native character, and provides an alternative route for keratin regeneration involving no toxic reagents, holding promise for the utilization and valorization of feather waste.
Chemistry, Acoustics. Sound
Sound of Story: Multi-modal Storytelling with Audio
Jaeyeon Bae, Seokhoon Jeong, Seokun Kang
et al.
Storytelling is multi-modal in the real world. When one tells a story, one may use visualizations and sounds along with the story itself. However, prior studies on storytelling datasets and tasks have paid little attention to sound, even though sound also conveys meaningful semantics of the story. Therefore, we propose to extend story understanding and telling by establishing a new component called "background sound", which is story context-based audio without any linguistic information. For this purpose, we introduce a new dataset, called "Sound of Story (SoS)", which has paired image and text sequences with corresponding sound or background music for a story. To the best of our knowledge, this is the largest well-curated dataset for storytelling with sound. Our SoS dataset consists of 27,354 stories with 19.6 images per story and 984 hours of speech-decoupled audio such as background music and other sounds. As benchmark tasks for storytelling with sound, we propose retrieval tasks between modalities and audio generation tasks from image-text sequences, introducing strong baselines for them. We believe the proposed dataset and tasks may shed light on the multi-modal understanding of storytelling in terms of sound. The dataset and baseline code for each task will be released at: https://github.com/Sosdatasets/SoS_Dataset.
AI-based soundscape analysis: Jointly identifying sound sources and predicting annoyance
Yuanbo Hou, Qiaoqiao Ren, Huizhong Zhang
et al.
Soundscape studies typically attempt to capture the perception and understanding of sonic environments by surveying users. However, for long-term monitoring or assessing interventions, sound-signal-based approaches are required. To this end, most previous research focused on psycho-acoustic quantities or automatic sound recognition. Few attempts were made to include appraisal (e.g., in circumplex frameworks). This paper proposes an artificial intelligence (AI)-based dual-branch convolutional neural network with cross-attention-based fusion (DCNN-CaF) to analyze automatic soundscape characterization, including sound recognition and appraisal. Using the DeLTA dataset containing human-annotated sound source labels and perceived annoyance, the DCNN-CaF is proposed to perform sound source classification (SSC) and human-perceived annoyance rating prediction (ARP). Experimental findings indicate that (1) the proposed DCNN-CaF using loudness and Mel features outperforms the DCNN-CaF using only one of them. (2) The proposed DCNN-CaF with cross-attention fusion outperforms other typical AI-based models and soundscape-related traditional machine learning methods on the SSC and ARP tasks. (3) Correlation analysis reveals that the relationship between sound sources and annoyance is similar for humans and the proposed AI-based DCNN-CaF model. (4) Generalization tests show that the proposed model's ARP in the presence of model-unknown sound sources is consistent with expert expectations and can explain previous findings from the literature on soundscape augmentation.
Sound field decomposition based on two-stage neural networks
Ryo Matsuda, Makoto Otani
A method for sound field decomposition based on neural networks is proposed. The method comprises two stages: a sound field separation stage and a single-source localization stage. In the first stage, the sound pressure at microphones synthesized by multiple sources is separated into one excited by each sound source. In the second stage, the source location is obtained as a regression from the sound pressure at microphones consisting of a single sound source. The estimated location is not affected by discretization because the second stage is designed as a regression rather than a classification. Datasets are generated by simulation using Green's function, and the neural network is trained for each frequency. Numerical experiments reveal that, compared with conventional methods, the proposed method can achieve higher source-localization accuracy and higher sound-field-reconstruction accuracy.
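Dataset generation with the free-field Green's function, as described above, can be sketched by superposing point-source pressures at a microphone array. The array geometry, source positions, and frequency below are illustrative; the paper's simulation setup may differ:

```python
import numpy as np

def greens_free_field(src, mics, freq, c=343.0):
    """Free-field Green's function G = exp(i k r) / (4 pi r):
    complex pressure at each microphone due to a unit point source."""
    k = 2 * np.pi * freq / c
    r = np.linalg.norm(mics - src, axis=1)
    return np.exp(1j * k * r) / (4 * np.pi * r)

# 8-microphone linear array; a two-source mixture is the superposition
# of single-source fields (what the first-stage network must separate).
mics = np.stack([np.linspace(-0.5, 0.5, 8),
                 np.zeros(8), np.zeros(8)], axis=1)
s1, s2 = np.array([1.0, 2.0, 0.0]), np.array([-1.5, 1.0, 0.5])
p_mix = greens_free_field(s1, mics, 500.0) + greens_free_field(s2, mics, 500.0)
```

Training pairs for the separation stage would map `p_mix` back to the individual single-source fields, and the localization stage would regress each source position from its separated field.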
Target Sound Extraction with Variable Cross-modality Clues
Chenda Li, Yao Qian, Zhuo Chen
et al.
Automatic target sound extraction (TSE) is a machine learning approach to mimic the human auditory perception capability of attending to a sound source of interest from a mixture of sources. It often uses a model conditioned on a fixed form of target sound clues, such as a sound class label, which limits the ways in which users can interact with the model to specify the target sounds. To leverage a variable number of clues across modalities available in the inference phase, including a video, a sound event class, and a text caption, we propose a unified transformer-based TSE model architecture, in which a multi-clue attention module integrates all the clues across the modalities. Since there is no off-the-shelf benchmark to evaluate our proposed approach, we build a dataset based on the public corpora AudioSet and AudioCaps. Experimental results for seen and unseen target-sound evaluation sets show that our proposed TSE model can effectively deal with a varying number of clues, which improves the TSE performance and robustness against partially compromised clues.
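The idea of a clue-integration module that accepts however many clues are available can be sketched as softmax attention over a variable-length set of clue embeddings. This toy version with random embeddings is an assumption-laden illustration of the mechanism, not the paper's transformer:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_clue_attention(mixture_frames, clues):
    """Each mixture frame (query) takes a softmax-weighted
    combination of whatever clue embeddings are present
    (keys = values), so the clue count can vary freely."""
    scores = mixture_frames @ clues.T   # (T, n_clues)
    weights = softmax(scores, axis=-1)
    return weights @ clues              # (T, d) conditioning signal

rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 16))
# Any number of clues: e.g. a video embedding, a class embedding,
# and/or a caption embedding (all hypothetical here).
cond2 = multi_clue_attention(frames, rng.normal(size=(2, 16)))
cond3 = multi_clue_attention(frames, rng.normal(size=(3, 16)))
```

Because the softmax runs over the clue axis, the output conditioning signal has the same shape regardless of how many clues are supplied, which is what lets a single model handle variable clue sets.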
Inverse design of a Helmholtz resonator based low-frequency acoustic absorber using deep neural network
K. Mahesh, S. Kumar Ranjith, R. Mini
The design of low-frequency sound absorbers with broadband absorption characteristics and optimized dimensions is a pressing research problem in engineering acoustics. In this work, a deep neural network based inverse prediction mechanism is proposed to geometrically design a Helmholtz resonator (HR) based acoustic absorber for low-frequency absorption. Analytically obtained frequency responses from electro-acoustic theory are used to create the large dataset required for training and testing the deep neural network. The trained convolutional neural network inversely predicts the optimum design parameters corresponding to the desired absorption characteristics with high fidelity. To validate, the inverse design procedure is initially implemented on a standard HR based sound absorber model with high accuracy. Thereafter, the inverse design strategy is extended to forecast the optimum geometric parameters of an absorber with complex features, realized using HRs and a micro-perforated panel. Subsequently, a quasi-perfect low-frequency acoustic absorber having minimum thickness and broadband characteristics is deduced. Importantly, it is demonstrated that the proposed absorber, comprising four parallel HRs and a micro-perforated panel, absorbs more than 90% of sound in the frequency band of 347–630 Hz. The introduced design process reveals a wide variety of applications in engineering acoustics as it is suitable for tailoring any sound absorber model with desirable features.
45 citations
Computer Science
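The lumped-element resonance frequency of a single HR, f0 = (c/2π)·√(S/(V·L_eff)), underlies such electro-acoustic designs. A quick sketch (the 1.7a end correction and the example dimensions are common textbook assumptions, not values from the paper):

```python
import math

def helmholtz_f0(neck_radius, neck_length, cavity_volume, c=343.0):
    """Resonance frequency of a Helmholtz resonator,
    f0 = (c / 2 pi) * sqrt(S / (V * L_eff)),
    with the common end correction L_eff = L + 1.7 * a."""
    S = math.pi * neck_radius ** 2          # neck cross-section (m^2)
    L_eff = neck_length + 1.7 * neck_radius  # effective neck length (m)
    return (c / (2 * math.pi)) * math.sqrt(S / (cavity_volume * L_eff))

# Example: 5 mm neck radius, 20 mm neck, 1 litre cavity.
f0 = helmholtz_f0(0.005, 0.020, 1e-3)
```

Sweeping the geometric parameters of such analytical models is how the large training dataset for the inverse network can be generated cheaply.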
An investigation of interference between electromagnetic articulography and electroglottography
Matthew Masapollo, Ratree Wayland, Jessica Goel
et al.
The present study tested whether there is cross-interference between electromagnetic articulography (EMA) and electroglottography (EGG) during the acquisition of kinematic speech data. In experiments 1A and 1B, EMA sensors were calibrated with and without EGG electrodes present in the EMA field. In experiment 2, EMA was used to record lip, tongue, and jaw movements for one male speaker and one female speaker, with and without simultaneous EGG recording. Collectively, the results provide no evidence of signal artifacts in either direction, suggesting that EMA and EGG technology can be combined to reliably assess laryngeal and supralaryngeal motor coordination in speech.
Lysozyme crystallization in hydrogel media under ultrasound irradiation
Mariia Savchenko, Manuel Hurtado, Modesto T. Lopez-Lopez
et al.
Sonocrystallization involves the application of ultrasound radiation to control nucleation and crystal growth depending on the actuation time and intensity. Its application allows nucleation to be induced at lower supersaturations than required under standard conditions. Although well established in inorganic and organic crystallization, it has scarcely been explored in protein crystallization. Now that industrial protein crystallization is gaining momentum, interest in new ways to control protein nucleation and crystal growth is growing. In this work we present the development of a novel ultrasound bioreactor to study its influence on protein crystallization in agarose gel. Gel media minimize convection currents and sedimentation, favoring more homogeneous and stable conditions for studying the effect of externally generated low-energy ultrasonic irradiation on protein crystallization while avoiding other undesired effects such as temperature increases, the introduction of nucleation-inducing surfaces, and destructive cavitation phenomena. In-depth statistical analysis of the results shows that the impact of ultrasound in gel media on crystal size populations is statistically significant and reproducible.
Chemistry, Acoustics. Sound
A novel sub-pilot-scale direct-contact ultrasonic dehydration technology for sustainable production of distillers dried grains (DDG)
Amir Malvandi, Danielle Nicole Coleman, Juan J. Loor
et al.
DDG, a major source of protein, calcium, phosphorus, and sulfur, is arguably the most important byproduct of the bioethanol industry, with increasing demand over the past few years. Reducing energy consumption in the DDG production process and recovering energy from DDG are vital for sustainable bioethanol production. In this paper, a novel direct-contact multi-frequency, multimode, and modulated (MMM) ultrasonic dryer was developed for the first time and applied to the dehydration of wet distillers' grain (WDG). Ultrasonic drying (US) was combined with convective airflow (HA) at temperatures of 25 (room temperature), 50, and 70 °C to evaluate the impact of US, HA, and US + HA on the drying kinetics, activation energy, chemical composition, microstructure, and color of DDG. Semi-empirical kinetic models were developed, and evaluation of the drying performance showed that the application of ultrasound significantly enhanced the drying rate and decreased the drying time (by 46%), especially at low drying temperatures. The activation energy for moisture removal in the presence of ultrasound was about 50% of that without ultrasound. The final dried distillers' grains product processed by ultrasonic drying had a brighter color, higher available protein, higher digestible protein (the lowest acid detergent insoluble crude protein), and a better surface profile, with no compromise on mineral and fiber contents.
Chemistry, Acoustics. Sound
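The roughly halved activation energy can be illustrated with the two-point Arrhenius relation, Ea = R·ln(k2/k1)/(1/T1 − 1/T2). The drying-rate constants below are made-up placeholders chosen so the ultrasound case comes out at exactly 50 %, not values from the paper:

```python
import math

R = 8.314  # gas constant, J/(mol K)

def activation_energy(k1, T1, k2, T2):
    """Activation energy from rate constants at two temperatures (K),
    via the two-point Arrhenius form:
    Ea = R * ln(k2/k1) / (1/T1 - 1/T2)."""
    return R * math.log(k2 / k1) / (1.0 / T1 - 1.0 / T2)

# Illustrative drying-rate constants (min^-1) at 25 C and 70 C.
T1, T2 = 298.15, 343.15
Ea_no_us = activation_energy(0.010, T1, 0.040, T2)  # hot air only
Ea_us = activation_energy(0.020, T1, 0.040, T2)     # with ultrasound
```

A lower Ea means the drying rate is less temperature-sensitive, which is consistent with ultrasound being most beneficial at low drying temperatures.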
GPU-Accelerated Target Strength Prediction Based on Multiresolution Shooting and Bouncing Ray Method
Gang Zhao, Naiwei Sun, Shen Shen
et al.
The application of the traditional planar acoustics method is limited by its low accuracy when computing the echo characteristics of underwater targets. Based on the concept of the shooting and bouncing ray, which accounts for multiple reflections on the basis of the geometric optics principle, this paper presents a more efficient GPU-accelerated multiresolution grid algorithm within the shooting and bouncing ray (SBR) method to quickly predict the target strength of complex underwater targets. The procedures of virtual aperture plane generation, ray tracing, scattered sound field integration, and subdivision of divergent ray tubes are all implemented on the GPU. In particular, stackless KD-tree traversal is adopted to effectively improve ray-tracing efficiency. Experiments on rigid sphere, cylinder, and corner reflector models verify the accuracy of the GPU-based multiresolution SBR. Moreover, the GPU-based SBR is more than 750 times faster than the CPU version because of the GPU's tremendous computing capability. Further, the proposed GPU-based multiresolution SBR improves runtime performance to at least 2.4 times that of the single-resolution GPU-based SBR.
Technology, Engineering (General). Civil engineering (General)
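The core primitive of any shooting-and-bouncing-ray tracer is a ray-primitive intersection test plus specular reflection, repeated per bounce (a KD-tree accelerates the intersection search over many facets). A minimal sketch for a sphere, not the paper's GPU implementation:

```python
import numpy as np

def ray_sphere_hit(origin, direction, center, radius):
    """Nearest positive intersection distance of a ray with a sphere,
    or None if the ray misses -- the basic test repeated millions of
    times in shooting-and-bouncing-ray tracing."""
    d = direction / np.linalg.norm(direction)
    oc = origin - center
    b = np.dot(oc, d)
    disc = b * b - (np.dot(oc, oc) - radius * radius)
    if disc < 0:
        return None
    t = -b - np.sqrt(disc)
    return t if t > 0 else None

def reflect(d, n):
    """Specular reflection of direction d about unit normal n."""
    return d - 2.0 * np.dot(d, n) * n

# Ray from (0, 0, -5) along +z toward a unit sphere at the origin.
t = ray_sphere_hit(np.array([0.0, 0.0, -5.0]), np.array([0.0, 0.0, 1.0]),
                   np.array([0.0, 0.0, 0.0]), 1.0)
```

Each bounce updates the ray origin to the hit point and the direction via `reflect`; the target strength is then assembled by integrating the field carried by the exiting ray tubes.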