{"results":[{"id":"doaj_10.1051/aacus/2026013","title":"A lexicon to describe specific sounds of the electric car cabin: A verbal approach to comfort improvement","authors":[{"name":"Duroyon Matthieu"},{"name":"Susini Patrick"},{"name":"Misdariis Nicolas"},{"name":"Dauchez Nicolas"},{"name":"Pardo Louis-Ferdinand"},{"name":"Vialatte Eléonore"}],"abstract":"Electric vehicles are now part of the everyday automotive landscape. The resulting sonic experience is a major challenge for driver comfort. Despite this challenge being known, no solution reaching general consensus has yet been proposed. This might be due to the lack of a common culture of the sound or the expected sonic target in electric vehicles, in opposition to what existed for thermal engine. This work proposes a decisive tool to enhance communication on sound description in the electric car cabin. Inspired by soundscape studies, the methodology consists in using a semi-structured questionnaire oriented toward sound description and judgment with 12 acousticians working on electric vehicles. A verbal analysis identifies 11 specific sound names describing this sonic environment. Definitions that include three levels of description: causal, reduced and hedonic as well as audio illustrations, are proposed for each sound name. The lexicon is validated by the same group of acousticians and available online.","source":"DOAJ","year":2026,"language":"","subjects":["Acoustics in engineering. Acoustical engineering","Acoustics. 
Sound"],"doi":"10.1051/aacus/2026013","url":"https://acta-acustica.edpsciences.org/articles/aacus/full_html/2026/01/aacus250037/aacus250037.html","is_open_access":true,"published_at":"","score":70},{"id":"arxiv_2509.02622","title":"IS${}^3$ : Generic Impulsive--Stationary Sound Separation in Acoustic Scenes using Deep Filtering","authors":[{"name":"Clémentine Berger"},{"name":"Paraskevas Stamatiadis"},{"name":"Roland Badeau"},{"name":"Slim Essid"}],"abstract":"We are interested in audio systems capable of performing a differentiated processing of stationary backgrounds and isolated acoustic events within an acoustic scene, whether for applying specific processing methods to each part or for focusing solely on one while ignoring the other. Such systems have applications in real-world scenarios, including robust adaptive audio rendering systems (e.g., EQ or compression), plosive attenuation in voice mixing, noise suppression or reduction, robust acoustic event classification or even bioacoustics. To this end, we introduce IS${}^3$, a neural network designed for Impulsive--Stationary Sound Separation, that isolates impulsive acoustic events from the stationary background using a deep filtering approach, that can act as a pre-processing stage for the above-mentioned tasks. To ensure optimal training, we propose a sophisticated data generation pipeline that curates and adapts existing datasets for this task. 
We demonstrate that a learning-based approach, built on a relatively lightweight neural architecture and trained with well-designed and varied data, is successful in this previously unaddressed task, outperforming the Harmonic--Percussive Sound Separation masking method, adapted from music signal processing research, and wavelet filtering on objective separation metrics.","source":"arXiv","year":2025,"language":"en","subjects":["eess.AS","cs.AI","cs.SD","eess.SP"],"url":"https://arxiv.org/abs/2509.02622","pdf_url":"https://arxiv.org/pdf/2509.02622","is_open_access":true,"published_at":"2025-09-01T08:55:29Z","score":69},{"id":"arxiv_2405.12221","title":"Images that Sound: Composing Images and Sounds on a Single Canvas","authors":[{"name":"Ziyang Chen"},{"name":"Daniel Geng"},{"name":"Andrew Owens"}],"abstract":"Spectrograms are 2D representations of sound that look very different from the images found in our visual world. And natural images, when played as spectrograms, make unnatural sounds. In this paper, we show that it is possible to synthesize spectrograms that simultaneously look like natural images and sound like natural audio. We call these visual spectrograms images that sound. Our approach is simple and zero-shot, and it leverages pre-trained text-to-image and text-to-spectrogram diffusion models that operate in a shared latent space. During the reverse process, we denoise noisy latents with both the audio and image diffusion models in parallel, resulting in a sample that is likely under both models. Through quantitative evaluations and perceptual studies, we find that our method successfully generates spectrograms that align with a desired audio prompt while also taking on the visual appearance of a desired image prompt. 
Please see our project page for video results: https://ificl.github.io/images-that-sound/","source":"arXiv","year":2024,"language":"en","subjects":["cs.CV","cs.LG","cs.MM","cs.SD","eess.AS"],"url":"https://arxiv.org/abs/2405.12221","pdf_url":"https://arxiv.org/pdf/2405.12221","is_open_access":true,"published_at":"2024-05-20T17:59:59Z","score":68},{"id":"arxiv_2404.11399","title":"In situ sound absorption estimation with the discrete complex image source method","authors":[{"name":"Eric Brandao"},{"name":"William Fonseca"},{"name":"Paulo Mareze"},{"name":"Carlos Resende"},{"name":"Gabriel Azzuz"},{"name":"Joao Pontalti"},{"name":"Efren Fernandez-Grande"}],"abstract":"Estimating the sound absorption in situ relies on accurately describing the measured sound field. Evidence suggests that modeling the reflection of impinging spherical waves is important, especially for compact measurement systems. This article proposes a method for estimating the sound absorption coefficient of a material sample by mapping the sound pressure, measured by a microphone array, to a distribution of monopoles along a line in the complex plane. The proposed method is compared to modeling the sound field as a superposition of two sources (a monopole and an image source). The obtained inverse problems are solved with Tikhonov regularization, with automatic choice of the regularization parameter by the L-curve criterion. The sound absorption measurement is tested with simulations of the sound field above infinite and finite porous absorbers. The approaches are compared to the plane-wave absorption coefficient and the one obtained by spherical wave incidence. Experimental analysis of two porous samples and one resonant absorber is also carried out in situ. Four arrays were tested with an increasing aperture and number of sensors. It was demonstrated that measurements are feasible even with an array with only a few microphones. 
The discretization of the integral equation led to a more accurate reconstruction of the sound pressure and particle velocity at the sample's surface. The resulting absorption coefficient agrees with the one obtained for spherical wave incidence, indicating that including more monopoles along the complex line is an essential feature of the sound field.","source":"arXiv","year":2024,"language":"en","subjects":["eess.AS","cs.SD","physics.class-ph"],"url":"https://arxiv.org/abs/2404.11399","pdf_url":"https://arxiv.org/pdf/2404.11399","is_open_access":true,"published_at":"2024-04-17T14:03:42Z","score":68},{"id":"arxiv_2402.02807","title":"Are Sounds Sound for Phylogenetic Reconstruction?","authors":[{"name":"Luise Häuser"},{"name":"Gerhard Jäger"},{"name":"Taraka Rama"},{"name":"Johann-Mattis List"},{"name":"Alexandros Stamatakis"}],"abstract":"In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees. However, to date, computational approaches have typically not taken this potential into account. Most computational studies still rely on lexical cognates as major data source for phylogenetic reconstruction in linguistics, although there do exist a few studies in which authors praise the benefits of comparing words at the level of sound sequences. Building on (a) ten diverse datasets from different language families, and (b) state-of-the-art methods for automated cognate and sound correspondence detection, we test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction. 
Our results show that phylogenies reconstructed from lexical cognates are topologically closer to the gold-standard phylogenies, by approximately one third on average with respect to the generalized quartet distance, than phylogenies reconstructed from sound correspondences.","source":"arXiv","year":2024,"language":"en","subjects":["cs.CL","cs.SD","eess.AS"],"url":"https://arxiv.org/abs/2402.02807","pdf_url":"https://arxiv.org/pdf/2402.02807","is_open_access":true,"published_at":"2024-02-05T08:35:33Z","score":68},{"id":"doaj_10.1016/j.ultsonch.2024.107001","title":"Acoustic cavitation-induced microstructure evolution in ultrasonically brazed Al/Cu joints using Zn-Al alloy fillers","authors":[{"name":"Dan Zhao"},{"name":"Dan Li"},{"name":"Yong Xiao"},{"name":"Mingyu Li"},{"name":"Wen Chen"}],"abstract":"Tailoring the phase constitutions of the interfacial reaction layers under the assistance of ultrasonic vibration is a convenient method to fabricate high-strength Al/Cu brazing joints. In this study, 1060-Al and T2-Cu dissimilar metals were ultrasonically brazed with Zn-3Al (wt. %) filler metals. Effects of ultrasonic brazing time on the microstructure and mechanical properties of joints were investigated. Results showed that a CuZn5 intermetallic compound (IMC) layer and a Cu-based diffusion layer were created on the Cu substrate surface in the joint ultrasonically brazed at 400 ℃ for 2 s. However, the CuZn5 IMC layer was gradually transformed into a thin Al4.2Cu3.2Zn0.7 IMC layer by increasing the ultrasonic vibration time to 15 s. A well-matched coherent interface was formed between the Al4.2Cu3.2Zn0.7 ternary phase and the Cu-based diffusion layer. The phase transition of the Cu-side interfacial layer correlated closely with the acoustic cavitation-induced supersaturation regions near the Cu substrate surface. 
The measured tensile strength of the Al/Zn-3Al/Cu joint ultrasonically brazed for 15 s was 89.3 MPa, approximately 2.5 times that of the joint brazed for 2 s, and the tensile failure mainly occurred at the interface between the Al4.2Cu3.2Zn0.7 layer and the Cu-based diffusion layer.","source":"DOAJ","year":2024,"language":"","subjects":["Chemistry","Acoustics. Sound"],"doi":"10.1016/j.ultsonch.2024.107001","url":"http://www.sciencedirect.com/science/article/pii/S1350417724002499","is_open_access":true,"published_at":"","score":68},{"id":"doaj_10.1016/j.ultsonch.2024.106754","title":"Ultra-high-speed dynamics of acoustic droplet vaporization in soft biomaterials: Effects of viscoelasticity, frequency, and bulk boiling point","authors":[{"name":"Bachir A. Abeid"},{"name":"Mario L. Fabiilli"},{"name":"Jonathan B. Estrada"},{"name":"Mitra Aliabouzar"}],"abstract":"Phase-shift droplets are a highly adaptable platform for biomedical applications of ultrasound. The spatiotemporal response of phase-shift droplets to focused ultrasound above a certain pressure threshold, termed acoustic droplet vaporization (ADV), is influenced by intrinsic features (e.g., bulk boiling point) and extrinsic factors (e.g., driving frequency and surrounding media). A deep understanding of ADV dynamics is critical to ensure the robustness and repeatability of an ADV-assisted application. Here, we integrated ultra-high-speed imaging, at 10 million frames per second, and confocal microscopy for a full-scale (i.e., from nanoseconds to seconds) characterization of ADV. Experiments were conducted in fibrin-based hydrogels to mimic soft tissue environments. Effects of fibrin concentration (0.2 to 8 % (w/v)), excitation frequency (1, 2.5, and 9.4 MHz), and perfluorocarbon core (perfluoropentane, perfluorohexane, and perfluorooctane) on ADV dynamics were studied. 
Several fundamental parameters related to ADV dynamics, such as expansion ratio, expansion velocity, collapse radius, collapse time, radius of secondary rebound, resting radius, and equilibrium radius of the generated bubbles, were extracted from the radius vs time curves. Diffusion-driven ADV-bubble growth was fit to a modified Epstein-Plesset equation, adding a material stress term, to estimate the growth rate. Our results indicated that ADV dynamics were significantly impacted by fibrin concentration, frequency, and perfluorocarbon liquid core. This is the first study to combine ultra-high-speed and confocal microscopy techniques to provide insights into ADV bubble dynamics in tissue-mimicking hydrogels.","source":"DOAJ","year":2024,"language":"","subjects":["Chemistry","Acoustics. Sound"],"doi":"10.1016/j.ultsonch.2024.106754","url":"http://www.sciencedirect.com/science/article/pii/S1350417724000026","is_open_access":true,"published_at":"","score":68},{"id":"doaj_10.1016/j.ultsonch.2024.106837","title":"High-speed imaging of supersaturated cavitation clouds and the vibration modes of the radiation surface of high-power transducers","authors":[{"name":"Yandong Gao"},{"name":"Maolin Zhou"},{"name":"Weilin Xu"},{"name":"Jing Luo"},{"name":"Lixin Bai"}],"abstract":"The vibration mode of the radiation surface of a transducer (or the structure of a supersaturated cavitation cloud in a thin liquid layer) is investigated experimentally by high-speed photography. The classification of saturated, supersaturated and undersaturated cavitation clouds was proposed, and a comparison was made between saturated and supersaturated cavitation cloud structures in liquid thin layers. The characteristics and formation mechanism of supersaturated cavitation cloud structure were investigated. 
Based on the close correspondence and rapid response between the distribution of supersaturated cavitation clouds and the vibration modes of the radiation surface, a new approach is proposed to measure the vibration mode of a transducer operating at high power and large amplitude in real time.","source":"DOAJ","year":2024,"language":"","subjects":["Chemistry","Acoustics. Sound"],"doi":"10.1016/j.ultsonch.2024.106837","url":"http://www.sciencedirect.com/science/article/pii/S1350417724000853","is_open_access":true,"published_at":"","score":68},{"id":"doaj_10.1186/s13636-024-00380-4","title":"Modelling note’s pitch and duration in trained professional singers","authors":[{"name":"Behnam Faghih"},{"name":"Amin Shoari Nejad"},{"name":"Joseph Timoney"}],"abstract":"Performing musical notes correctly does not mean that all the performers will play the notes at the exact same pitch and duration. However, it does imply that they are performing the notes within acceptable psychoacoustic ranges. Therefore, this article aims to find the range of a note’s duration and pitch according to its position in a piece of music by analysing several parameters in trained-professional singers’ behaviours in singing notes. To achieve the goal, the variations of eight variables on 2688 recorded solo singing files by trained professional singers were investigated to find the relationships between a performed note’s F0 and duration and these variables. The variables considered in this study are the interval to the following and previous notes, the existence of a rest before or after the note, the note’s MIDI pitch code and duration in a music score, and the particular singing technique applied. A Bayesian hierarchical model was used to find the effect of the variables on the pitch and duration of a note sung by professional singers, mainly in opera style. The investigation confirms that these parameters affect the pitch and duration of notes performed by professional singers. 
Finally, this paper proposes formulas to calculate the pitch frequency and duration of the notes according to the variables to simulate the behaviour of the trained-professional singers in performing notes’ pitches and durations.","source":"DOAJ","year":2024,"language":"","subjects":["Acoustics. Sound","Electronic computers. Computer science"],"doi":"10.1186/s13636-024-00380-4","url":"https://doi.org/10.1186/s13636-024-00380-4","is_open_access":true,"published_at":"","score":68},{"id":"arxiv_2306.08051","title":"Cognitive performance in open-plan office acoustic simulations: Effects of room acoustics and semantics but not spatial separation of sound sources","authors":[{"name":"Manuj Yadav"},{"name":"Markus Georgi"},{"name":"Larissa Leist"},{"name":"Maria Klatte"},{"name":"Sabine J. Schlittmeier"},{"name":"Janina Fels"}],"abstract":"The irrelevant sound effect (ISE) characterizes short-term memory performance impairment during irrelevant sounds relative to quiet. Irrelevant sound presentation in most laboratory-based ISE studies has been too limited to represent complex scenarios such as open-plan offices (OPOs), and not many studies have considered serial recall of heard information. This paper investigates the ISE using an auditory-verbal serial recall task, wherein performance was evaluated for relevant factors in simulating OPO acoustics: the irrelevant sounds including the semanticity of speech, reproduction methods over headphones, and room acoustics. Results (Experiments 1 and 2) show that the ISE was exhibited in most conditions with anechoic (irrelevant) nonspeech sounds with/without speech, but the effect was substantially higher with meaningful speech compared to foreign speech, suggesting a semantic effect. Performance differences in conditions with diotic and binaural reproductions were not statistically robust, suggesting a limited role of spatial separation of sources. 
In Experiment 3, a statistically robust ISE was exhibited for binaural room acoustic conditions with mid-frequency reverberation times, T30 (s) = 0.4, 0.8, 1.1, suggesting cognitive impairment regardless of sound absorption representative of OPOs. Performance differences in T30 = 0.4 s relative to T30 = 0.8 and 1.1 s conditions were statistically robust. This emphasizes the benefits for cognitive performance with increased sound absorption, reinforcing extant room acoustic design recommendations. Performance differences in T30 = 0.8 s vs. 1.1 s were not statistically robust. Collectively, these results suggest that certain findings from ISE studies with idiosyncratic acoustics may not translate well to complex OPO acoustic environments.","source":"arXiv","year":2023,"language":"en","subjects":["eess.AS"],"doi":"10.1016/j.apacoust.2023.109559","url":"https://arxiv.org/abs/2306.08051","pdf_url":"https://arxiv.org/pdf/2306.08051","is_open_access":true,"published_at":"2023-06-13T18:17:12Z","score":67},{"id":"arxiv_2303.16897","title":"Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos","authors":[{"name":"Kun Su"},{"name":"Kaizhi Qian"},{"name":"Eli Shlizerman"},{"name":"Antonio Torralba"},{"name":"Chuang Gan"}],"abstract":"Modeling sounds emitted from physical object interactions is critical for immersive perceptual experiences in real and virtual worlds. Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that could represent and synthesize the sound. However, they require fine details of both the object geometries and impact locations, which are rarely available in the real world and cannot be applied to synthesize impact sounds from common videos. On the other hand, existing video-driven deep learning-based approaches could only capture the weak correspondence between visual content and impact sounds since they lack physics knowledge. 
In this work, we propose a physics-driven diffusion model that can synthesize high-fidelity impact sound for a silent video clip. In addition to the video content, we propose to use additional physics priors to guide the impact sound synthesis procedure. The physics priors include both physics parameters that are directly estimated from noisy real-world impact sound examples without sophisticated setup and learned residual parameters that interpret the sound environment via neural networks. We further implement a novel diffusion model with specific training and inference strategies to combine physics priors and visual information for impact sound synthesis. Experimental results show that our model outperforms several existing systems in generating realistic impact sounds. More importantly, the physics-based representations are fully interpretable and transparent, thus enabling us to perform sound editing flexibly.","source":"arXiv","year":2023,"language":"en","subjects":["cs.CV","cs.LG","cs.SD","eess.AS"],"url":"https://arxiv.org/abs/2303.16897","pdf_url":"https://arxiv.org/pdf/2303.16897","is_open_access":true,"published_at":"2023-03-29T17:59:53Z","score":67},{"id":"doaj_10.1016/j.pacs.2023.100452","title":"Fast iterative reconstruction for photoacoustic tomography using learned physical model: Theoretical validation","authors":[{"name":"Ko-Tsung Hsu"},{"name":"Steven Guan"},{"name":"Parag V. Chitnis"}],"abstract":"Iterative reconstruction has demonstrated superior performance in medical imaging under compressed, sparse, and limited-view sensing scenarios. However, iterative reconstruction algorithms are slow to converge and rely heavily on hand-crafted parameters to achieve good performance. Many iterations are usually required to reconstruct a high-quality image, which is computationally expensive due to repeated evaluations of the physical model. 
While learned iterative reconstruction approaches such as model-based learning (MBLr) can reduce the number of iterations through convolutional neural networks, they still require repeated evaluations of the physical model at each iteration. Therefore, the goal of this study is to develop a Fast Iterative Reconstruction (FIRe) algorithm that incorporates a learned physical model into the learned iterative reconstruction scheme to further reduce the reconstruction time while maintaining robust reconstruction performance. We also propose an efficient training scheme for FIRe, which alleviates the enormous memory footprint required by learned iterative reconstruction methods through the concept of recursive training. The results of our proposed method demonstrate comparable reconstruction performance to learned iterative reconstruction methods with a 9x reduction in computation time, and a 620x reduction relative to variational reconstruction.","source":"DOAJ","year":2023,"language":"","subjects":["Physics","Acoustics. Sound","Optics. Light"],"doi":"10.1016/j.pacs.2023.100452","url":"http://www.sciencedirect.com/science/article/pii/S2213597923000058","is_open_access":true,"published_at":"","score":67},{"id":"doaj_10.1016/j.pacs.2023.100535","title":"An extremum-guided interpolation for sparsely sampled photoacoustic imaging","authors":[{"name":"Haoyu Wang"},{"name":"Luo Yan"},{"name":"Cheng Ma"},{"name":"Yiping Han"}],"abstract":"In photoacoustic (PA) reconstruction, spatial constraints or real-time system requirements often result in sparse PA sampling data. For sparse PA sensor data, the sparse spatial and dense temporal sampling often leads to poor signal continuity. To address the structural characteristics of sparse PA signals, a data interpolation algorithm based on extremum-guided interpolation is proposed. 
This algorithm exploits the continuity of the signal and can estimate high-sampling-rate signals without complex mathematical calculations. PA signal data is interpolated and reconstructed, and the results are evaluated using image quality assessment methods. The simulation and experimental results show that the proposed method performs better than several typical algorithms, effectively restoring image details, suppressing the generation of artifacts and noise, and improving the quality of PA reconstruction under sparse sampling.","source":"DOAJ","year":2023,"language":"","subjects":["Physics","Acoustics. Sound","Optics. Light"],"doi":"10.1016/j.pacs.2023.100535","url":"http://www.sciencedirect.com/science/article/pii/S2213597923000885","is_open_access":true,"published_at":"","score":67},{"id":"doaj_10.1016/j.ultsonch.2023.106552","title":"Simultaneous hydrodynamic cavitation and nanosecond pulse discharge plasma enhanced by oxygen injection","authors":[{"name":"Qiong Wu"},{"name":"Haiyun Luo"},{"name":"Hao Wang"},{"name":"Zhigang Liu"},{"name":"Liyang Zhang"},{"name":"Yutai Li"},{"name":"Xiaobing Zou"},{"name":"Xinxin Wang"}],"abstract":"A novel Hydrodynamic Cavitation-Assisted Oxygen Plasma (HCAOP) process, which employs a venturi tube and oxygen injection, has been developed for enhancing the production and utilization of hydroxyl radicals (·OH) in the degradation of organic pollutants. This study has systematically investigated the fluid characteristics and discharge properties of the gas–liquid two-phase body in the venturi tube. The hydraulic cavitation two-phase body discharge is initiated by the bridging of the cavitation cloud between the electrodes. The discharge mode transitions from diffuse to spark to corona as the oxygen flow rate increases. The spark discharge has the highest current and discharge energy. Excessive oxygen results in the change of the flow from bubbly to annular and a subsequent decrease in discharge energy. 
The effects of cavitation intensity, oxygen flow rate, and power polarity on discharge characteristics and ·OH production were evaluated using terephthalic acid as a fluorescent probe. It was found that injecting 3 standard liters per minute (SLPM) of oxygen increased the ·OH yield 6-fold with only a 1.2-fold increase in power, whereas \u003c0.5 SLPM of oxygen did not improve the ·OH yield due to a lower breakdown voltage. Negative polarity voltage increased the breakdown voltage and ·OH yield due to asymmetric density and pressure distribution in the throat tube. This polarity effect was explained by numerical simulation. Using indigo carmine (E132) as a model pollutant, the HCAOP process degraded 20 mg/L of dye in 5 L of water within 2 min following a first-order reaction. The lowest electric energy per order (EEO) was 0.26 kWh/m3/order. The HCAOP process is a highly efficient flow-type advanced oxidation process with potential industrial applications.","source":"DOAJ","year":2023,"language":"","subjects":["Chemistry","Acoustics. Sound"],"doi":"10.1016/j.ultsonch.2023.106552","url":"http://www.sciencedirect.com/science/article/pii/S135041772300264X","is_open_access":true,"published_at":"","score":67},{"id":"arxiv_2202.10910","title":"Sound Adversarial Audio-Visual Navigation","authors":[{"name":"Yinfeng Yu"},{"name":"Wenbing Huang"},{"name":"Fuchun Sun"},{"name":"Changan Chen"},{"name":"Yikai Wang"},{"name":"Xiaohong Liu"}],"abstract":"The audio-visual navigation task requires an agent to find a sound source in a realistic, unmapped 3D environment by utilizing egocentric audio-visual observations. Existing audio-visual navigation works assume a clean environment that solely contains the target sound, which, however, would not be suitable in most real-world applications due to the unexpected sound noise or intentional interference. 
In this work, we design an acoustically complex environment in which, besides the target sound, there exists a sound attacker playing a zero-sum game with the agent. More specifically, the attacker can move and change the volume and category of the sound to make it harder for the agent to find the sounding object, while the agent tries to dodge the attack and navigate to the goal under the intervention. Under certain constraints on the attacker, we can improve the robustness of the agent towards unexpected sound attacks in audio-visual navigation. For better convergence, we develop a joint training mechanism by employing the property of a centralized critic with decentralized actors. Experiments on two real-world 3D scan datasets, Replica and Matterport3D, verify the effectiveness and the robustness of the agent trained under our designed environment when transferred to the clean environment or the one containing sound attackers with a random policy. Project: \\url{https://yyf17.github.io/SAAVN}.","source":"arXiv","year":2022,"language":"en","subjects":["cs.SD","cs.CV","cs.RO","eess.AS"],"url":"https://arxiv.org/abs/2202.10910","pdf_url":"https://arxiv.org/pdf/2202.10910","is_open_access":true,"published_at":"2022-02-22T14:19:42Z","score":66},{"id":"arxiv_2203.03926","title":"Numerical simulation of sound propagation in and around ducts using thin boundary elements","authors":[{"name":"Wolfgang Kreuzer"}],"abstract":"Investigating the sound field in and around ducts is an important topic in acoustics, e.g. when simulating musical instruments or the human vocal tract. In this paper, a method based on the boundary element method in 3D, combined with a formulation for infinitely thin elements, is presented. The boundary integral equations for these elements are presented, and numerical experiments are used to illustrate the behavior of the thin elements. 
Using the example of a closed benchmark duct, boundary element solutions for thin elements and surface elements are compared with the analytic solution, and the accuracy of the boundary element method as a function of element size is investigated. As already shown for surface elements in the literature, an accumulation of the error along the duct can also be found for thin elements, but in contrast to surface elements this effect is not as pronounced, and no damping of the amplitude can be seen. In a second experiment, the impedance at the open end of a half open duct is compared with formulas for the radiation impedance of an unflanged tube, and a good agreement is shown. Finally, resonance frequencies of a tube open at both ends are calculated and compared with measured spectra. For sufficiently small element sizes, the frequencies of the lower harmonics agree very well; for higher frequencies, a difference of a few Hertz can be observed, which may be explained by the fact that the method does not consider damping effects near the duct walls. The numerical experiments also suggest that, for duct simulations, the usual rule of six to eight elements per wavelength is not enough for accurate results.","source":"arXiv","year":2022,"language":"en","subjects":["math.NA"],"doi":"10.1016/j.jsv.2022.117050","url":"https://arxiv.org/abs/2203.03926","pdf_url":"https://arxiv.org/pdf/2203.03926","is_open_access":true,"published_at":"2022-03-08T08:45:40Z","score":66},{"id":"arxiv_2208.07994","title":"Enhancing Audio Perception of Music By AI Picked Room Acoustics","authors":[{"name":"Prateek Verma"},{"name":"Jonathan Berger"}],"abstract":"Every sound that we hear is the result of successive convolutional operations (e.g. room acoustics, microphone characteristics, resonant properties of the instrument itself, not to mention characteristics and limitations of the sound reproduction system). In this work we seek to determine the best room in which to perform a particular piece using AI. 
Additionally, we use room acoustics as a way to enhance the perceptual qualities of a given sound. Historically, rooms (particularly churches and concert halls) were designed to host and serve specific musical functions. In some cases the architectural acoustical qualities enhanced the music performed there. We try to mimic this, as a first step, by designating room impulse responses that would correlate with enhanced sound quality for particular music. A convolutional architecture is first trained to take in an audio sample and mimic the ratings of experts with about 78 % accuracy for various instrument families and notes for perceptual qualities. This gives us a scoring function for any audio sample which can rate the perceptual pleasantness of a note automatically. Now, via a library of about 60,000 synthetic impulse responses mimicking all kinds of rooms, materials, etc., we use a simple convolution operation to transform the sound as if it were played in a particular room. The perceptual evaluator is used to rank the musical sounds, and yield the \"best room or concert hall\" in which to play a sound. As a byproduct it can also use room acoustics to turn a poor-quality sound into a \"good\" sound.","source":"arXiv","year":2022,"language":"en","subjects":["cs.SD","cs.AI","cs.LG","cs.MM","eess.AS"],"url":"https://arxiv.org/abs/2208.07994","pdf_url":"https://arxiv.org/pdf/2208.07994","is_open_access":true,"published_at":"2022-08-16T23:47:43Z","score":66},{"id":"arxiv_2211.01966","title":"MarginNCE: Robust Sound Localization with a Negative Margin","authors":[{"name":"Sooyoung Park"},{"name":"Arda Senocak"},{"name":"Joon Son Chung"}],"abstract":"The goal of this work is to localize sound sources in visual scenes with a self-supervised approach. 
Contrastive learning in the context of sound source localization leverages the natural correspondence between audio and visual signals, where audio-visual pairs from the same source are treated as positive and randomly selected pairs as negative. However, this approach brings in noisy correspondences: for example, positive pairs whose audio and visual signals are unrelated to each other, or negative pairs that contain samples semantically similar to the positive one. Our key contribution in this work is to show that using a less strict decision boundary in contrastive learning can alleviate the effect of noisy correspondences in sound source localization. We propose a simple yet effective approach that slightly modifies the contrastive loss with a negative margin. Extensive experimental results show that our approach gives on-par or better performance than state-of-the-art methods. Furthermore, we demonstrate that introducing a negative margin into existing methods results in a consistent improvement in performance.","source":"arXiv","year":2022,"language":"en","subjects":["cs.CV","cs.MM","cs.SD","eess.AS","eess.IV"],"url":"https://arxiv.org/abs/2211.01966","pdf_url":"https://arxiv.org/pdf/2211.01966","is_open_access":true,"published_at":"2022-11-03T16:44:14Z","score":66},{"id":"doaj_10.1016/j.ultsonch.2022.106175","title":"Numerical study of real gas effects during bubble collapse using a disequilibrium multiphase model","authors":[{"name":"Saeed Bidi"},{"name":"Phoevos Koukouvinis"},{"name":"Andreas Papoutsakis"},{"name":"Armand Shams"},{"name":"Manolis Gavaises"}],"abstract":"An explicit density-based solver of the Euler equations for inviscid and immiscible gas–liquid flow media is coupled with real-fluid thermodynamic equations of state supporting mild dissociation and calibrated with shock tube data up to 5000 K and 28 GPa.
The present work expands the original 6-equation disequilibrium method by generalising the numerical approach required for estimating the equilibrium pressure in computational cells where both gas and liquid phases coexist, while enforcing energy conservation for all media. An iterative numerical procedure is proposed to account for the properties of the gas content, as derived from highly non-linear real gas equations of state and implemented in tabulated form during the numerical solution. The developed method is subsequently used to investigate gaseous bubble collapse cases in both spherical and 2D asymmetric arrangements, the latter induced by the presence of a rigid wall. It is demonstrated that the predicted maximum temperatures are strongly influenced by the equations of state used; the real gas model predicts a temperature reduction in the bubble interior of up to 41% space-averaged and 50% locally during the collapse phase, compared to predictions obtained with the widely used ideal gas approximation.","source":"DOAJ","year":2022,"language":"","subjects":["Chemistry","Acoustics. Sound"],"doi":"10.1016/j.ultsonch.2022.106175","url":"http://www.sciencedirect.com/science/article/pii/S1350417722002711","is_open_access":true,"published_at":"","score":66},{"id":"doaj_10.1155/2022/9513357","title":"Simulation of Flow-Induced Vibration and Dynamic Performance of Circular-Arc Helical Gear Pump under Background of Machine Learning","authors":[{"name":"Xiaoling Wei"},{"name":"Yongbao Feng"},{"name":"Xiaoxia Han"},{"name":"Zhenxin He"}],"abstract":"At present, with the continuous development and improvement of mechanical manufacturing, processing, and assembly technology, mechanical flow-induced vibration (FIV) with a relatively concentrated frequency domain can be controlled by active and passive noise reduction methods.
However, both active and passive noise reduction focus on suppressing the transmission of sound waves and cannot address the problems of flow leakage, pronounced temperature rise, and noise excitation at their root cause. Therefore, it is necessary to determine the locations of the primary and secondary excitation sound sources of FIV, to distinguish true from false sound sources, and to characterize the relationship between flow and noise. This provides a theoretical basis and an engineering application direction for the mechanism of FIV noise reduction. The acoustic part of the numerical calculation in this paper is solved by a hybrid method: the flow field is computed discretely with the large eddy simulation (LES) module of the Fluent software. Once the calculated flow field is stable, the velocity field over one impeller rotation period is output as the input for the sound field and imported into ACTRAN for Fourier transformation. The sound field calculation is then carried out, yielding the spatial and temporal variation of the sound field. Experiments showed that at a gear pump load of 8 MPa, the volumetric efficiency of the optimized circular-arc helical gear pump with the sliding bearing improved by about 4%. At a rotation speed of 2100 r/min, the arc helical gear pump reduced the surface temperature rise by 2.5 °C. This verified that the optimized sliding bearing significantly improves the performance of the arc helical gear pump. A theoretical model of the sliding-bearing temperature rise explains why the surface temperature of the prototype gear pump did not increase significantly with loading in the low-pressure region.","source":"DOAJ","year":2022,"language":"","subjects":["Electrical engineering. Electronics. Nuclear engineering"],"doi":"10.1155/2022/9513357","url":"http://dx.doi.org/10.1155/2022/9513357","is_open_access":true,"published_at":"","score":66}],"total":308427,"page":1,"page_size":20,"sources":["arXiv","DOAJ","CrossRef"],"query":"Acoustics. Sound"}