This study investigates the use of computational audio analysis to examine ideological narratives in Nazi propaganda films. Employing a three-step pipeline (speaker diarization, audio transcription, and psycholinguistic analysis), it reveals ideological patterns in characters. Despite current issues with speaker diarization, the methodology provides insights into character traits and propaganda narratives, suggesting scalable applications.
In this work, we propose a new, efficient Mamba-based model named BMACE (Bidirectional Mamba-based network for Automatic Chord Estimation), which uses selective structured state-space models in a bidirectional Mamba layer to model temporal dependencies effectively. Our model achieves prediction performance comparable to state-of-the-art models while requiring fewer parameters and lower computational resources.
Abstract. The logistics sector plays a crucial role in supporting various aspects of the economy, making it an essential part of a nation's development. However, this sector also contributes to environmental pollution through various emissions. The adoption of environmentally friendly logistics practices presents a promising solution to mitigate adverse environmental impacts. This study aims to investigate the influence of economic growth, green innovation, foreign direct investment, transport emissions, renewable energy, and trade openness on green logistics in the BRICS countries (Brazil, Russia, India, China, and South Africa) and the Gulf countries from 1992 to 2020. This study used an advanced panel approach to obtain robust results, considering cross‐sectional dependency and slope heterogeneity. The cross‐sectionally augmented autoregressive distributed lag method was employed to analyze long and short‐run estimations. Our findings reveal that in Gulf countries, both transport emissions and foreign direct investment have a negative impact on green logistics. In the BRICS countries, economic growth, transport emissions, trade openness, renewable energy, and green innovation have a positive impact on green logistics. The study proposes several recommendations to improve logistics development in both groups of nations and promote sustainability. To achieve carbon neutrality, it is important to adopt green logistics, promote green investments, and support renewable energy, innovation, and sustainable growth.
Our study delves into the "Embodied Musicking Dataset," exploring the intertwined relationships and correlations between physiological and psychological dimensions during improvisational music performances. The primary objective is to ascertain the presence of a definitive causal or correlational relationship between these states and comprehend their manifestation in musical compositions. This rich dataset provides a perspective on how musicians coordinate their physicality with sonic events in real-time improvisational scenarios, emphasizing the concept of "Embodied Musicking."
We present work in progress on TimbreCLIP, an audio-text cross-modal embedding trained on single instrument notes. We evaluate the models with a cross-modal retrieval task on synth patches. Finally, we demonstrate the application of TimbreCLIP on two tasks: text-driven audio equalization and timbre-to-image generation.
Suradej Duangpummet, Jessada Karnjana, Waree Kongprawechnon et al.
This paper proposes a blind estimation method based on the modulation transfer function and the Schroeder model for estimating reverberation time in seven octave bands. From these estimates, the speech transmission index and five room-acoustic parameters can also be estimated.
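To make the link between the modulation transfer function (MTF) and reverberation time concrete, the following minimal sketch uses the textbook closed-form MTF of an exponentially decaying (Schroeder-type) impulse response. It is an illustration under that assumption, not the paper's blind estimator; the function names `schroeder_mtf` and `rt60_from_mtf` are hypothetical.

```python
import math


def schroeder_mtf(mod_freq_hz, rt60_s):
    """MTF magnitude of an ideal exponential decay with reverberation time rt60_s.

    Assumes the classic relation m(F) = 1 / sqrt(1 + (2*pi*F*T60 / 13.8)^2).
    """
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * mod_freq_hz * rt60_s / 13.8) ** 2)


def rt60_from_mtf(mod_freq_hz, m):
    """Invert the relation above for one modulation frequency (requires 0 < m <= 1)."""
    if mod_freq_hz <= 0.0 or not 0.0 < m <= 1.0:
        raise ValueError("need mod_freq_hz > 0 and 0 < m <= 1")
    return 13.8 / (2.0 * math.pi * mod_freq_hz) * math.sqrt(1.0 / m ** 2 - 1.0)


if __name__ == "__main__":
    # Longer reverberation flattens slow envelope modulations more strongly.
    for rt60 in (0.3, 0.8, 2.0):
        print(rt60, round(schroeder_mtf(4.0, rt60), 3))
```

In a blind setting the MTF is not observed directly and has to be inferred from the reverberant speech envelope, which is the hard part this sketch leaves out.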
Jonas Van Breedam, Heiko Goelzer, Philippe Huybrechts
Abstract. The emphasis for informing policy makers on future sea-level rise has been on projections by the end of the 21st century. However, due to the long lifetime of atmospheric CO2, the thermal inertia of the climate system and the slow equilibration of the ice sheets, global sea level will continue to rise on a multi-millennial timescale even when anthropogenic CO2 emissions cease completely during the coming decades to centuries. Here we present global sea-level change projections due to the melting of land ice combined with steric sea effects during the next 10 000 years calculated in a fully interactive way with the Earth system model of intermediate complexity LOVECLIMv1.3. The greenhouse forcing is based on the Extended Concentration Pathways defined until 2300 CE with no carbon dioxide emissions thereafter, equivalent to a cumulative CO2 release of between 460 and 5300 GtC. We performed one additional experiment for the highest-forcing scenario with the inclusion of a methane emission feedback where methane is slowly released due to a strong increase in surface and oceanic temperatures. After 10 000 years, the sea-level change rate drops below 0.05 m per century and a semi-equilibrated state is reached. The Greenland ice sheet is found to nearly disappear for all forcing scenarios. The Antarctic ice sheet contributes only about 1.6 m to sea level for the lowest forcing scenario with a limited retreat of the grounding line in West Antarctica. For the higher-forcing scenarios, the marine basins of the East Antarctic Ice Sheet also become ice free, resulting in a sea-level rise of up to 27 m. The global mean sea-level change after 10 000 years ranges from 9.2 to more than 37 m. For the highest-forcing scenario, the model uncertainty does not exclude the complete melting of the Antarctic ice sheet during the next 10 000 years.
Viewing polyphonic piano transcription as a multitask learning problem, where we need to simultaneously predict onsets, intermediate frames and offsets of notes, we investigate the performance impact of additional prediction targets, using a variety of suitable convolutional neural network architectures. We quantify performance differences of additional objectives on the large MAESTRO dataset.
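As a rough illustration of the three prediction targets mentioned above, the sketch below turns a list of note events into binary onset, frame, and offset matrices. The function name `note_targets`, the tuple layout `(midi_pitch, start_s, end_s)`, and the frame rate are assumptions made for this example; the 88-key range starting at MIDI pitch 21 is the standard piano range, not a detail taken from the paper.

```python
def note_targets(notes, num_frames, frame_rate, num_pitches=88, lowest_midi=21):
    """Build binary onset, frame, and offset targets of shape [num_frames][num_pitches].

    notes: iterable of (midi_pitch, start_s, end_s) tuples (hypothetical layout).
    """
    def empty():
        return [[0] * num_pitches for _ in range(num_frames)]

    onsets, frames, offsets = empty(), empty(), empty()
    for midi_pitch, start_s, end_s in notes:
        p = midi_pitch - lowest_midi
        if not 0 <= p < num_pitches:
            continue  # skip pitches outside the keyboard range
        # Map times to frame indices and clamp them to the available frames.
        first = max(0, min(int(round(start_s * frame_rate)), num_frames - 1))
        last = max(first, min(int(round(end_s * frame_rate)), num_frames - 1))
        onsets[first][p] = 1
        offsets[last][p] = 1
        for t in range(first, last + 1):
            frames[t][p] = 1  # note is sounding in every frame from onset to offset
    return onsets, frames, offsets
```

A multitask model would then be trained with one loss per target matrix; how those losses are weighted or combined is left open here.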
In this paper, we represent the LSF parameters in a unit vector form, which has directional characteristics. The underlying distribution of this unit vector variable is modeled by a von Mises-Fisher mixture model (VMM). Using high-rate theory, the optimal inter-component bit allocation strategy is proposed and the distortion-rate (D-R) relation is derived for the VMM-based VQ (VVQ). Experimental results show that the VVQ outperforms our recently introduced DVQ and the conventional GVQ.
SpeechPy is an open-source Python package that contains speech preprocessing techniques, speech features, and important post-processing operations. It provides the most frequently used speech features, including MFCCs and filterbank energies, along with the log-energy of filterbanks. The aim of the package is to provide researchers with a simple tool for speech feature extraction and processing in applications such as Automatic Speech Recognition and Speaker Verification.
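Filterbank energies and MFCCs both rest on a mel-spaced filterbank. The sketch below shows only that spacing step, using the common 2595·log10(1 + f/700) mel formula. It is a standard-library illustration and not SpeechPy's API; the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (common 2595 * log10 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filterbank_center_frequencies(num_filters, low_hz, high_hz):
    """Center frequencies (Hz) of num_filters triangular filters spaced evenly in mels."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]


if __name__ == "__main__":
    centers = filterbank_center_frequencies(40, 0.0, 8000.0)
    print([round(c) for c in centers[:5]], "...", round(centers[-1]))
```

The centers are dense at low frequencies and sparse at high ones, which is why mel-based features emphasize the region where speech carries most of its detail.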
Mauricio Toro, Myriam Desainte-Catherine, Antoine Allombert
In this chapter we briefly explain the fundamentals of the interactive scores formalism. Then we develop a solution for implementing the ECO machine by mixing Petri nets and constraint propagation. We also present another solution for implementing the ECO machine using concurrent constraint programming. Finally, we present an extension of interactive scores with conditional branching.
This manifesto paper will introduce machine listening intelligence, an integrated research framework for acoustic and musical signals modelling, based on signal processing, deep learning and computational musicology.
ABSTRACT An intertidal rocky platform tucked in behind a rocky headland on open‐ocean Gibson Beach, near Raglan, supports an agglomeration of cobble‐ to large‐boulder‐sized clasts of Cenozoic sandstone and limestone. Rather than exhibiting just point contacts, many larger clasts are tightly interlocked and fitted with their neighbours and/or the underlying platform bedrock. Clast interface geometry relates to the strength contrast between adjacent rock types, linked to their calcite (cement) content. The end‐product is an armoured, highly stable framework of boulder clasts resembling a giant three‐dimensional jigsaw puzzle. While the direct impact of breaking waves likely plays a role in in situ jostling of boulders, we speculate that mechanical abrasion and fitting between larger clasts may also be promoted and maintained by in situ microvibration of the boulders as a consequence of wave‐induced microseismic shaking within the cliff‐backed rocky platform and headland, especially during major storm wave assault from the southwest.
Even though chord roots constitute a fundamental concept in music theory, existing models do not explain and determine them to full satisfaction. We present a new method which takes sequential context into account to resolve ambiguities and detect nonharmonic tones. We extract features from chord pairs and use a decision tree to determine chord roots. This leads to a quantitative improvement in the correctness of the predicted roots in comparison to other models. All this raises the question of how much harmonic and nonharmonic tones actually contribute to the perception of chord roots.
Eric E. Hamke, Ramiro Jordan, Manel Ramon-Martinez
This report describes the use of a support vector machine with a novel kernel to determine the breathing rate and inhalation duration of a firefighter wearing a Self-Contained Breathing Apparatus. With this information, an incident commander can monitor the firefighters under their command for exhaustion and rotate personnel in time to ensure overall firefighter safety.
Abstract. Large climate perturbations occurred during Termination II when the ice sheets retreated from their glacial configuration. Here we investigate the impact of ice sheet changes and associated freshwater fluxes on the climate evolution at the onset of the Last Interglacial. The period from 135 to 120 kyr BP is simulated with the Earth system model of intermediate complexity LOVECLIM v.1.3 with prescribed evolution of the Antarctic ice sheet, the Greenland ice sheet and the other Northern Hemisphere ice sheets. Variations in meltwater fluxes from the Northern Hemisphere ice sheets lead to North Atlantic temperature changes and modifications of the strength of the Atlantic meridional overturning circulation. By means of the interhemispheric see-saw effect, variations in the Atlantic meridional overturning circulation also give rise to temperature changes in the Southern Hemisphere, which are modulated by the direct impact of Antarctic meltwater fluxes into the Southern Ocean. Freshwater fluxes from the melting Antarctic ice sheet lead to a millennial time scale oceanic cold event in the Southern Ocean with expanded sea ice as evidenced in some ocean sediment cores, which may be used to constrain the timing of ice sheet retreat.
We introduce a scattering representation for the analysis and classification of sounds. It is locally translation-invariant, stable to deformations in time and frequency, and has the ability to capture harmonic structures. The scattering representation can be interpreted as a convolutional neural network which cascades a wavelet transform in time and along a harmonic spiral. We study its application for the analysis of the deformations of the source-filter model.
Reverberation is damaging to both the quality and the intelligibility of a speech signal. We propose a novel single-channel method of dereverberation based on a linear filter in the Short Time Fourier Transform domain. Each enhanced frame is constructed from a linear sum of nearby frames based on the channel impulse response. The results show that, given knowledge of the impulse response, the method can restore any reverberant signal to a non-reverberant signal.
This paper proposes a deep denoising auto-encoder technique to extract better acoustic features for speech synthesis. The technique allows us to automatically extract low-dimensional features from high dimensional spectral features in a non-linear, data-driven, unsupervised way. We compared the new stochastic feature extractor with conventional mel-cepstral analysis in analysis-by-synthesis and text-to-speech experiments. Our results confirm that the proposed method increases the quality of synthetic speech in both experiments.