Results for "Oral communication. Speech"

DOAJ Open Access 2026

Language as an Expression of Anger in Selected Namibian Novels: Masked warrior and Complicated

Elizabeth Kambwale

This article presents a cognitive stylistic study of anger in two Namibian novels: Ndinaelao Moses’ Masked warrior and Malakia Haimbangu’s Complicated. The study evaluated lexical expressions of anger, figurative expressions, and features of anger discourse, applying Text World Theory as the theoretical framework for understanding and analysing the texts. Data collection and analysis were qualitative. The results showed that the texts manipulated and maintained readers' interest through the use of anger. The study found that figurative language makes words about anger more offensive, and that angry language may be used to show defensiveness, sorrow, or arrogance. It also found that writing that contains anger helps readers relate to the characters’ real-world experiences, and that figurative phrases are used to communicate and simplify complex ideas that are challenging to understand. The study concluded that discourse influences how angry texts are written. It suggests the use of alternative language and grammatical expressions that are consistent with Text World Theory, which emphasises the significance of using linguistic and cognitive strategies to create a cohesive and immersive fictional world.

Language. Linguistic theory. Comparative grammar, Oral communication. Speech

DOAJ Open Access 2025

Prompted by Me. Generated by ChatGPT

David J. Gunkel

This essay—which is not only about human-machine collaboration but is a performance in human-machine collaboration—interrogates the shifting terrain of authorship and creativity in the age of generative artificial intelligence (GAI). Challenging both the instrumentalist view of technology and the romantic myth of the singular genius, it argues for a reconceptualization of creative production as distributed, dialogical, and co-constituted. Drawing on both theoretical innovations in poststructuralism and the practices of pre- and post-modern content creators, the essay repositions the algorithm not as a mere tool but as an active participant in the generation of meaning. In doing so, it exposes and disrupts—in both content and form—the metaphysical assumptions that continue to underwrite our understanding of writing, agency, and communication.

Technology (General), Oral communication. Speech

arXiv Open Access 2025

Enhancing In-the-Wild Speech Emotion Conversion with Resynthesis-based Duration Modeling

Navin Raj Prabhu, Danilo de Oliveira, Nale Lehmann-Willenbrock et al.

Speech Emotion Conversion aims to modify the emotion expressed in input speech while preserving lexical content and speaker identity. Recently, generative modeling approaches have shown promising results in changing local acoustic properties such as fundamental frequency, spectral envelope and energy, but often lack the ability to control the duration of sounds. To address this, we propose a duration modeling framework using resynthesis-based discrete content representations, enabling modification of speech duration to reflect target emotions and achieve controllable speech rates without using parallel data. Experimental results reveal that the inclusion of the proposed duration modeling framework significantly enhances emotional expressiveness, in the in-the-wild MSP-Podcast dataset. Analyses show that low-arousal emotions correlate with longer durations and slower speech rates, while high-arousal emotions produce shorter, faster speech.

en eess.AS

arXiv Open Access 2025

UniPET-SPK: A Unified Framework for Parameter-Efficient Tuning of Pre-trained Speech Models for Robust Speaker Verification

Mufan Sang, John H. L. Hansen

With excellent generalization ability, SSL speech models have shown impressive performance on various downstream tasks in the pre-training and fine-tuning paradigm. However, as the size of pre-trained models grows, fine-tuning becomes practically infeasible due to expanding computation and storage requirements and the risk of overfitting. This study explores parameter-efficient tuning (PET) methods for adapting large-scale pre-trained SSL speech models to the speaker verification task. Correspondingly, we propose three PET methods: (i) an adapter-tuning method, (ii) a prompt-tuning method, and (iii) a unified framework that effectively incorporates adapter-tuning and prompt-tuning with a dynamically learnable gating mechanism. First, we propose the Inner+Inter Adapter framework, which inserts two types of adapters into pre-trained models, allowing for adaptation of latent features within the intermediate Transformer layers and output embeddings from all Transformer layers, through a parallel adapter design. Second, we propose the Deep Speaker Prompting method, which concatenates trainable prompt tokens into the input space of pre-trained models to guide adaptation. Lastly, we propose UniPET-SPK, a unified framework that effectively incorporates these two alternate PET methods into a single framework with a dynamic trainable gating mechanism. The proposed UniPET-SPK learns to find the optimal mixture of PET methods to match different datasets and scenarios. We conduct a comprehensive set of experiments on several datasets to validate the effectiveness of the proposed PET methods. Experimental results on VoxCeleb, CN-Celeb, and 1st 48-UTD forensic datasets demonstrate that the proposed UniPET-SPK consistently outperforms the two PET methods, fine-tuning, and other parameter-efficient tuning methods, achieving superior performance while updating only 5.4% of the parameters.
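
The abstract above does not spell out the form of the gating mechanism, so the following is only a minimal sketch of one common way a learnable gate can blend two adaptation branches. The function name `gated_mix`, the scalar sigmoid gate, and the convex-combination form are all assumptions for illustration, not the UniPET-SPK design.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_mix(adapter_out, prompt_out, gate_logit):
    """Blend two feature vectors with a scalar gate (hypothetical form).

    gate_logit stands in for a trainable parameter; g = sigmoid(gate_logit)
    weights the adapter branch and (1 - g) weights the prompt branch.
    """
    g = sigmoid(gate_logit)
    return [g * a + (1.0 - g) * p for a, p in zip(adapter_out, prompt_out)]


# A gate logit of 0 gives g = 0.5, i.e. an equal mixture of both branches.
mixed = gated_mix([1.0, 2.0], [3.0, 4.0], gate_logit=0.0)
```

In a trained system the gate would be learned per layer or per dataset; a large positive logit pushes the mixture toward the adapter branch, a large negative one toward the prompt branch.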

en eess.AS, cs.LG

DOAJ Open Access 2024

Student perspectives of simulated learning to improve their dysphagia management

Skye N. Adams, Kelly-Ann Kater, Jaishika Seedat

Background: The use of simulation to enhance knowledge translation and bridge the theoretical-clinical gap to enhance clinical training and competency in health professions has received mixed reviews in the literature. Objectives: This research examined student perspectives of a simulation laboratory in speech therapy to improve students’ clinical competency when working with adults with communication and dysphagia impairments. Method: An exploratory descriptive pilot study was conducted in 2022 with 16 third-year speech-language therapy students. This mixed-methods study involved students completing purposefully developed pre-and post-surveys to explore their experiences with simulated teaching and learning and their perceptions of confidence. Data were analysed using an independent t-test. Following the surveys, the students participated in a focus group discussion about their simulation experience, and data were analysed using thematic analysis. Results: Student ratings of clinical skills improved from pre to post-simulation significantly overall and across six out of the eight items. The focus group revealed insights into students’ experiences, highlighting increased confidence, the benefits of making mistakes in a safe environment and improved preparedness to work with dysphagia in patients. Conclusion: While simulation serves as a valuable tool in enhancing clinical skills and building confidence, it must be used as an adjunct to real-life exposure and not as a replacement. Contribution: The integration of both simulated and real-life experiences is essential to provide a comprehensive and practical learning environment for students.

Oral communication. Speech

DOAJ Open Access 2024

Cochlear implantation outcomes in children with multiple disabilities: a topic that’s worth revisiting

Goh Bee-See, Nur Af’Idah Mohd Zulkefli, Asma Abdullah et al.

Objectives: To determine the benefits of cochlear implantation in children with hearing loss and multiple disabilities (MD) in terms of auditory outcomes, speech performance, and quality of life. Methods: This was a cross-sectional study conducted from January 2019 to December 2020 in which thirty-one children with hearing loss and multiple disabilities were evaluated. Improvement in auditory and speech performance was assessed using the Categories of Auditory Performance version II (CAP-II) and Speech Intelligibility Rating (SIR) scales. Assessments were done at 6-month intervals, with the baseline evaluation done at least six months after activation of the implant. Parents were asked to complete the Parents Evaluation of Aural/Oral Performance of Children (PEACH) diary and the Perceived Benefit Questionnaire (PBQ) to evaluate the child’s quality of life. Results: All 31 children had Global Developmental Delay (GDD), with 11 having an additional disability. Both mean CAP-II and SIR scores showed significant improvement with increased hearing age (p < 0.05) after 6-month intervals. In addition, 20 out of 31 children (64.5%) achieved verbal communication after implantation. The mean PEACH score in quiet was significantly better than in noise (p = 0.007) and improved with increasing hearing age. The majority of parents (96% to 100%) perceived a cochlear implant as beneficial to their child in terms of auditory response, awareness, interaction, communication, and speech development. Conclusions: Cochlear implantation showed benefits in children with multiple disabilities. Outcome measures should focus not only on auditory and speech performance but also on improvement in quality of life. Hence, individualizing each case, with realistic expectations from families, must be emphasized in this group of children. Level of evidence: Level 3.

Otorhinolaryngology

DOAJ Open Access 2024

El disparate de la antigua lírica al impreso popular: fórmulas y personajes animales

Martha Fernanda Vázquez Carbajal

This article examines a selection of divertimentos published by Antonio Vanegas Arroyo (1880-1927) and Eduardo Guerrero (1900-1959), comparing them with examples ranging from early Hispanic lyric poetry to the traditional songbooks of the twentieth century. It emphasizes the satirical and burlesque character of the prints, their recurrent formulaic devices, and the importance of lists in creating a sense of coherence within the absurd, especially where animal characters are concerned. The aim is to trace a line through "traditional poetry" that links songs from the early lyric to the printed Mexican popular literature of the twentieth century.

Oral communication. Speech, French literature - Italian literature - Spanish literature - Portuguese literature

DOAJ Open Access 2024

Description des troubles langagiers suite à un accident vasculaire cérébral ischémique du thalamus : une revue de la littérature.

Raphaëlle Lesigne, Elisa Bron, Anaïs Philippe et al.

Background: The thalamus is a complex brain structure that has been the subject of numerous scientific studies since its discovery. Its involvement in language processes is now recognized by the scientific community. Objectives: This study aims to survey the latest research advances in order to specify the clinical manifestations of the aphasias observed after an ischemic stroke of the thalamus, and to enable speech-language therapists to assess and treat them specifically. Method: The four-step PRISMA method was used to build a literature review and identify the articles most relevant to the topic. Results: In total, 10 articles were included in this literature review. Various tests, more or less exhaustive and specific, were administered to the patient samples in these studies to assess language functions. This review collects and analyzes information on the frequency, severity, laterality, and cognitive-linguistic impairments found according to the vascular territory affected, as well as the course of thalamic aphasias. The hypothesis that a thalamo-cortical disconnection is responsible for the language disorders is also discussed. Discussion: The results showed that the thalamus is involved in language processes, with left lateralization. Damage to it could be associated with diaschisis and thalamo-cortical disconnection and would cause impairments in language production and comprehension. Severity is rather mild, with particular impairment of high-level language, which can make diagnosis difficult in the acute phase. The reported frequency of thalamic aphasia therefore varies across studies and depends on how specific the language assessment is. Conclusion: Language impairments after a thalamic stroke are characterized mainly by fluency disorders, anomia, and little or no impairment of repetition, with a good prognosis. It appears necessary to use sensitive tools, developed from cognitive and neuropsychological models of language, that provide a fine-grained assessment of thalamic aphasia and make it possible to consider specific, tailored rehabilitation approaches.

Oral communication. Speech, Pathology

DOAJ Open Access 2024

Carmen Amaya interpreta el Romancero musicalizado (con fragmentos biográficos inéditos sobre literatura oral, cultura popular y memoria colectiva)

Francisco J. Escobar-Borrego

This article presents previously unpublished information on Carmen Amaya's sensibility toward and interest in the Romancero set to music. A hitherto unknown text, edited here from the various surviving drafts, preserves her interpretive reading of three poetic-musical versions devoted to the Duque de Alba, Rosalinda, and the Conde Olinos. This source is, finally, the product of an unfinished biography project by her friend and artistic adviser Domingo J. Samperio.

Oral communication. Speech, French literature - Italian literature - Spanish literature - Portuguese literature

arXiv Open Access 2024

SF-Speech: Straightened Flow for Zero-Shot Voice Clone

Xuyuan Li, Zengqiang Shang, Hua Hua et al.

Recently, neural ordinary differential equation (ODE) models trained with flow matching have achieved impressive performance on the zero-shot voice clone task. Nevertheless, postulating standard Gaussian noise as the initial distribution of the ODE gives rise to numerous intersections within the fitted targets of flow matching, which presents challenges to model training and increases the curvature of the learned generation trajectories. These curved trajectories restrict the capacity of ODE models to generate desirable samples in a few steps. This paper proposes SF-Speech, a novel voice clone model based on ODE and in-context learning. Unlike previous works, SF-Speech adopts a lightweight multi-stage module to generate a more deterministic initial distribution for the ODE. Without introducing any additional loss function, we effectively straighten the curved reverse trajectories of the ODE model by jointly training it with the proposed module. Experimental results on datasets of various scales show that SF-Speech outperforms the state-of-the-art zero-shot TTS methods and requires only a quarter of the solver steps, resulting in a generation speed approximately 3.7 times that of Voicebox and E2 TTS. Audio samples are available at the demo page: https://lixuyuan102.github.io/Demo/.
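
The link between trajectory curvature and solver steps can be seen in a toy setting. The sketch below is not SF-Speech; it integrates two hypothetical one-dimensional velocity fields with a fixed-step Euler solver, assuming a constant field for the "straight" case and dx/dt = x for the "curved" case.

```python
import math


def euler(velocity, x0: float, steps: int) -> float:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = x0
    dt = 1.0 / steps
    for i in range(steps):
        x += dt * velocity(x, i * dt)
    return x


# Hypothetical "straight" trajectory: constant velocity from 1.0 to 3.0.
straight = lambda x, t: 2.0
# Hypothetical "curved" trajectory: dx/dt = x, whose exact endpoint is e.
curved = lambda x, t: x

straight_one_step = euler(straight, 1.0, 1)          # already exact
curved_err_1 = abs(euler(curved, 1.0, 1) - math.e)   # large error with one step
curved_err_4 = abs(euler(curved, 1.0, 4) - math.e)   # smaller error with four steps
```

With a constant velocity field a single Euler step lands on the endpoint exactly, whereas the curved field needs more steps to reduce the discretization error, which is the intuition behind straightening learned trajectories.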

en cs.SD, eess.AS

arXiv Open Access 2024

LiSenNet: Lightweight Sub-band and Dual-Path Modeling for Real-Time Speech Enhancement

Haoyin Yan, Jie Zhang, Cunhang Fan et al.

Speech enhancement (SE) aims to extract the clean waveform from noise-contaminated measurements to improve the speech quality and intelligibility. Although learning-based methods can perform much better than traditional counterparts, the large computational complexity and model size heavily limit the deployment on latency-sensitive and low-resource edge devices. In this work, we propose a lightweight SE network (LiSenNet) for real-time applications. We design sub-band downsampling and upsampling blocks and a dual-path recurrent module to capture band-aware features and time-frequency patterns, respectively. A noise detector is developed to detect noisy regions in order to perform SE adaptively and save computational costs. Compared to recent higher-resource-dependent baseline models, the proposed LiSenNet can achieve a competitive performance with only 37k parameters (half of the state-of-the-art model) and 56M multiply-accumulate (MAC) operations per second.

en eess.AS, cs.SD

arXiv Open Access 2024

On the Parameter Selection of Phase-transmittance Radial Basis Function Neural Networks for Communication Systems

Jonathan A. Soares, Kayol S. Mayer, Dalton S. Arantes

In the ever-evolving field of digital communication systems, complex-valued neural networks (CVNNs) have become a cornerstone, delivering exceptional performance in tasks like equalization, channel estimation, beamforming, and decoding. Among the myriad of CVNN architectures, the phase-transmittance radial basis function neural network (PT-RBF) stands out, especially when operating in noisy environments such as 5G MIMO systems. Despite its capabilities, achieving convergence in multi-layered, multi-input, and multi-output PT-RBFs remains a daunting challenge. Addressing this gap, this paper presents a novel Deep PT-RBF parameter initialization technique. Through rigorous simulations conforming to 3GPP TS 38 standards, our method not only outperforms conventional initialization strategies like random, $K$-means, and constellation-based methods but is also the only approach to achieve successful convergence in deep PT-RBF architectures. These findings pave the way to more robust and efficient neural network deployments in complex digital communication systems.

en eess.SP

DOAJ Open Access 2023

Eye movement as a simple, cost-effective tool for people who stutter: A case study

Hilary D.-L. McDonagh, Patrick Broderick, Kenneth Monaghan

Background: Access to services remains the biggest barrier to helping the most vulnerable in the South African Stuttering Community. This novel stuttering therapy, harnessing an unconscious link between eye and tongue movement, may provide a new therapeutic approach, easily communicated and deliverable online. Objectives: This study provides both objective and subjective assessments of the feasibility of this intervention. Assessment tools holistically address all components of stuttering in line with comprehensive treatment approaches: core behaviours, secondary behaviours, anticipation and reactions. Method: On receipt of ethical approval, this single-subject case design recruited one adult (21-year-old) male with a developmental stutter (DS). The participant gave informed consent and completed four scheduled assessments: baseline, after 5-week training, 3 months post-intervention and 24 months post-completion. The study used objective assessment tools: Stuttering Severity Instrument-4 (SSI-4); Subjective-assessment tools: SSI-4 clinical use self-report tool (CUSR); Overall Assessment of Speaker’s Experience of Stuttering (OASES-A); Premonitory Awareness in Stuttering (PAiS) and Self-Report Stuttering Severity* (SRSS) (*final assessment). Results: The participant’s scores improved across all assessment measures, which may reflect a holistic improvement. The participant reported that the tool was very useful. There were no negative consequences. Conclusion: This case report indicates that this innovative treatment may be feasible. No adverse effects were experienced, and the treatment only benefited the participant. The results justify the design of a pilot randomised feasibility clinical trial. Contribution: The results indicate that this is a needed breakthrough in stuttering therapy as the instructions can be easily translated into any language. It can also be delivered remotely reducing accessibility barriers.

Oral communication. Speech

DOAJ Open Access 2023

Corrigendum: Contextualising clinical reasoning within the clinical swallow evaluation: A scoping review and expert consultation

Thiani Pillay, Mershen Pillay

No abstract available.

Oral communication. Speech

arXiv Open Access 2023

End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu et al.

Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers. In this paper, we tackle single-channel multi-speaker conversational ST with an end-to-end and multi-task training model, named Speaker-Turn Aware Conversational Speech Translation, that combines automatic speech recognition, speech translation and speaker turn detection using special tokens in a serialized labeling format. We run experiments on the Fisher-CALLHOME corpus, which we adapted by merging the two single-speaker channels into one multi-speaker channel, thus representing the more realistic and challenging scenario with multi-speaker turns and cross-talk. Experimental results across single- and multi-speaker conditions and against conventional ST systems, show that our model outperforms the reference systems on the multi-speaker condition, while attaining comparable performance on the single-speaker condition. We release scripts for data processing and model training.

en cs.CL, eess.AS

arXiv Open Access 2023

Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification

Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan et al.

Training large foundation models using self-supervised objectives on unlabeled data, followed by fine-tuning on downstream tasks, has emerged as a standard procedure. Unfortunately, the efficacy of this approach is often constrained by both limited fine-tuning compute and scarcity in labeled downstream data. We introduce Multimodal Attention Merging (MAM), an attempt that facilitates direct knowledge transfer from attention matrices of models rooted in high resource modalities, text and images, to those in resource-constrained domains, speech and audio, employing a zero-shot paradigm. MAM reduces the relative Word Error Rate (WER) of an Automatic Speech Recognition (ASR) model by up to 6.70%, and relative classification error of an Audio Event Classification (AEC) model by 10.63%. In cases where some data/compute is available, we present Learnable-MAM, a data-driven approach to merging attention matrices, resulting in a further 2.90% relative reduction in WER for ASR and 18.42% relative reduction in AEC compared to fine-tuning.
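
The percentages quoted above are relative reductions, which are easy to confuse with absolute differences in error rate. The sketch below shows the arithmetic; the function name `relative_reduction` and the baseline and improved WER values are hypothetical and are not taken from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction in an error rate, in percent of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline * 100.0


# Hypothetical numbers: a WER of 10.00% dropping to 9.33% is an absolute
# change of 0.67 points but a relative reduction of about 6.7%.
example = relative_reduction(10.0, 9.33)
```

The same formula applies to classification error; the absolute gain implied by a given relative figure depends on how high the baseline error was.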

en cs.LG, cs.SD

S2 Open Access 2022

Voice Sequelae Following Recovery From COVID-19

Tatiana Romero Arias, Moisés Betancort Montesinos

Introduction: Covid-19 is an infectious disease whose symptoms vary from person to person. Sequelae in the nervous, cardiovascular, and/or digestive systems require a multidisciplinary approach by different health professionals, including the speech therapist. There is thus a direct relationship between speech therapy and Covid-19, especially in patients with serious sequelae such as the inability to eat and/or speak and the loss of voice. Damage to the laryngeal mucosa causes the loss of some qualities of the voice, limiting oral communication. As a result, dysphonia may be caused by great weakness, by continuous overexertion, or by paralysis of the vocal cords. Objectives/Hypothesis: The objective of this study was to identify the patterns of behavior in the biomechanical correlates of people who had symptomatic Covid-19 with voice sequelae. Methods: An experimental study with a total of 21 participants (11 women and 10 men) with post-Covid-19 voice sequelae is presented. Voice samples were collected and biomechanical correlates were analyzed with the Voice Clinical Systems program. Results and Conclusions: The results show different altered biomechanical patterns between men and women that correlate with other infectious diseases.

13 citations en Medicine

DOAJ Open Access 2022

Brief children's dictionary questionnaire SDDS 16-42: Introducing a screening diagnostic tool for early detection of children with delay in their language skills' development

Ilona Bytešníková, Filip Smolík

The aim of this work is to inform Clinical Speech Therapists and other experts of the significance of early detection of deficits in language skills' development via parental questionnaires. Parental questionnaires have been used efficiently in several foreign countries for many years. Even what are called the "brief versions" of parental questionnaires are significant for the early detection of children with delays/impairments in their language development. The article introduces a screening diagnostic tool for the early assessment of the level of receptive and productive language. Launching the said tool into practice can facilitate early identification of children who show increased risk of language development disorders.

Medicine, Oral communication. Speech

DOAJ Open Access 2022

Testing the impact of paraverbal irony signals. Experimental study on verbal irony identification in face-to-face and computer-mediated communication

Ellis Raissa

This paper reports the results of an experimental study with a between subject design (N = 122) whose aim was to compare irony comprehension rates in face-to-face (FTF) and computer-mediated communication (CMC), and examine the influence of paraverbal irony signals on irony identification rates. An irony comprehension test was intersemiotically translated to three conditions: FTF (n = 46), paraverbal signal-rich CMC (n = 30), and paraverbal signal-poor CMC (n = 46). The study adopted a relevance theoretic account of irony. There was a statistically significant difference between the signal-rich CMC and FTF conditions - irony identification rates were higher in the signal-rich CMC condition. The results are important since they suggest that paraverbal irony signals are not essential for correct irony identification if relevant contextual information is available, and the CMC medium is not only unlikely to be an obstacle in communicating the ironic intent, but with the addition of the medium-specific irony signals, may be significantly better.

Oral communication. Speech, Psychology

DOAJ Open Access 2022

Is a hybrid of online and face-to-face services feasible for audiological rehabilitation post COVID-19? Findings from three public health patients

Nuha Khatib, Vera-Genevey Hlayisi

Background: The global coronavirus disease 2019 (COVID-19) pandemic has pushed many audiologists to incorporate remote service delivery methods to adhere to mandatory health and safety protocols. The use of tele-audiology for audiological rehabilitation may provide a sustainable, cost-effective modality to suit the existing need, particularly in low-resourced countries. Objectives: This study aimed to investigate the feasibility of implementing a hybrid tele-rehabilitation programme in a South African public health context. An online auditory training (AT) programme was used to determine (1) compliance, (2) clinical benefit, (3) participant experience and (4) costs. Method: A convergent mixed methods design with a feasibility approach was utilised. Data collection was done through questionnaires, in-booth assessments, online AT, and face-to-face interviewing. Participants undertook online AT over 4 weeks. For pre- and post-online AT, the Abbreviated Profile of Hearing Aid Benefit (APHAB), QuickSIN, entrance and exit questionnaires, interviews and a system usability scale were administered. Results: Key findings of this study included (1) a high compliance rate (84.82%) with minimal clinician contact time at 3 h 25 min over 5–6-weeks; (2) improvement in perceived hearing aid (HA) benefit, and improvement in listening skills; (3) reported positive experiences; and (4) minimal programme costs at an average of R1350.00 per participant. Conclusion: The results showed positive indicators that the use of hybrid tele-rehabilitative strategies may provide a viable alternative to the traditional face-to-face modality. The hybrid approach showed clinical benefits, cost-effectiveness, minimal contact time as well as COVID-19 compliance. Further large-scale research is still needed.

Oral communication. Speech
