Results for "Philology. Linguistics"

Showing 20 of ~449058 results · from DOAJ, arXiv, Semantic Scholar

DOAJ Open Access 2026
“Like a small trip back to the GDR”? East German evaluations of the television serial Weissensee and its authenticity in a dynamic discourse landscape

Löblich Maria

In this article, I take the example of the historical German television fiction success Weissensee and ask how East Germans have read and judged the way this serial constructed their past more than a decade after its television premiere. This question is raised against the background that Weissensee was produced at a time when all questions related to the German Democratic Republic (GDR) seemed to have been answered. In the meantime, the remembrance discourse has changed. Hall’s encoding/decoding model, Giddens’s identity theory, and Sabrow’s typology of memories of the GDR provided the theoretical framework. Empirically, this study draws on five focus groups of East Germans, and the findings demonstrate that the dominant media memory, the dictatorship discourse, still operates within three reading positions. More recent discourses are visible but have not (yet) expanded the horizon to relocate oneself in the past. This study contributes to the rare research on authenticity from an audience perspective.

Communication. Mass media
arXiv Open Access 2026
Cross-linguistic Prosodic Analysis of Autistic and Non-autistic Child Speech in Finnish, French and Slovak

Ida-Lotta Myllylä, Sofoklis Kakouros

Prosodic differences in autism are well-documented, but cross-linguistic evidence remains limited. This study investigates prosody in autism across a multilingual corpus of Finnish, French, and Slovak speakers. 88 acoustic features from over 5,000 inter-pausal units were extracted, and data were reduced via Principal Component Analysis (PCA) and analyzed using Linear Mixed-Effects Models (LMMs). Cross-linguistically, autistic speakers exhibited increased general intensity variability and a clearer, less breathy voice quality (higher Harmonics-to-Noise Ratio and alpha ratio), alongside reduced temporal intensity dynamics and lower central f0. Monolingual analyses revealed language-specific nuances: Slovak results aligned with cross-linguistic f0 patterns but diverged on voice quality, while Finnish results mirrored the broader voice quality findings. These results emphasize including voice quality and intensity dynamics in the study of possible language-independent markers of autism, alongside traditional pitch measures. The findings challenge deficiency-based models, suggesting instead a complex, acoustically distinct prosodic profile across languages.

en eess.AS
DOAJ Open Access 2025
Childhood under the scope of time: the 20th-century child in the aesthetics of Neorealism and Soviet cinema / Детство под прицелом времени: ребенок ХХ века в киноэстетике неореализма и советском кинематографе

Dmitry Mikhaylyuk / Дмитрий Павлович Михайлюк, Amir Kader / Амир Святославович Кадер

The aim of the research is to identify the evolutionary scenario and the specifics of the transformation of the child's image in Russian art, primarily using Soviet cinema as an example, taking into account the experience of world art development and the connection between the "Sixtiers" and the aesthetics of neorealism. The article analyzes the causes and nature of significant changes in the image of the child in 20th-century art, considering large-scale social, cultural, and ideological transformations. One of the prominent movements that shaped original perceptions of childhood was Italian neorealism, in which the image of the "orphan child" became a symbol of the tragic consequences of historical cataclysms and socio-economic crises. Soviet art, from its very inception, actively used images of young characters. For instance, the films "Kino-glaz" (1924), "The Desperate Battalion" (1933), and "Golden Honey" (1928) presented children as active participants in social life. Children were often depicted as miniature adults, actively involved in labor and patriotic processes. The article shows how, in the 1960s, Russian art, influenced by the aesthetics of neorealism, began to re-evaluate the image of the child. The "Sixtiers" paid more attention to child psychology, themes of socialization, friendship, and growing up. The relevance of studying the image of the child in contemporary art is linked to the need to support and assist children in the process of socialization. Modern cinema about children and for children not only showcases young heroes but also transmits important moral and ethical lessons to them and to society as a whole. The research results have demonstrated that realistic art, which rejects fantastical elements, offers children an experience based on real-life situations, helping them develop critical thinking and emotional perception of the world. 

Visual arts, Arts in general
DOAJ Open Access 2025
Local “Memory Wars” and the Phenomenon of “Cancellation” in the Media of “Accommodative Culture” (Case of Republic of Tatarstan)

Alexander V. Ovchinnikov

The article examines the theory and practice of studying internal Russian regional “memory wars”, which unfold, as a rule, in the local media and are accompanied by the phenomenon of “corporate cancellation”. It is stated that analysis of the ideological content of “memory wars” (“what is being argued about”) currently prevails in scholarly approaches to this problem, whereas it is more important to understand the socio-political mechanisms by which “conflicts of the past” emerge under the specific conditions of a non-“agonal” (not “Western”) type of culture. The aim of the study is to create a methodological model for analyzing internal Russian regional “memory wars”. As a result of the research, the main provisions on the study of “memory wars” in the media of “accommodative culture” and on the predominance of “corporate cancellation” mechanisms over the “culture of cancellation” were formulated. A dispute is ongoing between “Bulgarians” and “Tatars” about the origin of the Tatars and about the difficulty of officially recognizing the Kryashens (“Tatar Christians”) as a people distinct from the Tatars, and polemics continue between Ufa and Kazan scholars about the ethnicity of the population living in the northwestern territories of contemporary Bashkortostan and about some archaeological sites located there. The main conclusion of the study is that the factual content of the “memory wars” plays a secondary role: their “deep” specifics are determined by “external” socio-political and even economic conditions, expressed in the implementation of “corporate abolition”. This conclusion is intended for the attention of social philosophers, anthropologists and historians.

Communication. Mass media
arXiv Open Access 2025
Exploring the encoding of linguistic representations in the Fully-Connected Layer of generative CNNs for Speech

Bruno Ferenc Šegedin, Gasper Beguš

Interpretability work on the convolutional layers of CNNs has primarily focused on computer vision, but some studies also explore correspondences between the latent space and the output in the audio domain. However, it has not been thoroughly examined how acoustic and linguistic information is represented in the fully connected (FC) layer that bridges the latent space and convolutional layers. The current study presents the first exploration of how the FC layer of CNNs for speech synthesis encodes linguistically relevant information. We propose two techniques for exploration of the fully connected layer. In Experiment 1, we use weight matrices as inputs into convolutional layers. In Experiment 2, we manipulate the FC layer to explore how symbolic-like representations are encoded in CNNs. We leverage the fact that the FC layer outputs a feature map and that variable-specific weight matrices are temporally structured to (1) demonstrate how the distribution of learned weights varies between latent variables in systematic ways and (2) demonstrate how manipulating the FC layer while holding constant subsequent model parameters affects the output. We ultimately present an FC manipulation that can output a single segment. Using this technique, we show that lexically specific latent codes in generative CNNs (ciwGAN) have shared lexically invariant sublexical representations in the FC-layer weights, showing that ciwGAN encodes lexical information in a linguistically principled manner.

en cs.CL
arXiv Open Access 2025
Automated Quality Control for Language Documentation: Detecting Phonotactic Inconsistencies in a Kokborok Wordlist

Kellen Parker van Dam, Abishek Stephen

Lexical data collection in language documentation often contains transcription errors and undocumented borrowings that can mislead linguistic analysis. We present unsupervised anomaly detection methods to identify phonotactic inconsistencies in wordlists, applying them to a multilingual dataset of Kokborok varieties with Bangla. Using character-level and syllable-level phonotactic features, our algorithms identify potential transcription errors and borrowings. While precision and recall remain modest due to the subtle nature of these anomalies, syllable-aware features significantly outperform character-level baselines. The high-recall approach provides fieldworkers with a systematic method to flag entries requiring verification, supporting data quality improvement in low-resourced language documentation.

en cs.CL
arXiv Open Access 2025
PILOT: Steering Synthetic Data Generation with Psychological & Linguistic Output Targeting

Caitlin Cisar, Emily Sheffield, Joshua Drake et al.

Generative AI applications commonly leverage user personas as a steering mechanism for synthetic data generation, but reliance on natural language representations forces models to make unintended inferences about which attributes to emphasize, limiting precise control over outputs. We introduce PILOT (Psychological and Linguistic Output Targeting), a two-phase framework for steering large language models with structured psycholinguistic profiles. In Phase 1, PILOT translates natural language persona descriptions into multidimensional profiles with normalized scores across linguistic and psychological dimensions. In Phase 2, these profiles guide generation along measurable axes of variation. We evaluate PILOT across three state-of-the-art LLMs (Mistral Large 2, Deepseek-R1, LLaMA 3.3 70B) using 25 synthetic personas under three conditions: Natural-language Persona Steering (NPS), Schema-Based Steering (SBS), and Hybrid Persona-Schema Steering (HPS). Results demonstrate that schema-based approaches significantly reduce artificial-sounding persona repetition while improving output coherence, with silhouette scores increasing from 0.098 to 0.237 and topic purity from 0.773 to 0.957. Our analysis reveals a fundamental trade-off: SBS produces more concise outputs with higher topical consistency, while NPS offers greater lexical diversity but reduced predictability. HPS achieves a balance between these extremes, maintaining output variety while preserving structural consistency. Expert linguistic evaluation confirms that PILOT maintains high response quality across all conditions, with no statistically significant differences between steering approaches.

en cs.CL, cs.AI
arXiv Open Access 2025
You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks

Ünal Ege Gaznepoglu, Anna Leschanowsky, Ahmad Aloradi et al.

Speaker anonymization systems hide the identity of speakers while preserving other information such as linguistic content and emotions. To evaluate their privacy benefits, attacks in the form of automatic speaker verification (ASV) systems are employed. In this study, we assess the impact of intra-speaker linguistic content similarity in the attacker training and evaluation datasets, by adapting BERT, a language model, as an ASV system. On the VoicePrivacy Attacker Challenge datasets, our method achieves a mean equal error rate (EER) of 35%, with certain speakers attaining EERs as low as 2%, based solely on the textual content of their utterances. Our explainability study reveals that the system decisions are linked to semantically similar keywords within utterances, stemming from how LibriSpeech is curated. Our study suggests reworking the VoicePrivacy datasets to ensure a fair and unbiased evaluation and challenge the reliance on global EER for privacy evaluations.

en eess.AS, cs.CL
arXiv Open Access 2024
Differentiating Between Human-Written and AI-Generated Texts Using Automatically Extracted Linguistic Features

Georgios P. Georgiou

While extensive research has focused on ChatGPT in recent years, very few studies have systematically quantified and compared linguistic features between human-written and artificial intelligence (AI)-generated language. This exploratory study aims to investigate how various linguistic components are represented in both types of texts, assessing the ability of AI to emulate human writing. Using human-authored essays as a benchmark, we prompted ChatGPT to generate essays of equivalent length. These texts were analyzed using Open Brain AI, an online computational tool, to extract measures of phonological, morphological, syntactic, and lexical constituents. Despite AI-generated texts appearing to mimic human speech, the results revealed significant differences across multiple linguistic features such as specific types of consonants, nouns, adjectives, pronouns, adjectival/prepositional modifiers, and use of difficult words, among others. These findings underscore the importance of integrating automated tools for efficient language assessment, reducing time and effort in data analysis. Moreover, they emphasize the necessity for enhanced training methodologies to improve the engineering capacity of AI for producing more human-like text.

en cs.CL, cs.AI
arXiv Open Access 2024
Human Variability vs. Machine Consistency: A Linguistic Analysis of Texts Generated by Humans and Large Language Models

Sergio E. Zanotto, Segun Aroyehun

The rapid advancements in large language models (LLMs) have significantly improved their ability to generate natural language, making texts generated by LLMs increasingly indistinguishable from human-written texts. Recent research has predominantly focused on using LLMs to classify text as either human-written or machine-generated. In our study, we adopt a different approach by profiling texts spanning four domains based on 250 distinct linguistic features. We select the M4 dataset from the Subtask B of SemEval 2024 Task 8. We automatically calculate various linguistic features with the LFTK tool and additionally measure the average syntactic depth, semantic similarity, and emotional content for each document. We then apply a two-dimensional PCA reduction to all the calculated features. Our analyses reveal significant differences between human-written texts and those generated by LLMs, particularly in the variability of these features, which we find to be considerably higher in human-written texts. This discrepancy is especially evident in text genres with less rigid linguistic style constraints. Our findings indicate that humans write texts that are less cognitively demanding, with higher semantic content, and richer emotional content compared to texts generated by LLMs. These insights underscore the need for incorporating meaningful linguistic features to enhance the understanding of textual outputs of LLMs.

en cs.CL
arXiv Open Access 2024
Linguistic Structure Induction from Language Models

Omar Momen

Linear sequences of words are implicitly represented in our brains by hierarchical structures that organize the composition of words in sentences. Linguists formalize different frameworks to model this hierarchy; two of the most common syntactic frameworks are Constituency and Dependency. Constituency represents sentences as nested groups of phrases, while dependency represents a sentence by assigning relations between its words. Recently, the pursuit of intelligent machines has produced Language Models (LMs) capable of solving many language tasks with a human-level performance. Many studies now question whether LMs implicitly represent syntactic hierarchies. This thesis focuses on producing constituency and dependency structures from LMs in an unsupervised setting. I review the critical methods in this field and highlight a line of work that utilizes a numerical representation for binary constituency trees (Syntactic Distance). I present a detailed study on StructFormer (SF) (Shen et al., 2021), which retrofits a transformer encoder architecture with a parser network to produce constituency and dependency structures. I present six experiments to analyze and address this field's challenges; experiments include investigating the effect of repositioning the parser network within the SF architecture, evaluating subword-based induced trees, and benchmarking the models developed in the thesis experiments on linguistic tasks. Models benchmarking is performed by participating in the BabyLM challenge, published at CoNLL 2023 (Momen et al., 2023). The results of this thesis encourage further development in the direction of retrofitting transformer-based models to induce syntactic structures, supported by the acceptable performance of SF in different experimental settings and the observed limitations that require innovative solutions to advance the state of syntactic structure induction.

en cs.CL, cs.AI
arXiv Open Access 2024
Promoting the linguistic diversity of TEI in the Maghreb and the Arab region

Henri Hudrisier, Rachid Zghibi, Sihem Zghidi et al.

The project targets both oral corpora and the rich written text resources of the Maghreb region. It focuses particularly on the continuity, over more than 12 centuries, of a classical Arabic language that is still alive, and on the extreme hybridization of vernacular languages sustained by rich Libyan, Roman, Hebrew and Ottoman influences and by more recent French, Spanish and Italian linguistic interference. In short, the Maghreb is a place of extremely abundant, but largely unexploited, textual studies.

en cs.IR, cs.OH
arXiv Open Access 2023
Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach

Zhaokun Jiang, Qianxi Lv, Ziyin Zhang et al.

The growing popularity of neural machine translation (NMT) and LLMs represented by ChatGPT underscores the need for a deeper understanding of their distinct characteristics and relationships. Such understanding is crucial for language professionals and researchers to make informed decisions about, and tactful use of, these cutting-edge translation technologies, but it remains underexplored. This study aims to fill this gap by investigating three key questions: (1) the distinguishability of ChatGPT-generated translations from NMT and human translation (HT), (2) the linguistic characteristics of each translation type, and (3) the degree of resemblance between ChatGPT-produced translations and HT or NMT. To achieve these objectives, we employ statistical testing, machine learning algorithms, and multidimensional analysis (MDA) to analyze Spokesperson's Remarks and their translations. After extracting a wide range of linguistic features, supervised classifiers demonstrate high accuracy in distinguishing the three translation types, whereas unsupervised clustering techniques do not yield satisfactory results. Another major finding is that ChatGPT-produced translations exhibit greater similarity with NMT than HT in most MDA dimensions, which is further corroborated by distance computing and visualization. These novel insights shed light on the interrelationships among the three translation types and have implications for the future advancement of NMT and generative AI.

en cs.CL
DOAJ Open Access 2022
Libertad, justicia y tolerancia. Una propuesta del discurso cinematográfico en la formación de comunicadores

Abel Antonio Grijalva Verdugo, Rosario Olivia Izaguirre Fierro

This research starts from the need to interpret the training of communicators in Mexico through cinema. It does so through three variables embedded in cinematic discourse: freedom, justice, and tolerance. It is based on discourse analysis from an inductive model (the symbolic-epistemic-aesthetic mode) and on the interpretation of data from a corpus of films from the USA, Europe, and Latin America, and it takes as its starting point the cinematic text, shaped by ellipsis and off-screen space. In this way it seeks to identify the traits involved in shaping and projecting communicators in cinema.

Communication. Mass media
arXiv Open Access 2022
Modeling Intensification for Sign Language Generation: A Computational Approach

Mert İnan, Yang Zhong, Sabit Hassan et al.

End-to-end sign language generation models do not accurately represent the prosody in sign language. A lack of temporal and spatial variations leads to poor-quality generated presentations that confuse human interpreters. In this paper, we aim to improve the prosody in generated sign languages by modeling intensification in a data-driven manner. We present different strategies grounded in linguistics of sign language that inform how intensity modifiers can be represented in gloss annotations. To employ our strategies, we first annotate a subset of the benchmark PHOENIX-14T, a German Sign Language dataset, with different levels of intensification. We then use a supervised intensity tagger to extend the annotated dataset and obtain labels for the remaining portion of it. This enhanced dataset is then used to train state-of-the-art transformer models for sign language generation. We find that our efforts in intensification modeling yield better results when evaluated with automatic metrics. Human evaluation also indicates a higher preference of the videos generated using our model.

en cs.CL, cs.AI
arXiv Open Access 2022
Document-aware Positional Encoding and Linguistic-guided Encoding for Abstractive Multi-document Summarization

Congbo Ma, Wei Emma Zhang, Pitawelayalage Dasun Dileepa Pitawela et al.

One key challenge in multi-document summarization is to capture the relations among input documents that distinguish between single document summarization (SDS) and multi-document summarization (MDS). Few existing MDS works address this issue. One effective way is to encode document positional information to assist models in capturing cross-document relations. However, existing MDS models, such as Transformer-based models, only consider token-level positional information. Moreover, these models fail to capture sentences' linguistic structure, which inevitably causes confusions in the generated summaries. Therefore, in this paper, we propose document-aware positional encoding and linguistic-guided encoding that can be fused with Transformer architecture for MDS. For document-aware positional encoding, we introduce a general protocol to guide the selection of document encoding functions. For linguistic-guided encoding, we propose to embed syntactic dependency relations into the dependency relation mask with a simple but effective non-linear encoding learner for feature learning. Extensive experiments show the proposed model can generate summaries with high quality.

en cs.CL
arXiv Open Access 2022
Compositional Evaluation on Japanese Textual Entailment and Similarity

Hitomi Yanaka, Koji Mineshima

Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are widely used benchmark tasks for compositional evaluation of pre-trained language models. Despite growing interest in linguistic universals, most NLI/STS studies have focused almost exclusively on English. In particular, there are no available multilingual NLI/STS datasets in Japanese, which is typologically different from English and can shed light on the currently controversial behavior of language models in matters such as sensitivity to word order and case particles. Against this background, we introduce JSICK, a Japanese NLI/STS dataset that was manually translated from the English dataset SICK. We also present a stress-test dataset for compositional inference, created by transforming syntactic structures of sentences in JSICK to investigate whether language models are sensitive to word order and case particles. We conduct baseline experiments on different pre-trained language models and compare the performance of multilingual models when applied to Japanese and other languages. The results of the stress-test experiments suggest that the current pre-trained language models are insensitive to word order and case marking.

en cs.CL
arXiv Open Access 2022
Decoding Visual Neural Representations by Multimodal Learning of Brain-Visual-Linguistic Features

Changde Du, Kaicheng Fu, Jinpeng Li et al.

Decoding human visual neural representations is a challenging task with great scientific significance in revealing vision-processing mechanisms and developing brain-like intelligent machines. Most existing methods are difficult to generalize to novel categories that have no corresponding neural data for training. The two main reasons are 1) the under-exploitation of the multimodal semantic knowledge underlying the neural data and 2) the small number of paired (stimuli-responses) training data. To overcome these limitations, this paper presents a generic neural decoding method called BraVL that uses multimodal learning of brain-visual-linguistic features. We focus on modeling the relationships between brain, visual and linguistic features via multimodal deep generative models. Specifically, we leverage the mixture-of-product-of-experts formulation to infer a latent code that enables a coherent joint generation of all three modalities. To learn a more consistent joint representation and improve the data efficiency in the case of limited brain activity data, we exploit both intra- and inter-modality mutual information maximization regularization terms. In particular, our BraVL model can be trained under various semi-supervised scenarios to incorporate the visual and textual features obtained from the extra categories. Finally, we construct three trimodal matching datasets, and the extensive experiments lead to some interesting conclusions and cognitive insights: 1) decoding novel visual categories from human brain activity is practically possible with good accuracy; 2) decoding models using the combination of visual and linguistic features perform much better than those using either of them alone; 3) visual perception may be accompanied by linguistic influences to represent the semantics of visual stimuli. Code and data: https://github.com/ChangdeDu/BraVL.

en cs.CV, cs.AI
DOAJ Open Access 2021
A Gynocritical Study of The Color Purple by Alice Walker: A corpus-based Analysis of Adjectives

Zunaira Zafar, Haleema Majeed, Tehseen Zahra

Since the beginning of scholarly work on women, most of the research has been carried out by the opposite gender. Therefore, limited work has been done to see how a woman portrays another woman in her writings. Moreover, limited research has been conducted utilizing corpus tools for gynocritical analysis. Thus, the present research aims to examine The Color Purple by Alice Walker by employing a corpus-based approach to investigate the author's representation of female characters in the novel. The positive or negative portrayal of women in the novel was investigated through the author’s usage of adjectives. Showalter's (2009) theory of gynocriticism was used as the theoretical framework for the current research. Further, a corpus-based methodology was employed to analyze how Alice Walker has portrayed female characters in her novel through the use of adjectives. An in-depth analysis has shown that Alice Walker has depicted woman as a helpless and sidelined being who can become resilient after suffering frightful circumstances. The current research also opens new avenues for researchers to analyze the text from a gynocritical perspective along with corpus techniques. Keywords: Adjectives; Corpus-based analysis; Gynocriticism; The Color Purple; Woman

Language. Linguistic theory. Comparative grammar, Oral communication. Speech

Page 38 of 22453