Results for "English literature"

Showing 20 of ~7052305 results · from DOAJ, arXiv, Semantic Scholar

arXiv Open Access 2026
Learning from Child-Directed Speech in Two-Language Scenarios: A French-English Case Study

Liel Binyamin, Elior Sulem

Research on developmentally plausible language models has largely focused on English, leaving open questions about multilingual settings. We present a systematic study of compact language models by extending BabyBERTa to English-French scenarios under strictly size-matched data conditions, covering monolingual, bilingual, and cross-lingual settings. Our design contrasts two types of training corpora: (i) child-directed speech (about 2.5M tokens), following BabyBERTa and related work, and (ii) multi-domain corpora (about 10M tokens), extending the BabyLM framework to French. To enable fair evaluation, we also introduce new resources, including French versions of QAMR and QASRL, as well as English and French multi-domain corpora. We evaluate the models on both syntactic and semantic tasks and compare them with models trained on Wikipedia-only data. The results reveal context-dependent effects: training on Wikipedia consistently benefits semantic tasks, whereas child-directed speech improves grammatical judgments in monolingual settings. Bilingual pretraining yields notable gains for textual entailment, with particularly strong improvements for French. Importantly, similar patterns emerge across BabyBERTa, RoBERTa, and LTG-BERT, suggesting consistent trends across architectures.

en cs.CL, cs.AI
DOAJ Open Access 2025
The impact of technology use for care by informal female caregivers on their well-being: a scoping review

Andrés Aparicio, María Alejandra Inostroza-Correa, Paula Miranda et al.

Introduction: Informal caregiving roles are predominantly assumed by women, who often experience unique challenges related to physical, emotional, and social stress due to entrenched gender disparities. Within this context, the use of technologies to assist in caregiving tasks has become popular, and some evidence about their impact on caregiver well-being is available. However, significant research gaps persist. This scoping review was intended to assess the extent of literature examining the impact of technology use by informal female caregivers on their quality of life, to characterize existing research gaps, and to identify available evidence regarding gender-specific challenges. Methods: We searched for studies in English, Spanish, and Portuguese published in peer-reviewed journals since 2018. We included studies exploring how informal female caregivers use technology and how such use impacts their well-being. The studies included in the review analyze the impact of technology use on the physical, emotional, or material well-being of informal female caregivers. Sources were screened in stages by two independent reviewers; data were extracted from selected full texts, and results were integrated into a narrative summary. Results: A total of 14 studies were included in the review, highlighting a range of technologies such as health monitoring devices, communication platforms, and assistive aids. The review identified improvements in caregiver well-being related to reduced physical burden, enhanced emotional support, and increased social connectivity. However, significant research gaps were noted, particularly regarding the long-term effects of technology use, differences based on socio-economic contexts, and the limited inclusion of gender-specific analyses. Conclusions: This review supports the notion that technology use can positively impact the well-being of informal female caregivers, especially in terms of emotional and social support. Nevertheless, the review also found that in certain situations, technology can fail to improve or even worsen the quality of life of caregivers. However, the limited availability of studies with standardized quantitative measures, gender-specific data, and comprehensive assessments of long-term effects highlights areas for future research. Further exploration into diverse sociocultural contexts and empirical model development will be essential to better understand the nuanced ways in which technology use influences caregiver quality of life. These findings underscore the potential for targeted technology solutions to support informal caregivers, with implications for healthcare professionals and policymakers designing caregiver support initiatives.

arXiv Open Access 2025
Domain-Aware Speaker Diarization On African-Accented English

Chibuzor Okocha, Kelechi Ezema, Christan Grant

This study examines domain effects in speaker diarization for African-accented English. We evaluate multiple production and open systems on general and clinical dialogues under a strict DER protocol that scores overlap. A consistent domain penalty appears for clinical speech and remains significant across models. Error analysis attributes much of this penalty to false alarms and missed detections, aligning with short turns and frequent overlap. We test lightweight domain adaptation by fine-tuning a segmentation module on accent-matched data; it reduces error but does not eliminate the gap. Our contributions include a controlled benchmark across domains, a concise approach to error decomposition and conversation-level profiling, and an adaptation recipe that is easy to reproduce. Results point to overlap-aware segmentation and balanced clinical resources as practical next steps.

en cs.CL, cs.AI
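
The abstract above does not spell out its "strict DER protocol", so the following is only a rough sketch of the conventional diarization error rate: false-alarm, missed-detection, and speaker-confusion time summed and divided by total reference speech time. The function name and the sample durations are hypothetical and are not taken from the paper.

```python
def diarization_error_rate(false_alarm, missed, confusion, total_speech):
    """Conventional DER: total error time divided by total reference speech time.

    All arguments are durations in seconds. This assumes the standard
    definition, not necessarily the exact protocol used in the paper.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech


# Hypothetical durations for one conversation (illustrative only).
der = diarization_error_rate(
    false_alarm=12.0, missed=18.0, confusion=6.0, total_speech=300.0
)
print(f"DER = {der:.3f}")
```

Keeping the three error terms separate, as here, is what makes the kind of error decomposition mentioned in the abstract possible: each term can be reported on its own before being summed.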
arXiv Open Access 2025
Cleaning English Abstracts of Scientific Publications

Michael E. Rose, Nils A. Herrmann, Sebastian Erhardt

Scientific abstracts are often used as proxies for the content and thematic focus of research publications. However, a significant share of published abstracts contains extraneous information (such as publisher copyright statements, section headings, author notes, registrations, and bibliometric or bibliographic metadata) that can distort downstream analyses, particularly those involving document similarity or textual embeddings. We introduce an open-source, easy-to-integrate language model designed to clean English-language scientific abstracts by automatically identifying and removing such clutter. We demonstrate that our model is both conservative and precise, that it alters similarity rankings of cleaned abstracts, and that it improves the information content of standard-length embeddings.

en cs.CL
DOAJ Open Access 2024
How does principal’s instructional leadership shape teacher performance mediated by teacher self-efficacy in Indonesian education context?

Elfira, Rasdiana, Fitrawati et al.

Introduction: This study seeks to examine the relationships among principal instructional leadership (PIL), teacher self-efficacy (TSE), and teacher performance (TP) within the Indonesian educational setting, specifically in schools implementing the Merdeka Belajar curriculum. Methods: Employing a quantitative approach utilizing Partial Least Square Structural Equation Modeling (PLS-SEM), this research involved a sample of 127 productive teachers randomly selected from 247 teachers. Surveys were administered to measure perceptions of PIL, TSE, and TP, and relevant literature was reviewed to provide theoretical frameworks and research hypotheses. Results: The findings reveal significant direct relationships between PIL and both TSE (0.721) and TP (0.598), indicating that strong instructional leadership positively influences teachers’ self-efficacy and performance. Moreover, TSE is found to directly impact TP (0.358), suggesting that teachers’ beliefs in their capabilities play a crucial role in their instructional practices and outcomes. TSE is also identified as a significant mediator in the relationship between PIL and TP (0.258), highlighting the importance of fostering teachers’ self-efficacy to enhance their performance indirectly through instructional leadership. Discussion: This research underscores the crucial significance of instructional leadership in influencing school culture, which ultimately affects teacher confidence and facilitates teacher performance. By understanding the mechanisms through which PIL impacts TSE and TP, educational leaders can implement strategies to foster a supportive environment that empowers teachers.

Education (General)
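
The mediation figure in the abstract above can be checked with simple arithmetic. In a path model the indirect effect is usually taken as the product of the two path coefficients along the mediated route, here PIL to TSE and TSE to TP. The sketch below assumes that product rule applies to the reported coefficients; the variable names are illustrative.

```python
# Path coefficients as reported in the abstract.
pil_to_tse = 0.721  # principal instructional leadership -> teacher self-efficacy
tse_to_tp = 0.358   # teacher self-efficacy -> teacher performance
reported_indirect = 0.258  # mediated effect stated in the abstract

# Assumption: indirect effect = product of the two path coefficients.
indirect = pil_to_tse * tse_to_tp

print(f"computed indirect effect: {indirect:.3f}")
print(f"matches reported value:   {round(indirect, 3) == reported_indirect}")
```

Under this assumption the product agrees with the reported 0.258 to three decimal places.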
arXiv Open Access 2024
What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics

Jordan J. Bird

The integration of new literature into the English curriculum remains a challenge since educators often lack scalable tools to rapidly evaluate readability and adapt texts for diverse classroom needs. This study proposes to address this gap through a multimodal approach that combines transformer-based text classification with linguistic feature analysis to align texts with UK Key Stages. Eight state-of-the-art Transformers were fine-tuned on segmented text data, with BERT achieving the highest unimodal F1 score of 0.75. In parallel, 500 deep neural network topologies were searched for the classification of linguistic characteristics, achieving an F1 score of 0.392. The fusion of these modalities shows a significant improvement, with every multimodal approach outperforming all unimodal models. In particular, the ELECTRA Transformer fused with the neural network achieved an F1 score of 0.996. Unimodal and multimodal approaches are shown to have statistically significant differences in all validation metrics (accuracy, precision, recall, F1 score) except for inference time. The proposed approach is finally encapsulated in a stakeholder-facing web application, providing non-technical stakeholder access to real-time insights on text complexity, reading difficulty, curriculum alignment, and recommendations for learning age range. The application empowers data-driven decision making and reduces manual workload by integrating AI-based recommendations into lesson planning for English literature.

en cs.CL, cs.AI
arXiv Open Access 2024
CANTONMT: Investigating Back-Translation and Model-Switch Mechanisms for Cantonese-English Neural Machine Translation

Kung Yin Hong, Lifeng Han, Riza Batista-Navarro et al.

This paper investigates the development and evaluation of machine translation models from Cantonese to English, where we propose a novel approach to tackle low-resource language translations. The main objectives of the study are to develop a model that can effectively translate Cantonese to English and evaluate it against state-of-the-art commercial models. To achieve this, a new parallel corpus has been created by combining different available corpora online with preprocessing and cleaning. In addition, a monolingual Cantonese dataset has been created through web scraping to aid the synthetic parallel corpus generation. Following the data collection process, several approaches, including fine-tuning models, back-translation, and model switch, have been used. The translation quality of models has been evaluated with multiple quality metrics, including lexicon-based metrics (SacreBLEU and hLEPOR) and embedding-space metrics (COMET and BERTscore). Based on the automatic metrics, the best model is selected and compared against the 2 best commercial translators using the human evaluation framework HOPES. The best model proposed in this investigation (NLLB-mBART) with model switch mechanisms has reached comparable and even better automatic evaluation scores against State-of-the-art commercial models (Bing and Baidu Translators), with a SacreBLEU score of 16.8 on our test set. Furthermore, an open-source web application has been developed to allow users to translate between Cantonese and English, with the different trained models available for effective comparisons between models from this investigation and users. CANTONMT is available at https://github.com/kenrickkung/CantoneseTranslation

en cs.CL, cs.AI
arXiv Open Access 2024
Grammatical Error Correction for Code-Switched Sentences by Learners of English

Kelvin Wey Han Chan, Christopher Bryant, Li Nguyen et al.

Code-switching (CSW) is a common phenomenon among multilingual speakers where multiple languages are used in a single discourse or utterance. Mixed-language utterances may still contain grammatical errors, yet most existing Grammar Error Correction (GEC) systems have been trained on monolingual data and not developed with CSW in mind. In this work, we conduct the first exploration into the use of GEC systems on CSW text. Through this exploration, we propose a novel method of generating synthetic CSW GEC datasets by translating different spans of text within existing GEC corpora. We then investigate different methods of selecting these spans based on CSW ratio, switch-point factor and linguistic constraints, and identify how they affect the performance of GEC systems on CSW text. Our best model achieves an average increase of 1.57 $F_{0.5}$ across 3 CSW test sets (English-Chinese, English-Korean and English-Japanese) without affecting the model's performance on a monolingual dataset. We furthermore discovered that models trained on one CSW language generalise relatively well to other typologically similar CSW languages.

en cs.CL
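
The $F_{0.5}$ score cited in the abstract above is the F-beta measure with beta = 0.5, which weights precision more heavily than recall. The sketch below shows the standard formula only; the function name and the sample precision and recall values are hypothetical and do not come from the paper.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 favours precision over recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical values (not from the paper): precision 0.6, recall 0.3.
score = f_beta(0.6, 0.3)
print(f"F0.5 = {score:.3f}")

# With beta = 1 the same function reduces to the familiar F1 score.
f1 = f_beta(0.6, 0.3, beta=1.0)
print(f"F1   = {f1:.3f}")
```

With these sample values $F_{0.5}$ comes out higher than $F_1$, because precision exceeds recall and beta = 0.5 rewards that.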
arXiv Open Access 2024
BgGPT 1.0: Extending English-centric LLMs to other languages

Anton Alexandrov, Veselin Raychev, Dimitar I. Dimitrov et al.

We present BgGPT-Gemma-2-27B-Instruct and BgGPT-Gemma-2-9B-Instruct: continually pretrained and fine-tuned versions of Google's Gemma-2 models, specifically optimized for Bulgarian language understanding and generation. Leveraging Gemma-2's multilingual capabilities and over 100 billion tokens of Bulgarian and English text data, our models demonstrate strong performance in Bulgarian language tasks, setting a new standard for language-specific AI models. Our approach maintains the robust capabilities of the original Gemma-2 models, ensuring that the English language performance remains intact. To preserve the base model capabilities, we incorporate continual learning strategies based on recent Branch-and-Merge techniques as well as thorough curation and selection of training data. We provide detailed insights into our methodology, including the release of model weights with a commercial-friendly license, enabling broader adoption by researchers, companies, and hobbyists. Further, we establish a comprehensive set of benchmarks based on non-public educational data sources to evaluate models on Bulgarian language tasks as well as safety and chat capabilities. Our findings demonstrate the effectiveness of fine-tuning state-of-the-art models like Gemma 2 to enhance language-specific AI applications while maintaining cross-lingual capabilities.

en cs.CL, cs.AI
arXiv Open Access 2024
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters

Li Lucy, Suchin Gururangan, Luca Soldaini et al.

Large language models' (LLMs) abilities are drawn from their pretraining data, and model development begins with data curation. However, decisions around what data is retained or removed during this initial stage are under-scrutinized. In our work, we ground web text, which is a popular pretraining data source, to its social and geographic contexts. We create a new dataset of 10.3 million self-descriptions of website creators, and extract information about who they are and where they are from: their topical interests, social roles, and geographic affiliations. Then, we conduct the first study investigating how ten "quality" and English language identification (langID) filters affect webpages that vary along these social dimensions. Our experiments illuminate a range of implicit preferences in data curation: we show that some quality classifiers act like topical domain filters, and langID can overlook English content from some regions of the world. Overall, we hope that our work will encourage a new line of research on pretraining data curation practices and its social implications.

en cs.CL
S2 Open Access 2012
Induction of Oxidative Stress in Kidney

E. Ozbek

Oxidative stress has a critical role in the pathophysiology of several kidney diseases, and many complications of these diseases are mediated by oxidative stress, oxidative stress-related mediators, and inflammation. Several systemic diseases such as hypertension, diabetes mellitus, and hypercholesterolemia; infection; antibiotics, chemotherapeutics, and radiocontrast agents; and environmental toxins, occupational chemicals, radiation, smoking, as well as alcohol consumption induce oxidative stress in kidney. We searched the literature using PubMed, MEDLINE, and Google scholar with “oxidative stress, reactive oxygen species, oxygen free radicals, kidney, renal injury, nephropathy, nephrotoxicity, and induction”. The literature search included only articles written in English language. Letters or case reports were excluded. Scientific relevance, for clinical studies target populations, and study design, for basic science studies full coverage of main topics, are eligibility criteria for articles used in this paper.

370 citations en Medicine
DOAJ Open Access 2023
The boosting academic literacy for young learners’ reading comprehension and character building

Elli Setiyo wahyuni, Sri Wahyuni

Reading habits are important to introduce to young learners. Through academic literacy, students not only comprehend social and science texts but also develop their character. To implement the literacy program in these lessons, a team-game tournament is used that consists of a presentation, group discussion, reading tournament, and an award. The study uses a qualitative research method that applies a semi-structured observation guide and interviews with sixth-grade students in one primary school in Indonesia. The observation indicates teaching techniques for improving reading comprehension, and the interviews show the impacts of academic literacy in the team-game tournament (TGT) on students’ knowledge. The findings show that academic literacy improves understanding of social and science texts, and of moral values. This is an integrated skill because students gain both academic knowledge and character building. Students learn how to comprehend passages related to social studies and science, as well as how to communicate the information through presentation skills. Academic literacy is shown to be an effective strategy to improve understanding and fluency in written and spoken forms. The study suggests that the team-game tournament provides practice not only in reading but also in writing and speaking skills. The team-game tournament is an engaging learning method that offers enjoyable activities for teaching English to young learners. HIGHLIGHTS: • Academic literacy in team-game tournament (TGT) strategy is suggested to boost young learners’ reading comprehension and character building. • Academic literacy improves the students’ understanding of social and science texts, and moral values. • Academic literacy promotes fun learning in writing, reading, and speaking skills.

Education (General), English language
DOAJ Open Access 2023
English Extracurricular Activities in Developing Language Intelligence at SMP Plus Al Munawaroh Jember

Sofiatil Aliyah, Moch. Imam Machfudi, Suratinah Suratinah

Extracurricular activities are educational activities that take place outside of traditional classroom settings. These activities take place inside and/or outside of the school environment in order to broaden knowledge, improve skills, and internalize religious values or laws, as well as local, national, and global social norms. This is a qualitative study, which stresses the meaning and process of an activity rather than the outcome of an activity. By referring to the data taken from teacher and students’ involvement in the extracurricular activities and based on the results of observations and interviews, this study reveals that the majority of students are engaged in the extracurricular activities and benefit greatly from them. Aside from that, the activity facilitated teachers in carrying out coaching, direction, care, and supervision in order to achieve the desired results. Students' language intelligence in learning English at SMP Plus Al Munawaroh Tempurejo Jember was formed by an activity to communicate with others, both with their peers and with their teachers, using English even though it was often mixed up with their native language. Students were also engaged to perform their learning outcomes in many stages of learning.

English language, English literature
DOAJ Open Access 2023
Acute renal colic / renal stones – problematic common disease. Current knowledge on etiology, diagnosis and treatments

Rafał Bakalarczyk, Adrian Rejmer, Natalia Woś et al.

Introduction: An attack of renal colic as a result of kidney stones is one of the most common urological emergencies, with a wide spectrum of severity that generates high morbidity and high health costs. However, there is no single pharmacological treatment regimen for its acute phase. Renal colic is described as one of the worst types of pain, and effective analgesia in the shortest possible time is of paramount importance. Aim of the study: Summary of current knowledge on the treatment of acute renal colic / renal stones. The main purpose of our work is to review and summarize the various recommended options for the treatment of acute renal colic attack and prevention of recurrent urolithiasis depending on the type of stones, and also to assess the efficacy and safety profile of various drugs used in this ailment and to propose a practical treatment regimen. Material and methods: We conducted a literature search in PubMed using the terms "acute renal colic," "renal colic treatment," "anti-inflammatory drugs," "renal stones," and "fluid therapy." The most relevant clinical trials, meta-analyses and systematic reviews published from January 1, 2017 to January 1, 2022 in English and Spanish were reviewed. Results: In various studies reviewed, non-steroidal anti-inflammatory drugs (NSAIDs) show better pain control, with lower doses and fewer side effects than treatment with opioids. Complete cure in kidney stones is rare due to the frequent recurrence of the disease. Although acute attacks of renal colic are rare in the patients studied, many patients show improvement after taking nonsteroidal anti-inflammatory drugs. Opioids and fluid therapy are reserved for subsequent lines of treatment and also have proven benefits in relieving symptoms.

Education, Sports
DOAJ Open Access 2023
The effectiveness of combining the keyword mnemonic with retrieval practice on L2 vocabulary learning in Taiwanese EFL classes

Chia-Hui Chiu, Chad F. Hawkins

Previous research suggests that combining the keyword mnemonic and retrieval practice produces significant benefits for L2 vocabulary learning, but only for the receptive aspect. This study examines the effectiveness of combining the two methods for productive L2 vocabulary learning. Forty-two 8th-grade Taiwanese EFL learners participated in a four-week instructional treatment. They were divided into three groups with a similar spread of English vocabulary sizes. The learning materials consisted of 21 English words unknown to the participants and equally divided into three sets of equivalent difficulty. Three learning conditions (combined method, retrieval practice, and control) were rotated for each group so that each condition was applied to each set of words one time by one of the three groups. A productive vocabulary test was designed to measure the retention of the words at three different time points after the treatment. Repeated measures ANOVAs revealed that the combined method performed similarly to retrieval practice in the immediate and the one-week delayed posttests but significantly outperformed retrieval practice in the four-week delayed posttest. The predicted superiority of using the combined method over retrieval practice alone in the productive aspect of EFL vocabulary learning over the longer term was confirmed in the present study.

English language
DOAJ Open Access 2022
The Horrible Sepulture of Mannes Resoun: Intoxication and Medieval English Felony Law

Elizabeth Papp Kamali

The modern Anglo-American common law tends toward a hardline stance on intoxication, typically not treating it as an excuse to a criminal charge but offering a few well-guarded exceptions, most notably the idea in some jurisdictions that intoxication may be invoked to negate specific intent given its deleterious effect on cognitive capacity. A similar ambivalence toward the intersection of intoxication and criminal responsibility may be found in early English felony law, which offered no formal, intoxication-based exceptions to liability, but nevertheless countenanced jurors exercising their prudential judgment to treat intoxication as either an inculpatory or exculpatory factor in particular cases. Medieval English felony law treated drunkenness similarly to anger, recognizing that both conditions – which could be intertwined – often traced their roots to condemnable character formation and long-cultivated habit, and yet could result in a person’s detachment from their capacity to reason and exercise self-control while under the influence. In the legal context, drunkenness was not equated with insanity, which was presumptively exculpatory, despite the fact that the two conditions could result in similar effects on a defendant’s observable behavior. Evidence from non-legal texts, including vernacular literature and guides for confessors, helps explain the concerns medieval English judges and jurors brought with them to the task of felony adjudication when faced with alcohol-laced facts, revealing a world in which tavern culture ensured alcohol’s omnipresence, but in which drunkenness was nevertheless not generally available as an excuse, partial or otherwise, for allegedly felonious behavior.

Law, Political science
DOAJ Open Access 2022
Apprehension and educational outcomes among Hispanic students in the United States: The impact of Secure Communities

Rosa Weber

Prior research suggests that disruptive events, such as shocks induced by family instability, neighborhood violence, or relocation, tend to be detrimental for children’s educational outcomes, but findings are heterogeneous depending on the type of event. Limited evidence is available on how shocks resulting from immigration enforcement impact educational outcomes among targeted minority groups. This study contributes to the literature by assessing how a policy implementation in the US, Secure Communities, is related to the school district level achievement of Hispanic students. The Secure Communities program is a national level immigration enforcement policy that was rolled out on a county-by-county basis. The program has increased the risk of deportation and led to rising apprehension and insecurity among undocumented migrants and the wider Hispanic community. Using detailed information on the implementation of Secure Communities, data from the Stanford Education Data Archive, and the Current Population Survey, this study estimates dynamic difference-in-differences models exploiting regional variation in the timing of the policy change to assess its impact on educational outcomes. Results show that the activation of Secure Communities is negatively associated with Hispanic students’ subsequent English language arts achievement, while white and black students’ achievement does not change. Findings further suggest that Hispanic students living in the South, rural areas, and areas with high proportions of likely undocumented migrants are disproportionately impacted by the program’s activation. By contrast, Hispanic students in sanctuary jurisdictions, which reduce the likelihood of deportation, are not impacted. These findings indicate that immigration enforcement can have negative consequences for educational and social inequalities in the United States.

Medicine, Science
DOAJ Open Access 2022
The difference between the effectiveness of body-weight-supported treadmill training combined with functional electrical stimulation and sole body-weight-supported treadmill training for improving gait parameters in stroke patients: A systematic review and meta-analysis

Jiaqi Wang, Liangyu Zhao, Yan Gao et al.

Background: Body-weight-supported treadmill training (BWSTT) combined with functional electrical stimulation (FES) is considered an effective intervention method to improve gait parameters in stroke patients. In this article, we compared the effect of BWSTT combined with FES and BWSTT only on gait parameters in stroke patients. Methods: Two researchers searched for literature published before January 5, 2021, in seven Chinese and English databases including PubMed, Web of Science, Cochrane Library, Ovid, CNKI, Wanfang Data, and VIP. Meta-analysis was then performed on various data collected, namely, 10 Meters Walking Test (10MWT), gait speed, Fugl-Meyer Assessment (FMA), Berg Balance Scale (BBS), Modified Barthel Index (MBI), Comprehensive Spasticity Scale (CSS), Functional Ambulation Category (FAC), and Ankle Range of Motion (AROM). Results: A total of 14 studies were included in the meta-analysis, in which 945 stroke patients participated. In these 14 studies, the participants were randomly divided into a test group and a control group. The test group received BWSTT combined with FES, while the control group received BWSTT only. Meta-analysis showed that when compared to BWSTT, BWSTT combined with FES had a better effect on FAC, AROM, 10MWT, CSS, MBI, FMA, gait speed, and BBS of stroke patients. However, the effect of BWSTT combined with FES on BBS was not significant in the medium exercise group when compared to that of BWSTT. Also, the effect of BWSTT combined with FES on gait speed was not significant in the large exercise group when compared to that of BWSTT only. Conclusion: BWSTT combined with FES is more effective than BWSTT only for improving gait parameters in stroke patients. Systematic review registration: https://www.crd.york.ac.uk/prospero/#recordDetails, CRD42022299636.

Neurology. Diseases of the nervous system
arXiv Open Access 2022
Cross-lingual Dysarthria Severity Classification for English, Korean, and Tamil

Eun Jung Yeo, Kwanghee Choi, Sunhee Kim et al.

This paper proposes a cross-lingual classification method for English, Korean, and Tamil, which employs both language-independent features and language-unique features. First, we extract thirty-nine features from diverse speech dimensions such as voice quality, pronunciation, and prosody. Second, feature selections are applied to identify the optimal feature set for each language. A set of shared features and a set of distinctive features are distinguished by comparing the feature selection results of the three languages. Lastly, automatic severity classification is performed, utilizing the two feature sets. Notably, the proposed method removes different features by languages to prevent the negative effect of unique features for other languages. Accordingly, eXtreme Gradient Boosting (XGBoost) algorithm is employed for classification, due to its strength in imputing missing data. In order to validate the effectiveness of our proposed method, two baseline experiments are conducted: experiments using the intersection set of mono-lingual feature sets (Intersection) and experiments using the union set of mono-lingual feature sets (Union). According to the experimental results, our method achieves better performance with a 67.14% F1 score, compared to 64.52% for the Intersection experiment and 66.74% for the Union experiment. Further, the proposed method attains better performances than mono-lingual classifications for all three languages, achieving 17.67%, 2.28%, 7.79% relative percentage increases for English, Korean, and Tamil, respectively. The result specifies that commonly shared features and language-specific features must be considered separately for cross-language dysarthria severity classification.

en cs.CL, cs.SD
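
The abstract above reports results as relative percentage increases. As a small illustration of that calculation, the sketch below applies it to the three F1 scores the abstract does state (proposed method, Intersection baseline, Union baseline). The resulting percentages are derived here for illustration and are not figures reported by the authors, whose stated relative increases are measured against monolingual baselines that the abstract does not list.

```python
def relative_increase(new, baseline):
    """Relative percentage increase of `new` over `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# F1 scores (in percent) as stated in the abstract.
proposed_f1 = 67.14
intersection_f1 = 64.52
union_f1 = 66.74

gain_over_intersection = relative_increase(proposed_f1, intersection_f1)
gain_over_union = relative_increase(proposed_f1, union_f1)

print(f"vs. Intersection: {gain_over_intersection:.2f}% relative increase")
print(f"vs. Union:        {gain_over_union:.2f}% relative increase")
```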
arXiv Open Access 2022
Pirá: A Bilingual Portuguese-English Dataset for Question-Answering about the Ocean

André F. A. Paschoal, Paulo Pirozelli, Valdinei Freire et al.

Current research in natural language processing is highly dependent on carefully produced corpora. Most existing resources focus on English; some resources focus on languages such as Chinese and French; few resources deal with more than one language. This paper presents the Pirá dataset, a large set of questions and answers about the ocean and the Brazilian coast both in Portuguese and English. Pirá is, to the best of our knowledge, the first QA dataset with supporting texts in Portuguese, and, perhaps more importantly, the first bilingual QA dataset that includes this language. The Pirá dataset consists of 2261 properly curated question/answer (QA) sets in both languages. The QA sets were manually created based on two corpora: abstracts related to the Brazilian coast and excerpts of United Nations reports about the ocean. The QA sets were validated in a peer-review process with the dataset contributors. We discuss some of the advantages as well as limitations of Pirá, as this new resource can support a set of tasks in NLP such as question-answering, information retrieval, and machine translation.
