Results for "cs.CL" - JURNALIN

CrossRef Open Access 2019

Conductive nanofibrous Chitosan/PEDOT:PSS tissue engineering scaffolds

Ali Abedi, Mahdi Hasanzadeh, Lobat Tayebi

125 citations en

arXiv Open Access 2022

David Samuel, Jeremy Barnes, Robin Kurtz et al.

This paper demonstrates how a graph-based semantic parser can be applied to the task of structured sentiment analysis, directly predicting sentiment graphs from text. We advance the state of the art on 4 out of 5 standard benchmark sets. We release the source code, models and predictions.

en cs.CL

arXiv Open Access 2022

Using dependency parsing for few-shot learning in distributional semantics

Stefania Preda, Guy Emerson

In this work, we explore the novel idea of employing dependency parsing information in the context of few-shot learning, the task of learning the meaning of a rare word based on a limited amount of context sentences. Firstly, we use dependency-based word embedding models as background spaces for few-shot learning. Secondly, we introduce two few-shot learning methods which enhance the additive baseline model by using dependencies.

en cs.CL

arXiv Open Access 2022

Romantic-Computing

Elizabeth Horishny

In this paper we compare various text generation models' ability to write poetry in the style of early English Romanticism. These models include: Character-Level Recurrent Neural Networks with Long Short-Term Memory, Hugging Face's GPT-2, OpenAI's GPT-3, and EleutherAI's GPT-NEO. Quality was measured based on syllable count and coherence with the automatic evaluation metric GRUEN. Character-Level Recurrent Neural Networks performed far worse than transformer models, and as parameter size increased, the quality of transformer models' poems improved. These models are typically not compared in a creative context, and we are happy to contribute.

en cs.CL, cs.AI

arXiv Open Access 2022

Distilling Hypernymy Relations from Language Models: On the Effectiveness of Zero-Shot Taxonomy Induction

Devansh Jain, Luis Espinosa Anke

In this paper, we analyze zero-shot taxonomy learning methods which are based on distilling knowledge from language models via prompting and sentence scoring. We show that, despite their simplicity, these methods outperform some supervised strategies and are competitive with the current state-of-the-art under adequate conditions. We also show that statistical and linguistic properties of prompts dictate downstream performance.

en cs.CL, cs.AI

arXiv Open Access 2022

Adapting BigScience Multilingual Model to Unseen Languages

Zheng-Xin Yong, Vassilina Nikoulina

We benchmark different strategies for adding new languages (German and Korean) to BigScience's pretrained multilingual language model with 1.3 billion parameters, which currently supports 13 languages. We investigate the factors that affect the language adaptability of the model and the trade-offs between computational costs and expected performance.

en cs.CL, cs.AI

arXiv Open Access 2022

Detoxifying Language Models with a Toxic Corpus

Yoon A Park, Frank Rudzicz

Existing studies have investigated the tendency of autoregressive language models to generate contexts that exhibit undesired biases and toxicity. Various debiasing approaches have been proposed, which are primarily categorized into data-based and decoding-based. In our study, we investigate the ensemble of the two debiasing paradigms, proposing to use a toxic corpus as an additional resource to reduce toxicity. Our results show that a toxic corpus can indeed help to reduce the toxicity of the language generation process substantially, complementing the existing debiasing methods.

en cs.CL

arXiv Open Access 2021

Reproducibility Report: Contextualizing Hate Speech Classifiers with Post-hoc Explanation

Kiran Purohit, Owais Iqbal, Ankan Mullick

This report evaluates the paper "Contextualizing Hate Speech Classifiers with Post-hoc Explanation" within the scope of the ML Reproducibility Challenge 2020. Our work focuses on both aspects constituting the paper: the method itself and the validity of the stated results. In the following sections, we describe the paper, related work, algorithmic frameworks, and our experiments and evaluations.

en cs.CL, cs.AI

arXiv Open Access 2021

The Human Evaluation Datasheet 1.0: A Template for Recording Details of Human Evaluation Experiments in NLP

Anastasia Shimorina, Anya Belz

This paper introduces the Human Evaluation Datasheet, a template for recording the details of individual human evaluation experiments in Natural Language Processing (NLP). Originally taking inspiration from seminal papers by Bender and Friedman (2018), Mitchell et al. (2019), and Gebru et al. (2020), the Human Evaluation Datasheet is intended to facilitate the recording of properties of human evaluations in sufficient detail, and with sufficient standardisation, to support comparability, meta-evaluation, and reproducibility tests.

en cs.CL

arXiv Open Access 2021

WVOQ at SemEval-2021 Task 6: BART for Span Detection and Classification

Cees Roele

A novel solution to span detection and classification is presented in which a BART EncoderDecoder model is used to transform textual input into a version with XML-like marked up spans. This markup is subsequently translated to an identification of the beginning and end of fragments and of their classes. Discussed is how pre-training methodology both explains the relative success of this method and its limitations. This paper reports on participation in task 6 of SemEval-2021: Detection of Persuasion Techniques in Texts and Images.

en cs.CL, cs.LG

arXiv Open Access 2021

Variable-Length Codes Independent or Closed with respect to Edit Relations

Jean Néraud

We investigate inference of variable-length codes in other domains of computer science, such as noisy information transmission or information retrieval-storage: in such topics, traditionally mostly constant-length codewords act. The study relies on the two concepts of independent and closed sets. We focus on those word relations whose images are computed by applying some peculiar combinations of deletion, insertion, or substitution. In particular, characterizations of variable-length codes that are maximal in the families of $τ$-independent or $τ$-closed codes are provided.

en cs.CL, cs.DM

CrossRef Open Access 2019

Structural, elastic, electronic and optical properties of lead-free halide double perovskite Cs₂AgBiX₆ (X = Cl, Br, and I)

Madhvendra Nath Tripathi, Aditi Saha, Santosh Singh

63 citations en

arXiv Open Access 2020

Unsupervised Bilingual Lexicon Induction Across Writing Systems

Parker Riley, Daniel Gildea

Recent embedding-based methods in unsupervised bilingual lexicon induction have shown good results, but generally have not leveraged orthographic (spelling) information, which can be helpful for pairs of related languages. This work augments a state-of-the-art method with orthographic features, and extends prior work in this space by proposing methods that can learn and utilize orthographic correspondences even between languages with different scripts. We demonstrate this by experimenting on three language pairs with different scripts and varying degrees of lexical similarity.

en cs.CL

arXiv Open Access 2020

A Benchmark Dataset of Check-worthy Factual Claims

Fatma Arslan, Naeemul Hassan, Chengkai Li et al.

In this paper we present the ClaimBuster dataset of 23,533 statements extracted from all U.S. general election presidential debates and annotated by human coders. The ClaimBuster dataset can be leveraged in building computational methods to identify claims that are worth fact-checking from the myriad of sources of digital or traditional media. The ClaimBuster dataset is publicly available to the research community, and it can be found at http://doi.org/10.5281/zenodo.3609356.

en cs.CL

arXiv Open Access 2020

Categories of Semantic Concepts

James Hefford, Vincent Wang, Matthew Wilson

Modelling concept representation is a foundational problem in the study of cognition and linguistics. This work builds on the confluence of conceptual tools from Gärdenfors semantic spaces, categorical compositional linguistics, and applied category theory to present a domain-independent and categorical formalism of 'concept'.

en cs.CL, cs.LO

arXiv Open Access 2020

A Comprehensive Survey on Aspect Based Sentiment Analysis

Kaustubh Yadav

Aspect Based Sentiment Analysis (ABSA) is the sub-field of Natural Language Processing that deals with splitting data into aspects and then extracting the sentiment information. ABSA is known to provide more information about the context than general sentiment analysis. In this study, our aim is to explore the various methodologies practiced while performing ABSA and to provide a comparative study. This survey paper discusses various solutions in depth and gives a comparison between them. It is divided into sections to give a holistic view of the process.

en cs.CL, cs.LG

arXiv Open Access 2020

Challenges and Thrills of Legal Arguments

Anurag Pallaprolu, Radha Vaidya, Aditya Swaroop Attawar

State-of-the-art attention based models, mostly centered around the transformer architecture, solve the problem of sequence-to-sequence translation using the so-called scaled dot-product attention. While this technique is highly effective for estimating inter-token attention, it does not answer the question of inter-sequence attention when we deal with conversation-like scenarios. We propose an extension, HumBERT, that attempts to perform continuous contextual argument generation using locally trained transformers.

en cs.CL, cs.LG

arXiv Open Access 2020

Notes on Coalgebras in Stylometry

Joël A. Doat

The syntactic behaviour of texts can vary greatly depending on their contexts (e.g. author, genre, etc.). From the standpoint of stylometry, it can be helpful to objectively measure this behaviour. In this paper, we discuss how coalgebras are used to formalise the notion of behaviour by embedding syntactic features of a given text into probabilistic transition systems. By introducing the behavioural distance, we are then able to quantitatively measure differences between points in these systems and thus compare features of different texts. Furthermore, the behavioural distance of points can be approximated by a polynomial-time algorithm.

en cs.CL

arXiv Open Access 2020

What Can We Do to Improve Peer Review in NLP?

Anna Rogers, Isabelle Augenstein

Peer review is our best tool for judging the quality of conference submissions, but it is becoming increasingly spurious. We argue that a part of the problem is that the reviewers and area chairs face a poorly defined task forcing apples-to-oranges comparisons. There are several potential ways forward, but the key difficulty is creating the incentives and mechanisms for their consistent implementation in the NLP community.

en cs.CL, cs.AI

arXiv Open Access 2019

Tetra-Tagging: Word-Synchronous Parsing with Linear-Time Inference

Nikita Kitaev, Dan Klein

We present a constituency parsing algorithm that, like a supertagger, works by assigning labels to each word in a sentence. In order to maximally leverage current neural architectures, the model scores each word's tags in parallel, with minimal task-specific structure. After scoring, a left-to-right reconciliation phase extracts a tree in (empirically) linear time. Our parser achieves 95.4 F1 on the WSJ test set while also achieving substantial speedups compared to current state-of-the-art parsers with comparable accuracies.

en cs.CL
