Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization
Pengcheng He, Baolin Peng, Liyang Lu
et al.
This paper presents Z-Code++, a new pre-trained language model optimized for abstractive text summarization. The model extends the state-of-the-art encoder-decoder model using three techniques. First, we use a two-phase pre-training process to improve the model's performance on low-resource summarization tasks. The model is first pre-trained using text corpora for language understanding, and is then continually pre-trained on summarization corpora for grounded text generation. Second, we replace self-attention layers in the encoder with disentangled attention layers, where each word is represented using two vectors that encode its content and position, respectively. Third, we use fusion-in-encoder, a simple yet effective method of encoding long sequences in a hierarchical manner. Z-Code++ sets a new state of the art on 9 out of 13 text summarization tasks across 5 languages. Our model is parameter-efficient in that it outperforms the 600x larger PaLM-540B on XSum, and the fine-tuned 200x larger GPT3-175B on SAMSum. In zero-shot and few-shot settings, our model substantially outperforms the competing models.
Moore's Law is dead, long live Moore's Law!
Nick Zhang
Moore's Law has been used by the semiconductor industry as a predictive indicator of its progress, and it has become a self-fulfilling prophecy. More people now tend to agree that the original Moore's Law has started to falter. This paper proposes a possible quantitative modification to Moore's Law that can also cover its derivative laws. It is intended to more accurately predict the roadmap of chips' performance and energy consumption.
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
Pengcheng He, Jianfeng Gao, Weizhu Chen
This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model by replacing masked language modeling (MLM) with replaced token detection (RTD), a more sample-efficient pre-training task. Our analysis shows that vanilla embedding sharing in ELECTRA hurts training efficiency and model performance. This is because the training losses of the discriminator and the generator pull token embeddings in different directions, creating the "tug-of-war" dynamics. We thus propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics, improving both training efficiency and the quality of the pre-trained model. We have pre-trained DeBERTaV3 using the same settings as DeBERTa to demonstrate its exceptional performance on a wide range of downstream natural language understanding (NLU) tasks. Taking the GLUE benchmark with eight tasks as an example, the DeBERTaV3 Large model achieves a 91.37% average score, which is 1.37% over DeBERTa and 1.91% over ELECTRA, setting a new state-of-the-art (SOTA) among the models with a similar structure. Furthermore, we have pre-trained a multi-lingual model mDeBERTa and observed a larger improvement over strong baselines compared to English models. For example, the mDeBERTa Base achieves a 79.8% zero-shot cross-lingual accuracy on XNLI and a 3.6% improvement over XLM-R Base, creating a new SOTA on this benchmark. We have made our pre-trained models and inference code publicly available at https://github.com/microsoft/DeBERTa.
Edsger W. Dijkstra: a Commemoration
Krzysztof R. Apt, Tony Hoare
This article is a multiauthored portrait of Edsger Wybe Dijkstra that consists of testimonials written by several friends, colleagues, and students of his. It provides unique insights into his personality, working style and habits, and his influence on other computer scientists, as a researcher, teacher, and mentor.
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Pengcheng He, Xiaodong Liu, Jianfeng Gao
et al.
Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions, respectively. Second, an enhanced mask decoder is used to incorporate absolute positions in the decoding layer to predict the masked tokens in model pre-training. In addition, a new virtual adversarial training method is used for fine-tuning to improve models' generalization. We show that these techniques significantly improve the efficiency of model pre-training and the performance of both natural language understanding (NLU) and natural language generation (NLG) downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9% (90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). Notably, we scale up DeBERTa by training a larger version that consists of 48 Transformer layers with 1.5 billion parameters. The significant performance boost makes the single DeBERTa model surpass the human performance on the SuperGLUE benchmark (Wang et al., 2019a) for the first time in terms of macro-average score (89.9 versus 89.8), and the ensemble DeBERTa model sits atop the SuperGLUE leaderboard as of January 6, 2021, outperforming the human baseline by a decent margin (90.3 versus 89.8).
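The disentangled attention score described in the abstract above, where content and relative-position vectors contribute separate terms, can be sketched numerically. The following is a minimal single-head illustration, not the released implementation: all names, sizes, and the clamping of relative distances are our simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)
seq, d, k = 4, 8, 3   # sequence length, hidden size, max relative distance

H = rng.normal(size=(seq, d))    # content vector for each token
P = rng.normal(size=(2 * k, d))  # shared relative-position embeddings

Wq_c, Wk_c = rng.normal(size=(d, d)), rng.normal(size=(d, d))  # content projections
Wq_r, Wk_r = rng.normal(size=(d, d)), rng.normal(size=(d, d))  # position projections

def rel_index(i, j, k):
    """Clamp the relative distance i - j into the bucket range [0, 2k)."""
    return int(np.clip(i - j + k, 0, 2 * k - 1))

Qc, Kc = H @ Wq_c, H @ Wk_c   # queries/keys from content
Qr, Kr = P @ Wq_r, P @ Wk_r   # queries/keys from relative positions

A = np.zeros((seq, seq))
for i in range(seq):
    for j in range(seq):
        r = rel_index(i, j, k)
        A[i, j] = (Qc[i] @ Kc[j]      # content-to-content term
                   + Qc[i] @ Kr[r]    # content-to-position term
                   + Qr[r] @ Kc[j])   # position-to-content term

A /= np.sqrt(3 * d)  # scale for the three summed terms
weights = np.exp(A) / np.exp(A).sum(axis=1, keepdims=True)  # row-wise softmax
```

The three additive terms are what "disentangled" refers to: content and position interact through separate projection matrices rather than through a single summed input embedding.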
Edsger Dijkstra. The Man Who Carried Computer Science on His Shoulders
Krzysztof R. Apt
This is a biographical essay about Edsger Wybe Dijkstra.
Life, The Mind, and Everything
Gary R. Prok
The incompleteness theorems of Gödel, Turing, and Chaitin, and Algorithmic Information Theory, have profound epistemological implications. Incompleteness limits our ability to ever understand every observable phenomenon in the universe. It limits the ability of evolutionary processes to find optimal solutions. It limits the detectability of machine consciousness. This is an effort to convey these thoughts and results in a somewhat entertaining manner.
Les connaissances de la toile (Knowledge on the Web)
Serge Abiteboul
How to manage knowledge on the Web.
Software Carpentry: Lessons Learned
Greg Wilson
Over the last 15 years, Software Carpentry has evolved from a week-long training course at the US national laboratories into a worldwide volunteer effort to raise standards in scientific computing. This article explains what we have learned along the way, the challenges we now face, and our plans for the future.
Removing Barriers to Interdisciplinary Research
Naomi Jacobs, Martyn Amos
A significant amount of high-impact contemporary scientific research occurs where biology, computer science, engineering and chemistry converge. Although programmes have been put in place to support such work, the complex dynamics of interdisciplinarity are still poorly understood. In this paper we interrogate the nature of interdisciplinary research and how we might measure its "success", identify potential barriers to its implementation, and suggest possible mechanisms for removing these impediments.
The non-anticipation of the asynchronous systems
Serban E. Vlad
Asynchronous systems are the models of the asynchronous circuits of digital electrical engineering, and non-anticipation is one of the most important properties in systems theory. Our present purpose is to introduce several concepts of non-anticipation for asynchronous systems.
Stop That Subversive Spreadsheet!
David Chadwick
This paper documents the formation of the European Spreadsheet Risks Interest Group (EuSpRIG, www.eusprig.org) and outlines some of the research undertaken and reported upon by interested parties in EuSpRIG publications.
Edsger Wybe Dijkstra (1930 -- 2002): A Portrait of a Genius
Krzysztof R. Apt
We discuss the scientific contributions of Edsger Wybe Dijkstra, his opinions and his legacy.
Ten Incredibly Dangerous Software Ideas
G. A. Maney
This is a rough draft synopsis of a book presently in preparation. This book provides a systematic critique of the software industry. This critique is accomplished using classical methods in practical design science.
One More Revolution to Make: Free Scientific Publishing
Krzysztof R. Apt
Computer scientists are in the position to create new, free high-quality journals. So what would it take?
Methods for scaling a large member base
Nathan Boeger
The technical challenges of scaling websites with large and growing member bases, such as social networking sites, are numerous. One of these challenges is how to evenly distribute the growing member base across all available resources. This paper explores various methods that address this issue. The techniques used in this paper can be generalized and applied to various other problems that need to distribute data evenly among a finite set of resources.
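One widely used scheme for the kind of even distribution the abstract describes is consistent hashing with virtual nodes. The sketch below is our illustration of that general technique under assumed names (`ConsistentHashRing`, `db1`..`db3`), not necessarily one of the methods the paper surveys:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map member IDs to servers; adding or removing a server
    relocates only roughly 1/N of the members."""

    def __init__(self, servers, vnodes=100):
        self.ring = []                       # (hash, server) pairs
        for s in servers:
            for v in range(vnodes):          # virtual nodes smooth the split
                self.ring.append((self._h(f"{s}#{v}"), s))
        self.ring.sort()
        self._keys = [h for h, _ in self.ring]

    @staticmethod
    def _h(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, member_id):
        """First ring position clockwise from the member's hash."""
        i = bisect.bisect(self._keys, self._h(str(member_id))) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["db1", "db2", "db3"])
counts = {}
for m in range(10000):
    s = ring.server_for(m)
    counts[s] = counts.get(s, 0) + 1
```

With 100 virtual nodes per server, 10,000 members land in three shares of roughly a third each, and the same member always maps to the same server.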
Multiple-Size Divide-and-Conquer Recurrences
Ming-Yang Kao
This short note reports a master theorem on tight asymptotic solutions to divide-and-conquer recurrences with more than one recursive term: for example, T(n) = 1/4 T(n/16) + 1/3 T(3n/5) + 4 T(n/100) + 10 T(n/300) + n^2.
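For recurrences of this shape, Akra-Bazzi-style reasoning determines the tight solution via the critical exponent p solving sum_i a_i * b_i^p = 1, compared against the driving term f(n). The short numerical sketch below works the abstract's example; it is our illustration of that standard analysis, not the note's own proof technique:

```python
# T(n) = 1/4 T(n/16) + 1/3 T(3n/5) + 4 T(n/100) + 10 T(n/300) + n^2.
# Each pair is (a_i, b_i) for a recursive term a_i * T(b_i * n).
terms = [(1/4, 1/16), (1/3, 3/5), (4, 1/100), (10, 1/300)]

def g(p):
    """g(p) = sum a_i b_i^p - 1; strictly decreasing since all b_i < 1."""
    return sum(a * b**p for a, b in terms) - 1.0

lo, hi = 0.0, 5.0          # g(0) > 0 and g(5) < 0 bracket the root
for _ in range(100):       # bisection to high precision
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
p = (lo + hi) / 2
```

Here p comes out below 1, so the additive f(n) = n^2 term dominates the recursive terms and T(n) = Theta(n^2).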
Real Time Models of the Asynchronous Circuits: The Delay Theory
Serban E. Vlad
This book chapter introduces the delay theory, whose purpose is the modeling of the asynchronous circuits of digital electrical engineering with ordinary and differential pseudo-Boolean equations.
Theory and practice
Donald E. Knuth
The author argues to Silicon Valley that the most important and powerful part of computer science is work that is simultaneously theoretical and practical. He particularly considers the intersection of the theory of algorithms and practical software development. He combines examples from the development of the TeX typesetting system with clever jokes, criticisms, and encouragements.
Some first thoughts on the stability of the asynchronous systems
Serban E. Vlad
The (non-initialized, non-deterministic) asynchronous systems (in the input-output sense) are multi-valued functions from m-dimensional signals to sets of n-dimensional signals, a concept inspired by the modeling of asynchronous circuits. Our purpose is to state the problem of their stability.