Analysis of Tree-Structured Architectures for Code Generation
Abstract
Code generation is the task of generating code snippets from user specifications written in natural language. Leveraging the linguistically motivated hierarchical structure of the input can benefit code generation, especially since the specifications are complex sentences containing multiple variables and operations over various data structures. Moreover, recent advances in Transformer architectures have led to improved performance with tree-to-tree style generation for other seq2seq tasks, e.g., machine translation. Hence, we present an empirical analysis of the significance of input parse trees for code generation. We run text-to-tree, linearized tree-to-tree, and structured tree-to-tree models, using constituency-based parse trees as input, where the target is the Abstract Syntax Tree (AST) of the code. We evaluate our models on the Python-based code generation dataset CoNaLa and the semantic parsing dataset ATIS. We find that constituency trees encoded using a structure-aware model improve performance on both datasets. We also provide an analysis of the aspects of the input parse trees that are most impactful. For instance, we find that structure-aware encodings are better at modelling inputs with multiple variables and at capturing long-range dependencies for code generation.
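To make the notions of an AST target and a "linearized" tree concrete, the following is a minimal sketch that uses Python's standard `ast` module to parse a snippet and flatten its tree into a bracketed token sequence. It is an illustration only and not the paper's preprocessing: the function name `linearize`, the bracketing scheme, and the example snippet are assumptions made here.

```python
import ast


def linearize(node):
    """Flatten an AST into a bracketed, depth-first token list.

    Hypothetical helper for illustration; the bracketing scheme is an
    assumption, not the format used in the paper.
    """
    tokens = ["(", type(node).__name__]
    # Keep identifiers and literals so the sequence still carries content.
    if isinstance(node, ast.Name):
        tokens.append(node.id)
    elif isinstance(node, ast.Constant):
        tokens.append(repr(node.value))
    for child in ast.iter_child_nodes(node):
        # Skip Load/Store/Del context markers to keep the output short.
        if isinstance(child, ast.expr_context):
            continue
        tokens.extend(linearize(child))
    tokens.append(")")
    return tokens


# Example snippet chosen for illustration only.
tokens = linearize(ast.parse("x = sorted(items)"))
print(" ".join(tokens))
```

A sequence model can consume such a flat token list directly, whereas a structure-aware model would operate on the parent-child relations that the brackets only encode implicitly.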
Authors (3)
Samip Dahal
Adyasha Maharana
Mohit Bansal
Quick Access
- Year Published: 2021
- Language: en
- Total Citations: 10
- Source Database: Semantic Scholar
- DOI: 10.18653/v1/2021.findings-acl.384
- Access: Open Access