Skip to content

Commit 97eb8de

Browse files
author
Miltos Allamanis
committed
Add/Update stuff
1 parent d93038a commit 97eb8de

File tree

4 files changed

+41
-2
lines changed

4 files changed

+41
-2
lines changed
Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
---
2+
layout: publication
3+
title: "When Deep Learning Met Code Search"
4+
authors: J. Cambronero, H. Li, S. Kim, K. Sen, S. Chandra
5+
conference:
6+
year: 2019
7+
bibkey: cambronero2019deep
8+
additional_links:
9+
- {name: "ArXiV", url: "https://arxiv.org/abs/1905.03813"}
10+
---
11+
There have been multiple recent proposals on using deep neural networks for code search using natural language. Common across these proposals is the idea of embedding code and natural language queries, into real vectors and then using vector distance to approximate semantic correlation between code and the query. Multiple approaches exist for learning these embeddings, including unsupervised techniques, which rely only on a corpus of code examples, and supervised techniques, which use an aligned corpus of paired code and natural language descriptions. The goal of this supervision is to produce embeddings that are more similar for a query and the corresponding desired code snippet.
12+
13+
Clearly, there are choices in whether to use supervised techniques at all, and if one does, what sort of network and training to use for supervision. This paper is the first to evaluate these choices systematically. To this end, we assembled implementations of state-of-the-art techniques to run on a common platform, training and evaluation corpora. To explore the design space in network complexity, we also introduced a new design point that is a minimal supervision extension to an existing unsupervised technique.
14+
15+
Our evaluation shows that: 1. adding supervision to an existing unsupervised technique can improve performance, though not necessarily by much; 2. simple networks for supervision can be more effective that more sophisticated sequence-based networks for code search; 3. while it is common to use docstrings to carry out supervision, there is a sizeable gap between the effectiveness of docstrings and a more query-appropriate supervision corpus.
Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,11 @@
11
---
22
layout: publication
3-
title: "Tree2Tree Neural Translation Model for Learning Source Code Changes"
3+
title: "CODIT: Code Editing with Tree-Based Neural Machine Translation"
44
authors: S. Chakraborty, M. Allamanis, B. Ray
55
conference:
66
year: 2018
77
bibkey: chakraborty2018tree2tree
8+
additional_links:
9+
- {name: "ArXiV", url: "https://arxiv.org/abs/1810.00314"}
810
---
9-
The way developers edit day-to-day code tend to be repetitive and often use existing code elements. Many researchers tried to automate this tedious task of code changes by learning from specific change templates and applied to limited scope. The advancement of Neural Machine Translation (NMT) and the availability of the vast open source software evolutionary data open up a new possibility of automatically learning those templates from the wild. However, unlike natural languages, for which NMT techniques were originally designed, source code and the changes have certain properties. For instance, compared to natural language source code vocabulary can be virtually infinite. Further, any good change in code should not break its syntactic structure. Thus, deploying state-of-the-art NMT models without domain adaptation may poorly serve the purpose. To this end, in this work, we propose a novel Tree2Tree Neural Machine Translation system to model source code changes and learn code change patterns from the wild. We realize our model with a change suggestion engine: CODIT. We train the model with more than 30k real-world changes and evaluate it with 6k patches. Our evaluation shows the effectiveness of CODIT in learning and suggesting abstract change templates. CODIT also shows promise in suggesting concrete patches and generating bug fixes.
11+
The way developers edit day-to-day code tends to be repetitive, often using existing code elements. Many researchers have tried to automate repetitive code changes by learning from specific change templates which are applied to limited scope. The advancement of Neural Machine Translation (NMT) and the availability of vast open-source evolutionary data opens up the possibility of automatically learning those templates from the wild. However, unlike natural languages, for which NMT techniques were originally devised, source code and its changes have certain properties. For instance, compared to natural language, source code vocabulary can be significantly larger. Further, good changes in code do not break its syntactic structure. Thus, deploying state-of-the-art NMT models without adapting the methods to the source code domain yields sub-optimal results. To this end, we propose a novel Tree based NMT system to model source code changes and learn code change patterns from the wild. We realize our model with a change suggestion engine: CODIT and train the model with more than 30k real-world changes and evaluate it on 6k patches. Our evaluation shows the effectiveness of CODIT in learning and suggesting patches.CODIT also shows promise generating bug fix patches.
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
---
2+
layout: publication
3+
title: "Towards Neural Decompilation"
4+
authors: O. Katz, Y. Olshaker, Y. Goldberg, E. Yahav
5+
conference:
6+
year: 2019
7+
bibkey: katz2019towards
8+
---
9+
We address the problem of automatic decompilation, converting a program in low-level representation back to a higher-level human-readable programming language. The problem of decompilation is extremely important for security researchers. Finding vulnerabilities and understanding how malware operates is much easier when done over source code.
10+
11+
The importance of decompilation has motivated the construction of hand-crafted rule-based decompilers. Such decompilers have been designed by experts to detect specific control-flow structures and idioms in low-level code and lift them to source level. The cost of supporting additional languages or new language features in these models is very high.
12+
13+
We present a novel approach to decompilation based on neural machine translation. The main idea is to automatically learn a decompiler from a given compiler. Given a compiler from a source language S to a target language T , our approach automatically trains a decompiler that can translate (decompile) T back to S . We used our framework to decompile both LLVM IR and x86 assembly to C code with high success rates. Using our LLVM and x86 instantiations, we were able to successfully decompile over 97% and 88% of our benchmarks respectively.
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
---
2+
layout: publication
3+
title: "Inferring Javascript types using Graph Neural Networks"
4+
authors: J. Schrouff, K. Wohlfahrt, B. Marnette, L. Atkinson
5+
conference: Representation Learning on Graphs and Manifolds ICLR 2019 workshop
6+
year: 2019
7+
bibkey: schrouff2019inferring
8+
---
9+
The recent use of `Big Code' with state-of-the-art deep learning methods offers promising avenues to ease program source code writing and correction. As a first step towards automatic code repair, we implemented a graph neural network model that predicts token types for Javascript programs. The predictions achieve an accuracy above 90%, which improves on previous similar work.

0 commit comments

Comments
 (0)