
Commit 3e88b47

ArturHD authored and mallamanis committed
Added publication from SANER 2019
Paper Learning-Based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection, https://ieeexplore.ieee.org/document/8668039
1 parent 966dd89 commit 3e88b47


1 file changed (+12, -0 lines)

@@ -0,0 +1,12 @@
---
layout: publication
title: Learning-based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection
authors: L. Büch, A. Andrzejak
conference: SANER 2019
year: 2019
bibkey: buech2019learning
additional_links:
- {name: "IEEEexplore", url: "https://ieeexplore.ieee.org/document/8668039"}
- {name: "website_pdf", url: "https://pvs.ifi.uni-heidelberg.de/publications/"}
---
Code clone detection remains a crucial challenge in maintaining software projects. Many classic approaches rely on handcrafted aggregation schemes, while recent work uses supervised or unsupervised learning. In this work, we study several aspects of aggregation schemes for code clone detection based on supervised learning. To this end, we implement an AST-based Recursive Neural Network. Firstly, our ablation study shows the influence of model choices and hyperparameters. We introduce error scaling as a way to effectively and efficiently address the class imbalance problem arising in code clone detection. Secondly, we study the influence of pretrained embeddings representing nodes in ASTs. We show that simply averaging all node vectors of a given AST yields a strong baseline aggregation scheme. Further, learned AST aggregation schemes greatly benefit from pretrained node embeddings. Finally, we show the importance of carefully separating training and test data by clone clusters to reliably measure the generalization of models learned with supervision.
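The abstract names two concrete ideas without spelling out how they are implemented. The first is the baseline that averages all node vectors of an AST. The second is splitting training and test data by clone clusters. What follows is a minimal Python sketch of both, under stated assumptions. It is not the authors' code.

Several pieces are hypothetical and used only for illustration:

- the `Node` class;
- the `mean_aggregate`, `cosine` and `split_by_cluster` helpers;
- a plain dict standing in for pretrained node embeddings (keyed by node type);
- cosine similarity as the clone score.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal AST node: a node-type label plus child nodes."""
    kind: str
    children: list[Node] = field(default_factory=list)


def iter_nodes(root: Node):
    """Yield every node of the tree in pre-order."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(node.children))


def mean_aggregate(root: Node, embeddings: dict[str, list[float]]) -> list[float]:
    """Average the embedding vectors of all nodes in the AST (structure is ignored)."""
    vectors = [embeddings[n.kind] for n in iter_nodes(root)]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, used here (an assumption) as a clone score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_by_cluster(fragment_to_cluster: dict[str, str],
                     test_fraction: float = 0.2, seed: int = 0):
    """Assign whole clone clusters to either train or test, never to both."""
    clusters = sorted(set(fragment_to_cluster.values()))
    rng = random.Random(seed)
    rng.shuffle(clusters)
    n_test = max(1, round(len(clusters) * test_fraction))
    test_clusters = set(clusters[:n_test])
    train = [f for f, c in fragment_to_cluster.items() if c not in test_clusters]
    test = [f for f, c in fragment_to_cluster.items() if c in test_clusters]
    return train, test


if __name__ == "__main__":
    # Toy "pretrained" embeddings keyed by node type (hypothetical values).
    emb = {"Block": [1.0, 0.0], "Return": [0.0, 1.0], "Name": [1.0, 1.0]}
    a = Node("Block", [Node("Return", [Node("Name")])])
    b = Node("Block", [Node("Name"), Node("Return")])
    print(cosine(mean_aggregate(a, emb), mean_aggregate(b, emb)))
    frags = {"f1": "c1", "f2": "c1", "f3": "c2", "f4": "c2", "f5": "c3"}
    print(split_by_cluster(frags))
```

Mean pooling discards the tree structure entirely. In the demo, trees `a` and `b` have different shapes but the same node types, so they receive identical vectors. This is exactly what a learned recursive aggregation is meant to improve on.

Splitting by cluster rather than by individual fragment keeps near-duplicates of a test fragment out of the training set. A random fragment-level split would let them leak into training.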
