Commit f368db0

Added challenge and dataset.
1 parent 0e3e442 commit f368db0

2 files changed, 33 additions and 2 deletions

challenges/notthereyet/index.md

Lines changed: 23 additions & 2 deletions
@@ -35,14 +35,35 @@ Dataset used: <a href="/datasets#estimatingTypesDataset">[Estimating Types in St
 <div class="highlightitem">
 <h2>Establishing similarity of code fragments</h2>
 
-<p>Code similarity is a central challenge in many programming related applications, such as code search, automatic translation, and programming education.<p>
+<p>Code similarity is a central challenge in many programming related applications, such as code search, automatic translation, and programming education.</p>
 
 <p>There are many approaches for establishing code similarity and clone detection.
 However, most of these cannot capture similarity across programs using different APIs or algorithms, let alone programming languages.
 Furthermore, in some cases, equivalence is not what we are looking for.</p>
 
-<p>The goal is to capture connections between code fragments, such as semantic similarity or relatedness, which are more relaxed notions than strict equivalence.<p>
+<p>The goal is to capture connections between code fragments, such as semantic similarity or relatedness, which are more relaxed notions than strict equivalence.</p>
 
 <p>Dataset used: <a href="/datasets#like2dropsData">[Like2DropsData]</a></p>
 <p>Crowd-sourcing system used to collect data: <a href="http://like2drops.com">[Like2Drops]</a><br></p>
 </div>
+
+
+<div class="highlightitem" id="methodnaming">
+<h2>Method Naming Challenge</h2>
+
+<p>Developers pick the names of variables, classes and methods to reflect important aspects of their functionality.
+Learning to name snippets of methods is an important and hard machine learning problem and is a first step
+towards "understanding" what source code does from a machine learning lens.</p>
+
+<h4>Challenge Description</h4>
+<p>The goal of the challenge is to create a system that can predict the name of a method body, given solely its body.
+No features external to the body (e.g. the method signature) are included.
+We provide a training set that contains training pairs and a test set to perform evaluation.
+The evaluation consists of computing the <a href="https://en.wikipedia.org/wiki/F1_score">F1 score</a> over the subtokens
+of the predicted method name, compared to the actual name.
+Two baselines (tf-idf) and a convolutional attentional neural network are provided in the
+related publication.</p>
+
+<p>Dataset <a href="/datasets#methodnaming">[Method Naming Dataset]</a></p>
+<p>Related publication: <a href="http://arxiv.org/abs/1602.03001">[ArXiV]</a></p>
+</div>
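
Note on the evaluation added in this hunk: the challenge text says the score is an F1 over the subtokens of the predicted name versus the actual name. A minimal sketch of what that could look like is below. It assumes subtokens come from splitting on camelCase and underscores and that overlap is counted over subtoken multisets for a single name pair. The helper names `split_subtokens` and `subtoken_f1` are hypothetical, and the related publication's exact protocol may differ.

```python
import re
from collections import Counter


def split_subtokens(name):
    """Split an identifier into lowercase subtokens.

    Assumed convention: break on camelCase boundaries, underscores and
    digit runs, e.g. "parseXMLFile" -> ["parse", "xml", "file"].
    """
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def subtoken_f1(predicted, actual):
    """F1 between the subtoken multisets of a predicted and an actual name."""
    pred = Counter(split_subtokens(predicted))
    gold = Counter(split_subtokens(actual))
    # Multiset intersection: each subtoken counts at most as often as it
    # appears in both names.
    overlap = sum((pred & gold).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(gold.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Two of three gold subtokens recovered, no spurious ones:
    # precision 1.0, recall 2/3.
    print(subtoken_f1("getStream", "getInputStream"))
```

Scoring at the subtoken level gives partial credit to a prediction such as `getStream` for `getInputStream`, which an exact-match metric would count as a plain miss.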

datasets/index.md

Lines changed: 10 additions & 0 deletions
@@ -70,3 +70,13 @@ The dataset is provided in the form of a VM containing a Mongo database holding
 </div>
 
 -->
+
+<div class="highlightitem">
+<h1 id="methodnaming">Method Naming Dataset</h1>
+
+<p>This dataset includes the Java source code and JSON files containing the names and the tokens
+of the methods of 11 of the most popular GitHub Java projects.</p>
+
+<a href="http://groups.inf.ed.ac.uk/cup/codeattention/">[download dataset]</a></p>
+</div>
+
