From 9a6edb9effe17c95b251fcd5a057fb6e7be85610 Mon Sep 17 00:00:00 2001
From: "Sergey V. Kovalchuk"
Date: Tue, 4 Apr 2023 22:01:10 +0300
Subject: [PATCH] Create kovalchuk2022human.markdown

---
 _publications/kovalchuk2022human.markdown | 11 +++++++++++
 1 file changed, 11 insertions(+)
 create mode 100644 _publications/kovalchuk2022human.markdown

diff --git a/_publications/kovalchuk2022human.markdown b/_publications/kovalchuk2022human.markdown
new file mode 100644
index 00000000..7bfd8a0f
--- /dev/null
+++ b/_publications/kovalchuk2022human.markdown
@@ -0,0 +1,11 @@
+---
+layout: publication
+title: "Human perceiving behavior modeling in evaluation of code generation models"
+authors: S. Kovalchuk, V. Lomshakov, A. Aliev
+conference: GEM
+year: 2022
+additional_links:
+  - {name: "ACLAnthology", url: "https://aclanthology.org/2022.gem-1.24/"}
+tags: ["code generation", "evaluation", "human evaluation"]
+---
+Within this study, we evaluated a series of code generation models based on CodeGen and GPTNeo to compare the metric-based performance and human evaluation. For a deeper analysis of human perceiving within the evaluation procedure we implemented a 5-level Likert scale assessment of the model output using a perceiving model based on the Theory of Planned Behavior (TPB). Through such analysis, we showed an extension of model assessment as well as a deeper understanding of the quality and applicability of generated code for practical question answering. The approach was evaluated with several model settings in order to assess diversity in quality and style of answers. With the TPB-based model, we showed different levels of perceiving the model result, namely personal understanding, agreement level, and readiness to use the particular code. With such analysis, we investigated a series of issues in code generation as natural language generation (NLG) problems observed in a practical context of programming question answering with code.
\ No newline at end of file