
Currently submitted to: JMIR Medical Informatics

Date Submitted: Mar 12, 2025
Open Peer Review Period: Mar 25, 2025 - May 20, 2025
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Iterative LLM-Guided Sampling and Expert-annotated Benchmark Corpus for Harmful Suicide Content Detection

  • Kyumin Park; 
  • Myung Jae Baik; 
  • YeongJun Hwang; 
  • Yen Shin; 
  • HoJae Lee; 
  • Ruda Lee; 
  • Sang Min Lee; 
  • Je Young Hannah Sun; 
  • Ah Rah Lee; 
  • Si Yeun Yoon; 
  • Dong-ho Lee; 
  • Jihyung Moon; 
  • JinYeong Bak; 
  • Kyunghyun Cho; 
  • Jong-Woo Paik; 
  • Sungjoon Park

ABSTRACT

Background:

Harmful suicide content on the internet poses significant risks, as it can induce suicidal thoughts and behaviors, particularly among vulnerable populations. Despite global efforts, existing moderation approaches remain insufficient, especially in high-risk regions like South Korea, which has the highest suicide rate among OECD countries. Previous research has primarily focused on assessing the suicide risk of individuals rather than the harmfulness of content itself, highlighting a gap in automated detection systems for harmful suicide content.

Objective:

In this study, we aimed to develop an AI-driven system for classifying online suicide-related content into five levels: illegal, harmful, potentially harmful, harmless, and non-suicide-related. Additionally, we constructed a multi-modal benchmark dataset with expert annotations to improve content moderation and to help AI models detect and regulate harmful content more effectively.

Methods:

We collected 43,244 user-generated posts from various online sources, including social media, Q&A platforms, and online communities. To reduce the workload on human annotators, GPT-4 was used for pre-annotation, filtering and categorizing content before manual review by medical professionals. A task description document ensured consistency in classification. Ultimately, a benchmark dataset of 452 manually labeled entries was developed, including both Korean and English versions, to support AI-based moderation. The study also evaluated zero-shot and few-shot learning to determine the best AI approach for detecting harmful content.
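The few-shot setup described above can be sketched as a prompt-construction step for the LLM annotator. This is a minimal illustration, not the authors' actual pipeline: the prompt wording, function names, and example posts are assumptions; only the five labels come from the study's objective.

```python
# Hypothetical sketch of few-shot prompt construction for LLM-based
# pre-annotation. Labels follow the study; everything else is illustrative.

LABELS = [
    "illegal",
    "harmful",
    "potentially harmful",
    "harmless",
    "non-suicide-related",
]

def build_fewshot_prompt(task_description, examples, post):
    """Assemble a few-shot classification prompt.

    examples: list of (text, label) pairs drawn from the expert-labeled set.
    post: the new, unlabeled post to classify.
    """
    lines = [task_description, ""]
    for text, label in examples:
        lines.append(f"Post: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The model is asked to complete the final "Label:" line.
    lines.append(f"Post: {post}")
    lines.append("Label:")
    return "\n".join(lines)

prompt = build_fewshot_prompt(
    "Classify each post into one of: " + ", ".join(LABELS) + ".",
    [("I feel fine today.", "harmless")],
    "example post text",
)
```

In a zero-shot run the `examples` list would simply be empty, leaving only the task description, which is how the study's task description document would drive classification without labeled demonstrations.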

Results:

The multi-modal benchmark dataset showed that GPT-4 achieved the highest F1 scores (66.46 for illegal and 77.09 for harmful content detection). Image descriptions improved classification accuracy, while directly using raw images slightly decreased performance. Few-shot learning significantly enhanced detection, demonstrating that small but high-quality datasets could improve AI-driven moderation. However, translation challenges were observed, particularly in suicide-related slang and abbreviations, which were sometimes inaccurately conveyed in the English benchmark.
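The F1 scores reported above are the harmonic mean of precision and recall, expressed on a 0-100 scale. A minimal sketch of the metric, with illustrative counts that are not taken from the study:

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall.

    tp, fp, fn: true positives, false positives, false negatives
    for one class (e.g. "illegal" or "harmful").
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only: 80 true positives, 30 false positives,
# 40 false negatives, reported on a 0-100 scale as in the abstract.
score = 100 * f1_score(80, 30, 40)  # ≈ 69.57
```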

Conclusions:

This study provides a high-quality benchmark for AI-based suicide content detection, proving that LLMs can effectively assist in content moderation while reducing the burden on human moderators. Future work will focus on enhancing real-time detection and improving the handling of subtle or disguised harmful content.


 Citation

Please cite as:

Park K, Baik MJ, Hwang Y, Shin Y, Lee H, Lee R, Lee SM, Sun JYH, Lee AR, Yoon SY, Lee Dh, Moon J, Bak J, Cho K, Paik JW, Park S

Iterative LLM-Guided Sampling and Expert-annotated Benchmark Corpus for Harmful Suicide Content Detection

JMIR Preprints. 12/03/2025:73725

DOI: 10.2196/preprints.73725

URL: https://preprints.jmir.org/preprint/73725


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.