TY - JOUR
AU - Loh, De Rong
AU - Hill, Elliot D
AU - Liu, Nan
AU - Dawson, Geraldine
AU - Engelhard, Matthew M
PY - 2025
DA - 2025/3/27
TI - Limitations of Binary Classification for Long-Horizon Diagnosis Prediction and Advantages of a Discrete-Time Time-to-Event Approach: Empirical Analysis
JO - JMIR AI
SP - e62985
VL - 4
KW - machine learning
KW - artificial intelligence
KW - deep learning
KW - predictive models
KW - practical models
KW - early detection
KW - electronic health records
KW - right-censoring
KW - survival analysis
KW - distributional shifts
AB - Background: A major challenge in using electronic health records (EHRs) is the inconsistency of patient follow-up, resulting in right-censored outcomes. This becomes particularly problematic in long-horizon event prediction, such as predicting autism and attention-deficit/hyperactivity disorder (ADHD) diagnoses, where a significant number of patients are lost to follow-up before the outcome can be observed. Consequently, fully supervised methods such as binary classification (BC), which are trained to predict observed diagnoses, are substantially affected by the probability of sufficient follow-up, leading to biased results. Objective: This empirical analysis aims to characterize BC's inherent limitations for long-horizon diagnosis prediction from EHRs and to quantify the benefits of a specific time-to-event (TTE) approach, the discrete-time neural network (DTNN). Methods: Records within the Duke University Health System EHR were analyzed, extracting features such as ICD-10 (International Classification of Diseases, Tenth Revision) diagnosis codes, medications, laboratory results, and procedures. We compared a DTNN to 3 BC approaches and a deep Cox proportional hazards (DCPH) model across 4 clinical conditions to examine distributional patterns across various subgroups. Time-varying area under the receiver operating characteristic curve (AUCt) and time-varying average precision (APt) were our primary evaluation metrics. Results: TTE models consistently had comparable or higher AUCt and APt than BC for all conditions. At clinically relevant operating time points, the area under the receiver operating characteristic curve (AUC) values for DTNN_YOB≤2020 and DCPH_YOB≤2020 (YOB: year of birth) were 0.70 (95% CI 0.66‐0.77) and 0.72 (95% CI 0.66‐0.78) at t=5 for autism, 0.72 (95% CI 0.65‐0.76) and 0.68 (95% CI 0.62‐0.74) at t=7 for ADHD, 0.72 (95% CI 0.70‐0.75) and 0.71 (95% CI 0.69‐0.74) at t=1 for recurrent otitis media, and 0.74 (95% CI 0.68‐0.82) and 0.71 (95% CI 0.63‐0.77) at t=1 for food allergy, compared to 0.60 (95% CI 0.55‐0.66), 0.47 (95% CI 0.40‐0.54), 0.73 (95% CI 0.70‐0.75), and 0.77 (95% CI 0.71‐0.82) for BC_YOB≤2020, respectively. The probabilities predicted by BC models were positively correlated with censoring times, particularly for autism and ADHD prediction. Filtering strategies based on YOB or length of follow-up only partially corrected these biases. In subgroup analyses, only the DTNN predicted diagnosis probabilities that accurately reflected actual clinical prevalence and temporal trends. Conclusions: BC models substantially underpredicted diagnosis likelihood and inappropriately assigned lower probability scores to individuals with earlier censoring. Common filtering strategies did not adequately address this limitation. TTE approaches, particularly the DTNN, effectively mitigated bias from the censoring distribution, resulting in superior discrimination and calibration performance and more accurate prediction of clinical prevalence. Machine learning practitioners should recognize the limitations of BC for long-horizon diagnosis prediction and adopt TTE approaches. The DTNN in particular is well suited to mitigate the effects of right-censoring and maximize prediction performance in this setting.
SN - 2817-1705
UR - https://ai.jmir.org/2025/1/e62985
UR - https://doi.org/10.2196/62985
DO - 10.2196/62985
ID - info:doi/10.2196/62985
ER -
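Editor's note: for readers unfamiliar with the discrete-time TTE formulation referenced in the abstract, the sketch below illustrates the general idea: a network outputs one conditional hazard per time bin, and the likelihood uses only the bins observed before censoring, so right-censored patients still contribute to training without biasing the model toward "no diagnosis." This is a minimal illustrative sketch in PyTorch, not the authors' implementation; the class name, bin count, feature dimension, and helper functions are assumptions.

```python
# Minimal sketch of a discrete-time time-to-event (TTE) model (assumed names,
# not the paper's code). One conditional hazard per yearly age bin; the loss
# masks out bins that were never observed for right-censored patients.
import torch
import torch.nn as nn

NUM_BINS = 8        # e.g., yearly intervals from birth to age 8 (assumption)
NUM_FEATURES = 128  # size of the EHR feature vector (assumption)


class DiscreteTimeNN(nn.Module):
    def __init__(self, num_features: int, num_bins: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(num_features, hidden), nn.ReLU(),
            nn.Linear(hidden, num_bins),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Conditional hazard h_k = P(event in bin k | no event before bin k)
        return torch.sigmoid(self.body(x))


def discrete_time_loss(hazards, event_bin, observed):
    """Censoring-aware negative log-likelihood.

    hazards:   (N, K) predicted conditional hazards
    event_bin: (N,)   bin of the diagnosis, or last fully observed bin if censored
    observed:  (N,)   1.0 if diagnosed, 0.0 if right-censored
    """
    n, k = hazards.shape
    bins = torch.arange(k).expand(n, k)
    before = (bins < event_bin.unsqueeze(1)).float()  # bins survived event-free
    at = (bins == event_bin.unsqueeze(1)).float()     # bin of event / censoring
    eps = 1e-7
    ll = (before * torch.log(1 - hazards + eps)).sum(dim=1)
    ll = ll + observed * (at * torch.log(hazards + eps)).sum(dim=1)
    ll = ll + (1 - observed) * (at * torch.log(1 - hazards + eps)).sum(dim=1)
    return -ll.mean()


def risk_by_age(hazards):
    # Cumulative diagnosis probability by the end of each bin:
    # F(t) = 1 - prod_{k<=t} (1 - h_k)
    return 1 - torch.cumprod(1 - hazards, dim=1)


if __name__ == "__main__":
    model = DiscreteTimeNN(NUM_FEATURES, NUM_BINS)
    x = torch.randn(32, NUM_FEATURES)
    hazards = model(x)
    event_bin = torch.randint(0, NUM_BINS, (32,))
    observed = torch.randint(0, 2, (32,)).float()
    print(discrete_time_loss(hazards, event_bin, observed).item())
    print(risk_by_age(hazards).shape)  # (32, NUM_BINS) long-horizon risks
```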
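The time-varying metrics named in the abstract (AUCt and APt) can be approximated as follows: at each horizon t, patients diagnosed by t are positives, patients still event-free and under follow-up at t are negatives, and patients censored before t are excluded. This simplified sketch (no inverse-probability-of-censoring weighting) does not reproduce the paper's exact evaluation; the function name is illustrative.

```python
# Simplified time-varying AUC(t) and AP(t) under the assumption that patients
# censored before the horizon t are dropped from the evaluation set.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score


def time_varying_metrics(risk_at_t, event_time, observed, t):
    """risk_at_t:  predicted probability of diagnosis by time t
    event_time: observed diagnosis or censoring time
    observed:   1 if diagnosed, 0 if right-censored
    """
    pos = (event_time <= t) & (observed == 1)  # diagnosed by horizon t
    neg = event_time > t                       # still event-free at t
    keep = pos | neg                           # drop patients censored before t
    y = pos[keep].astype(int)
    s = risk_at_t[keep]
    return roc_auc_score(y, s), average_precision_score(y, s)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    risk = rng.random(200)
    times = rng.integers(1, 9, 200).astype(float)
    obs = rng.integers(0, 2, 200)
    print(time_varying_metrics(risk, times, obs, t=5))
```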