@Article{info:doi/10.2196/65681, author="Ziegler, Jasmin and Erpenbeck, Pascal Marcel and Fuchs, Timo and Saibold, Anna and Volkmer, Paul-Christian and Schmidt, Guenter and Eicher, Johanna and Pallaoro, Peter and De Souza Falguera, Renata and Aubele, Fabio and Hagedorn, Marlien and Vansovich, Ekaterina and Raffler, Johannes and Ringshandl, Stephan and Kerscher, Alexander and Maurer, Karolin Julia and K{\"u}hnel, Brigitte and Schenkirsch, Gerhard and Kampf, Marvin and Kapsner, A. Lorenz and Ghanbarian, Hadieh and Spengler, Helmut and Soto-Rey, I{\~n}aki and Albashiti, Fady and Hellwig, Dirk and Ertl, Maximilian and Fette, Georg and Kraska, Detlef and Boeker, Martin and Prokosch, Hans-Ulrich and Gulden, Christian", title="Bridging Data Silos in Oncology with Modular Software for Federated Analysis on Fast Healthcare Interoperability Resources: Multisite Implementation Study", journal="J Med Internet Res", year="2025", month="Apr", day="15", volume="27", pages="e65681", keywords="real-world data", keywords="real-world evidence", keywords="oncology", keywords="electronic health records", keywords="federated analysis", keywords="HL7 FHIR", keywords="cancer registries", keywords="interoperability", keywords="observational research network", abstract="Background: Real-world data (RWD) from sources like administrative claims, electronic health records, and cancer registries offer insights into patient populations beyond the tightly regulated environment of randomized controlled trials. To leverage this and to advance cancer research, 6 university hospitals in Bavaria have established a joint research IT infrastructure. Objective: This study aimed to outline the design, implementation, and deployment of a modular data transformation pipeline that transforms oncological RWD into a Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) format and then into a tabular format in preparation for a federated analysis (FA) across the 6 Bavarian Cancer Research Center university hospitals. Methods: To harness RWD effectively, we designed a pipeline to convert the oncological basic dataset (oBDS) into HL7 FHIR format and prepare it for FA. The pipeline handles diverse IT infrastructures and systems while maintaining privacy by keeping data decentralized for analysis. To assess the functionality and validity of our implementation, we defined a cohort to address two specific medical research questions. We evaluated our findings by comparing the results of the FA with reports from the Bavarian Cancer Registry and the original data from local tumor documentation systems. Results: We conducted an FA of 17,885 cancer cases from 2021/2022. Breast cancer was the most common diagnosis at 3 sites, prostate cancer ranked in the top 2 at 4 sites, and malignant melanoma was notably prevalent. Gender-specific trends showed larynx and esophagus cancers were more common in males, while breast and thyroid cancers were more frequent in females. Discrepancies between the Bavarian Cancer Registry and our data, such as higher rates of malignant melanoma (3400/63,771, 5.3\% vs 1921/17,885, 10.7\%) and lower representation of colorectal cancers (8100/63,771, 12.7\% vs 1187/17,885, 6.6\%) likely result from differences in the time periods analyzed (2019 vs 2021/2022) and the scope of data sources used. The Bavarian Cancer Registry reports approximately 3 times more cancer cases than the 6 university hospitals alone. 
Conclusions: The modular pipeline successfully transformed oncological RWD across 6 hospitals, and the federated approach preserved privacy while enabling comprehensive analysis. Future work will add support for recent oBDS versions, automate data quality checks, and integrate additional clinical data. Our findings highlight the potential of federated health data networks and lay the groundwork for future research that can leverage high-quality RWD, aiming to contribute valuable knowledge to the field of cancer research. ", doi="10.2196/65681", url="https://www.jmir.org/2025/1/e65681" } @Article{info:doi/10.2196/68256, author="Trinkley, E. Katy and Simon, T. Steven and Rosenberg, A. Michael", title="Impact of an Alert-Based Inpatient Clinical Decision Support Tool to Prevent Drug-Induced Long QT Syndrome: Large-Scale, System-Wide Observational Study", journal="J Med Internet Res", year="2025", month="Apr", day="14", volume="27", pages="e68256", keywords="drug-induced QT prolongation", keywords="predictive modeling", keywords="electronic health records", keywords="clinical decision support", keywords="alert-based CDS system", keywords="tools", keywords="long QT syndrome", keywords="prevention", abstract="Background: Prevention of drug-induced QT prolongation (diLQTS) has been the focus of many system-wide clinical decision support (CDS) tools, which can be directly embedded within the framework of the electronic health record system and triggered to alert in high-risk patients when a known QT-prolonging medication is ordered. Justification for these CDS systems typically lies in the ability to accurately predict which patients are at high risk; however, it is not always evident that identification of risk alone is sufficient for appropriate CDS implementation. Objective: In this investigation, we examined the impact of a system-wide, alert-based, inpatient CDS tool designed to prevent diLQTS across 10 known QT-prolonging medications. Methods: We compared the risk of diLQTS, duration of hospitalization, and in- and out-of-hospital mortality before and after implementation of the CDS system in 178,097 hospitalizations among 102,847 patients. We also compared outcomes between those in whom an alert fired and those in whom it did not, and within the various responses to the alert by providers. Analyses were adjusted for age, sex, race and ethnicity, inpatient location, electrolyte values, and comorbidities, with the latter processed using an unsupervised clustering analysis applied to the top 500 most common medications and diagnosis codes, respectively. Results: We found that the simple, rule-based logic of the CDS (any prior electrocardiograph with heart rate--corrected QT interval (QTc) $\geq$500 ms) successfully identified patients at high risk of diLQTS with an odds ratio of 2.28 (95\% CI 2.10-2.47, P<.001) among those in whom it fired. However, we did not identify any impact on the risk of diLQTS based on provider responses or on the risk of inpatient, 3-month, 6-month, or 1-year mortality. When compared with rates prior to implementation, the risk of diLQTS was not significantly different after the CDS tools were deployed across the system, although mortality was significantly higher after the tools were implemented. Conclusions: We found that despite successful identification of high-risk patients for diLQTS, deployment of an alert-based CDS did not impact the risk of diLQTS. 
These findings suggest that quantification of high risk may be insufficient rationale for implementation of a CDS system and that hospital systems should consider evaluation of the system in its entirety prior to adoption to improve clinical outcomes. ", doi="10.2196/68256", url="https://www.jmir.org/2025/1/e68256" } @Article{info:doi/10.2196/65452, author="Stanton, M. Amelia and Trichtinger, A. Lauren and Kirakosian, Norik and Li, M. Simon and Kabel, E. Katherine and Irani, Kiyan and Bettis, H. Alexandra and O'Cleirigh, Conall and Liu, T. Richard and Liu, Qimin", title="Identifying Intersecting Factors Associated With Suicidal Thoughts and Behaviors Among Transgender and Gender Diverse Adults: Preliminary Conditional Inference Tree Analysis", journal="J Med Internet Res", year="2025", month="Apr", day="11", volume="27", pages="e65452", keywords="transgender and gender diverse adults", keywords="suicidality", keywords="intersectionality", keywords="conditional inference tree", keywords="electronic medical record", abstract="Background: Transgender and gender diverse (TGD) individuals are disproportionately impacted by suicidal thoughts and behaviors (STBs), and intersecting demographic and psychosocial factors may contribute to STB disparities. Objective: We aimed to identify intersecting factors associated with increased risk for suicidal ideation, intent, plan, and attempts in the US transgender population health survey (N=274), and determine age of onset for each outcome using conditional inference trees (CITs), which iteratively partition subgroups of greater homogeneity with respect to a specific outcome. Methods: In separate analyses, we restricted variables to those typically available within electronic medical records (EMRs) and then included variables not typically within EMRs. We also compared the results of the CIT analyses with logistic regressions and Cox proportional hazards models. Results: In restricted analyses, younger adults endorsed more frequent ideation and planning. Adults aged $\leq$26 years who identified as Black or with another race not listed had the highest risk for ideation, followed by White, Latine, or multiracial adults aged $\leq$39 years who identified as sexual minority individuals. Adults aged $\leq$39 years who identified as sexual minority individuals had the highest risk for suicide planning. Increased risk for suicidal intent was observed among those who identified as multiracial, whereas no variables were associated with previous suicide attempts. In EMR-specific analyses, age of onset for ideation and attempts were associated with gender identity, such that transgender women were older compared to transgender men and nonbinary adults when they first experienced ideation; for attempts, transgender women and nonbinary adults were older than transgender men. In expanded analyses, including additional psychosocial variables, psychiatric distress was associated with increased risk for ideation, intent, and planning. High distress combined with high health care stereotype threat was linked to increased risk for intent and for suicide planning. Only high everyday discrimination was associated with increased risk for lifetime attempts. Ages of onset were associated with gender identity for ideation, the intersection of psychiatric distress and drug use for suicide planning, and gender identity alone for suicide attempts. No factors were associated with age of onset for suicide intent in the expanded variable set. 
The results of the CIT analysis and the traditional regressions were comparable for ordinal outcomes, but CITs substantially outperformed the regressions for the age of onset outcomes. Conclusions: In this preliminary test of the CIT approach to identify subgroups of TGD adults with increased STB risk, the risk was primarily influenced by age, racial identity, and sexual minority identity, as well as psychiatric distress, health care stereotype threat, and discrimination. Identifying intersecting factors linked to STBs is vital for early risk detection among TGD individuals. This approach should be tested on a larger scale using EMR data to facilitate service provision to TGD individuals at increased risk for STBs. ", doi="10.2196/65452", url="https://www.jmir.org/2025/1/e65452" } @Article{info:doi/10.2196/70752, author="Matulis 3rd, Charles John and Greenwood, Jason and Eberle, Michele and Anderson, Benjamin and Blair, David and Chaudhry, Rajeev", title="Implementation of an Integrated, Clinical Decision Support Tool at the Point of Antihypertensive Medication Refill Request to Improve Hypertension Management: Controlled Pre-Post Study", journal="JMIR Med Inform", year="2025", month="Apr", day="11", volume="13", pages="e70752", keywords="clinical decision support systems", keywords="population health", keywords="hypertension", keywords="electronic health records", abstract="Background: Improving processes regarding the management of electronic health record (EHR) requests for chronic antihypertensive medication renewals may represent an opportunity to enhance blood pressure (BP) management at the individual and population level. Objective: This study aimed to evaluate the effectiveness of the eRx HTN Chart Check, an integrated clinical decision support tool available at the point of antihypertensive medication refill request, in facilitating enhanced provider management of chronic hypertension. Methods: The study was conducted at two Mayo Clinic sites---Northwest Wisconsin Family Medicine and Rochester Community Internal Medicine practices---with control groups in comparable Mayo Clinic practices. The intervention integrated structured clinical data, including recent BP readings, laboratory results, and visit dates, into the electronic prescription renewal interface to facilitate prescriber decision-making regarding hypertension management. A difference-in-differences (DID) design compared pre- and postintervention hypertension control rates between the intervention and control groups. Data were collected from the Epic EHR system and analyzed using linear regression models. Results: The baseline BP control rates were slightly higher in intervention clinics. Postimplementation, no significant improvement in population-level hypertension control was observed (DID estimate: 0.07\%, 95\% CI --4.0\% to 4.1\%; P=.97). Of the 19,968 refill requests processed, 46\% met all monitoring criteria. However, clinician approval rates remained high (90\%), indicating minimal impact on prescribing behavior. Conclusions: Despite successful implementation, the tool did not significantly improve hypertension control, possibly due to competing quality initiatives and high in-basket volumes. Future iterations should focus on enhanced integration with other decision support tools and strategies to improve clinician engagement and patient outcomes. Further research is needed to optimize chronic disease management through EHR-integrated decision support systems. 
", doi="10.2196/70752", url="https://medinform.jmir.org/2025/1/e70752" } @Article{info:doi/10.2196/65566, author="Bak, Marieke and Hartman, Laura and Graafland, Charlotte and Korfage, J. Ida and Buyx, Alena and Schermer, Maartje and ", title="Ethical Design of Data-Driven Decision Support Tools for Improving Cancer Care: Embedded Ethics Review of the 4D PICTURE Project", journal="JMIR Cancer", year="2025", month="Apr", day="10", volume="11", pages="e65566", keywords="shared decision-making", keywords="oncology", keywords="IT", keywords="ethics", keywords="decision support tools", keywords="big data", keywords="medical decision-making", keywords="artificial intelligence", doi="10.2196/65566", url="https://cancer.jmir.org/2025/1/e65566" } @Article{info:doi/10.2196/67767, author="Chan, Fan-Ying and Ku, Yi-En and Lie, Wen-Nung and Chen, Hsiang-Yin", title="Web-Based Explainable Machine Learning-Based Drug Surveillance for Predicting Sunitinib- and Sorafenib-Associated Thyroid Dysfunction: Model Development and Validation Study", journal="JMIR Form Res", year="2025", month="Apr", day="10", volume="9", pages="e67767", keywords="thyroid dysfunction", keywords="machine learning", keywords="cancer", keywords="sunitinib", keywords="sorafenib", keywords="TKI", keywords="tyrosine kinase inhibitor", abstract="Background: Unlike one-snap data collection methods that only identify high-risk patients, machine learning models using time-series data can predict adverse events and aid in the timely management of cancer. Objective: This study aimed to develop and validate machine learning models for sunitinib- and sorafenib-associated thyroid dysfunction using a time-series data collection approach. Methods: Time series data of patients first prescribed sunitinib or sorafenib were collected from a deidentified clinical research database. Logistic regression, random forest, adaptive Boosting, Light Gradient-Boosting Machine, and Gradient Boosting Decision Tree were used to develop the models. Prediction performances were compared using the accuracy, precision, recall, F1-score, area under the receiver operating characteristic curve, and area under the precision-recall curve. The optimal threshold for the best-performing model was selected based on the maximum F1-score. SHapley Additive exPlanations analysis was conducted to assess feature importance and contributions at both the cohort and patient levels. Results: The training cohort included 609 patients, while the temporal validation cohort had 198 patients. The Gradient Boosting Decision Tree model without resampling outperformed other models, with area under the precision-recall curve of 0.600, area under the receiver operating characteristic curve of 0.876, and F1-score of 0.583 after adjusting the threshold. The SHapley Additive exPlanations analysis identified higher cholesterol levels, longer summed days of medication use, and clear cell adenocarcinoma histology as the most important features. The final model was further integrated into a web-based application. Conclusions: This model can serve as an explainable adverse drug reaction surveillance system for predicting sunitinib- and sorafenib-associated thyroid dysfunction. ", doi="10.2196/67767", url="https://formative.jmir.org/2025/1/e67767" } @Article{info:doi/10.2196/67144, author="Rahman, Mahmudur and Gao, Jifan and Carey, A. Kyle and Edelson, P. Dana and Afshar, Askar and Garrett, W. John and Chen, Guanhua and Afshar, Majid and Churpek, M. 
Matthew", title="Comparison of Deep Learning Approaches Using Chest Radiographs for Predicting Clinical Deterioration: Retrospective Observational Study", journal="JMIR AI", year="2025", month="Apr", day="10", volume="4", pages="e67144", keywords="chest X-ray", keywords="critical care", keywords="deep learning", keywords="chest radiographs", keywords="radiographs", keywords="clinical deterioration", keywords="prediction", keywords="predictive", keywords="deterioration", keywords="retrospective", keywords="data", keywords="dataset", keywords="artificial intelligence", keywords="AI", keywords="chest", keywords="patient", keywords="hospitalized", abstract="Background: The early detection of clinical deterioration and timely intervention for hospitalized patients can improve patient outcomes. The currently existing early warning systems rely on variables from structured data, such as vital signs and laboratory values, and do not incorporate other potentially predictive data modalities. Because respiratory failure is a common cause of deterioration, chest radiographs are often acquired in patients with clinical deterioration, which may be informative for predicting their risk of intensive care unit (ICU) transfer. Objective: This study aimed to compare and validate different computer vision models and data augmentation approaches with chest radiographs for predicting clinical deterioration. Methods: This retrospective observational study included adult patients hospitalized at the University of Wisconsin Health System between 2009 and 2020 with an elevated electronic cardiac arrest risk triage (eCART) score, a validated clinical deterioration early warning score, on the medical-surgical wards. Patients with a chest radiograph obtained within 48 hours prior to the elevated score were included in this study. Five computer vision model architectures (VGG16, DenseNet121, Vision Transformer, ResNet50, and Inception V3) and four data augmentation methods (histogram normalization, random flip, random Gaussian noise, and random rotate) were compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) for predicting clinical deterioration (ie, ICU transfer or ward death in the following 24 hours). Results: The study included 21,817 patient admissions, of which 1655 (7.6\%) experienced clinical deterioration. The DenseNet121 model pretrained on chest radiograph datasets with histogram normalization and random Gaussian noise augmentation had the highest discrimination (AUROC 0.734 and AUPRC 0.414), while the vision transformer having 24 transformer blocks with random rotate augmentation had the lowest discrimination (AUROC 0.598). Conclusions: The study shows the potential of chest radiographs in deep learning models for predicting clinical deterioration. The DenseNet121 architecture pretrained with chest radiographs performed better than other architectures in most experiments, and the addition of histogram normalization with random Gaussian noise data augmentation may enhance the performance of DenseNet121 and pretrained VGG16 architectures. 
", doi="10.2196/67144", url="https://ai.jmir.org/2025/1/e67144" } @Article{info:doi/10.2196/67867, author="Cichosz, Simon and Bender, Clara", title="Early Detection of Elevated Ketone Bodies in Type 1 Diabetes Using Insulin and Glucose Dynamics Across Age Groups: Model Development Study", journal="JMIR Diabetes", year="2025", month="Apr", day="10", volume="10", pages="e67867", keywords="type 1 diabetes", keywords="machine learning", keywords="diabetic ketoacidosis", keywords="ketone level", keywords="diabetic complication", keywords="prediction model", abstract="Background: Diabetic ketoacidosis represents a significant and potentially life-threatening complication of diabetes, predominantly observed in individuals with type 1 diabetes (T1D). Studies have documented suboptimal adherence to diabetes management among children and adolescents, as evidenced by deficient ketone monitoring practices. Objective: The aim of the study was to explore the potential for prediction of elevated ketone bodies from continuous glucose monitoring (CGM) and insulin data in pediatric and adult patients with T1D using a closed-loop system. Methods: Participants used the Dexcom G6 CGM system and the iLet Bionic Pancreas system for insulin administration for up to 13 weeks. We used supervised binary classification machine learning, incorporating feature engineering to identify elevated ketone bodies (>0.6 mmol/L). Features were derived from CGM, insulin delivery data, and self-monitoring of blood glucose to develop an extreme gradient boosting-based prediction model. A total of 259 participants aged 6-79 years with over 49,000 days of full-time monitoring were included in the study. Results: Among the participants, 1768 ketone samples were eligible for modeling, including 383 event samples with elevated ketone bodies (?0.6 mmol/L). Insulin, self-monitoring of blood glucose, and current glucose measurements provided discriminative information on elevated ketone bodies (receiver operating characteristic area under the curve [ROC-AUC] 0.64?0.69). The CGM-derived features exhibited stronger discrimination (ROC-AUC 0.75?0.76). Integration of all feature types resulted in an ROC-AUC estimate of 0.82 (SD 0.01) and a precision recall-AUC of 0.53 (SD 0.03). Conclusions: CGM and insulin data present a valuable avenue for early prediction of patients at risk of elevated ketone bodies. Furthermore, our findings indicate the potential application of such predictive models in both pediatric and adult populations with T1D. 
", doi="10.2196/67867", url="https://diabetes.jmir.org/2025/1/e67867" } @Article{info:doi/10.2196/67318, author=" and Caviglia, Marta", title="Bridging Data Gaps in Emergency Care: The NIGHTINGALE Project and the Future of AI in Mass Casualty Management", journal="J Med Internet Res", year="2025", month="Apr", day="10", volume="27", pages="e67318", keywords="AI", keywords="technology", keywords="mass casualty incident", keywords="incident management", keywords="artificial intelligence", keywords="emergency care", keywords="MCI", keywords="data gaps", keywords="tool", doi="10.2196/67318", url="https://www.jmir.org/2025/1/e67318" } @Article{info:doi/10.2196/62853, author="Min, Won Ji and Min, Jae-Hong and Chang, Se-Hyun and Chung, Ha Byung and Koh, Sil Eun and Kim, Soo Young and Kim, Wook Hyung and Ban, Hyun Tae and Shin, Joon Seok and Choi, Young In and Yoon, Eun Hye", title="A Risk Prediction Model (CMC-AKIX) for Postoperative Acute Kidney Injury Using Machine Learning: Algorithm Development and Validation", journal="J Med Internet Res", year="2025", month="Apr", day="9", volume="27", pages="e62853", keywords="acute kidney injury", keywords="general surgery", keywords="deep neural networks", keywords="machine learning", keywords="prediction model", keywords="postoperative care", keywords="surgery", keywords="anesthesia", keywords="mortality", keywords="morbidity", keywords="retrospective study", keywords="cohort analysis", keywords="hospital", keywords="South Korea", keywords="logistic regression", keywords="user-friendly", keywords="patient care", keywords="risk management", keywords="artificial intelligence", keywords="digital health", abstract="Background: Postoperative acute kidney injury (AKI) is a significant risk associated with surgeries under general anesthesia, often leading to increased mortality and morbidity. Existing predictive models for postoperative AKI are usually limited to specific surgical areas or require external validation. Objective: We proposed to build a prediction model for postoperative AKI using several machine learning methods. Methods: We conducted a retrospective cohort analysis of noncardiac surgeries from 2009 to 2019 at seven university hospitals in South Korea. We evaluated six machine learning models: deep neural network, logistic regression, decision tree, random forest, light gradient boosting machine, and na{\"i}ve Bayes for predicting postoperative AKI, defined as a significant increase in serum creatinine or the initiation of renal replacement therapy within 30 days after surgery. The performance of the models was analyzed using the area under the curve (AUC) of the receiver operating characteristic curve, accuracy, precision, sensitivity (recall), specificity, and F1-score. Results: Among the 239,267 surgeries analyzed, 7935 cases of postoperative AKI were identified. The models, using 38 preoperative predictors, showed that deep neural network (AUC=0.832), light gradient boosting machine (AUC=0.836), and logistic regression (AUC=0.825) demonstrated superior performance in predicting AKI risk. The deep neural network model was then developed into a user-friendly website for clinical use. Conclusions: Our study introduces a robust, high-performance AKI risk prediction system that is applicable in clinical settings using preoperative data. This model's integration into a user-friendly website enhances its clinical utility, offering a significant step forward in personalized patient care and risk management. 
", doi="10.2196/62853", url="https://www.jmir.org/2025/1/e62853" } @Article{info:doi/10.2196/66366, author="Kwun, Ju-Seung and Ahn, Houng-Beom and Kang, Si-Hyuck and Yoo, Sooyoung and Kim, Seok and Song, Wongeun and Hyun, Junho and Oh, Seon Ji and Baek, Gakyoung and Suh, Jung-Won", title="Developing a Machine Learning Model for Predicting 30-Day Major Adverse Cardiac and Cerebrovascular Events in Patients Undergoing Noncardiac Surgery: Retrospective Study", journal="J Med Internet Res", year="2025", month="Apr", day="9", volume="27", pages="e66366", keywords="perioperative risk evaluation", keywords="noncardiac surgery", keywords="prediction models", keywords="machine learning", keywords="common data model", keywords="ML", keywords="predictive modeling", keywords="cerebrovascular", keywords="electronic health records", keywords="EHR", keywords="clinical practice", keywords="risk", keywords="noncardiac surgeries", keywords="perioperative", abstract="Background: Considering that most patients with low or no significant risk factors can safely undergo noncardiac surgery without additional cardiac evaluation, and given the excessive evaluations often performed in patients undergoing intermediate or higher risk noncardiac surgeries, practical preoperative risk assessment tools are essential to reduce unnecessary delays for urgent outpatient services and manage medical costs more efficiently. Objective: This study aimed to use the Observational Medical Outcomes Partnership Common Data Model to develop a predictive model by applying machine learning algorithms that can effectively predict major adverse cardiac and cerebrovascular events (MACCE) in patients undergoing noncardiac surgery. Methods: This retrospective observational network study collected data by converting electronic health records into a standardized Observational Medical Outcomes Partnership Common Data Model format. The study was conducted in 2 tertiary hospitals. Data included demographic information, diagnoses, laboratory results, medications, surgical types, and clinical outcomes. A total of 46,225 patients were recruited from Seoul National University Bundang Hospital and 396,424 from Asan Medical Center. We selected patients aged 65 years and older undergoing noncardiac surgeries, excluding cardiac or emergency surgeries, and those with less than 30 days of observation. Using these observational health care data, we developed machine learning--based prediction models using the observational health data sciences and informatics open-source patient-level prediction package in R (version 4.1.0; R Foundation for Statistical Computing). A total of 5 machine learning algorithms, including random forest, were developed and validated internally and externally, with performance assessed through the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve, and calibration plots. Results: All machine learning prediction models surpassed the Revised Cardiac Risk Index in MACCE prediction performance (AUROC=0.704). Random forest showed the best results, achieving AUROC values of 0.897 (95\% CI 0.883-0.911) internally and 0.817 (95\% CI 0.815-0.819) externally, with an area under the precision-recall curve of 0.095. Among 46,225 patients of the Seoul National University Bundang Hospital, MACCE occurred in 4.9\% (2256/46,225), including myocardial infarction (907/46,225, 2\%) and stroke (799/46,225, 1.7\%), while in-hospital mortality was 0.9\% (419/46,225). 
For Asan Medical Center, 6.3\% (24,861/396,424) of patients experienced MACCE, with 1.5\% (6017/396,424) stroke and 3\% (11,875/396,424) in-hospital mortality. Furthermore, the significance of predictors linked to previous diagnoses and laboratory measurements underscored their critical role in effectively predicting perioperative risk. Conclusions: Our prediction models outperformed the widely used Revised Cardiac Risk Index in predicting MACCE within 30 days after noncardiac surgery, demonstrating superior calibration and generalizability across institutions. Its use can optimize preoperative evaluations, minimize unnecessary testing, and streamline perioperative care, significantly improving patient outcomes and resource use. We anticipate that applying this model to actual electronic health records will benefit clinical practice. ", doi="10.2196/66366", url="https://www.jmir.org/2025/1/e66366" } @Article{info:doi/10.2196/70475, author="Cai, Jianwen and Li, Peiyi and Li, Weimin and Hao, Xuechao and Li, Sheyu and Zhu, Tao", title="Digital Decision Support for Perioperative Care of Patients With Type 2 Diabetes: A Call to Action", journal="JMIR Diabetes", year="2025", month="Apr", day="8", volume="10", pages="e70475", keywords="perioperative diabetes", keywords="artificial intelligence", keywords="clinical decision support systems", doi="10.2196/70475", url="https://diabetes.jmir.org/2025/1/e70475" } @Article{info:doi/10.2196/69864, author="Jin, Yudi and Zhao, Min and Su, Tong and Fan, Yanjia and Ouyang, Zubin and Lv, Fajin", title="Comparing Random Survival Forests and Cox Regression for Nonresponders to Neoadjuvant Chemotherapy Among Patients With Breast Cancer: Multicenter Retrospective Cohort Study", journal="J Med Internet Res", year="2025", month="Apr", day="8", volume="27", pages="e69864", keywords="breast cancer", keywords="neoadjuvant chemotherapy", keywords="pathological complete response", keywords="survival risk", keywords="random survival forest", abstract="Background: Breast cancer is one of the most common malignancies among women worldwide. Patients who do not achieve a pathological complete response (pCR) or a clinical complete response (cCR) post--neoadjuvant chemotherapy (NAC) typically have a worse prognosis compared to those who do achieve these responses. Objective: This study aimed to develop and validate a random survival forest (RSF) model to predict survival risk in patients with breast cancer who do not achieve a pCR or cCR post-NAC. Methods: We analyzed patients with no pCR/cCR post-NAC treated at the First Affiliated Hospital of Chongqing Medical University from January 2019 to 2023, with external validation in Duke University and Surveillance, Epidemiology, and End Results (SEER) cohorts. RSF and Cox regression models were compared using the time-dependent area under the curve (AUC), the concordance index (C-index), and risk stratification. Results: The study cohort included 306 patients with breast cancer, with most aged 40-60 years (204/306, 66.7\%). The majority had invasive ductal carcinoma (290/306, 94.8\%), with estrogen receptor (ER)+ (182/306, 59.5\%), progesterone receptor (PR)-- (179/306, 58.5\%), and human epidermal growth factor receptor 2 (HER2)+ (94/306, 30.7\%) profiles. Most patients presented with T2 (185/306, 60.5\%), N1 (142/306, 46.4\%), and M0 (295/306, 96.4\%) staging (TNM meaning ``tumor, node, metastasis''), with 17.6\% (54/306) experiencing disease progression during a median follow-up of 25.9 months (IQR 17.2-36.3). 
External validation using Duke (N=94) and SEER (N=2760) cohorts confirmed consistent patterns in age (40-60 years: 59/94, 63\%, vs 1480/2760, 53.6\%), HER2+ rates (26/94, 28\%, vs 935/2760, 33.9\%), and invasive ductal carcinoma prevalence (89/94, 95\%, vs 2506/2760, 90.8\%). In the internal cohort, the RSF achieved significantly higher time-dependent AUCs compared to Cox regression at 1-year (0.811 vs 0.763), 3-year (0.834 vs 0.783), and 5-year (0.810 vs 0.771) intervals (overall C-index: 0.803, 95\% CI 0.747-0.859, vs 0.736, 95\% CI 0.673-0.799). External validation confirmed robust generalizability: the Duke cohort showed 1-, 3-, and 5-year AUCs of 0.912, 0.803, and 0.776, respectively, while the SEER cohort maintained consistent performance with AUCs of 0.771, 0.729, and 0.702, respectively. Risk stratification using the RSF identified 25.8\% (79/306) high-risk patients and a significantly reduced survival time (P<.001). Notably, the RSF maintained improved net benefits across decision thresholds in decision curve analysis (DCA); similar results were observed in external studies. The RSF model also showed promising performance across different molecular subtypes in all datasets. Based on the RSF predicted scores, patients were stratified into high- and low-risk groups, with notably poorer survival outcomes observed in the high-risk group compared to the low-risk group. Conclusions: The RSF model, based solely on clinicopathological variables, provides a promising tool for identifying high-risk patients with breast cancer post-NAC. This approach may facilitate personalized treatment strategies and improve patient management in clinical practice. ", doi="10.2196/69864", url="https://www.jmir.org/2025/1/e69864" } @Article{info:doi/10.2196/65629, author="Lim, De Ming and Connie, Tee and Goh, Ong Michael Kah and Saedon, `Izzati Nor", title="Model-Based Feature Extraction and Classification for Parkinson Disease Screening Using Gait Analysis: Development and Validation Study", journal="JMIR Aging", year="2025", month="Apr", day="8", volume="8", pages="e65629", keywords="model-based features", keywords="gait analysis", keywords="Parkinson disease", keywords="computer vision", keywords="support vector machine", abstract="Background: Parkinson disease (PD) is a progressive neurodegenerative disorder that affects motor coordination, leading to gait abnormalities. Early detection of PD is crucial for effective management and treatment. Traditional diagnostic methods often require invasive procedures or are performed when the disease has significantly progressed. Therefore, there is a need for noninvasive techniques that can identify early motor symptoms, particularly those related to gait. Objective: The study aimed to develop a noninvasive approach for the early detection of PD by analyzing model-based gait features. The primary focus is on identifying subtle gait abnormalities associated with PD using kinematic characteristics. Methods: Data were collected through controlled video recordings of participants performing the timed up and go (TUG) assessment, with particular emphasis on the turning phase. The kinematic features analyzed include shoulder distance, step length, stride length, knee and hip angles, leg and arm symmetry, and trunk angles. These features were processed using advanced filtering techniques and analyzed through machine learning methods to distinguish between normal and PD-affected gait patterns. 
Results: The analysis of kinematic features during the turning phase of the TUG assessment revealed that individuals with PD exhibited subtle gait abnormalities, such as freezing of gait, reduced step length, and asymmetrical movements. The model-based features proved effective in differentiating between normal and PD-affected gait, demonstrating the potential of this approach in early detection. Conclusions: This study presents a promising noninvasive method for the early detection of PD by analyzing specific gait features during the turning phase of the TUG assessment. The findings suggest that this approach could serve as a sensitive and accurate tool for diagnosing and monitoring PD, potentially leading to earlier intervention and improved patient outcomes. ", doi="10.2196/65629", url="https://aging.jmir.org/2025/1/e65629" } @Article{info:doi/10.2196/68454, author="Jeremic, Danko and Navarro-Lopez, D. Juan and Jimenez-Diaz, Lydia", title="Clinical Benefits and Risks of Antiamyloid Antibodies in Sporadic Alzheimer Disease: Systematic Review and Network Meta-Analysis With a Web Application", journal="J Med Internet Res", year="2025", month="Apr", day="7", volume="27", pages="e68454", keywords="Alzheimer disease", keywords="antibodies", keywords="donanemab", keywords="aducanumab", keywords="lecanemab", abstract="Background: Despite the increasing approval of antiamyloid antibodies for Alzheimer disease (AD), their clinical relevance and risk-benefit profile remain uncertain. The heterogeneity of AD and the limited availability of long-term clinical data make it difficult to establish a clear rationale for selecting one treatment over another. Objective: The aim of this work was to assess and compare the efficacy and safety of antiamyloid antibodies through an interactive online meta-analytic approach by performing conventional pair-wise meta-analyses and frequentist and Bayesian network meta-analyses of phase II and III clinical trial results. To achieve this, we developed AlzMeta.app 2.0, a freely accessible web application that enables researchers and clinicians to evaluate the relative and absolute risks and benefits of these therapies in real time, incorporating different prior choices and assumptions of baseline risks of disease progression and adverse events. Methods: We adhered to PRISMA-NMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for reporting of systematic reviews with network meta-analysis) and GRADE (Grading of Recommendations, Assessment, Development, and Evaluation) guidelines for reporting and rating the certainty of evidence. Clinical trial reports (until September 30, 2024) were retrieved from PubMed, Google Scholar, and clinical trial databases (including ClinicalTrials.gov). Studies with <20 sporadic AD patients and a modified Jadad score <3 were excluded. Risk of bias was assessed with the RoB-2 tool. Relative risks and benefits have been expressed as risk ratios and standardized mean differences, with confidence, credible, and prediction intervals calculated for all outcomes. For significant results, the intervention effects were ranked in frequentist and Bayesian frameworks, and their clinical relevance was determined by the absolute risk per 1000 people and number needed to treat (NNT) for a wide range of control responses. 
Results: Among 7 treatments tested in 21,236 patients (26 studies with low risk of bias or with some concerns), donanemab was the best-ranked treatment on cognitive and functional measures, and it was almost 2 times more effective than aducanumab and lecanemab and significantly more beneficial than other treatments on the global (cognitive and functional) Clinical Dementia Rating Scale-Sum of Boxes (NNT=10, 95\% CI 8-16). Special caution is required regarding cerebral edema and microbleeding due to the clinically relevant risks of edema for donanemab (NNT=8, 95\% CI 5-16), aducanumab (NNT=10, 95\% CI 6-17), and lecanemab (NNT=14, 95\% CI 7-31), which may outweigh the benefits. Conclusions: Our results showed that donanemab is more effective and has a safety profile similar to aducanumab and lecanemab, highlighting the need for treatment options with improved safety. Potential bias may have been introduced in the included trials due to unblinding caused by frequent cerebral edema and microbleeds, as well as the impact of the COVID-19 pandemic. ", doi="10.2196/68454", url="https://www.jmir.org/2025/1/e68454" } @Article{info:doi/10.2196/62732, author="Zheng, Rui and Jiang, Xiao and Shen, Li and He, Tianrui and Ji, Mengting and Li, Xingyi and Yu, Guangjun", title="Investigating Clinicians' Intentions and Influencing Factors for Using an Intelligence-Enabled Diagnostic Clinical Decision Support System in Health Care Systems: Cross-Sectional Survey", journal="J Med Internet Res", year="2025", month="Apr", day="7", volume="27", pages="e62732", keywords="artificial intelligence", keywords="clinical decision support systems", keywords="task-technology fit", keywords="technology acceptance model", keywords="perceived risk", keywords="performance expectations", keywords="intention to use", abstract="Background: An intelligence-enabled clinical decision support system (CDSS) is a computerized system that integrates medical knowledge, patient data, and clinical guidelines to assist health care providers make clinical decisions. Research studies have shown that CDSS utilization rates have not met expectations. Clinicians' intentions and their attitudes determine the use and promotion of CDSS in clinical practice. Objective: The aim of this study was to enhance the successful utilization of CDSS by analyzing the pivotal factors that influence clinicians' intentions to adopt it and by putting forward targeted management recommendations. Methods: This study proposed a research model grounded in the task-technology fit model and the technology acceptance model, which was then tested through a cross-sectional survey. The measurement instrument comprised demographic characteristics, multi-item scales, and an open-ended query regarding areas where clinicians perceived the system required improvement. We leveraged structural equation modeling to assess the direct and indirect effects of ``task-technology fit'' and ``perceived ease of use'' on clinicians' intentions to use the CDSS when mediated by ``performance expectation'' and ``perceived risk.'' We collated and analyzed the responses to the open-ended question. Results: We collected a total of 247 questionnaires. The model explained 65.8\% of the variance in use intention. Performance expectations ($\beta$=0.228; P<.001) and perceived risk ($\beta$=--0.579; P<.001) were both significant predictors of use intention. Task-technology fit ($\beta$=--0.281; P<.001) and perceived ease of use ($\beta$=--0.377; P<.001) negatively affected perceived risk. 
Perceived risk ($\beta$=--0.308; P<.001) negatively affected performance expectations. Task-technology fit positively affected perceived ease of use ($\beta$=0.692; P<.001) and performance expectations ($\beta$=0.508; P<.001). Task characteristics ($\beta$=0.168; P<.001) and technology characteristics ($\beta$=0.749; P<.001) positively affected task-technology fit. Contrary to expectations, perceived ease of use ($\beta$=0.108; P=.07) did not have a significant impact on use intention. From the open-ended question, 3 main themes emerged regarding clinicians' perceived deficiencies in CDSS: system security risks, personalized interaction, seamless integration. Conclusions: Perceived risk and performance expectations were direct determinants of clinicians' adoption of CDSS, significantly influenced by task-technology fit and perceived ease of use. In the future, increasing transparency within CDSS and fostering trust between clinicians and technology should be prioritized. Furthermore, focusing on personalized interactions and ensuring seamless integration into clinical workflows are crucial steps moving forward. ", doi="10.2196/62732", url="https://www.jmir.org/2025/1/e62732" } @Article{info:doi/10.2196/63609, author="Silva, Malpriya S. Sandun and Wabe, Nasir and Nguyen, D. Amy and Seaman, Karla and Huang, Guogui and Dodds, Laura and Meulenbroeks, Isabelle and Mercado, Ibarra Crisostomo and Westbrook, I. Johanna", title="Development of a Predictive Dashboard With Prescriptive Decision Support for Falls Prevention in Residential Aged Care: User-Centered Design Approach", journal="JMIR Aging", year="2025", month="Apr", day="7", volume="8", pages="e63609", keywords="falls prevention", keywords="dashboard architecture", keywords="predictive", keywords="sustainability", keywords="challenges", keywords="decision support", keywords="falls", keywords="aged care", keywords="geriatric", keywords="older adults", keywords="economic burden", keywords="prevention", keywords="electronic health record", keywords="EHR", keywords="intervention", keywords="decision-making", keywords="patient safety", keywords="risks", keywords="older people", keywords="monitoring", abstract="Background: Falls are a prevalent and serious health condition among older people in residential aged care facilities, causing significant health and economic burdens. However, the likelihood of future falls can be predicted, and thus, falls can be prevented if appropriate prevention programs are implemented. Current fall prevention programs in residential aged care facilities rely on risk screening tools with suboptimal predictive performance, leading to significant concerns regarding resident safety. Objective: This study aimed to develop a predictive, dynamic dashboard to identify residents at risk of falls with associated decision support. This paper provides an overview of the technical process, including the challenges faced and the strategies used to overcome them during the development of the dashboard. Methods: A predictive dashboard was co-designed with a major residential aged care partner in New South Wales, Australia. Data from resident profiles, daily medications, fall incidents, and fall risk assessments were used. A dynamic fall risk prediction model and personalized rule-based fall prevention recommendations were embedded in the dashboard. The data ingestion process into the dashboard was designed to mitigate the impact of underlying data system changes. 
This approach aims to ensure resilience against alterations in the data systems. Results: The dashboard was developed using Microsoft Power BI and advanced R programming by linking data silos. It includes dashboard views for those managing facilities and for those caring for residents. Data drill-through functionality was used to navigate through different dashboard views. Resident-level change in daily risk of falling and risk factors and timely evidence-based recommendations were output to prevent falls and enhance prescriptive decision support. Conclusions: This study emphasizes the significance of a sustainable dashboard architecture and how to overcome the challenges faced when developing a dashboard amid underlying data system changes. The development process used an iterative dashboard co-design process, ensuring the successful implementation of knowledge into practice. Future research will focus on the implementation and evaluation of the dashboard's impact on health processes and economic outcomes. International Registered Report Identifier (IRRID): RR2-https://doi.org/10.1136/bmjopen-2021-048657 ", doi="10.2196/63609", url="https://aging.jmir.org/2025/1/e63609" } @Article{info:doi/10.2196/58660, author="K{\"u}per, Alisa and Lodde, Christian Georg and Livingstone, Elisabeth and Schadendorf, Dirk and Kr{\"a}mer, Nicole", title="Psychological Factors Influencing Appropriate Reliance on AI-enabled Clinical Decision Support Systems: Experimental Web-Based Study Among Dermatologists", journal="J Med Internet Res", year="2025", month="Apr", day="4", volume="27", pages="e58660", keywords="AI reliance", keywords="psychological factors", keywords="clinical decision support systems", keywords="medical decision-making", keywords="artificial intelligence", keywords="AI", abstract="Background: Artificial intelligence (AI)--enabled decision support systems are critical tools in medical practice; however, their reliability is not absolute, necessitating human oversight for final decision-making. Human reliance on such systems can vary, influenced by factors such as individual psychological factors and physician experience. Objective: This study aimed to explore the psychological factors influencing subjective trust and reliance on medical AI's advice, specifically examining relative AI reliance and relative self-reliance to assess the appropriateness of reliance. Methods: A survey was conducted with 223 dermatologists, which included lesion image classification tasks and validated questionnaires assessing subjective trust, propensity to trust technology, affinity for technology interaction, control beliefs, need for cognition, as well as queries on medical experience and decision confidence. Results: A 2-tailed t test revealed that participants' accuracy improved significantly with AI support (t222=--3.3; P<.001; Cohen d=4.5), but only by an average of 1\% (1/100). Reliance on AI was stronger for correct advice than for incorrect advice (t222=4.2; P<.001; Cohen d=0.1). Notably, participants demonstrated a mean relative AI reliance of 10.04\% (139/1384) and a relative self-reliance of 85.6\% (487/569), indicating a high level of self-reliance but a low level of AI reliance. Propensity to trust technology influenced AI reliance, mediated by trust (indirect effect=0.024, 95\% CI 0.008-0.042; P<.001), and medical experience negatively predicted AI reliance (indirect effect=--0.001, 95\% CI --0.002 to --0.001; P<.001). 
Conclusions: The findings highlight the need to design AI support systems in a way that assists less experienced users with a high propensity to trust technology to identify potential AI errors, while encouraging experienced physicians to actively engage with system recommendations and potentially reassess initial decisions. ", doi="10.2196/58660", url="https://www.jmir.org/2025/1/e58660" } @Article{info:doi/10.2196/68486, author="Cook, A. David and Overgaard, Joshua and Pankratz, Shane V. and Del Fiol, Guilherme and Aakre, A. Chris", title="Virtual Patients Using Large Language Models: Scalable, Contextualized Simulation of Clinician-Patient Dialogue With Feedback", journal="J Med Internet Res", year="2025", month="Apr", day="4", volume="27", pages="e68486", keywords="simulation training", keywords="natural language processing", keywords="computer-assisted instruction", keywords="clinical decision-making", keywords="clinical reasoning", keywords="machine learning", keywords="virtual patient", keywords="natural language generation", abstract="Background: Virtual patients (VPs) are computer screen--based simulations of patient-clinician encounters. VP use is limited by cost and low scalability. Objective: We aimed to show that VPs powered by large language models (LLMs) can generate authentic dialogues, accurately represent patient preferences, and provide personalized feedback on clinical performance. We also explored using LLMs to rate the quality of dialogues and feedback. Methods: We conducted an intrinsic evaluation study rating 60 VP-clinician conversations. We used carefully engineered prompts to direct OpenAI's generative pretrained transformer (GPT) to emulate a patient and provide feedback. Using 2 outpatient medicine topics (chronic cough diagnosis and diabetes management), each with permutations representing different patient preferences, we created 60 conversations (dialogues plus feedback): 48 with a human clinician and 12 ``self-chat'' dialogues with GPT role-playing both the VP and clinician. Primary outcomes were dialogue authenticity and feedback quality, rated using novel instruments for which we conducted a validation study collecting evidence of content, internal structure (reproducibility), relations with other variables, and response process. Each conversation was rated by 3 physicians and by GPT. Secondary outcomes included user experience, bias, patient preferences represented in the dialogues, and conversation features that influenced authenticity. Results: The average cost per conversation was US \$0.51 for GPT-4.0-Turbo and US \$0.02 for GPT-3.5-Turbo. Mean (SD) conversation ratings, maximum 6, were overall dialogue authenticity 4.7 (0.7), overall user experience 4.9 (0.7), and average feedback quality 4.7 (0.6). For dialogues created using GPT-4.0-Turbo, physician ratings of patient preferences aligned with intended preferences in 20 to 47 of 48 dialogues (42\%-98\%). Subgroup comparisons revealed higher ratings for dialogues using GPT-4.0-Turbo versus GPT-3.5-Turbo and for human-generated versus self-chat dialogues. Feedback ratings were similar for human-generated versus GPT-generated ratings, whereas authenticity ratings were lower. We did not perceive bias in any conversation. 
Dialogue features that detracted from authenticity included that GPT was verbose or used atypical vocabulary (93/180, 51.7\% of conversations), was overly agreeable (n=56, 31\%), repeated the question as part of the response (n=47, 26\%), was easily convinced by clinician suggestions (n=35, 19\%), or was not disaffected by poor clinician performance (n=32, 18\%). For feedback, detractors included excessively positive feedback (n=42, 23\%), failure to mention important weaknesses or strengths (n=41, 23\%), or factual inaccuracies (n=39, 22\%). Regarding validation of dialogue and feedback scores, items were meticulously developed (content evidence), and we confirmed expected relations with other variables (higher ratings for advanced LLMs and human-generated dialogues). Reproducibility was suboptimal, due largely to variation in LLM performance rather than rater idiosyncrasies. Conclusions: LLM-powered VPs can simulate patient-clinician dialogues, demonstrably represent patient preferences, and provide personalized performance feedback. This approach is scalable, globally accessible, and inexpensive. LLM-generated ratings of feedback quality are similar to human ratings. ", doi="10.2196/68486", url="https://www.jmir.org/2025/1/e68486", url="http://www.ncbi.nlm.nih.gov/pubmed/39854611" } @Article{info:doi/10.2196/60471, author="Chalmer, Rosansky Rachel Beth and Ayers, Emmeline and Weiss, F. Erica and Fowler, R. Nicole and Telzak, Andrew and Summanwar, Diana and Zwerling, Jessica and Wang, Cuiling and Xu, Huiping and Holden, J. Richard and Fiori, Kevin and French, D. Dustin and Nsubayi, Celeste and Ansari, Asif and Dexter, Paul and Higbie, Anna and Yadav, Pratibha and Walker, M. James and Congivaram, Harrshavasan and Adhikari, Dristi and Melecio-Vazquez, Mairim and Boustani, Malaz and Verghese, Joe", title="Improving Early Dementia Detection Among Diverse Older Adults With Cognitive Concerns With the 5-Cog Paradigm: Protocol for a Hybrid Effectiveness-Implementation Clinical Trial", journal="JMIR Res Protoc", year="2025", month="Apr", day="3", volume="14", pages="e60471", keywords="cognitive assessment", keywords="cognitive screening", keywords="cognitive impairment", keywords="mild cognitive impairment", keywords="dementia", keywords="dissemination and implementation science", keywords="clinical trial protocol", keywords="randomized controlled trial", keywords="hybrid implementation-effectiveness trial", abstract="Background: The 5-Cog paradigm is a 5-minute brief cognitive assessment coupled with a clinical decision support tool designed to improve clinicians' early detection of cognitive impairment, including dementia, in their diverse older primary care patients. The 5-Cog battery uses picture- and symbol-based assessments and a questionnaire. It is low cost, simple, minimizes literacy bias, and is culturally fair. The decision support component of the paradigm helps nudge appropriate care provider response to an abnormal 5-Cog battery. Objective: The objective of our study is to evaluate the effectiveness, implementation, and cost of the 5-Cog paradigm. Methods: We will enroll 6600 older patients with cognitive concerns from 22 primary care clinics in the Bronx, New York, and in multiple locations in Indiana for this hybrid type 1 effectiveness-implementation trial. We will analyze the effectiveness of the 5-Cog paradigm to increase the rate of new diagnoses of mild cognitive impairment syndrome or dementia using a pragmatic, cluster randomized clinical trial design. 
The secondary outcome is the ordering of new tests, treatments, and referrals for cognitive indications within 90 days after the study visit. The 5-Cog's decision support component will be deployed as an electronic medical record feature. We will analyze the 5-Cog's implementation process, context, and outcomes through the Consolidated Framework for Implementation Research using a mixed methods design (surveys and interviews). The study will also examine cost-effectiveness from societal and payer (Medicare) perspectives by estimating the cost per additional dementia diagnosis. Results: The study is funded by the National Institute of Neurological Disorders and Stroke of the National Institutes of Health (2U01NS105565). The protocol was approved by the Albert Einstein College of Medicine Institutional Review Board in September 2022. A validation study was completed to select cut scores for the 5-Cog battery. Among the 76 patients enrolled, the resulting clinical diagnoses were as follows: dementia in 32 (42\%); mild cognitive impairment in 28 (37\%); subjective cognitive concerns without objective cognitive impairment in 12 (16\%); no cognitive diagnosis assigned in 2 (3\%). The mean scores were Picture-Based Memory Impairment Screen 5.8 (SD 2.7), Symbol Match 27.2 (SD 18.2), and Subjective Motoric Cognitive Risk 2.4 (SD 1.7). The cut scores for an abnormal or positive result on the 5-Cog components were as follows: Picture-Based Memory Impairment Screen $\leq$6 (range 0-8), Symbol Match $\leq$25 (range 0-65), and Subjective Motoric Cognitive Risk >5 (range 0-7). As of December 2024, a total of 12 clinics had completed the onboarding processes, and 2369 patients had been enrolled. Conclusions: The findings of this study will facilitate the rapid adaptation and dissemination of this effective and practical clinical tool across diverse primary care clinical settings. Trial Registration: ClinicalTrials.gov NCT05515224; https://www.clinicaltrials.gov/study/NCT05515224 International Registered Report Identifier (IRRID): DERR1-10.2196/60471 ", doi="10.2196/60471", url="https://www.researchprotocols.org/2025/1/e60471" } @Article{info:doi/10.2196/62942, author="Isaradech, Natthanaphop and Sirikul, Wachiranun and Buawangpong, Nida and Siviroj, Penprapa and Kitro, Amornphat", title="Machine Learning Models for Frailty Classification of Older Adults in Northern Thailand: Model Development and Validation Study", journal="JMIR Aging", year="2025", month="Apr", day="2", volume="8", pages="e62942", keywords="aged care", keywords="gerontology", keywords="geriatric", keywords="old", keywords="aging", keywords="clinical decision support", keywords="delivering health information and knowledge to the public", keywords="diagnostic systems", keywords="digital health", keywords="epidemiology", keywords="surveillance", keywords="diagnosis", keywords="frailty", keywords="machine learning", keywords="prediction", keywords="predictive", keywords="AI", keywords="artificial intelligence", keywords="Thailand", keywords="community dwelling", keywords="health care intervention", keywords="patient care", abstract="Background: Frailty is defined as a clinical state of increased vulnerability due to the age-associated decline of an individual's physical function resulting in increased morbidity and mortality when exposed to acute stressors. Early identification and management can reverse individuals with frailty to being robust once more.
However, we found no studies integrating machine learning (ML) tools into frailty screening and surveillance in Thailand, despite the abundance of evidence of frailty assessment using ML globally and in Asia. Objective: We propose an approach for early diagnosis of frailty in community-dwelling older individuals in Thailand using an ML model generated from individual characteristics and anthropometric data. Methods: Datasets including 2692 community-dwelling Thai older adults in Lampang from 2016 and 2017 were used for model development and internal validation. The derived models were externally validated with a dataset of community-dwelling older adults in Chiang Mai from 2021. The ML algorithms implemented in this study include the k-nearest neighbors algorithm, random forest ML algorithms, multilayer perceptron artificial neural network, logistic regression models, gradient boosting classifier, and linear support vector machine classifier. Results: Logistic regression showed the best overall discrimination performance with a mean area under the receiver operating characteristic curve of 0.81 (95\% CI 0.75-0.86) in the internal validation dataset and 0.75 (95\% CI 0.71-0.78) in the external validation dataset. The model was also well-calibrated to the expected probability of the external validation dataset. Conclusions: Our findings showed that our models have the potential to be utilized as a screening tool using simple, accessible demographic and explainable clinical variables in Thai community-dwelling older persons to identify individuals with frailty who require early intervention to become physically robust. ", doi="10.2196/62942", url="https://aging.jmir.org/2025/1/e62942" } @Article{info:doi/10.2196/59520, author="Park, Chanmin and Han, Changho and Jang, Kyeong Su and Kim, Hyungjun and Kim, Sora and Kang, Hee Byung and Jung, Kyoungwon and Yoon, Dukyong", title="Development and Validation of a Machine Learning Model for Early Prediction of Delirium in Intensive Care Units Using Continuous Physiological Data: Retrospective Study", journal="J Med Internet Res", year="2025", month="Apr", day="2", volume="27", pages="e59520", keywords="delirium", keywords="intensive care unit", keywords="machine learning", keywords="prediction model", keywords="early prediction", abstract="Background: Delirium in intensive care unit (ICU) patients poses a significant challenge, affecting patient outcomes and health care efficiency. Developing an accurate, real-time prediction model for delirium represents an advancement in critical care, addressing needs for timely intervention and resource optimization in ICUs. Objective: We aimed to create a novel machine learning model for delirium prediction in ICU patients using only continuous physiological data. Methods: We developed models integrating routinely available clinical data, such as age, sex, and patient monitoring device outputs, to ensure practicality and adaptability in diverse clinical settings. To confirm the reliability of delirium determination records, we prospectively collected results of Confusion Assessment Method for the ICU (CAM-ICU) evaluations performed by qualified investigators from May 17, 2021, to December 23, 2022, determining Cohen $\kappa$ coefficients. Participants were included in the study if they were aged $\geq$18 years at ICU admission, had delirium evaluations using the CAM-ICU, and had data collected for at least 4 hours before delirium diagnosis or nondiagnosis.
The development cohort from Yongin Severance Hospital (March 1, 2020, to January 12, 2022) comprised 5478 records: 5129 (93.62\%) records from 651 patients for training and 349 (6.37\%) records from 163 patients for internal validation. For temporal validation, we used 4438 records from the same hospital (January 28, 2022, to December 31, 2022) to reflect potential seasonal variations. External validation was performed using data from 670 patients at Ajou University Hospital (March 2022 to September 2022). We evaluated machine learning algorithms (random forest [RF], extra-trees classifier, and light gradient boosting machine) and selected the RF model as the final model based on its performance. To confirm clinical utility, a decision curve analysis and temporal pattern for model prediction during the ICU stay were performed. Results: The $\kappa$ coefficient between labels generated by ICU nurses and prospectively verified by qualified researchers was 0.81, indicating reliable CAM-ICU results. Our final model showed robust performance in internal validation (area under the receiver operating characteristic curve [AUROC]: 0.82; area under the precision-recall curve [AUPRC]: 0.62) and maintained its accuracy in temporal validation (AUROC: 0.73; AUPRC: 0.85). External validation supported its effectiveness (AUROC: 0.84; AUPRC: 0.77). Decision curve analysis showed a positive net benefit at all thresholds, and the temporal pattern analysis showed a gradual increase in the model scores as the actual delirium diagnosis time approached. Conclusions: We developed a machine learning model for delirium prediction in ICU patients using routinely measured variables, including physiological waveforms. Our study demonstrates the potential of the RF model in predicting delirium, with consistent performance across various validation scenarios. The model uses noninvasive variables, making it applicable to a wide range of ICU patients, with minimal additional risk. ", doi="10.2196/59520", url="https://www.jmir.org/2025/1/e59520" } @Article{info:doi/10.2196/62978, author="M{\"a}nnikk{\"o}, Viljami and Tommola, Janne and Tikkanen, Emmi and H{\"a}tinen, Olli-Pekka and {\AA}berg, Fredrik", title="Large-Scale Evaluation and Liver Disease Risk Prediction in Finland's National Electronic Health Record System: Feasibility Study Using Real-World Data", journal="JMIR Med Inform", year="2025", month="Apr", day="2", volume="13", pages="e62978", keywords="Kanta archive", keywords="national patient data repository", keywords="real world data", keywords="risk prediction", keywords="chronic liver disease", keywords="mortality", keywords="risk detection", keywords="alcoholic liver", keywords="prediction", keywords="obesity", keywords="overweight", keywords="electronic health record", keywords="wearables", keywords="smartwatch", abstract="Background: Globally, the incidence and mortality of chronic liver disease are escalating. Early detection of liver disease remains a challenge, often occurring at symptomatic stages when preventative measures are less effective. The Chronic Liver Disease score (CLivD) is a predictive risk model developed using Finnish health care data, aiming to forecast an individual's risk of developing chronic liver disease in subsequent years. The Kanta Service is a national electronic health record system in Finland that stores comprehensive health care data including patient medical histories, prescriptions, and laboratory results, to facilitate health care delivery and research. 
Objective: This study aimed to evaluate the feasibility of implementing an automatic CLivD score with the current Kanta platform and identify and suggest improvements for Kanta that would enable accurate automatic risk detection. Methods: In this study, a real-world data repository (Kanta) was used as a data source for ``The CLivD score'' risk calculation model. Our dataset consisted of 96,200 individuals' whole medical history from Kanta. For real-world data use, we designed processes to handle missing input in the calculation process. Results: We found that Kanta currently lacks many CLivD risk model input parameters in the structured format required to calculate precise risk scores. However, the risk scores can be improved by using the unstructured text in patient reports and by approximating variables by using other health data--like diagnosis information. Using structured data, we were able to identify only 33 out of 51,275 individuals in the ``low risk'' category and 308 out of 51,275 individuals (<1\%) in the ``moderate risk'' category. By adding diagnosis information approximation and free text use, we were able to identify 18,895 out of 51,275 (37\%) individuals in the ``low risk'' category and 2125 out of 51,275 (4\%) individuals in the ``moderate risk'' category. In both cases, we were not able to identify any individuals in the ``high-risk'' category because of the missing waist-hip ratio measurement. We evaluated 3 scenarios to improve the coverage of waist-hip ratio data in Kanta and these yielded the most substantial improvement in prediction accuracy. Conclusions: We conclude that the current structured Kanta data is not enough for precise risk calculation for CLivD or other diseases where obesity, smoking, and alcohol use are important risk factors. Our simulations show up to 14\% improvement in risk detection when adding support for missing input variables. Kanta shows the potential for implementing nationwide automated risk detection models that could result in improved disease prevention and public health. 
", doi="10.2196/62978", url="https://medinform.jmir.org/2025/1/e62978", url="http://www.ncbi.nlm.nih.gov/pubmed/40172947" } @Article{info:doi/10.2196/71768, author="Juels, Parker", title="The Importance of Comparing New Technologies (AI) to Existing Tools for Patient Education on Common Dermatologic Conditions: A Commentary", journal="JMIR Dermatol", year="2025", month="Apr", day="1", volume="8", pages="e71768", keywords="artificial intelligence", keywords="ChatGPT", keywords="atopic dermatitis", keywords="acne vulgaris", keywords="actinic keratosis", keywords="rosacea", keywords="AI", keywords="diagnosis", keywords="treatment", keywords="prognosis", keywords="dermatological diagnoses", keywords="chatbots", keywords="patients", keywords="dermatologist", doi="10.2196/71768", url="https://derma.jmir.org/2025/1/e71768" } @Article{info:doi/10.2196/72540, author="Chau, Courtney and Feng, Hao and Cobos, Gabriela and Park, Joyce", title="Authors' Reply: The Importance of Comparing New Technologies (AI) to Existing Tools for Patient Education on Common Dermatologic Conditions: A Commentary", journal="JMIR Dermatol", year="2025", month="Apr", day="1", volume="8", pages="e72540", keywords="artificial intelligence", keywords="ChatGPT", keywords="atopic dermatitis", keywords="acne vulgaris", keywords="actinic keratosis", keywords="rosacea", keywords="AI", keywords="diagnosis", keywords="treatment", keywords="prognosis", keywords="dermatological diagnoses", keywords="chatbots", keywords="patients", keywords="dermatologist", doi="10.2196/72540", url="https://derma.jmir.org/2025/1/e72540" } @Article{info:doi/10.2196/62749, author="Iivanainen, Sanna and Arokoski, Reetta and Mentu, Santeri and Lang, Laura and Ekstr{\"o}m, Jussi and Virtanen, Henri and Kataja, Vesa and Koivunen, Pekka Jussi", title="Development of a Comprehensive Decision Support Tool for Chemotherapy-Cycle Prescribing: Initial Usability Study", journal="JMIR Form Res", year="2025", month="Mar", day="31", volume="9", pages="e62749", keywords="cancer", keywords="chemotherapy", keywords="ePRO", keywords="electronic patient-reported outcome", keywords="decision support system", abstract="Background: Chemotherapy cycle prescription is generally carried out through a multistep manual process that is prone to human error. Clinical decision support tools can provide patient-specific assessments that support clinical decisions, improve prescribing practices, and reduce medication errors. Objective: We hypothesized that a knowledge-based, patient-derived, evidence-directed decision support tool consisting of multiple modules focusing on the core duties preceding chemotherapy-cycle prescription could result in a more cost-effective and error-free approach and streamline the workflow. Methods: A 1-arm, multicenter, prospective clinical trial (``Follow-up of Cancer Patients Receiving Chemotherapy or Targeted Therapy by Electronic Patient Reported Outcomes-tool'' [ECHO] 7/2019-1/2021; NCT04081558) was initiated to investigate the tool. The most important inclusion criteria were the presence of colorectal cancer (CRC) treated with oxaliplatin-based chemotherapy, age ?18 years, Eastern Cooperative Oncology Group [ECOG] performance score of 0 to 2, and internet access. A decision support tool that included digital symptom monitoring, a laboratory value interface, and treatment schedule integration for semiautomated chemotherapy cycle prescribing was integrated into the care pathway. 
Performance was assessed by the percentage of chemotherapy cycles with sent and completed symptom questionnaires, while perceptions of health care professionals (HCPs) on the feasibility of the approach were collected through a 1-time semistructured interview. Results: The ECHO trial included 43 patients with CRC treated with doublet or triplet chemotherapy in an adjuvant or metastatic setting. Altogether, 843 electronic patient-reported outcome (ePRO) symptom questionnaires were completed. Of the 15 recorded symptoms, fatigue (n=446, 52.9\%) and peripheral neuropathy (n=429, 50.9\%) were reported most often, while 137 grade 3 to 4 symptoms were recorded, of which diarrhea (n=5, 4\%) and peripheral neuropathy (n=4, 3\%) were the most common. During the study, 339 chemotherapy cycles were prescribed, and for 77\% (n=262) of new chemotherapy cycles, ePRO questionnaire data were available within preset limits (completed within 3 days prior to chemotherapy scheduling), while 65\% of the cycles (n=221) had symptom questionnaire grading at $\leq$1, and 67\% of the cycles (n=228) had laboratory values in a preset range. The recommendations by the tool for a new chemotherapy cycle were tier 1 (green; meaning ``go'') in 145 (42.8\%) of the cycles, tier 2 (yellow; ``evaluate'') in 83 (25\%), and tier 3 (red; ``hold'') in 111 (32.7\%). HCPs (n=3) were interviewed with a questionnaire (comprising 8 questions), revealing that they most valued the improved workflow, faster patient evaluation, and direct messaging option. Conclusions: In this study, we investigated the feasibility of a decision support system for chemotherapy-cycle pre-evaluation and prescription that was developed for the prospective ECHO trial. The study showed that the functionalities of the investigated tool were feasible and that an automated approach to chemotherapy-cycle prescription was possible for nearly half of the cycles. Trial Registration: ClinicalTrials.gov NCT04081558; https://clinicaltrials.gov/study/NCT04081558 ", doi="10.2196/62749", url="https://formative.jmir.org/2025/1/e62749" } @Article{info:doi/10.2196/63983, author="Lee, Heonyi and Kim, Yi-Jun and Kim, Jin-Hong and Kim, Soo-Kyung and Jeong, Tae-Dong", title="Optimizing Initial Vancomycin Dosing in Hospitalized Patients Using Machine Learning Approach for Enhanced Therapeutic Outcomes: Algorithm Development and Validation Study", journal="J Med Internet Res", year="2025", month="Mar", day="31", volume="27", pages="e63983", keywords="algorithm", keywords="machine learning", keywords="therapeutic drug monitoring", keywords="vancomycin", keywords="area under curve", keywords="pharmacokinetics", keywords="vancomycin dosing", abstract="Background: Vancomycin is commonly dosed using standard weight--based methods before dose adjustments are made through therapeutic drug monitoring (TDM). However, variability in initial dosing can lead to suboptimal therapeutic outcomes. A predictive model that personalizes initial dosing based on patient-specific pharmacokinetic factors prior to administration may enhance target attainment and minimize the need for subsequent dose adjustments. Objective: This study aimed to develop and evaluate a machine learning (ML)--based algorithm to predict whether an initial vancomycin dose falls within the therapeutic range of the 24-hour area under the curve to minimum inhibitory concentration, thereby optimizing the initial vancomycin dosage. 
Methods: A retrospective cohort study was conducted using hospitalized patients who received intravenous vancomycin and underwent pharmacokinetic TDM consultation (n=415). The cohort was randomly divided into training and testing datasets in a 7:3 ratio, and multiple ML techniques were used to develop an algorithm for optimizing initial vancomycin dosing. The optimal algorithm, referred to as the OPTIVAN algorithm, was selected and validated using an external cohort (n=268). We evaluated the performance of 4 ML models: gradient boosting machine, random forest (RF), support vector machine (SVM), and eXtreme gradient boosting (XGB). Additionally, a web-based clinical support tool was developed to facilitate real-time vancomycin TDM application in clinical practice. Results: The SVM algorithm demonstrated the best predictive performance, achieving an area under the receiver operating characteristic curve (AUROC) of 0.832 (95\% CI 0.753-0.900) for the training dataset and 0.720 (95\% CI 0.654-0.783) for the external validation dataset. The gradient boosting machine followed closely with AUROC scores of 0.802 (95\% CI 0.667-0.857) for the training dataset and 0.689 (95\% CI 0.596-0.733) for the validation dataset. In contrast, both XGB and RF exhibited relatively lower performance. XGB achieved AUROC values of 0.769 (95\% CI 0.671-0.853) for the training set and 0.707 (95\% CI 0.644-0.772) for the validation set, while RF recorded AUROC scores of 0.759 (95\% CI 0.656-0.846) for the test dataset and 0.693 (95\% CI 0.625-0.757) for the external validation set. The SVM model incorporated 7 covariates: age, BMI, glucose, blood urea nitrogen, estimated glomerular filtration rate, hematocrit, and daily dose per body weight. Subgroup analyses demonstrated consistent performance across different patient categories, such as renal function, sex, and BMI. A web-based TDM analysis tool was developed using the OPTIVAN algorithm. Conclusions: The OPTIVAN algorithm represents a significant advancement in personalized initial vancomycin dosing, addressing the limitations of current TDM practices. By optimizing the initial dose, this algorithm may reduce the need for subsequent dosage adjustments. The algorithm's web-based app is easy to use, making it a practical tool for clinicians. This study highlights the potential of ML to enhance the effectiveness of vancomycin treatment. 
", doi="10.2196/63983", url="https://www.jmir.org/2025/1/e63983" } @Article{info:doi/10.2196/58951, author="Zhao, Yanchun and Huang, Ting and Chen, Yanli and Li, Songmei and Zhao, Juan and Han, Xu and Ni, Qing and Su, Ning", title="Evaluation of the Clinical Nursing Effects of a Traditional Chinese Medicine Nursing Program Based on Care Pathways for Patients With Type 2 Diabetes: Protocol for a Randomized Controlled Clinical Trial", journal="JMIR Res Protoc", year="2025", month="Mar", day="31", volume="14", pages="e58951", keywords="type 2 diabetes", keywords="traditional Chinese medicine", keywords="TCM nursing program", keywords="clinical pathway", keywords="application research", keywords="diabetes", keywords="diabetes mellitus", keywords="research protocol", keywords="nursing", keywords="nursing program", keywords="nursing care", keywords="chronic disease", keywords="disease monitoring", keywords="prevalence", keywords="China", keywords="adult", keywords="patient recovery", keywords="psychological care", keywords="health education", keywords="quality of life", keywords="blood glucose", keywords="self-care", keywords="medication", keywords="control group", keywords="patient satisfaction", abstract="Background: To improve the performance of health care institutions, reduce overmedication, and minimize the waste of medical resources, China is committed to implementing a clinical pathway management model. This study aims to standardize nursing practices, foster clinical thinking in nurses, and promote patient recovery. Objective: The purpose of this study is to evaluate the clinical effects of a traditional Chinese medicine (TCM) nursing program based on nursing pathways for patients with type 2 diabetes mellitus (T2DM). Methods: This study uses a prospective, randomized, single-blind, parallel-controlled design. Based on sample size calculations, the study will include 594 patients with diabetes, with 2 groups of 297 patients: an observation group will receive a TCM nursing program based on clinical pathways, while a control group will receive routine care. Both groups will be evaluated before and after the intervention using assessment indicators. The primary outcome is the quality of life score, measured by a diabetes-specific quality of life questionnaire. Secondary outcomes include hospital stay duration, medical expenses, health knowledge, blood glucose control, symptom scores, and patient satisfaction. Results: This study was funded in August 2021 and has received approval from the Ethics Committee of Guang'anmen Hospital, China Academy of Chinese Medical Sciences (2022-022-KY-01). The trial is ongoing, with the first patient enrolled in September 2022. The study is expected to conclude in April 2025. To date, 380 patients have been recruited, with 202 randomized into the study, though no statistical analysis of the data has yet been conducted. A single-blind method is used; nurses are aware of group assignments and intervention plans, while patients remain blinded. Final results are planned for release in the first quarter of 2025. Conclusions: This study seeks to integrate existing national standardized nursing protocols with clinical pathways to implement more efficient and higher-quality nursing practices. The goal is to standardize nursing procedures, enhance patients' quality of life, and improve self-care and medication adherence after discharge. 
Trial Registration: International Traditional Medicine Clinical Trial Registry ITMCTR2022000048; https://tinyurl.com/y4jd68h4 International Registered Report Identifier (IRRID): DERR1-10.2196/58951 ", doi="10.2196/58951", url="https://www.researchprotocols.org/2025/1/e58951" } @Article{info:doi/10.2196/68618, author="Abdullah, Abdullah and Kim, Tae Seong", title="Automated Radiology Report Labeling in Chest X-Ray Pathologies: Development and Evaluation of a Large Language Model Framework", journal="JMIR Med Inform", year="2025", month="Mar", day="28", volume="13", pages="e68618", keywords="large language model", keywords="generative pre-trained transformers", keywords="radiology report", keywords="labeling", keywords="BERT", keywords="thoracic pathologies", keywords="LLM", keywords="GPT", abstract="Background: Labeling unstructured radiology reports is crucial for creating structured datasets that facilitate downstream tasks, such as training large-scale medical imaging models. Current approaches typically rely on Bidirectional Encoder Representations from Transformers (BERT)-based methods or manual expert annotations, which have limitations in terms of scalability and performance. Objective: This study aimed to evaluate the effectiveness of a generative pretrained transformer (GPT)-based large language model (LLM) in labeling radiology reports, comparing it with 2 existing methods, CheXbert and CheXpert, on a large chest X-ray dataset (MIMIC Chest X-ray [MIMIC-CXR]). Methods: In this study, we introduce an LLM-based approach fine-tuned on expert-labeled radiology reports. Our model's performance was evaluated on 687 radiologist-labeled chest X-ray reports, comparing F1 scores across 14 thoracic pathologies. The performance of our LLM model was compared with the CheXbert and CheXpert models across positive, negative, and uncertainty extraction tasks. Paired t tests and Wilcoxon signed-rank tests were performed to evaluate the statistical significance of differences between model performances. Results: The GPT-based LLM model achieved an average F1 score of 0.9014 across all certainty levels, outperforming CheXpert (0.8864) and approaching CheXbert's performance (0.9047). For positive and negative certainty levels, our model scored 0.8708, surpassing CheXpert (0.8525) and closely matching CheXbert (0.8733). Statistically, paired t tests indicated no significant difference between our model and CheXbert (P=.35) but a significant improvement over CheXpert (P=.01). Wilcoxon signed-rank tests corroborated these findings, showing no significant difference between our model and CheXbert (P=.14) but confirming a significant difference with CheXpert (P=.005). The LLM also demonstrated superior performance for pathologies with longer and more complex descriptions, leveraging its extended context length. Conclusions: The GPT-based LLM model demonstrates competitive performance compared with CheXbert and outperforms CheXpert in radiology report labeling. These findings suggest that LLMs are a promising alternative to traditional BERT-based architectures for this task, offering enhanced context understanding and eliminating the need for extensive feature engineering. Furthermore, with their large context length, LLM-based models are better suited for this task compared with the small context length of BERT-based models. ", doi="10.2196/68618", url="https://medinform.jmir.org/2025/1/e68618" } @Article{info:doi/10.2196/67178, author="Skoric, James and Lomanowska, M. 
Anna and Janmohamed, Tahir and Lumsden-Ruegg, Heather and Katz, Joel and Clarke, Hance and Rahman, Abidur Quazi", title="Predicting Clinical Outcomes at the Toronto General Hospital Transitional Pain Service via the Manage My Pain App: Machine Learning Approach", journal="JMIR Med Inform", year="2025", month="Mar", day="28", volume="13", pages="e67178", keywords="chronic pain", keywords="transitional pain", keywords="pain interference", keywords="machine learning", keywords="prediction model", keywords="manage my pain", keywords="pain app", keywords="clinical outcome", keywords="Toronto", keywords="Canada", keywords="transitional pain service", keywords="pain service", keywords="pain", keywords="app", keywords="application", keywords="prognosis", keywords="chronic pain management", keywords="digital health", keywords="digital health tool", keywords="pain management", keywords="machine learning methods", keywords="prediction", keywords="machine learning models", keywords="logistic regression", abstract="Background: Chronic pain is a complex condition that affects more than a quarter of people worldwide. The development and progression of chronic pain are unique to each individual due to the contribution of interacting biological, psychological, and social factors. The subjective nature of the experience of chronic pain can make its clinical assessment and prognosis challenging. Personalized digital health apps, such as Manage My Pain (MMP), are popular pain self-tracking tools that can also be leveraged by clinicians to support patients. Recent advances in machine learning technologies open an opportunity to use data collected in pain apps to make predictions about a patient's prognosis. Objective: This study applies machine learning methods using real-world user data from the MMP app to predict clinically significant improvements in pain-related outcomes among patients at the Toronto General Hospital Transitional Pain Service. Methods: Information entered into the MMP app by 160 Transitional Pain Service patients over a 1-month period, including profile information, pain records, daily reflections, and clinical questionnaire responses, was used to extract 245 relevant variables, referred to as features, for use in a machine learning model. The machine learning model was developed using logistic regression with recursive feature elimination to predict clinically significant improvements in pain-related pain interference, assessed by the PROMIS Pain Interference 8a v1.0 questionnaire. The model was tuned and the important features were selected using the 10-fold cross-validation method. Leave-one-out cross-validation was used to test the model's performance. Results: The model predicted patient improvement in pain interference with 79\% accuracy and an area under the receiver operating characteristic curve of 0.82. It showed balanced class accuracies between improved and nonimproved patients, with a sensitivity of 0.76 and a specificity of 0.82. Feature importance analysis indicated that all MMP app data, not just clinical questionnaire responses, were key to classifying patient improvement. Conclusions: This study demonstrates that data from a digital health app can be integrated with clinical questionnaire responses in a machine learning model to effectively predict which chronic pain patients will show clinically significant improvement. 
The findings emphasize the potential of machine learning methods in real-world clinical settings to improve personalized treatment plans and patient outcomes. ", doi="10.2196/67178", url="https://medinform.jmir.org/2025/1/e67178" } @Article{info:doi/10.2196/65537, author="Yang, Hao and Li, Jiaxi and Zhang, Chi and Sierra, Pazos Alejandro and Shen, Bairong", title="Large Language Model--Driven Knowledge Graph Construction in Sepsis Care Using Multicenter Clinical Databases: Development and Usability Study", journal="J Med Internet Res", year="2025", month="Mar", day="27", volume="27", pages="e65537", keywords="sepsis", keywords="knowledge graph", keywords="large language models", keywords="prompt engineering", keywords="real-world", keywords="GPT-4.0", abstract="Background: Sepsis is a complex, life-threatening condition characterized by significant heterogeneity and vast amounts of unstructured data, posing substantial challenges for traditional knowledge graph construction methods. The integration of large language models (LLMs) with real-world data offers a promising avenue to address these challenges and enhance the understanding and management of sepsis. Objective: This study aims to develop a comprehensive sepsis knowledge graph by leveraging the capabilities of LLMs, specifically GPT-4.0, in conjunction with multicenter clinical databases. The goal is to improve the understanding of sepsis and provide actionable insights for clinical decision-making. We also established a multicenter sepsis database (MSD) to support this effort. Methods: We collected clinical guidelines, public databases, and real-world data from 3 major hospitals in Western China, encompassing 10,544 patients diagnosed with sepsis. Using GPT-4.0, we used advanced prompt engineering techniques for entity recognition and relationship extraction, which facilitated the construction of a nuanced sepsis knowledge graph. Results: We established a sepsis database with 10,544 patient records, including 8497 from West China Hospital, 690 from Shangjin Hospital, and 357 from Tianfu Hospital. The sepsis knowledge graph comprises 1894 nodes and 2021 distinct relationships, encompassing 9 entity concepts (diseases, symptoms, biomarkers, imaging examinations, etc) and 8 semantic relationships (complications, recommended medications, laboratory tests, etc). GPT-4.0 demonstrated superior performance in entity recognition and relationship extraction, achieving an F1-score of 76.76 on a sepsis-specific dataset, outperforming other models such as Qwen2 (43.77) and Llama3 (48.39). On the CMeEE dataset, GPT-4.0 achieved an F1-score of 65.42 using few-shot learning, surpassing traditional models such as BERT-CRF (62.11) and Med-BERT (60.66). Building upon this, we compiled a comprehensive sepsis knowledge graph, comprising 1894 nodes and 2021 distinct relationships. Conclusions: This study represents a pioneering effort in using LLMs, particularly GPT-4.0, to construct a comprehensive sepsis knowledge graph. The innovative application of prompt engineering, combined with the integration of multicenter real-world data, has significantly enhanced the efficiency and accuracy of knowledge graph construction. The resulting knowledge graph provides a robust framework for understanding sepsis, supporting clinical decision-making, and facilitating further research. 
The success of this approach underscores the potential of LLMs in medical research and sets a new benchmark for future studies in sepsis and other complex medical conditions. ", doi="10.2196/65537", url="https://www.jmir.org/2025/1/e65537" } @Article{info:doi/10.2196/64617, author="Kelly, Anthony and Jensen, Kjems Esben and Grua, Martino Eoin and Mathiasen, Kim and Van de Ven, Pepijn", title="An Interpretable Model With Probabilistic Integrated Scoring for Mental Health Treatment Prediction: Design Study", journal="JMIR Med Inform", year="2025", month="Mar", day="26", volume="13", pages="e64617", keywords="machine learning", keywords="mental health", keywords="Monte Carlo dropout", keywords="explainability", keywords="explainable AI", keywords="XAI", keywords="artificial intelligence", keywords="AI", abstract="Background: Machine learning (ML) systems in health care have the potential to enhance decision-making but often fail to address critical issues such as prediction explainability, confidence, and robustness in a context-based and easily interpretable manner. Objective: This study aimed to design and evaluate an ML model for a future decision support system for clinical psychopathological treatment assessments. The novel ML model is inherently interpretable and transparent. It aims to enhance clinical explainability and trust through a transparent, hierarchical model structure that progresses from questions to scores to classification predictions. The model confidence and robustness were addressed by applying Monte Carlo dropout, a probabilistic method that reveals model uncertainty and confidence. Methods: A model for clinical psychopathological treatment assessments was developed, incorporating a novel ML model structure. The model aimed at enhancing the graphical interpretation of the model outputs and addressing issues of prediction explainability, confidence, and robustness. The proposed ML model was trained and validated using patient questionnaire answers and demographics from a web-based treatment service in Denmark (N=1088). Results: The balanced accuracy score on the test set was 0.79. The precision was $\geq$0.71 for all 4 prediction classes (depression, panic, social phobia, and specific phobia). The area under the curve for the 4 classes was 0.93, 0.92, 0.91, and 0.98, respectively. Conclusions: We have demonstrated a mental health treatment ML model that supported a graphical interpretation of prediction class probability distributions. Their spread and overlap can inform clinicians of competing treatment possibilities for patients and uncertainty in treatment predictions. With the ML model achieving 79\% balanced accuracy, we expect that the model will be clinically useful in both screening new patients and informing clinical interviews. 
", doi="10.2196/64617", url="https://medinform.jmir.org/2025/1/e64617" } @Article{info:doi/10.2196/64266, author="Ackerhans, Sophia and Wehkamp, Kai and Petzina, Rainer and Dumitrescu, Daniel and Schultz, Carsten", title="Perceived Trust and Professional Identity Threat in AI-Based Clinical Decision Support Systems: Scenario-Based Experimental Study on AI Process Design Features", journal="JMIR Form Res", year="2025", month="Mar", day="26", volume="9", pages="e64266", keywords="artificial intelligence", keywords="clinical decision support systems", keywords="explainable artificial intelligence", keywords="professional identity threat", keywords="health care", keywords="physicians", keywords="perceptions", keywords="professional identity", abstract="Background: Artificial intelligence (AI)--based systems in medicine like clinical decision support systems (CDSSs) have shown promising results in health care, sometimes outperforming human specialists. However, the integration of AI may challenge medical professionals' identities and lead to limited trust in technology, resulting in health care professionals rejecting AI-based systems. Objective: This study aims to explore the impact of AI process design features on physicians' trust in the AI solution and on perceived threats to their professional identity. These design features involve the explainability of AI-based CDSS decision outcomes, the integration depth of the AI-generated advice into the clinical workflow, and the physician's accountability for the AI system-induced medical decisions. Methods: We conducted a 3-factorial web-based between-subject scenario-based experiment with 292 medical students in their medical training and experienced physicians across different specialties. The participants were presented with an AI-based CDSS for sepsis prediction and prevention for use in a hospital. Each participant was given a scenario in which the 3 design features of the AI-based CDSS were manipulated in a 2{\texttimes}2{\texttimes}2 factorial design. SPSS PROCESS (IBM Corp) macro was used for hypothesis testing. Results: The results suggest that the explainability of the AI-based CDSS was positively associated with both trust in the AI system ($\beta$=.508; P<.001) and professional identity threat perceptions ($\beta$=.351; P=.02). Trust in the AI system was found to be negatively related to professional identity threat perceptions ($\beta$=--.138; P=.047), indicating a partially mediated effect on professional identity threat through trust. Deep integration of AI-generated advice into the clinical workflow was positively associated with trust in the system ($\beta$=.262; P=.009). The accountability of the AI-based decisions, that is, the system required a signature, was found to be positively associated with professional identity threat perceptions among the respondents ($\beta$=.339; P=.004). Conclusions: Our research highlights the role of process design features of AI systems used in medicine in shaping professional identity perceptions, mediated through increased trust in AI. An explainable AI-based CDSS and an AI-generated system advice, which is deeply integrated into the clinical workflow, reinforce trust, thereby mitigating perceived professional identity threats. However, explainable AI and individual accountability of the system directly exacerbate threat perceptions. 
Our findings illustrate the complex nature of the behavioral patterns of AI in health care and have broader implications for supporting the implementation of AI-based CDSSs in a context where AI systems may impact professional identity. ", doi="10.2196/64266", url="https://formative.jmir.org/2025/1/e64266" } @Article{info:doi/10.2196/63681, author="Senathirajah, Yalini and Kaufman, R. David and Cato, Kenrick and Daniel, Pia and Roblin, Patricia and Kushniruk, Andre and Borycki, M. Elizabeth and Feld, Emanuel and Debi, Poli", title="The Impact of the Burden of COVID-19 Regulatory Reporting in a Small Independent Hospital and a Large Network Hospital: Comparative Mixed Methods Study", journal="Online J Public Health Inform", year="2025", month="Mar", day="26", volume="17", pages="e63681", keywords="regulatory reporting", keywords="human factors", keywords="reporting burden", keywords="emergency response", keywords="COVID-19", keywords="hospital resilience", keywords="pandemic response", abstract="Background: During the COVID-19 pandemic in 2020, hospitals encountered numerous challenges that compounded their difficulties. Some of these challenges directly impacted patient care, such as the need to expand capacities, adjust services, and use new knowledge to save lives in an ever-evolving situation. In addition, hospitals faced regulatory challenges. Objective: This paper presents the findings of a qualitative study that aimed to compare the effects of reporting requirements on a small independent hospital and a large network hospital during the COVID-19 pandemic. Methods: We used both quantitative and qualitative analyses and conducted 51 interviews, which were thematically analyzed. We quantified the changes in regulatory reporting requirements during the first 14 months of the pandemic. Results: Reporting requirements placed a substantial time burden on key clinical personnel at the small independent hospital, consequently reducing the time available for patient care. Conversely, the large network hospital had dedicated nonclinical staff responsible for reporting duties, and their robust health information system facilitated this work. Conclusions: The discrepancy in health IT capabilities suggests that there may be significant institutional inequities affecting smaller hospitals' ability to respond to a pandemic and adequately support public health efforts. Electronic certification guidelines are essential to addressing the substantial equity issues. We discuss in detail the health care policy implications of these findings. ", doi="10.2196/63681", url="https://ojphi.jmir.org/2025/1/e63681", url="http://www.ncbi.nlm.nih.gov/pubmed/40137048" } @Article{info:doi/10.2196/65872, author="Ji, Huanhuan and Gong, Meiling and Gong, Li and Zhang, Ni and Zhou, Ruiou and Deng, Dongmei and Yang, Ya and Song, Lin and Jia, Yuntao", title="Detection of Clinically Significant Drug-Drug Interactions in Fatal Torsades de Pointes: Disproportionality Analysis of the Food and Drug Administration Adverse Event Reporting System", journal="J Med Internet Res", year="2025", month="Mar", day="25", volume="27", pages="e65872", keywords="torsades de pointes", keywords="FAERS database", keywords="drug-drug interactions", keywords="QTc-prolonging drugs", keywords="adverse drug events", abstract="Background: Torsades de pointes (TdP) is a rare yet potentially fatal cardiac arrhythmia that is often drug-induced. 
Drug-drug interactions (DDIs) are a major risk factor for TdP development, but the specific drug combinations that increase this risk have not been extensively studied. Objective: This study aims to identify clinically significant, high-priority DDIs to provide a foundation to minimize the risk of TdP and effectively manage DDI risks in the future. Methods: We used the following 4 frequency statistical models to detect DDI signals using the Food and Drug Administration Adverse Event Reporting System (FAERS) database: $\Omega$ shrinkage measure, combination risk ratio, chi-square statistic, and additive model. The adverse event of interest was TdP, and the drugs targeted were all registered and classified as ``suspect,'' ``interacting,'' or ``concomitant drugs'' in FAERS. The DDI signals were identified and evaluated using the Lexicomp and Drugs.com databases, supplemented with real-world data from the literature. Results: As of September 2023, this study included 4313 TdP cases, with 721 drugs and 4230 drug combinations that were reported for at least 3 cases. The $\Omega$ shrinkage measure model demonstrated the most conservative signal detection, whereas the chi-square statistic model exhibited the closest similarity in signal detection tendency to the $\Omega$ shrinkage measure model. The $\kappa$ value was 0.972 (95\% CI 0.942-1.002), and the $P_{positive}$ and $P_{negative}$ values were 0.987 and 0.985, respectively. We detected 2158 combinations using the 4 frequency statistical models, of which 241 combinations were indexed by Drugs.com or Lexicomp and 105 were indexed by both. The most commonly interacting drugs were amiodarone, citalopram, quetiapine, ondansetron, ciprofloxacin, methadone, escitalopram, sotalol, and voriconazole. The most common combinations were citalopram and quetiapine, amiodarone and ciprofloxacin, amiodarone and escitalopram, amiodarone and fluoxetine, ciprofloxacin and sotalol, and amiodarone and citalopram. Although 38 DDIs were indexed by Drugs.com and Lexicomp, they were not detected by any of the 4 models. Conclusions: Clinical evidence on DDIs is limited, and not all combinations of heart rate--corrected QT interval (QTc)--prolonging drugs result in TdP, even when involving high-risk drugs or those with known risk of TdP. This study provides a comprehensive real-world overview of drug-induced TdP, delimiting both clinically significant DDIs and negative DDIs, providing valuable insights into the safety profiles of various drugs, and informing the optimization of clinical practice. 
", doi="10.2196/65872", url="https://www.jmir.org/2025/1/e65872" } @Article{info:doi/10.2196/59738, author="Dima, Lelia Alexandra and Nabergoj Makovec, Urska and Ribaut, Janette and Haupenthal, Frederik and Barnestein-Fonseca, Pilar and Goetzinger, Catherine and Grant, Sean and J{\'a}come, Cristina and Smits, Dins and Tadic, Ivana and van Boven, Job and Tsiligianni, Ioanna and Herdeiro, Teresa Maria and Roque, F{\'a}tima and ", title="Stakeholder Consensus on an Interdisciplinary Terminology to Enable the Development and Uptake of Medication Adherence Technologies Across Health Systems: Web-Based Real-Time Delphi Study", journal="J Med Internet Res", year="2025", month="Mar", day="25", volume="27", pages="e59738", keywords="health technology", keywords="medication adherence", keywords="Delphi study", keywords="stakeholder engagement", keywords="digital health", keywords="behavioral science", keywords="implementation science", abstract="Background: Technology-mediated medication adherence interventions have proven useful, yet implementation in clinical practice is low. The European Network to Advance Best Practices and Technology on Medication Adherence (ENABLE) European Cooperation in Science and Technology Action (CA19132) online repository of medication adherence technologies (MATechs) aims to provide an open access, searchable knowledge management platform to facilitate innovation and support medication adherence management across health systems. To provide a solid foundation for optimal use and collaboration, the repository requires a shared interdisciplinary terminology. Objective: We consulted stakeholders on their views and level of agreement with the terminology proposed to inform the ENABLE repository structure. Methods: A real-time web-based Delphi study was conducted with stakeholders from 39 countries active in research, clinical practice, patient representation, policy making, and technology development. Participants rated terms and definitions of MATech and of 21 attribute clusters on product and provider information, medication adherence descriptors, and evaluation and implementation. Relevance, clarity, and completeness criteria were rated on 9-point scales, and free-text comments were provided interactively. Participants could reconsider their ratings based on real-time aggregated feedback and revisit the survey throughout the study period. We quantified agreement and process indicators for the complete sample and per stakeholder group and performed content analysis on comments. Consensus was considered reached for ratings with a disagreement index of <1. Median ratings guided decisions on whether attributes were considered mandatory, optional, or not relevant. We used the results to improve the terminology and repository structure. Results: Of 250 stakeholders invited, 117 (46.8\%) rated the MATech definition, of whom 83 (70.9\%) rated all attributes. Consensus was reached for all items. The definition was considered appropriate and clear (median ratings 7.02, IPR 6.10-7.69, and 7.26, IPR 6.73-7.90, respectively). Most attributes were considered relevant, mandatory, and sufficiently clear to remain unchanged except for ISO certification (considered optional; median relevance rating 6.34, IPR 5.50-7.24) and medication adherence phase, medication adherence measurement, and medication adherence intervention (candidates for optional changes; median clarity ratings 6.07, IPR 4.86-7.17; 6.37, IPR 4.80-6.67; and 5.67, IPR 4.66-6.61, respectively). 
Subgroup analyses found several attribute clusters considered moderately clear by some stakeholder groups. Results were consistent across stakeholder groups and time, yet response variation was found within some stakeholder groups for selected clusters, suggesting targets for further discussion. Comments highlighted issues for further debate and provided suggestions informing modifications to improve comprehensiveness, relevance, and clarity. Conclusions: By reaching agreement on a comprehensive MATech terminology developed following state-of-the-art methodology, this study represents a key step in the ENABLE initiative to develop an information architecture capable of structuring and facilitating the development and implementation of MATech across Europe. The debates and challenges highlighted in stakeholders' comments outline a potential road map for further development of the terminology and the ENABLE repository. International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2021-059674 ", doi="10.2196/59738", url="https://www.jmir.org/2025/1/e59738" } @Article{info:doi/10.2196/65456, author="Helgeson, A. Scott and Quicksall, S. Zachary and Johnson, W. Patrick and Lim, G. Kaiser and Carter, E. Rickey and Lee, S. Augustine", title="Estimation of Static Lung Volumes and Capacities From Spirometry Using Machine Learning: Algorithm Development and Validation", journal="JMIR AI", year="2025", month="Mar", day="24", volume="4", pages="e65456", keywords="artificial intelligence", keywords="machine learning", keywords="pulmonary function test", keywords="spirometry", keywords="total lung capacity", keywords="AI", keywords="ML", keywords="lung", keywords="lung volume", keywords="lung capacity", keywords="spirometer", keywords="lung disease", keywords="database", keywords="respiratory", keywords="pulmonary", abstract="Background: Spirometry can be performed in an office setting or remotely using portable spirometers. Although basic spirometry is used for diagnosis of obstructive lung disease, clinically relevant information such as restriction, hyperinflation, and air trapping requires additional testing, such as body plethysmography, which is not as readily available. We hypothesize that spirometry data contains information that can allow estimation of static lung volumes in certain circumstances by leveraging machine learning techniques. Objective: The aim of the study was to develop artificial intelligence-based algorithms for estimating lung volumes and capacities using spirometry measures. Methods: This study obtained spirometry and lung volume measurements from the Mayo Clinic pulmonary function test database for patient visits between February 19, 2001, and December 16, 2022. Preprocessing was performed, and various machine learning algorithms were applied, including a generalized linear model with regularization, random forests, extremely randomized trees, gradient-boosted trees, and XGBoost for both classification and regression cohorts. Results: A total of 121,498 pulmonary function tests were used in this study, with 85,017 allotted for exploratory data analysis and model development (ie, training dataset) and 36,481 tests reserved for model evaluation (ie, testing dataset). The median age of the cohort was 64.7 years (IQR 18-119.6), with a balanced distribution between genders, consisting of 48.2\% (n=58,607) female and 51.8\% (n=62,889) male patients. 
The regression models showed robust performance overall, with relatively low root mean square error and mean absolute error values observed across all predicted lung volumes. Across all lung volume categories, the models demonstrated strong discriminatory capacity, as indicated by the high area under the receiver operating characteristic curve values ranging from 0.85 to 0.99 in the training set and 0.81 to 0.98 in the testing set. Conclusions: Overall, the models demonstrate robust performance across lung volume measurements, underscoring their potential utility in clinical practice for accurate diagnosis and prognosis of respiratory conditions, particularly in settings where access to body plethysmography or other lung volume measurement modalities is limited. ", doi="10.2196/65456", url="https://ai.jmir.org/2025/1/e65456" } @Article{info:doi/10.2196/63937, author="Parduzi, Qendresa and Wermelinger, Jonathan and Koller, Domingo Simon and Sariyar, Murat and Schneider, Ulf and Raabe, Andreas and Seidel, Kathleen", title="Explainable AI for Intraoperative Motor-Evoked Potential Muscle Classification in Neurosurgery: Bicentric Retrospective Study", journal="J Med Internet Res", year="2025", month="Mar", day="24", volume="27", pages="e63937", keywords="intraoperative neuromonitoring", keywords="motor evoked potential", keywords="artificial intelligence", keywords="machine learning", keywords="deep learning", keywords="random forest", keywords="convolutional neural network", keywords="explainability", keywords="medical informatics", keywords="personalized medicine", keywords="neurophysiological", keywords="monitoring", keywords="orthopedic", keywords="motor", keywords="neurosurgery", abstract="Background: Intraoperative neurophysiological monitoring (IONM) guides the surgeon in ensuring motor pathway integrity during high-risk neurosurgical and orthopedic procedures. Although motor-evoked potentials (MEPs) are valuable for predicting motor outcomes, the key features of predictive signals are not well understood, and standardized warning criteria are lacking. Developing a muscle identification prediction model could increase patient safety while allowing the exploration of relevant features for the task. Objective: The aim of this study is to expand the development of machine learning (ML) methods for muscle classification and evaluate them in a bicentric setup. Further, we aim to identify key features of MEP signals that contribute to accurate muscle classification using explainable artificial intelligence (XAI) techniques. Methods: This study used ML and deep learning models, specifically random forest (RF) classifiers and convolutional neural networks (CNNs), to classify MEP signals from routine supratentorial neurosurgical procedures from two medical centers according to muscle identity of four muscles (extensor digitorum, abductor pollicis brevis, tibialis anterior, and abductor hallucis). The algorithms were trained and validated on a total of 36,992 MEPs from 151 surgeries in one center, and they were tested on 24,298 MEPs from 58 surgeries from the other center. Depending on the algorithm, time-series, feature-engineered, and time-frequency representations of the MEP data were used. XAI techniques, specifically Shapley Additive Explanation (SHAP) values and gradient class activation maps (Grad-CAM), were implemented to identify important signal features. 
Results: High classification accuracy was achieved with the RF classifier, reaching 87.9\% accuracy on the validation set and 80\% accuracy on the test set. The 1D- and 2D-CNNs demonstrated comparably strong performance. Our XAI findings indicate that frequency components and peak latencies are crucial for accurate MEP classification, providing insights that could inform intraoperative warning criteria. Conclusions: This study demonstrates the effectiveness of ML techniques and the importance of XAI in enhancing trust in and reliability of artificial intelligence--driven IONM applications. Further, it may help to identify new intrinsic features of MEP signals so far overlooked in conventional warning criteria. By reducing the risk of muscle mislabeling and by providing the basis for possible new warning criteria, this study may help to increase patient safety during surgical procedures. ", doi="10.2196/63937", url="https://www.jmir.org/2025/1/e63937" } @Article{info:doi/10.2196/63923, author="Partridge, Brad and Gillespie, Nicole and Soyer, Peter H. and Mar, Victoria and Janda, Monika", title="Exploring the Views of Dermatologists, General Practitioners, and Melanographers on the Use of AI Tools in the Context of Good Decision-Making When Detecting Melanoma: Qualitative Interview Study", journal="JMIR Dermatol", year="2025", month="Mar", day="24", volume="8", pages="e63923", keywords="artificial intelligence", keywords="melanoma", keywords="skin cancer", keywords="decision-making", keywords="decision support", keywords="qualitative", keywords="attitudes", keywords="dermatologists", keywords="general practitioners", keywords="melanographers", keywords="Australia", keywords="New Zealand", abstract="Background: Evidence that artificial intelligence (AI) may improve melanoma detection has led to calls for increased human-AI collaboration in clinical workflows. However, AI-based support may entail a wide range of specific functions for AI. To appropriately integrate AI into decision-making processes, it is crucial to understand the precise role that clinicians see AI playing within their clinical deliberations. Objective: This study aims to provide an in-depth understanding of how a range of clinicians involved in melanoma screening and diagnosis conceptualize the role of AI within their decision-making and what these conceptualizations mean for good decision-making. Methods: This qualitative exploration used in-depth individual interviews with 30 clinicians, predominantly from Australia and New Zealand (n=26, 87\%), who engaged in melanoma detection (n=17, 57\% dermatologists; n=6, 20\% general practitioners with an interest in skin cancer; and n=7, 23\% melanographers). The vast majority of the sample (n=25, 83\%) had interacted with or used 2D or 3D skin imaging technologies with AI tools for screening or diagnosis of melanoma, either as part of testing through clinical AI reader studies or within their clinical work. Results: We constructed the following 5 themes to describe how participants conceptualized the role of AI within decision-making when it comes to melanoma detection: theme 1 (integrative theme)---the importance of good clinical judgment; theme 2---AI as just one tool among many; theme 3---AI as an adjunct after a clinician's decision; theme 4---AI as a second opinion for unresolved decisions; theme 5---AI as an expert guide before decision-making. 
Participants articulated a major conundrum---AI may benefit inexperienced clinicians when conceptualized as an ``expert guide,'' but overreliance, deskilling, and a failure to recognize AI errors may mean only experienced clinicians should use AI ``as a tool.'' However, experienced clinicians typically relied on their own clinical judgment, and some could be wary of allowing AI to ``influence'' their deliberations. The benefit of AI was often to reassure decisions once they had been reached by conceptualizing AI as a kind of ``checker,'' ``validator,'' or in a small number of equivocal cases, as a genuine ``second opinion.'' This raised questions about the extent to which experienced clinicians truly seek to ``collaborate'' with AI or use it to inform decisions. Conclusions: Clinicians conceptualized AI support in an array of disparate ways that have implications for how AI should be incorporated into clinical workflows. A priority for clinicians is the conservation of good clinical acumen, and our study encourages a more focused engagement with users about the precise way to incorporate AI into the clinical decision-making process for melanoma detection. ", doi="10.2196/63923", url="https://derma.jmir.org/2025/1/e63923" } @Article{info:doi/10.2196/67922, author="Xu, He-Li and Li, Xiao-Ying and Jia, Ming-Qian and Ma, Qi-Peng and Zhang, Ying-Hua and Liu, Fang-Hua and Qin, Ying and Chen, Yu-Han and Li, Yu and Chen, Xi-Yang and Xu, Yi-Lin and Li, Dong-Run and Wang, Dong-Dong and Huang, Dong-Hui and Xiao, Qian and Zhao, Yu-Hong and Gao, Song and Qin, Xue and Tao, Tao and Gong, Ting-Ting and Wu, Qi-Jun", title="AI-Derived Blood Biomarkers for Ovarian Cancer Diagnosis: Systematic Review and Meta-Analysis", journal="J Med Internet Res", year="2025", month="Mar", day="24", volume="27", pages="e67922", keywords="artificial intelligence", keywords="AI", keywords="blood biomarker", keywords="ovarian cancer", keywords="diagnosis", keywords="PRISMA", abstract="Background: Emerging evidence underscores the potential application of artificial intelligence (AI) in discovering noninvasive blood biomarkers. However, the diagnostic value of AI-derived blood biomarkers for ovarian cancer (OC) remains inconsistent. Objective: We aimed to evaluate the research quality and the validity of AI-based blood biomarkers in OC diagnosis. Methods: A systematic search was performed in the MEDLINE, Embase, IEEE Xplore, PubMed, Web of Science, and the Cochrane Library databases. Studies examining the diagnostic accuracy of AI in discovering OC blood biomarkers were identified. The risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies--AI tool. Pooled sensitivity, specificity, and area under the curve (AUC) were estimated using a bivariate model for the diagnostic meta-analysis. Results: A total of 40 studies were ultimately included. Most (n=31, 78\%) included studies were evaluated as low risk of bias. Overall, the pooled sensitivity, specificity, and AUC were 85\% (95\% CI 83\%-87\%), 91\% (95\% CI 90\%-92\%), and 0.95 (95\% CI 0.92-0.96), respectively. For contingency tables with the highest accuracy, the pooled sensitivity, specificity, and AUC were 95\% (95\% CI 90\%-97\%), 97\% (95\% CI 95\%-98\%), and 0.99 (95\% CI 0.98-1.00), respectively. Stratification by AI algorithms revealed higher sensitivity and specificity in studies using machine learning (sensitivity=85\% and specificity=92\%) compared to those using deep learning (sensitivity=77\% and specificity=85\%). 
In addition, studies using serum reported substantially higher sensitivity (94\%) and specificity (96\%) than those using plasma (sensitivity=83\% and specificity=91\%). Stratification by external validation demonstrated significantly higher specificity in studies with external validation (specificity=94\%) compared to those without external validation (specificity=89\%), while the reverse was observed for sensitivity (74\% vs 90\%). No publication bias was detected in this meta-analysis. Conclusions: AI algorithms demonstrate satisfactory performance in the diagnosis of OC using blood biomarkers and are anticipated to become an effective diagnostic modality in the future, potentially avoiding unnecessary surgeries. Future research is warranted to incorporate external validation into AI diagnostic models, as well as to prioritize the adoption of deep learning methodologies. Trial Registration: PROSPERO CRD42023481232; https://www.crd.york.ac.uk/PROSPERO/view/CRD42023481232 ", doi="10.2196/67922", url="https://www.jmir.org/2025/1/e67922", url="http://www.ncbi.nlm.nih.gov/pubmed/40126546" } @Article{info:doi/10.2196/66273, author="Gyrard, Amelie and Abedian, Somayeh and Gribbon, Philip and Manias, George and van Nuland, Rick and Zatloukal, Kurt and Nicolae, Emilia Irina and Danciu, Gabriel and Nechifor, Septimiu and Marti-Bonmati, Luis and Mallol, Pedro and Dalmiani, Stefano and Autexier, Serge and Jendrossek, Mario and Avramidis, Ioannis and Garcia Alvarez, Eva and Holub, Petr and Blanquer, Ignacio and Boden, Anna and Hussein, Rada", title="Lessons Learned From European Health Data Projects With Cancer Use Cases: Implementation of Health Standards and Internet of Things Semantic Interoperability", journal="J Med Internet Res", year="2025", month="Mar", day="24", volume="27", pages="e66273", keywords="artificial intelligence", keywords="cancer", keywords="European Health Data Space", keywords="health care standards", keywords="interoperability", keywords="AI", keywords="health data", keywords="cancer use cases", keywords="IoT", keywords="Internet of Things", keywords="primary data", keywords="diagnosis", keywords="prognosis", keywords="decision-making", doi="10.2196/66273", url="https://www.jmir.org/2025/1/e66273", url="http://www.ncbi.nlm.nih.gov/pubmed/40126534" } @Article{info:doi/10.2196/64933, author="Wu, J. Jennifer and Graham, Ross and {\c{C}}elebi, Julie and Fraser, Kevin and Gin, T. Geneen and Dang, Laurel and Hatamy, Esmatullah and Walker, Amanda and Barbato, Courtney and Lunde, Ottar and Coles, Lisa and Agnihotri, Parag and Morn, Cassandra and Tai-Seale, Ming", title="Factors Influencing Primary Care Physicians' Intent to Refer Patients With Hypertension to a Digital Remote Blood Pressure Monitoring Program: Mixed Methods Study", journal="J Med Internet Res", year="2025", month="Mar", day="24", volume="27", pages="e64933", keywords="digital health", keywords="primary care", keywords="electronic health records", keywords="referral", keywords="hypertension", keywords="remote monitoring", keywords="remote blood pressure", keywords="digital technology", keywords="mobile phone", keywords="mixed method", keywords="quantitative analysis", keywords="linear regression", keywords="clinical information", abstract="Background: Primary care physicians' (PCP) referral rates to digital health programs are highly variable. 
This study explores whether knowledge of the digital remote blood pressure monitoring (RBPM) program and information on referral patterns influence PCPs' intention to refer patients. Objective: This study aims to examine the relationship between PCPs' knowledge of the digital RBPM program, information on their own prior referral rates versus their own plus their peers' referral rates, and their likelihood to refer patients to the digital RBPM program. Methods: This is a mixed methods study integrating quantitative analysis of electronic health record data regarding the frequency of PCPs' referrals of patients with hypertension to a digital health program and quantitative and qualitative analyses of survey data about PCPs' knowledge of the program and their intention to refer patients. PCPs responded to a clinical vignette featuring an eligible patient. They were randomized to either receive their own referral rate or their own plus their peers' referral rate. They were assessed on their intent to refer eligible future patients. Descriptive and multivariable linear regression analyses examined participant characteristics and the factors associated with their intent to refer patients. Narrative reasons for their intention to refer were thematically analyzed. Results: Of the 242 eligible PCPs invited to participate, 31\% (n=70) responded to the survey. From electronic health record data, the mean referral rate of patients per PCP was 11.80\% (SD 13.30\%). The mean self-reported knowledge of the digital health program was 6.47 (SD 1.81). The mean likelihood of referring an eligible patient (on a scale of 0 to 10, with 0 being not at all, and 10 being definitely) based on a vignette was 8.54 (SD 2.12). The own referral data group's mean likelihood to refer was 8.91 (SD 1.28), whereas that of the own plus peer prior referral data group was 8.35 (SD 2.19). Regression analyses suggested that the intention to refer the vignette patient was significantly associated with their knowledge (coefficient 0.46, 95\% CI 0.20-0.73; P<.001), whereas the intention to refer future patients was significantly associated with their intent to refer the patient in the vignette (coefficient 0.62, 95\% CI 0.46-0.78; P<.001). No evidence of an association was found between receiving own plus peer referral data, compared with own referral data alone, and intent to refer future patients (coefficient 0.23, 95\% CI --0.43 to 0.89; P=.48). Conclusions: Physicians' intention to refer patients to a novel digital health program can be extrapolated by examining their intention to refer an eligible patient portrayed in a vignette, which was found to be significantly influenced by their knowledge of the program. Future efforts should engage PCPs to better inform them so that more patients can benefit from the digital health program. ", doi="10.2196/64933", url="https://www.jmir.org/2025/1/e64933", url="http://www.ncbi.nlm.nih.gov/pubmed/40126550" } @Article{info:doi/10.2196/67967, author="Schaye, Verity and DiTullio, David and Guzman, Vincent Benedict and Vennemeyer, Scott and Shih, Hanniel and Reinstein, Ilan and Weber, E. Danielle and Goodman, Abbie and Wu, Y. Danny T. and Sartori, J. Daniel and Santen, A. 
Sally and Gruppen, Larry and Aphinyanaphongs, Yindalon and Burk-Rafel, Jesse", title="Large Language Model--Based Assessment of Clinical Reasoning Documentation in the Electronic Health Record Across Two Institutions: Development and Validation Study", journal="J Med Internet Res", year="2025", month="Mar", day="21", volume="27", pages="e67967", keywords="large language models", keywords="artificial intelligence", keywords="clinical reasoning", keywords="documentation", keywords="assessment", keywords="feedback", keywords="electronic health record", abstract="Background: Clinical reasoning (CR) is an essential skill; yet physicians often receive limited feedback. Artificial intelligence holds promise to fill this gap. Objective: We report the development of named entity recognition (NER), logic-based and large language model (LLM)--based assessments of CR documentation in the electronic health record across 2 institutions (New York University Grossman School of Medicine [NYU] and University of Cincinnati College of Medicine [UC]). Methods: The note corpus consisted of internal medicine resident admission notes (retrospective set: July 2020-December 2021, n=700 NYU and 450 UC notes; prospective validation set: July 2023-December 2023, n=155 NYU and 92 UC notes). Clinicians rated CR documentation quality in each note using a previously validated tool (Revised-IDEA), on 3-point scales across 2 domains: differential diagnosis (D0, D1, and D2) and explanation of reasoning (EA0, EA1, and EA2). At NYU, the retrospective set was annotated for NER for 5 entities (diagnosis, diagnostic category, prioritization of diagnosis language, data, and linkage terms). Models were developed using different artificial intelligence approaches, including an NER, logic-based model: a large word vector model (scispaCy en\_core\_sci\_lg) with model weights adjusted with backpropagation from annotations, developed at NYU with external validation at UC; NYUTron LLM: an NYU internal 110 million parameter LLM pretrained on 7.25 million clinical notes, only validated at NYU; and GatorTron LLM: an open source 345 million parameter LLM pretrained on 82 billion words of clinical text, fine-tuned on NYU retrospective sets, then externally validated and further fine-tuned at UC. Model performance was assessed in the prospective sets with F1-scores for the NER, logic-based model and with area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC) for the LLMs. Results: At NYU, the NYUTron LLM performed best: the D0 and D2 models had AUROC/AUPRC 0.87/0.79 and 0.89/0.86, respectively. The D1, EA0, and EA1 models had insufficient performance for implementation (AUROC range 0.57-0.80, AUPRC range 0.33-0.63). For the D1 classification, the approach pivoted to a stepwise approach taking advantage of the more performant D0 and D2 models. For the EA model, the approach pivoted to a binary EA2 model (ie, EA2 vs not EA2) with excellent performance (AUROC/AUPRC 0.85/0.80). At UC, the NER, D-logic--based model was the best-performing D model (F1-scores 0.80, 0.74, and 0.80 for D0, D1, and D2, respectively). The GatorTron LLM performed best for EA2 scores (AUROC/AUPRC 0.75/0.69). Conclusions: This is the first multi-institutional study to apply LLMs for assessing CR documentation in the electronic health record. Such tools can enhance feedback on CR. Lessons learned by implementing these models at distinct institutions support the generalizability of this approach. 
", doi="10.2196/67967", url="https://www.jmir.org/2025/1/e67967" } @Article{info:doi/10.2196/59209, author="Wickramasekera, Nyantara and Shackley, Phil and Rowen, Donna", title="Embedding a Choice Experiment in an Online Decision Aid or Tool: Scoping Review", journal="J Med Internet Res", year="2025", month="Mar", day="21", volume="27", pages="e59209", keywords="decision aid", keywords="decision tool", keywords="discrete choice experiment", keywords="conjoint analysis", keywords="value clarification", keywords="scoping review", keywords="choice experiment", keywords="database", keywords="study", keywords="article", keywords="data charting", keywords="narrative synthesis", abstract="Background: Decision aids empower patients to understand how treatment options match their preferences. Choice experiments, a method to clarify values used within decision aids, present patients with hypothetical scenarios to reveal their preferences for treatment characteristics. Given the rise in research embedding choice experiments in decision tools and the emergence of novel developments in embedding methodology, a scoping review is warranted. Objective: This scoping review examines how choice experiments are embedded into decision tools and how these tools are evaluated, to identify best practices. Methods: This scoping review followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. Searches were conducted on MEDLINE, PsycInfo, and Web of Science. The methodology, development and evaluation details of decision aids were extracted and summarized using narrative synthesis. Results: Overall, 33 papers reporting 22 tools were included in the scoping review. These tools were developed for various health conditions, including musculoskeletal (7/22, 32\%), oncological (8/22, 36\%), and chronic conditions (7/22, 32\%). Most decision tools (17/22, 77\%) were developed in the United States, with the remaining tools originating in the Netherlands, United Kingdom, Canada, and Australia. The number of publications increased, with 73\% (16/22) published since 2015, peaking at 4 publications in 2019. The primary purpose of these tools (20/22, 91\%) was to help patients compare or choose treatments. Adaptive conjoint analysis was the most frequently used design type (10/22, 45\%), followed by conjoint analysis and discrete choice experiments (DCEs; both 4/22, 18\%), modified adaptive conjoint analysis (3/22, 14\%), and adaptive best-worst conjoint analysis (1/22, 5\%). The number of tasks varied depending on the design (6-12 for DCEs and adaptive conjoint vs 16-20 for conjoint analysis designs). Sawtooth software was commonly used (14/22, 64\%) to embed choice tasks. Four proof-of-concept embedding methods were identified: scenario analysis, known preference phenotypes, Bayesian collaborative filtering, and penalized multinomial logit model. After completing the choice tasks patients received tailored information, 73\% (16/22) of tools provided attribute importance scores, and 23\% (5/22) presented a ``best match'' treatment ranking. To convey probabilistic attributes, most tools (13/22, 59\%) used a combination of approaches, including percentages, natural frequencies, icon arrays, narratives, and videos. The tools were evaluated across diverse study designs (randomized controlled trials, mixed methods, and cohort studies), with sample sizes ranging from 23 to 743 participants. 
Over 40 different outcomes were included in the evaluations, with the decisional conflict scale being the most frequently used, appearing in 6 tools. Conclusions: This scoping review provides an overview of how choice experiments are embedded into decision tools. It highlights the lack of established best practices for embedding methods, with only 4 proof-of-concept methods identified. Furthermore, the review reveals a lack of consensus on outcome measures, emphasizing the need for standardized outcome selection for future evaluations. ", doi="10.2196/59209", url="https://www.jmir.org/2025/1/e59209" } @Article{info:doi/10.2196/60215, author="Amagai, Saki and Kaat, J. Aaron and Fox, S. Rina and Ho, H. Emily and Pila, Sarah and Kallen, A. Michael and Schalet, D. Benjamin and Nowinski, J. Cindy and Gershon, C. Richard", title="Customizing Computerized Adaptive Test Stopping Rules for Clinical Settings Using the Negative Affect Subdomain of the NIH Toolbox Emotion Battery: Simulation Study", journal="JMIR Form Res", year="2025", month="Mar", day="21", volume="9", pages="e60215", keywords="computerized adaptive testing", keywords="CAT", keywords="stopping rules", keywords="NIH Toolbox", keywords="reliability", keywords="test burden", keywords="clinical setting", keywords="patient-reported outcome", keywords="clinician", abstract="Background: Patient-reported outcome measures are crucial for informed medical decisions and evaluating treatments. However, they can be burdensome for patients and sometimes lack the reliability clinicians need for clear clinical interpretations. Objective: We aimed to assess the extent to which applying alternative stopping rules can increase reliability for clinical use while minimizing the burden of computerized adaptive tests (CATs). Methods: CAT simulations were conducted on 3 adult item banks in the NIH Toolbox for Assessment of Neurological and Behavioral Function Emotion Battery; the item banks were in the Negative Affect subdomain (ie, Anger Affect, Fear Affect, and Sadness) and contained at least 8 items. In the originally applied NIH Toolbox CAT stopping rules, the CAT was stopped if the score SE reached <0.3 before 12 items were administered. We first contrasted this with an SE-change rule in a planned simulation analysis. We then contrasted the original rules with fixed-length CATs (4-12 items), a reduction of the maximum number of items to 8, and other modifications in post hoc analyses. Burden was measured by the number of items administered per simulation, precision by the percentage of assessments yielding reliability cutoffs (0.85, 0.90, and 0.95), and accurate score recovery by the root mean squared error between the generating $\theta$ and the CAT-estimated ``expected a posteriori''--based $\theta$. Results: In general, relative to the original rules, the alternative stopping rules slightly decreased burden while also increasing the proportion of assessments achieving high reliability for the adult banks; however, the SE-change rule and fixed-length CATs with 8 or fewer items also notably increased assessments yielding reliability <0.85. Among the alternative rules explored, the reduced maximum stopping rule best balanced precision and parsimony, presenting another option beyond the original rules. Conclusions: Our findings demonstrate the challenges in attempting to reduce test burden while also achieving score precision for clinical use. 
Stopping rules should be modified in accordance with the context of the study population and the purpose of the study. ", doi="10.2196/60215", url="https://formative.jmir.org/2025/1/e60215" } @Article{info:doi/10.2196/67774, author="Vanhala, Ville and Surakka, Outi and Multisilta, Vilma and Lundsby Johansen, Mette and Villinger, Jonas and Nicolle, Emmanuelle and Heikkil{\"a}, Johanna and Korhonen, Pentti", title="Efficiency Improvement of the Clinical Pathway in Cardiac Monitor Insertion and Follow-Up: Retrospective Analysis", journal="JMIR Cardio", year="2025", month="Mar", day="21", volume="9", pages="e67774", keywords="insertable cardiac monitor", keywords="clinical pathway", keywords="nurse-led service", keywords="task shifting", keywords="efficiency improvement", keywords="remote monitoring", abstract="Background: The insertable cardiac monitor (ICM) clinical pathway in Tampere Heart Hospital, Finland, did not correspond to the diagnostic needs of the population. There has been growing evidence supporting delegation of the insertion from cardiologists to specially trained nurses and outsourcing of the remote follow-up. However, it is unclear if the change in the clinical pathway is safe and improves efficiency. Objective: We aim to describe and assess the efficiency of the change in the ICM clinical pathway. Methods: Pathway improvements included initiating nurse-performed insertions, relocating the procedure from the catheterization laboratory to a procedure room, and outsourcing part of the remote follow-up to manage ICM workload. Data were collected from electronic health records of all patients who received an ICM in the Tampere Heart Hospital in 2018 and 2020. Follow-up time was 36 months after insertion. Results: The number of inserted ICMs doubled from 74 in 2018 to 159 in 2020. In 2018, cardiologists completed all insertions, while in 2020, a total of 70.4\% (n=112) were completed by nurses. The waiting time from referral to procedure was significantly shorter in 2020 (mean 36, SD 27.7 days) compared with 2018 (mean 49, SD 37.3 days; P=.02). The scheduled ICM procedure time decreased from 60 minutes in 2018 to 45 minutes in 2020. Insertions performed in the catheterization laboratory decreased significantly (n=14, 18.9\% in 2018 and n=3, 1.9\% in 2020; P<.001). The number of patients receiving an ICM after syncope increased from 71 to 94. Stroke and transient ischemic attack as an indication increased substantially from 2018 to 2020 (2 and 62 patients, respectively). In 2018, nurses analyzed all remote transmissions. In 2020, the external monitoring service escalated only 11.2\% (204/1817) of the transmissions to the clinic for revision. This saved 296 hours of nursing time in 2020. Having nurses insert ICMs in 2020 saved 48 hours of physicians' time, and the shorter scheduling for the procedure saved an additional 40 hours of nursing time compared with the process in 2018. Additionally, the catheterization laboratory was released for other procedures (27 h/y). The complication rate did not change significantly (n=2, 2.7\% in 2018 and n=5, 3.1\% in 2020; P=.85). The 36-month diagnostic yield for syncope remained high in 2018 and 2020 (n=32, 45.1\% and n=36, 38.3\%; P=.38). The diagnostic yield for patients with stroke who underwent the procedure in 2020 was 43.5\% (n=27). 
Conclusions: The efficiency of the clinical pathway for patients eligible for an ICM insertion can be increased significantly by shifting to nurse-led insertions in procedure rooms and to the use of an external monitoring and triaging service. ", doi="10.2196/67774", url="https://cardio.jmir.org/2025/1/e67774" } @Article{info:doi/10.2196/58021, author="Oh, Mi-Young and Kim, Hee-Soo and Jung, Mi Young and Lee, Hyung-Chul and Lee, Seung-Bo and Lee, Mi Seung", title="Machine Learning--Based Explainable Automated Nonlinear Computation Scoring System for Health Score and an Application for Prediction of Perioperative Stroke: Retrospective Study", journal="J Med Internet Res", year="2025", month="Mar", day="19", volume="27", pages="e58021", keywords="machine learning", keywords="explainability", keywords="score", keywords="computation scoring system", keywords="Nonlinear computation", keywords="application", keywords="perioperative stroke", keywords="perioperative", keywords="stroke", keywords="efficiency", keywords="ML-based models", keywords="patient", keywords="noncardiac surgery", keywords="noncardiac", keywords="surgery", keywords="effectiveness", keywords="risk tool", keywords="risk", keywords="tool", keywords="real-world data", abstract="Background: Machine learning (ML) has the potential to enhance performance by capturing nonlinear interactions. However, ML-based models have some limitations in terms of interpretability. Objective: This study aimed to develop and validate a more comprehensible and efficient ML-based scoring system using SHapley Additive exPlanations (SHAP) values. Methods: We developed and validated the Explainable Automated nonlinear Computation scoring system for Health (EACH) framework score. We developed a CatBoost-based prediction model, identified key features, and automatically detected the top 5 steepest slope change points based on SHAP plots. Subsequently, we developed a scoring system (EACH) and normalized the score. Finally, the EACH score was used to predict perioperative stroke. We developed the EACH score using data from the Seoul National University Hospital cohort and validated it using data from the Boramae Medical Center, which was geographically and temporally different from the development set. Results: When applied for perioperative stroke prediction among 38,737 patients undergoing noncardiac surgery, the EACH score achieved an area under the curve (AUC) of 0.829 (95\% CI 0.753-0.892). In the external validation, the EACH score demonstrated superior predictive performance with an AUC of 0.784 (95\% CI 0.694-0.871) compared with a traditional score (AUC=0.528, 95\% CI 0.457-0.619) and another ML-based scoring generator (AUC=0.564, 95\% CI 0.516-0.612). Conclusions: The EACH score is a more precise, explainable ML-based risk tool, proven effective in real-world data. The EACH score outperformed the traditional scoring system and other prediction models based on different ML techniques in predicting perioperative stroke. ", doi="10.2196/58021", url="https://www.jmir.org/2025/1/e58021" } @Article{info:doi/10.2196/65263, author="Mansoor, Masab and Ibrahim, F. 
Andrew and Grindem, David and Baig, Asad", title="Large Language Models for Pediatric Differential Diagnoses in Rural Health Care: Multicenter Retrospective Cohort Study Comparing GPT-3 With Pediatrician Performance", journal="JMIRx Med", year="2025", month="Mar", day="19", volume="6", pages="e65263", keywords="natural language processing", keywords="NLP", keywords="machine learning", keywords="ML", keywords="artificial intelligence", keywords="language model", keywords="large language model", keywords="LLM", keywords="generative pretrained transformer", keywords="GPT", keywords="pediatrics", abstract="Background: Rural health care providers face unique challenges such as limited specialist access and high patient volumes, making accurate diagnostic support tools essential. Large language models like GPT-3 have demonstrated potential in clinical decision support but remain understudied in pediatric differential diagnosis. Objective: This study aims to evaluate the diagnostic accuracy and reliability of a fine-tuned GPT-3 model compared to board-certified pediatricians in rural health care settings. Methods: This multicenter retrospective cohort study analyzed 500 pediatric encounters (ages 0-18 years; n=261, 52.2\% female) from rural health care organizations in Central Louisiana between January 2020 and December 2021. The GPT-3 model (DaVinci version) was fine-tuned using the OpenAI application programming interface and trained on 350 encounters, with 150 reserved for testing. Five board-certified pediatricians (mean experience: 12, SD 5.8 years) provided reference standard diagnoses. Model performance was assessed using accuracy, sensitivity, specificity, and subgroup analyses. Results: The GPT-3 model achieved an accuracy of 87.3\% (131/150 cases), sensitivity of 85\% (95\% CI 82\%-88\%), and specificity of 90\% (95\% CI 87\%-93\%), comparable to pediatricians' accuracy of 91.3\% (137/150 cases; P=.47). Performance was consistent across age groups (0-5 years: 54/62, 87\%; 6-12 years: 47/53, 89\%; 13-18 years: 30/35, 86\%) and common complaints (fever: 36/39, 92\%; abdominal pain: 20/23, 87\%). For rare diagnoses (n=20), accuracy was slightly lower (16/20, 80\%) but comparable to pediatricians (17/20, 85\%; P=.62). Conclusions: This study demonstrates that a fine-tuned GPT-3 model can provide diagnostic support comparable to pediatricians, particularly for common presentations, in rural health care. Further validation in diverse populations is necessary before clinical implementation. 
", doi="10.2196/65263", url="https://xmed.jmir.org/2025/1/e65263" } @Article{info:doi/10.2196/67840, author="Lu, Zhen and Dong, Binhua and Cai, Hongning and Tian, Tian and Wang, Junfeng and Fu, Leiwen and Wang, Bingyi and Zhang, Weijie and Lin, Shaomei and Tuo, Xunyuan and Wang, Juntao and Yang, Tianjie and Huang, Xinxin and Zheng, Zheng and Xue, Huifeng and Xu, Shuxia and Liu, Siyang and Sun, Pengming and Zou, Huachun", title="Identifying Data-Driven Clinical Subgroups for Cervical Cancer Prevention With Machine Learning: Population-Based, External, and Diagnostic Validation Study", journal="JMIR Public Health Surveill", year="2025", month="Mar", day="19", volume="11", pages="e67840", keywords="cervical cancer", keywords="human papillomavirus", keywords="screening", keywords="machine learning", keywords="cervical tumor", keywords="cancer", keywords="carcinoma", keywords="tumor", keywords="malignant", keywords="ML", keywords="phenomapping strategy", keywords="logistic regression", keywords="regression", keywords="population-based", keywords="validation study", keywords="cancer prevention", keywords="validity", keywords="usability", keywords="algorithm", keywords="surveillance", keywords="electronic health record", keywords="EHR", abstract="Background: Cervical cancer remains a major global health issue. Personalized, data-driven cervical cancer prevention (CCP) strategies tailored to phenotypic profiles may improve prevention and reduce disease burden. Objective: This study aimed to identify subgroups with differential cervical precancer or cancer risks using machine learning, validate subgroup predictions across datasets, and propose a computational phenomapping strategy to enhance global CCP efforts. Methods: We explored the data-driven CCP subgroups by applying unsupervised machine learning to a deeply phenotyped, population-based discovery cohort. We extracted CCP-specific risks of cervical intraepithelial neoplasia (CIN) and cervical cancer through weighted logistic regression analyses providing odds ratio (OR) estimates and 95\% CIs. We trained a supervised machine learning model and developed pathways to classify individuals before evaluating its diagnostic validity and usability on an external cohort. Results: This study included 551,934 women (median age, 49 years) in the discovery cohort and 47,130 women (median age, 37 years) in the external cohort. Phenotyping identified 5 CCP subgroups, with CCP4 showing the highest carcinoma prevalence. CCP2--4 had significantly higher risks of CIN2+ (CCP2: OR 2.07 [95\% CI: 2.03?2.12], CCP3: 3.88 [3.78?3.97], and CCP4: 4.47 [4.33?4.63]) and CIN3+ (CCP2: 2.10 [2.05?2.14], CCP3: 3.92 [3.82?4.02], and CCP4: 4.45 [4.31?4.61]) compared to CCP1 (P<.001), consistent with the direction of results observed in the external cohort. The proposed triple strategy was validated as clinically relevant, prioritizing high-risk subgroups (CCP3-4) for colposcopies and scaling human papillomavirus screening for CCP1-2. Conclusions: This study underscores the potential of leveraging machine learning algorithms and large-scale routine electronic health records to enhance CCP strategies. By identifying key determinants of CIN2+/CIN3+ risk and classifying 5 distinct subgroups, our study provides a robust, data-driven foundation for the proposed triple strategy. This approach prioritizes tailored prevention efforts for subgroups with varying risks, offering a novel and scalable tool to complement existing cervical cancer screening guidelines. 
Future work should focus on independent external and prospective validation to maximize the global impact of this strategy. ", doi="10.2196/67840", url="https://publichealth.jmir.org/2025/1/e67840" } @Article{info:doi/10.2196/66598, author="Rountree, Lillian and Lin, Yi-Ting and Liu, Chuyu and Salvatore, Maxwell and Admon, Andrew and Nallamothu, Brahmajee and Singh, Karandeep and Basu, Anirban and Bu, Fan and Mukherjee, Bhramar", title="Reporting of Fairness Metrics in Clinical Risk Prediction Models Used for Precision Health: Scoping Review", journal="Online J Public Health Inform", year="2025", month="Mar", day="19", volume="17", pages="e66598", keywords="bias", keywords="cardiovascular disease", keywords="COVID-19", keywords="risk stratification", keywords="sensitive features", keywords="clinical risk prediction", keywords="equity", abstract="Background: Clinical risk prediction models integrated into digitized health care informatics systems hold promise for personalized primary prevention and care, a core goal of precision health. Fairness metrics are important tools for evaluating potential disparities across sensitive features, such as sex and race or ethnicity, in the field of prediction modeling. However, fairness metric usage in clinical risk prediction models remains infrequent, sporadic, and rarely empirically evaluated. Objective: We seek to assess the uptake of fairness metrics in clinical risk prediction modeling through an empirical evaluation of popular prediction models for 2 diseases, 1 chronic and 1 infectious disease. Methods: We conducted a scoping literature review in November 2023 of recent high-impact publications on clinical risk prediction models for cardiovascular disease (CVD) and COVID-19 using Google Scholar. Results: Our review resulted in a shortlist of 23 CVD-focused articles and 22 COVID-19 pandemic--focused articles. No articles evaluated fairness metrics. Of the CVD-focused articles, 26\% used a sex-stratified model, and of those with race or ethnicity data, 92\% had study populations that were more than 50\% from 1 race or ethnicity. Of the COVID-19 models, 9\% used a sex-stratified model, and of those that included race or ethnicity data, 50\% had study populations that were more than 50\% from 1 race or ethnicity. No articles for either disease stratified their models by race or ethnicity. Conclusions: Our review shows that the use of fairness metrics for evaluating differences across sensitive features is rare, despite their ability to identify inequality and flag potential gaps in prevention and care. We also find that training data remain largely racially and ethnically homogeneous, demonstrating an urgent need for diversifying study cohorts and data collection. We propose an implementation framework to initiate change, calling for better connections between theory and practice when it comes to the adoption of fairness metrics for clinical risk prediction. We hypothesize that this integration will lead to a more equitable prediction world. 
", doi="10.2196/66598", url="https://ojphi.jmir.org/2025/1/e66598", url="http://www.ncbi.nlm.nih.gov/pubmed/39962044" } @Article{info:doi/10.2196/66568, author="Cho, Nam-Jun and Jeong, Inyong and Ahn, Se-Jin and Gil, Hyo-Wook and Kim, Yeongmin and Park, Jin-Hyun and Kang, Sanghee and Lee, Hwamin", title="Machine Learning to Assist in Managing Acute Kidney Injury in General Wards: Multicenter Retrospective Study", journal="J Med Internet Res", year="2025", month="Mar", day="18", volume="27", pages="e66568", keywords="acute kidney injury", keywords="machine learning", keywords="recovery of function", keywords="creatinine", keywords="kidney", keywords="patient rooms", abstract="Background: Most artificial intelligence--based research on acute kidney injury (AKI) prediction has focused on intensive care unit settings, limiting their generalizability to general wards. The lack of standardized AKI definitions and reliance on intensive care units further hinder the clinical applicability of these models. Objective: This study aims to develop and validate a machine learning--based framework to assist in managing AKI and acute kidney disease (AKD) in general ward patients, using a refined operational definition of AKI to improve predictive performance and clinical relevance. Methods: This retrospective multicenter cohort study analyzed electronic health record data from 3 hospitals in South Korea. AKI and AKD were defined using a refined version of the Kidney Disease: Improving Global Outcomes criteria, which included adjustments to baseline serum creatinine estimation and a stricter minimum increase threshold to reduce misclassification due to transient fluctuations. The primary outcome was the development of machine learning models for early prediction of AKI (within 3 days before onset) and AKD (nonrecovery within 7 days after AKI). Results: The final analysis included 135,068 patients. A total of 7658 (8\%) patients in the internal cohort and 2898 (7.3\%) patients in the external cohort developed AKI. Among the 5429 patients in the internal cohort and 1998 patients in the external cohort for whom AKD progression could be assessed, 896 (16.5\%) patients and 287 (14.4\%) patients, respectively, progressed to AKD. Using the refined criteria, 2898 cases of AKI were identified, whereas applying the standard Kidney Disease: Improving Global Outcomes criteria resulted in the identification of 5407 cases. Among the 2509 patients who were not classified as having AKI under the refined criteria, 2242 had a baseline serum creatinine level below 0.6 mg/dL, while the remaining 267 experienced a decrease in serum creatinine before the onset of AKI. The final selected early prediction model for AKI achieved an area under the receiver operating characteristic curve of 0.9053 in the internal cohort and 0.8860 in the external cohort. The early prediction model for AKD achieved an area under the receiver operating characteristic curve of 0.8202 in the internal cohort and 0.7833 in the external cohort. Conclusions: The proposed machine learning framework successfully predicted AKI and AKD in general ward patients with high accuracy. The refined AKI definition significantly reduced the classification of patients with transient serum creatinine fluctuations as AKI cases compared to the previous criteria. These findings suggest that integrating this machine learning framework into hospital workflows could enable earlier interventions, optimize resource allocation, and improve patient outcomes. 
", doi="10.2196/66568", url="https://www.jmir.org/2025/1/e66568" } @Article{info:doi/10.2196/57358, author="Hama, Tuankasfee and Alsaleh, M. Mohanad and Allery, Freya and Choi, Won Jung and Tomlinson, Christopher and Wu, Honghan and Lai, Alvina and Pontikos, Nikolas and Thygesen, H. Johan", title="Enhancing Patient Outcome Prediction Through Deep Learning With Sequential Diagnosis Codes From Structured Electronic Health Record Data: Systematic Review", journal="J Med Internet Res", year="2025", month="Mar", day="18", volume="27", pages="e57358", keywords="deep learning", keywords="electronic health records", keywords="EHR", keywords="diagnosis codes", keywords="prediction", keywords="patient outcomes", keywords="systematic review", abstract="Background: The use of structured electronic health records in health care systems has grown rapidly. These systems collect huge amounts of patient information, including diagnosis codes representing temporal medical history. Sequential diagnostic information has proven valuable for predicting patient outcomes. However, the extent to which these types of data have been incorporated into deep learning (DL) models has not been examined. Objective: This systematic review aims to describe the use of sequential diagnostic data in DL models, specifically to understand how these data are integrated, whether sample size improves performance, and whether the identified models are generalizable. Methods: Relevant studies published up to May 15, 2023, were identified using 4 databases: PubMed, Embase, IEEE Xplore, and Web of Science. We included all studies using DL algorithms trained on sequential diagnosis codes to predict patient outcomes. We excluded review articles and non--peer-reviewed papers. We evaluated the following aspects in the included papers: DL techniques, characteristics of the dataset, prediction tasks, performance evaluation, generalizability, and explainability. We also assessed the risk of bias and applicability of the studies using the Prediction Model Study Risk of Bias Assessment Tool (PROBAST). We used the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist to report our findings. Results: Of the 740 identified papers, 84 (11.4\%) met the eligibility criteria. Publications in this area increased yearly. Recurrent neural networks (and their derivatives; 47/84, 56\%) and transformers (22/84, 26\%) were the most commonly used architectures in DL-based models. Most studies (45/84, 54\%) presented their input features as sequences of visit embeddings. Medications (38/84, 45\%) were the most common additional feature. Of the 128 predictive outcome tasks, the most frequent was next-visit diagnosis (n=30, 23\%), followed by heart failure (n=18, 14\%) and mortality (n=17, 13\%). Only 7 (8\%) of the 84 studies evaluated their models in terms of generalizability. A positive correlation was observed between training sample size and model performance (area under the receiver operating characteristic curve; P=.02). However, 59 (70\%) of the 84 studies had a high risk of bias. Conclusions: The application of DL for advanced modeling of sequential medical codes has demonstrated remarkable promise in predicting patient outcomes. The main limitation of this study was the heterogeneity of methods and outcomes. However, our analysis found that using multiple types of features, integrating time intervals, and including larger sample sizes were generally related to an improved predictive performance. 
This review also highlights that very few studies (7/84, 8\%) reported on challenges related to generalizability and less than half (38/84, 45\%) of the studies reported on challenges related to explainability. Addressing these shortcomings will be instrumental in unlocking the full potential of DL for enhancing health care outcomes and patient care. Trial Registration: PROSPERO CRD42018112161; https://tinyurl.com/yc6h9rwu ", doi="10.2196/57358", url="https://www.jmir.org/2025/1/e57358" } @Article{info:doi/10.2196/65794, author="Fedele, A. David and Ray, M. Jessica and Mallela, L. Jaya and Bian, Jiang and Chen, Aokun and Qin, Xiao and Salloum, G. Ramzi and Kelly, Maria and Gurka, J. Matthew and Hollenbach, Jessica", title="Development of a Clinical Decision Support Tool to Implement Asthma Management Guidelines in Pediatric Primary Care: Qualitative Study", journal="JMIR Form Res", year="2025", month="Mar", day="18", volume="9", pages="e65794", keywords="clinical decision support", keywords="asthma", keywords="primary care", keywords="guidelines", keywords="pediatric", keywords="asthma care", keywords="morbidity", keywords="health information technology", keywords="electronic health record", keywords="EHR", keywords="user-centered design", keywords="inductive approach", keywords="digital health", keywords="health technology", abstract="Background: There is a longstanding gap between national asthma guidelines and their implementation in primary care. Primary care providers (PCPs) endorse numerous provider and practice or clinic-related barriers to providing guidelines-based asthma care. To reduce asthma morbidity in primary care, PCPs need access to tools that facilitate adherence to national guidelines, which can be delivered at the point of care, are minimally burdensome, and fit within the clinic workflow. Clinical decision support (CDS) tools are health IT systems that can be housed in the electronic health record (EHR) system. Objective: This study aimed to follow user-centered design principles and describe the formative qualitative work with target stakeholders (ie, PCPs and IT professionals) to inform our design of an EHR-embedded CDS tool that adheres to recent, significant changes in asthma management guidelines. Methods: Purposive sampling was used to recruit three separate subgroups of professionals (n=15) between (1) PCPs with previous experience using a paper-based CDS tool for asthma management, (2) PCPs without previous experience using CDS tools for asthma management, and (3) health care IT professionals. The PCP interview guide focused on their practice, familiarity with national asthma guidelines, and how a CDS tool embedded in the EHR might help them provide guideline-based care. The health care IT professional guide included questions on the design and implementation processes of CDS tools into the EHR. Qualitative data were audio-recorded, transcribed, and then analyzed using an inductive approach to develop themes. Results: Themes were organized into 2 domains, current practice and CDS tool development. The themes that emerged from PCPs included descriptions of assessments conducted to make an asthma diagnosis, previous attempts or opportunities to implement updated national asthma guidelines, and how a CDS tool could be implemented using the EHR and fit into the current asthma management workflow. 
The themes that emerged from health care IT professionals included processes used to design CDS tools and strategies to collect evidence that indicated a tool's value to a practice and the broader health system. Conclusions: In this study, user-centered design principles were used to guide a qualitative study on perceived barriers and facilitators to a primary care--based, EHR-integrated asthma CDS tool. PCPs expressed their interest in adopting an asthma CDS tool that was low burden and efficient but could help them adhere to national asthma guidelines and improve clinic workflow. Similarly, health care IT professionals perceived an asthma CDS tool to be useful, if it adhered to EHR design standards. Implementation of a CDS tool to improve adherence of PCPs to recently updated national asthma guidelines could be beneficial in reducing pediatric asthma morbidity. ", doi="10.2196/65794", url="https://formative.jmir.org/2025/1/e65794" } @Article{info:doi/10.2196/60424, author="Dol{\'o}n-Poza, Mar{\'i}a and Gabald{\'o}n-P{\'e}rez, Ana-Marta and Berrezueta-Guzman, Santiago and L{\'o}pez Gracia, David and Mart{\'i}n-Ruiz, Mar{\'i}a-Luisa and Pau De La Cruz, Iv{\'a}n", title="Enhancing Early Language Disorder Detection in Preschools: Evaluation and Future Directions for the Gades Platform", journal="JMIR Hum Factors", year="2025", month="Mar", day="14", volume="12", pages="e60424", keywords="developmental language disorder", keywords="simple language delay", keywords="adaptive screening system", keywords="early childhood education", keywords="pervasive therapy", abstract="Background: Language acquisition is a critical developmental milestone, with notable variability during the first 4 years of life. Developmental language disorder (DLD) often overlaps with other neurodevelopmental disorders or simple language delay (SLD), making early detection challenging, especially for primary caregivers. Objective: We aimed to evaluate the effectiveness of the Gades platform, an adaptive screening tool that enables preschool teachers to identify potential language disorders without direct support from nursery school language therapists (NSLTs). Methods: The study took place in a nursery school and an early childhood educational and psychopedagogical center in Madrid, Spain, involving 218 children aged 6 to 36 months, 24 preschool teachers, and 2 NSLTs. Initially, NSLTs conducted informational sessions to familiarize teachers with DLDs and how to identify them. Following this, the teachers used the Gades platform to conduct language screenings independently, without ongoing support from NSLTs. The Gades platform was enhanced to collect detailed profiles of each child and implemented an adaptive screening model tailored to account for variability in language development. This setup allowed preschool teachers, who are not language experts, to observe and assess language development effectively in natural, unsupervised educational environments. The study assessed the platform's utility in guiding teachers through these observations and its effectiveness in such settings. Results: Gades identified language difficulties in 19.7\% (43/218) of the children, with a higher prevalence in boys (29/218, 13.3\%) than in girls (14/218, 6.4\%). These challenges were most frequently observed in children aged 15 to 27 months. The platform demonstrated a high accuracy rate of 97.41\%, with evaluators largely agreeing with its recommendations. 
Teachers also found Gades to be user friendly and a valuable tool for supporting language development observations in everyday educational settings. Conclusions: Gades demonstrates potential as a reliable and accessible tool for early detection of language disorders, empowering educators to identify DLD and SLD in the absence of NSLTs. However, further refinement of the platform is required to effectively differentiate between DLD and SLD. By integrating Gades into routine preschool assessments, educators can facilitate timely interventions, bridging gaps in early childhood education and therapy. Trial Registration: Pan-African Clinical Trial Registry (PACTR) PACTR202210657553944; https://pactr.samrc.ac.za/TrialDisplay.aspx?TrialID=24051 ", doi="10.2196/60424", url="https://humanfactors.jmir.org/2025/1/e60424" } @Article{info:doi/10.2196/67239, author="Tzeng, Jing-Tong and Li, Jeng-Lin and Chen, Huan-Yu and Huang, Chu-Hsiang and Chen, Chi-Hsin and Fan, Cheng-Yi and Huang, Pei-Chuan Edward and Lee, Chi-Chun", title="Improving the Robustness and Clinical Applicability of Automatic Respiratory Sound Classification Using Deep Learning--Based Audio Enhancement: Algorithm Development and Validation", journal="JMIR AI", year="2025", month="Mar", day="13", volume="4", pages="e67239", keywords="respiratory sound", keywords="lung sound", keywords="audio enhancement", keywords="noise robustness", keywords="clinical applicability", keywords="artificial intelligence", keywords="AI", abstract="Background: Deep learning techniques have shown promising results in the automatic classification of respiratory sounds. However, accurately distinguishing these sounds in real-world noisy conditions poses challenges for clinical deployment. In addition, predicting signals with only background noise could undermine user trust in the system. Objective: This study aimed to investigate the feasibility and effectiveness of incorporating a deep learning--based audio enhancement preprocessing step into automatic respiratory sound classification systems to improve robustness and clinical applicability. Methods: We conducted extensive experiments using various audio enhancement model architectures, including time-domain and time-frequency--domain approaches, in combination with multiple classification models to evaluate the effectiveness of the audio enhancement module in an automatic respiratory sound classification system. The classification performance was compared against the baseline noise injection data augmentation method. These experiments were carried out on 2 datasets: the International Conference in Biomedical and Health Informatics (ICBHI) respiratory sound dataset, which contains 5.5 hours of recordings, and the Formosa Archive of Breath Sound dataset, which comprises 14.6 hours of recordings. Furthermore, a physician validation study involving 7 senior physicians was conducted to assess the clinical utility of the system. Results: The integration of the audio enhancement module resulted in a 21.88\% increase with P<.001 in the ICBHI classification score on the ICBHI dataset and a 4.1\% improvement with P<.001 on the Formosa Archive of Breath Sound dataset in multi-class noisy scenarios. Quantitative analysis from the physician validation study revealed improvements in efficiency, diagnostic confidence, and trust during model-assisted diagnosis, with workflows that integrated enhanced audio leading to an 11.61\% increase in diagnostic sensitivity and facilitating high-confidence diagnoses. 
Conclusions: Incorporating an audio enhancement algorithm significantly enhances the robustness and clinical utility of automatic respiratory sound classification systems, improving performance in noisy environments and fostering greater trust among medical professionals. ", doi="10.2196/67239", url="https://ai.jmir.org/2025/1/e67239" } @Article{info:doi/10.2196/65001, author="Huang, Tracy and Ngan, Chun-Kit and Cheung, Ting Yin and Marcotte, Madelyn and Cabrera, Benjamin", title="A Hybrid Deep Learning--Based Feature Selection Approach for Supporting Early Detection of Long-Term Behavioral Outcomes in Survivors of Cancer: Cross-Sectional Study", journal="JMIR Bioinform Biotech", year="2025", month="Mar", day="13", volume="6", pages="e65001", keywords="machine learning", keywords="data driven", keywords="clinical domain--guided framework", keywords="survivors of cancer", keywords="cancer", keywords="oncology", keywords="behavioral outcome predictions", keywords="behavioral study", keywords="behavioral outcomes", keywords="feature selection", keywords="deep learning", keywords="neural network", keywords="hybrid", keywords="prediction", keywords="predictive modeling", keywords="patients with cancer", keywords="deep learning models", keywords="leukemia", keywords="computational study", keywords="computational biology", abstract="Background: The number of survivors of cancer is growing, and they often experience negative long-term behavioral outcomes due to cancer treatments. There is a need for better computational methods to handle and predict these outcomes so that physicians and health care providers can implement preventive treatments. Objective: This study aimed to create a new feature selection algorithm to improve the performance of machine learning classifiers to predict negative long-term behavioral outcomes in survivors of cancer. Methods: We devised a hybrid deep learning--based feature selection approach to support early detection of negative long-term behavioral outcomes in survivors of cancer. Within a data-driven, clinical domain--guided framework to select the best set of features among cancer treatments, chronic health conditions, and socioenvironmental factors, we developed a 2-stage feature selection algorithm, that is, a multimetric, majority-voting filter and a deep dropout neural network, to dynamically and automatically select the best set of features for each behavioral outcome. We also conducted an experimental case study on existing study data with 102 survivors of acute lymphoblastic leukemia (aged 15-39 years at evaluation and >5 years postcancer diagnosis) who were treated in a public hospital in Hong Kong. Finally, we designed and implemented radial charts to illustrate the significance of the selected features on each behavioral outcome to support clinical professionals' future treatment and diagnoses. Results: In this pilot study, we demonstrated that our approach outperforms the traditional statistical and computation methods, including linear and nonlinear feature selectors, for the addressed top-priority behavioral outcomes. Our approach holistically has higher F1, precision, and recall scores compared to existing feature selection methods. The models in this study select several significant clinical and socioenvironmental variables as risk factors associated with the development of behavioral problems in young survivors of acute lymphoblastic leukemia. 
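Editorial note: the hybrid feature selection abstract above (DOI 10.2196/65001) describes a first-stage multimetric, majority-voting filter. The following is a minimal, illustrative Python sketch of such a voting filter on synthetic data; the three univariate criteria, the vote threshold, and the stand-in downstream classifier are assumptions for illustration and are not taken from the article.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the survivor cohort: 102 samples, 30 candidate features
X, y = make_classification(n_samples=102, n_features=30, n_informative=6, random_state=0)
k = 10  # how many features each criterion votes for

# Stage 1: multimetric, majority-voting filter (three univariate criteria, assumed here)
criteria = [
    f_classif(X, y)[0],                         # ANOVA F statistic
    mutual_info_classif(X, y, random_state=0),  # mutual information
    np.abs(np.corrcoef(X.T, y)[-1, :-1]),       # absolute point-biserial correlation
]
votes = np.zeros(X.shape[1], dtype=int)
for scores in criteria:
    votes[np.argsort(scores)[::-1][:k]] += 1
selected = np.where(votes >= 2)[0]  # keep features endorsed by at least 2 of 3 criteria

# Stage 2 in the article is a deep dropout neural network; a plain classifier stands in here
acc = cross_val_score(LogisticRegression(max_iter=1000), X[:, selected], y, cv=5).mean()
print("selected features:", selected, "cross-validated accuracy:", round(acc, 3))
```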
Conclusions: Our novel feature selection algorithm has the potential to improve machine learning classifiers' capability to predict adverse long-term behavioral outcomes in survivors of cancer. ", doi="10.2196/65001", url="https://bioinform.jmir.org/2025/1/e65001", url="http://www.ncbi.nlm.nih.gov/pubmed/40080820" } @Article{info:doi/10.2196/60831, author="Gao, Yu and Magin, Parker and Tapley, Amanda and Holliday, Elizabeth and Dizon, Jason and Fisher, Katie and van Driel, Mieke and Davis, S. Joshua and Davey, Andrew and Ralston, Anna and Fielding, Alison and Moad, Dominica and Mulquiney, Katie and Clarke, Lisa and Turner, Alexandria", title="Prevalence of Antibiotic Prescribing for Acute Respiratory Tract Infection in Telehealth Versus Face-to-Face Consultations: Cross-Sectional Analysis of General Practice Registrars' Clinical Practice", journal="J Med Internet Res", year="2025", month="Mar", day="13", volume="27", pages="e60831", keywords="antimicrobial resistance", keywords="antibiotics stewardship", keywords="telehealth", keywords="general practice", keywords="registrars", keywords="acute respiratory tract infection", keywords="antibiotics", keywords="prescription", keywords="respiratory tract infection", keywords="RTIs", keywords="Australia", keywords="consultations", keywords="teleconsultation", keywords="teleconsult", keywords="bronchitis", keywords="sore throat", keywords="acute otitis", keywords="sinusitis", keywords="in-consultation", keywords="upper respiratory tract infection", abstract="Background: Antimicrobial resistance is a global threat. Australia has high antibiotic prescribing rates with the majority of antibiotics prescribed by general practitioners (GPs) for self-limiting acute respiratory tract infection (ARTIs). Australian GP trainees' (registrars') prescribing for ARTIs may have been affected by the introduction of remunerated telehealth consultations in 2020. Understanding of the impact of telehealth on antibiotic stewardship may inform registrar educational programs. Objective: This study aimed to compare the prevalence of antibiotic prescribing by GP registrars in telehealth versus face-to-face (F2F) consultations for common cold (upper respiratory tract infection [URTI]), bronchitis, sore throat, acute otitis media, and sinusitis. Methods: A cross-sectional analysis of data from the Registrar Clinical Encounters in Training (ReCEnT) study, a multicenter inception cohort study of registrars' in-consultation clinical and educational experiences. Analysis used univariable and multivariable logistic regression using 2020-2023 ReCEnT data. The outcome variable was ``antibiotic prescribed'' for new presentations of URTI, acute sore throat, acute bronchitis, acute sinusitis, and acute otitis media. The study factor was consultation type (telehealth or F2F). Results: A total of 2392 registrars participated (response rate=93.4\%). The proportions of diagnoses that were managed via telehealth were 25\% (5283/21384) overall, 19\% (641/3327) for acute sore throat, 29\% (3733/12773) for URTI, 21\% (364/1772), for acute bronchitis, 4.1\% (72/1758) for acute otitis media, and 27\% (473/1754) for acute sinusitis. Antibiotics were prescribed for 51\% (1685/3327) of sore throat diagnoses, 6.9\% (880/12773) of URTI diagnoses, 64\% (1140/1772) of bronchitis diagnoses, 61\% (1067/1754) of sinusitis diagnoses, and 73\% (1278/1758) of otitis media diagnoses. 
On multivariable analysis, antibiotics were less often prescribed in telehealth than F2F consultations for sore throat (adjusted odds ratio [OR] 0.69, 95\% CI 0.55-0.86; P=.001), URTI (adjusted OR 0.64, 95\% CI 0.51-0.81; P<.001), and otitis media (adjusted OR 0.47, 95\% CI 0.26-0.84; P=.01). There were no significant differences for acute bronchitis (adjusted OR 1.07, 95\% CI 0.79-1.45; P=.66) or acute sinusitis (adjusted OR 1, 95\% CI 0.76-1.32; P=.99). Conclusions: GP registrars are less likely to prescribe antibiotics for sore throat, URTI, and otitis media when seeing patients by telehealth versus F2F. Understanding the reason for this difference is essential to help guide educational efforts aimed at decreasing antibiotic prescribing by GPs for conditions such as ARTIs where they are of little to no benefit. There was no evidence in this study that telehealth consultations were associated with greater registrar antibiotic prescribing for ARTIs. Therefore, there is no deleterious effect on antibiotic stewardship. ", doi="10.2196/60831", url="https://www.jmir.org/2025/1/e60831" } @Article{info:doi/10.2196/55277, author="Lau, Jerry and Bisht, Shivani and Horton, Robert and Crisan, Annamaria and Jones, John and Gantotti, Sandeep and Hermes-DeSantis, Evelyn", title="Creation of Scientific Response Documents for Addressing Product Medical Information Inquiries: Mixed Method Approach Using Artificial Intelligence", journal="JMIR AI", year="2025", month="Mar", day="13", volume="4", pages="e55277", keywords="AI", keywords="LLM", keywords="GPT", keywords="biopharmaceutical", keywords="medical information", keywords="content generation", keywords="artificial intelligence", keywords="pharmaceutical", keywords="scientific response", keywords="documentation", keywords="information", keywords="clinical data", keywords="strategy", keywords="reference", keywords="feasibility", keywords="development", keywords="machine learning", keywords="large language model", keywords="accuracy", keywords="context", keywords="traceability", keywords="accountability", keywords="survey", keywords="scientific response documentation", keywords="SRD", keywords="benefit", keywords="content generator", keywords="content analysis", keywords="Generative Pre-trained Transformer", abstract="Background: Pharmaceutical manufacturers address health care professionals' information needs through scientific response documents (SRDs), offering evidence-based answers to medication and disease state questions. Medical information departments, staffed by medical experts, develop SRDs that provide concise summaries consisting of relevant background information, search strategies, clinical data, and balanced references. With an escalating demand for SRDs and the increasing complexity of therapies, medical information departments are exploring advanced technologies and artificial intelligence (AI) tools like large language models (LLMs) to streamline content development. While AI and LLMs show promise in generating draft responses, a synergistic approach combining an LLM with traditional machine learning classifiers in a series of human-supervised and -curated steps could help address limitations, including hallucinations. This will ensure accuracy, context, traceability, and accountability in the development of the concise clinical data summaries of an SRD. 
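Editorial note: the registrar prescribing study above (DOI 10.2196/60831) reports adjusted odds ratios for antibiotic prescribing in telehealth versus face-to-face consultations from multivariable logistic regression. A minimal sketch of how such adjusted ORs and 95% CIs can be obtained with statsmodels on simulated data is shown below; the variable names and covariates are illustrative assumptions, and the published analysis also accounted for clustering and additional covariates that this sketch omits.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
# Simulated consultation-level data; all column names are illustrative only
df = pd.DataFrame({
    "antibiotic": rng.integers(0, 2, n),   # 1 = antibiotic prescribed
    "telehealth": rng.integers(0, 2, n),   # 1 = telehealth consultation
    "patient_age": rng.integers(1, 90, n),
    "male": rng.integers(0, 2, n),
})

model = smf.logit("antibiotic ~ telehealth + patient_age + male", data=df).fit(disp=False)
odds_ratios = np.exp(model.params)    # adjusted odds ratios
conf_int = np.exp(model.conf_int())   # 95% CIs on the OR scale
print(pd.concat([odds_ratios.rename("OR"), conf_int], axis=1))
```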
Objective: This study aims to quantify the challenges of SRD development and develop a framework exploring the feasibility and value addition of integrating AI capabilities in the process of creating concise summaries for an SRD. Methods: To measure the challenges in SRD development, a survey was conducted by phactMI, a nonprofit consortium of medical information leaders in the pharmaceutical industry, assessing aspects of SRD creation among its member companies. The survey collected data on the time and tediousness of various activities related to SRD development. Another working group, consisting of medical information professionals and data scientists, used AI to aid SRD authoring, focusing on data extraction and abstraction. They used logistic regression on semantic embedding features to train classification models and transformer-based summarization pipelines to generate concise summaries. Results: Of the 33 companies surveyed, 64\% (21/33) opened the survey, and 76\% (16/21) of those responded. On average, medical information departments generate 614 new documents and update 1352 documents each year. Respondents considered paraphrasing scientific articles to be the most tedious and time-intensive task. In the project's second phase, sentence classification models showed the ability to accurately distinguish target categories with receiver operating characteristic scores ranging from 0.67 to 0.85 (all P<.001), allowing for accurate data extraction. For data abstraction, the comparison of the bilingual evaluation understudy (BLEU) score and semantic similarity in the paraphrased texts yielded different results among reviewers, with each preferring different trade-offs between these metrics. Conclusions: This study establishes a framework for integrating LLM and machine learning into SRD development, supported by a pharmaceutical company survey emphasizing the challenges of paraphrasing content. While machine learning models show potential for section identification and content usability assessment in data extraction and abstraction, further optimization and research are essential before full-scale industry implementation. The working group's insights guide an AI-driven content analysis; address limitations; and advance efficient, precise, and responsive frameworks to assist with pharmaceutical SRD development. ", doi="10.2196/55277", url="https://ai.jmir.org/2025/1/e55277" } @Article{info:doi/10.2196/64354, author="Ehrig, Molly and Bullock, S. Garrett and Leng, Iris Xiaoyan and Pajewski, M. Nicholas and Speiser, Lynn Jaime", title="Imputation and Missing Indicators for Handling Missing Longitudinal Data: Data Simulation Analysis Based on Electronic Health Record Data", journal="JMIR Med Inform", year="2025", month="Mar", day="13", volume="13", pages="e64354", keywords="missing indicator method", keywords="missing data", keywords="imputation", keywords="longitudinal data", keywords="electronic health record data", keywords="electronic health records", keywords="EHR", keywords="simulation study", keywords="clinical prediction model", keywords="prediction model", keywords="older adults", keywords="falls", keywords="logistic regression", keywords="prediction modeling", abstract="Background: Missing data in electronic health records are highly prevalent and result in analytical concerns such as heterogeneous sources of bias and loss of statistical power. 
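Editorial note: the scientific response document study above (DOI 10.2196/55277) trained sentence classification models using logistic regression on semantic embedding features. Below is a minimal sketch of that general pattern; the sentence-transformers model name, the toy sentences, and their labels are illustrative assumptions and do not reproduce the study's data or pipeline.

```python
from sentence_transformers import SentenceTransformer  # assumed embedding backend
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Toy sentences labeled as "clinical result" (1) vs "background" (0); labels are invented
sentences = [
    "The primary endpoint was met with a hazard ratio of 0.72.",
    "Adverse events occurred in 12 percent of treated patients.",
    "Drug X is a monoclonal antibody targeting receptor Y.",
    "The study enrolled adults with moderate disease.",
] * 25
labels = [1, 1, 0, 0] * 25

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed pretrained embedding model
X = encoder.encode(sentences)                       # semantic embedding features
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0, stratify=labels)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("ROC AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```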
One simple analytic method for addressing missing or unknown covariate values is to treat missingness for a particular variable as a category unto itself, which we refer to as the missing indicator method. For cross-sectional analyses, recent work suggested that there was minimal benefit to the missing indicator method; however, it is unclear how this approach performs in the setting of longitudinal data, in which correlation among clustered repeated measures may be leveraged for potentially improved model performance. Objectives: This study aims to conduct a simulation study to evaluate whether the missing indicator method improved model performance and imputation accuracy for longitudinal data mimicking an application of developing a clinical prediction model for falls in older adults based on electronic health record data. Methods: We simulated a longitudinal binary outcome using mixed effects logistic regression that emulated a falls assessment at annual follow-up visits. Using multivariate imputation by chained equations, we simulated time-invariant predictors such as sex and medical history, as well as dynamic predictors such as physical function, BMI, and medication use. We induced missing data in predictors under scenarios that had both random (missing at random) and dependent missingness (missing not at random). We evaluated aggregate performance using the area under the receiver operating characteristic curve (AUROC) for models with and without missing indicators as predictors, as well as complete case analysis, across simulation replicates. We evaluated imputation quality using normalized root-mean-square error for continuous variables and percent falsely classified for categorical variables. Results: Independent of the mechanism used to simulate missing data (missing at random or missing not at random), overall model performance via AUROC was similar regardless of whether missing indicators were included in the model. The root-mean-square error and percent falsely classified measures were similar for models including missing indicators versus those without missing indicators. Model performance and imputation quality were similar regardless of whether the outcome was related to missingness. Imputation with or without missing indicators had similar mean values of AUROC compared with complete case analysis, although complete case analysis had the largest range of values. Conclusions: The results of this study suggest that the inclusion of missing indicators in longitudinal data modeling neither improves nor worsens overall performance or imputation accuracy. Future research is needed to address whether the inclusion of missing indicators is useful in prediction modeling with longitudinal data in different settings, such as high-dimensional data analysis. 
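Editorial note: the abstract above (DOI 10.2196/64354) compares imputation with and without missing indicators for prediction modeling. The sketch below shows, in a deliberately simplified cross-sectional setting rather than the study's longitudinal mixed-effects simulation, how scikit-learn's IterativeImputer (a MICE-style imputer) can be run with and without missing indicators and compared by AUROC; the synthetic data, missingness rate, and classifier are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
# Induce roughly 20% missingness completely at random in the predictors
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.2] = np.nan
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for add_indicator in (False, True):
    pipe = make_pipeline(
        IterativeImputer(add_indicator=add_indicator, random_state=0),  # MICE-style imputation
        LogisticRegression(max_iter=1000),
    ).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, pipe.predict_proba(X_te)[:, 1])
    print(f"missing indicators={add_indicator}: AUROC={auc:.3f}")
```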
", doi="10.2196/64354", url="https://medinform.jmir.org/2025/1/e64354" } @Article{info:doi/10.2196/68442, author="Cheng, Yinlin and Gu, Kuiying and Ji, Weidong and Hu, Zhensheng and Yang, Yining and Zhou, Yi", title="Two-Year Hypertension Incidence Risk Prediction in Populations in the Desert Regions of Northwest China: Prospective Cohort Study", journal="J Med Internet Res", year="2025", month="Mar", day="12", volume="27", pages="e68442", keywords="hypertension", keywords="desert", keywords="machine learning", keywords="deep learning", keywords="prevention", keywords="clinical applicability", abstract="Background: Hypertension is a major global health issue and a significant modifiable risk factor for cardiovascular diseases, contributing to a substantial socioeconomic burden due to its high prevalence. In China, particularly among populations living near desert regions, hypertension is even more prevalent due to unique environmental and lifestyle conditions, exacerbating the disease burden in these areas, underscoring the urgent need for effective early detection and intervention strategies. Objective: This study aims to develop, calibrate, and prospectively validate a 2-year hypertension risk prediction model by using large-scale health examination data collected from populations residing in 4 regions surrounding the Taklamakan Desert of northwest China. Methods: We retrospectively analyzed the health examination data of 1,038,170 adults (2019-2021) and prospectively validated our findings in a separate cohort of 961,519 adults (2021-2023). Data included demographics, lifestyle factors, physical examinations, and laboratory measurements. Feature selection was performed using light gradient-boosting machine--based recursive feature elimination with cross-validation and Least Absolute Shrinkage and Selection Operator, yielding 24 key predictors. Multiple machine learning (logistic regression, random forest, extreme gradient boosting, light gradient-boosting machine) and deep learning (Feature Tokenizer + Transformer, SAINT) models were trained with Bayesian hyperparameter optimization. Results: Over a 2-year follow-up, 15.20\% (157,766/1,038,170) of the participants in the retrospective cohort and 10.50\% (101,077/961,519) in the prospective cohort developed hypertension. Among the models developed, the CatBoost model demonstrated the best performance, achieving area under the curve (AUC) values of 0.888 (95\% CI 0.886-0.889) in the retrospective cohort and 0.803 (95\% CI 0.801-0.804) in the prospective cohort. Calibration via isotonic regression improved the model's probability estimates, with Brier scores of 0.090 (95\% CI 0.089-0.091) and 0.102 (95\% CI 0.101-0.103) in the internal validation and prospective cohorts, respectively. Participants were ranked by the positive predictive value calculated using the calibrated model and stratified into 4 risk categories (low, medium, high, and very high), with the very high group exhibiting a 41.08\% (5741/13,975) hypertension incidence over 2 years. Age, BMI, and socioeconomic factors were identified as significant predictors of hypertension. Conclusions: Our machine learning model effectively predicted the 2-year risk of hypertension, making it particularly suitable for preventive health care management in high-risk populations residing in the desert regions of China. Our model exhibited excellent predictive performance and has potential for clinical application. 
A web-based application was developed based on our predictive model, which further enhanced the accessibility for clinical and public health use, aiding in reducing the burden of hypertension through timely prevention strategies. ", doi="10.2196/68442", url="https://www.jmir.org/2025/1/e68442", url="http://www.ncbi.nlm.nih.gov/pubmed/40072485" } @Article{info:doi/10.2196/59377, author="Gao, Jing and Jie, Xu and Yao, Yujun and Xue, Jingdong and Chen, Lei and Chen, Ruiyao and Chen, Jiayuan and Cheng, Weiwei", title="Fetal Birth Weight Prediction in the Third Trimester: Retrospective Cohort Study and Development of an Ensemble Model", journal="JMIR Pediatr Parent", year="2025", month="Mar", day="10", volume="8", pages="e59377", keywords="fetal birthweight", keywords="ensemble learning model", keywords="machine learning", keywords="prediction model", keywords="ultrasonography", keywords="macrosomia", keywords="low birth weight", keywords="birth weight", keywords="fetal", keywords="AI", keywords="artificial intelligence", keywords="prenatal", keywords="prenatal care", keywords="Shanghai", keywords="neonatal", keywords="maternal", keywords="parental", abstract="Background: Accurate third-trimester birth weight prediction is vital for reducing adverse outcomes, and machine learning (ML) offers superior precision over traditional ultrasound methods. Objective: This study aims to develop an ML model on the basis of clinical big data for accurate prediction of birth weight in the third trimester of pregnancy, which can help reduce adverse maternal and fetal outcomes. Methods: From January 1, 2018 to December 31, 2019, a retrospective cohort study involving 16,655 singleton live births without congenital anomalies (>28 weeks of gestation) was conducted in a tertiary first-class hospital in Shanghai. The initial set of data was divided, in a ratio of 4:1, into a train set for algorithm development and a test set on which the algorithm was evaluated. We extracted maternal and neonatal delivery outcomes, as well as parental demographics, obstetric clinical data, and sonographic fetal biometry, from electronic medical records. A total of 5 basic ML algorithms, including Ridge, SVM, Random Forest, extreme gradient boosting (XGBoost), and Multi-Layer Perceptron, were used to develop the prediction model, and their predictions were then averaged into an ensemble learning model. The models were compared using accuracy, mean squared error, root mean squared error, and mean absolute error. International Peace Maternity and Child Health Hospital's Research Ethics Committee granted ethical approval for the usage of patient information (GKLW2021-20). Results: Train and test sets contained a total of 13,324 and 3331 cases, respectively. From a total of 59 variables, we selected 17 variables that were readily available for the ``few feature model,'' which achieved high predictive power with an accuracy of 81\% and significantly exceeded ultrasound formula methods. In addition, our model maintained superior performance for low birth weight and macrosomic fetal populations. Conclusions: Our research investigated an innovative artificial intelligence model for predicting fetal birth weight and maximizing health care resource use. In the era of big data, our model improves maternal and fetal outcomes and promotes precision medicine. 
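Editorial note: the fetal birth weight abstract above (DOI 10.2196/59377) averages five base learners (Ridge, SVM, Random Forest, XGBoost, Multi-Layer Perceptron) into an ensemble. The sketch below reproduces that averaging pattern with scikit-learn's VotingRegressor on synthetic regression data; GradientBoostingRegressor is substituted for XGBoost so the example needs only scikit-learn, and all hyperparameters are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=2000, n_features=17, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Five base learners whose predictions are averaged into one ensemble;
# GradientBoostingRegressor stands in for XGBoost in this sketch
ensemble = VotingRegressor([
    ("ridge", make_pipeline(StandardScaler(), Ridge())),
    ("svr", make_pipeline(StandardScaler(), SVR())),
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("gbm", GradientBoostingRegressor(random_state=0)),
    ("mlp", make_pipeline(StandardScaler(), MLPRegressor(max_iter=1000, random_state=0))),
]).fit(X_tr, y_tr)

print("MAE:", round(mean_absolute_error(y_te, ensemble.predict(X_te)), 2))
```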
", doi="10.2196/59377", url="https://pediatrics.jmir.org/2025/1/e59377" } @Article{info:doi/10.2196/65651, author="Bena{\"i}che, Alexandre and Billaut-Laden, Ingrid and Randriamihaja, Herivelo and Bertocchio, Jean-Philippe", title="Assessment of the Efficiency of a ChatGPT-Based Tool, MyGenAssist, in an Industry Pharmacovigilance Department for Case Documentation: Cross-Over Study", journal="J Med Internet Res", year="2025", month="Mar", day="10", volume="27", pages="e65651", keywords="MyGenAssist", keywords="large language model", keywords="artificial intelligence", keywords="ChatGPT", keywords="pharmacovigilance", keywords="efficiency", abstract="Background: At the end of 2023, Bayer AG launched its own internal large language model (LLM), MyGenAssist, based on ChatGPT technology to overcome data privacy concerns. It may offer the possibility to decrease their harshness and save time spent on repetitive and recurrent tasks that could then be dedicated to activities with higher added value. Although there is a current worldwide reflection on whether artificial intelligence should be integrated into pharmacovigilance, medical literature does not provide enough data concerning LLMs and their daily applications in such a setting. Here, we studied how this tool could improve the case documentation process, which is a duty for authorization holders as per European and French good vigilance practices. Objective: The aim of the study is to test whether the use of an LLM could improve the pharmacovigilance documentation process. Methods: MyGenAssist was trained to draft templates for case documentation letters meant to be sent to the reporters. Information provided within the template changes depending on the case: such data come from a table sent to the LLM. We then measured the time spent on each case for a period of 4 months (2 months before using the tool and 2 months after its implementation). A multiple linear regression model was created with the time spent on each case as the explained variable, and all parameters that could influence this time were included as explanatory variables (use of MyGenAssist, type of recipient, number of questions, and user). To test if the use of this tool impacts the process, we compared the recipients' response rates with and without the use of MyGenAssist. Results: An average of 23.3\% (95\% CI 13.8\%-32.8\%) of time saving was made thanks to MyGenAssist (P<.001; adjusted R2=0.286) on each case, which could represent an average of 10.7 (SD 3.6) working days saved each year. The answer rate was not modified by the use of MyGenAssist (20/48, 42\% vs 27/74, 36\%; P=.57) whether the recipient was a physician or a patient. No significant difference was found regarding the time spent by the recipient to answer (mean 2.20, SD 3.27 days vs mean 2.65, SD 3.30 days after the last attempt of contact; P=.64). The implementation of MyGenAssist for this activity only required a 2-hour training session for the pharmacovigilance team. Conclusions: Our study is the first to show that a ChatGPT-based tool can improve the efficiency of a good practice activity without needing a long training session for the affected workforce. These first encouraging results could be an incentive for the implementation of LLMs in other processes. 
", doi="10.2196/65651", url="https://www.jmir.org/2025/1/e65651" } @Article{info:doi/10.2196/44027, author="Karamchand, Sumanth and Chipamaunga, Tsungai and Naidoo, Poobalan and Naidoo, Kiolan and Rambiritch, Virendra and Ho, Kevin and Chilton, Robert and McMahon, Kyle and Leisegang, Rory and Weich, Hellmuth and Hassan, Karim", title="Novel Versus Conventional Sequencing of $\beta$-Blockers, Sodium/Glucose Cotransportor 2 Inhibitors, Angiotensin Receptor-Neprilysin Inhibitors, and Mineralocorticoid Receptor Antagonists in Stable Patients With Heart Failure With Reduced Ejection Fraction (NovCon Sequencing Study): Protocol for a Randomized Controlled Trial", journal="JMIR Res Protoc", year="2025", month="Mar", day="10", volume="14", pages="e44027", keywords="heart failure", keywords="SGLT2i", keywords="sodium/glucose cotransporter 2 inhibitors", keywords="ARNi", keywords="angiotensin receptor-neprilysin inhibitors", keywords="HFrEF", keywords="heart failure with reduced ejection fraction", keywords="idiopathic dilated cardiomyopathy", keywords="heart", keywords="chronic heart failure", keywords="patient", keywords="control", keywords="clinical", keywords="adult", keywords="cardiomyopathy", keywords="therapy", abstract="Background: Chronic heart failure has high morbidity and mortality, with approximately half of the patients dying within 5 years of diagnosis. Recent additions to the armamentarium of anti--heart failure therapies include angiotensin receptor-neprilysin inhibitors (ARNIs) and sodium/glucose cotransporter 2 inhibitors (SGLT2is). Both classes have demonstrated mortality and morbidity benefits. Although these new therapies have morbidity and mortality benefits, it is not known whether rapid initiation is beneficial when compared with the conventional, slower-stepped approach. Many clinicians have been taught that starting with low-dose therapies and gradually increasing the dose is a safe way of intensifying treatment regimens. Pharmacologically, it is rational to use a combination of drugs that target multiple pathological mechanisms, as there is potential synergism and better therapeutic outcomes. Theoretically, the quicker the right combinations are used, the more likely the beneficial effects will be experienced. However, rapid up-titration must be balanced with patient safety and tolerability. Objective: This study aims to determine if early addition of ARNIs, SGLT2is, $\beta$-blockers, and mineralocorticoid receptor antagonists (within 4 weeks), when compared with the same therapies initiated slower (within 6 months), will reduce all-cause mortality and hospitalizations for heart failure in patients with stable heart failure with reduced ejection fraction. Methods: This is a single-center, randomized controlled, double-arm, assessor-blinded, active control, and pragmatic clinical trial. Adults with stable heart failure with reduced ejection fraction and idiopathic dilated cardiomyopathy will be randomized to conventional sequencing (the control arm; over 6 months) of anti--heart failure therapies, and a second arm will receive rapid sequencing (over 4 weeks). Study participants will be followed for 5 years to assess the safety, efficacy, and tolerability of the 2 types of sequencing. Posttrial access and care will be provided to all study participants throughout their lifespan. Results: We are currently in the process of obtaining ethical clearance and funding. 
Conclusions: We envisage that this study will help support evidence-based medicine and inform clinical practice guidelines on the optimal rate of sequencing of anti--heart failure therapies. A third placebo arm was considered, but costs would be too much and not providing study participants with therapies with known morbidity and mortality benefits may be unethical, in our opinion. Given the post--COVID-19 economic downturn and posttrial access to interventions, a major challenge will be acquiring funding for this study. International Registered Report Identifier (IRRID): PRR1-10.2196/44027 ", doi="10.2196/44027", url="https://www.researchprotocols.org/2025/1/e44027" } @Article{info:doi/10.2196/67871, author="Wei, Shengfeng and Guo, Xiangjian and He, Shilin and Zhang, Chunhua and Chen, Zhizhuan and Chen, Jianmei and Huang, Yanmei and Zhang, Fan and Liu, Qiangqiang", title="Application of Machine Learning for Patients With Cardiac Arrest: Systematic Review and Meta-Analysis", journal="J Med Internet Res", year="2025", month="Mar", day="10", volume="27", pages="e67871", keywords="cardiac arrest", keywords="machine learning", keywords="prognosis", keywords="systematic review", keywords="artificial intelligence", keywords="AI", abstract="Background: Currently, there is a lack of effective early assessment tools for predicting the onset and development of cardiac arrest (CA). With the increasing attention of clinical researchers on machine learning (ML), some researchers have developed ML models for predicting the occurrence and prognosis of CA, with certain models appearing to outperform traditional scoring tools. However, these models still lack systematic evidence to substantiate their efficacy. Objective: This systematic review and meta-analysis was conducted to evaluate the prediction value of ML in CA for occurrence, good neurological prognosis, mortality, and the return of spontaneous circulation (ROSC), thereby providing evidence-based support for the development and refinement of applicable clinical tools. Methods: PubMed, Embase, the Cochrane Library, and Web of Science were systematically searched from their establishment until May 17, 2024. The risk of bias in all prediction models was assessed using the Prediction Model Risk of Bias Assessment Tool. Results: In total, 93 studies were selected, encompassing 5,729,721 in-hospital and out-of-hospital patients. The meta-analysis revealed that, for predicting CA, the pooled C-index, sensitivity, and specificity derived from the imbalanced validation dataset were 0.90 (95\% CI 0.87-0.93), 0.83 (95\% CI 0.79-0.87), and 0.93 (95\% CI 0.88-0.96), respectively. On the basis of the balanced validation dataset, the pooled C-index, sensitivity, and specificity were 0.88 (95\% CI 0.86-0.90), 0.72 (95\% CI 0.49-0.95), and 0.79 (95\% CI 0.68-0.91), respectively. For predicting the good cerebral performance category score 1 to 2, the pooled C-index, sensitivity, and specificity based on the validation dataset were 0.86 (95\% CI 0.85-0.87), 0.72 (95\% CI 0.61-0.81), and 0.79 (95\% CI 0.66-0.88), respectively. For predicting CA mortality, the pooled C-index, sensitivity, and specificity based on the validation dataset were 0.85 (95\% CI 0.82-0.87), 0.83 (95\% CI 0.79-0.87), and 0.79 (95\% CI 0.74-0.83), respectively. For predicting ROSC, the pooled C-index, sensitivity, and specificity based on the validation dataset were 0.77 (95\% CI 0.74-0.80), 0.53 (95\% CI 0.31-0.74), and 0.88 (95\% CI 0.71-0.96), respectively. 
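Editorial note: the cardiac arrest meta-analysis that follows (DOI 10.2196/67871) reports pooled C-indices, sensitivities, and specificities with 95% CIs. The abstract does not state the pooling method, so the sketch below shows one common random-effects approach (DerSimonian-Laird) applied to made-up study-level C-indices and standard errors; pooling discrimination measures directly on the raw scale is a simplification, as they are often pooled on a transformed scale.

```python
import numpy as np

def dersimonian_laird(effects, std_errors):
    """Random-effects pooling of study-level estimates (DerSimonian-Laird)."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(std_errors, dtype=float) ** 2
    w = 1.0 / variances
    fixed = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - fixed) ** 2)          # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)   # between-study variance
    w_star = 1.0 / (variances + tau2)
    pooled = np.sum(w_star * effects) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se

# Made-up C-indices and standard errors from 5 hypothetical validation datasets
c_indices = [0.88, 0.91, 0.86, 0.92, 0.89]
std_errors = [0.02, 0.015, 0.03, 0.02, 0.025]
print(dersimonian_laird(c_indices, std_errors))
```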
In predicting CA, the most significant modeling variables were respiratory rate, blood pressure, age, and temperature. In predicting a good cerebral performance category score 1 to 2, the most significant modeling variables in the in-hospital CA group were rhythm (shockable or nonshockable), age, medication use, and gender; the most significant modeling variables in the out-of-hospital CA group were age, rhythm (shockable or nonshockable), medication use, and ROSC. Conclusions: ML represents a currently promising approach for predicting the occurrence and outcomes of CA. Therefore, in future research on CA, we may attempt to systematically update traditional scoring tools based on the superior performance of ML in specific outcomes, achieving artificial intelligence--driven enhancements. Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42024518949; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=518949 ", doi="10.2196/67871", url="https://www.jmir.org/2025/1/e67871", url="http://www.ncbi.nlm.nih.gov/pubmed/40063076" } @Article{info:doi/10.2196/65563, author="Tan, Jiaxing and Yang, Rongxin and Xiao, Liyin and Dong, Lingqiu and Zhong, Zhengxia and Zhou, Ling and Qin, Wei", title="Risk Stratification in Immunoglobulin A Nephropathy Using Network Biomarkers: Development and Validation Study", journal="J Med Internet Res", year="2025", month="Mar", day="10", volume="27", pages="e65563", keywords="IgA nephropathy", keywords="unsupervised learning", keywords="network biomarker", keywords="metabolomics", keywords="gut microbiota", keywords="biomarkers", keywords="risk stratification", keywords="IgA", keywords="immunoglobulin A", keywords="renal biopsy", keywords="renal", keywords="prospective cohort", keywords="Berger disease", keywords="synpharyngitic glomerulonephritis", keywords="kidney", keywords="immune system", keywords="glomerulonephritis", keywords="kidney inflammation", keywords="chronic kidney disease", keywords="renal disease", keywords="nephropathy", keywords="nephritis", abstract="Background: Traditional risk models for immunoglobulin A nephropathy (IgAN), which primarily rely on renal indicators, lack comprehensive assessment and therapeutic guidance, necessitating more refined and integrative approaches. Objective: This study integrated network biomarkers with unsupervised learning clustering (k-means clustering based on network biomarkers [KMN]) to refine risk stratification in IgAN and explore its clinical value. Methods: Involving a multicenter prospective cohort, we analyzed 1460 patients and validated the approach externally with 200 additional patients. Deeper metabolic and microbiomic insights were gained from 2 distinct cohorts: 63 patients underwent ultraperformance liquid chromatography--mass spectrometry, while another 45 underwent fecal 16S RNA sequencing. Our approach used hierarchical clustering and k-means methods, using 3 sets of indicators: demographic and renal indicators, renal and extrarenal indicators, and network biomarkers derived from all indicators. Results: Among 6 clustering methods tested, the KMN scheme was the most effective, accurately reflecting patient severity and prognosis with a prognostic accuracy area under the curve (AUC) of 0.77, achieved solely through cluster grouping without additional indicators. 
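Editorial note: the IgA nephropathy abstract above (DOI 10.2196/65563) builds risk groups by k-means clustering on network biomarkers derived from renal and extrarenal indicators. A minimal sketch of that clustering step is shown below; the feature matrix, the number of clusters, and the scaling choice are assumptions for illustration, and the study's subsequent mapping of clusters to risk levels against outcomes is only indicated in a comment.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Placeholder "network biomarker" matrix: patients x integrated clinical indicators
X = rng.normal(size=(200, 12))
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_scaled)
labels = kmeans.labels_

# Downstream, each cluster would be profiled against observed outcomes
# (e.g., kidney failure events) to order clusters from low to high risk;
# here we simply report cluster sizes.
print("cluster sizes:", np.bincount(labels))
```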
The KMN stratification significantly outperformed the existing International IgA Nephropathy Prediction Tool (AUC of 0.72) and renal function-renal histology grading schemes (AUC of 0.69). Clinically, this stratification facilitated personalized treatment, recommending angiotensin-converting enzyme inhibitors or angiotensin receptor blockers for lower-risk groups and considering immunosuppressive therapy for higher-risk groups. Preliminary findings also indicated a correlation between IgAN progression and alterations in serum metabolites and gut microbiota, although further research is needed to establish causality. Conclusions: The effectiveness and applicability of the KMN scheme indicate its substantial potential for clinical application in IgAN management. ", doi="10.2196/65563", url="https://www.jmir.org/2025/1/e65563", url="http://www.ncbi.nlm.nih.gov/pubmed/40063072" } @Article{info:doi/10.2196/69068, author="Hansun, Seng and Argha, Ahmadreza and Bakhshayeshi, Ivan and Wicaksana, Arya and Alinejad-Rokny, Hamid and Fox, J. Greg and Liaw, Siaw-Teng and Celler, G. Branko and Marks, B. Guy", title="Diagnostic Performance of Artificial Intelligence--Based Methods for Tuberculosis Detection: Systematic Review", journal="J Med Internet Res", year="2025", month="Mar", day="7", volume="27", pages="e69068", keywords="AI", keywords="artificial intelligence", keywords="deep learning", keywords="diagnostic performance", keywords="machine learning", keywords="PRISMA", keywords="Preferred Reporting Items for Systematic Reviews and Meta-Analysis", keywords="QUADAS-2", keywords="Quality Assessment of Diagnostic Accuracy Studies version 2", keywords="systematic literature review", keywords="tuberculosis detection", abstract="Background: Tuberculosis (TB) remains a significant health concern, contributing to the highest mortality among infectious diseases worldwide. However, none of the various TB diagnostic tools introduced is deemed sufficient on its own for the diagnostic pathway, so various artificial intelligence (AI)--based methods have been developed to address this issue. Objective: We aimed to provide a comprehensive evaluation of AI-based algorithms for TB detection across various data modalities. Methods: Following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analysis) 2020 guidelines, we conducted a systematic review to synthesize current knowledge on this topic. Our search across 3 major databases (Scopus, PubMed, Association for Computing Machinery [ACM] Digital Library) yielded 1146 records, of which we included 152 (13.3\%) studies in our analysis. QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies version 2) was performed for the risk-of-bias assessment of all included studies. Results: Radiographic biomarkers (n=129, 84.9\%) and deep learning (DL; n=122, 80.3\%) approaches were predominantly used, with convolutional neural networks (CNNs) using Visual Geometry Group (VGG)-16 (n=37, 24.3\%), ResNet-50 (n=33, 21.7\%), and DenseNet-121 (n=19, 12.5\%) architectures being the most common DL approach. The majority of studies focused on model development (n=143, 94.1\%) and used a single modality approach (n=141, 92.8\%). 
AI methods demonstrated good performance in all studies: mean accuracy=91.93\% (SD 8.10\%, 95\% CI 90.52\%-93.33\%; median 93.59\%, IQR 88.33\%-98.32\%), mean area under the curve (AUC)=93.48\% (SD 7.51\%, 95\% CI 91.90\%-95.06\%; median 95.28\%, IQR 91\%-99\%), mean sensitivity=92.77\% (SD 7.48\%, 95\% CI 91.38\%-94.15\%; median 94.05\% IQR 89\%-98.87\%), and mean specificity=92.39\% (SD 9.4\%, 95\% CI 90.30\%-94.49\%; median 95.38\%, IQR 89.42\%-99.19\%). AI performance across different biomarker types showed mean accuracies of 92.45\% (SD 7.83\%), 89.03\% (SD 8.49\%), and 84.21\% (SD 0\%); mean AUCs of 94.47\% (SD 7.32\%), 88.45\% (SD 8.33\%), and 88.61\% (SD 5.9\%); mean sensitivities of 93.8\% (SD 6.27\%), 88.41\% (SD 10.24\%), and 93\% (SD 0\%); and mean specificities of 94.2\% (SD 6.63\%), 85.89\% (SD 14.66\%), and 95\% (SD 0\%) for radiographic, molecular/biochemical, and physiological types, respectively. AI performance across various reference standards showed mean accuracies of 91.44\% (SD 7.3\%), 93.16\% (SD 6.44\%), and 88.98\% (SD 9.77\%); mean AUCs of 90.95\% (SD 7.58\%), 94.89\% (SD 5.18\%), and 92.61\% (SD 6.01\%); mean sensitivities of 91.76\% (SD 7.02\%), 93.73\% (SD 6.67\%), and 91.34\% (SD 7.71\%); and mean specificities of 86.56\% (SD 12.8\%), 93.69\% (SD 8.45\%), and 92.7\% (SD 6.54\%) for bacteriological, human reader, and combined reference standards, respectively. The transfer learning (TL) approach showed increasing popularity (n=89, 58.6\%). Notably, only 1 (0.7\%) study conducted domain-shift analysis for TB detection. Conclusions: Findings from this review underscore the considerable promise of AI-based methods in the realm of TB detection. Future research endeavors should prioritize conducting domain-shift analyses to better simulate real-world scenarios in TB detection. Trial Registration: PROSPERO CRD42023453611; https://www.crd.york.ac.uk/PROSPERO/view/CRD42023453611 ", doi="10.2196/69068", url="https://www.jmir.org/2025/1/e69068", url="http://www.ncbi.nlm.nih.gov/pubmed/40053773" } @Article{info:doi/10.2196/60391, author="Shmilovitch, Haim Amit and Katson, Mark and Cohen-Shelly, Michal and Peretz, Shlomi and Aran, Dvir and Shelly, Shahar", title="GPT-4 as a Clinical Decision Support Tool in Ischemic Stroke Management: Evaluation Study", journal="JMIR AI", year="2025", month="Mar", day="7", volume="4", pages="e60391", keywords="GPT-4", keywords="ischemic stroke", keywords="clinical decision support", keywords="artificial intelligence", keywords="neurology", abstract="Background: Cerebrovascular diseases are the second most common cause of death worldwide and one of the major causes of disability burden. Advancements in artificial intelligence have the potential to revolutionize health care delivery, particularly in critical decision-making scenarios such as ischemic stroke management. Objective: This study aims to evaluate the effectiveness of GPT-4 in providing clinical support for emergency department neurologists by comparing its recommendations with expert opinions and real-world outcomes in acute ischemic stroke management. Methods: A cohort of 100 patients with acute stroke symptoms was retrospectively reviewed. Data used for decision-making included patients' history, clinical evaluation, imaging study results, and other relevant details. 
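Editorial note: the tuberculosis detection review above (DOI 10.2196/69068) notes the growing use of transfer learning with pretrained convolutional networks such as VGG-16, ResNet-50, and DenseNet-121. The sketch below shows the generic transfer-learning pattern in PyTorch with an ImageNet-pretrained ResNet-50, a frozen backbone, and a new 2-class head trained on placeholder tensors; it is not any reviewed study's pipeline, and real use would load chest X-ray images and labels.

```python
import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained ResNet-50 with a frozen backbone (generic transfer-learning pattern)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)  # new 2-class head: TB vs normal (assumed labels)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on placeholder tensors standing in for a chest X-ray batch
images = torch.randn(4, 3, 224, 224)
targets = torch.tensor([0, 1, 0, 1])
model.train()
optimizer.zero_grad()
loss = criterion(model(images), targets)
loss.backward()
optimizer.step()
print("loss:", loss.item())
```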
Each case was independently presented to GPT-4, which provided scaled recommendations (1-7) regarding the appropriateness of treatment, the use of tissue plasminogen activator, and the need for endovascular thrombectomy. Additionally, GPT-4 estimated the 90-day mortality probability for each patient and elucidated its reasoning for each recommendation. The recommendations were then compared with a stroke specialist's opinion and actual treatment decisions. Results: In our cohort of 100 patients, treatment recommendations by GPT-4 showed strong agreement with expert opinion (area under the curve [AUC] 0.85, 95\% CI 0.77-0.93) and real-world treatment decisions (AUC 0.80, 95\% CI 0.69-0.91). GPT-4 showed near-perfect agreement with real-world decisions in recommending endovascular thrombectomy (AUC 0.94, 95\% CI 0.89-0.98) and strong agreement for tissue plasminogen activator treatment (AUC 0.77, 95\% CI 0.68-0.86). Notably, in some cases, GPT-4 recommended more aggressive treatment than human experts, with 11 instances where GPT-4 suggested tissue plasminogen activator use against expert opinion. For mortality prediction, GPT-4 accurately identified 10 (77\%) out of 13 deaths within its top 25 high-risk predictions (AUC 0.89, 95\% CI 0.8077-0.9739; hazard ratio 6.98, 95\% CI 2.88-16.9; P<.001), outperforming supervised machine learning models such as PRACTICE (AUC 0.70; log-rank P=.02) and PREMISE (AUC 0.77; P=.07). Conclusions: This study demonstrates the potential of GPT-4 as a viable clinical decision-support tool in the management of acute stroke. Its ability to provide explainable recommendations without requiring structured data input aligns well with the routine workflows of treating physicians. However, the tendency toward more aggressive treatment recommendations highlights the importance of human oversight in clinical decision-making. Future studies should focus on prospective validations and exploring the safe integration of such artificial intelligence tools into clinical practice. ", doi="10.2196/60391", url="https://ai.jmir.org/2025/1/e60391", url="http://www.ncbi.nlm.nih.gov/pubmed/40053715" } @Article{info:doi/10.2196/67290, author="Dagli, Marcel Mert and Turlip, William Ryan and Oettl, C. Felix and Emara, Mohamed and Gujral, Jaskeerat and Chauhan, Daksh and Ahmad, S. Hasan and Santangelo, Gabrielle and Wathen, Connor and Ghenbot, Yohannes and Arena, D. John and Golubovsky, L. Joshua and Gu, J. Ben and Shin, H. John and Yoon, Won Jang and Ozturk, K. Ali and Welch, C. William", title="Comparison of Outcomes Between Staged and Same-Day Circumferential Spinal Fusion for Adult Spinal Deformity: Systematic Review and Meta-Analysis", journal="Interact J Med Res", year="2025", month="Mar", day="6", volume="14", pages="e67290", keywords="adults", keywords="circumferential fusion", keywords="scoliosis", keywords="spinal curvature", keywords="spinal fusion", keywords="spinal deformity", keywords="intraoperative", keywords="postoperative", keywords="perioperative", keywords="systematic reviews", keywords="meta-analysis", keywords="PRISMA", abstract="Background: Adult spinal deformity (ASD) is a prevalent condition often treated with circumferential spinal fusion (CF), which can be performed as staged or same-day procedures. However, evidence guiding the choice between these approaches is lacking. Objective: This study aims to compare patient outcomes following staged and same-day CF for ASD. 
Methods: Following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, a comprehensive literature search was conducted in PubMed, MEDLINE, Embase, Cochrane CENTRAL, Web of Science, and Scopus. Eligibility criteria included studies comparing outcomes following staged and same-day CF in adults with ASD. Searches were exported to Covidence, and records were deduplicated automatically. Title and abstract screening, full-text review, and data extraction were performed by two independent reviewers, with all conflicts being resolved by a third reviewer. A meta-analysis was conducted for outcomes reported in 3 or more studies. Results: Seven studies with 741 patients undergoing CF for ASD were included in the review (staged: n=331, 44.7\% and same-day: n=410, 55.3\%). Four studies that had comparable outcomes were merged for the quantitative meta-analysis and split based on observed measures. The meta-analysis revealed significantly shorter hospital length of stay (mean difference 3.98, 95\% CI 2.23-5.72 days; P<.001) for same-day CF. Three studies compared the operative time between staged and same-day CF, with all reporting a lower mean operative time for same-day CF (mean between 291-479, SD 129 minutes) compared to staged CF (mean between 426-541, SD 124 minutes); however, inconsistent reporting of mean and SD made quantitative analyses unattainable. Of the 4 studies that compared estimated blood loss (EBL) in the relevant groups, 3 presented a lower EBL (mean between 412-1127, SD 954 mL) in same-day surgery compared to staged surgery (mean between 642, SD 550 to 1351, SD 869 mL). Both studies that reported intra- and postoperative adverse events showed more intraoperative adverse events in staged CF (10.9\% and 13.6\%, respectively) compared to same-day CF (9.1\% and 3.6\%, respectively). Four studies measuring any perioperative adverse events showed a higher incidence of adverse events in staged CF than all studies combined. However, quantitative analysis of EBL, intraoperative adverse events, and perioperative adverse events found no statistically significant difference. Postoperative adverse events, reoperation, infection rates, and readmission rates showed inconsistent findings between studies. Data quality assessment revealed a moderate degree of bias for all included studies. Conclusions: Same-day CF may offer shorter operating time and hospital stay compared to staged CF for ASD. However, there was marked heterogeneity in perioperative outcomes reporting, and continuous variables were inconsistently presented. This underscored the need for standardized reporting of clinical variables and patient-reported outcomes and higher evidence of randomized controlled trials to elucidate the clinical superiority of either approach. Trial Registration: PROSPERO CRD42022339764; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=339764 International Registered Report Identifier (IRRID): RR2-10.2196/42331 ", doi="10.2196/67290", url="https://www.i-jmr.org/2025/1/e67290", url="http://www.ncbi.nlm.nih.gov/pubmed/40053742" } @Article{info:doi/10.2196/65152, author="Nightingale, L. Chandylen and Dressler, V. Emily and Kepper, Maura and Klepin, D. Heidi and Lee, Craddock Simon and Smith, Sydney and Aguilar, Aylin and Wiseman, D. Kimberly and Sohl, J. Stephanie and Wells, J. Brian and DeMari, A. Joseph and Throckmorton, Alyssa and Kulbacki, W. Lindsey and Hanna, Jenny and Foraker, E. Randi and Weaver, E. 
Kathryn", title="Oncology Provider and Patient Perspectives on a Cardiovascular Health Assessment Tool Used During Posttreatment Survivorship Care in Community Oncology (Results from WF-1804CD): Mixed Methods Observational Study", journal="J Med Internet Res", year="2025", month="Mar", day="6", volume="27", pages="e65152", keywords="cancer", keywords="cardiovascular health", keywords="cancer survivors", keywords="community oncology", keywords="electronic health record integration", keywords="provider acceptability", keywords="patient-provider", keywords="assessment tool", keywords="electronic health records", keywords="clinical decision support", keywords="surveys", keywords="interviews", keywords="survivors", keywords="Automated Heart-Health Assessment", abstract="Background: Most survivors of cancer have multiple cardiovascular risk factors, increasing their risk of poor cardiovascular and cancer outcomes. The Automated Heart-Health Assessment (AH-HA) tool is a novel electronic health record clinical decision support tool based on the American Heart Association's Life's Simple 7 cardiovascular health metrics to promote cardiovascular health assessment and discussion in outpatient oncology. Before proceeding to future implementation trials, it is critical to establish the acceptability of the tool among providers and survivors. Objective: This study aims to assess provider and survivor acceptability of the AH-HA tool and provider training at practices randomized to the AH-HA tool arm within WF-1804CD. Methods: Providers (physicians, nurse practitioners, and physician assistants) completed a survey to assess the acceptability of the AH-HA training, immediately following training. Providers also completed surveys to assess AH-HA tool acceptability and potential sustainability. Tool acceptability was assessed after 30 patients were enrolled at the practice with both a survey developed for the study as well as with domains from the Unified Theory of Acceptance and Use of Technology survey (performance expectancy, effort expectancy, attitude toward using technology, and facilitating conditions). Semistructured interviews at the end of the study captured additional provider perceptions of the AH-HA tool. Posttreatment survivors (breast, prostate, colorectal, endometrial, and lymphomas) completed a survey to assess the acceptability of the AH-HA tool immediately after the designated study appointment. Results: Providers (n=15) reported high overall acceptability of the AH-HA training (mean 5.8, SD 1.0) and tool (mean 5.5, SD 1.4); provider acceptability was also supported by the Unified Theory of Acceptance and Use of Technology scores (eg, effort expectancy: mean 5.6, SD 1.5). Qualitative data also supported provider acceptability of different aspects of the AH-HA tool (eg, ``It helps focus the conversation and give the patient a visual of continuum of progress''). Providers were more favorable about using the AH-HA tool for posttreatment survivorship care. Enrolled survivors (n=245) were an average of 4.4 (SD 3.7) years posttreatment. Most survivors reported that they strongly agreed or agreed that they liked the AH-HA tool (n=231, 94.3\%). A larger proportion of survivors with high health literacy strongly agreed or agreed that it was helpful to see their heart health score (n=161, 98.2\%) compared to survivors with lower health literacy scores (n=68, 89.5\%; P=.005). 
Conclusions: Quantitative surveys and qualitative interview data both demonstrate high acceptability of the AH-HA tool among both providers and survivors. Although most survivors found it helpful to see their heart health score, there may be room for improving communication with survivors who have lower health literacy. Trial Registration: ClinicalTrials.gov NCT03935282; http://clinicaltrials.gov/ct2/show/NCT03935282 International Registered Report Identifier (IRRID): RR2-https://doi-org.wake.idm.oclc.org/10.1016/j.conctc.2021.100808 ", doi="10.2196/65152", url="https://www.jmir.org/2025/1/e65152", url="http://www.ncbi.nlm.nih.gov/pubmed/39854647" } @Article{info:doi/10.2196/71439, author="Watanabe, Seiya and Kizaki, Hayato and Hori, Satoko", title="Development of a Patient-Centered Symptom-Reporting Application in Pharmacy Settings Using a Hierarchical Patient-Friendly Symptom List: Developmental and Usability Study", journal="JMIR Hum Factors", year="2025", month="Mar", day="6", volume="12", pages="e71439", keywords="patient symptom monitoring", keywords="hierarchical symptom list", keywords="community pharmacy", keywords="interview survey", keywords="mobile application", abstract="Background: Effective symptom identification, a key responsibility for community pharmacists, requires patients to describe their symptoms accurately and comprehensively. However, current practices in pharmacies may be insufficient in capturing patient-reported symptoms comprehensively, potentially affecting the quality of pharmaceutical care and patient safety. Objective: This study aimed to construct a new, hierarchical symptom list derived from the Patient-Friendly Term List of the Medical Dictionary for Regulatory Activities (MedDRA) and to develop and evaluate a mobile app incorporating this list for facilitating symptom reporting by patients in pharmacy settings. The study also aimed to assess the usability and acceptance of this app among potential users. Methods: Subjective symptom-related terms were extracted from the Patient-Friendly Term List version 23.0 of the MedDRA. These terms were systematically consolidated and organized into a hierarchical, user-friendly symptom list. A mobile app incorporating this list was developed for pharmacy settings, featuring a symptom selection interface and a free-text input field for additional symptoms. The app included an instructional video explaining the importance of symptom reporting and guidance on navigation. Usability tests and semistructured interviews were conducted with participants aged >20 years. Interview transcripts were analyzed using the Unified Theory of Acceptance and Use of Technology (UTAUT) model to evaluate factors influencing the acceptance of technology. Results: From the initial 1440 terms in the Patient-Friendly Term List, 795 relevant terms were selected and organized into 40 site-specific subcategories, which were then grouped into broader site categories (mental, head, trunk, upper limb, lower limb, physical condition, and others). These terms were further consolidated into 211 patient-friendly symptom terms, forming a hierarchical symptom list. The app's interface design limited options to 10 items per screen to assist with decision-making. A total of 5 adults participated in the usability test. Participants found the interface intuitive and easy to use, requiring minimal effort, and provided positive feedback regarding the potential utility of the app in pharmacy settings. 
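Editorial note: the symptom-reporting app abstract that follows (DOI 10.2196/71439) organizes 211 patient-friendly terms into a hierarchy of site categories and subcategories and limits each screen to 10 options. A minimal sketch of such a hierarchical list with paged display is shown below; the category names, terms, and paging helper are invented for illustration and are not the MedDRA-derived list itself.

```python
# Hierarchical symptom list: site category -> subcategory -> patient-friendly terms.
# All names below are illustrative placeholders, not the study's actual list.
SYMPTOM_TREE = {
    "Head": {
        "Eyes": ["blurred vision", "eye pain", "itchy eyes"],
        "Ears": ["ringing in the ears", "earache"],
    },
    "Trunk": {
        "Stomach": ["nausea", "stomach ache", "heartburn"],
    },
    "Physical condition": {
        "General": ["tiredness", "fever", "chills", "loss of appetite"],
    },
}

def page_options(options, page, per_page=10):
    """Return one screen of at most `per_page` choices to limit decision burden."""
    start = page * per_page
    return options[start:start + per_page]

# Example: drill down Head -> Eyes, then show the first screen of terms
print(page_options(SYMPTOM_TREE["Head"]["Eyes"], page=0))
```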
The UTAUT analysis identified several facilitating factors, including ease of use and the potential for enhanced pharmacist-patient communication. However, concerns were raised about usability for older adults and the need for simplified technical terminology. Conclusions: The user-friendly app with a hierarchically structured symptom list and complementary free-text entry has potential benefits for improving the accuracy and efficiency of symptom reporting in pharmacy settings. The positive user acceptance and identified areas for improvement provide a foundation for further development and implementation of this technology to enhance communication between patients and pharmacists. Future improvements should focus on addressing usability for older adults and simplifying technical terminology. ", doi="10.2196/71439", url="https://humanfactors.jmir.org/2025/1/e71439", url="http://www.ncbi.nlm.nih.gov/pubmed/40053749" } @Article{info:doi/10.2196/68509, author="Dong, Jiale and Jin, Zhechuan and Li, Chengxiang and Yang, Jian and Jiang, Yi and Li, Zeqian and Chen, Cheng and Zhang, Bo and Ye, Zhaofei and Hu, Yang and Ma, Jianguo and Li, Ping and Li, Yulin and Wang, Dongjin and Ji, Zhili", title="Machine Learning Models With Prognostic Implications for Predicting Gastrointestinal Bleeding After Coronary Artery Bypass Grafting and Guiding Personalized Medicine: Multicenter Cohort Study", journal="J Med Internet Res", year="2025", month="Mar", day="6", volume="27", pages="e68509", keywords="machine learning", keywords="personalized medicine", keywords="coronary artery bypass grafting", keywords="adverse outcome", keywords="gastrointestinal bleeding", abstract="Background: Gastrointestinal bleeding is a serious adverse event of coronary artery bypass grafting and lacks tailored risk assessment tools for personalized prevention. Objective: This study aims to develop and validate predictive models to assess the risk of gastrointestinal bleeding after coronary artery bypass grafting (GIBCG) and to guide personalized prevention. Methods: Participants were recruited from 4 medical centers, including a prospective cohort and the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. From an initial cohort of 18,938 patients, 16,440 were included in the final analysis after applying the exclusion criteria. Thirty combinations of machine learning algorithms were compared, and the optimal model was selected based on integrated performance metrics, including the area under the receiver operating characteristic curve (AUROC) and the Brier score. This model was then developed into a web-based risk prediction calculator. The Shapley Additive Explanations method was used to provide both global and local explanations for the predictions. Results: The model was developed using data from 3 centers and a prospective cohort (n=13,399) and validated on the Drum Tower cohort (n=2745) and the MIMIC cohort (n=296). The optimal model, based on 15 easily accessible admission features, demonstrated an AUROC of 0.8482 (95\% CI 0.8328-0.8618) in the derivation cohort. In external validation, the AUROC was 0.8513 (95\% CI 0.8221-0.8782) for the Drum Tower cohort and 0.7811 (95\% CI 0.7275-0.8343) for the MIMIC cohort. The analysis indicated that high-risk patients identified by the model had a significantly increased mortality risk (odds ratio 2.98, 95\% CI 1.784-4.978; P<.001). 
For these high-risk populations, preoperative use of proton pump inhibitors was an independent protective factor against the occurrence of GIBCG. By contrast, dual antiplatelet therapy and oral anticoagulants were identified as independent risk factors. However, in low-risk populations, the use of proton pump inhibitors ($\chi^2_1$=0.13, P=.72), dual antiplatelet therapy ($\chi^2_1$=0.38, P=.54), and oral anticoagulants ($\chi^2_1$=0.15, P=.69) were not significantly associated with the occurrence of GIBCG. Conclusions: Our machine learning model accurately identified patients at high risk of GIBCG, who had a poor prognosis. This approach can aid in early risk stratification and personalized prevention. Trial Registration: Chinese Clinical Registry Center ChiCTR2400086050; http://www.chictr.org.cn/showproj.html?proj=226129 ", doi="10.2196/68509", url="https://www.jmir.org/2025/1/e68509", url="http://www.ncbi.nlm.nih.gov/pubmed/40053791" } @Article{info:doi/10.2196/65190, author="Esumi, Ryo and Funao, Hiroki and Kawamoto, Eiji and Sakamoto, Ryota and Ito-Masui, Asami and Okuno, Fumito and Shinkai, Toru and Hane, Atsuya and Ikejiri, Kaoru and Akama, Yuichi and Gaowa, Arong and Park, Jeong Eun and Momosaki, Ryo and Kaku, Ryuji and Shimaoka, Motomu", title="Machine Learning--Based Prediction of Delirium and Risk Factor Identification in Intensive Care Unit Patients With Burns: Retrospective Observational Study", journal="JMIR Form Res", year="2025", month="Mar", day="5", volume="9", pages="e65190", keywords="burns", keywords="delirium", keywords="intensive care unit", keywords="machine learning", keywords="prediction model", keywords="artificial intelligence", keywords="AI", abstract="Background: The incidence of delirium in patients with burns receiving treatment in the intensive care unit (ICU) is high, reaching up to 77\%, and has been associated with increased mortality rates. Therefore, early identification of patients at high risk of delirium onset is essential for improving treatment strategies. Objective: This study aimed to create a machine learning model for predicting delirium in patients with burns during their ICU stay using patient data from the first day of ICU admission and identify predictive factors for ICU delirium in patients with burns. Methods: This study focused on 82 patients with burns aged $\geq$18 years who were admitted to the ICU at Mie University Hospital for $\geq$24 hours between January 2015 and June 2023. In total, 70 variables were measured in patients upon ICU admission and used as explanatory variables in the ICU delirium prediction model. Delirium was assessed using the Intensive Care Delirium Screening Checklist every 8 hours after ICU admission. A total of 10 different machine learning methods were used to predict ICU delirium. Multiple receiver operating characteristic curves were plotted for various machine learning models, and the area under the curve (AUC) for each was compared. In addition, the top 15 risk factors contributing to delirium onset were identified using Shapley additive explanations analysis. 
Results: Among the 10 machine learning models tested, logistic regression (mean AUC 0.906, SD 0.073), support vector machine (mean AUC 0.897, SD 0.056), k-nearest neighbor (mean AUC 0.894, SD 0.060), neural network (mean AUC 0.857, SD 0.058), random forest (mean AUC 0.850, SD 0.074), adaptive boosting (mean AUC 0.832, SD 0.094), gradient boosting machine (mean AUC 0.821, SD 0.074), and na{\"i}ve Bayes (mean AUC 0.827, SD 0.095) demonstrated the highest accuracy in predicting ICU delirium. Specifically, 24-hour urine output (from ICU admission to 24 hours), oxygen saturation, burn area, total bilirubin level, and intubation upon ICU admission were identified as the major risk factors for delirium onset. In addition, variables such as the proportion of white blood cell fractions, including monocytes; methemoglobin concentration; and respiratory rate, were identified as important risk factors for ICU delirium. Conclusions: This study demonstrated the ability of machine learning models trained using vital signs and blood data upon ICU admission to predict delirium in patients with burns during their ICU stay. ", doi="10.2196/65190", url="https://formative.jmir.org/2025/1/e65190", url="http://www.ncbi.nlm.nih.gov/pubmed/39895101" } @Article{info:doi/10.2196/56671, author="Oh, Geum Eui and Oh, Sunyoung and Cho, Seunghyeon and Moon, Mir", title="Predicting Readmission Among High-Risk Discharged Patients Using a Machine Learning Model With Nursing Data: Retrospective Study", journal="JMIR Med Inform", year="2025", month="Mar", day="5", volume="13", pages="e56671", keywords="machine learning", keywords="EHR", keywords="electronic health record", keywords="electronic medical record", keywords="EMR", keywords="artificial intelligence", keywords="readmission", keywords="nursing data", keywords="clinical decision support", keywords="prediction", keywords="predictive", keywords="discharge", keywords="admission", keywords="hospitalization", abstract="Background: Unplanned readmissions increase unnecessary health care costs and reduce the quality of care. It is essential to plan the discharge care from the beginning of hospitalization to reduce the risk of readmission. Machine learning--based readmission prediction models can support patients' preemptive discharge care services with improved predictive power. Objective: This study aimed to develop a readmission early prediction model utilizing nursing data for high-risk discharge patients. Methods: This retrospective study included the electronic medical records of 12,977 patients with 1 of the top 6 high-risk readmission diseases at a tertiary hospital in Seoul from January 2018 to January 2020. We used demographic, clinical, and nursing data to construct a prediction model. We constructed unplanned readmission prediction models by dividing them into Model 1 and Model 2. Model 1 used early hospitalization data (up to 1 day after admission), and Model 2 used all the data. To improve the performance of the machine learning method, we performed 5-fold cross-validation and utilized adaptive synthetic sampling to address data imbalance. The 6 algorithms of logistic regression, random forest, decision tree, XGBoost, CatBoost, and multilayer perceptron were employed to develop predictive models. The analysis was conducted using Python Language Reference, version 3.11.3 (Python Software Foundation). 
Results: In Model 1, among the 6 prediction model algorithms, the random forest model had the best result, with an area under the receiver operating characteristic (AUROC) curve of 0.62. In Model 2, the CatBoost model had the best result, with an AUROC of 0.64. BMI, systolic blood pressure, and age consistently emerged as the most significant predictors of readmission risk across Models 1 and 2. Model 1, which enabled early readmission prediction, showed a higher proportion of nursing data variables among its important predictors compared to Model 2. Conclusions: Machine learning--based readmission prediction models utilizing nursing data provide basic data for evidence-based clinical decision support for high-risk discharge patients with complex conditions and facilitate early intervention. By integrating nursing data containing diverse patient information, these models can provide more comprehensive risk assessment and improve patient outcomes. ", doi="10.2196/56671", url="https://medinform.jmir.org/2025/1/e56671" } @Article{info:doi/10.2196/64364, author="Berman, Eliza and Sundberg Malek, Holly and Bitzer, Michael and Malek, Nisar and Eickhoff, Carsten", title="Retrieval Augmented Therapy Suggestion for Molecular Tumor Boards: Algorithmic Development and Validation Study", journal="J Med Internet Res", year="2025", month="Mar", day="5", volume="27", pages="e64364", keywords="large language models", keywords="retrieval augmented generation", keywords="LLaMA", keywords="precision oncology", keywords="molecular tumor board", keywords="molecular tumor", keywords="LLMs", keywords="augmented therapy", keywords="MTB", keywords="oncology", keywords="tumor", keywords="clinical trials", keywords="patient care", keywords="treatment", keywords="evidence-based", keywords="accessibility to care", abstract="Background: Molecular tumor boards (MTBs) require intensive manual investigation to generate optimal treatment recommendations for patients. Large language models (LLMs) can catalyze MTB recommendations, decrease human error, improve accessibility to care, and enhance the efficiency of precision oncology. Objective: In this study, we aimed to investigate the efficacy of LLM-generated treatments for MTB patients. We specifically investigate the LLMs' ability to generate evidence-based treatment recommendations using PubMed references. Methods: We built a retrieval augmented generation pipeline using PubMed data. We prompted the resulting LLM to generate treatment recommendations with PubMed references using a test set of patients from an MTB conference at a large comprehensive cancer center at a tertiary care institution. Members of the MTB manually assessed the relevancy and correctness of the generated responses. Results: A total of 75\% of the referenced articles were properly cited from PubMed, while 17\% of the referenced articles were hallucinations, and the remaining were not properly cited from PubMed. Clinician-generated LLM queries achieved higher accuracy through clinician evaluation than automated queries, with clinicians labeling 25\% of LLM responses as equal to their recommendations and 37.5\% as alternative plausible treatments. Conclusions: This study demonstrates how retrieval augmented generation--enhanced LLMs can be a powerful tool in accelerating MTB conferences, as LLMs are sometimes capable of achieving clinician-equal treatment recommendations. However, further investigation is required to achieve stable results with zero hallucinations. 
LLMs signify a scalable solution to the time-intensive process of MTB investigations. However, LLM performance demonstrates that they must be used with heavy clinician supervision, and cannot yet fully automate the MTB pipeline. ", doi="10.2196/64364", url="https://www.jmir.org/2025/1/e64364", url="http://www.ncbi.nlm.nih.gov/pubmed/40053768" } @Article{info:doi/10.2196/57719, author="Serin, Oguzhan and Akbasli, Turkalp Izzet and Cetin, Bocutcu Sena and Koseoglu, Busra and Deveci, Fatih Ahmet and Ugur, Zahid Muhsin and Ozsurekci, Yasemin", title="Predicting Escalation of Care for Childhood Pneumonia Using Machine Learning: Retrospective Analysis and Model Development", journal="JMIRx Med", year="2025", month="Mar", day="4", volume="6", pages="e57719", keywords="childhood pneumonia", keywords="community-acquired pneumonia", keywords="machine learning", keywords="clinical decision support system", keywords="prognostic care decision", abstract="Background: Pneumonia is a leading cause of mortality in children aged <5 years. While machine learning (ML) has been applied to pneumonia diagnostics, few studies have focused on predicting the need for escalation of care in pediatric cases. This study aims to develop an ML-based clinical decision support tool for predicting the need for escalation of care in community-acquired pneumonia cases. Objective: The primary objective was to develop a robust predictive tool to help primary care physicians determine where and how a case should be managed. Methods: Data from 437 children with community-acquired pneumonia, collected before the COVID-19 pandemic, were retrospectively analyzed. Pediatricians encoded key clinical features from unstructured medical records based on Integrated Management of Childhood Illness guidelines. After preprocessing with Synthetic Minority Oversampling Technique--Tomek to handle imbalanced data, feature selection was performed using Shapley additive explanations values. The model was optimized through hyperparameter tuning and ensembling. The primary outcome was the level of care severity, defined as the need for referral to a tertiary care unit for intensive care or respiratory support. Results: A total of 437 cases were analyzed, and the optimized models predicted the need for transfer to a higher level of care with an accuracy of 77\% to 88\%, achieving an area under the receiver operator characteristic curve of 0.88 and an area under the precision-recall curve of 0.96. Shapley additive explanations value analysis identified hypoxia, respiratory distress, age, weight-for-age z score, and complaint duration as the most important clinical predictors independent of laboratory diagnostics. Conclusions: This study demonstrates the feasibility of applying ML techniques to create a prognostic care decision tool for childhood pneumonia. It provides early identification of cases requiring escalation of care by combining foundational clinical skills with data science methods. 
", doi="10.2196/57719", url="https://xmed.jmir.org/2025/1/e57719" } @Article{info:doi/10.2196/66699, author="D{\"u}vel, Andrea Juliane and Lampe, David and Kirchner, Maren and Elkenkamp, Svenja and Cimiano, Philipp and D{\"u}sing, Christoph and Marchi, Hannah and Schmiegel, Sophie and Fuchs, Christiane and Cla{\ss}en, Simon and Meier, Kirsten-Laura and Borgstedt, Rainer and Rehberg, Sebastian and Greiner, Wolfgang", title="An AI-Based Clinical Decision Support System for Antibiotic Therapy in Sepsis (KINBIOTICS): Use Case Analysis", journal="JMIR Hum Factors", year="2025", month="Mar", day="4", volume="12", pages="e66699", keywords="CDSS", keywords="use case analysis", keywords="technology acceptance", keywords="sepsis", keywords="infection", keywords="infectious disease", keywords="antimicrobial resistance", keywords="clinical decision support system", keywords="decision-making", keywords="clinical support", keywords="machine learning", keywords="ML", keywords="artificial intelligence", keywords="AI", keywords="algorithm", keywords="model", keywords="analytics", keywords="predictive models", keywords="deep learning", keywords="early warning", keywords="early detection", abstract="Background: Antimicrobial resistances pose significant challenges in health care systems. Clinical decision support systems (CDSSs) represent a potential strategy for promoting a more targeted and guideline-based use of antibiotics. The integration of artificial intelligence (AI) into these systems has the potential to support physicians in selecting the most effective drug therapy for a given patient. Objective: This study aimed to analyze the feasibility of an AI-based CDSS pilot version for antibiotic therapy in sepsis patients and identify facilitating and inhibiting conditions for its implementation in intensive care medicine. Methods: The evaluation was conducted in 2 steps, using a qualitative methodology. Initially, expert interviews were conducted, in which intensive care physicians were asked to assess the AI-based recommendations for antibiotic therapy in terms of plausibility, layout, and design. Subsequently, focus group interviews were conducted to examine the technology acceptance of the AI-based CDSS. The interviews were anonymized and evaluated using content analysis. Results: In terms of the feasibility, barriers included variability in previous antibiotic administration practices, which affected the predictive ability of AI recommendations, and the increased effort required to justify deviations from these recommendations. Physicians' confidence in accepting or rejecting recommendations depended on their level of professional experience. The ability to re-evaluate CDSS recommendations and an intuitive, user-friendly system design were identified as factors that enhanced acceptance and usability. Overall, barriers included low levels of digitization in clinical practice, limited availability of cross-sectoral data, and negative previous experiences with CDSSs. Conversely, facilitators to CDSS implementation were potential time savings, physicians' openness to adopting new technologies, and positive previous experiences. Conclusions: Early integration of users is beneficial for both the identification of relevant context factors and the further development of an effective CDSS. Overall, the potential of AI-based CDSSs is offset by inhibiting contextual conditions that impede its acceptance and implementation. 
The advancement of AI-based CDSSs and the mitigation of these inhibiting conditions are crucial for the realization of its full potential. ", doi="10.2196/66699", url="https://humanfactors.jmir.org/2025/1/e66699" } @Article{info:doi/10.2196/63740, author="Wu, Peng and Hurst, H. Jillian and French, Alexis and Chrestensen, Michael and Goldstein, A. Benjamin", title="Linking Electronic Health Record Prescribing Data and Pharmacy Dispensing Records to Identify Patient-Level Factors Associated With Psychotropic Medication Receipt: Retrospective Study", journal="JMIR Med Inform", year="2025", month="Mar", day="4", volume="13", pages="e63740", keywords="electronic health records", keywords="pharmacy dispensing", keywords="psychotropic medications", keywords="prescriptions", keywords="predictive modeling", abstract="Background: Pharmacoepidemiology studies using electronic health record (EHR) data typically rely on medication prescriptions to determine which patients have received a medication. However, such data do not affirmatively indicate whether these prescriptions have been filled. External dispensing databases can bridge this information gap; however, few established methods exist for linking EHR data and pharmacy dispensing records. Objective: We described a process for linking EHR prescribing data with pharmacy dispensing records from Surescripts. As a use case, we considered the prescriptions and resulting fills for psychotropic medications among pediatric patients. We evaluated how dispensing information affects identifying patients receiving prescribed medications and assessing the association between filling prescriptions and subsequent health behaviors. Methods: This retrospective study identified all new psychotropic prescriptions to patients younger than 18 years of age at Duke University Health System in 2021. We linked dispensing to prescribing data using proximate dates and matching codes between RxNorm concept unique identifiers and National Drug Codes. We described demographic, clinical, and service use characteristics to assess differences between patients who did versus did not fill prescriptions. We fit a least absolute shrinkage and selection operator (LASSO) regression model to evaluate the predictability of a fill. We then fit time-to-event models to assess the association between whether a patient filled a prescription and a future provider visit. Results: We identified 1254 pediatric patients with a new psychotropic prescription. In total, 976 (77.8\%) patients filled their prescriptions within 30 days of their prescribing encounters. Thus, we set 30 days as a cut point for defining a valid prescription fill. Patients who filled prescriptions differed from those who did not in several key factors. Those who did not fill had slightly higher BMIs, lived in more disadvantaged neighborhoods, were more likely to have public insurance or self-pay, and included a higher proportion of male patients. Patients with prior well-child visits or prescriptions from primary care providers were more likely to fill. Additionally, patients with anxiety diagnoses and those prescribed selective serotonin reuptake inhibitors were more likely to fill prescriptions. The LASSO model achieved an area under the receiver operator characteristic curve of 0.816. The time to the follow-up visit with the same provider was censored at 90 days after the initial encounter. Patients who filled prescriptions showed higher levels of follow-up visits. 
The marginal hazard ratio of a follow-up visit with the same provider was 1.673 (95\% CI 1.463-1.913) for patients who filled their prescriptions. Using the LASSO model as a propensity-based weight, we calculated the weighted hazard ratio as 1.447 (95\% CI 1.257-1.665). Conclusions: Systematic differences existed between patients who did versus did not fill prescriptions. Incorporating external dispensing databases into EHR-based studies informs medication receipt and associated health outcomes. ", doi="10.2196/63740", url="https://medinform.jmir.org/2025/1/e63740" } @Article{info:doi/10.2196/62779, author="Doru, Berin and Maier, Christoph and Busse, Sophie Johanna and L{\"u}cke, Thomas and Sch{\"o}nhoff, Judith and Enax-Krumova, Elena and Hessler, Steffen and Berger, Maria and Tokic, Marianne", title="Detecting Artificial Intelligence--Generated Versus Human-Written Medical Student Essays: Semirandomized Controlled Study", journal="JMIR Med Educ", year="2025", month="Mar", day="3", volume="11", pages="e62779", keywords="artificial intelligence", keywords="ChatGPT", keywords="large language models", keywords="textual analysis", keywords="writing style", keywords="AI", keywords="chatbot", keywords="LLMs", keywords="detection", keywords="authorship", keywords="medical student", keywords="linguistic quality", keywords="decision-making", keywords="logical coherence", abstract="Background: Large language models, exemplified by ChatGPT, have reached a level of sophistication that makes distinguishing between human- and artificial intelligence (AI)--generated texts increasingly challenging. This has raised concerns in academia, particularly in medicine, where the accuracy and authenticity of written work are paramount. Objective: This semirandomized controlled study aims to examine the ability of 2 blinded expert groups with different levels of content familiarity---medical professionals and humanities scholars with expertise in textual analysis---to distinguish between longer scientific texts in German written by medical students and those generated by ChatGPT. Additionally, the study sought to analyze the reasoning behind their identification choices, particularly the role of content familiarity and linguistic features. Methods: Between May and August 2023, a total of 35 experts (medical: n=22; humanities: n=13) were each presented with 2 pairs of texts on different medical topics. Each pair had similar content and structure: 1 text was written by a medical student, and the other was generated by ChatGPT (version 3.5, March 2023). Experts were asked to identify the AI-generated text and justify their choice. These justifications were analyzed through a multistage, interdisciplinary qualitative analysis to identify relevant textual features. Before unblinding, experts rated each text on 6 characteristics: linguistic fluency and spelling/grammatical accuracy, scientific quality, logical coherence, expression of knowledge limitations, formulation of future research questions, and citation quality. Univariate tests and multivariate logistic regression analyses were used to examine associations between participants' characteristics, their stated reasons for author identification, and the likelihood of correctly determining a text's authorship. Results: Overall, in 48 out of 69 (70\%) decision rounds, participants accurately identified the AI-generated texts, with minimal difference between groups (medical: 31/43, 72\%; humanities: 17/26, 65\%; odds ratio [OR] 1.37, 95\% CI 0.5-3.9). 
While content errors had little impact on identification accuracy, stylistic features---particularly redundancy (OR 6.90, 95\% CI 1.01-47.1), repetition (OR 8.05, 95\% CI 1.25-51.7), and thread/coherence (OR 6.62, 95\% CI 1.25-35.2)---played a crucial role in participants' decisions to identify a text as AI-generated. Conclusions: The findings suggest that both medical and humanities experts were able to identify ChatGPT-generated texts in medical contexts, with their decisions largely based on linguistic attributes. The accuracy of identification appears to be independent of experts' familiarity with the text content. As the decision-making process primarily relies on linguistic attributes---such as stylistic features and text coherence---further quasi-experimental studies using texts from other academic disciplines should be conducted to determine whether instructions based on these features can enhance lecturers' ability to distinguish between student-authored and AI-generated work. ", doi="10.2196/62779", url="https://mededu.jmir.org/2025/1/e62779", url="http://www.ncbi.nlm.nih.gov/pubmed/40053752" } @Article{info:doi/10.2196/56254, author="Wu, Jia and Zeng, Youjia and Yang, Jun and Yao, Yutong and Xu, Xiuling and Song, Gaofeng and Yi, Wuyong and Wang, Taifen and Zheng, Yihou and Jia, Zhongwei and Yan, Xiangyu", title="Daily Treatment Monitoring for Patients Receiving Home-Based Peritoneal Dialysis and Prediction of Heart Failure Risk: mHealth Tool Development and Modeling Study", journal="JMIR Form Res", year="2025", month="Mar", day="3", volume="9", pages="e56254", keywords="peritoneal dialysis", keywords="mHealth", keywords="patient management", keywords="heart failure", keywords="prediction model", abstract="Background: Peritoneal dialysis is one of the major renal replacement modalities for patients with end-stage renal disease. Heart failure is a common adverse event among patients who undergo peritoneal dialysis treatment, especially for those who undergo continuous ambulatory peritoneal dialysis at home, because of the lack of professional input-output volume monitoring and management during treatment. Objective: This study aims to develop novel mobile health (mHealth) tools to improve the quality of home-based continuous ambulatory peritoneal dialysis treatment and to build a prediction model of heart failure based on the system's daily treatment monitoring data. Methods: The mHealth tools with a 4-layer system were designed and developed using Spring Boot, MyBatis Plus, MySQL, and Redis as backend technology stack, and Vue, Element User Interface, and WeChat Mini Program as front-end technology stack. Patients were recruited to use the tool during daily peritoneal dialysis treatment from January 1, 2017, to April 20, 2023. Logistic regression models based on real-time treatment monitoring data were used for heart failure prediction. The sensitivity, specificity, accuracy, and Youden index were calculated to evaluate the performance of the prediction model. In the sensitivity analysis, the ratio of patients with and without heart failure was set to 1:4 and 1:10, respectively, to better evaluate the stability of the prediction model. Results: A WeChat Mini Program named Futou Bao for patients and a patient data management platform for doctors was developed. Futou Bao included an intelligent data upload function module and an auxiliary function module. 
The doctor's data management platform consisted of 4 function modules, that is, patient management, data visualization and marking, data statistics, and system management. During the study period, the records of 6635 patients who received peritoneal dialysis treatment were uploaded in Futou Bao, with 0.71\% (47/6635) of them experiencing heart failure. The prediction model that included sex, age, and diastolic blood pressure was considered as the optimal model, wherein the sensitivity, specificity, accuracy, and Youden index were 0.75, 0.91, 0.89, and 0.66, respectively, with an area under the curve value of 0.879 (95\% CI 0.772-0.986) using the validation dataset. The sensitivity analysis showed stable results. Conclusions: This study provides a new home-based peritoneal dialysis management paradigm that enables the daily monitoring and early warning of heart failure risk. This novel paradigm is of great value for improving the efficiency, security, and personalization of peritoneal dialysis. ", doi="10.2196/56254", url="https://formative.jmir.org/2025/1/e56254", url="http://www.ncbi.nlm.nih.gov/pubmed/40053710" } @Article{info:doi/10.2196/68354, author="Huang, Pinjie and Yang, Jirong and Zhao, Dizhou and Ran, Taojia and Luo, Yuheng and Yang, Dong and Zheng, Xueqin and Zhou, Shaoli and Chen, Chaojin", title="Machine Learning--Based Prediction of Early Complications Following Surgery for Intestinal Obstruction: Multicenter Retrospective Study", journal="J Med Internet Res", year="2025", month="Mar", day="3", volume="27", pages="e68354", keywords="postoperative complications", keywords="intestinal obstruction", keywords="machine learning", keywords="early intervention", keywords="risk calculator", keywords="prediction model", keywords="Shapley additive explanations", abstract="Background: Early complications increase in-hospital stay and mortality after intestinal obstruction surgery. It is important to identify the risk of postoperative early complications for patients with intestinal obstruction at a sufficiently early stage, which would allow preemptive individualized enhanced therapy to be conducted to improve the prognosis of patients with intestinal obstruction. A risk predictive model based on machine learning is helpful for early diagnosis and timely intervention. Objective: This study aimed to construct an online risk calculator for early postoperative complications in patients after intestinal obstruction surgery based on machine learning algorithms. Methods: A total of 396 patients undergoing intestinal obstruction surgery from April 2013 to April 2021 at an independent medical center were enrolled as the training cohort. Overall, 7 machine learning methods were used to establish prediction models, with their performance appraised via the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, and F1-score. The best model was validated through 2 independent medical centers, a publicly available perioperative dataset the Informative Surgical Patient dataset for Innovative Research Environment (INSPIRE), and a mixed cohort consisting of the above 3 datasets, involving 50, 66, 48, and 164 cases, respectively. Shapley Additive Explanations were measured to identify risk factors. Results: The incidence of postoperative complications in the training cohort was 47.44\% (176/371), while the incidences in 4 external validation cohorts were 34\% (17/50), 56.06\% (37/66), 52.08\% (25/48), and 48.17\% (79/164), respectively. 
Postoperative complications were associated with 8-item features: Physiological Severity Score for the Enumeration of Mortality and Morbidity (POSSUM physiological score), the amount of colloid infusion, shock index before anesthesia induction, ASA (American Society of Anesthesiologists) classification, the percentage of neutrophils, shock index at the end of surgery, age, and total protein. The random forest model showed the best overall performance, with an AUROC of 0.788 (95\% CI 0.709-0.869), accuracy of 0.756, sensitivity of 0.695, specificity of 0.810, and F1-score of 0.727 in the training cohort. The random forest model also achieved a comparable AUROC of 0.755 (95\% CI 0.652-0.839) in validation cohort 1, a greater AUROC of 0.817 (95\% CI 0.695-0.913) in validation cohort 2, a similar AUROC of 0.786 (95\% CI 0.628-0.902) in validation cohort 3, and the comparable AUROC of 0.720 (95\% CI 0.671-0.768) in validation cohort 4. We visualized the random forest model and created a web-based online risk calculator. Conclusions: We have developed and validated a generalizable random forest model to predict postoperative early complications in patients undergoing intestinal obstruction surgery, enabling clinicians to screen high-risk patients and implement early individualized interventions. An online risk calculator for early postoperative complications was developed to make the random forest model accessible to clinicians around the world. ", doi="10.2196/68354", url="https://www.jmir.org/2025/1/e68354", url="http://www.ncbi.nlm.nih.gov/pubmed/40053794" } @Article{info:doi/10.2196/64611, author="Byrom, Bill and Everhart, Anthony and Cordero, Paul and Garratt, Chris and Meyer, Tim", title="Leveraging Patient-Reported Outcome Measures for Optimal Dose Selection in Early Phase Cancer Trials", journal="JMIR Cancer", year="2025", month="Feb", day="28", volume="11", pages="e64611", keywords="clinical trials", keywords="early phase", keywords="dose finding", keywords="patient-reported outcome", keywords="PRO", keywords="electronic patient-reported outcome", keywords="ePRO", keywords="PRO-CTCAE", keywords="adverse events", keywords="tolerability", keywords="optimal dose", keywords="cancer trials", keywords="dose toxicity", keywords="oncology", keywords="drug development", keywords="electronic collection", keywords="dose level", keywords="pharmacodynamic", keywords="cytotoxic chemotherapy drugs", keywords="cytotoxic", keywords="chemotherapy drug", keywords="life-threatening disease", keywords="Common Terminology Criteria for Adverse Events", doi="10.2196/64611", url="https://cancer.jmir.org/2025/1/e64611" } @Article{info:doi/10.2196/67576, author="Lu, Shao-Chi and Chen, Guang-Yuan and Liu, An-Sheng and Sun, Jen-Tang and Gao, Jun-Wan and Huang, Chien-Hua and Tsai, Chu-Lin and Fu, Li-Chen", title="Deep Learning--Based Electrocardiogram Model (EIANet) to Predict Emergency Department Cardiac Arrest: Development and External Validation Study", journal="J Med Internet Res", year="2025", month="Feb", day="28", volume="27", pages="e67576", keywords="cardiac arrest", keywords="emergency department", keywords="deep learning", keywords="computer vision", keywords="electrocardiogram", abstract="Background: In-hospital cardiac arrest (IHCA) is a severe and sudden medical emergency that is characterized by the abrupt cessation of circulatory function, leading to death or irreversible organ damage if not addressed immediately. 
Emergency department (ED)--based IHCA (EDCA) accounts for 10\% to 20\% of all IHCA cases. Early detection of EDCA is crucial, yet identifying subtle signs of cardiac deterioration is challenging. Traditional EDCA prediction methods primarily rely on structured vital signs or electrocardiogram (ECG) signals, which require additional preprocessing or specialized devices. This study introduces a novel approach using image-based 12-lead ECG data obtained at ED triage, leveraging the inherent richness of visual ECG patterns to enhance prediction and integration into clinical workflows. Objective: This study aims to address the challenge of early detection of EDCA by developing an innovative deep learning model, the ECG-Image-Aware Network (EIANet), which uses 12-lead ECG images for early prediction of EDCA. By focusing on readily available triage ECG images, this research seeks to create a practical and accessible solution that seamlessly integrates into real-world ED workflows. Methods: For adult patients with EDCA (cases), 12-lead ECG images at ED triage were obtained from 2 independent data sets: National Taiwan University Hospital (NTUH) and Far Eastern Memorial Hospital (FEMH). Control ECGs were randomly selected from adult ED patients without cardiac arrest during the same study period. In EIANet, ECG images were first converted to binary form, followed by noise reduction, connected component analysis, and morphological opening. A spatial attention module was incorporated into the ResNet50 architecture to enhance feature extraction, and a custom binary recall loss (BRLoss) was used to balance precision and recall, addressing slight data set imbalance. The model was developed and internally validated on the NTUH-ECG data set and was externally validated on an independent FEMH-ECG data set. The model performance was evaluated using the F1-score, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). Results: There were 571 case ECGs and 826 control ECGs in the NTUH data set and 378 case ECGs and 713 control ECGs in the FEMH data set. The novel EIANet model achieved an F1-score of 0.805, AUROC of 0.896, and AUPRC of 0.842 on the NTUH-ECG data set with a 40\% positive sample ratio. It achieved an F1-score of 0.650, AUROC of 0.803, and AUPRC of 0.678 on the FEMH-ECG data set with a 34.6\% positive sample ratio. The feature map showed that the region of interest in the ECG was the ST segment. Conclusions: EIANet demonstrates promising potential for accurately predicting EDCA using triage ECG images, offering an effective solution for early detection of high-risk cases in emergency settings. This approach may enhance the ability of health care professionals to make timely decisions, with the potential to improve patient outcomes by enabling earlier interventions for EDCA. 
", doi="10.2196/67576", url="https://www.jmir.org/2025/1/e67576", url="http://www.ncbi.nlm.nih.gov/pubmed/40053733" } @Article{info:doi/10.2196/53892, author="Cabral, Pereira Bernardo and Braga, Maciel Luiza Amara and Conte Filho, Gilbert Carlos and Penteado, Bruno and Freire de Castro Silva, Luis Sandro and Castro, Leonardo and Fornazin, Marcelo and Mota, Fabio", title="Future Use of AI in Diagnostic Medicine: 2-Wave Cross-Sectional Survey Study", journal="J Med Internet Res", year="2025", month="Feb", day="27", volume="27", pages="e53892", keywords="artificial intelligence", keywords="AI", keywords="diagnostic medicine", keywords="survey research", keywords="researcher opinion", keywords="future", abstract="Background: The rapid evolution of artificial intelligence (AI) presents transformative potential for diagnostic medicine, offering opportunities to enhance diagnostic accuracy, reduce costs, and improve patient outcomes. Objective: This study aimed to assess the expected future impact of AI on diagnostic medicine by comparing global researchers' expectations using 2 cross-sectional surveys. Methods: The surveys were conducted in September 2020 and February 2023. Each survey captured a 10-year projection horizon, gathering insights from >3700 researchers with expertise in AI and diagnostic medicine from all over the world. The survey sought to understand the perceived benefits, integration challenges, and evolving attitudes toward AI use in diagnostic settings. Results: Results indicated a strong expectation among researchers that AI will substantially influence diagnostic medicine within the next decade. Key anticipated benefits include enhanced diagnostic reliability, reduced screening costs, improved patient care, and decreased physician workload, addressing the growing demand for diagnostic services outpacing the supply of medical professionals. Specifically, x-ray diagnosis, heart rhythm interpretation, and skin malignancy detection were identified as the diagnostic tools most likely to be integrated with AI technologies due to their maturity and existing AI applications. The surveys highlighted the growing optimism regarding AI's ability to transform traditional diagnostic pathways and enhance clinical decision-making processes. Furthermore, the study identified barriers to the integration of AI in diagnostic medicine. The primary challenges cited were the difficulties of embedding AI within existing clinical workflows, ethical and regulatory concerns, and data privacy issues. Respondents emphasized uncertainties around legal responsibility and accountability for AI-supported clinical decisions, data protection challenges, and the need for robust regulatory frameworks to ensure safe AI deployment. Ethical concerns, particularly those related to algorithmic transparency and bias, were noted as increasingly critical, reflecting a heightened awareness of the potential risks associated with AI adoption in clinical settings. Differences between the 2 survey waves indicated a growing focus on ethical and regulatory issues, suggesting an evolving recognition of these challenges over time. Conclusions: Despite these barriers, there was notable consistency in researchers' expectations across the 2 survey periods, indicating a stable and sustained outlook on AI's transformative potential in diagnostic medicine. 
The findings show the need for interdisciplinary collaboration among clinicians, AI developers, and regulators to address ethical and practical challenges while maximizing AI's benefits. This study offers insights into the projected trajectory of AI in diagnostic medicine, guiding stakeholders, including health care providers, policy makers, and technology developers, on navigating the opportunities and challenges of AI integration. ", doi="10.2196/53892", url="https://www.jmir.org/2025/1/e53892", url="http://www.ncbi.nlm.nih.gov/pubmed/40053779" } @Article{info:doi/10.2196/63601, author="Dai, Pei-Yu and Lin, Pei-Yi and Sheu, Ruey-Kai and Liu, Shu-Fang and Wu, Yu-Cheng and Wu, Chieh-Liang and Chen, Wei-Lin and Huang, Chien-Chung and Lin, Guan-Yin and Chen, Lun-Chi", title="Predicting Agitation-Sedation Levels in Intensive Care Unit Patients: Development of an Ensemble Model", journal="JMIR Med Inform", year="2025", month="Feb", day="26", volume="13", pages="e63601", keywords="intensive care units", keywords="ICU", keywords="agitation", keywords="sedation", keywords="ensemble learning", keywords="machine learning", keywords="ML", keywords="artificial intelligence", keywords="AI", keywords="patient safety", keywords="efficiency", keywords="automation", keywords="ICU care", keywords="ensemble model", keywords="learning model", keywords="explanatory analysis", abstract="Background: Agitation and sedation management is critical in intensive care as it affects patient safety. Traditional nursing assessments suffer from low frequency and subjectivity. Automating these assessments can boost intensive care unit (ICU) efficiency, treatment capacity, and patient safety. Objectives: The aim of this study was to develop a machine-learning based assessment of agitation and sedation. Methods: Using data from the Taichung Veterans General Hospital ICU database (2020), an ensemble learning model was developed for classifying the levels of agitation and sedation. Different ensemble learning model sequences were compared. In addition, an interpretable artificial intelligence approach, SHAP (Shapley additive explanations), was employed for explanatory analysis. Results: With 20 features and 121,303 data points, the random forest model achieved high area under the curve values across all models (sedation classification: 0.97; agitation classification: 0.88). The ensemble learning model enhanced agitation sensitivity (0.82) while maintaining high AUC values across all categories (all >0.82). The model explanations aligned with clinical experience. Conclusions: This study proposes an ICU agitation-sedation assessment automation using machine learning, enhancing efficiency and safety. Ensemble learning improves agitation sensitivity while maintaining accuracy. Real-time monitoring and future digital integration have the potential for advancements in intensive care. 
", doi="10.2196/63601", url="https://medinform.jmir.org/2025/1/e63601" } @Article{info:doi/10.2196/67010, author="Song, Xiaowei and Wang, Jiayi and He, Feifei and Yin, Wei and Ma, Weizhi and Wu, Jian", title="Stroke Diagnosis and Prediction Tool Using ChatGLM: Development and Validation Study", journal="J Med Internet Res", year="2025", month="Feb", day="26", volume="27", pages="e67010", keywords="stroke", keywords="diagnosis", keywords="large language model", keywords="ChatGLM", keywords="generative language model", keywords="primary care", keywords="acute stroke", keywords="prediction tool", keywords="stroke detection", keywords="treatment", keywords="electronic health records", keywords="noncontrast computed tomography", abstract="Background: Stroke is a globally prevalent disease that imposes a significant burden on health care systems and national economies. Accurate and rapid stroke diagnosis can substantially increase reperfusion rates, mitigate disability, and reduce mortality. However, there are considerable discrepancies in the diagnosis and treatment of acute stroke. Objective: The aim of this study is to develop and validate a stroke diagnosis and prediction tool using ChatGLM-6B, which uses free-text information from electronic health records in conjunction with noncontrast computed tomography (NCCT) reports to enhance stroke detection and treatment. Methods: A large language model (LLM) using ChatGLM-6B was proposed to facilitate stroke diagnosis by identifying optimal input combinations, using external tools, and applying instruction tuning and low-rank adaptation (LoRA) techniques. A dataset containing details of 1885 patients with and those without stroke from 2016 to 2024 was used for training and internal validation; another 335 patients from two hospitals were used as an external test set, including 230 patients from the training hospital but admitted at different periods, and 105 patients from another hospital. Results: The LLM, which is based on clinical notes and NCCT, demonstrates exceptionally high accuracy in stroke diagnosis, achieving 99\% in the internal validation dataset and 95.5\% and 79.1\% in two external test cohorts. It effectively distinguishes between ischemia and hemorrhage, with an accuracy of 100\% in the validation dataset and 99.1\% and 97.1\% in the other test cohorts. In addition, it identifies large vessel occlusions (LVO) with an accuracy of 80\% in the validation dataset and 88.6\% and 83.3\% in the other test cohorts. Furthermore, it screens patients eligible for intravenous thrombolysis (IVT) with an accuracy of 89.4\% in the validation dataset and 60\% and 80\% in the other test cohorts. Conclusions: We developed an LLM that leverages clinical text and NCCT to identify strokes and guide recanalization therapy. While our results necessitate validation through widespread deployment, they hold the potential to enhance stroke identification and reduce reperfusion time. 
", doi="10.2196/67010", url="https://www.jmir.org/2025/1/e67010" } @Article{info:doi/10.2196/55492, author="Campagner, Andrea and Agnello, Luisa and Carobene, Anna and Padoan, Andrea and Del Ben, Fabio and Locatelli, Massimo and Plebani, Mario and Ognibene, Agostino and Lorubbio, Maria and De Vecchi, Elena and Cortegiani, Andrea and Piva, Elisa and Poz, Donatella and Curcio, Francesco and Cabitza, Federico and Ciaccio, Marcello", title="Complete Blood Count and Monocyte Distribution Width--Based Machine Learning Algorithms for Sepsis Detection: Multicentric Development and External Validation Study", journal="J Med Internet Res", year="2025", month="Feb", day="26", volume="27", pages="e55492", keywords="sepsis", keywords="medical machine learning", keywords="external validation", keywords="complete blood count", keywords="controllable AI", keywords="machine learning", keywords="artificial intelligence", keywords="development study", keywords="validation study", keywords="organ", keywords="organ dysfunction", keywords="detection", keywords="clinical signs", keywords="clinical symptoms", keywords="biomarker", keywords="diagnostic", keywords="machine learning model", keywords="sepsis detection", keywords="early detection", keywords="data distribution", abstract="Background: Sepsis is an organ dysfunction caused by a dysregulated host response to infection. Early detection is fundamental to improving the patient outcome. Laboratory medicine can play a crucial role by providing biomarkers whose alteration can be detected before the onset of clinical signs and symptoms. In particular, the relevance of monocyte distribution width (MDW) as a sepsis biomarker has emerged in the previous decade. However, despite encouraging results, MDW has poor sensitivity and positive predictive value when compared to other biomarkers. Objective: This study aims to investigate the use of machine learning (ML) to overcome the limitations mentioned earlier by combining different parameters and therefore improving sepsis detection. However, making ML models function in clinical practice may be problematic, as their performance may suffer when deployed in contexts other than the research environment. In fact, even widely used commercially available models have been demonstrated to generalize poorly in out-of-distribution scenarios. Methods: In this multicentric study, we developed ML models whose intended use is the early detection of sepsis on the basis of MDW and complete blood count parameters. In total, data from 6 patient cohorts (encompassing 5344 patients) collected at 5 different Italian hospitals were used to train and externally validate ML models. The models were trained on a patient cohort encompassing patients enrolled at the emergency department, and it was externally validated on 5 different cohorts encompassing patients enrolled at both the emergency department and the intensive care unit. The cohorts were selected to exhibit a variety of data distribution shifts compared to the training set, including label, covariate, and missing data shifts, enabling a conservative validation of the developed models. 
To improve generalizability and robustness to different types of distribution shifts, the developed ML models combine traditional methodologies with advanced techniques inspired by controllable artificial intelligence (AI), namely cautious classification, which gives the ML models the ability to abstain from making predictions, and explainable AI, which provides health operators with useful information about the models' functioning. Results: The developed models achieved good performance on the internal validation (area under the receiver operating characteristic curve between 0.91 and 0.98), as well as consistent generalization performance across the external validation datasets (area under the receiver operating characteristic curve between 0.75 and 0.95), outperforming baseline biomarkers and state-of-the-art ML models for sepsis detection. Controllable AI techniques were further able to improve performance and were used to derive an interpretable set of diagnostic rules. Conclusions: Our findings demonstrate how controllable AI approaches based on complete blood count and MDW may be used for the early detection of sepsis while also demonstrating how the proposed methodology can be used to develop ML models that are more resistant to different types of data distribution shifts. ", doi="10.2196/55492", url="https://www.jmir.org/2025/1/e55492" } @Article{info:doi/10.2196/52358, author="Ejaz, Hamza and Tsui, Keith Hon Lung and Patel, Mehul and Ulloa Paredes, Rafael Luis and Knights, Ellen and Aftab, Bakht Shah and Subbe, Peter Christian", title="Comparison of a Novel Machine Learning--Based Clinical Query Platform With Traditional Guideline Searches for Hospital Emergencies: Prospective Pilot Study of User Experience and Time Efficiency", journal="JMIR Hum Factors", year="2025", month="Feb", day="25", volume="12", pages="e52358", keywords="artificial intelligence", keywords="machine learning", keywords="information search", keywords="emergency care", keywords="developing", keywords="testing", keywords="information retrieval", keywords="hospital care", keywords="training", keywords="clinical practice", keywords="clinical experience", keywords="user satisfaction", keywords="clinical impact", keywords="user group", keywords="users", keywords="study design", keywords="mobile phone", abstract="Background: Emergency and acute medicine doctors require easily accessible evidence-based information to safely manage a wide range of clinical presentations. The inability to find evidence-based local guidelines on the trust's intranet leads to information retrieval from the World Wide Web. Artificial intelligence (AI) has the potential to make evidence-based information retrieval faster and easier. Objective: The aim of the study is to conduct a time-motion analysis, comparing cohorts of junior doctors using (1) an AI-supported search engine versus (2) the traditional hospital intranet. The study also aims to examine the impact of the AI-supported search engine on the duration of searches and workflow when seeking answers to clinical queries at the point of care. Methods: This pre- and postobservational study was conducted in 2 phases. In the first phase, clinical information searches by 10 doctors caring for acutely unwell patients in acute medicine were observed during 10 working days. Based on these findings and input from a focus group of 14 clinicians, an AI-supported, context-sensitive search engine was implemented. 
In the second phase, clinical practice was observed for 10 doctors for an additional 10 working days using the new search engine. Results: The hospital intranet group (n=10) had a median of 23 months of clinical experience, while the AI-supported search engine group (n=10) had a median of 54 months. Participants using the AI-supported engine conducted fewer searches. User satisfaction and query resolution rates were similar between the 2 phases. Searches with the AI-supported engine took 43 seconds longer on average. Clinicians rated the new app with a favorable Net Promoter Score of 20. Conclusions: We report a successful feasibility pilot of an AI-driven search engine for clinical guidelines. Further development of the engine including the incorporation of large language models might improve accuracy and speed. More research is required to establish clinical impact in different user groups. Focusing on new staff at beginning of their post might be the most suitable study design. ", doi="10.2196/52358", url="https://humanfactors.jmir.org/2025/1/e52358" } @Article{info:doi/10.2196/63763, author="Duguay, V{\'e}ronique and Comeau, Dominique and Turgeon, Tiffany and Bouhamdani, Nadia and Belanger, Mathieu and Weston, Lyle and Johnson, Tammy and Manzer, Nicole and Giberson, Melissa and Chamard-Witkowski, Ludivine", title="Evaluating the Knowledge and Information-Seeking Behaviors of People Living With Multiple Sclerosis: Cross-Sectional Questionnaire Study", journal="J Med Internet Res", year="2025", month="Feb", day="25", volume="27", pages="e63763", keywords="multiple sclerosis", keywords="chronic illness", keywords="misinformation", keywords="web-based searches", keywords="education", keywords="health information", keywords="social media", keywords="health literacy", keywords="patient-doctor relationship", keywords="health-related information", keywords="information-seeking behavior", abstract="Background: The internet has emerged as a primary source of health-related information for people living with multiple sclerosis (MS). However, given the abundance of misinformation found on the web, this behavior may pose a significant threat to internet users. Objective: This study aims to explore the knowledge and information-seeking behavior of people living with MS followed at a specialized MS clinic where education is a cornerstone of care. Methods: This cross-sectional survey--based study comprised 20 true or false statements, covering both scientific facts and popular misinformation about MS treatments. A ``scientific fact score'' and a ``misinformation score'' were calculated by attributing a scoring system to each point in the survey: +1 point was attributed to correct answers, --1 point was attributed to incorrect answers, and 0 point was attributed to ``I don't know.'' Furthermore, the survey inquired about participants' health-seeking behaviors. Results: The mean age of the 69 participants was 48.4 (SD 10.9) years, 78\% (54/69) were female, 81\% (56/69) were highly educated, 90\% (62/69) were receiving a disease-modifying therapy, and 52\% (30/58) had experimented with alternative therapies. The mean score for answering the scientific and misinformation questions correctly was 69\% (SD 2.4\%) and 22\% (SD 4.5\%), respectively (P<.001). Notably, when questioned about misinformation, answering correctly dropped significantly (P<.001), while indecision (P<.001) and answering incorrectly (P=.02) increased. 
Sociodemographic factors and medical questions were not significantly associated with scientific and misinformation scores (all P>.05); however, misinformation scores did significantly correlate with levels of education (P=.04). The main sources of health-related information were from expert-led MS websites (48/58, 82\%) and health care professionals (34/58, 59\%). Low-reliability sources were less used; however, word of mouth seemed to be prevalent (14/58, 24\%), followed by Facebook (10/58, 17\%). On average, people with MS reported having consulted 3 high- to moderate-quality sources and only 1 low-quality source. Conclusions: Education at the clinic and consulting primarily moderate- to high-quality sources did not safeguard against misinformation, indicating a need for more misinformation-geared education at the clinic. Notably, there is a need to proactively educate patients about misinformation commonly found on the web, and more importantly, create space for them to discuss the information without prejudice. As novel educational methods may be relatively more time-consuming, implementing change may be challenging. Furthermore, age, sex, education level, and health literacy might not safeguard against misinformation. Herein, we were unable to identify correlations associated with scores obtained on the questionnaire other than educational level. Although the educational level did seem to impact the misinformation score, this did not stop participants from experimenting with alternative therapies. Although studies are exploring novel ways to effectively deal with health misinformation on the web, more research is needed to fully understand this highly complex social phenomenon. ", doi="10.2196/63763", url="https://www.jmir.org/2025/1/e63763", url="http://www.ncbi.nlm.nih.gov/pubmed/39998866" } @Article{info:doi/10.2196/65565, author="Owoyemi, Ayomide and Osuchukwu, Joanne and Salwei, E. Megan and Boyd, Andrew", title="Checklist Approach to Developing and Implementing AI in Clinical Settings: Instrument Development Study", journal="JMIRx Med", year="2025", month="Feb", day="20", volume="6", pages="e65565", keywords="artificial intelligence", keywords="machine learning", keywords="algorithm", keywords="model", keywords="analytics", keywords="AI deployment", keywords="human-AI interaction", keywords="AI integration", keywords="checklist", keywords="clinical workflow", keywords="clinical setting", keywords="literature review", abstract="Background: The integration of artificial intelligence (AI) in health care settings demands a nuanced approach that considers both technical performance and sociotechnical factors. Objective: This study aimed to develop a checklist that addresses the sociotechnical aspects of AI deployment in health care and provides a structured, holistic guide for teams involved in the life cycle of AI systems. Methods: A literature synthesis identified 20 relevant studies, forming the foundation for the Clinical AI Sociotechnical Framework checklist. A modified Delphi study was then conducted with 35 global health care professionals. Participants assessed the checklist's relevance across 4 stages: ``Planning,'' ``Design,'' ``Development,'' and ``Proposed Implementation.'' A consensus threshold of 80\% was established for each item. IQRs and Cronbach $\alpha$ were calculated to assess agreement and reliability. Results: The initial checklist had 45 questions. 
Following participant feedback, the checklist was refined to 34 items, and a final round saw 100\% consensus on all items (mean score >0.8, IQR 0). Based on the outcome of the Delphi study, a final checklist was outlined, with 1 more question added to make 35 questions in total. Conclusions: The Clinical AI Sociotechnical Framework checklist provides a comprehensive, structured approach to developing and implementing AI in clinical settings, addressing technical and social factors critical for adoption and success. This checklist is a practical tool that aligns AI development with real-world clinical needs, aiming to enhance patient outcomes and integrate smoothly into health care workflows. ", doi="10.2196/65565", url="https://xmed.jmir.org/2025/1/e65565" } @Article{info:doi/10.2196/69544, author="Ringeval, Micka{\"e}l and Etindele Sosso, Armel Faustin and Cousineau, Martin and Par{\'e}, Guy", title="Advancing Health Care With Digital Twins: Meta-Review of Applications and Implementation Challenges", journal="J Med Internet Res", year="2025", month="Feb", day="19", volume="27", pages="e69544", keywords="digital twins", keywords="meta-review", keywords="health IT", keywords="applications", keywords="challenges", keywords="healthcare innovation", keywords="personalized medicine", keywords="operational efficiency", abstract="Background: Digital twins (DTs) are digital representations of real-world systems, enabling advanced simulations, predictive modeling, and real-time optimization in various fields, including health care. Despite growing interest, the integration of DTs in health care faces challenges such as fragmented applications, ethical concerns, and barriers to adoption. Objective: This study systematically reviews the existing literature on DT applications in health care with three objectives: (1) to map primary applications, (2) to identify key challenges and limitations, and (3) to highlight gaps that can guide future research. Methods: A meta-review was conducted in a systematic fashion, adhering to PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines, and included 25 literature reviews published between 2021 and 2024. The search encompassed 5 databases: PubMed, CINAHL, Web of Science, Embase, and PsycINFO. Thematic synthesis was used to categorize DT applications, stakeholders, and barriers to adoption. Results: A total of 3 primary DT applications in health care were identified: personalized medicine, operational efficiency, and medical research. While current applications, such as predictive diagnostics, patient-specific treatment simulations, and hospital resource optimization, remain in their early stages of development, they highlight the significant potential of DTs. Challenges include data quality, ethical issues, and socioeconomic barriers. This review also identified gaps in scalability, interoperability, and clinical validation. Conclusions: DTs hold transformative potential in health care, providing individualized care, operational optimization, and accelerated research. However, their adoption is hindered by technical, ethical, and financial barriers. Addressing these issues requires interdisciplinary collaboration, standardized protocols, and inclusive implementation strategies to ensure equitable access and meaningful impact. 
", doi="10.2196/69544", url="https://www.jmir.org/2025/1/e69544", url="http://www.ncbi.nlm.nih.gov/pubmed/39969978" } @Article{info:doi/10.2196/65473, author="Johansson, I. Birgitta and Landahl, Jonas and Tammelin, Karin and Aerts, Erik and Lundberg, E. Christina and Adiels, Martin and Lindgren, Martin and Rosengren, Annika and Papachrysos, Nikolaos and Filipsson Nystr{\"o}m, Helena and Sj{\"o}land, Helen", title="Automated Process for Monitoring of Amiodarone Treatment: Development and Evaluation", journal="J Med Internet Res", year="2025", month="Feb", day="19", volume="27", pages="e65473", keywords="thyroid function", keywords="robotics", keywords="follow-up studies", keywords="disease management", keywords="decision support", keywords="automated process", keywords="monitoring", keywords="amiodarone treatment", keywords="anti-arrhythmic medication", keywords="anti-arrhythmic", keywords="development", keywords="evaluation", keywords="thyroid", keywords="liver", keywords="side effects", keywords="cardiac dysrhythmias", keywords="ventricular tachycardia", keywords="ventricular fibrillation", keywords="arrhythmia", keywords="automation", keywords="robot", keywords="algorithm", keywords="clinical decision support system", keywords="thyroid gland", keywords="heart", keywords="atrial fibrillation", abstract="Background: Amiodarone treatment requires repeated laboratory evaluations of thyroid and liver function due to potential side effects. Robotic process automation uses software robots to automate repetitive and routine tasks, and their use may be extended to clinical settings. Objective: Thus, this study aimed to develop a robot using a diagnostic classification algorithm to automate repetitive laboratory evaluations for amiodarone follow-up. Methods: We designed a robot and clinical decision support system based on expert clinical advice and current best practices in thyroid and liver disease management. The robot provided recommendations on the time interval to follow-up laboratory testing and management suggestions, while the final decision rested with a physician, acting as a human-in-the-loop. The performance of the robot was compared to the existing real-world manual follow-up routine for amiodarone treatment. Results: Following iterative technical improvements, a robot prototype was validated against physician orders (n=390 paired orders). The robot recommended a mean follow-up time interval of 4.5 (SD 2.4) months compared to the 3.1 (SD 1.4) months ordered by physicians (P<.001). For normal laboratory values, the robot recommended a 6-month follow-up in 281 (72.1\%) of cases, whereas physicians did so in only 38 (9.7\%) of cases, favoring a 3- to 4-month follow-up (n=227, 58.2\%). All patients diagnosed with new side effects (n=12) were correctly detected by the robot, whereas only 8 were by the physician. Conclusions: An automated process, using a software robot and a diagnostic classification algorithm, is a technically and medically reliable alternative for amiodarone follow-up. It may reduce manual labor, decrease the frequency of laboratory testing, and improve the detection of side effects, thereby reducing costs and enhancing patient value. ", doi="10.2196/65473", url="https://www.jmir.org/2025/1/e65473" } @Article{info:doi/10.2196/55316, author="Dauber-Decker, L. 
Katherine and Feldstein, David and Hess, Rachel and Mann, Devin and Kim, Ji Eun and Gautam-Goyal, Pranisha and Solomon, Jeffrey and Khan, Sundas and Malik, Fatima and Xu, Lynn and Huffman, Ainsley and Smith, D. Paul and Halm, Wendy and Yuroff, Alice and Richardson, Safiya", title="Snowball Group Usability Testing for Rapid and Iterative Multisite Tool Development: Method Development Study", journal="JMIR Form Res", year="2025", month="Feb", day="18", volume="9", pages="e55316", keywords="clinical decision support", keywords="CDS", keywords="decision aid", keywords="clinical aid", keywords="cough", keywords="sore throat", keywords="strep pharyngitis", keywords="snowball group usability testing", keywords="snowball group", keywords="usability testing", abstract="Background: Usability testing is valuable for assessing a new tool or system's usefulness and ease-of-use. Several established methods of usability testing exist, including think-aloud testing. Although usability testing has been shown to be crucial for successful clinical decision support (CDS) tool development, it is often difficult to conduct across multisite development projects due to its time- and labor-intensiveness, cost, and the skills required to conduct the testing. Objective: Our objective was to develop a new method of usability testing that would enable efficient acquisition and dissemination of results among multiple sites. We sought to address the existing barriers to successfully completing usability testing during CDS tool development. Methods: We combined individual think-aloud testing and focus groups into one session and performed sessions serially across 4 sites (snowball group usability testing) to assess the usability of two CDS tools designed for use by nurses in primary and urgent care settings. We recorded each session and took notes in a standardized format. Each site shared feedback from their individual sessions with the other sites in the study so that they could incorporate that feedback into their tools prior to their own testing sessions. Results: The group testing and snowballing components of our new usability testing method proved to be highly beneficial. We identified 3 main benefits of snowball group usability testing. First, by interviewing several participants in a single session rather than individuals over the course of weeks, each site was able to quickly obtain their usability feedback. Second, combining the individualized think-aloud component with a focus group component in the same session helped study teams to more easily notice similarities in feedback among participants and to discuss and act upon suggestions efficiently. Third, conducting usability testing in series across sites allowed study teams to incorporate feedback based on previous sites' sessions prior to conducting their own testing. Conclusions: Snowball group usability testing provides an efficient method of obtaining multisite feedback on newly developed tools and systems, while addressing barriers typically associated with traditional usability testing methods. This method can be applied to test a wide variety of tools, including CDS tools, prior to launch so that they can be efficiently optimized. 
Trial Registration: Clinicaltrials.gov NCT04255303; https://clinicaltrials.gov/study/NCT04255303 ", doi="10.2196/55316", url="https://formative.jmir.org/2025/1/e55316" } @Article{info:doi/10.2196/62851, author="Fu, Yao and Huang, Zongyao and Deng, Xudong and Xu, Linna and Liu, Yang and Zhang, Mingxing and Liu, Jinyi and Huang, Bin", title="Artificial Intelligence in Lymphoma Histopathology: Systematic Review", journal="J Med Internet Res", year="2025", month="Feb", day="14", volume="27", pages="e62851", keywords="lymphoma", keywords="artificial intelligence", keywords="bias", keywords="histopathology", keywords="tumor", keywords="hematological", keywords="lymphatic disease", keywords="public health", keywords="pathologists", keywords="pathology", keywords="immunohistochemistry", keywords="diagnosis", keywords="prognosis", abstract="Background: Artificial intelligence (AI) shows considerable promise in the areas of lymphoma diagnosis, prognosis, and gene prediction. However, a comprehensive assessment of potential biases and the clinical utility of AI models is still needed. Objective: Our goal was to evaluate the biases of published studies using AI models for lymphoma histopathology and assess the clinical utility of comprehensive AI models for diagnosis or prognosis. Methods: This study adhered to the Systematic Review Reporting Standards. A comprehensive literature search was conducted across PubMed, Cochrane Library, and Web of Science from their inception until August 30, 2024. The search criteria included the use of AI for prognosis involving human lymphoma tissue pathology images, diagnosis, gene mutation prediction, etc. The risk of bias was evaluated using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Information for each AI model was systematically tabulated, and summary statistics were reported. The study is registered with PROSPERO (CRD42024537394) and follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 reporting guidelines. Results: The search identified 3565 records, with 41 articles ultimately meeting the inclusion criteria. A total of 41 AI models were included in the analysis, comprising 17 diagnostic models, 10 prognostic models, 2 models for detecting ectopic gene expression, and 12 additional models related to diagnosis. All studies exhibited a high or unclear risk of bias, primarily due to limited analysis and incomplete reporting of participant recruitment. Most high-risk models (10/41) predominantly assigned high-risk classifications to participants. Almost all the articles presented an unclear risk of bias in at least one domain, with the most frequent being participant selection (16/41) and statistical analysis (37/41). The primary reasons for this were insufficient analysis of participant recruitment and a lack of interpretability in outcome analyses. In the diagnostic models, the most frequently studied lymphoma subtypes were diffuse large B-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia, and mantle cell lymphoma, while in the prognostic models, the most common subtypes were diffuse large B-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia, and Hodgkin lymphoma. In the internal validation results of all models, the area under the receiver operating characteristic curve (AUC) ranged from 0.75 to 0.99 and accuracy ranged from 68.3\% to 100\%. In models with external validation results, the AUC ranged from 0.93 to 0.99. 
Conclusions: From a methodological perspective, all models exhibited biases. The enhancement of the accuracy of AI models and the acceleration of their clinical translation hinge on several critical aspects. These include the comprehensive reporting of data sources, the diversity of datasets, the study design, the transparency and interpretability of AI models, the use of cross-validation and external validation, and adherence to regulatory guidance and standardized processes in the field of medical AI. ", doi="10.2196/62851", url="https://www.jmir.org/2025/1/e62851" } @Article{info:doi/10.2196/67921, author="Colwell, Rebecca and Gullickson, Mitchell and Cutlan, Jonathan and Stratman, Erik", title="Cutaneous Atrophy Following Corticosteroid Injections for Tendonitis: Report of Two Cases", journal="JMIR Dermatol", year="2025", month="Feb", day="13", volume="8", pages="e67921", keywords="lipoatrophy", keywords="cutaneous atrophy", keywords="corticosteroid", keywords="adverse effects", keywords="tendonitis", keywords="musculoskeletal", doi="10.2196/67921", url="https://derma.jmir.org/2025/1/e67921" } @Article{info:doi/10.2196/62706, author="Blotenberg, Iris and Boekholt, Melanie and Lieberknecht, Nils and S{\"a}ring, Paula and Thyrian, Ren{\'e} Jochen", title="Acceptance of Unsupervised App-Based Cognitive Assessment in Outpatient Care: An Implementation Study", journal="JMIR Form Res", year="2025", month="Feb", day="13", volume="9", pages="e62706", keywords="mild cognitive impairment", keywords="Alzheimer disease", keywords="dementia", keywords="cognition", keywords="computerized assessment", keywords="digital assessment", keywords="digital cognitive biomarkers", keywords="home-based assessment", keywords="digital platform", keywords="mobile phone", abstract="Background: The use of unsupervised digital cognitive assessments provides considerable opportunities for early and comprehensive testing for Alzheimer disease, minimizing the demand on time and personnel resources in medical practices. However, the acceptance within health care has yet to be assessed. Objective: In this implementation study, the acceptance of an app-based, repeated cognitive assessment for early symptoms of Alzheimer disease in the outpatient care setting from both physicians' and patients' perspectives was examined. Methods: In total, 15 primary care practices participated, where patients with self- or relative-reported memory problems could be prescribed an app (neotivCare app [neotiv GmbH]) for comprehensive cognitive testing. Patients used the app to test their episodic memory function weekly for 12 weeks at home. After the testing period and the final consultation, physicians and patients received questionnaires to assess the app's acceptance. Results: We received completed questionnaires from physicians for 45 patients. In addition, we received 45 completed questionnaires from the patients themselves. The physicians reported that, for most patients, the app supported their decision-making in the diagnostic process (26/45, 58\%). In addition, most physicians found the app's information dependable (34/45, 76\%) and felt more certain in their decisions (38/45, 84\%). From the patients' perspective, a majority felt thoroughly tested (34/45, 76\%), and only a few considered the time commitment for the cognitive tests to be too burdensome (7/45, 16\%). 
Furthermore, despite the weekly cognitive testing and the lengthy 12-week testing period, a majority of patients participated in all tests (39/54, 72\%). Conclusions: Our results indicate a high level of acceptance by physicians and patients, suggesting significant potential for the implementation of unsupervised digital cognitive assessments into routine health care. In the future, acceptance should be assessed in large-scale studies, with a particular focus on the impact on health care delivery and patient outcomes. ", doi="10.2196/62706", url="https://formative.jmir.org/2025/1/e62706" } @Article{info:doi/10.2196/65923, author="She, Jou Wan and Siriaraya, Panote and Iwakoshi, Hibiki and Kuwahara, Noriaki and Senoo, Keitaro", title="An Explainable AI Application (AF'fective) to Support Monitoring of Patients With Atrial Fibrillation After Catheter Ablation: Qualitative Focus Group, Design Session, and Interview Study", journal="JMIR Hum Factors", year="2025", month="Feb", day="13", volume="12", pages="e65923", keywords="atrial fibrillation", keywords="explainable artificial intelligence", keywords="explainable AI", keywords="user-centered design", keywords="prevention", keywords="postablation monitoring", abstract="Background: The opaque nature of artificial intelligence (AI) algorithms has led to distrust in medical contexts, particularly in the treatment and monitoring of atrial fibrillation. Although previous studies in explainable AI have demonstrated potential to address this issue, they often focus solely on electrocardiography graphs and lack real-world field insights. Objective: We addressed this gap by incorporating standardized clinical interpretation of electrocardiography graphs into the system and collaborating with cardiologists to co-design and evaluate this approach using real-world patient cases and data. Methods: We conducted a 3-stage iterative design process with 23 cardiologists to co-design, evaluate, and pilot an explainable AI application. In the first stage, we identified 4 physician personas and 7 explainability strategies, which were reviewed in the second stage. A total of 4 strategies were deemed highly effective and feasible for pilot deployment. On the basis of these strategies, we developed a progressive web application and tested it with cardiologists in the third stage. Results: The final progressive web application prototype received above-average user experience evaluations and effectively motivated physicians to adopt it owing to its ease of use, reliable information, and explainable functionality. In addition, we gathered in-depth field insights from cardiologists who used the system in clinical contexts. Conclusions: Our study identified effective explainability strategies, emphasized the importance of curating actionable features and setting accurate expectations, and suggested that many of these insights could apply to other disease care contexts, paving the way for future real-world clinical evaluations. ", doi="10.2196/65923", url="https://humanfactors.jmir.org/2025/1/e65923" } @Article{info:doi/10.2196/66910, author="Seinen, M. Tom and Kors, A. Jan and van Mulligen, M. Erik and Rijnbeek, R. 
Peter", title="Using Structured Codes and Free-Text Notes to Measure Information Complementarity in Electronic Health Records: Feasibility and Validation Study", journal="J Med Internet Res", year="2025", month="Feb", day="13", volume="27", pages="e66910", keywords="natural language processing", keywords="named entity recognition", keywords="clinical concept extraction", keywords="machine learning", keywords="electronic health records", keywords="EHR", keywords="word embeddings", keywords="clinical concept similarity", keywords="text mining", keywords="code", keywords="free-text", keywords="information", keywords="electronic record", keywords="data", keywords="patient records", keywords="framework", keywords="structured data", keywords="unstructured data", abstract="Background: Electronic health records (EHRs) consist of both structured data (eg, diagnostic codes) and unstructured data (eg, clinical notes). It is commonly believed that unstructured clinical narratives provide more comprehensive information. However, this assumption lacks large-scale validation and direct validation methods. Objective: This study aims to quantitatively compare the information in structured and unstructured EHR data and directly validate whether unstructured data offers more extensive information across a patient population. Methods: We analyzed both structured and unstructured data from patient records and visits in a large Dutch primary care EHR database between January 2021 and January 2024. Clinical concepts were identified from free-text notes using an extraction framework tailored for Dutch and compared with concepts from structured data. Concept embeddings were generated to measure semantic similarity between structured and extracted concepts through cosine similarity. A similarity threshold was systematically determined via annotated matches and minimized weighted Gini impurity. We then quantified the concept overlap between structured and unstructured data across various concept domains and patient populations. Results: In a population of 1.8 million patients, only 13\% of extracted concepts from patient records and 7\% from individual visits had similar structured counterparts. Conversely, 42\% of structured concepts in records and 25\% in visits had similar matches in unstructured data. Condition concepts had the highest overlap, followed by measurements and drug concepts. Subpopulation visits, such as those with chronic conditions or psychological disorders, showed different proportions of data overlap, indicating varied reliance on structured versus unstructured data across clinical contexts. Conclusions: Our study demonstrates the feasibility of quantifying the information difference between structured and unstructured data, showing that the unstructured data provides important additional information in the studied database and populations. The annotated concept matches are made publicly available for the clinical natural language processing community. Despite some limitations, our proposed methodology proves versatile, and its application can lead to more robust and insightful observational clinical research. ", doi="10.2196/66910", url="https://www.jmir.org/2025/1/e66910" } @Article{info:doi/10.2196/48328, author="Kottlors, Jonathan and Hahnfeldt, Robert and G{\"o}rtz, Lukas and Iuga, Andra-Iza and Fervers, Philipp and Bremm, Johannes and Zopfs, David and Laukamp, R. Kai and Onur, A. 
Oezguer and Lennartz, Simon and Sch{\"o}nfeld, Michael and Maintz, David and Kabbasch, Christoph and Persigehl, Thorsten and Schlamann, Marc", title="Large Language Models--Supported Thrombectomy Decision-Making in Acute Ischemic Stroke Based on Radiology Reports: Feasibility Qualitative Study", journal="J Med Internet Res", year="2025", month="Feb", day="13", volume="27", pages="e48328", keywords="artificial intelligence", keywords="radiology", keywords="report", keywords="large language model", keywords="text-based augmented supporting system", keywords="mechanical thrombectomy", keywords="GPT", keywords="stroke", keywords="decision-making", keywords="thrombectomy", keywords="imaging", keywords="model", keywords="machine learning", keywords="ischemia", abstract="Background: The latest advancement of artificial intelligence (AI) is generative pretrained transformer large language models (LLMs). They have been trained on massive amounts of text, enabling humanlike and semantical responses to text-based inputs and requests. Foreshadowing numerous possible applications in various fields, the potential of such tools for medical data integration and clinical decision-making is not yet clear. Objective: In this study, we investigate the potential of LLMs in report-based medical decision-making on the example of acute ischemic stroke (AIS), where clinical and image-based information may indicate an immediate need for mechanical thrombectomy (MT). The purpose was to elucidate the feasibility of integrating radiology report data and other clinical information in the context of therapy decision-making using LLMs. Methods: A hundred patients with AIS were retrospectively included, for which 50\% (50/100) was indicated for MT, whereas the other 50\% (50/100) was not. The LLM was provided with the computed tomography report, information on neurological symptoms and onset, and patients' age. The performance of the AI decision-making model was compared with an expert consensus regarding the binary determination of MT indication, for which sensitivity, specificity, and accuracy were calculated. Results: The AI model had an overall accuracy of 88\%, with a specificity of 96\% and a sensitivity of 80\%. The area under the curve for the report-based MT decision was 0.92. Conclusions: The LLM achieved promising accuracy in determining the eligibility of patients with AIS for MT based on radiology reports and clinical information. Our results underscore the potential of LLMs for radiological and medical data integration. This investigation should serve as a stimulus for further clinical applications of LLMs, in which this AI should be used as an augmented supporting system for human decision-making. 
", doi="10.2196/48328", url="https://www.jmir.org/2025/1/e48328", url="http://www.ncbi.nlm.nih.gov/pubmed/39946168" } @Article{info:doi/10.2196/64318, author="Seo, Sujeong and Kim, Kyuli and Yang, Heyoung", title="Performance Assessment of Large Language Models in Medical Consultation: Comparative Study", journal="JMIR Med Inform", year="2025", month="Feb", day="12", volume="13", pages="e64318", keywords="artificial intelligence", keywords="biomedical", keywords="large language model", keywords="depression", keywords="similarity measurement", keywords="text validity", abstract="Background: The recent introduction of generative artificial intelligence (AI) as an interactive consultant has sparked interest in evaluating its applicability in medical discussions and consultations, particularly within the domain of depression. Objective: This study evaluates the capability of large language models (LLMs) in AI to generate responses to depression-related queries. Methods: Using the PubMedQA and QuoraQA data sets, we compared various LLMs, including BioGPT, PMC-LLaMA, GPT-3.5, and Llama2, and measured the similarity between the generated and original answers. Results: The latest general LLMs, GPT-3.5 and Llama2, exhibited superior performance, particularly in generating responses to medical inquiries from the PubMedQA data set. Conclusions: Considering the rapid advancements in LLM development in recent years, it is hypothesized that version upgrades of general LLMs offer greater potential for enhancing their ability to generate ``knowledge text'' in the biomedical domain compared with fine-tuning for the biomedical field. These findings are expected to contribute significantly to the evolution of AI-based medical counseling systems. ", doi="10.2196/64318", url="https://medinform.jmir.org/2025/1/e64318" } @Article{info:doi/10.2196/59961, author="Lu, An-Tai and Liou, Chong-Sin and Lai, Chia-Hsin and Shian, Bo-Tsz and Li, Ming-Ta and Sun, Chih-Yen and Kao, Hao-Yun and Dai, Hong-Jie and Tsai, Ming-Ju", title="Application of Clinical Department--Specific AI-Assisted Coding Using Taiwan Diagnosis-Related Groups: Retrospective Validation Study", journal="JMIR Hum Factors", year="2025", month="Feb", day="12", volume="12", pages="e59961", keywords="diagnosis-related group", keywords="artificial intelligence coding", keywords="International Classification of Diseases, Tenth Revision, Clinical Modification", keywords="ICD-10-CM", keywords="coding professionals", abstract="Background: The accuracy of the ICD-10-CM (International Classification of Diseases, Tenth Revision, Clinical Modification) procedure coding system (PCS) is crucial for generating correct Taiwan diagnosis-related groups (DRGs), as coding errors can lead to financial losses for hospitals. Objective: The study aimed to determine the consistency between an artificial intelligence (AI)-assisted coding module and manual coding, as well as to identify clinical specialties suitable for implementing the developed AI-assisted coding module. Methods: This study examined the AI-assisted coding module from the perspective of health care professionals. The research period started in February 2023. The study excluded cases outside of Taiwan DRGs, those with incomplete medical records, and cases with Taiwan DRG disposals ICD-10 (International Statistical Classification of Diseases, Tenth Revision) PCS. Data collection was conducted through retrospective medical record review. 
The AI-assisted module was constructed using a hierarchical attention network. The verification of the Taiwan DRGs results from the AI-assisted coding model focused on the major diagnostic categories (MDCs). Statistical computations were conducted using SPSS version 19. Research variables consisted of categorical variables represented by MDC, and continuous variables were represented by the relative weight of Taiwan DRGs. Results: A total of 2632 discharge records meeting the research criteria were collected from February to April 2023. In terms of inferential statistics, $\kappa$ statistics were used for MDC analysis. The infectious and parasitic diseases MDC, as well as the respiratory diseases MDC had $\kappa$ values exceeding 0.8. Clinical inpatient specialties were statistically analyzed using the Wilcoxon signed rank test. There was not a difference in coding results between the 23 clinical departments, such as the Division of Cardiology, the Division of Nephrology, and the Department of Urology. Conclusions: For human coders, with the assistance of the ICD-10-CM AI-assisted coding system, work time is reduced. Additionally, strengthening knowledge in clinical documentation enables human coders to maximize their role. This positions them to become clinical documentation experts, preparing them for further career development. Future research will apply the same method to validate the ICD-10 AI-assisted coding module. ", doi="10.2196/59961", url="https://humanfactors.jmir.org/2025/1/e59961" } @Article{info:doi/10.2196/66222, author="Harari, E. Rayan and Schulwolf, L. Sara and Borges, Paulo and Salmani, Hamid and Hosseini, Farhang and Bailey, T. Shannon K. and Quach, Brian and Nohelty, Eric and Park, Sandra and Verma, Yash and Goralnick, Eric and Goldberg, A. Scott and Shokoohi, Hamid and Dias, D. Roger and Eyre, Andrew", title="Applications of Augmented Reality for Prehospital Emergency Care: Systematic Review of Randomized Controlled Trials", journal="JMIR XR Spatial Comput", year="2025", month="Feb", day="11", volume="2", pages="e66222", keywords="prehospital emergency care", keywords="emergency medical services", keywords="randomized controlled trials", keywords="clinical decision support", keywords="training", keywords="augmented reality", keywords="emergency", keywords="care", keywords="systematic review", keywords="BLS", keywords="procedures", keywords="traumatic injury", keywords="survival", keywords="prehospital", keywords="emergency care", keywords="AR", keywords="decision-making", keywords="educational", keywords="education", keywords="EMS", keywords="database", keywords="technology", keywords="critical care", keywords="basic life support", abstract="Background: Delivering high-quality prehospital emergency care remains challenging, especially in resource-limited settings where real-time clinical decision support is limited. Augmented reality (AR) has emerged as a promising health care technology, offering potential solutions to enhance decision-making, care processes, and emergency medical service (EMS) training. Objective: This systematic review assesses the effectiveness of AR in improving clinical decision-making, care delivery, and educational outcomes for EMS providers. Methods: We searched databases including PubMed, Cochrane CENTRAL, Web of Science, Institute of Electrical and Electronics Engineers (IEEE), Embase, PsycInfo, and Association for Computing Machinery (ACM). Studies were selected based on their focus on AR in prehospital care. 
A total of 14 randomized controlled trials were selected from an initial screening of 2081 manuscripts. Included studies focused on AR use by EMS personnel, examining clinical and educational impacts. Data such as study demographics, intervention type, outcomes, and methodologies were extracted using a standardized form. Primary outcomes assessed included clinical task accuracy, response times, and training efficacy. A narrative synthesis was conducted, and bias was evaluated using Cochrane's risk of bias tool. Improvements in AR-assisted interventions and their limitations were analyzed. Results: AR significantly improved clinical decision-making accuracy and EMS training outcomes, reducing response times in simulations and real-world applications. However, small sample sizes and challenges in integrating AR into workflows limit the generalizability of the findings. Conclusions: AR holds promise for transforming prehospital care by enhancing real-time decision-making and EMS training. Future research should address technological integration and scalability to fully realize AR's potential in EMS. ", doi="10.2196/66222", url="https://xr.jmir.org/2025/1/e66222" } @Article{info:doi/10.2196/60273, author="Kim, Yong Jin and Marshall, D. Vincent and Rowell, Brigid and Chen, Qiyuan and Zheng, Yifan and Lee, D. John and Kontar, Al Raed and Lester, Corey and Yang, Jessie Xi", title="The Effects of Presenting AI Uncertainty Information on Pharmacists' Trust in Automated Pill Recognition Technology: Exploratory Mixed Subjects Study", journal="JMIR Hum Factors", year="2025", month="Feb", day="11", volume="12", pages="e60273", keywords="artificial intelligence", keywords="human-computer interaction", keywords="uncertainty communication", keywords="visualization", keywords="medication errors", keywords="safety", keywords="artificial intelligence aid", keywords="pharmacists", keywords="pill verification", keywords="automation", abstract="Background: Dispensing errors significantly contribute to adverse drug events, resulting in substantial health care costs and patient harm. Automated pill verification technologies have been developed to aid pharmacists with medication dispensing. However, pharmacists' trust in such automated technologies remains unexplored. Objective: This study aims to investigate pharmacists' trust in automated pill verification technology designed to support medication dispensing. Methods: Thirty licensed pharmacists in the United States performed a web-based simulated pill verification task to determine whether an image of a filled medication bottle matched a known reference image. Participants completed a block of 100 verification trials without any help, and another block of 100 trials with the help of an imperfect artificial intelligence (AI) aid recommending acceptance or rejection of a filled medication bottle. The experiment used a mixed subjects design. The between-subjects factor was the AI aid type, with or without an AI uncertainty plot. The within-subjects factor was the four potential verification outcomes: (1) the AI rejects the incorrect drug, (2) the AI rejects the correct drug, (3) the AI approves the incorrect drug, and (4) the AI approves the correct drug. Participants' trust in the AI system was measured. Mixed model (generalized linear models) tests were conducted with 2-tailed t tests to compare the means between the 2 AI aid types for each verification outcome. 
Results: Participants had an average trust propensity score of 72 (SD 18.08) out of 100, indicating a positive attitude toward trusting automated technologies. The introduction of an uncertainty plot to the AI aid significantly enhanced pharmacists' end trust (t28=--1.854; P=.04). Trust dynamics were influenced by AI aid type and verification outcome. Specifically, pharmacists using the AI aid with the uncertainty plot had a significantly larger trust increment when the AI approved the correct drug (t78.98=3.93; P<.001) and a significantly larger trust decrement when the AI approved the incorrect drug (t2939.72=--4.78; P<.001). Intriguingly, the absence of the uncertainty plot led to an increase in trust when the AI correctly rejected an incorrect drug, whereas the presence of the plot resulted in a decrease in trust under the same circumstances (t509.77=--3.96; P<.001). A pronounced ``negativity bias'' was observed, where the degree of trust reduction when the AI made an error exceeded the trust gain when the AI made a correct decision (z=--11.30; P<.001). Conclusions: To the best of our knowledge, this study is the first attempt to examine pharmacists' trust in automated pill verification technology. Our findings reveal that pharmacists have a favorable disposition toward trusting automation. Moreover, providing uncertainty information about the AI's recommendation significantly boosts pharmacists' trust in AI aid, highlighting the importance of developing transparent AI systems within health care. ", doi="10.2196/60273", url="https://humanfactors.jmir.org/2025/1/e60273", url="http://www.ncbi.nlm.nih.gov/pubmed/39932773" } @Article{info:doi/10.2196/48775, author="Bhavnani, K. Suresh and Zhang, Weibin and Bao, Daniel and Raji, Mukaila and Ajewole, Veronica and Hunter, Rodney and Kuo, Yong-Fang and Schmidt, Susanne and Pappadis, R. Monique and Smith, Elise and Bokov, Alex and Reistetter, Timothy and Visweswaran, Shyam and Downer, Brian", title="Subtyping Social Determinants of Health in the ``All of Us'' Program: Network Analysis and Visualization Study", journal="J Med Internet Res", year="2025", month="Feb", day="11", volume="27", pages="e48775", keywords="social determinants of health", keywords="All of Us", keywords="bipartite networks", keywords="financial resources", keywords="health care", keywords="health outcomes", keywords="precision medicine", keywords="decision support", keywords="health industry", keywords="clinical implications", keywords="machine learning methods", abstract="Background: Social determinants of health (SDoH), such as financial resources and housing stability, account for between 30\% and 55\% of people's health outcomes. While many studies have identified strong associations between specific SDoH and health outcomes, little is known about how SDoH co-occur to form subtypes critical for designing targeted interventions. Such analysis has only now become possible through the All of Us program. Objective: This study aims to analyze the All of Us dataset for addressing two research questions: (1) What are the range of and responses to survey questions related to SDoH? and (2) How do SDoH co-occur to form subtypes, and what are their risks for adverse health outcomes? Methods: For question 1, an expert panel analyzed the range of and responses to SDoH questions across 6 surveys in the full All of Us dataset (N=372,397; version 6). 
For question 2, due to systematic missingness and uneven granularity of questions across the surveys, we selected all participants with valid and complete SDoH data and used inverse probability weighting to adjust their imbalance in demographics. Next, an expert panel grouped the SDoH questions into SDoH factors to enable more consistent granularity. To identify the subtypes, we used bipartite modularity maximization for identifying SDoH biclusters and measured their significance and replicability. Next, we measured their association with 3 outcomes (depression, delayed medical care, and emergency room visits in the last year). Finally, the expert panel inferred the subtype labels, potential mechanisms, and targeted interventions. Results: The question 1 analysis identified 110 SDoH questions across 4 surveys covering all 5 domains in Healthy People 2030. As the SDoH questions varied in granularity, they were categorized by an expert panel into 18 SDoH factors. The question 2 analysis (n=12,913; d=18) identified 4 biclusters with significant biclusteredness (Q=0.13; random-Q=0.11; z=7.5; P<.001) and significant replication (real Rand index=0.88; random Rand index=0.62; P<.001). Each subtype had significant associations with specific outcomes and had meaningful interpretations and potential targeted interventions. For example, the Socioeconomic barriers subtype included 6 SDoH factors (eg, not employed and food insecurity) and had a significantly higher odds ratio (4.2, 95\% CI 3.5-5.1; P<.001) for depression when compared to other subtypes. The expert panel inferred implications of the results for designing interventions and health care policies based on SDoH subtypes. Conclusions: This study identified SDoH subtypes that had statistically significant biclusteredness and replicability, each of which had significant associations with specific adverse health outcomes and with translational implications for targeted SDoH interventions and health care policies. However, the high degree of systematic missingness requires repeating the analysis as the data become more complete by using our generalizable and scalable machine learning code available on the All of Us workbench. ", doi="10.2196/48775", url="https://www.jmir.org/2025/1/e48775", url="http://www.ncbi.nlm.nih.gov/pubmed/39932771" } @Article{info:doi/10.2196/66127, author="Khairat, Saif and Morelli, Jennifer and Boynton, H. Marcella and Bice, Thomas and Gold, A. Jeffrey and Carson, S. Shannon", title="Investigation of Information Overload in Electronic Health Records: Protocol for Usability Study", journal="JMIR Res Protoc", year="2025", month="Feb", day="11", volume="14", pages="e66127", keywords="electronic health records", keywords="information overload", keywords="eye-tracking", keywords="EHR usability", keywords="EHR interface", abstract="Background: Electronic health records (EHRs) have been associated with information overload, causing providers to miss critical information, make errors, and delay care. Information overload can be especially prevalent in medical intensive care units (ICUs) where patients are often critically ill and their charts contain large amounts of data points such as vitals, test and laboratory results, medications, and notes. Objective: We propose to study the relationship between information overload and EHR use among medical ICU providers in 4 major United States medical centers. 
In this study, we examined 2 prominent EHR systems in the United States to generate reproducible and generalizable findings. Methods: Our study collected physiological and objective data through the use of a screen-mounted eye-tracker. We aim to characterize information overload in the EHR by examining ICU providers' decision-making and EHR usability. We also surveyed providers on their institution's EHR to better understand how they rate the system's task load and usability using the NASA (National Aeronautics and Space Administration) Task Load Index and Computer System Usability Questionnaire. Primary outcomes include the number of eye fixations during each case, the number of correct decisions, the time to complete each case, and number of screens visited. Secondary outcomes include case complexity performance, frequency of mouse clicks, and EHR task load and usability using provided surveys. Results: This EHR usability study was funded in 2021. The study was initiated in 2022 with a completion date of 2025. Data collection for this study was completed in December 2023 and data analysis is ongoing with a total of 81 provider sessions recorded. Conclusions: Our study aims to characterize information overload in the EHR among medical ICU providers. By conducting a multisite, cross-sectional usability assessment of information overload in 2 leading EHRs, we hope to reveal mechanisms that explain information overload. The insights gained from this study may lead to potential improvements in EHR usability and interface design, which could improve health care delivery and patient safety. International Registered Report Identifier (IRRID): DERR1-10.2196/66127 ", doi="10.2196/66127", url="https://www.researchprotocols.org/2025/1/e66127", url="http://www.ncbi.nlm.nih.gov/pubmed/39932774" } @Article{info:doi/10.2196/63149, author="Downing, J. Gregory and Tramontozzi, M. Lucas and Garcia, Jackson and Villanueva, Emma", title="Harnessing Internet Search Data as a Potential Tool for Medical Diagnosis: Literature Review", journal="JMIR Ment Health", year="2025", month="Feb", day="11", volume="12", pages="e63149", keywords="health", keywords="informatics", keywords="internet search data", keywords="early diagnosis", keywords="web search", keywords="information technology", keywords="internet", keywords="machine learning", keywords="medical records", keywords="diagnosis", keywords="health care", keywords="self-diagnosis", keywords="detection", keywords="intervention", keywords="patient education", keywords="internet search", keywords="health-seeking behavior", keywords="artificial intelligence", keywords="AI", abstract="Background: The integration of information technology into health care has created opportunities to address diagnostic challenges. Internet searches, representing a vast source of health-related data, hold promise for improving early disease detection. Studies suggest that patterns in search behavior can reveal symptoms before clinical diagnosis, offering potential for innovative diagnostic tools. Leveraging advancements in machine learning, researchers have explored linking search data with health records to enhance screening and outcomes. However, challenges like privacy, bias, and scalability remain critical to its widespread adoption. 
Objective: We aimed to explore the potential and challenges of using internet search data in medical diagnosis, with a specific focus on diseases and conditions such as cancer, cardiovascular disease, mental and behavioral health, neurodegenerative disorders, and nutritional and metabolic diseases. We examined ethical, technical, and policy considerations while assessing the current state of research, identifying gaps and limitations, and proposing future research directions to advance this emerging field. Methods: We conducted a comprehensive analysis of peer-reviewed literature and informational interviews with subject matter experts to examine the landscape of internet search data use in medical research. We searched for published peer-reviewed literature on the PubMed database between October and December 2023. Results: Systematic selection based on predefined criteria included 40 articles from the 2499 identified articles. The analysis revealed a nascent domain of internet search data research in medical diagnosis, marked by advancements in analytics and data integration. Despite challenges such as bias, privacy, and infrastructure limitations, emerging initiatives could reshape data collection and privacy safeguards. Conclusions: We identified signals correlating with diagnostic considerations in certain diseases and conditions, indicating the potential for such data to enhance clinical diagnostic capabilities. However, leveraging internet search data for improved early diagnosis and health care outcomes requires effectively addressing ethical, technical, and policy challenges. By fostering interdisciplinary collaboration, advancing infrastructure development, and prioritizing patient engagement and consent, researchers can unlock the transformative potential of internet search data in medical diagnosis to ultimately enhance patient care and advance health care practice and policy. 
", doi="10.2196/63149", url="https://mental.jmir.org/2025/1/e63149", url="http://www.ncbi.nlm.nih.gov/pubmed/39813106" } @Article{info:doi/10.2196/46007, author="Beqaj, Safedin and Shrestha, Rojeet and Hamill, Tim", title="An Automated Clinical Laboratory Decision Support System for Test Utilization, Medical Necessity Verification, and Payment Processing", journal="Interact J Med Res", year="2025", month="Feb", day="11", volume="14", pages="e46007", keywords="clinical decision system", keywords="CDSS", keywords="laboratory decision system", keywords="laboratory testing", keywords="test utilization", keywords="test ordering", keywords="lab test", keywords="laboratory", keywords="testing", keywords="payment", keywords="decision-making", keywords="user", keywords="utilization", keywords="processing", keywords="decision", doi="10.2196/46007", url="https://www.i-jmr.org/2025/1/e46007", url="http://www.ncbi.nlm.nih.gov/pubmed/39808833" } @Article{info:doi/10.2196/66269, author="Shan, Rui and Li, Xin and Chen, Jing and Chen, Zheng and Cheng, Yuan-Jia and Han, Bo and Hu, Run-Ze and Huang, Jiu-Ping and Kong, Gui-Lan and Liu, Hui and Mei, Fang and Song, Shi-Bing and Sun, Bang-Kai and Tian, Hui and Wang, Yang and Xiao, Wu-Cai and Yao, Xiang-Yun and Ye, Jing-Ming and Yu, Bo and Yuan, Chun-Hui and Zhang, Fan and Liu, Zheng", title="Interpretable Machine Learning to Predict the Malignancy Risk of Follicular Thyroid Neoplasms in Extremely Unbalanced Data: Retrospective Cohort Study and Literature Review", journal="JMIR Cancer", year="2025", month="Feb", day="10", volume="11", pages="e66269", keywords="follicular thyroid neoplasm", keywords="machine learning", keywords="prediction model", keywords="malignancy", keywords="unbalanced data", keywords="literature review", abstract="Background: Diagnosing and managing follicular thyroid neoplasms (FTNs) remains a significant challenge, as the malignancy risk cannot be determined until after diagnostic surgery. Objective: We aimed to use interpretable machine learning to predict the malignancy risk of FTNs preoperatively in a real-world setting. Methods: We conducted a retrospective cohort study at the Peking University Third Hospital in Beijing, China. Patients with postoperative pathological diagnoses of follicular thyroid adenoma (FTA) or follicular thyroid carcinoma (FTC) were included, excluding those without preoperative thyroid ultrasonography. We used 22 predictors involving demographic characteristics, thyroid sonography, and hormones to train 5 machine learning models: logistic regression, least absolute shrinkage and selection operator regression, random forest, extreme gradient boosting, and support vector machine. The optimal model was selected based on discrimination, calibration, interpretability, and parsimony. To address the highly imbalanced data (FTA:FTC ratio>5:1), model discrimination was assessed using both the area under the receiver operating characteristic curve and the area under the precision-recall curve (AUPRC). To interpret the model, we used Shapley Additive Explanations values and partial dependence and individual conditional expectation plots. Additionally, a systematic review was performed to synthesize existing evidence and validate the discrimination ability of the previously developed Thyroid Imaging Reporting and Data System for Follicular Neoplasm scoring criteria to differentiate between benign and malignant FTNs using our data. 
Results: The cohort included 1539 patients (mean age 47.98, SD 14.15 years; female: n=1126, 73.16\%) with 1672 FTN tumors (FTA: n=1414; FTC: n=258; FTA:FTC ratio=5.5). The random forest model emerged as optimal, identifying mean thyroid-stimulating hormone (TSH) score, mean tumor diameter, mean TSH, TSH instability, and TSH measurement levels as the top 5 predictors in discriminating FTA from FTC, with the area under the receiver operating characteristic curve of 0.79 (95\% CI 0.77-0.81) and AUPRC of 0.40 (95\% CI 0.37-0.44). Malignancy risk increased nonlinearly with larger tumor diameters and higher TSH instability but decreased nonlinearly with higher mean TSH scores or mean TSH levels. FTCs with small sizes (mean diameter 2.88, SD 1.38 cm) were more likely to be misclassified as FTAs compared to larger ones (mean diameter 3.71, SD 1.36 cm). The systematic review of the 7 included studies revealed that (1) the FTA:FTC ratio varied from 0.6 to 4.0, lower than the natural distribution of 5.0; (2) no studies assessed prediction performance using AUPRC in unbalanced datasets; and (3) external validations of Thyroid Imaging Reporting and Data System for Follicular Neoplasm scoring criteria underperformed relative to the original study. Conclusions: Tumor size and TSH measurements were important in screening FTN malignancy risk preoperatively, but accurately predicting the risk of small-sized FTNs remains challenging. Future research should address the limitations posed by the extreme imbalance in FTA and FTC distributions in real-world data. ", doi="10.2196/66269", url="https://cancer.jmir.org/2025/1/e66269" } @Article{info:doi/10.2196/60948, author="Wu, Xingyue and Lam, Sing Chun and Hui, Ho Ka and Loong, Ho-fung Herbert and Zhou, Rui Keary and Ngan, Chun-Kit and Cheung, Ting Yin", title="Perceptions in 3.6 Million Web-Based Posts of Online Communities on the Use of Cancer Immunotherapy: Data Mining Using BERTopic", journal="J Med Internet Res", year="2025", month="Feb", day="10", volume="27", pages="e60948", keywords="social media", keywords="cancer", keywords="immunotherapy", keywords="perceptions", keywords="data mining", keywords="oncology", keywords="web-based", keywords="lifestyle", keywords="therapeutic intervention", keywords="leukemia", keywords="lymphoma", keywords="survival", keywords="treatment", keywords="health information", keywords="decision-making", keywords="online community", keywords="machine learning", abstract="Background: Immunotherapy has become a game changer in cancer treatment. The internet has been used by patients as a platform to share personal experiences and seek medical guidance. Despite the increased utilization of immunotherapy in clinical practice, few studies have investigated the perceptions about its use by analyzing social media data. Objective: This study aims to use BERTopic (a topic modeling technique that is an extension of the Bidirectional Encoder Representation from Transformers machine learning model) to explore the perceptions of online cancer communities regarding immunotherapy. Methods: A total of 4.9 million posts were extracted from Facebook, Twitter, Reddit, and 16 online cancer-related forums. The textual data were preprocessed by natural language processing. BERTopic modeling was performed to identify topics from the posts. The effectiveness of isolating topics from the posts was evaluated using 3 metrics: topic diversity, coherence, and quality. 
Sentiment analysis was performed to determine the polarity of each topic and categorize them as positive or negative. Based on the topics generated through topic modeling, thematic analysis was conducted to identify themes associated with immunotherapy. Results: After data cleaning, 3.6 million posts remained for modeling. The highest overall topic quality achieved by BERTopic was 70.47\% (topic diversity: 87.86\%; topic coherence: 80.21\%). BERTopic generated 14 topics related to the perceptions of immunotherapy. The sentiment score of around 0.3 across the 14 topics suggested generally positive sentiments toward immunotherapy within the online communities. Six themes were identified, primarily covering (1) hopeful prospects offered by immunotherapy, (2) perceived effectiveness of immunotherapy, (3) complementary therapies or self-treatments, (4) financial and mental impact of undergoing immunotherapy, (5) impact on lifestyle and time schedules, and (6) side effects due to treatment. Conclusions: This study provides an overview of the multifaceted considerations essential for the application of immunotherapy as a therapeutic intervention. The topics and themes identified can serve as supporting information to facilitate physician-patient communication and the decision-making process. Furthermore, this study also demonstrates the effectiveness of BERTopic in analyzing large amounts of data to identify perceptions underlying social media and online communities. ", doi="10.2196/60948", url="https://www.jmir.org/2025/1/e60948" } @Article{info:doi/10.2196/60888, author="Luo, Aijing and Chen, Wei and Zhu, Hongtao and Xie, Wenzhao and Chen, Xi and Liu, Zhenjiang and Xin, Zirui", title="Machine Learning in the Management of Patients Undergoing Catheter Ablation for Atrial Fibrillation: Scoping Review", journal="J Med Internet Res", year="2025", month="Feb", day="10", volume="27", pages="e60888", keywords="atrial fibrillation", keywords="catheter ablation", keywords="deep learning", keywords="patient management", keywords="prognosis", keywords="quality assessment tools", keywords="cardiac arrhythmia", keywords="public health", keywords="quality of life", keywords="severe medical condition", keywords="electrocardiogram", keywords="electronic health record", keywords="morbidity", keywords="mortality", keywords="thromboembolism", keywords="clinical intervention", abstract="Background: Although catheter ablation (CA) is currently the most effective clinical treatment for atrial fibrillation, its variable therapeutic effects among different patients present numerous problems. Machine learning (ML) shows promising potential in optimizing the management and clinical outcomes of patients undergoing atrial fibrillation CA (AFCA). Objective: This scoping review aimed to evaluate the current scientific evidence on the application of ML for managing patients undergoing AFCA, compare the performance of various models across specific clinical tasks within AFCA, and summarize the strengths and limitations of ML in this field. Methods: Adhering to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines, relevant studies published up to October 7, 2023, were searched from PubMed, Web of Science, Embase, the Cochrane Library, and ScienceDirect. The final included studies were confirmed based on inclusion and exclusion criteria and manual review. 
The PROBAST (Prediction model Risk Of Bias Assessment Tool) and QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies-2) methodological quality assessment tools were used to review the included studies, and narrative data synthesis was performed on the modeled results provided by these studies. Results: The analysis of 23 included studies showcased the contributions of ML in identifying potential ablation targets, improving ablation strategies, and predicting patient prognosis. The patient data used in these studies comprised demographics, clinical characteristics, various types of imaging (9/23, 39\%), and electrophysiological signals (7/23, 30\%). In terms of model type, deep learning, represented by convolutional neural networks, was most frequently applied (14/23, 61\%). Compared with traditional clinical scoring models or human clinicians, the model performance reported in the included studies was generally satisfactory, but most models (14/23, 61\%) showed a high risk of bias due to lack of external validation. Conclusions: Our evidence-based findings suggest that ML is a promising tool for improving the effectiveness and efficiency of managing patients undergoing AFCA. While guiding data preparation and model selection for future studies, this review highlights the need to address prevalent limitations, including lack of external validation, and to further explore model generalization and interpretability. ", doi="10.2196/60888", url="https://www.jmir.org/2025/1/e60888" } @Article{info:doi/10.2196/64414, author="Stroud, M. Austin and Curtis, H. Susan and Weir, B. Isabel and Stout, J. Jeremiah and Barry, A. Barbara and Bobo, V. William and Athreya, P. Arjun and Sharp, R. Richard", title="Physician Perspectives on the Potential Benefits and Risks of Applying Artificial Intelligence in Psychiatric Medicine: Qualitative Study", journal="JMIR Ment Health", year="2025", month="Feb", day="10", volume="12", pages="e64414", keywords="artificial intelligence", keywords="machine learning", keywords="digital health", keywords="mental health", keywords="psychiatry", keywords="depression", keywords="interviews", keywords="family medicine", keywords="physicians", keywords="qualitative", keywords="providers", keywords="attitudes", keywords="opinions", keywords="perspectives", keywords="ethics", abstract="Background: As artificial intelligence (AI) tools are integrated more widely in psychiatric medicine, it is important to consider the impact these tools will have on clinical practice. Objective: This study aimed to characterize physician perspectives on the potential impact AI tools will have in psychiatric medicine. Methods: We interviewed 42 physicians (21 psychiatrists and 21 family medicine practitioners). These interviews used detailed clinical case scenarios involving the use of AI technologies in the evaluation, diagnosis, and treatment of psychiatric conditions. Interviews were transcribed and subsequently analyzed using qualitative analysis methods. Results: Physicians highlighted multiple potential benefits of AI tools, including potential support for optimizing pharmaceutical efficacy, reducing administrative burden, aiding shared decision-making, and increasing access to health services, and were optimistic about the long-term impact of these technologies. 
This optimism was tempered by concerns about potential near-term risks to both patients and themselves, including misguiding clinical judgment, increasing clinical burden, introducing patient harms, and creating legal liability. Conclusions: Our results highlight the importance of considering specialist perspectives when deploying AI tools in psychiatric medicine. ", doi="10.2196/64414", url="https://mental.jmir.org/2025/1/e64414", url="http://www.ncbi.nlm.nih.gov/pubmed/39928397" } @Article{info:doi/10.2196/55825, author="Bhak, Youngmin and Lee, Ho Yu and Kim, Joonhyung and Lee, Kiwon and Lee, Daehwan and Jang, Chan Eun and Jang, Eunjeong and Lee, Seungkyu Christopher and Kang, Seok Eun and Park, Sehee and Han, Wook Hyun and Nam, Min Sang", title="Diagnosis of Chronic Kidney Disease Using Retinal Imaging and Urine Dipstick Data: Multimodal Deep Learning Approach", journal="JMIR Med Inform", year="2025", month="Feb", day="7", volume="13", pages="e55825", keywords="multimodal deep learning", keywords="chronic kidney disease", keywords="fundus image", keywords="saliency map", keywords="urine dipstick", abstract="Background: Chronic kidney disease (CKD) is a prevalent condition with significant global health implications. Early detection and management are critical to prevent disease progression and complications. Deep learning (DL) models using retinal images have emerged as potential noninvasive screening tools for CKD, though their performance may be limited, especially in identifying individuals with proteinuria and in specific subgroups. Objective: We aim to evaluate the efficacy of integrating retinal images and urine dipstick data into DL models for enhanced CKD diagnosis. Methods: The 3 models were developed and validated: eGFR-RIDL (estimated glomerular filtration rate--retinal image deep learning), eGFR-UDLR (logistic regression using urine dipstick data), and eGFR-MMDL (multimodal deep learning combining retinal images and urine dipstick data). All models were trained to predict an eGFR<60 mL/min/1.73 m{\texttwosuperior}, a key indicator of CKD, calculated using the 2009 CKD-EPI (Chronic Kidney Disease Epidemiology Collaboration) equation. This study used a multicenter dataset of participants aged 20-79 years, including a development set (65,082 people) and an external validation set (58,284 people). Wide Residual Networks were used for DL, and saliency maps were used to visualize model attention. Sensitivity analyses assessed the impact of numerical variables. Results: eGFR-MMDL outperformed eGFR-RIDL in both the test and external validation sets, with area under the curves of 0.94 versus 0.90 and 0.88 versus 0.77 (P<.001 for both, DeLong test). eGFR-UDLR outperformed eGFR-RIDL and was comparable to eGFR-MMDL, particularly in the external validation. However, in the subgroup analysis, eGFR-MMDL showed improvement across all subgroups, while eGFR-UDLR demonstrated no such gains. This suggested that the enhanced performance of eGFR-MMDL was not due to urine data alone, but rather from the synergistic integration of both retinal images and urine data. The eGFR-MMDL model demonstrated the best performance in individuals younger than 65 years or those with proteinuria. Age and proteinuria were identified as critical factors influencing model performance. 
Saliency maps indicated that urine data and retinal images provide complementary information, with urine offering insights into retinal abnormalities and retinal images, particularly the arcade vessels, being key for predicting kidney function. Conclusions: The MMDL model integrating retinal images and urine dipstick data shows significant promise for noninvasive CKD screening, outperforming the retinal image--only model. However, routine blood tests are still recommended for individuals aged 65 years and older due to the model's limited performance in this age group. ", doi="10.2196/55825", url="https://medinform.jmir.org/2025/1/e55825" } @Article{info:doi/10.2196/62763, author="Gani, Illin and Litchfield, Ian and Shukla, David and Delanerolle, Gayathri and Cockburn, Neil and Pathmanathan, Anna", title="Understanding ``Alert Fatigue'' in Primary Care: Qualitative Systematic Review of General Practitioners Attitudes and Experiences of Clinical Alerts, Prompts, and Reminders", journal="J Med Internet Res", year="2025", month="Feb", day="7", volume="27", pages="e62763", keywords="primary care", keywords="general practitioners", keywords="alert fatigue", keywords="computer decision support systems", keywords="fatigue", keywords="qualitative", keywords="systematic review", keywords="quality of care", keywords="clinical behaviors", keywords="behaviors", keywords="database", keywords="family practice", keywords="family", keywords="algorithm", keywords="patient safety", keywords="patient", abstract="Background: The consistency and quality of care in modern primary care are supported by various clinical reminders (CRs), which include ``alerts'' describing the consequences of certain decisions and ``prompts'' that remind users to perform tasks promoting desirable clinical behaviors. However, not all CRs are acted upon, and many are disregarded by general practitioners (GPs), a chronic issue commonly referred to as ``alert fatigue.'' This phenomenon has significant implications for the safety and quality of care, GP burnout, and broader medicolegal consequences. Research on mitigating alert fatigue and optimizing the use of CRs remains limited. This review offers much-needed insight into GP attitudes toward the deployment, design, and overall effectiveness of CRs. Objective: This systematic review aims to synthesize current qualitative research on GPs' attitudes toward CRs, enabling an exploration of the interacting influences on the occurrence of alert fatigue in GPs, including the deployment, design, and perceived efficacy of CRs. Methods: A systematic literature search was conducted across the Health Technology Assessment database, MEDLINE, MEDLINE In-Process, Embase, CINAHL, Conference Proceedings Citation Index, PsycINFO, and OpenGrey. The search focused on primary qualitative and mixed methods research conducted in general or family practice, specifically exploring GPs' experiences with CRs. All databases were searched from inception to December 31, 2023. To ensure structured and practicable findings, we used a directed content analysis of the data, guided by the 7 domains of the Non-adoption, Abandonment, Scale-up, Spread, and Sustainability (NASSS) framework, including domains related to Technology, Adopter attitudes, and Organization. Results: A total of 9 studies were included, and the findings were organized within the 7 domains. Regarding Condition and Value Proposition, GPs viewed CRs as an effective way to maintain or improve the safety and quality of care they provide. 
When considering the attributes of the Technology, the efficacy of CRs was linked to their frequency, presentation, and the accuracy of their content. Within Adopters, concerns were raised about the accuracy of CRs and the risk that their use could diminish the value of GP experience and contextual understanding. From an Organization perspective, the need for training on the use and benefits of CRs was highlighted. Finally, in the context of the Wider system and their Embedding Over Time, suggestions included sharing best practices for CR use and involving GPs in their design. Conclusions: While GPs acknowledged that CRs, when used optimally, can enhance patient safety and quality of care, several concerns emerged regarding their design, content accuracy, and lack of contextual nuance. Suggestions to improve CR adherence included providing coherent training, enhancing their design, and incorporating more personalized content. Trial Registration: PROSPERO CRD42016029418; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=29418 International Registered Report Identifier (IRRID): RR2-10.1186/s13643-017-0627-z ", doi="10.2196/62763", url="https://www.jmir.org/2025/1/e62763", url="http://www.ncbi.nlm.nih.gov/pubmed/39918864" } @Article{info:doi/10.2196/55929, author="Alvarez Avendano, Alejandro Sebastian and Cochran, Amy and Odeh Couvertier, Valerie and Patterson, Brian and Shah, Manish and Zayas-Caban, Gabriel", title="Revisits, Readmission, and Mortality From Emergency Department Admissions for Older Adults With Vague Presentations: Longitudinal Observational Study", journal="JMIR Aging", year="2025", month="Feb", day="6", volume="8", pages="e55929", keywords="gerontology", keywords="geriatric", keywords="older adults", keywords="elderly", keywords="older people", keywords="aging", keywords="emergency department", keywords="emergency room", keywords="ED", keywords="disposition decision", keywords="disposition", keywords="discharge", keywords="admission", keywords="revisit", keywords="readmission", keywords="observational study", keywords="health", keywords="hospital", abstract="Background: Older adults (65 years and older) often present to the emergency department (ED) with an unclear need for hospitalization, leading to potentially harmful and costly care. This underscores the importance of measuring the trade-off between admission and discharge for these patients in terms of patient outcomes. Objective: This study aimed to measure the relationship between disposition decisions and 3-day, 9-day, and 30-day revisits, readmission, and mortality, using causal inference methods that adjust for potential measured and unmeasured confounding. Methods: A longitudinal observational study (n=3591) was conducted using electronic health records from a large tertiary teaching hospital with an ED between January 1, 2014 and September 27, 2018. The sample consisted of older adult patients with 1 of 6 presentations with significant variability in admission: falls, weakness, syncope, urinary tract infection, pneumonia, and cellulitis. The exposure under consideration was the ED disposition decision (admission to the hospital or discharge). Nine outcome variables were considered: ED revisits, hospital readmission, and mortality within 3, 9, and 30 days of being discharged from either the hospital for admitted patients or the ED for discharged patients. 
Results: Admission was estimated to significantly decrease the risk of an ED revisit after discharge (30-day window: -6.4\%, 95\% CI -7.8 to -5.0), while significantly increasing the risk of hospital readmission (30-day window: 5.8\%, 95\% CI 5.0 to 6.5) and mortality (30-day window: 1.0\%, 95\% CI 0.4 to 1.6). Admission was found to be especially adverse for patients with weakness and pneumonia, and relatively less adverse for older adult patients with falls and syncope. Conclusions: Admission may not be the safe option for older adults with gray area presentations, and while revisits and readmissions are commonly used to evaluate the quality of care in the ED, their divergence suggests that caution should be used when interpreting either in isolation. ", doi="10.2196/55929", url="https://aging.jmir.org/2025/1/e55929" } @Article{info:doi/10.2196/68371, author="Rose, Christian and Shearer, Emily and Woller, Isabela and Foster, Ashley and Ashenburg, Nicholas and Kim, Ireh and Newberry, Jennifer", title="Identifying High-Priority Ethical Challenges for Precision Emergency Medicine: Nominal Group Study", journal="JMIR Form Res", year="2025", month="Feb", day="6", volume="9", pages="e68371", keywords="precision medicine", keywords="emergency medicine", keywords="ethical considerations", keywords="nominal group study", keywords="consensus framework", abstract="Background: Precision medicine promises to revolutionize health care by providing the right care to the right patient at the right time. However, the emergency department's unique mandate to treat ``anyone, anywhere, anytime'' creates critical tensions with precision medicine's requirements for comprehensive patient data and computational analysis. As emergency departments serve as health care's safety net and provide a growing proportion of acute care in America, identifying and addressing the ethical challenges of implementing precision medicine in this setting is crucial to prevent exacerbation of existing health care disparities. The rapid advancement of precision medicine technologies makes it imperative to understand these challenges before widespread implementation in emergency care settings. Objective: This study aimed to identify high priority ethical concerns facing the implementation of precision medicine in the emergency department. Methods: We conducted a qualitative study using a modified nominal group technique (NGT) with emergency physicians who had previous knowledge of precision medicine concepts. The NGT process consisted of four phases: (1) silent generation of ideas, (2) round-robin sharing of ideas, (3) structured discussion and clarification, and (4) thematic grouping of priorities. Participants represented diverse practice settings (county hospital, community hospital, academic center, and integrated managed care consortium) and subspecialties (education, ethics, pediatrics, diversity, equity, inclusion, and informatics) across various career stages from residents to late-career physicians. Results: A total of 12 emergency physicians identified 82 initial challenges during individual ideation, which were consolidated to 48 unique challenges after removing duplicates and combining related items. The average participant contributed 6.8 (SD 2.9) challenges. These challenges were organized into a framework with 3 themes: values, privacy, and justice. 
The framework identified the need to address these themes across 3 time points of the precision medicine process: acquisition of data, actualization in the care setting, and the after effects of its use. This systematic organization revealed interrelated concerns spanning from data collection and bias to implementation challenges and long-term consequences for health care equity. Conclusions: Our study developed a novel framework that maps critical ethical challenges across 3 domains (values, privacy, and justice) and 3 temporal stages of precision medicine implementation. This framework identifies high-priority areas for future research and policy development, particularly around data representation, privacy protection, and equitable access. Successfully addressing these challenges is essential to realize precision medicine's potential while preserving emergency medicine's core mission as health care's safety net. ", doi="10.2196/68371", url="https://formative.jmir.org/2025/1/e68371" } @Article{info:doi/10.2196/58779, author="Liu, Guanghao and Zheng, Shixiang and He, Jun and Zhang, Zi-Mei and Wu, Ruoqiong and Yu, Yingying and Fu, Hao and Han, Li and Zhu, Haibo and Xu, Yichang and Shao, Huaguo and Yan, Haidan and Chen, Ting and Shen, Xiaopei", title="An Easy and Quick Risk-Stratified Early Forewarning Model for Septic Shock in the Intensive Care Unit: Development, Validation, and Interpretation Study", journal="J Med Internet Res", year="2025", month="Feb", day="6", volume="27", pages="e58779", keywords="septic shock", keywords="early forewarning", keywords="risk stratification", keywords="machine learning", abstract="Background: Septic shock (SS) is a syndrome with high mortality. Early forewarning and diagnosis of SS, which are critical in reducing mortality, are still challenging in clinical management. Objective: We propose a simple and fast risk-stratified forewarning model for SS to help physicians recognize patients in time. Moreover, further insights can be gained from the application of the model to improve our understanding of SS. Methods: A total of 5125 patients with sepsis from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) database were divided into training, validation, and test sets. In addition, 2180 patients with sepsis from the eICU Collaborative Research Database (eICU) were used as an external validation set. We developed a simplified risk-stratified early forewarning model for SS based on the weight of evidence and logistic regression, which was compared with multi-feature complex models, and clinical characteristics among risk groups were evaluated. Results: Using only vital signs and rapid arterial blood gas test features according to feature importance, we constructed the Septic Shock Risk Predictor (SORP), with an area under the curve (AUC) of 0.9458 in the test set, which is only slightly lower than that of the optimal multi-feature complex model (0.9651). A median forewarning time of 13 hours was calculated for SS patients. 4 distinct risk groups (high, medium, low, and ultralow) were identified by the SORP 6 hours before onset, and the incidence rates of SS in the 4 risk groups in the postonset interval were 88.6\% (433/489), 34.5\% (262/760), 2.5\% (67/2707), and 0.3\% (4/1301), respectively. The severity increased significantly with increasing risk in both clinical features and survival. 
Clustering analysis demonstrated a high similarity of pathophysiological characteristics between the high-risk patients without SS diagnosis (NS\_HR) and the SS patients, while a significantly worse overall survival was shown in NS\_HR patients. On further exploring the characteristics of the treatment and comorbidities of the NS\_HR group, these patients demonstrated a significantly higher incidence of mean blood pressure <65 mmHg, significantly lower vasopressor use and infused volume, and more severe renal dysfunction. The above findings were further validated by multicenter eICU data. Conclusions: The SORP demonstrated accurate forewarning and a reliable risk stratification capability. Among patients forewarned as high risk, similar pathophysiological phenotypes and high mortality were observed in both those subsequently diagnosed as having SS and those without such a diagnosis. NS\_HR patients, overlooked by the Sepsis-3 definition, may provide further insights into the pathophysiological processes of SS onset and help to complement its diagnosis and precise management. The importance of precise fluid resuscitation management in SS patients with renal dysfunction is further highlighted. For convenience, an online service for the SORP has been provided. ", doi="10.2196/58779", url="https://www.jmir.org/2025/1/e58779" } @Article{info:doi/10.2196/66666, author="Rudin, S. Robert and Herman, M. Patricia and Vining, Robert", title="Addressing the ``Black Hole'' of Low Back Pain Care With Clinical Decision Support: User-Centered Design and Initial Usability Study", journal="JMIR Form Res", year="2025", month="Feb", day="4", volume="9", pages="e66666", keywords="low back pain", keywords="clinical decision support", keywords="user-centered design", keywords="usability", keywords="back pain", keywords="low back pain care", keywords="pain", keywords="clinical decision", keywords="societal burden", keywords="substantial", keywords="burden", keywords="evidence-based", keywords="treatment", keywords="diagnosis", keywords="support tool", keywords="clinicians", keywords="chiropractic", keywords="chiropractor", keywords="reviews", keywords="scenario-based interviews", keywords="interviews", abstract="Background: Low back pain (LBP) is a highly prevalent problem causing substantial personal and societal burden. Although there are specific types of LBP, each with evidence-based treatment recommendations, most patients receive a nonspecific diagnosis that does not facilitate evidence-based and individualized care. Objectives: We designed, developed, and initially tested the usability of a LBP diagnosis and treatment decision support tool based on the available evidence for use by clinicians who treat LBP, with an initial focus on chiropractic care. Methods: Our 3-step user-centered design approach consisted of identifying clinical requirements through the analysis of evidence reviews, iteratively identifying task-based user requirements and developing a working web-based prototype, and evaluating usability through scenario-based interviews and the System Usability Scale. Results: The 5 participating users had an average of 18.5 years of practicing chiropractic medicine. Clinical requirements included 44 patient interview and examination items. Of these, 13 interview items were enabled for all patients and 13 were enabled conditional on other input items. One examination item was enabled for all patients and 16 were enabled conditional on other items. 
One item was a synthesis of interview and examination items. These items provided evidence of 12 possible working diagnoses of which 3 were macrodiagnoses and 9 were microdiagnoses. Each diagnosis had relevant treatment recommendations and corresponding patient educational materials. User requirements focused on tasks related to inputting data, and reviewing and selecting working diagnoses, treatments, and patient education. User input led to key refinements in the design, such as organizing the input questions by microdiagnosis, adding a patient summary screen that persists during data input and when reviewing output, adding more information buttons and graphics to input questions, and providing traceability by highlighting the input items used by the clinical logic to suggest a working diagnosis. Users believed that it would be important to have the tool accessible from within an electronic health record for adoption within their workflows. The System Usability Scale score for the prototype was 84.75 (range: 67.5-95), considered as the top 10th percentile. Users believed that the tool was easy to use although it would require training and practice on the clinical content to use it effectively. With such training and practice, users believed that it would improve care and shed light on the ``black hole'' of LBP diagnosis and treatment. Conclusions: Our systematic process of defining clinical requirements and eliciting user requirements to inform a clinician-facing decision support tool produced a prototype application that was viewed positively and with enthusiasm by clinical users. With further planned development, this tool has the potential to guide clinical evaluation, inform more specific diagnosis, and encourage patient education and individualized treatment planning for patients with LBP through the application of evidence at the point of care. ", doi="10.2196/66666", url="https://formative.jmir.org/2025/1/e66666" } @Article{info:doi/10.2196/63377, author="Giebel, Denk Godwin and Raszke, Pascal and Nowak, Hartmuth and Palmowski, Lars and Adamzik, Michael and Heinz, Philipp and Tokic, Marianne and Timmesfeld, Nina and Brunkhorst, Frank and Wasem, J{\"u}rgen and Blase, Nikola", title="Problems and Barriers Related to the Use of AI-Based Clinical Decision Support Systems: Interview Study", journal="J Med Internet Res", year="2025", month="Feb", day="3", volume="27", pages="e63377", keywords="decision support", keywords="artificial intelligence", keywords="machine learning", keywords="clinical decision support system", keywords="digitalization", keywords="health care", keywords="technology", keywords="innovation", keywords="semistructured interview", keywords="qualitative", keywords="quality assurance", keywords="web-based", keywords="digital health", keywords="health informatics", abstract="Background: Digitalization is currently revolutionizing health care worldwide. A promising technology in this context is artificial intelligence (AI). The application of AI can support health care providers in their daily work in various ways. The integration of AI is particularly promising in clinical decision support systems (CDSSs). While the opportunities of this technology are numerous, the problems should not be overlooked. Objective: This study aimed to identify challenges and barriers in the context of AI-based CDSSs from the perspectives of experts across various disciplines. Methods: Semistructured expert interviews were conducted with different stakeholders. 
These included representatives of patients, physicians and caregivers, developers of AI-based CDSSs, researchers (studying AI in health care and social and health law), quality management and quality assurance representatives, a representative of an ethics committee, a representative of a health insurance fund, and medical product consultants. The interviews took place on the web and were recorded, transcribed, and subsequently subjected to a qualitative content analysis based on the method by Kuckartz. The analysis was conducted using MAXQDA software. Initially, the problems were separated into ``general,'' ``development,'' and ``clinical use.'' Finally, a workshop within the project consortium served to systematize the identified problems. Results: A total of 15 expert interviews were conducted, and 309 expert statements with reference to problems and barriers in the context of AI-based CDSSs were identified. These emerged in 7 problem categories: technology (46/309, 14.9\%), data (59/309, 19.1\%), user (102/309, 33\%), studies (17/309, 5.5\%), ethics (20/309, 6.5\%), law (33/309, 10.7\%), and general (32/309, 10.4\%). The problem categories were further divided into problem areas, which in turn comprised the respective problems. Conclusions: A large number of problems and barriers were identified in the context of AI-based CDSSs. These can be systematized according to the point at which they occur (``general,'' ``development,'' and ``clinical use'') or according to the problem category (``technology,'' ``data,'' ``user,'' ``studies,'' ``ethics,'' ``law,'' and ``general''). The problems identified in this work should be further investigated. They can be used as a basis for deriving solutions to optimize development, acceptance, and use of AI-based CDSSs. International Registered Report Identifier (IRRID): RR2-10.2196/preprints.62704 ", doi="10.2196/63377", url="https://www.jmir.org/2025/1/e63377", url="http://www.ncbi.nlm.nih.gov/pubmed/39899342" } @Article{info:doi/10.2196/62670, author="Heaney-Huls, Krysta and Shams, Rida and Nwefo, Ruth and Kane, Rachel and Gordon, Janna and Laffan, M. Alison and Stare, Scott and Dullabh, Prashila", title="Electronic Health Record Data Collection Practices to Advance Standardization and Interoperability of Patient Preferences for Interpretation Services: Qualitative Study", journal="J Med Internet Res", year="2025", month="Jan", day="31", volume="27", pages="e62670", keywords="health information exchange", keywords="interoperability", keywords="electronic health records", keywords="interpreter", keywords="limited English proficiency", keywords="communication barriers", abstract="Background: Poor health outcomes are well documented among patients with a non-English language preference (NELP). The use of interpreters can improve the quality of care for patients with NELP. Despite a growing and unmet need for interpretation services in the US health care system, rates of interpreter use in the care setting are consistently low. Standardized collection and exchange of patient interpretation needs can improve access to appropriate language assistance services. Objective: This study aims to examine current practices for collecting, documenting, and exchanging information on a patient's self-reported preference for an interpreter in the electronic health record (EHR) and the implementation maturity and adoption level of available data standards. 
The paper identifies standards implementation; data collection workflows; use cases for collecting, documenting, and exchanging information on a patient's self-reported preference for an interpreter; challenges to data collection and use; and opportunities to advance standardization of the interpreter needed data element to facilitate patient-centered care. Methods: We conducted a narrative review to describe the availability of terminology standards to facilitate health care organization documentation of a patient's self-reported preference for an interpreter in the EHR. Key informant discussions with EHR developers, health systems, clinicians, a practice-based research organization, a national standards collaborative, a professional health care association, and Federal agency representatives filled in gaps from the narrative review. Results: The findings indicate that health care organizations value standardized collection and exchange of patient language assistance service needs and preferences. Informants identified three use cases for collecting, documenting, and exchanging information on a patient's self-reported preference for an interpreter, which are (1) person-centered care, (2) transitions of care, and (3) health care administration. The discussions revealed that EHR developers provide a data field for documenting interpreter needed data, which are routinely collected across health care organizations through commonly used data collection workflows. However, this data element is not mapped to standard terminologies, such as Logical Observation Identifiers Names and Codes (LOINC) or Systematized Medical Nomenclature for Medicine--Clinical Terminology (SNOMED-CT), consequently limiting the opportunities to electronically share these data between health systems and community-based organizations. The narrative review and key informant discussions identified three potential challenges to using information on a patient's self-reported preference for an interpreter for person-centered care and quality improvement, which are (1) lack of adoption of available data standards, (2) limited electronic exchange, and (3) patient mistrust. Conclusions: Collecting and documenting patient's self-reported interpreter preferences can improve the quality of services provided, patient care experiences, and equitable health care delivery without invoking a significant burden on the health care system. Although there is routine collection and documentation of patient interpretation needs, the lack of standardization limits the exchange of this information among health care and community-based organizations. ", doi="10.2196/62670", url="https://www.jmir.org/2025/1/e62670", url="http://www.ncbi.nlm.nih.gov/pubmed/39888652" } @Article{info:doi/10.2196/59946, author="Tsai, Chuan-Ching and Kim, Yong Jin and Chen, Qiyuan and Rowell, Brigid and Yang, Jessie X. 
and Kontar, Raed and Whitaker, Megan and Lester, Corey", title="Effect of Artificial Intelligence Helpfulness and Uncertainty on Cognitive Interactions with Pharmacists: Randomized Controlled Trial", journal="J Med Internet Res", year="2025", month="Jan", day="31", volume="27", pages="e59946", keywords="CDSS", keywords="eye-tracking", keywords="medication verification", keywords="uncertainty visualization", keywords="AI helpfulness and accuracy", keywords="artificial intelligence", keywords="cognitive interactions", keywords="clinical decision support system", keywords="cognition", keywords="pharmacists", keywords="medication", keywords="interaction", keywords="decision-making", keywords="cognitive processing", abstract="Background: Clinical decision support systems leveraging artificial intelligence (AI) are increasingly integrated into health care practices, including pharmacy medication verification. Communicating uncertainty in an AI prediction is viewed as an important mechanism for boosting human collaboration and trust. Yet, little is known about the effects on human cognition as a result of interacting with such types of AI advice. Objective: This study aimed to evaluate the cognitive interaction patterns of pharmacists during medication product verification when using an AI prototype. Moreover, we examine the impact of AI's assistance, both helpful and unhelpful, and the communication of uncertainty of AI-generated results on pharmacists' cognitive interaction with the prototype. Methods: In a randomized controlled trial, 30 pharmacists from professional networks each performed 200 medication verification tasks while their eye movements were recorded using an online eye tracker. Participants completed 100 verifications without AI assistance and 100 with AI assistance (either with black box help without uncertainty information or uncertainty-aware help, which displays AI uncertainty). Fixation patterns (first and last areas fixated, number of fixations, fixation duration, and dwell times) were analyzed in relation to AI help type and helpfulness. Results: Pharmacists shifted 19\%-26\% of their total fixations to AI-generated regions when these were available, suggesting the integration of AI advice in decision-making. AI assistance did not reduce the number of fixations on fill images, which remained the primary focus area. Unhelpful AI advice led to longer dwell times on reference and fill images, indicating increased cognitive processing. Displaying AI uncertainty led to longer cognitive processing times as measured by dwell times in original images. Conclusions: Unhelpful AI increases cognitive processing time in the original images. Transparency in AI is needed in ``black box'' systems, but showing more information can add a cognitive burden. Therefore, the communication of uncertainty should be optimized and integrated into clinical workflows using user-centered design to avoid increasing cognitive load or impeding clinicians' original workflow. 
Trial Registration: ClinicalTrials.gov NCT06795477; https://clinicaltrials.gov/study/NCT06795477 ", doi="10.2196/59946", url="https://www.jmir.org/2025/1/e59946", url="http://www.ncbi.nlm.nih.gov/pubmed/39888668" } @Article{info:doi/10.2196/67346, author="Kou, Yanqi and Ye, Shicai and Tian, Yuan and Yang, Ke and Qin, Ling and Huang, Zhe and Luo, Botao and Ha, Yanping and Zhan, Liping and Ye, Ruyin and Huang, Yujie and Zhang, Qing and He, Kun and Liang, Mouji and Zheng, Jieming and Huang, Haoyuan and Wu, Chunyi and Ge, Lei and Yang, Yuping", title="Risk Factors for Gastrointestinal Bleeding in Patients With Acute Myocardial Infarction: Multicenter Retrospective Cohort Study", journal="J Med Internet Res", year="2025", month="Jan", day="30", volume="27", pages="e67346", keywords="acute myocardial infarction", keywords="gastrointestinal bleeding", keywords="machine learning", keywords="in-hospital", keywords="prediction model", abstract="Background: Gastrointestinal bleeding (GIB) is a severe and potentially life-threatening complication in patients with acute myocardial infarction (AMI), significantly affecting prognosis during hospitalization. Early identification of high-risk patients is essential to reduce complications, improve outcomes, and guide clinical decision-making. Objective: This study aimed to develop and validate a machine learning (ML)--based model for predicting in-hospital GIB in patients with AMI, identify key risk factors, and evaluate the clinical applicability of the model for risk stratification and decision support. Methods: A multicenter retrospective cohort study was conducted, including 1910 patients with AMI from the Affiliated Hospital of Guangdong Medical University (2005-2024). Patients were divided into training (n=1575) and testing (n=335) cohorts based on admission dates. For external validation, 1746 patients with AMI were included in the publicly available MIMIC-IV (Medical Information Mart for Intensive Care IV) database. Propensity score matching was adjusted for demographics, and the Boruta algorithm identified key predictors. A total of 7 ML algorithms---logistic regression, k-nearest neighbors, support vector machine, decision tree, random forest (RF), extreme gradient boosting, and neural networks---were trained using 10-fold cross-validation. The models were evaluated for the area under the receiver operating characteristic curve, accuracy, sensitivity, specificity, recall, F1-score, and decision curve analysis. Shapley additive explanations analysis ranked variable importance. Kaplan-Meier survival analysis evaluated the impact of GIB on short-term survival. Multivariate logistic regression assessed the relationship between coronary heart disease (CHD) and in-hospital GIB after adjusting for clinical variables. Results: The RF model outperformed other ML models, achieving an area under the receiver operating characteristic curve of 0.77 in the training cohort, 0.77 in the testing cohort, and 0.75 in the validation cohort. Key predictors included red blood cell count, hemoglobin, maximal myoglobin, hematocrit, CHD, and other variables, all of which were strongly associated with GIB risk. Decision curve analysis demonstrated the clinical use of the RF model for early risk stratification. Kaplan-Meier survival analysis showed no significant differences in 7- and 15-day survival rates between patients with AMI with and without GIB (P=.83 for 7-day survival and P=.87 for 15-day survival). 
Multivariate logistic regression showed that CHD was an independent risk factor for in-hospital GIB (odds ratio 2.79, 95\% CI 2.09-3.74). Stratified analyses by sex, age, occupation, marital status, and other subgroups consistently showed that the association between CHD and GIB remained robust across all subgroups. Conclusions: The ML-based RF model provides a robust and clinically applicable tool for predicting in-hospital GIB in patients with AMI. By leveraging routinely available clinical and laboratory data, the model supports early risk stratification and personalized preventive strategies. ", doi="10.2196/67346", url="https://www.jmir.org/2025/1/e67346" } @Article{info:doi/10.2196/62704, author="Raszke, Pascal and Giebel, Denk Godwin and Abels, Carina and Wasem, J{\"u}rgen and Adamzik, Michael and Nowak, Hartmuth and Palmowski, Lars and Heinz, Philipp and Mreyen, Silke and Timmesfeld, Nina and Tokic, Marianne and Brunkhorst, Martin Frank and Blase, Nikola", title="User-Oriented Requirements for Artificial Intelligence--Based Clinical Decision Support Systems in Sepsis: Protocol for a Multimethod Research Project", journal="JMIR Res Protoc", year="2025", month="Jan", day="30", volume="14", pages="e62704", keywords="medical informatics", keywords="artificial intelligence", keywords="machine learning", keywords="computational intelligence", keywords="clinical decision support systems", keywords="CDSS", keywords="decision support", keywords="sepsis", keywords="bloodstream infection", abstract="Background: Artificial intelligence (AI)--based clinical decision support systems (CDSS) have been developed for several diseases. However, despite the potential to improve the quality of care and thereby positively impact patient-relevant outcomes, the majority of AI-based CDSS have not been adopted in standard care. Possible reasons for this include barriers in the implementation and a nonuser-oriented development approach, resulting in reduced user acceptance. Objective: This research project has 2 objectives. First, problems and corresponding solutions that hinder or support the development and implementation of AI-based CDSS are identified. Second, the research project aims to increase user acceptance by creating a user-oriented requirement profile, using the example of sepsis. Methods: The research project is based on a multimethod approach combining (1) a scoping review, (2) focus groups with physicians and professional caregivers, and (3) semistructured interviews with relevant stakeholders. The research modules mentioned provide the basis for the development of a (4) survey, including a discrete choice experiment (DCE) with physicians. A minimum of 6667 physicians with expertise in the clinical picture of sepsis are contacted for this purpose. The survey is followed by the development of a requirement profile for AI-based CDSS and the derivation of policy recommendations for action, which are evaluated in a (5) expert roundtable discussion. Results: The multimethod research project started in November 2022. It provides an overview of the barriers and corresponding solutions related to the development and implementation of AI-based CDSS. Using sepsis as an example, a user-oriented requirement profile for AI-based CDSS is developed. The scoping review has been concluded and the qualitative modules have been subjected to analysis. The start of the survey, including the DCE, was at the end of July 2024. 
Conclusions: The results of the research project represent the first attempt to create a comprehensive user-oriented requirement profile for the development of sepsis-specific AI-based CDSS. In addition, general recommendations are derived, in order to reduce barriers in the development and implementation of AI-based CDSS. The findings of this research project have the potential to facilitate the integration of AI-based CDSS into standard care in the long term. International Registered Report Identifier (IRRID): DERR1-10.2196/62704 ", doi="10.2196/62704", url="https://www.researchprotocols.org/2025/1/e62704" } @Article{info:doi/10.2196/62865, author="Gautam, Dipak and Kellmeyer, Philipp", title="Exploring the Credibility of Large Language Models for Mental Health Support: Protocol for a Scoping Review", journal="JMIR Res Protoc", year="2025", month="Jan", day="29", volume="14", pages="e62865", keywords="large language model", keywords="LLM", keywords="mental health", keywords="explainability", keywords="credibility", keywords="mobile phone", abstract="Background: The rapid evolution of large language models (LLMs), such as Bidirectional Encoder Representations from Transformers (BERT; Google) and GPT (OpenAI), has introduced significant advancements in natural language processing. These models are increasingly integrated into various applications, including mental health support. However, the credibility of LLMs in providing reliable and explainable mental health information and support remains underexplored. Objective: This scoping review systematically maps the factors influencing the credibility of LLMs in mental health support, including reliability, explainability, and ethical considerations. The review is expected to offer critical insights for practitioners, researchers, and policy makers, guiding future research and policy development. These findings will contribute to the responsible integration of LLMs into mental health care, with a focus on maintaining ethical standards and user trust. Methods: This review follows PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines and the Joanna Briggs Institute (JBI) methodology. Eligibility criteria include studies that apply transformer-based generative language models in mental health support, such as BERT and GPT. Sources include PsycINFO, MEDLINE via PubMed, Web of Science, IEEE Xplore, and ACM Digital Library. A systematic search of studies from 2019 onward will be conducted and updated until October 2024. Data will be synthesized qualitatively. The Population, Concept, and Context framework will guide the inclusion criteria. Two independent reviewers will screen and extract data, resolving discrepancies through discussion. Data will be synthesized and presented descriptively. Results: As of September 2024, this study is currently in progress, with the systematic search completed and the screening phase ongoing. We expect to complete data extraction by early November 2024 and synthesis by late November 2024. Conclusions: This scoping review will map the current evidence on the credibility of LLMs in mental health support. It will identify factors influencing the reliability, explainability, and ethical considerations of these models, providing insights for practitioners, researchers, policy makers, and users. 
These findings will fill a critical gap in the literature and inform future research, practice, and policy development, ensuring the responsible integration of LLMs in mental health services. International Registered Report Identifier (IRRID): DERR1-10.2196/62865 ", doi="10.2196/62865", url="https://www.researchprotocols.org/2025/1/e62865" } @Article{info:doi/10.2196/63809, author="Thomas, Julia and Lucht, Antonia and Segler, Jacob and Wundrack, Richard and Mich{\'e}, Marcel and Lieb, Roselind and Kuchinke, Lars and Meinlschmidt, Gunther", title="An Explainable Artificial Intelligence Text Classifier for Suicidality Prediction in Youth Crisis Text Line Users: Development and Validation Study", journal="JMIR Public Health Surveill", year="2025", month="Jan", day="29", volume="11", pages="e63809", keywords="deep learning", keywords="explainable artificial intelligence (XAI)", keywords="large language model (LLM)", keywords="machine learning", keywords="neural network", keywords="prevention", keywords="risk monitoring", keywords="suicide", keywords="transformer model", keywords="suicidality", keywords="suicidal ideation", keywords="self-murder", keywords="self-harm", keywords="youth", keywords="adolescent", keywords="adolescents", keywords="public health", keywords="language model", keywords="language models", keywords="chat protocols", keywords="crisis helpline", keywords="help-seeking behaviors", keywords="German", keywords="Shapley", keywords="decision-making", keywords="mental health", keywords="health informatics", keywords="mobile phone", abstract="Background: Suicide represents a critical public health concern, and machine learning (ML) models offer the potential for identifying at-risk individuals. Recent studies using benchmark datasets and real-world social media data have demonstrated the capability of pretrained large language models in predicting suicidal ideation and behaviors (SIB) in speech and text. Objective: This study aimed to (1) develop and implement ML methods for predicting SIBs in a real-world crisis helpline dataset, using transformer-based pretrained models as a foundation; (2) evaluate, cross-validate, and benchmark the model against traditional text classification approaches; and (3) train an explainable model to highlight relevant risk-associated features. Methods: We analyzed chat protocols from adolescents and young adults (aged 14-25 years) seeking assistance from a German crisis helpline. An ML model was developed using a transformer-based language model architecture with pretrained weights and long short-term memory layers. The model predicted suicidal ideation (SI) and advanced suicidal engagement (ASE), as indicated by composite Columbia-Suicide Severity Rating Scale scores. We compared model performance against a classical word-vector-based ML model. We subsequently computed discrimination, calibration, clinical utility, and explainability information using a Shapley Additive Explanations value-based post hoc estimation model. Results: The dataset comprised 1348 help-seeking encounters (1011 for training and 337 for testing). The transformer-based classifier achieved a macroaveraged area under the curve (AUC) receiver operating characteristic (ROC) of 0.89 (95\% CI 0.81-0.91) and an overall accuracy of 0.79 (95\% CI 0.73-0.99). This performance surpassed the word-vector-based baseline model (AUC-ROC=0.77, 95\% CI 0.64-0.90; accuracy=0.61, 95\% CI 0.61-0.80). 
The transformer model demonstrated excellent prediction for nonsuicidal sessions (AUC-ROC=0.96, 95\% CI 0.96-0.99) and good prediction for SI and ASE, with AUC-ROCs of 0.85 (95\% CI 0.97-0.86) and 0.87 (95\% CI 0.81-0.88), respectively. The Brier Skill Score indicated a 44\% improvement in classification performance over the baseline model. The Shapley Additive Explanations model identified language features predictive of SIBs, including self-reference, negation, expressions of low self-esteem, and absolutist language. Conclusions: Neural networks using large language model--based transfer learning can accurately identify SI and ASE. The post hoc explainer model revealed language features associated with SI and ASE. Such models may potentially support clinical decision-making in suicide prevention services. Future research should explore multimodal input features and temporal aspects of suicide risk. ", doi="10.2196/63809", url="https://publichealth.jmir.org/2025/1/e63809" } @Article{info:doi/10.2196/64188, author="Jiang, Yiqun and Li, Qing and Huang, Yu-Li and Zhang, Wenli", title="Urgency Prediction for Medical Laboratory Tests Through Optimal Sparse Decision Tree: Case Study With Echocardiograms", journal="JMIR AI", year="2025", month="Jan", day="29", volume="4", pages="e64188", keywords="interpretable machine learning", keywords="urgency prediction", keywords="appointment scheduling", keywords="echocardiogram", keywords="health care management", abstract="Background: In the contemporary realm of health care, laboratory tests stand as cornerstone components, driving the advancement of precision medicine. These tests offer intricate insights into a variety of medical conditions, thereby facilitating diagnosis, prognosis, and treatments. However, the accessibility of certain tests is hindered by factors such as high costs, a shortage of specialized personnel, or geographic disparities, posing obstacles to achieving equitable health care. For example, an echocardiogram is a type of laboratory test that is extremely important and not easily accessible. The increasing demand for echocardiograms underscores the imperative for more efficient scheduling protocols. Despite this pressing need, limited research has been conducted in this area. Objective: The study aims to develop an interpretable machine learning model for determining the urgency of patients requiring echocardiograms, thereby aiding in the prioritization of scheduling procedures. Furthermore, this study aims to glean insights into the pivotal attributes influencing the prioritization of echocardiogram appointments, leveraging the high interpretability of the machine learning model. Methods: Empirical and predictive analyses have been conducted to assess the urgency of patients based on a large real-world echocardiogram appointment dataset (ie, 34,293 appointments) sourced from electronic health records encompassing administrative information, referral diagnosis, and underlying patient conditions. We used a state-of-the-art interpretable machine learning algorithm, the optimal sparse decision tree (OSDT), renowned for its high accuracy and interpretability, to investigate the attributes pertinent to echocardiogram appointments. Results: The method demonstrated satisfactory performance (F1-score=36.18\% with an improvement of 1.7\% and F2-score=28.18\% with an improvement of 0.79\% by the best-performing baseline model) in comparison to the best-performing baseline model. 
Moreover, due to its high interpretability, the results provide valuable medical insights regarding the identification of urgent patients for tests through the extraction of decision rules from the OSDT model. Conclusions: The method demonstrated state-of-the-art predictive performance, affirming its effectiveness. Furthermore, we validate the decision rules derived from the OSDT model by comparing them with established medical knowledge. These interpretable results (eg, attribute importance and decision rules from the OSDT model) underscore the potential of our approach in prioritizing patient urgency for echocardiogram appointments and can be extended to prioritize other laboratory test appointments using electronic health record data. ", doi="10.2196/64188", url="https://ai.jmir.org/2025/1/e64188", url="http://www.ncbi.nlm.nih.gov/pubmed/39879091" } @Article{info:doi/10.2196/63109, author="Ghaffar, Faisal and Furtado, M. Nadine and Ali, Imad and Burns, Catherine", title="Diagnostic Decision-Making Variability Between Novice and Expert Optometrists for Glaucoma: Comparative Analysis to Inform AI System Design", journal="JMIR Med Inform", year="2025", month="Jan", day="29", volume="13", pages="e63109", keywords="decision-making", keywords="human-centered AI design", keywords="human factors", keywords="experts versus novices differences", keywords="optometry", keywords="glaucoma diagnosis", keywords="experts versus novices", keywords="glaucoma", keywords="eye disease", keywords="vision", keywords="vision impairment", keywords="comparative analysis", keywords="methodology", keywords="optometrist", keywords="artificial intelligence", keywords="AI", keywords="diagnostic accuracy", keywords="consistency", keywords="clinical data", keywords="risk assessment", keywords="progression analysis", abstract="Background: While expert optometrists tend to rely on a deep understanding of the disease and intuitive pattern recognition, those with less experience may depend more on extensive data, comparisons, and external guidance. Understanding these variations is important for developing artificial intelligence (AI) systems that can effectively support optometrists with varying degrees of experience and minimize decision inconsistencies. Objective: The main objective of this study is to identify and analyze the variations in diagnostic decision-making approaches between novice and expert optometrists. By understanding these variations, we aim to provide guidelines for the development of AI systems that can support optometrists with varying levels of expertise. These guidelines will assist in developing AI systems for glaucoma diagnosis, ultimately enhancing the diagnostic accuracy of optometrists and minimizing inconsistencies in their decisions. Methods: We conducted in-depth interviews with 14 optometrists using within-subject design, including both novices and experts, focusing on their approaches to glaucoma diagnosis. The responses were coded and analyzed using a mixed method approach incorporating both qualitative and quantitative analysis. Statistical tests such as Mann-Whitney U and chi-square tests were used to find significance in intergroup variations. These findings were further supported by themes extracted through qualitative analysis, which helped to identify decision-making patterns and understand variations in their approaches. 
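As a hedged sketch of the intergroup tests named above (invented placeholder data, not the study's interview ratings), the Mann-Whitney U and chi-square comparisons can be run as follows:

```python
# Hedged illustration: Mann-Whitney U for ordinal assessment scores and a
# chi-square test for categorical follow-up-time choices. All values invented.
from scipy.stats import mannwhitneyu, chi2_contingency

novice_scores = [3, 4, 2, 5, 3, 4, 2]   # hypothetical per-optometrist ratings
expert_scores = [5, 6, 5, 7, 6, 5, 6]
u_stat, p_u = mannwhitneyu(novice_scores, expert_scores, alternative='two-sided')

# rows: novice/expert; columns: shorter vs longer recommended follow-up interval
table = [[12, 3],
         [4, 11]]
chi2, p_chi, dof, expected = chi2_contingency(table)

print(f'Mann-Whitney U={u_stat:.1f} (P={p_u:.3f}); chi-square={chi2:.2f} (P={p_chi:.4f})')
```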
Results: Both groups showed lower concordance rates with clinical diagnosis, with experts showing almost double (7/35, 20\%) concordance rates with limited data in comparison to novices (7/69, 10\%), highlighting the impact of experience and data availability on clinical judgment; this rate increased to nearly 40\% for both groups (experts: 5/12, 42\% and novices: 8/21, 42\%) when they had access to complete historical data of the patient. We also found statistically significant intergroup differences between the first visits and subsequent visits with a P value of less than .05 on the Mann-Whitney U test in many assessments. Furthermore, approaches to the exam assessment and decision differed significantly: experts emphasized comprehensive risk assessments and progression analysis, demonstrating cognitive efficiency and intuitive decision-making, while novices relied more on structured, analytical methods and external references. Additionally, significant variations in patient follow-up times were observed, with a P value of <.001 on the chi-square test, showing a stronger influence of experience on follow-up time decisions. Conclusions: The study highlights significant variations in the decision-making process of novice and expert optometrists in glaucoma diagnosis, with experience playing a key role in accuracy, approach, and management. These findings demonstrate the critical need for AI systems tailored to varying levels of expertise. They also provide insights for the future design of AI systems aimed at enhancing the diagnostic accuracy of optometrists and consistency across different expertise levels, ultimately improving patient outcomes in optometric practice. ", doi="10.2196/63109", url="https://medinform.jmir.org/2025/1/e63109", url="http://www.ncbi.nlm.nih.gov/pubmed/39879089" } @Article{info:doi/10.2196/60653, author="Jones, Tudor Owain and Calanzani, Natalia and Scott, E. Suzanne and Matin, N. Rubeta and Emery, Jon and Walter, M. Fiona", title="User and Developer Views on Using AI Technologies to Facilitate the Early Detection of Skin Cancers in Primary Care Settings: Qualitative Semistructured Interview Study", journal="JMIR Cancer", year="2025", month="Jan", day="28", volume="11", pages="e60653", keywords="artificial intelligence", keywords="AI", keywords="machine learning", keywords="ML", keywords="primary care", keywords="skin cancer", keywords="melanoma", keywords="qualitative research", keywords="mobile phone", abstract="Background: Skin cancers, including melanoma and keratinocyte cancers, are among the most common cancers worldwide, and their incidence is rising in most populations. Earlier detection of skin cancer leads to better outcomes for patients. Artificial intelligence (AI) technologies have been applied to skin cancer diagnosis, but many technologies lack clinical evidence and/or the appropriate regulatory approvals. There are few qualitative studies examining the views of relevant stakeholders or evidence about the implementation and positioning of AI technologies in the skin cancer diagnostic pathway. Objective: This study aimed to understand the views of several stakeholder groups on the use of AI technologies to facilitate the early diagnosis of skin cancer, including patients, members of the public, general practitioners, primary care nurse practitioners, dermatologists, and AI researchers. Methods: This was a qualitative, semistructured interview study with 29 stakeholders. 
Participants were purposively sampled based on age, sex, and geographical location. We conducted the interviews via Zoom between September 2022 and May 2023. Transcribed recordings were analyzed using thematic framework analysis. The framework for the Nonadoption, Abandonment, and Challenges to Scale-Up, Spread, and Sustainability was used to guide the analysis to help understand the complexity of implementing diagnostic technologies in clinical settings. Results: Major themes were ``the position of AI in the skin cancer diagnostic pathway'' and ``the aim of the AI technology''; cross-cutting themes included trust, usability and acceptability, generalizability, evaluation and regulation, implementation, and long-term use. There was no clear consensus on where AI should be placed along the skin cancer diagnostic pathway, but most participants saw the technology in the hands of either patients or primary care practitioners. Participants were concerned about the quality of the data used to develop and test AI technologies and the impact this could have on their accuracy in clinical use with patients from a range of demographics and the risk of missing skin cancers. Ease of use and not increasing the workload of already strained health care services were important considerations for participants. Health care professionals and AI researchers reported a lack of established methods of evaluating and regulating AI technologies. Conclusions: This study is one of the first to examine the views of a wide range of stakeholders on the use of AI technologies to facilitate early diagnosis of skin cancer. The optimal approach and position in the diagnostic pathway for these technologies have not yet been determined. AI technologies need to be developed and implemented carefully and thoughtfully, with attention paid to the quality and representativeness of the data used for development, to achieve their potential. ", doi="10.2196/60653", url="https://cancer.jmir.org/2025/1/e60653" } @Article{info:doi/10.2196/63241, author="Baetzner, Sabine Anke and Hill, Yannick and Roszipal, Benjamin and Gerwann, Sol{\`e}ne and Beutel, Matthias and Birrenbach, Tanja and Karlseder, Markus and Mohr, Stefan and Salg, Alexander Gabriel and Schrom-Feiertag, Helmut and Frenkel, Ottilie Marie and Wrzus, Cornelia", title="Mass Casualty Incident Training in Immersive Virtual Reality: Quasi-Experimental Evaluation of Multimethod Performance Indicators", journal="J Med Internet Res", year="2025", month="Jan", day="27", volume="27", pages="e63241", keywords="prehospital decision-making", keywords="disaster medicine", keywords="emergency medicine", keywords="mass casualty incident", keywords="medical education", keywords="eye tracking", keywords="emergency simulation", keywords="virtual reality", abstract="Background: Immersive virtual reality (iVR) has emerged as a training method to prepare medical first responders (MFRs) for mass casualty incidents (MCIs) and disasters in a resource-efficient, flexible, and safe manner. However, systematic evaluations and validations of potential performance indicators for virtual MCI training are still lacking. Objective: This study aimed to investigate whether different performance indicators based on visual attention, triage performance, and information transmission can be effectively extended to MCI training in iVR by testing if they can discriminate between different levels of expertise. 
Furthermore, the study examined the extent to which such objective indicators correlate with subjective performance assessments. Methods: A total of 76 participants (mean age 25.54, SD 6.01 y; 45/76, 59\% male) with different medical expertise (MFRs: paramedics and emergency physicians; non-MFRs: medical students, in-hospital nurses, and other physicians) participated in 5 virtual MCI scenarios of varying complexity in a randomized order. Tasks involved assessing the situation, triaging virtual patients, and transmitting relevant information to a control center. Performance indicators included eye-tracking--based visual attention, triage accuracy, triage speed, information transmission efficiency, and self-assessment of performance. Expertise was determined based on the occupational group (39/76, 51\% MFRs vs 37/76, 49\% non-MFRs) and a knowledge test with patient vignettes. Results: Triage accuracy (d=0.48), triage speed (d=0.42), and information transmission efficiency (d=1.13) differentiated significantly between MFRs and non-MFRs. In addition, higher triage accuracy was significantly associated with higher triage knowledge test scores (Spearman $\rho$=0.40). Visual attention was not significantly associated with expertise. Furthermore, subjective performance was not correlated with any other performance indicator. Conclusions: iVR-based MCI scenarios proved to be a valuable tool for assessing the performance of MFRs. The results suggest that iVR could be integrated into current MCI training curricula to provide frequent, objective, and potentially (partly) automated performance assessments in a controlled environment. In particular, performance indicators, such as triage accuracy, triage speed, and information transmission efficiency, capture multiple aspects of performance and are recommended for integration. While the examined visual attention indicators did not function as valid performance indicators in this study, future research could further explore visual attention in MCI training and examine other indicators, such as holistic gaze patterns. Overall, the results underscore the importance of integrating objective indicators to enhance trainers' feedback and provide trainees with guidance on evaluating and reflecting on their own performance. ", doi="10.2196/63241", url="https://www.jmir.org/2025/1/e63241" } @Article{info:doi/10.2196/58981, author="Rowley, AK Elizabeth and Mitchell, K. Patrick and Yang, Duck-Hye and Lewis, Ned and Dixon, E. Brian and Vazquez-Benitez, Gabriela and Fadel, F. William and Essien, J. Inih and Naleway, L. Allison and Stenehjem, Edward and Ong, C. Toan and Gaglani, Manjusha and Natarajan, Karthik and Embi, Peter and Wiegand, E. Ryan and Link-Gelles, Ruth and Tenforde, W. Mark and Fireman, Bruce", title="Methods to Adjust for Confounding in Test-Negative Design COVID-19 Effectiveness Studies: Simulation Study", journal="JMIR Form Res", year="2025", month="Jan", day="27", volume="9", pages="e58981", keywords="disease risk score", keywords="propensity score", keywords="vaccine effectiveness", keywords="COVID-19", keywords="simulation study", keywords="usefulness", keywords="comorbidity", keywords="assessment", abstract="Background: Real-world COVID-19 vaccine effectiveness (VE) studies are investigating exposures of increasing complexity accounting for time since vaccination. These studies require methods that adjust for the confounding that arises when morbidities and demographics are associated with vaccination and the risk of outcome events. 
Methods based on propensity scores (PS) are well-suited to this when the exposure is dichotomous, but present challenges when the exposure is multinomial. Objective: This simulation study aimed to investigate alternative methods to adjust for confounding in VE studies that have a test-negative design. Methods: Adjustment for a disease risk score (DRS) is compared with multivariable logistic regression. Both stratification on the DRS and direct covariate adjustment of the DRS are examined. Multivariable logistic regression with all the covariates and with a limited subset of key covariates is considered. The performance of VE estimators is evaluated across a multinomial vaccination exposure in simulated datasets. Results: Bias in VE estimates from multivariable models ranged from --5.3\% to 6.1\% across 4 levels of vaccination. Standard errors of VE estimates were unbiased, and 95\% coverage probabilities were attained in most scenarios. The lowest coverage in the multivariable scenarios was 93.7\% (95\% CI 92.2\%-95.2\%) and occurred in the multivariable model with key covariates, while the highest coverage in the multivariable scenarios was 95.3\% (95\% CI 94.0\%-96.6\%) and occurred in the multivariable model with all covariates. Bias in VE estimates from DRS-adjusted models was low, ranging from --2.2\% to 4.2\%. However, the DRS-adjusted models underestimated the standard errors of VE estimates, with coverage sometimes below the 95\% level. The lowest coverage in the DRS scenarios was 87.8\% (95\% CI 85.8\%-89.8\%) and occurred in the direct adjustment for the DRS model. The highest coverage in the DRS scenarios was 94.8\% (95\% CI 93.4\%-96.2\%) and occurred in the model that stratified on DRS. Although variation in the performance of VE estimates occurred across modeling strategies, variation in performance was also present across exposure groups. Conclusions: Overall, models using a DRS to adjust for confounding performed adequately but not as well as the multivariable models that adjusted for covariates individually. ", doi="10.2196/58981", url="https://formative.jmir.org/2025/1/e58981" } @Article{info:doi/10.2196/64649, author="Liu, Weiqi and Wu, You and Zheng, Zhuozhao and Bittle, Mark and Yu, Wei and Kharrazi, Hadi", title="Enhancing Diagnostic Accuracy of Lung Nodules in Chest Computed Tomography Using Artificial Intelligence: Retrospective Analysis", journal="J Med Internet Res", year="2025", month="Jan", day="27", volume="27", pages="e64649", keywords="artificial intelligence", keywords="diagnostic accuracy", keywords="lung nodule", keywords="radiology", keywords="AI system", abstract="Background: Uncertainty in the diagnosis of lung nodules is a challenge for both patients and physicians. Artificial intelligence (AI) systems are increasingly being integrated into medical imaging to assist diagnostic procedures. However, the accuracy of AI systems in identifying and measuring lung nodules on chest computed tomography (CT) scans remains unclear, which requires further evaluation. Objective: This study aimed to evaluate the impact of an AI-assisted diagnostic system on the diagnostic efficiency of radiologists. It specifically examined the report modification rates and missed and misdiagnosed rates of junior radiologists with and without AI assistance. Methods: We obtained effective data from 12,889 patients in 2 tertiary hospitals in Beijing before and after the implementation of the AI system, covering the period from April 2018 to March 2022. 
Diagnostic reports written by both junior and senior radiologists were included in each case. Using reports by senior radiologists as a reference, we compared the modification rates of reports written by junior radiologists with and without AI assistance. We further evaluated alterations in lung nodule detection capability over 3 years after the integration of the AI system. Evaluation metrics of this study include lung nodule detection rate, accuracy, false negative rate, false positive rate, and positive predictive value. The statistical analyses included descriptive statistics and chi-square, Cochran-Armitage, and Mann-Kendall tests. Results: The AI system was implemented in Beijing Anzhen Hospital (Hospital A) in January 2019 and Tsinghua Changgung Hospital (Hospital C) in June 2021. The modification rate of diagnostic reports in the detection of lung nodules increased from 4.73\% to 7.23\% ($\chi^2_1$=12.15; P<.001) at Hospital A. In terms of lung nodule detection rates postimplementation, Hospital C increased from 46.19\% to 53.45\% ($\chi^2_1$=25.48; P<.001) and Hospital A increased from 39.29\% to 55.22\% ($\chi^2_1$=122.55; P<.001). At Hospital A, the false negative rate decreased from 8.4\% to 5.16\% ($\chi^2_1$=9.85; P=.002), while the false positive rate increased from 2.36\% to 9.77\% ($\chi^2_1$=53.48; P<.001). The detection accuracy demonstrated a decrease from 93.33\% to 92.23\% for Hospital A and from 95.27\% to 92.77\% for Hospital C. Regarding the changes in lung nodule detection capability over a 3-year period following the integration of the AI system, the detection rates for lung nodules exhibited a modest increase from 54.6\% to 55.84\%, while the overall accuracy demonstrated a slight improvement from 92.79\% to 93.92\%. Conclusions: The AI system enhanced lung nodule detection, offering the possibility of earlier disease identification and timely intervention. Nevertheless, the initial reduction in accuracy underscores the need for standardized diagnostic criteria and comprehensive training for radiologists to maximize the effectiveness of AI-enabled diagnostic systems. ", doi="10.2196/64649", url="https://www.jmir.org/2025/1/e64649" } @Article{info:doi/10.2196/63548, author="Auf, Hassan and Svedberg, Petra and Nygren, Jens and Nair, Monika and Lundgren, E. Lina", title="The Use of AI in Mental Health Services to Support Decision-Making: Scoping Review", journal="J Med Internet Res", year="2025", month="Jan", day="24", volume="27", pages="e63548", keywords="artificial intelligence", keywords="AI", keywords="mental health", keywords="decision-making", keywords="shared decision-making", keywords="implementation", keywords="human-computer interaction", abstract="Background: Recent advancements in artificial intelligence (AI) have changed the care processes in mental health, particularly in decision-making support for health care professionals and individuals with mental health problems. AI systems provide support in several domains of mental health, including early detection, diagnostics, treatment, and self-care. The use of AI systems in care flows faces several challenges in relation to decision-making support, stemming from technology, end-user, and organizational perspectives with the AI disruption of care processes. 
Objective: This study aims to explore the use of AI systems in mental health to support decision-making, focusing on 3 key areas: the characteristics of research on AI systems in mental health; the current applications, decisions, end users, and user flow of AI systems to support decision-making; and the evaluation of AI systems for the implementation of decision-making support, including elements influencing the long-term use. Methods: A scoping review of empirical evidence was conducted across 5 databases: PubMed, Scopus, PsycINFO, Web of Science, and CINAHL. The searches were restricted to peer-reviewed articles published in English after 2011. The initial screening at the title and abstract level was conducted by 2 reviewers, followed by full-text screening based on the inclusion criteria. Data were then charted and prepared for data analysis. Results: Of a total of 1217 articles, 12 (0.99\%) met the inclusion criteria. These studies predominantly originated from high-income countries. The AI systems were used in health care, self-care, and hybrid care contexts, addressing a variety of mental health problems. Three types of AI systems were identified in terms of decision-making support: diagnostic and predictive AI, treatment selection AI, and self-help AI. The dynamics of the type of end-user interaction and system design were diverse in complexity for the integration and use of the AI systems to support decision-making in care processes. The evaluation of the use of AI systems highlighted several challenges impacting the implementation and functionality of the AI systems in care processes, including factors affecting accuracy, increase of demand, trustworthiness, patient-physician communication, and engagement with the AI systems. Conclusions: The design, development, and implementation of AI systems to support decision-making present substantial challenges for the sustainable use of this technology in care processes. The empirical evidence shows that the evaluation of the use of AI systems in mental health is still in its early stages, with need for more empirically focused research on real-world use. The key aspects requiring further investigation include the evaluation of the use of AI-supported decision-making from human-AI interaction and human-computer interaction perspectives, longitudinal implementation studies of AI systems in mental health to assess the use, and the integration of shared decision-making in AI systems. ", doi="10.2196/63548", url="https://www.jmir.org/2025/1/e63548" } @Article{info:doi/10.2196/56155, author="Lin, Tai-Han and Chung, Hsing-Yi and Jian, Ming-Jr and Chang, Chih-Kai and Lin, Hung-Hsin and Yen, Chiung-Tzu and Tang, Sheng-Hui and Pan, Pin-Ching and Perng, Cherng-Lih and Chang, Feng-Yee and Chen, Chien-Wen and Shang, Hung-Sheng", title="AI-Driven Innovations for Early Sepsis Detection by Combining Predictive Accuracy With Blood Count Analysis in an Emergency Setting: Retrospective Study", journal="J Med Internet Res", year="2025", month="Jan", day="24", volume="27", pages="e56155", keywords="sepsis", keywords="artificial intelligence", keywords="critical care", keywords="complete blood count analysis", keywords="CBC analysis", keywords="artificial intelligence clinical decision support systems", keywords="AI-CDSS", abstract="Background: Sepsis, a critical global health challenge, accounted for approximately 20\% of worldwide deaths in 2017. 
Although the Sequential Organ Failure Assessment (SOFA) score standardizes the diagnosis of organ dysfunction, early sepsis detection remains challenging due to its insidious symptoms. Current diagnostic methods, including clinical assessments and laboratory tests, frequently lack the speed and specificity needed for timely intervention, particularly in vulnerable populations such as older adults, intensive care unit (ICU) patients, and those with compromised immune systems. While bacterial cultures remain vital, their time-consuming nature and susceptibility to false negatives limit their effectiveness. Even promising existing machine learning approaches are restricted by reliance on complex clinical factors that could delay results, underscoring the need for faster, simpler, and more reliable diagnostic strategies. Objective: This study introduces innovative machine learning models using complete blood count with differential (CBC+DIFF) data---a routine, minimally invasive test that assesses immune response through blood cell measurements, critical for sepsis identification. The primary objective was to implement this model within an artificial intelligence--clinical decision support system (AI-CDSS) to enhance early sepsis detection and management in critical care settings. Methods: This retrospective study at Tri-Service General Hospital (September to December 2023) analyzed 746 ICU patients with suspected pneumonia-induced sepsis (supported by radiographic evidence and a SOFA score increase of ≥2 points), alongside 746 stable outpatients as controls. Sepsis infection sources were confirmed through positive sputum, blood cultures, or FilmArray results. The dataset incorporated both basic hematological factors and advanced neutrophil characteristics (side scatter light intensity, cytoplasmic complexity, and neutrophil-to-lymphocyte ratio), with data from September to November used for training and data from December used for validation. Machine learning models, including light gradient boosting machine (LGBM), random forest classifier, and gradient boosting classifier, were developed using CBC+DIFF data and were assessed using metrics such as area under the curve, sensitivity, and specificity. The best-performing model was integrated into the AI-CDSS, with its implementation supported through workshops and training sessions. Results: Pathogen identification in ICU patients found 243 FilmArray-positive, 411 culture-positive, and 92 undetected cases, yielding a final dataset of 654 (43.8\%) sepsis cases out of 1492 total cases. The machine learning models demonstrated high predictive accuracy, with LGBM achieving the highest area under the curve (0.90), followed by the random forest classifier (0.89) and gradient boosting classifier (0.88). The best-performing LGBM model was selected and integrated as the core of our AI-CDSS, which was built on a web interface to facilitate rapid sepsis risk assessment using CBC+DIFF data. Conclusions: This study demonstrates that by providing streamlined predictions using CBC+DIFF data without requiring extensive clinical parameters, the AI-CDSS can be seamlessly integrated into clinical workflows, enhancing rapid, accurate identification of sepsis and improving patient care and treatment timeliness. 
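A minimal sketch, under stated assumptions (synthetic CBC+DIFF-style features, hypothetical feature names, and the lightgbm package installed), of training and scoring a gradient-boosted classifier in the spirit of the model selection described above:

```python
# Hedged illustration only, not the hospital's AI-CDSS pipeline: fit a LightGBM
# classifier on synthetic CBC+DIFF-like features and report a held-out AUC.
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 1492
X = np.column_stack([
    rng.normal(10, 3, n),    # e.g., white blood cell count (hypothetical)
    rng.normal(60, 15, n),   # e.g., neutrophil percentage (hypothetical)
    rng.normal(5, 4, n),     # e.g., neutrophil-to-lymphocyte ratio (hypothetical)
    rng.normal(140, 20, n),  # e.g., side scatter light intensity (hypothetical)
])
y = (X[:, 2] + rng.normal(0, 2, n) > 6).astype(int)   # synthetic sepsis label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LGBMClassifier(n_estimators=200, learning_rate=0.05).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f'held-out AUC = {auc:.2f}')
```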
", doi="10.2196/56155", url="https://www.jmir.org/2025/1/e56155" } @Article{info:doi/10.2196/56946, author="Cabon, Sandie and Brihi, Sarra and Fezzani, Riadh and Pierre-Jean, Morgane and Cuggia, Marc and Bouzill{\'e}, Guillaume", title="Combining a Risk Factor Score Designed From Electronic Health Records With a Digital Cytology Image Scoring System to Improve Bladder Cancer Detection: Proof-of-Concept Study", journal="J Med Internet Res", year="2025", month="Jan", day="22", volume="27", pages="e56946", keywords="bladder cancer", keywords="clinical data reuse", keywords="multimodal data fusion", keywords="clinical decision support", keywords="machine learning", keywords="risk factors", keywords="electronic health records", keywords="detection", keywords="mortality", keywords="therapeutic intervention", keywords="diagnostic tools", keywords="digital cytology", keywords="image-based model", keywords="clinical data", keywords="algorithms", keywords="patient", keywords="biological information", abstract="Background: To reduce the mortality related to bladder cancer, efforts need to be concentrated on early detection of the disease for more effective therapeutic intervention. Strong risk factors (eg, smoking status, age, professional exposure) have been identified, and some diagnostic tools (eg, by way of cystoscopy) have been proposed. However, to date, no fully satisfactory (noninvasive, inexpensive, high-performance) solution for widespread deployment has been proposed. Some new models based on cytology image classification were recently developed and bring good perspectives, but there are still avenues to explore to improve their performance. Objective: Our team aimed to evaluate the benefit of combining the reuse of massive clinical data to build a risk factor model and a digital cytology image--based model (VisioCyt) for bladder cancer detection. Methods: The first step relied on designing a predictive model based on clinical data (ie, risk factors identified in the literature) extracted from the clinical data warehouse of the Rennes Hospital and machine learning algorithms (logistic regression, random forest, and support vector machine). It provides a score corresponding to the risk of developing bladder cancer based on the patient's clinical profile. Second, we investigated 3 strategies (logistic regression, decision tree, and a custom strategy based on score interpretation) to combine the model's score with the score from an image-based model to produce a robust bladder cancer scoring system. Results: We collected 2 data sets. The first set, including clinical data for 5422 patients extracted from the clinical data warehouse, was used to design the risk factor--based model. The second set was used to measure the models' performances and was composed of data for 620 patients from a clinical trial for which cytology images and clinicobiological features were collected. With this second data set, the combination of both models obtained areas under the curve of 0.82 on the training set and 0.83 on the test set, demonstrating the value of combining risk factor--based and image-based models. This combination offers a higher associated risk of cancer than VisioCyt alone for all classes, especially for low-grade bladder cancer. Conclusions: These results demonstrate the value of combining clinical and biological information, especially to improve detection of low-grade bladder cancer. 
Some improvements will need to be made to the automatic extraction of clinical features to make the risk factor--based model more robust. However, as of now, the results support the assumption that this type of approach will be of benefit to patients. ", doi="10.2196/56946", url="https://www.jmir.org/2025/1/e56946" } @Article{info:doi/10.2196/60669, author="Hoang, Uy and Agrawal, Utkarsh and Ord{\'o}{\~n}ez-Mena, Manuel Jos{\'e} and Marcum, Zachary and Radin, Jennifer and Araujo, Andre and Panozzo, A. Catherine and Balogh, Orsolya and Desai, Mihir and Eltayeb, Ahreej and Lu, Tianyi and Nicodemo, Catia and Gu, Xinchun and Goudie, Rosalind and Fan, Xuejuan and Button, Elizabeth and Smylie, Jessica and Joy, Mark and Jamie, Gavin and Elson, William and Byford, Rachel and Madia, Joan and Anand, Sneha and Ferreira, Filipa and Petrou, Stavros and Martin, David and de Lusignan, Simon", title="Clinical Characteristics of Virologically Confirmed Respiratory Syncytial Virus in English Primary Care: Protocol for an Observational Study of Acute Respiratory Infection", journal="JMIR Res Protoc", year="2025", month="Jan", day="22", volume="14", pages="e60669", keywords="infectious diseases", keywords="primary care", keywords="sentinel surveillance", keywords="point-of-care system", keywords="virologically", keywords="respiratory syncytial virus", keywords="acute respiratory infection", keywords="clinical characteristics", keywords="community dwelling", keywords="adult", keywords="vaccination", keywords="programme", keywords="united kingdom", keywords="incidence", keywords="elderly", abstract="Background: There are gaps in our understanding of the clinical characteristics and disease burden of the respiratory syncytial virus (RSV) among community-dwelling adults. This is in part due to a lack of routine testing at the point of care. More data would enhance our assessment of the need for an RSV vaccination program for adults in the United Kingdom. Objective: This study aimed to implement point-of-care testing (POCT) in primary care to describe the incidence, clinical presentation, risk factors, and economic burden of RSV among adults presenting with acute respiratory infection. Methods: We are recruiting up to 3600 patients from at least 21 practices across England to participate in the Royal College of General Practitioners Research Surveillance Centre. Practices are selected if they undertake reference virology sampling for the Royal College of General Practitioners Research Surveillance Centre and had previous experience with respiratory illness studies. Any adult, ≥40 years old, presenting with acute respiratory infection with onset ≤10 days, but without RSV within the past 28 days, will be eligible to participate. We will estimate the incidence proportion of RSV, describe the clinical features and risk factors of patients with RSV infection, and measure the economic burden of RSV infection. Results: A total of 25 practices across different English health administrative regions expressed interest and were recruited to participate. We have created and tested an educational program to deploy POCT for RSV in primary care. In addition to using the POCT device, we provide suggestions about how to integrate POCT into primary care workflow and templates for high-quality data recording of diagnosis, symptoms, and signs. In the 2023-2024 winter, RSV detection in the sentinel network grew between October and late November. 
According to data from the UK Health Security Agency, the peak RSV swab positivity was in International Organization for Standardization (ISO) week 48, 2023. Data collection remains ongoing, and results from the subset of practices participating in this study are not yet available. Conclusions: This study will provide data on the RSV incidence in the community as well as rapid information to inform sentinel surveillance and vaccination programs. This information could potentially improve clinical decision-making. International Registered Report Identifier (IRRID): DERR1-10.2196/60669 ", doi="10.2196/60669", url="https://www.researchprotocols.org/2025/1/e60669", url="http://www.ncbi.nlm.nih.gov/pubmed/39841515" } @Article{info:doi/10.2196/59203, author="M{\"o}llmann, Louise Henriette and Alhammadi, Eman and Boulghoudan, Soufian and Kuhlmann, Julian and Mevissen, Anica and Olbrich, Philipp and Rahm, Louisa and Frohnhofen, Helmut", title="Assessment of Geriatric Problems and Risk Factors for Delirium in Surgical Medicine: Protocol for Multidisciplinary Prospective Clinical Study", journal="JMIR Res Protoc", year="2025", month="Jan", day="22", volume="14", pages="e59203", keywords="delirium", keywords="older patients", keywords="perioperative assessment", keywords="age-related surgical risk factors", keywords="geriatric assessment", keywords="gerontology", keywords="aging", keywords="surgical medicine", keywords="surgical care", keywords="surgery", keywords="multidisciplinary", keywords="prospective study", keywords="perioperative", keywords="screening", keywords="palliative care", keywords="health informatics", abstract="Background: An aging population in combination with more gentle and less stressful surgical procedures leads to an increased number of operations on older patients. This collectively raises novel challenges due to higher age heavily impacting treatment. A major problem, emerging in up to 50\% of cases, is perioperative delirium. It is thus vital to understand whether and which existing geriatric assessments are capable of reliably identifying risk factors, how high the incidence of delirium is, and whether the resulting management of these risk factors might lead to a reduced incidence of delirium. Objective: This study aimed to determine the frequency and severity of geriatric medical problems in elective patients of the Clinics of Oral and Maxillofacial Surgery, Vascular Surgery, and Orthopedics, General Surgery, and Trauma Surgery, revealing associations with the incidence of perioperative delirium regarding potential risk factors, and recording the long-term effects of geriatric problems and any perioperative delirium that might have developed later in the patient's life. Methods: We performed both pre- and postoperative assessments in patients of 4 different surgical departments who are older than 70 years. Patient-validated screening instruments will be used to identify risk factors. 
A geriatric assessment with the content of basal and instrumental activities of daily living (basal activities of daily living [Katz index], instrumental activities of daily living [Lawton and Brody score], cognition [6-item screener and clock drawing test], mobility [de Morton Mobility Index and Sit-to-Stand test], sleep [Pittsburgh Sleep Quality Index and Insomnia Severity Index/STOP-BANG], drug therapy [polypharmacy and quality of medication, Fit For The Aged classification, and anticholinergic burden score], and pain assessment and delirium risk (Delirium Risk Assessment Tool) will be performed. Any medical problems detected will be treated according to current standards, and no intervention is planned as part of the study. In addition, a telephone follow-up will be performed 3, 6, and 12 months after discharge. Results: Recruitment started in August 2022, with 421 patients already recruited at the time of submission. Initial analyses of the data are to be published at the end of 2024 or the beginning of 2025. Conclusions: In the current study, we investigate whether the risk factors addressed in the assessment are associated with an increase in the delirium rate. The aim is then to reduce this comprehensive assessment to the central aspects to be able to conduct targeted and efficient risk screening. Trial Registration: German Clinical Trials Registry DRKS00028614; https://www.drks.de/search/de/trial/DRKS00028614 International Registered Report Identifier (IRRID): DERR1-10.2196/59203 ", doi="10.2196/59203", url="https://www.researchprotocols.org/2025/1/e59203", url="http://www.ncbi.nlm.nih.gov/pubmed/39841510" } @Article{info:doi/10.2196/66612, author="Xiong, Xiaojuan and Fu, Hong and Xu, Bo and Wei, Wang and Zhou, Mi and Hu, Peng and Ren, Yunqin and Mao, Qingxiang", title="Ten Machine Learning Models for Predicting Preoperative and Postoperative Coagulopathy in Patients With Trauma: Multicenter Cohort Study", journal="J Med Internet Res", year="2025", month="Jan", day="22", volume="27", pages="e66612", keywords="traumatic coagulopathy", keywords="preoperative", keywords="postoperative", keywords="machine learning models", keywords="random forest", keywords="Medical Information Mart for Intensive Care", abstract="Background: Recent research has revealed the potential value of machine learning (ML) models in improving prognostic prediction for patients with trauma. ML can enhance predictions and identify which factors contribute the most to posttraumatic mortality. However, no studies have explored the risk factors, complications, and risk prediction of preoperative and postoperative traumatic coagulopathy (PPTIC) in patients with trauma. Objective: This study aims to help clinicians implement timely and appropriate interventions to reduce the incidence of PPTIC and related complications, thereby lowering in-hospital mortality and disability rates for patients with trauma. Methods: We analyzed data from 13,235 patients with trauma from 4 medical centers, including medical histories, laboratory results, and hospitalization complications. We developed 10 ML models in Python (Python Software Foundation) to predict PPTIC based on preoperative indicators. Data from 10,023 Medical Information Mart for Intensive Care patients were divided into training (70\%) and test (30\%) sets, with 3212 patients from 3 other centers used for external validation. 
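As shown in the sketch below (synthetic cohorts and hypothetical features only, not the study's code), this corresponds to an internal 70/30 split for model development plus a held-out external cohort for validation:

```python
# Hedged illustration of an internal split plus external validation for a
# random forest classifier; all data below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

def make_cohort(n):
    X = rng.normal(size=(n, 6))   # e.g., aPTT, PT/INR, hemoglobin, calcium (hypothetical)
    y = (X[:, 0] - X[:, 2] + rng.normal(0, 1, n) > 0.5).astype(int)
    return X, y

X_dev, y_dev = make_cohort(10023)   # development cohort
X_ext, y_ext = make_cohort(3212)    # external validation cohort

X_tr, X_te, y_tr, y_te = train_test_split(X_dev, y_dev, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

print('internal test AUROC:', round(roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]), 2))
print('external AUROC:', round(roc_auc_score(y_ext, rf.predict_proba(X_ext)[:, 1]), 2))
```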
Model performance was assessed with 5-fold cross-validation, bootstrapping, Brier score, and Shapley additive explanation values. Results: Univariate logistic regression identified PPTIC risk factors as (1) prolonged activated partial thromboplastin time, prothrombin time, and international normalized ratio; (2) decreased levels of hemoglobin, hematocrit, red blood cells, calcium, and sodium; (3) lower admission diastolic blood pressure; (4) elevated alanine aminotransferase and aspartate aminotransferase levels; (5) admission heart rate; and (6) emergency surgery and perioperative transfusion. Multivariate logistic regression revealed that patients with PPTIC faced significantly higher risks of sepsis (1.75-fold), heart failure (1.5-fold), delirium (3.08-fold), abnormal coagulation (3.57-fold), tracheostomy (2.76-fold), mortality (2.19-fold), and urinary tract infection (1.95-fold), along with longer hospital and intensive care unit stays. Random forest was the most effective ML model for predicting PPTIC, achieving an area under the receiver operating characteristic curve of 0.91, an area under the precision-recall curve of 0.89, accuracy of 0.84, sensitivity of 0.80, specificity of 0.88, precision of 0.88, F1-score of 0.84, and Brier score of 0.13 in external validation. Conclusions: Key PPTIC risk factors include (1) prolonged activated partial thromboplastin time, prothrombin time, and international normalized ratio; (2) low levels of hemoglobin, hematocrit, red blood cells, calcium, and sodium; (3) low diastolic blood pressure; (4) elevated alanine aminotransferase and aspartate aminotransferase levels; (5) admission heart rate; and (6) the need for emergency surgery and transfusion. PPTIC is associated with severe complications and extended hospital stays. Among the ML models, the random forest model was the most effective predictor. Trial Registration: Chinese Clinical Trial Registry ChiCTR2300078097; https://www.chictr.org.cn/showproj.html?proj=211051 ", doi="10.2196/66612", url="https://www.jmir.org/2025/1/e66612", url="http://www.ncbi.nlm.nih.gov/pubmed/39841523" } @Article{info:doi/10.2196/57874, author="Lee, Hocheol and Park, Myung-Bae and Won, Young-Joo", title="AI Machine Learning--Based Diabetes Prediction in Older Adults in South Korea: Cross-Sectional Analysis", journal="JMIR Form Res", year="2025", month="Jan", day="21", volume="9", pages="e57874", keywords="diabetes", keywords="prediction model", keywords="super-aging population", keywords="extreme gradient boosting model", keywords="geriatrics", keywords="older adults", keywords="aging", keywords="artificial intelligence", keywords="machine learning", abstract="Background: Diabetes is prevalent in older adults, and machine learning algorithms could help predict diabetes in this population. Objective: This study determined diabetes risk factors among older adults aged ≥60 years using machine learning algorithms and selected an optimized prediction model. Methods: This cross-sectional study was conducted on 3084 older adults aged ≥60 years in Seoul from January to November 2023. Data were collected using a mobile app (Gosufit) that measured depression, stress, anxiety, basal metabolic rate, oxygen saturation, heart rate, and average daily step count. Health coordinators recorded data on diabetes, hypertension, hyperlipidemia, chronic obstructive pulmonary disease, percent body fat, and percent muscle. The presence of diabetes was the target variable, with various health indicators as predictors. 
Machine learning algorithms, including random forest, gradient boosting model, light gradient boosting model, extreme gradient boosting model, and k-nearest neighbors, were employed for analysis. The dataset was split into 70\% training and 30\% testing sets. Model performance was evaluated using accuracy, precision, recall, F1 score, and area under the curve (AUC). Shapley additive explanations (SHAPs) were used for model interpretability. Results: Significant predictors of diabetes included hypertension ($\chi^2_1$=197.294; P<.001), hyperlipidemia ($\chi^2_1$=47.671; P<.001), age (mean: diabetes group 72.66 years vs nondiabetes group 71.81 years), stress (mean: diabetes group 42.68 vs nondiabetes group 41.47; $t_{3082}$=--2.858; P=.004), and heart rate (mean: diabetes group 75.05 beats/min vs nondiabetes group 73.14 beats/min; $t_{3082}$=--7.948; P<.001). The extreme gradient boosting model (XGBM) demonstrated the best performance, with an accuracy of 84.88\%, precision of 77.92\%, recall of 66.91\%, F1 score of 72.00, and AUC of 0.7957. The SHAP analysis of the top-performing XGBM revealed key predictors for diabetes: hypertension, age, percent body fat, heart rate, hyperlipidemia, basal metabolic rate, stress, and oxygen saturation. Hypertension strongly increased diabetes risk, while advanced age and elevated stress levels also showed significant associations. Hyperlipidemia and higher heart rates further heightened diabetes probability. These results highlight the importance and directional impact of specific features in predicting diabetes, providing valuable insights for risk stratification and targeted interventions. Conclusions: This study focused on modifiable risk factors, providing crucial data for establishing a system for the automated collection of health information and lifelog data from older adults using digital devices at service facilities. ", doi="10.2196/57874", url="https://formative.jmir.org/2025/1/e57874" } @Article{info:doi/10.2196/69742, author="Bousquet, Cedric and Beltramin, Div{\`a}", title="Advantages and Inconveniences of a Multi-Agent Large Language Model System to Mitigate Cognitive Biases in Diagnostic Challenges", journal="J Med Internet Res", year="2025", month="Jan", day="20", volume="27", pages="e69742", keywords="large language model", keywords="multi-agent system", keywords="diagnostic errors", keywords="cognition", keywords="clinical decision-making", keywords="cognitive bias", keywords="generative artificial intelligence", doi="10.2196/69742", url="https://www.jmir.org/2025/1/e69742" } @Article{info:doi/10.2196/54121, author="Zhang, Haofuzi and Zou, Peng and Luo, Peng and Jiang, Xiaofan", title="Machine Learning for the Early Prediction of Delayed Cerebral Ischemia in Patients With Subarachnoid Hemorrhage: Systematic Review and Meta-Analysis", journal="J Med Internet Res", year="2025", month="Jan", day="20", volume="27", pages="e54121", keywords="machine learning", keywords="subarachnoid hemorrhage", keywords="delayed cerebral ischemia", keywords="systematic review", abstract="Background: Delayed cerebral ischemia (DCI) is a primary contributor to death after subarachnoid hemorrhage (SAH), with significant incidence. Therefore, early determination of the risk of DCI is an urgent need. Machine learning (ML) has received much attention in clinical practice. Recently, some studies have attempted to apply ML models for early noninvasive prediction of DCI. 
However, systematic evidence for its predictive accuracy is still lacking. Objective: The aim of this study was to synthesize the prediction accuracy of ML models for DCI to provide evidence for the development or updating of intelligent detection tools. Methods: PubMed, Cochrane, Embase, and Web of Science databases were systematically searched up to May 18, 2023. The risk of bias in the included studies was assessed using PROBAST (Prediction Model Risk of Bias Assessment Tool). During the analysis, we discussed the performance of different models in the training and validation sets. Results: We finally included 48 studies containing 16,294 patients with SAH and 71 ML models with logistic regression as the main model type. In the training set, the pooled concordance index (C index), sensitivity, and specificity of all the models were 0.786 (95\% CI 0.737-0.835), 0.77 (95\% CI 0.69-0.84), and 0.83 (95\% CI 0.75-0.89), respectively, while those of the logistic regression models were 0.770 (95\% CI 0.724-0.817), 0.75 (95\% CI 0.67-0.82), and 0.71 (95\% CI 0.63-0.78), respectively. In the validation set, the pooled C index, sensitivity, and specificity of all the models were 0.767 (95\% CI 0.741-0.793), 0.66 (95\% CI 0.53-0.77), and 0.78 (95\% CI 0.71-0.84), respectively, while those of the logistic regression models were 0.757 (95\% CI 0.715-0.800), 0.59 (95\% CI 0.57-0.80), and 0.80 (95\% CI 0.71-0.87), respectively. Conclusions: ML models appear to have relatively desirable power for early noninvasive prediction of DCI after SAH. However, enhancing the prediction sensitivity of these models is challenging. Therefore, efficient, noninvasive, or minimally invasive low-cost predictors should be further explored in future studies to improve the prediction accuracy of ML models. Trial Registration: PROSPERO (CRD42023438399); https://tinyurl.com/yfuuudde ", doi="10.2196/54121", url="https://www.jmir.org/2025/1/e54121" } @Article{info:doi/10.2196/60326, author="Kirkham, M. Aidan and Fergusson, A. Dean and Presseau, Justin and McIsaac, I. Daniel and Shorr, Risa and Roberts, J. Derek", title="Strategies to Improve Health Care Provider Prescription of and Patient Adherence to Guideline-Recommended Cardiovascular Medications for Atherosclerotic Occlusive Disease: Protocol for Two Systematic Reviews and Meta-Analyses of Randomized Controlled Trials", journal="JMIR Res Protoc", year="2025", month="Jan", day="16", volume="14", pages="e60326", keywords="coronary artery disease", keywords="cerebrovascular disease", keywords="peripheral artery disease", keywords="polyvascular disease", keywords="underprescription", keywords="nonadherence", keywords="implementation strategy", keywords="adherence-supporting strategy", keywords="statins", keywords="antiplatelets", keywords="antihypertensives", keywords="guideline-recommended medications", keywords="implementation", keywords="atherosclerosis", keywords="patient adherence", keywords="RCT", keywords="randomized controlled trials", keywords="PRISMA", abstract="Background: In patients with atherosclerotic occlusive diseases, systematic reviews and meta-analyses of randomized controlled trials (RCTs) report that antiplatelets, statins, and antihypertensives reduce the risk of major adverse cardiac events, need for revascularization procedures, mortality, and health care resource use. However, evidence suggests that these patients are not prescribed these medications adequately or do not adhere to them once prescribed. 
Objective: We aim to systematically review and meta-analyze RCTs examining the effectiveness of implementation or adherence-supporting strategies for improving health care provider prescription of, or patient adherence to, guideline-recommended cardiovascular medications in patients with atherosclerotic occlusive disease. Methods: We designed and reported the protocol according to the PRISMA-P (Preferred Reporting Items for Systematic Review and Meta-Analysis-Protocols) statement. We will search MEDLINE, Embase, The Cochrane Central Register of Controlled Trials, PsycINFO, and CINAHL from their inception. RCTs examining implementation or adherence-supporting strategies for improving prescription of, or adherence to, guideline-recommended cardiovascular medications in adults with cerebrovascular disease, coronary artery disease, peripheral artery disease, or polyvascular disease (>1 of these diseases) will be included. Two investigators will independently review identified titles/abstracts and full-text studies, extract data, assess the risk of bias (using the Cochrane tool), and classify implementation or adherence-supporting strategies using the refined Cochrane Effective Practice and Organization of Care (EPOC) taxonomy (for strategies aimed at improving prescription) and Behavior Change Wheel (BCW; for adherence-supporting strategies). We will narratively synthesize data describing which implementation or adherence-supporting strategies have been evaluated across RCTs, and their reported effectiveness at improving prescription of, or adherence to, guideline-recommended cardiovascular medications (primary outcomes) and patient-important outcomes and health care resource use (secondary outcomes) within refined EPOC taxonomy levels and BCW interventions and policies. Where limited clinical heterogeneity exists between RCTs, estimates describing the effectiveness of implementation or adherence-supporting strategies within different refined EPOC taxonomy levels and BCW interventions and policies will be pooled using random-effects models. Stratified meta-analyses and meta-regressions will assess if strategy effectiveness varies by recruited patient populations, prescriber types, clinical practice settings, and study design characteristics. GRADE (Grading of Recommendations, Assessment, Development, and Evaluation) will be used to communicate evidence certainty. Results: The search was completed on June 6, 2023. Database searches and the PubMed ``related articles'' feature identified 4319 unique citations for title/abstract screening. We are currently screening titles/abstracts. Conclusions: These studies will identify which implementation and adherence-supporting strategies are being used (and in which combinations) across RCTs for improving the prescription of, or adherence to, guideline-recommended cardiovascular medications in adults with atherosclerotic occlusive diseases. They will also determine the effectiveness of currently trialed implementation and adherence-supporting strategies, and whether effectiveness varies by patient, prescriber, or clinical practice setting traits. 
Trial Registration: PROSPERO CRD42023461317; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=461317; PROSPERO CRD42023461299; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=461299 ", doi="10.2196/60326", url="https://www.researchprotocols.org/2025/1/e60326" } @Article{info:doi/10.2196/63875, author="Dorosan, Michael and Chen, Ya-Lin and Zhuang, Qingyuan and Lam, Sean Shao Wei", title="In Silico Evaluation of Algorithm-Based Clinical Decision Support Systems: Protocol for a Scoping Review", journal="JMIR Res Protoc", year="2025", month="Jan", day="16", volume="14", pages="e63875", keywords="clinical decision support algorithms", keywords="in silico evaluation", keywords="clinical workflow simulation", keywords="health care modeling", keywords="digital twin", keywords="quadruple aims", keywords="clinical decision", keywords="decision-making", keywords="decision support", keywords="workflow", keywords="support system", keywords="protocol", keywords="scoping review", keywords="algorithm-based", keywords="screening", keywords="thematic analysis", keywords="descriptive analysis", keywords="clinical decision-making", abstract="Background: Integrating algorithm-based clinical decision support (CDS) systems poses significant challenges in evaluating their actual clinical value. Such CDS systems are traditionally assessed via controlled but resource-intensive clinical trials. Objective: This paper presents a review protocol for preimplementation in silico evaluation methods to enable broadened impact analysis under simulated environments before clinical trials. Methods: We propose a scoping review protocol that follows an enhanced Arksey and O'Malley framework and PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines to investigate the scope and research gaps in the in silico evaluation of algorithm-based CDS models---specifically CDS decision-making end points and objectives, evaluation metrics used, and simulation paradigms used to assess potential impacts. The databases searched are PubMed, Embase, CINAHL, PsycINFO, Cochrane, IEEEXplore, Web of Science, and arXiv. A 2-stage screening process identified pertinent articles. The information extracted from articles was iteratively refined. The review will use thematic, trend, and descriptive analyses to meet scoping aims. Results: We conducted an automated search of the databases above in May 2023, with most title and abstract screenings completed by November 2023 and full-text screening extended from December 2023 to May 2024. Concurrent charting and full-text analysis were carried out, with the final analysis and manuscript preparation set for completion in July 2024. Publication of the review results is targeted from July 2024 to February 2025. As of April 2024, a total of 21 articles have been selected following a 2-stage screening process; these will proceed to data extraction and analysis. Conclusions: We refined our data extraction strategy through a collaborative, multidisciplinary approach, planning to analyze results using thematic analyses to identify approaches to in silico evaluation. Anticipated findings aim to contribute to developing a unified in silico evaluation framework adaptable to various clinical workflows, detailing clinical decision-making characteristics, impact measures, and reusability of methods. 
The study's findings will be published and presented in forums combining artificial intelligence and machine learning, clinical decision-making, and health technology impact analysis. Ultimately, we aim to bridge the development-deployment gap through in silico evaluation-based potential impact assessments. International Registered Report Identifier (IRRID): DERR1-10.2196/63875 ", doi="10.2196/63875", url="https://www.researchprotocols.org/2025/1/e63875" } @Article{info:doi/10.2196/52385, author="Mumtaz, Shahzad and McMinn, Megan and Cole, Christian and Gao, Chuang and Hall, Christopher and Guignard-Duff, Magalie and Huang, Huayi and McAllister, A. David and Morales, R. Daniel and Jefferson, Emily and Guthrie, Bruce", title="A Digital Tool for Clinical Evidence--Driven Guideline Development by Studying Properties of Trial Eligible and Ineligible Populations: Development and Usability Study", journal="J Med Internet Res", year="2025", month="Jan", day="16", volume="27", pages="e52385", keywords="multimorbidity", keywords="clinical practice guideline", keywords="gout", keywords="Trusted Research Environment", keywords="National Institute for Health and Care Excellence", keywords="Scottish Intercollegiate Guidelines Network", keywords="clinical practice", keywords="development", keywords="efficacy", keywords="validity", keywords="epidemiological data", keywords="epidemiology", keywords="epidemiological", keywords="digital tool", keywords="tool", keywords="age", keywords="gender", keywords="ethnicity", keywords="mortality", keywords="feedback", keywords="availability", abstract="Background: Clinical guideline development preferentially relies on evidence from randomized controlled trials (RCTs). RCTs are gold-standard methods to evaluate the efficacy of treatments with the highest internal validity but limited external validity, in the sense that their findings may not always be applicable to or generalizable to clinical populations or population characteristics. The external validity of RCTs for the clinical population is constrained by the lack of tailored epidemiological data analysis designed for this purpose due to data governance, consistency of disease or condition definitions, and reduplicated effort in analysis code. Objective: This study aims to develop a digital tool that characterizes the overall population and differences between clinical trial eligible and ineligible populations from the clinical populations of a disease or condition regarding demography (eg, age, gender, ethnicity), comorbidity, coprescription, hospitalization, and mortality. Currently, the process is complex, onerous, and time-consuming, whereas a real-time tool may be used to rapidly inform a guideline developer's judgment about the applicability of evidence. Methods: The National Institute for Health and Care Excellence---particularly the gout guideline development group---and the Scottish Intercollegiate Guidelines Network guideline developers were consulted to gather their requirements and evidential data needs when developing guidelines. An R Shiny (R Foundation for Statistical Computing) tool was designed and developed using electronic primary health care data linked with hospitalization and mortality data built upon an optimized data architecture. Disclosure control mechanisms were built into the tool to ensure data confidentiality. The tool was deployed within a Trusted Research Environment, allowing only trusted preapproved researchers to conduct analysis. 
Results: The tool supports 128 chronic health conditions as index conditions and 161 conditions as comorbidities (33 in addition to the 128 index conditions). It enables 2 types of analyses via the graphic interface: overall population and stratified by user-defined eligibility criteria. The analyses produce an overview of statistical tables (eg, age, gender) of the index condition population and, within the overview groupings, produce details on, for example, electronic frailty index, comorbidities, and coprescriptions. The disclosure control mechanism is integral to the tool, limiting tabular counts to meet local governance needs. An exemplary result for gout as an index condition is presented to demonstrate the tool's functionality. Guideline developers from the National Institute for Health and Care Excellence and the Scottish Intercollegiate Guidelines Network provided positive feedback on the tool. Conclusions: The tool is a proof-of-concept, and the user feedback has demonstrated that this is a step toward computer-interpretable guideline development. Using the digital tool can potentially improve evidence-driven guideline development through the availability of real-world data in real time. ", doi="10.2196/52385", url="https://www.jmir.org/2025/1/e52385" } @Article{info:doi/10.2196/64374, author="You, Yuzi and Liang, Wei and Zhao, Yajie", title="Development and Validation of a Predictive Model Based on Serum Silent Information Regulator 6 Levels in Chinese Older Adult Patients: Cross-Sectional Descriptive Study", journal="JMIR Aging", year="2025", month="Jan", day="15", volume="8", pages="e64374", keywords="aging", keywords="coronary artery disease", keywords="nomogram", keywords="SIRT6", keywords="TyG index", keywords="silent information regulator 6", keywords="triglyceride glucose index", abstract="Background: Serum levels of silent information regulator 6 (SIRT6), a key biomarker of aging, were identified as a predictor of coronary artery disease (CAD), but whether SIRT6 can distinguish severity of coronary artery lesions in older adult patients is unknown. Objectives: This study developed a nomogram to demonstrate the functionality of SIRT6 in assessing severity of coronary artery atherosclerosis. Methods: Patients aged 60 years and older with angina pectoris were screened for this single-center clinical study between October 1, 2022, and March 31, 2023. Serum specimens of eligible patients were collected for SIRT6 detection by enzyme-linked immunosorbent assay. Clinical data and putative predictors, including 29 physiological characteristics, biochemical parameters, carotid artery ultrasonographic results, and complete coronary angiography findings, were evaluated, with CAD diagnosis as the primary outcome. The nomogram was derived from the Extreme Gradient Boosting (XGBoost) model, with logistic regression for variable selection. Model performance was assessed by examining discrimination, calibration, and clinical use separately. A 10-fold cross-validation technique was used to compare all models. The models' performance was further evaluated on the internal validation set to ensure that the obtained results were not due to overoptimization. Results: Eligible patients (n=222) were divided into 2 cohorts: the development cohort (n=178) and the validation cohort (n=44). Serum SIRT6 levels were identified as both an independent risk factor and a predictor for CAD in older adults. 
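The modeling workflow this abstract describes (a handful of predictors entered into a logistic model and judged by 10-fold cross-validated AUROC) could be prototyped roughly as follows; the column names sirt6, tyg_index, and apob and the simulated data are placeholders, and the sketch does not reproduce the study's XGBoost-derived nomogram.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic cohort: serum SIRT6, TyG index, and apolipoprotein B as candidate CAD predictors
rng = np.random.default_rng(0)
n = 222
df = pd.DataFrame({
    "sirt6": rng.normal(550, 120, n),
    "tyg_index": rng.normal(8.7, 0.6, n),
    "apob": rng.normal(1.0, 0.25, n),
})
logit = -8 + 0.004 * df["sirt6"] + 0.5 * df["tyg_index"] + 1.0 * df["apob"]
df["cad"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = make_pipeline(StandardScaler(), LogisticRegression())
auc = cross_val_score(model, df[["sirt6", "tyg_index", "apob"]], df["cad"],
                      cv=10, scoring="roc_auc")
print(f"10-fold cross-validated AUROC: {auc.mean():.3f} (SD {auc.std():.3f})")
```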
The area under the receiver operating characteristic curve (AUROC) was 0.725 (95\% CI 0.653-0.797). The optimal cutoff value of SIRT6 for predicting CAD was 546.384 pg/mL. Predictors included in this nomogram were serum SIRT6 levels, triglyceride glucose (TyG) index, and apolipoprotein B. The model achieved an AUROC of 0.956 (95\% CI 0.928-0.983) in the development cohort. Similarly, in the internal validation cohort, the AUROC was 0.913 (95\% CI 0.828-0.999). All models demonstrated satisfactory calibration, with predicted outcomes closely aligning with actual results. Conclusions: SIRT6 shows promise in predicting CAD, with enhanced predictive abilities when combined with the TyG index. In clinical settings, monitoring fluctuations in SIRT6 and TyG may offer valuable insights for early CAD detection. The nomogram for CAD outcome prediction in older adult patients with angina pectoris may aid in clinical trial design and personalized clinical decision-making, particularly in institutions where SIRT6 is being explored as a biomarker for aging or cardiovascular health. ", doi="10.2196/64374", url="https://aging.jmir.org/2025/1/e64374" } @Article{info:doi/10.2196/56463, author="Lakshman, Pavithra and Gopal, T. Priyanka and Khurdi, Sheen", title="Effectiveness of Remote Patient Monitoring Equipped With an Early Warning System in Tertiary Care Hospital Wards: Retrospective Cohort Study", journal="J Med Internet Res", year="2025", month="Jan", day="15", volume="27", pages="e56463", keywords="continuous vitals monitoring", keywords="remote patient monitoring", keywords="early warning system", keywords="hospital wards", keywords="retrospective", keywords="cohort study", keywords="early deterioration monitoring", keywords="patient care", keywords="decision making", keywords="clinical information", abstract="Background: Monitoring vital signs in hospitalized patients is crucial for evaluating their clinical condition. While early warning scores like the modified early warning score (MEWS) are typically calculated 3 to 4 times daily through spot checks, they might not promptly identify early deterioration. Leveraging technologies that provide continuous monitoring of vital signs, combined with an early warning system, has the potential to identify clinical deterioration sooner. This approach empowers health care providers to intervene promptly and effectively. Objective: This study aimed to assess the impact of a Remote Patient Monitoring System (RPMS) with an automated early warning system (R-EWS) on patient safety in noncritical care at a tertiary hospital. R-EWS performance was compared with a simulated Modified Early Warning System (S-MEWS) and a simulated threshold-based alert system (S-Threshold). Methods: Patient outcomes, including intensive care unit (ICU) transfers due to deterioration and discharges for nondeteriorating cases, were analyzed in Ramaiah Memorial Hospital's general wards with RPMS. Sensitivity, specificity, chi-square test for alert frequency distribution equality, and the average time from the first alert to ICU transfer in the last 24 hours were determined. Alert and patient distribution by tiers and vitals in R-EWS groups were examined. Results: In an analysis of 905 patients, including 38 with deterioration, R-EWS, S-Threshold, and S-MEWS generated more alerts for deteriorating cases. 
R-EWS showed high sensitivity (97.37\%) and low specificity (23.41\%), S-Threshold had perfect sensitivity (100\%) but low specificity (0.46\%), and S-MEWS demonstrated moderate sensitivity (47.37\%) and high specificity (81.31\%). The average time from initial alert to clinical deterioration was at least 18 hours for RPMS and S-Threshold in deteriorating participants. R-EWS had increased alert frequency and a higher proportion of critical alerts for deteriorating cases. Conclusions: This study underscores the role of R-EWS in early deterioration detection, emphasizing timely interventions for improved patient outcomes. Continuous monitoring enhances patient safety and optimizes care quality. ", doi="10.2196/56463", url="https://www.jmir.org/2025/1/e56463" } @Article{info:doi/10.2196/55046, author="Ding, Zhendong and Zhang, Linan and Zhang, Yihan and Yang, Jing and Luo, Yuheng and Ge, Mian and Yao, Weifeng and Hei, Ziqing and Chen, Chaojin", title="A Supervised Explainable Machine Learning Model for Perioperative Neurocognitive Disorder in Liver-Transplantation Patients and External Validation on the Medical Information Mart for Intensive Care IV Database: Retrospective Study", journal="J Med Internet Res", year="2025", month="Jan", day="15", volume="27", pages="e55046", keywords="machine learning", keywords="risk factors", keywords="liver transplantation", keywords="perioperative neurocognitive disorders", keywords="MIMIC-IV database", keywords="external validation", abstract="Background: Patients undergoing liver transplantation (LT) are at risk of perioperative neurocognitive dysfunction (PND), which significantly affects the patients' prognosis. Objective: This study used machine learning (ML) algorithms with the aim of extracting critical predictors and developing an ML model to predict PND among LT recipients. Methods: In this retrospective study, data from 958 patients who underwent LT between January 2015 and January 2020 were extracted from the Third Affiliated Hospital of Sun Yat-sen University. Six ML algorithms were used to predict post-LT PND, and model performance was evaluated using area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and F1-scores. The best-performing model was additionally validated using a temporal external dataset including 309 LT cases from February 2020 to August 2022, and an independent external dataset extracted from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database including 325 patients. Results: In the development cohort, 201 out of 751 (33.5\%) patients were diagnosed with PND. The logistic regression model achieved the highest AUC (0.799) in the internal validation set, with comparable AUC in the temporal external (0.826) and MIMIC-IV validation sets (0.72). The top 3 features contributing to post-LT PND diagnosis were the preoperative overt hepatic encephalopathy, platelet level, and postoperative sequential organ failure assessment score, as revealed by the Shapley additive explanations method. Conclusions: A real-time logistic regression model-based online predictor of post-LT PND was developed, providing a highly interoperable tool for use across medical institutions to support early risk stratification and decision making for the LT recipients. 
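A minimal sketch of the explainability step mentioned above (ranking predictors of a binary outcome by Shapley additive explanations for a fitted logistic regression); the synthetic columns overt_he, platelets, and sofa_post merely stand in for the study's variables, and the snippet assumes the shap package's LinearExplainer interface.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for hepatic encephalopathy, platelet level, and postoperative SOFA score
rng = np.random.default_rng(1)
n = 1000
X = pd.DataFrame({
    "overt_he": rng.binomial(1, 0.3, n),
    "platelets": rng.normal(150, 40, n),
    "sofa_post": rng.poisson(5, n),
})
logit = -1.5 + 1.2 * X["overt_he"] - 0.01 * X["platelets"] + 0.2 * X["sofa_post"]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_train, X_test, y_train, _ = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Mean absolute SHAP value per feature gives a global importance ranking
explainer = shap.LinearExplainer(model, X_train)
shap_values = explainer.shap_values(X_test)
print(pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns).sort_values(ascending=False))
```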
", doi="10.2196/55046", url="https://www.jmir.org/2025/1/e55046", url="http://www.ncbi.nlm.nih.gov/pubmed/39813086" } @Article{info:doi/10.2196/60520, author="Kim, Hyung Do and Jeong, Won Joo and Kang, Dayoung and Ahn, Taekyung and Hong, Yeonjung and Im, Younggon and Kim, Jaewon and Kim, Jung Min and Jang, Dae-Hyun", title="Usefulness of Automatic Speech Recognition Assessment of Children With Speech Sound Disorders: Validation Study", journal="J Med Internet Res", year="2025", month="Jan", day="14", volume="27", pages="e60520", keywords="speech sound disorder", keywords="speech recognition software", keywords="speech articulation tests", keywords="speech-language pathology", keywords="child", abstract="Background: Speech sound disorders (SSDs) are common communication challenges in children, typically assessed by speech-language pathologists (SLPs) using standardized tools. However, traditional evaluation methods are time-intensive and prone to variability, raising concerns about reliability. Objective: This study aimed to compare the evaluation outcomes of SLPs and an automatic speech recognition (ASR) model using two standardized SSD assessments in South Korea, evaluating the ASR model's performance. Methods: A fine-tuned wav2vec 2.0 XLS-R model, pretrained on 436,000 hours of adult voice data spanning 128 languages, was used. The model was further trained on 93.6 minutes of children's voices with articulation errors to improve error detection. Participants included children referred to the Department of Rehabilitation Medicine at a general hospital in Incheon, South Korea, from August 19, 2022, to June 14, 2023. Two standardized assessments---the Assessment of Phonology and Articulation for Children (APAC) and the Urimal Test of Articulation and Phonology (U-TAP)---were used, with ASR transcriptions compared to SLP transcriptions. Results: This study included 30 children aged 3-7 years who were suspected of having SSDs. The phoneme error rates for the APAC and U-TAP were 8.42\% (457/5430) and 8.91\% (402/4514), respectively, indicating discrepancies between the ASR model and SLP transcriptions across all phonemes. Consonant error rates were 10.58\% (327/3090) and 11.86\% (331/2790) for the APAC and U-TAP, respectively. On average, there were 2.60 (SD 1.54) and 3.07 (SD 1.39) discrepancies per child for correctly produced phonemes, and 7.87 (SD 3.66) and 7.57 (SD 4.85) discrepancies per child for incorrectly produced phonemes, based on the APAC and U-TAP, respectively. The correlation between SLPs and the ASR model in terms of the percentage of consonants correct was excellent, with an intraclass correlation coefficient of 0.984 (95\% CI 0.953-0.994) and 0.978 (95\% CI 0.941-0.990) for the APAC and UTAP, respectively. The z scores between SLPs and ASR showed more pronounced differences with the APAC than the U-TAP, with 8 individuals showing discrepancies in the APAC compared to 2 in the U-TAP. Conclusions: The results demonstrate the potential of the ASR model in assessing children with SSDs. However, its performance varied based on phoneme or word characteristics, highlighting areas for refinement. Future research should include more diverse speech samples, clinical settings, and speech data to strengthen the model's refinement and ensure broader clinical applicability. 
", doi="10.2196/60520", url="https://www.jmir.org/2025/1/e60520", url="http://www.ncbi.nlm.nih.gov/pubmed/39576242" } @Article{info:doi/10.2196/63004, author="De Silva, Upeka and Madanian, Samaneh and Olsen, Sharon and Templeton, Michael John and Poellabauer, Christian and Schneider, L. Sandra and Narayanan, Ajit and Rubaiat, Rahmina", title="Clinical Decision Support Using Speech Signal Analysis: Systematic Scoping Review of Neurological Disorders", journal="J Med Internet Res", year="2025", month="Jan", day="13", volume="27", pages="e63004", keywords="digital health", keywords="health informatics", keywords="digital biomarker", keywords="speech analytics", keywords="artificial intelligence", keywords="machine learning", abstract="Background: Digital biomarkers are increasingly used in clinical decision support for various health conditions. Speech features as digital biomarkers can offer insights into underlying physiological processes due to the complexity of speech production. This process involves respiration, phonation, articulation, and resonance, all of which rely on specific motor systems for the preparation and execution of speech. Deficits in any of these systems can cause changes in speech signal patterns. Increasing efforts are being made to develop speech-based clinical decision support systems. Objective: This systematic scoping review investigated the technological revolution and recent digital clinical speech signal analysis trends to understand the key concepts and research processes from clinical and technical perspectives. Methods: A systematic scoping review was undertaken in 6 databases guided by a set of research questions. Articles that focused on speech signal analysis for clinical decision-making were identified, and the included studies were analyzed quantitatively. A narrower scope of studies investigating neurological diseases were analyzed using qualitative content analysis. Results: A total of 389 articles met the initial eligibility criteria, of which 72 (18.5\%) that focused on neurological diseases were included in the qualitative analysis. In the included studies, Parkinson disease, Alzheimer disease, and cognitive disorders were the most frequently investigated conditions. The literature explored the potential of speech feature analysis in diagnosis, differentiating between, assessing the severity and monitoring the treatment of neurological conditions. The common speech tasks used were sustained phonations, diadochokinetic tasks, reading tasks, activity-based tasks, picture descriptions, and prompted speech tasks. From these tasks, conventional speech features (such as fundamental frequency, jitter, and shimmer), advanced digital signal processing--based speech features (such as wavelet transformation--based features), and spectrograms in the form of audio images were analyzed. Traditional machine learning and deep learning approaches were used to build predictive models, whereas statistical analysis assessed variable relationships and reliability of speech features. Model evaluations primarily focused on analytical validations. A significant research gap was identified: the need for a structured research process to guide studies toward potential technological intervention in clinical settings. To address this, a research framework was proposed that adapts a design science research methodology to guide research studies systematically. 
Conclusions: The findings highlight how data science techniques can enhance speech signal analysis to support clinical decision-making. By combining knowledge from clinical practice, speech science, and data science within a structured research framework, future research may achieve greater clinical relevance. ", doi="10.2196/63004", url="https://www.jmir.org/2025/1/e63004", url="http://www.ncbi.nlm.nih.gov/pubmed/39804693" } @Article{info:doi/10.2196/58073, author="Saito, Chihiro and Nakatani, Eiji and Sasaki, Hatoko and E Katsuki, Naoko and Tago, Masaki and Harada, Kiyoshi", title="Predictive Factors and the Predictive Scoring System for Falls in Acute Care Inpatients: Retrospective Cohort Study", journal="JMIR Hum Factors", year="2025", month="Jan", day="13", volume="12", pages="e58073", keywords="falls", keywords="inpatient falls", keywords="acute care hospital", keywords="predictive factor", keywords="risk factors", abstract="Background: Falls in hospitalized patients are a serious problem, resulting in physical injury, secondary complications, impaired activities of daily living, prolonged hospital stays, and increased medical costs. Establishing a fall prediction scoring system to identify patients most likely to fall can help prevent falls among hospitalized patients. Objectives: This study aimed to identify predictive factors of falls in acute care hospital patients, develop a scoring system, and evaluate its validity. Methods: This single-center, retrospective cohort study involved patients aged 20 years or older admitted to Shizuoka General Hospital between April 2019 and September 2020. Demographic data, candidate predictors at admission, and fall occurrence reports were collected from medical records. The outcome was the time from admission to a fall requiring medical resources. Two-thirds of cases were randomly selected as the training set for analysis, and univariable and multivariable Cox regression analyses were used to identify factors affecting fall risk. We scored the fall risk based on the estimated hazard ratios (HRs) and constructed a fall prediction scoring system. The remaining one-third of cases was used as the test set to evaluate the predictive performance of the new scoring system. Results: A total of 13,725 individuals were included. During the study period, 2.4\% (326/13,725) of patients experienced a fall. In the training dataset (n=9150), Cox regression analysis identified sex (male: HR 1.60, 95\% CI 1.21-2.13), age (65 to <80 years: HR 2.26, 95\% CI 1.48-3.44; ≥80 years: HR 2.50, 95\% CI 1.60-3.92 vs 20-<65 years), BMI (18.5 to <25 kg/m{\texttwosuperior}: HR 1.36, 95\% CI 0.94-1.97; <18.5 kg/m{\texttwosuperior}: HR 1.57, 95\% CI 1.01-2.44 vs ≥25 kg/m{\texttwosuperior}), independence degree of daily living for older adults with disabilities (bedriddenness rank A: HR 1.81, 95\% CI 1.26-2.60; rank B: HR 2.03, 95\% CI 1.31-3.14; rank C: HR 1.23, 95\% CI 0.83-1.83 vs rank J), department (internal medicine: HR 1.23, 95\% CI 0.92-1.64; emergency department: HR 1.81, 95\% CI 1.26-2.60 vs department of surgery), and history of falls within 1 year (yes: HR 1.66, 95\% CI 1.21-2.27) as predictors of falls. Using these factors, we developed a fall prediction scoring system categorizing patients into 3 risk groups: low risk (0-4 points), intermediate risk (5-9 points), and high risk (10-15 points). The c-index indicating predictive performance in the test set (n=4575) was 0.733 (95\% CI 0.684-0.782). 
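A rough sketch of the general workflow in the abstract above (Cox regression on admission covariates, hazard ratios to weight a score, discrimination checked by a c-index on a held-out split), using the lifelines package on synthetic data; the covariates, coefficients, and event rates are invented, and the published scoring rules are not reproduced.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

# Synthetic admission data: time from admission to a fall (days), mostly censored
rng = np.random.default_rng(2)
n = 2000
df = pd.DataFrame({
    "male": rng.binomial(1, 0.5, n),
    "age_over_80": rng.binomial(1, 0.3, n),
    "fall_history": rng.binomial(1, 0.2, n),
})
risk = 0.5 * df["male"] + 0.9 * df["age_over_80"] + 0.5 * df["fall_history"]
df["time_to_fall"] = rng.exponential(200 / np.exp(risk))
df["fell"] = rng.binomial(1, 0.15, n)  # most stays end without an observed fall

covariates = ["male", "age_over_80", "fall_history"]
train, test = df.iloc[:1400], df.iloc[1400:]
cph = CoxPHFitter().fit(train, duration_col="time_to_fall", event_col="fell")
print(cph.summary[["exp(coef)", "p"]])  # hazard ratios that would weight the risk score

# Discrimination on the held-out third (higher partial hazard = earlier fall, hence the minus sign)
c = concordance_index(test["time_to_fall"], -cph.predict_partial_hazard(test[covariates]), test["fell"])
print(f"test-set c-index: {c:.3f}")
```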
Conclusions: We developed a new fall prediction scoring system for patients admitted to acute care hospitals by identifying predictors of falls in Japan. This system may be useful for preventive interventions in patient populations with a high likelihood of falling in acute care settings. ", doi="10.2196/58073", url="https://humanfactors.jmir.org/2025/1/e58073" } @Article{info:doi/10.2196/55015, author="Riahi, Vahid and Diouf, Ibrahima and Khanna, Sankalp and Boyle, Justin and Hassanzadeh, Hamed", title="Digital Twins for Clinical and Operational Decision-Making: Scoping Review", journal="J Med Internet Res", year="2025", month="Jan", day="8", volume="27", pages="e55015", keywords="digital twin", keywords="health care", keywords="clinical decision-making", keywords="CDM", keywords="operational decision-making", keywords="ODM", keywords="scoping review", abstract="Background: The health care industry must align with new digital technologies to respond to existing and new challenges. Digital twins (DTs) are an emerging technology for digital transformation and applied intelligence that is rapidly attracting attention. DTs are virtual representations of products, systems, or processes that interact bidirectionally in real time with their actual counterparts. Although DTs have diverse applications from personalized care to treatment optimization, misconceptions persist regarding their definition and the extent of their implementation within health systems. Objective: This study aimed to review DT applications in health care, particularly for clinical decision-making (CDM) and operational decision-making (ODM). It provides a definition and framework for DTs by exploring their unique elements and characteristics. Then, it assesses the current advances and extent of DT applications to support CDM and ODM using the defined DT characteristics. Methods: We conducted a scoping review following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) protocol. We searched multiple databases, including PubMed, MEDLINE, and Scopus, for original research articles describing DT technologies applied to CDM and ODM in health systems. Papers proposing only ideas or frameworks or describing DT capabilities without experimental data were excluded. We collated several available types of information, for example, DT characteristics, the environment that DTs were tested within, and the main underlying method, and used descriptive statistics to analyze the synthesized data. Results: Out of 5537 relevant papers, 1.55\% (86/5537) met the predefined inclusion criteria, all published after 2017. The majority focused on CDM (75/86, 87\%). Mathematical modeling (24/86, 28\%) and simulation techniques (17/86, 20\%) were the most frequently used methods. Using International Classification of Diseases, 10th Revision coding, we identified 3 key areas of DT applications as follows: factors influencing diseases of the circulatory system (14/86, 16\%); health status and contact with health services (12/86, 14\%); and endocrine, nutritional, and metabolic diseases (10/86, 12\%). Only 16 (19\%) of 86 studies tested the developed system in a real environment, while the remainder were evaluated in simulated settings. Assessing the studies against defined DT characteristics reveals that the developed systems have yet to materialize the full capabilities of DTs. 
Conclusions: This study provides a comprehensive review of DT applications in health care, focusing on CDM and ODM. A key contribution is the development of a framework that defines important elements and characteristics of DTs in the context of related literature. The DT applications studied in this paper reveal encouraging results that allow us to envision that, in the near future, they will play an important role not only in the diagnosis and prevention of diseases but also in other areas, such as efficient clinical trial design, as well as personalized and optimized treatments. ", doi="10.2196/55015", url="https://www.jmir.org/2025/1/e55015" } @Article{info:doi/10.2196/60827, author="Chau, A. Courtney and Feng, Hao and Cobos, Gabriela and Park, Joyce", title="The Comparative Sufficiency of ChatGPT, Google Bard, and Bing AI in Answering Diagnosis, Treatment, and Prognosis Questions About Common Dermatological Diagnoses", journal="JMIR Dermatol", year="2025", month="Jan", day="7", volume="8", pages="e60827", keywords="artificial intelligence", keywords="AI", keywords="ChatGPT", keywords="atopic dermatitis", keywords="acne vulgaris", keywords="cyst", keywords="actinic keratosis", keywords="rosacea", keywords="diagnosis", keywords="treatment", keywords="prognosis", keywords="dermatological", keywords="patient", keywords="chatbot", keywords="dermatologist", doi="10.2196/60827", url="https://derma.jmir.org/2025/1/e60827" } @Article{info:doi/10.2196/62768, author="Yu, Zhongguang and Hu, Ning and Zhao, Qiuyi and Hu, Xiang and Jia, Cunbo and Zhang, Chunyu and Liu, Bing and Li, Yanping", title="The Willingness of Doctors to Adopt Artificial Intelligence--Driven Clinical Decision Support Systems at Different Hospitals in China: Fuzzy Set Qualitative Comparative Analysis of Survey Data", journal="J Med Internet Res", year="2025", month="Jan", day="7", volume="27", pages="e62768", keywords="artificial intelligence", keywords="clinical decision support systems", keywords="willingness", keywords="technology adoption", keywords="fuzzy set qualitative comparative analysis", keywords="fsQCA", keywords="pathways", abstract="Background: Artificial intelligence--driven clinical decision support systems (AI-CDSSs) are pivotal tools for doctors to improve diagnostic and treatment processes, as well as improve the efficiency and quality of health care services. However, not all doctors trust artificial intelligence (AI) technology, and many remain skeptical and unwilling to adopt these systems. Objective: This study aimed to explore in depth the factors influencing doctors' willingness to adopt AI-CDSSs and assess the causal relationships among these factors to gain a better understanding for promoting the clinical application and widespread implementation of these systems. Methods: Based on the unified theory of acceptance and use of technology (UTAUT) and the technology-organization-environment (TOE) framework, we have proposed and designed a framework for doctors' willingness to adopt AI-CDSSs. We conducted a nationwide questionnaire survey in China and performed fuzzy set qualitative comparative analysis to explore the willingness of doctors to adopt AI-CDSSs in different types of medical institutions and assess the factors influencing their willingness. Results: The survey was administered to doctors working in tertiary hospitals and primary/secondary hospitals across China. 
We received 450 valid responses out of 578 questionnaires distributed, indicating a robust response rate of 77.9\%. Our analysis of the influencing factors and adoption pathways revealed that doctors in tertiary hospitals exhibited 6 distinct pathways for AI-CDSS adoption, which were centered on technology-driven pathways, individual-driven pathways, and technology-individual dual-driven pathways. Doctors in primary/secondary hospitals demonstrated 3 adoption pathways, which were centered on technology-individual and organization-individual dual-driven pathways. There were commonalities in the factors influencing adoption across different medical institutions, such as the positive perception of AI technology's utility and individual readiness to try new technologies. There were also variations in the influence of facilitating conditions among doctors at different medical institutions, especially primary/secondary hospitals. Conclusions: From the perspective of the 6 pathways for doctors at tertiary hospitals and the 3 pathways for doctors at primary/secondary hospitals, performance expectancy and personal innovativeness were 2 indispensable and core conditions in the pathways to achieving favorable willingness to adopt AI-CDSSs. ", doi="10.2196/62768", url="https://www.jmir.org/2025/1/e62768" } @Article{info:doi/10.2196/59069, author="Zhang, Kuo and Meng, Xiangbin and Yan, Xiangyu and Ji, Jiaming and Liu, Jingqian and Xu, Hua and Zhang, Heng and Liu, Da and Wang, Jingjia and Wang, Xuliang and Gao, Jun and Wang, Yuan-geng-shuo and Shao, Chunli and Wang, Wenyao and Li, Jiarong and Zheng, Ming-Qi and Yang, Yaodong and Tang, Yi-Da", title="Revolutionizing Health Care: The Transformative Impact of Large Language Models in Medicine", journal="J Med Internet Res", year="2025", month="Jan", day="7", volume="27", pages="e59069", keywords="large language models", keywords="LLMs", keywords="digital health", keywords="medical diagnosis", keywords="treatment", keywords="multimodal data integration", keywords="technological fairness", keywords="artificial intelligence", keywords="AI", keywords="natural language processing", keywords="NLP", doi="10.2196/59069", url="https://www.jmir.org/2025/1/e59069" } @Article{info:doi/10.2196/63020, author="Zhuang, Yan and Zhang, Junyan and Li, Xiuxing and Liu, Chao and Yu, Yue and Dong, Wei and He, Kunlun", title="Autonomous International Classification of Diseases Coding Using Pretrained Language Models and Advanced Prompt Learning Techniques: Evaluation of an Automated Analysis System Using Medical Text", journal="JMIR Med Inform", year="2025", month="Jan", day="6", volume="13", pages="e63020", keywords="BERT", keywords="bidirectional encoder representations from transformers", keywords="pretrained language models", keywords="prompt learning", keywords="ICD", keywords="International Classification of Diseases", keywords="cardiovascular disease", keywords="few-shot learning", keywords="multicenter medical data", abstract="Background: Machine learning models can reduce the burden on doctors by converting medical records into International Classification of Diseases (ICD) codes in real time, thereby enhancing the efficiency of diagnosis and treatment. However, it faces challenges such as small datasets, diverse writing styles, unstructured records, and the need for semimanual preprocessing. 
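As background to the ICD-coding pipeline described above, the generic pattern of mapping free-text records to ICD-10 labels with a pretrained BERT encoder can be sketched with Hugging Face transformers; the checkpoint bert-base-chinese, the label list, and the example note are placeholders, and this is not the authors' Key-BERT prompt-learning framework.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint; a domain-specific medical BERT would normally be substituted here
checkpoint = "bert-base-chinese"
icd_labels = ["I10", "I20.0", "I21.0", "I48", "I50.9"]  # a few cardiovascular ICD-10 codes

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=len(icd_labels))

record = "患者因胸痛入院，心电图提示急性前壁心肌梗死。"  # free-text note (invented)
inputs = tokenizer(record, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print("predicted ICD-10 code:", icd_labels[int(logits.argmax(dim=-1))])
# In practice the classification head is fine-tuned on labeled records
# (e.g., with transformers.Trainer) before the predictions are meaningful.
```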
Existing approaches, such as naive Bayes, Word2Vec, and convolutional neural networks, have limitations in handling missing values and understanding the context of medical texts, leading to a high error rate. We developed a fully automated pipeline based on the Key--bidirectional encoder representations from transformers (BERT) approach and large-scale medical records for continued pretraining, which effectively converts long free text into standard ICD codes. By adjusting parameter settings, such as mixed templates and soft verbalizers, the model can adapt flexibly to different requirements, enabling task-specific prompt learning. Objective: This study aims to propose a prompt learning real-time framework based on pretrained language models that can automatically label long free-text data with ICD-10 codes for cardiovascular diseases without the need for semiautomatic preprocessing. Methods: We integrated 4 components into our framework: a medical pretrained BERT, a keyword filtration BERT in a functional order, a fine-tuning phase, and task-specific prompt learning utilizing mixed templates and soft verbalizers. This framework was validated on a multicenter medical dataset for the automated ICD coding of 13 common cardiovascular diseases (584,969 records). Its performance was compared against robustly optimized BERT pretraining approach, extreme language network, and various BERT-based fine-tuning pipelines. Additionally, we evaluated the framework's performance under different prompt learning and fine-tuning settings. Furthermore, few-shot learning experiments were conducted to assess the feasibility and efficacy of our framework in scenarios involving small- to mid-sized datasets. Results: Compared with traditional pretraining and fine-tuning pipelines, our approach achieved a higher micro--F1-score of 0.838 and a macro--area under the receiver operating characteristic curve (macro-AUC) of 0.958, which is 10\% higher than other methods. Among different prompt learning setups, the combination of mixed templates and soft verbalizers yielded the best performance. Few-shot experiments showed that performance stabilized and the AUC peaked at 500 shots. Conclusions: These findings underscore the effectiveness and superior performance of prompt learning and fine-tuning for subtasks within pretrained language models in medical practice. Our real-time ICD coding pipeline efficiently converts detailed medical free text into standardized labels, offering promising applications in clinical decision-making. It can assist doctors unfamiliar with the ICD coding system in organizing medical record information, thereby accelerating the medical process and enhancing the efficiency of diagnosis and treatment. ", doi="10.2196/63020", url="https://medinform.jmir.org/2025/1/e63020" } @Article{info:doi/10.2196/57644, author="Zuo, Huiyi and Huang, Baoyu and He, Jian and Fang, Liying and Huang, Minli", title="Machine Learning Approaches in High Myopia: Systematic Review and Meta-Analysis", journal="J Med Internet Res", year="2025", month="Jan", day="3", volume="27", pages="e57644", keywords="high myopia", keywords="pathological myopia", keywords="high myopia-associated glaucoma", keywords="machine learning", keywords="deep learning", abstract="Background: In recent years, with the rapid development of machine learning (ML), it has gained widespread attention from researchers in clinical practice. 
ML models appear to demonstrate promising accuracy in the diagnosis of complex diseases, as well as in predicting disease progression and prognosis. Some studies have applied it to ophthalmology, primarily for the diagnosis of pathologic myopia and high myopia-associated glaucoma, as well as for predicting the progression of high myopia. ML-based detection still requires evidence-based validation to prove its accuracy and feasibility. Objective: This study aims to discern the performance of ML methods in detecting high myopia and pathologic myopia in clinical practice, thereby providing evidence-based support for the future development and refinement of intelligent diagnostic or predictive tools. Methods: PubMed, Cochrane, Embase, and Web of Science were thoroughly retrieved up to September 3, 2023. The prediction model risk of bias assessment tool was leveraged to appraise the risk of bias in the eligible studies. The meta-analysis was implemented using a bivariate mixed-effects model. In the validation set, subgroup analyses were conducted based on the ML target events (diagnosis and prediction of high myopia and diagnosis of pathological myopia and high myopia-associated glaucoma) and modeling methods. Results: This study ultimately included 45 studies, of which 32 were used for quantitative meta-analysis. The meta-analysis results unveiled that for the diagnosis of pathologic myopia, the summary receiver operating characteristic (SROC), sensitivity, and specificity of ML were 0.97 (95\% CI 0.95-0.98), 0.91 (95\% CI 0.89-0.92), and 0.95 (95\% CI 0.94-0.97), respectively. Specifically, deep learning (DL) showed an SROC of 0.97 (95\% CI 0.95-0.98), sensitivity of 0.92 (95\% CI 0.90-0.93), and specificity of 0.96 (95\% CI 0.95-0.97), while conventional ML (non-DL) showed an SROC of 0.86 (95\% CI 0.75-0.92), sensitivity of 0.77 (95\% CI 0.69-0.84), and specificity of 0.85 (95\% CI 0.75-0.92). For the diagnosis and prediction of high myopia, the SROC, sensitivity, and specificity of ML were 0.98 (95\% CI 0.96-0.99), 0.94 (95\% CI 0.90-0.96), and 0.94 (95\% CI 0.88-0.97), respectively. For the diagnosis of high myopia-associated glaucoma, the SROC, sensitivity, and specificity of ML were 0.96 (95\% CI 0.94-0.97), 0.92 (95\% CI 0.85-0.96), and 0.88 (95\% CI 0.67-0.96), respectively. Conclusions: ML demonstrated highly promising accuracy in diagnosing high myopia and pathologic myopia. Moreover, based on the limited evidence available, we also found that ML appeared to have favorable accuracy in predicting the risk of developing high myopia in the future. DL can be used as a potential method for intelligent image processing and intelligent recognition, and intelligent examination tools can be developed in subsequent research to provide help for areas where medical resources are scarce. 
Trial Registration: PROSPERO CRD42023470820; https://tinyurl.com/2xexp738 ", doi="10.2196/57644", url="https://www.jmir.org/2025/1/e57644" } @Article{info:doi/10.2196/52786, author="Luo, Xiao-Qin and Zhang, Ning-Ya and Deng, Ying-Hao and Wang, Hong-Shen and Kang, Yi-Xin and Duan, Shao-Bin", title="Major Adverse Kidney Events in Hospitalized Older Patients With Acute Kidney Injury: Machine Learning--Based Model Development and Validation Study", journal="J Med Internet Res", year="2025", month="Jan", day="3", volume="27", pages="e52786", keywords="major adverse kidney events within 30 days", keywords="older", keywords="acute kidney injury", keywords="machine learning", keywords="prediction model", abstract="Background: Acute kidney injury (AKI) is a common complication in hospitalized older patients, associated with increased morbidity, mortality, and health care costs. Major adverse kidney events within 30 days (MAKE30), a composite of death, new renal replacement therapy, or persistent renal dysfunction, has been recommended as a patient-centered endpoint for clinical trials involving AKI. Objective: This study aimed to develop and validate a machine learning--based model to predict MAKE30 in hospitalized older patients with AKI. Methods: A total of 4266 older patients (aged ≥65 years) with AKI admitted to the Second Xiangya Hospital of Central South University from January 1, 2015, to December 31, 2020, were included and randomly divided into a training set and an internal test set in a ratio of 7:3. An additional cohort of 11,864 eligible patients from the Medical Information Mart for Intensive Care IV database served as an external test set. The Boruta algorithm was used to select the most important predictor variables from 53 candidate variables. The eXtreme Gradient Boosting algorithm was applied to establish a prediction model for MAKE30. Model discrimination was evaluated by the area under the receiver operating characteristic curve (AUROC). The SHapley Additive exPlanations method was used to interpret model predictions. Results: The overall incidence of MAKE30 in the 2 study cohorts was 28.3\% (95\% CI 26.9\%-29.7\%) and 26.7\% (95\% CI 25.9\%-27.5\%), respectively. The prediction model for MAKE30 exhibited adequate predictive performance, with an AUROC of 0.868 (95\% CI 0.852-0.881) in the training set and 0.823 (95\% CI 0.798-0.846) in the internal test set. Its simplified version achieved an AUROC of 0.744 (95\% CI 0.735-0.754) in the external test set. The SHapley Additive exPlanations method showed that the use of vasopressors, mechanical ventilation, blood urea nitrogen level, red blood cell distribution width-coefficient of variation, and serum albumin level were closely associated with MAKE30. Conclusions: An interpretable eXtreme Gradient Boosting model was developed and validated to predict MAKE30, which provides opportunities for risk stratification, clinical decision-making, and the conduct of clinical trials involving AKI. 
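In the spirit of the MAKE30 model above, a minimal sketch of fitting a gradient-boosted classifier and checking AUROC discrimination on a held-out split; the data are synthetic stand-ins for the admission variables, and the study's Boruta selection and SHAP interpretation steps are omitted.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for admission variables (vasopressors, ventilation, BUN, RDW-CV, albumin, ...)
X, y = make_classification(n_samples=4000, n_features=20, n_informative=6,
                           weights=[0.72, 0.28], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    stratify=y, random_state=0)

model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05,
                      eval_metric="logloss")
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"held-out AUROC: {auc:.3f}")
```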
Trial Registration: Chinese Clinical Trial Registry ChiCTR2200061610; https://tinyurl.com/3smf9nuw ", doi="10.2196/52786", url="https://www.jmir.org/2025/1/e52786" } @Article{info:doi/10.2196/58812, author="Jiang, Xiangkui and Wang, Bingquan", title="Enhancing Clinical Decision Making by Predicting Readmission Risk in Patients With Heart Failure Using Machine Learning: Predictive Model Development Study", journal="JMIR Med Inform", year="2024", month="Dec", day="31", volume="12", pages="e58812", keywords="prediction model", keywords="heart failure", keywords="hospital readmission", keywords="machine learning", keywords="cardiology", keywords="admissions", keywords="hospitalization", abstract="Background: Patients with heart failure frequently face the possibility of rehospitalization following an initial hospital stay, placing a significant burden on both patients and health care systems. Accurate predictive tools are crucial for guiding clinical decision-making and optimizing patient care. However, the effectiveness of existing models tailored specifically to the Chinese population is still limited. Objective: This study aimed to formulate a predictive model for assessing the likelihood of readmission among patients diagnosed with heart failure. Methods: In this study, we analyzed data from 1948 patients with heart failure in a hospital in Sichuan Province between 2016 and 2019. By applying 3 variable selection strategies, 29 relevant variables were identified. Subsequently, we constructed 6 predictive models using different algorithms: logistic regression, support vector machine, gradient boosting machine, Extreme Gradient Boosting, multilayer perception, and graph convolutional networks. Results: The graph convolutional network model showed the highest prediction accuracy with an area under the receiver operating characteristic curve of 0.831, accuracy of 75\%, sensitivity of 52.12\%, and specificity of 90.25\%. Conclusions: The model crafted in this study proves its effectiveness in forecasting the likelihood of readmission among patients with heart failure, thus serving as a crucial reference for clinical decision-making. ", doi="10.2196/58812", url="https://medinform.jmir.org/2024/1/e58812" } @Article{info:doi/10.2196/55958, author="Prakash, Prita Madhu and Thiagalingam, Aravinda", title="The Role of Clinician-Developed Applications in Promoting Adherence to Evidence-Based Guidelines: Pilot Study", journal="JMIR Cardio", year="2024", month="Dec", day="31", volume="8", pages="e55958", keywords="computerized clinical decision support systems", keywords="acute coronary syndrome", keywords="clinical guidelines", keywords="chest pain pathway", keywords="decision support", keywords="coronary", keywords="heart", keywords="cardiac", keywords="cardiology", keywords="chest", keywords="pain", keywords="web-based", keywords="app", keywords="applications", keywords="computerized", keywords="guideline", keywords="emergency", keywords="usability", abstract="Background: Computerized clinical decision support systems (CDSS) are increasingly being used in clinical practice to improve health care delivery. Mobile apps are a type of CDSS that are currently being increasingly used, particularly in lifestyle interventions and disease prevention. However, the use of such apps in acute patient care, diagnosis, and management has not been studied to a great extent. 
The Pathway for Acute Coronary Syndrome Assessment (PACSA) is a set of guidelines developed to standardize the management of suspected acute coronary syndrome across emergency departments in New South Wales, Australia. These guidelines, which risk stratify patients and provide an appropriate management plan, are currently available as PDF documents or physical paper-based PACSA documents. The routine use of these documents and their acceptability among clinicians is uncertain. Presenting the PACSA guidelines on a mobile app in a sequential format may be a more acceptable alternative to the current paper-based PACSA documents. Objective: This study aimed to assess the utility and acceptability of a clinician-developed app modeling the PACSA guidelines as an alternative to the existing paper-based PACSA documents in assessing chest pain presentations to the emergency department. Methods: An app modeling the PACSA guidelines was created using the Research Electronic Data Capture (REDCap) platform by a cardiologist, with a total development time of <3 hours. The app utilizes a sequential design, requiring participants to input patient data in a step-wise fashion to reach the final patient risk stratification. Emergency department doctors were asked to use the app and apply it to two hypothetical patient scenarios. Participants then completed a survey to assess if the PACSA app offered any advantages over the current paper-based PACSA documents Results: Participants (n=31) ranged from junior doctors to senior physicians. Current clinician adherence to the paper-based PACSA documents was low with 55\% (N=17) never using it in their daily practice. Totally, 42\% of participants found the PACSA app easier to use compared to the paper-based PACSA documents and 58\% reported that the PACSA app was also faster to use. The perceived usefulness of the PACSA app was similar to the perceived usefulness of the paper-based PACSA documents. Conclusions: The PACSA app offers a more efficient and user-friendly alternative to the current paper-based PACSA documents and may promote clinician adherence to evidence-based guidelines. Additional studies with a larger number of participants are required to assess the transferability of the PACSA app to everyday practice. Furthermore, apps are relatively easy to develop using existing online platforms, with the scope for clinicians to develop such apps for other evidence-based guidelines and across different specialties. ", doi="10.2196/55958", url="https://cardio.jmir.org/2024/1/e55958" } @Article{info:doi/10.2196/62764, author="Ford, L. Katherine and Laur, Celia and Dhaliwal, Rupinder and Nasser, Roseann and Gramlich, Leah and Allard, P. Johane and Keller, Heather and ", title="Spread and Scale of the Integrated Nutrition Pathway for Acute Care Across Canada: Protocol for the Advancing Malnutrition Care Program", journal="JMIR Res Protoc", year="2024", month="Dec", day="31", volume="13", pages="e62764", keywords="malnutrition", keywords="nutrition screening, nutrition assessment", keywords="hospital", keywords="malnutrition care", keywords="nutrition", keywords="acute care", keywords="clinicians", keywords="mixed-methods design", keywords="decision making", keywords="mentor-champion model", keywords="virtual training", keywords="peer support", keywords="virtual community of practice", abstract="Background: A high proportion of patients admitted to hospital are at nutritional risk or have malnutrition. 
However, this risk is often not identified at admission, which may result in longer hospital stays and increased likelihood of death. The Integrated Nutrition Pathway for Acute Care (INPAC) was developed to provide clinicians with a standardized approach to prevent, detect, and treat malnutrition in hospital. Objective: The purpose of this study was to determine if the Advancing Malnutrition Care (AMC) program can be used to spread and scale-up improvements to nutrition care in Canadian hospitals. Methods: A prospective, longitudinal, mixed methods design is proposed to evaluate the spread and scale of INPAC best practices across Canadian hospitals using a mentor-champion model. Purposive and snowball sampling are used to recruit mentors and hospital champions to participate in the AMC program. Mentors are persons with experience improving nutrition care in a clinical setting and champions are health care providers with a commitment to implementing best care practices. Mentors and champions are trained digitally on their roles and activities. Mentors meet with champions in their area monthly to support them with making practice change. Champions created a site implementation team to target practice change in a specific area related to malnutrition care and use AMC program-specific tools and resources to implement improvements and collect site information through quarterly audits of patient charts to track implementation of nutrition care best practices. An online community of practice is held every 3-4 months to provide further implementation resources and foster connection between mentors and champions at a national level. A prospective evaluation will be conducted to assess the impact of the program and explore how it can be sustainably spread and scaled across Canada. Semistructured interviews will be used to gain a deeper understanding of mentor and champion experiences in the program. The capabilities, opportunities, and motivations of behavior model will be used to evaluate behavior change and the Kirkpatrick 4-level framework will facilitate assessment of barriers to change. Aggregated chart audits will assess the impact of implemented care practices. Descriptive analyses will be used to describe baseline mentor and champion and hospital characteristics and mentor and champion experiences; Friedman test will describe these changes over time. Directed content analysis will guide interpretation of interview data. Results: Data collection began in September 2022 and is anticipated to end in June 2025, at which time data analysis will begin. Conclusions: Evaluation of the AMC program will strengthen decision-making, future programming, and will inform program changes that reflect implementation of best practices in nutrition care while supporting regional mentors and hospital champions. This work will address the sustainability of AMC and the critical challenges related to hospital-based malnutrition, ultimately improving nutrition care for patients across Canada. 
International Registered Report Identifier (IRRID): DERR1-10.2196/62764 ", doi="10.2196/62764", url="https://www.researchprotocols.org/2024/1/e62764" } @Article{info:doi/10.2196/56382, author="Wyatt, Sage and Lunde Markussen, Dagfinn and Haizoune, Mounir and Vestb{\o}, Strand Anders and Sima, Tilahun Yeneabeba and Sandboe, Ilene Maria and Landschulze, Marcus and Bartsch, Hauke and Sauer, Martin Christopher", title="Leveraging Machine Learning to Identify Subgroups of Misclassified Patients in the Emergency Department: Multicenter Proof-of-Concept Study", journal="J Med Internet Res", year="2024", month="Dec", day="31", volume="26", pages="e56382", keywords="emergency department", keywords="triage", keywords="machine learning", keywords="real world evidence", keywords="random forest", keywords="classification", keywords="subgroup", keywords="misclassification", keywords="patient", keywords="multi-center", keywords="proof-of-concept", keywords="hospital", keywords="clinical feature", keywords="Norway", keywords="retrospective", keywords="cohort study", keywords="electronic health system", keywords="electronic health record", abstract="Background: Hospitals use triage systems to prioritize the needs of patients within available resources. Misclassification of a patient can lead to either adverse outcomes in a patient who did not receive appropriate care in the case of undertriage or a waste of hospital resources in the case of overtriage. Recent advances in machine learning algorithms allow for the quantification of variables important to under- and overtriage. Objective: This study aimed to identify clinical features most strongly associated with triage misclassification using a machine learning classification model to capture nonlinear relationships. Methods: Multicenter retrospective cohort data from 2 large regional hospitals in Norway were extracted. The South African Triage System is used at Bergen University Hospital, and the Rapid Emergency Triage and Treatment System is used at Trondheim University Hospital. Variables retrieved included triage score, age, sex, arrival time, subject area affiliation, reason for emergency department contact, discharge location, level of care, and time of death. Random forest classification models were used to identify features with the strongest association with overtriage and undertriage in clinical practice in Bergen and Trondheim. We reported variable importance as SHAP (SHapley Additive exPlanations)-values. Results: We collected data on 205,488 patient records from Bergen University Hospital and 304,997 patient records from Trondheim University Hospital. Overall, overtriage was very uncommon at both hospitals (all <0.1\%), with undertriage differing between both locations, with 0.8\% at Bergen and 0.2\% at Trondheim University Hospital. Demographics were similar for both hospitals. However, the percentage given a high-priority triage score (red or orange) was higher in Bergen (24\%) compared with 9\% in Trondheim. The clinical referral department was found to be the variable with the strongest association with undertriage (mean SHAP +0.62 and +0.37 for Bergen and Trondheim, respectively). Conclusions: We identified subgroups of patients consistently undertriaged using 2 common triage systems. 
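A simplified sketch of ranking features associated with undertriage using a random forest on synthetic triage records; permutation importance is used here as a plain stand-in for the SHAP values reported in the study, and the variables and data are invented.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic triage records; "undertriaged" is the rare positive class
rng = np.random.default_rng(3)
n = 8000
X = pd.DataFrame({
    "age": rng.integers(18, 95, n),
    "arrival_hour": rng.integers(0, 24, n),
    "referral_dept": rng.integers(0, 12, n),  # encoded clinical referral department
    "triage_score": rng.integers(1, 6, n),
})
p = 1 / (1 + np.exp(-(-6 + 3.5 * (X["referral_dept"] > 8) + 0.02 * X["age"])))
y = rng.binomial(1, p)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
rf.fit(X_train, y_train)

# Permutation importance as a simple stand-in for the SHAP-based ranking used in the study
result = permutation_importance(rf, X_test, y_test, scoring="roc_auc", n_repeats=5, random_state=0)
print(pd.Series(result.importances_mean, index=X.columns).sort_values(ascending=False))
```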
While the importance of clinical patient characteristics to triage misclassification varies by triage system and location, we found consistent evidence between the two locations that the clinical referral department is the most important variable associated with triage misclassification. Replication of this approach at other centers could help to further improve triage scoring systems and improve patient care worldwide. ", doi="10.2196/56382", url="https://www.jmir.org/2024/1/e56382", url="http://www.ncbi.nlm.nih.gov/pubmed/39451101" } @Article{info:doi/10.2196/58686, author="Qin, Chenlong and Peng, Li and Liu, Yun and Zhang, Xiaoliang and Miao, Shumei and Wei, Zhiyuan and Feng, Wei and Zhang, Hongjian and Wan, Cheng and Yu, Yun and Lu, Shan and Huang, Ruochen and Zhang, Xin", title="Development and Validation of a Nomogram-Based Model to Predict Primary Hypertension Within the Next Year in Children and Adolescents: Retrospective Cohort Study", journal="J Med Internet Res", year="2024", month="Dec", day="30", volume="26", pages="e58686", keywords="independent risk factors", keywords="prediction model", keywords="primary hypertension", keywords="clinical applicability", keywords="development", keywords="validation", keywords="pediatrics", keywords="electronic health records", abstract="Background: Primary hypertension (PH) poses significant risks to children and adolescents. Few prediction models for the risk of PH in children and adolescents currently exist, posing a challenge for doctors in making informed clinical decisions. Objective: This study aimed to investigate the incidence and risk factors of PH in Chinese children and adolescents. It also aimed to establish and validate a nomogram-based model for predicting the next year's PH risk. Methods: A training cohort (n=3938, between January 1, 2008, and December 31, 2020) and a validation cohort (n=1269, between January 1, 2021, and July 1, 2023) were established for model training and validation. An independent cohort of 576 individuals was established for external validation of the model. The result of the least absolute shrinkage and selection operator regression technique was used to select the optimal predictive features, and multivariate logistic regression to construct the nomogram. The performance of the nomogram underwent assessment and validation through the area under the receiver operating characteristic curve, concordance index, calibration curves, decision curve analysis, clinical impact curves, and sensitivity analysis. Results: The PH risk factors that we have ultimately identified include gender (odds ratio [OR] 3.34, 95\% CI 2.88 to 3.86; P<.001), age (OR 1.11, 95\% CI 1.08 to 1.14; P<.001), family history of hypertension (OR 42.74, 95\% CI 23.07 to 79.19; P<.001), fasting blood glucose (OR 6.07, 95\% CI 4.74 to 7.78; P<.001), low-density lipoprotein cholesterol (OR 2.03, 95\% CI 1.60 to 2.57; P<.001), and uric acid (OR 1.01, 95\% CI 1.01 to 1.01; P<.001), while factor breastfeeding (OR 0.04, 95\% CI 0.03 to 0.05; P<.001) has been identified as a protective factor. Subsequently, a nomogram has been constructed incorporating these factors. Areas under the receiver operating characteristic curves of the nomogram were 0.892 in the training cohort, 0.808 in the validation cohort, and 0.790 in the external validation cohort. Concordance indexes of the nomogram were 0.892 in the training cohort, 0.808 in the validation cohort, and 0.790 in the external validation cohort. 
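A minimal sketch of the two-step approach the abstract above describes (L1-penalized, LASSO-style selection followed by a multivariable logistic model evaluated by AUROC); the variables are loosely named after the reported predictors, and the data and coefficients are invented.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic pediatric cohort mixing informative predictors with noise variables
rng = np.random.default_rng(4)
n = 4000
X = pd.DataFrame({
    "male": rng.binomial(1, 0.5, n),
    "age": rng.uniform(6, 18, n),
    "family_history": rng.binomial(1, 0.15, n),
    "fasting_glucose": rng.normal(5.0, 0.6, n),
    "ldl_c": rng.normal(2.5, 0.7, n),
    "uric_acid": rng.normal(300, 70, n),
    "noise_1": rng.normal(0, 1, n),
    "noise_2": rng.normal(0, 1, n),
})
logit = (-11 + 1.2 * X["male"] + 0.1 * X["age"] + 3.5 * X["family_history"]
         + 0.8 * X["fasting_glucose"] + 0.7 * X["ldl_c"] + 0.005 * X["uric_acid"])
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
scaler = StandardScaler().fit(X_train)

# Step 1: L1-penalized logistic regression keeps only predictors with nonzero coefficients
lasso = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=10, cv=5)
lasso.fit(scaler.transform(X_train), y_train)
selected = X.columns[np.abs(lasso.coef_[0]) > 1e-6]
print("selected predictors:", list(selected))

# Step 2: multivariable logistic regression on the selected predictors (the nomogram backbone)
final = LogisticRegression(max_iter=1000).fit(X_train[selected], y_train)
print(f"validation AUROC: {roc_auc_score(y_test, final.predict_proba(X_test[selected])[:, 1]):.3f}")
```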
The nomogram demonstrated good clinical benefit and stability in calibration curves, decision curve analysis, clinical impact curves, and sensitivity analysis. Finally, we observed noteworthy differences in uric acid levels and family history of hypertension among various subgroups, demonstrating a high correlation with PH. Moreover, a web-based calculator for the nomogram was made available online. Conclusions: We have developed and validated a stable and reliable nomogram that can accurately predict PH risk within the next year among children and adolescents in primary care and offer effective and cost-efficient support for clinical decisions for the risk prediction of PH. ", doi="10.2196/58686", url="https://www.jmir.org/2024/1/e58686" } @Article{info:doi/10.2196/65286, author="Li, Xue and Wang, Youqing and Li, Huizhang and Wang, Le and Zhu, Juan and Yang, Chen and Du, Lingbin", title="Development of a Prediction Model and Risk Score for Self-Assessment and High-Risk Population Identification in Liver Cancer Screening: Prospective Cohort Study", journal="JMIR Public Health Surveill", year="2024", month="Dec", day="30", volume="10", pages="e65286", keywords="liver cancer", keywords="cancer screening", keywords="cancer surveillance", keywords="prediction model", keywords="early detection", keywords="risk score", keywords="self-assessment", abstract="Background: Liver cancer continues to pose a significant burden in China. To enhance the efficiency of screening, it is crucial to implement population stratification for liver cancer surveillance. Objective: This study aimed to develop a simple prediction model and risk score for liver cancer screening in the general population, with the goal of improving early detection and survival. Methods: This population-based cohort study focused on residents aged 40 to 74 years. Participants were enrolled between 2014 and 2019 and were prospectively followed until June 30, 2021. Data were collected through interviews at enrollment. A Cox proportional hazards regression was used to identify predictors and construct the prediction model. A risk score system was developed based on the weighted factors included in the prediction model. Results: A total of 153,082 study participants (67,586 males and 85,496 females) with a mean age of 55.86 years were included. During 781,125 person-years of follow-up (length of follow-up: median 6.07, IQR 3.07-7.09 years), 290 individuals were diagnosed with liver cancer. Key factors identified for the prediction model and risk score system included age (hazard ratio [HR] 1.06, 95\% CI 1.04-1.08), sex (male: HR 3.41, 95\% CI 2.44-4.78), education level (medium: HR 0.84, 95\% CI 0.61-1.15; high: HR 0.37, 95\% CI 0.17-0.78), cirrhosis (HR 11.93, 95\% CI 7.46-19.09), diabetes (HR 1.59, 95\% CI 1.08-2.34), and hepatitis B surface antigen (HBsAg) status (positive: HR 3.84, 95\% CI 2.38-6.19; unknown: HR 1.04, 95\% CI 0.73-1.49). The model exhibited excellent discrimination in both the development and validation sets, with areas under the curve (AUCs) of 0.802, 0.812, and 0.791 for predicting liver cancer at the 1-, 3-, and 5-year periods in the development set and 0.751, 0.763, and 0.712 in the validation set, respectively. Sensitivity analyses applied to the subgroups of participants without cirrhosis and with a negative or unknown HBsAg status yielded similar performances, with AUCs ranging from 0.707 to 0.831. 
Calibration plots indicated an excellent agreement between the observed and predicted probabilities of developing liver cancer over the 1-, 3-, and 5-year periods. Compared to the low-risk group, participants in the high-risk and moderate-risk groups had 11.88-fold (95\% CI 8.67-16.27) and 3.51-fold (95\% CI 2.58-4.76) higher risks of liver cancer, respectively. Decision curve analysis demonstrated that the risk score provided a higher net benefit compared to the current strategy. To aid in risk stratification for individual participants, a user-friendly web-based scoring system was developed. Conclusions: A straightforward liver cancer prediction model was created by incorporating easily accessible variables. This model enables the identification of asymptomatic individuals who should be prioritized for liver cancer screening. ", doi="10.2196/65286", url="https://publichealth.jmir.org/2024/1/e65286" } @Article{info:doi/10.2196/57824, author="Dimitsaki, Stella and Natsiavas, Pantelis and Jaulent, Marie-Christine", title="Applying AI to Structured Real-World Data for Pharmacovigilance Purposes: Scoping Review", journal="J Med Internet Res", year="2024", month="Dec", day="30", volume="26", pages="e57824", keywords="pharmacovigilance", keywords="drug safety", keywords="artificial intelligence", keywords="machine learning", keywords="real-world data", keywords="scoping review", abstract="Background: Artificial intelligence (AI) applied to real-world data (RWD; eg, electronic health care records) has been identified as a potentially promising technical paradigm for the pharmacovigilance field. There are several instances of AI approaches applied to RWD; however, most studies focus on unstructured RWD (conducting natural language processing on various data sources, eg, clinical notes, social media, and blogs). Hence, it is essential to investigate how AI is currently applied to structured RWD in pharmacovigilance and how new approaches could enrich the existing methodology. Objective: This scoping review depicts the emerging use of AI on structured RWD for pharmacovigilance purposes to identify relevant trends and potential research gaps. Methods: The scoping review was based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology. We queried the MEDLINE database through the PubMed search engine. Relevant scientific manuscripts published from January 2010 to January 2024 were retrieved. The included studies were ``mapped'' against a set of evaluation criteria, including applied AI approaches, code availability, description of the data preprocessing pipeline, clinical validation of AI models, and implementation of trustworthy AI criteria following the guidelines of the FUTURE (Fairness, Universality, Traceability, Usability, Robustness, and Explainability)-AI initiative. Results: The scoping review ultimately yielded 36 studies. There has been a significant increase in relevant studies after 2019. Most of the articles focused on adverse drug reaction detection procedures (23/36, 64\%) for specific adverse effects. Furthermore, a substantial number of studies (34/36, 94\%) used nonsymbolic AI approaches, emphasizing classification tasks. Random forest was the most popular machine learning approach identified in this review (17/36, 47\%). The most common RWD sources used were electronic health care records (28/36, 78\%). 
Typically, these data were not available in a widely acknowledged data model to facilitate interoperability, and they came from proprietary databases, limiting their availability for reproducing results. On the basis of the evaluation criteria classification, 10\% (4/36) of the studies published their code in public registries, 16\% (6/36) tested their AI models in clinical environments, and 36\% (13/36) provided information about the data preprocessing pipeline. In addition, in terms of trustworthy AI, 89\% (32/36) of the studies followed at least half of the trustworthy AI initiative guidelines. Finally, selection and confounding biases were the most common biases in the included studies. Conclusions: AI, along with structured RWD, constitutes a promising line of work for drug safety and pharmacovigilance. However, in terms of AI, some approaches have not been examined extensively in this field (such as explainable AI and causal AI). Moreover, it would be helpful to have a data preprocessing protocol for RWD to support pharmacovigilance processes. Finally, because of personal data sensitivity, evaluation procedures have to be investigated further. ", doi="10.2196/57824", url="https://www.jmir.org/2024/1/e57824" } @Article{info:doi/10.2196/57204, author="Hoffman, Jane and Hattingh, Laetitia and Shinners, Lucy and Angus, L. Rebecca and Richards, Brent and Hughes, Ian and Wenke, Rachel", title="Allied Health Professionals' Perceptions of Artificial Intelligence in the Clinical Setting: Cross-Sectional Survey", journal="JMIR Form Res", year="2024", month="Dec", day="30", volume="8", pages="e57204", keywords="allied health", keywords="artificial intelligence", keywords="hospital", keywords="digital health", keywords="impact", keywords="AI", keywords="mHealth", keywords="cross sectional", keywords="survey", keywords="health professional", keywords="medical professional", keywords="perception", keywords="clinical setting", keywords="opportunity", keywords="challenge", keywords="healthcare", keywords="delivery", keywords="Australia", keywords="clinician", keywords="confirmatory factor analysis", keywords="linear regression", abstract="Background: Artificial intelligence (AI) has the potential to address growing logistical and economic pressures on the health care system by reducing risk, increasing productivity, and improving patient safety; however, implementing digital health technologies can be disruptive. Workforce perception is a powerful indicator of technology use and acceptance; however, there is little research available on the perceptions of allied health professionals (AHPs) toward AI in health care. Objective: This study aimed to explore AHP perceptions of AI and the opportunities and challenges for its use in health care delivery. Methods: A cross-sectional survey was conducted at a health service in Queensland, Australia, using the Shinners Artificial Intelligence Perception tool. Results: A total of 231 (22.1\%) participants from 11 AHPs responded to the survey. Participants were mostly younger than 40 years (157/231, 67.9\%), female (189/231, 81.8\%), and working in a clinical role (196/231, 84.8\%), with a median of 10 years' experience in their profession. Most participants had not used AI (185/231, 80.1\%), had little to no knowledge about AI (201/231, 87\%), and reported workforce knowledge and skill as the greatest challenges to incorporating AI in health care (178/231, 77.1\%). 
Age (P=.01), profession (P=.009), and AI knowledge (P=.02) were strong predictors of the perceived professional impact of AI. AHPs generally felt unprepared for the implementation of AI in health care, with concerns about a lack of workforce knowledge on AI and losing valued tasks to AI. Prior use of AI (P=.02) and years of experience as a health care professional (P=.02) were significant predictors of perceived preparedness for AI. Most participants had not received education on AI (190/231, 82.3\%) and desired training (170/231, 73.6\%) and believed AI would improve health care. Ideas and opportunities suggested for the use of AI within the allied health setting were predominantly nonclinical, administrative, and to support patient assessment tasks, with a view to improving efficiencies and increasing clinical time for direct patient care. Conclusions: Education and experience with AI are needed in health care to support its implementation across allied health, the second largest workforce in health. Industry and academic partnerships with clinicians should not be limited to AHPs with high AI literacy as clinicians across all knowledge levels can identify many opportunities for AI in health care. ", doi="10.2196/57204", url="https://formative.jmir.org/2024/1/e57204" } @Article{info:doi/10.2196/52914, author="Wang, Wei and Chen, Xiang and Xu, Licong and Huang, Kai and Zhao, Shuang and Wang, Yong", title="Artificial Intelligence--Aided Diagnosis System for the Detection and Classification of Private-Part Skin Diseases: Decision Analytical Modeling Study", journal="J Med Internet Res", year="2024", month="Dec", day="27", volume="26", pages="e52914", keywords="artificial intelligence-aided diagnosis", keywords="private parts", keywords="skin disease", keywords="knowledge graph", keywords="dermatology", keywords="classification", keywords="artificial intelligence", keywords="AI", keywords="diagnosis", abstract="Background: Private-part skin diseases (PPSDs) can cause a patient's stigma, which may hinder the early diagnosis of these diseases. Artificial intelligence (AI) is an effective tool to improve the early diagnosis of PPSDs, especially in preventing the deterioration of skin tumors in private parts such as Paget disease. However, to our knowledge, there is currently no research on using AI to identify PPSDs due to the complex backgrounds of the lesion areas and the challenges in data collection. Objective: This study aimed to develop and evaluate an AI-aided diagnosis system for the detection and classification of PPSDs: aiding patients in self-screening and supporting dermatologists' diagnostic enhancement. Methods: In this decision analytical modeling study, a 2-stage AI-aided diagnosis system was developed to classify PPSDs. In the first stage, a multitask detection network was trained to automatically detect and classify skin lesions (type, color, and shape). In the second stage, we proposed a knowledge graph based on dermatology expertise and constructed a decision network to classify seven PPSDs (condyloma acuminatum, Paget disease, eczema, pearly penile papules, genital herpes, syphilis, and Bowen disease). A reader study with 13 dermatologists of different experience levels was conducted. Dermatologists were asked to classify the testing cohort under reading room conditions, first without and then with system support. This AI-aided diagnostic study used the data of 635 patients from two institutes between July 2019 and April 2022. 
The data of Institute 1 contained 2701 skin lesion samples from 520 patients, which were used to train the multitask detection network in the first stage. In addition, the data of Institute 2 consisted of 115 clinical images and the corresponding medical records, which were used to test the whole 2-stage AI-aided diagnosis system. Results: On the test data of Institute 2, the proposed system achieved an average precision, recall, and F1-score of 0.81, 0.86, and 0.83, respectively, outperforming existing advanced algorithms. For the reader performance test, our system improved the average F1-score of the junior, intermediate, and senior dermatologists by 16\%, 7\%, and 4\%, respectively. Conclusions: In this study, we constructed the first skin-lesion--based dataset and developed the first AI-aided diagnosis system for PPSDs. This system provides the final diagnosis result by simulating the diagnostic process of dermatologists. Compared with existing advanced algorithms, this system is more accurate in identifying PPSDs. Overall, our system can not only help patients achieve self-screening and alleviate their stigma but also assist dermatologists in diagnosing PPSDs. ", doi="10.2196/52914", url="https://www.jmir.org/2024/1/e52914", url="http://www.ncbi.nlm.nih.gov/pubmed/39729353" } @Article{info:doi/10.2196/55916, author="Grossbard, Eitan and Marziano, Yehonatan and Sharabi, Adam and Abutbul, Eliyahu and Berman, Aya and Kassif-Lerner, Reut and Barkai, Galia and Hakim, Hila and Segal, Gad", title="Consensus Between Radiologists, Specialists in Internal Medicine, and AI Software on Chest X-Rays in a Hospital-at-Home Service: Prospective Observational Study", journal="JMIR Form Res", year="2024", month="Dec", day="24", volume="8", pages="e55916", keywords="chest x-ray", keywords="hospital-at-home", keywords="telemedicine", keywords="artificial intelligence", keywords="kappa", keywords="x-ray", keywords="home hospitalization", keywords="clinical data", keywords="chest", keywords="implementation", keywords="comparative analysis", keywords="radiologist", keywords="AI", abstract="Background: Home hospitalization is a care modality growing in popularity worldwide. Telemedicine-driven hospital-at-home (HAH) services could replace traditional hospital departments for selected patients. Chest x-rays typically serve as a key diagnostic tool in such cases. Objective: The implementation, analysis, and clinical assimilation of chest x-rays into an HAH service have not been described yet. Our objective is to introduce this essential information to the realm of HAH services for the first time worldwide. Methods: The study involved a prospective follow-up, description, and analysis of the HAH patient population who underwent chest x-rays at home. A comparative analysis was performed to evaluate the level of agreement among three interpretation modalities: a radiologist, a specialist in internal medicine, and a designated artificial intelligence (AI) algorithm. Results: Between February 2021 and May 2023, 300 chest radiographs were performed at the homes of 260 patients, with the median age being 78 (IQR 65-87) years. The most frequent underlying morbidity was cardiovascular disease (n=185, 71.2\%). Of the x-rays, 286 (95.3\%) were interpreted by a specialist in internal medicine, 29 (9.7\%) by a specialized radiologist, and 95 (31.7\%) by the AI software. The overall raw agreement level among these three modalities exceeded 90\%. 
The consensus level evaluated using the Cohen $\kappa$ coefficient showed substantial agreement ($\kappa$=0.65) and moderate agreement ($\kappa$=0.49) between the specialist in internal medicine and the radiologist, and between the specialist in internal medicine and the AI software, respectively. Conclusions: Chest x-rays play a crucial role in the HAH setting. Rapid and reliable interpretation of these x-rays is essential for determining whether a patient requires transfer back to in-hospital surveillance. Our comparative results showed that interpretation by an experienced specialist in internal medicine demonstrates a significant level of consensus with that of the radiologists. However, AI algorithm-based interpretation needs to be further developed and revalidated prior to clinical applications. ", doi="10.2196/55916", url="https://formative.jmir.org/2024/1/e55916" } @Article{info:doi/10.2196/65957, author="Beiler, Donielle and Chopra, Aanya and Gregor, M. Christina and Tusing, D. Lorraine and Pradhan, M. Apoorva and Romagnoli, M. Katrina and Kraus, K. Chadd and Piper, J. Brian and Wright, A. Eric and Troiani, Vanessa", title="Medical Marijuana Documentation Practices in Patient Electronic Health Records: Retrospective Observational Study Using Smart Data Elements and a Review of Medical Records", journal="JMIR Form Res", year="2024", month="Dec", day="23", volume="8", pages="e65957", keywords="cannabis", keywords="learning health system", keywords="Epic", keywords="prescription drug monitoring program", keywords="medical marijuana", keywords="electronic health records", keywords="physician", keywords="cannabis use", keywords="drug use", keywords="data sharing", keywords="patient care", keywords="legalization", keywords="dosage", keywords="chart review protocol", keywords="human data extraction", keywords="data collection", abstract="Background: Medical marijuana (MMJ) is available in Pennsylvania, and participation in the state-regulated program requires patient registration and receiving certification by an approved physician. Currently, no integration of MMJ certification data with health records exists in Pennsylvania that would allow clinicians to rapidly identify patients using MMJ, as exists with other scheduled drugs. This absence of a formal data sharing structure necessitates tools aiding in consistent documentation practices to enable comprehensive patient care. Customized smart data elements (SDEs) were made available to clinicians at an integrated health system, Geisinger, following MMJ legalization in Pennsylvania. Objective: The purpose of this project was to examine and contextualize the use of MMJ SDEs in the Geisinger population. We accomplished this goal by developing a systematic protocol for review of medical records and creating a tool that resulted in consistent human data extraction. Methods: We developed a protocol for reviewing medical records for extracting MMJ-related information. The protocol was developed between August and December of 2022 and focused on a patient group that received one of several MMJ SDEs between January 25, 2019, and May 26, 2022. Characteristics were first identified on a pilot sample (n=5), which were then iteratively reviewed to optimize for consistency. Following the pilot, 2 reviewers were assigned 200 randomly selected patients' medical records, with a third reviewer examining a subsample (n=30) to determine reliability. 
We then summarized the clinician- and patient-level features from 156 medical records with a table-format SDE that best captured MMJ information. Results: We found the review protocol for medical records was feasible for those with minimal medical background to complete, with high interrater reliability ($\kappa$=0.966; P<.001; odds ratio 0.97, 95\% CI 0.954-0.978). MMJ certification was largely documented by nurses and medical assistants (n=138, 88.5\%) and typically within primary care settings (n=107, 68.6\%). The SDE has 6 preset field prompts with heterogeneous documentation completion rates, including certifying conditions (n=146, 93.6\%), product (n=145, 92.9\%), authorized dispensary (n=137, 87.8\%), active ingredient (n=130, 83.3\%), certifying provider (n=96, 61.5\%), and dosage (n=48, 30.8\%). We found preset fields were overall well-recorded (mean 76.6\%, SD 23.7\% across all fields). Primary diagnostic codes recorded at documentation encounters varied, with the most frequent being routine examinations and testing (n=34, 21.8\%), musculoskeletal or nervous conditions, and signs and symptoms not classified elsewhere (n=21, 13.5\%). Conclusions: This method of reviewing medical records yields high-quality data extraction that can serve as a model for other health record inquiries. Our evaluation showed relatively high completeness of SDE fields, primarily by clinical staff responsible for rooming patients, with an overview of conditions under which MMJ is documented. Improving the adoption and fidelity of SDE data collection may present a valuable data source for future research on patient MMJ use, treatment efficacy, and outcomes. ", doi="10.2196/65957", url="https://formative.jmir.org/2024/1/e65957" } @Article{info:doi/10.2196/63866, author="Sprint, Gina and Schmitter-Edgecombe, Maureen and Cook, Diane", title="Building a Human Digital Twin (HDTwin) Using Large Language Models for Cognitive Diagnosis: Algorithm Development and Validation", journal="JMIR Form Res", year="2024", month="Dec", day="23", volume="8", pages="e63866", keywords="human digital twin", keywords="cognitive health", keywords="cognitive diagnosis", keywords="large language models", keywords="artificial intelligence", keywords="machine learning", keywords="digital behavior marker", keywords="interview marker", keywords="health information", keywords="chatbot", keywords="digital twin", keywords="smartwatch", abstract="Background: Human digital twins have the potential to change the practice of personalizing cognitive health diagnosis because these systems can integrate multiple sources of health information and influence into a unified model. Cognitive health is multifaceted, yet researchers and clinical professionals struggle to align diverse sources of information into a single model. Objective: This study aims to introduce a method called HDTwin, for unifying heterogeneous data using large language models. HDTwin is designed to predict cognitive diagnoses and offer explanations for its inferences. Methods: HDTwin integrates cognitive health data from multiple sources, including demographic, behavioral, ecological momentary assessment, n-back test, speech, and baseline experimenter testing session markers. Data are converted into text prompts for a large language model. The system then combines these inputs with relevant external knowledge from scientific literature to construct a predictive model. 
The model's performance is validated using data from 3 studies involving 124 participants, comparing its diagnostic accuracy with baseline machine learning classifiers. Results: HDTwin achieves a peak accuracy of 0.81 based on the automated selection of markers, significantly outperforming baseline classifiers. On average, HDTwin yielded accuracy=0.77, precision=0.88, recall=0.63, and Matthews correlation coefficient=0.57. In comparison, the baseline classifiers yielded average accuracy=0.65, precision=0.86, recall=0.35, and Matthews correlation coefficient=0.36. The experiments also reveal that HDTwin yields superior predictive accuracy when information sources are fused compared to single sources. HDTwin's chatbot interface provides interactive dialogues, aiding in diagnosis interpretation and allowing further exploration of patient data. Conclusions: HDTwin integrates diverse cognitive health data, enhancing the accuracy and explainability of cognitive diagnoses. This approach outperforms traditional models and provides an interface for navigating patient information. The approach shows promise for improving early detection and intervention strategies in cognitive health. ", doi="10.2196/63866", url="https://formative.jmir.org/2024/1/e63866" } @Article{info:doi/10.2196/60684, author="Stephan, Daniel and Bertsch, Annika and Burwinkel, Matthias and Vinayahalingam, Shankeeth and Al-Nawas, Bilal and K{\"a}mmerer, W. Peer and Thiem, GE Daniel", title="AI in Dental Radiology---Improving the Efficiency of Reporting With ChatGPT: Comparative Study", journal="J Med Internet Res", year="2024", month="Dec", day="23", volume="26", pages="e60684", keywords="artificial intelligence", keywords="ChatGPT", keywords="radiology report", keywords="dental radiology", keywords="dental orthopantomogram", keywords="panoramic radiograph", keywords="dental", keywords="radiology", keywords="chatbot", keywords="medical documentation", keywords="medical application", keywords="imaging", keywords="disease detection", keywords="clinical decision support", keywords="natural language processing", keywords="medical licensing", keywords="dentistry", keywords="patient care", abstract="Background: Structured and standardized documentation is critical for accurately recording diagnostic findings, treatment plans, and patient progress in health care. Manual documentation can be labor-intensive and error-prone, especially under time constraints, prompting interest in the potential of artificial intelligence (AI) to automate and optimize these processes, particularly in medical documentation. Objective: This study aimed to assess the effectiveness of ChatGPT (OpenAI) in generating radiology reports from dental panoramic radiographs, comparing the performance of AI-generated reports with those manually created by dental students. Methods: A total of 100 dental students were tasked with analyzing panoramic radiographs and generating radiology reports manually or assisted by ChatGPT using a standardized prompt derived from a diagnostic checklist. Results: Reports generated by ChatGPT showed a high degree of textual similarity to reference reports; however, they often lacked critical diagnostic information typically included in reports authored by students. Despite this, the AI-generated reports were consistent in being error-free and matched the readability of student-generated reports. 
Conclusions: The findings from this study suggest that ChatGPT has considerable potential for generating radiology reports, although it currently faces challenges in accuracy and reliability. This underscores the need for further refinement in the AI's prompt design and the development of robust validation mechanisms to enhance its use in clinical settings. ", doi="10.2196/60684", url="https://www.jmir.org/2024/1/e60684" } @Article{info:doi/10.2196/60944, author="Tang, Xiaoli and Yang, Xiaochen and Yuan, Jiajun and Yang, Jie and Jin, Qian and Zhang, Hanting and Zhao, Liebin and Guo, Weiwei", title="Call for Decision Support for Electrocardiographic Alarm Administration Among Neonatal Intensive Care Unit Staff: Multicenter, Cross-Sectional Survey", journal="J Med Internet Res", year="2024", month="Dec", day="20", volume="26", pages="e60944", keywords="ECG alarm", keywords="electrocardiographic", keywords="perception", keywords="practice", keywords="decision-making", keywords="neonatal intensive care unit", keywords="health care providers", keywords="cross-sectional survey", keywords="nationwide", abstract="Background: Previous studies have shown that electrocardiographic (ECG) alarms have high sensitivity and low specificity, have underreported adverse events, and may cause neonatal intensive care unit (NICU) staff fatigue or alarm ignoring. Moreover, prolonged noise stimuli in hospitalized neonates can disrupt neonatal development. Objective: The aim of the study is to conduct a nationwide, multicenter, large-sample cross-sectional survey to identify current practices and investigate the decision-making requirements of health care providers regarding ECG alarms. Methods: We conducted a nationwide, cross-sectional survey of NICU staff working in grade III level A hospitals in 27 Chinese provinces to investigate current clinical practices, perceptions, decision-making processes, and decision-support requirements for clinical ECG alarms. A comparative analysis was conducted on the results using the chi-square, Kruskal-Wallis, or Mann-Whitney U tests. Results: In total, 1019 respondents participated in this study. NICU staff reported experiencing a significant number of nuisance alarms and negative perceptions as well as practices regarding ECG alarms. Compared to nurses, physicians had more negative perceptions. Individuals with higher education levels and job titles had more negative perceptions of alarm systems than those with lower education levels and job titles. The mean difficulty score for decision-making about ECG alarms was 2.96 (SD 0.27) of 5. A total of 62.32\% (n=635) respondents reported difficulty in resetting or modifying alarm parameters. Intelligent module--assisted decision support systems were perceived as the most popular form of decision support. Conclusions: This study highlights the negative perceptions and strong decision-making requirements of NICU staff related to ECG alarm handling. Health care policy makers must draw attention to the decision-making requirements and provide adequate decision support in different forms. 
", doi="10.2196/60944", url="https://www.jmir.org/2024/1/e60944" } @Article{info:doi/10.2196/42774, author="Ngaruiya, Christine and Samad, Zainab and Tajuddin, Salma and Nasim, Zarmeen and Leff, Rebecca and Farhad, Awais and Pires, Kyle and Khan, Alamgir Muhammad and Hartz, Lauren and Safdar, Basmah", title="Identification of Gender Differences in Acute Myocardial Infarction Presentation and Management at Aga Khan University Hospital-Pakistan: Natural Language Processing Application in a Dataset of Patients With Cardiovascular Disease", journal="JMIR Form Res", year="2024", month="Dec", day="20", volume="8", pages="e42774", keywords="natural language processing", keywords="gender-based differences", keywords="acute coronary syndrome", keywords="global health", keywords="Pakistan", keywords="gender", keywords="data", keywords="dataset", keywords="clinical", keywords="research", keywords="management", keywords="patient", keywords="medication", keywords="women", keywords="tool", abstract="Background: Ischemic heart disease is a leading cause of death globally with a disproportionate burden in low- and middle-income countries (LMICs). Natural language processing (NLP) allows for data enrichment in large datasets to facilitate key clinical research. We used NLP to assess gender differences in symptoms and management of patients hospitalized with acute myocardial infarction (AMI) at Aga Khan University Hospital-Pakistan. Objective: The primary objective of this study was to use NLP to assess gender differences in the symptoms and management of patients hospitalized with AMI at a tertiary care hospital in Pakistan. Methods: We developed an NLP-based methodology to extract AMI symptoms and medications from 5358 discharge summaries spanning the years 1988 to 2018. This dataset included patients admitted and discharged between January 1, 1988, and December 31, 2018, who were older than 18 years with a primary discharge diagnosis of AMI (using ICD-9 [International Classification of Diseases, Ninth Revision], diagnostic codes). The methodology used a fuzzy keyword-matching algorithm to extract AMI symptoms from the discharge summaries automatically. It first preprocesses the free text within the discharge summaries to extract passages indicating the presenting symptoms. Then, it applies fuzzy matching techniques to identify relevant keywords or phrases indicative of AMI symptoms, incorporating negation handling to minimize false positives. After manually reviewing the quality of extracted symptoms in a subset of discharge summaries through preliminary experiments, a similarity threshold of 80\% was determined. Results: Among 1769 women and 3589 men with AMI, women had higher odds of presenting with shortness of breath (odds ratio [OR] 1.46, 95\% CI 1.26-1.70) and lower odds of presenting with chest pain (OR 0.65, 95\% CI 0.55-0.75), even after adjustment for diabetes and age. Presentation with abdominal pain, nausea, or vomiting was much less frequent but consistently more common in women (P<.001). ``Ghabrahat,'' a culturally distinct term for a feeling of impending doom was used by 5.09\% of women and 3.69\% of men as presenting symptom for AMI (P=.06). First-line medication prescription (statin and $\beta$-blockers) was lower in women: women had nearly 30\% lower odds (OR 0.71, 95\% CI 0.57-0.90) of being prescribed statins, and they had 40\% lower odds (OR 0.67, 95\% CI 0.57-0.78) of being prescribed $\beta$-blockers. 
Conclusions: Gender-based differences in clinical presentation and medication management were demonstrated in patients with AMI at a tertiary care hospital in Pakistan. The use of NLP for the identification of culturally nuanced clinical characteristics and management is feasible in LMICs and could be used as a tool to understand gender disparities and address key clinical priorities in LMICs. ", doi="10.2196/42774", url="https://formative.jmir.org/2024/1/e42774", url="http://www.ncbi.nlm.nih.gov/pubmed/39705071" } @Article{info:doi/10.2196/51615, author="Kuo, Nai-Yu and Tsai, Hsin-Jung and Tsai, Shih-Jen and Yang, C. Albert", title="Efficient Screening in Obstructive Sleep Apnea Using Sequential Machine Learning Models, Questionnaires, and Pulse Oximetry Signals: Mixed Methods Study", journal="J Med Internet Res", year="2024", month="Dec", day="19", volume="26", pages="e51615", keywords="sleep apnea", keywords="machine learning", keywords="questionnaire", keywords="oxygen saturation", keywords="polysomnography", keywords="screening", keywords="sleep disorder", keywords="insomnia", keywords="utilization", keywords="dataset", keywords="training", keywords="diagnostic", abstract="Background: Obstructive sleep apnea (OSA) is a prevalent sleep disorder characterized by frequent pauses or shallow breathing during sleep. Polysomnography, the gold standard for OSA assessment, is time consuming and labor intensive, thus limiting diagnostic efficiency. Objective: This study aims to develop 2 sequential machine learning models to efficiently screen and differentiate OSA. Methods: We used 2 datasets comprising 8444 cases from the Sleep Heart Health Study (SHHS) and 1229 cases from Taipei Veterans General Hospital (TVGH). The Questionnaire Model (Model-Questionnaire) was designed to distinguish OSA from primary insomnia using demographic information and Pittsburgh Sleep Quality Index questionnaires, while the Saturation Model (Model-Saturation) categorized OSA severity based on multiple blood oxygen saturation parameters. The performance of the sequential machine learning models in screening and assessing the severity of OSA was evaluated using an independent test set derived from TVGH. Results: The Model-Questionnaire achieved an F1-score of 0.86, incorporating demographic data and the Pittsburgh Sleep Quality Index. Model-Saturation training by the SHHS dataset displayed an F1-score of 0.82 when using the power spectrum of blood oxygen saturation signals and reached the highest F1-score of 0.85 when considering all saturation-related parameters. Model-saturation training by the TVGH dataset displayed an F1-score of 0.82. The independent test set showed stable results for Model-Questionnaire and Model-Saturation training by the TVGH dataset, but with a slightly decreased F1-score (0.78) in Model-Saturation training by the SHHS dataset. Despite reduced model accuracy across different datasets, precision remained at 0.89 for screening moderate to severe OSA. Conclusions: Although a composite model using multiple saturation parameters exhibits higher accuracy, optimizing this model by identifying key factors is essential. Both models demonstrated adequate at-home screening capabilities for sleep disorders, particularly for patients unsuitable for in-laboratory sleep studies. 
", doi="10.2196/51615", url="https://www.jmir.org/2024/1/e51615", url="http://www.ncbi.nlm.nih.gov/pubmed/39699950" } @Article{info:doi/10.2196/51255, author="Ma, Mengqing and Chen, Caimei and Chen, Dawei and Zhang, Hao and Du, Xia and Sun, Qing and Fan, Li and Kong, Huiping and Chen, Xueting and Cao, Changchun and Wan, Xin", title="A Machine Learning--Based Prediction Model for Acute Kidney Injury in Patients With Community-Acquired Pneumonia: Multicenter Validation Study", journal="J Med Internet Res", year="2024", month="Dec", day="19", volume="26", pages="e51255", keywords="acute kidney injury", keywords="community-acquired", keywords="pneumonia", keywords="machine learning", keywords="prediction model", abstract="Background: Acute kidney injury (AKI) is common in patients with community-acquired pneumonia (CAP) and is associated with increased morbidity and mortality. Objective: This study aimed to establish and validate predictive models for AKI in hospitalized patients with CAP based on machine learning algorithms. Methods: We trained and externally validated 5 machine learning algorithms, including logistic regression, support vector machine, random forest, extreme gradient boosting, and deep forest (DF). Feature selection was conducted using the sliding window forward feature selection technique. Shapley additive explanations and local interpretable model-agnostic explanation techniques were applied to the optimal model for visual interpretation. Results: A total of 6371 patients with CAP met the inclusion criteria. The development of CAP-associated AKI (CAP-AKI) was recognized in 1006 (15.8\%) patients. The 11 selected indicators were sex, temperature, breathing rate, diastolic blood pressure, C-reactive protein, albumin, white blood cell, hemoglobin, platelet, blood urea nitrogen, and neutrophil count. The DF model achieved the best area under the receiver operating characteristic curve (AUC) and accuracy in the internal (AUC=0.89, accuracy=0.90) and external validation sets (AUC=0.87, accuracy=0.83). Furthermore, the DF model had the best calibration among all models. In addition, a web-based prediction platform was developed to predict CAP-AKI. Conclusions: The model described in this study is the first multicenter-validated AKI prediction model that accurately predicts CAP-AKI during hospitalization. The web-based prediction platform embedded with the DF model serves as a user-friendly tool for early identification of high-risk patients. 
", doi="10.2196/51255", url="https://www.jmir.org/2024/1/e51255", url="http://www.ncbi.nlm.nih.gov/pubmed/39699941" } @Article{info:doi/10.2196/60535, author="Jeanmougin, Pauline and Larramendy, St{\'e}phanie and Fournier, Jean-Pascal and Gaultier, Aur{\'e}lie and Rat, C{\'e}dric", title="Effect of a Feedback Visit and a Clinical Decision Support System Based on Antibiotic Prescription Audit in Primary Care: Multiarm Cluster-Randomized Controlled Trial", journal="J Med Internet Res", year="2024", month="Dec", day="18", volume="26", pages="e60535", keywords="antibacterial agents", keywords="feedback", keywords="clinical decision support system", keywords="prescriptions", keywords="primary health care", keywords="clinical decision", keywords="antibiotic prescription", keywords="antimicrobial", keywords="antibiotic stewardship", keywords="interventions", keywords="health insurance", keywords="systematic antibiotic prescriptions", abstract="Background: While numerous antimicrobial stewardship programs aim to decrease inappropriate antibiotic prescriptions, evidence of their positive impact is needed to optimize future interventions. Objective: This study aimed to evaluate 2 multifaceted antibiotic stewardship interventions for inappropriate systemic antibiotic prescription in primary care. Methods: An open-label, cluster-randomized controlled trial of 2501 general practitioners (GPs) working in western France was conducted from July 2019 to January 2021. Two interventions were studied: the standard intervention, consisting of a visit by a health insurance representative who gave prescription feedback and provided a leaflet for treating cystitis and tonsillitis; and a clinical decision support system (CDSS)--based intervention, consisting of a visit with prescription feedback and a CDSS demonstration on antibiotic prescribing. The control group received no intervention. Data on systemic antibiotic dispensing was obtained from the National Health Insurance System (Syst{\`e}me National d'Information Inter-R{\'e}gimes de l'Assurance Maladie) database. The overall antibiotic volume dispensed per GP at 12 months was compared between arms using a 2-level hierarchical analysis of covariance adjusted for annual antibiotic prescription volume at baseline. Results: Overall, 2501 GPs were randomized (n=1099, 43.9\% women). At 12 months, the mean volume of systemic antibiotics per GP decreased by 219.2 (SD 61.4; 95\% CI ?339.5 to ?98.8; P<.001) defined daily doses in the CDSS-based visit group compared with the control group. The decrease in the mean volume of systemic antibiotics dispensed per GP was not significantly different between the standard visit group and the control group (?109.7, SD 62.4; 95\% CI ?232.0 to 12.5 defined daily doses; P=.08). Conclusions: A visit by a health insurance representative combining feedback and a CDSS demonstration resulted in a 4.4\% (-219.2/4930) reduction in the total volume of systemic antibiotic prescriptions in 12 months. Trial Registration: ClinicalTrials.gov NCT04028830; https://clinicaltrials.gov/study/NCT04028830 ", doi="10.2196/60535", url="https://www.jmir.org/2024/1/e60535", url="http://www.ncbi.nlm.nih.gov/pubmed/39693139" } @Article{info:doi/10.2196/60879, author="Chong, K. Min and Hickie, B. Ian and Ottavio, Antonia and Rogers, David and Dimitropoulos, Gina and LaMonica, M. Haley and Borgnolo, J. Luke and McKenna, Sarah and Scott, M. 
Elizabeth and Iorfino, Frank", title="A Digital Approach for Addressing Suicidal Ideation and Behaviors in Youth Mental Health Services: Observational Study", journal="J Med Internet Res", year="2024", month="Dec", day="18", volume="26", pages="e60879", keywords="mental health service", keywords="youth mental health", keywords="suicide management", keywords="clinical decision support", keywords="primary care", keywords="personalization", keywords="suicide", keywords="suicidal", keywords="youth", keywords="mental health", keywords="mental health care", keywords="suicide prevention", keywords="digital technology", keywords="online assessment", keywords="clinician", keywords="digital health", keywords="health informatics", keywords="clinical information", abstract="Background: Long wait times for mental health treatments may cause delays in early detection and management of suicidal ideation and behaviors, which are crucial for effective mental health care and suicide prevention. The use of digital technology is a potential solution for prompt identification of youth with high suicidality. Objective: The primary aim of this study was to evaluate the use of a digital suicidality notification system designed to detect and respond to suicidal needs in youth mental health services. Second, the study aimed to characterize young people at different levels of suicidal ideation and behaviors. Methods: Young people aged between 16 and 25 years completed multidimensional assessments using a digital platform, collecting demographic, clinical, social, functional, and suicidality information. When the suicidality score exceeded a predetermined threshold, established based on clinical expertise and service policies, a rule-based algorithm configured within the platform immediately generated an alert for treating clinicians. Subsequent clinical actions and response times were analyzed. Results: A total of 2021 individuals participated, of whom 266 (11\%) triggered one or more high suicidal ideation and behaviors notification. Of the 292 notifications generated, 76\% (222/292) were resolved, with a median response time of 1.9 (range 0-50.8) days. Clinical actions initiated to address suicidality included creating safety plans (60\%, 134/222), conducting safety checks (18\%, 39/222), psychological therapy (8\%, 17/222), transfer to another service (3\%, 8/222), and scheduling of new appointments (2\%, 4/222). Young people with high levels of suicidality were more likely to present with more severe and comorbid symptoms, including low engagement in work or education, heterogenous psychopathology, substance misuse, and recurrent illness. Conclusions: The digital suicidality notification system facilitated prompt clinical actions by alerting clinicians to high levels of suicidal ideation and behaviors detected among youth. Further, the multidimensional assessment revealed complex and comorbid symptoms exhibited in youth with high suicidality. By expediting and personalizing care for those displaying elevated suicidality, the digital notification system can play a pivotal role in preventing rapid symptom progression and its detrimental impacts on young people's mental health. 
", doi="10.2196/60879", url="https://www.jmir.org/2024/1/e60879", url="http://www.ncbi.nlm.nih.gov/pubmed/39693140" } @Article{info:doi/10.2196/60231, author="Silvey, Scott and Liu, Jinze", title="Sample Size Requirements for Popular Classification Algorithms in Tabular Clinical Data: Empirical Study", journal="J Med Internet Res", year="2024", month="Dec", day="17", volume="26", pages="e60231", keywords="medical informatics", keywords="machine learning", keywords="sample size", keywords="research design", keywords="decision trees", keywords="classification algorithm", keywords="clinical research", keywords="learning-curve analysis", keywords="analysis", keywords="analyses", keywords="guidelines", keywords="ML", keywords="decision making", keywords="algorithm", keywords="curve analysis", keywords="dataset", abstract="Background: The performance of a classification algorithm eventually reaches a point of diminishing returns, where the additional sample added does not improve the results. Thus, there is a need to determine an optimal sample size that maximizes performance while accounting for computational burden or budgetary concerns. Objective: This study aimed to determine optimal sample sizes and the relationships between sample size and dataset-level characteristics over a variety of binary classification algorithms. Methods: A total of 16 large open-source datasets were collected, each containing a binary clinical outcome. Furthermore, 4 machine learning algorithms were assessed: XGBoost (XGB), random forest (RF), logistic regression (LR), and neural networks (NNs). For each dataset, the cross-validated area under the curve (AUC) was calculated at increasing sample sizes, and learning curves were fit. Sample sizes needed to reach the observed full--dataset AUC minus 2 points (0.02) were calculated from the fitted learning curves and compared across the datasets and algorithms. Dataset--level characteristics, minority class proportion, full--dataset AUC, number of features, type of features, and degree of nonlinearity were examined. Negative binomial regression models were used to quantify relationships between these characteristics and expected sample sizes within each algorithm. A total of 4 multivariable models were constructed, which selected the best-fitting combination of dataset--level characteristics. Results: Among the 16 datasets (full-dataset sample sizes ranging from 70,000-1,000,000), median sample sizes were 9960 (XGB), 3404 (RF), 696 (LR), and 12,298 (NN) to reach AUC stability. For all 4 algorithms, more balanced classes (multiplier: 0.93-0.96 for a 1\% increase in minority class proportion) were associated with decreased sample size. Other characteristics varied in importance across algorithms---in general, more features, weaker features, and more complex relationships between the predictors and the response increased expected sample sizes. In multivariable analysis, the top selected predictors were minority class proportion among all 4 algorithms assessed, full--dataset AUC (XGB, RF, and NN), and dataset nonlinearity (XGB, RF, and NN). For LR, the top predictors were minority class proportion, percentage of strong linear features, and number of features. Final multivariable sample size models had high goodness-of-fit, with dataset--level predictors explaining a majority (66.5\%-84.5\%) of the total deviance in the data among all 4 models. 
Conclusions: The sample sizes needed to reach AUC stability among 4 popular classification algorithms vary by dataset and method and are associated with dataset--level characteristics that can be influenced or estimated before the start of a research study. ", doi="10.2196/60231", url="https://www.jmir.org/2024/1/e60231" } @Article{info:doi/10.2196/57899, author="Uihlein, Adriane and Beissel, Lisa and Ajlani, Hanane Anna and Orzechowski, Marcin and Leinert, Christoph and Kocar, Derya Thomas and Pankratz, Carlos and Schuetze, Konrad and Gebhard, Florian and Steger, Florian and Fotteler, Liselotte Marina and Denkinger, Michael", title="Expectations and Requirements of Surgical Staff for an AI-Supported Clinical Decision Support System for Older Patients: Qualitative Study", journal="JMIR Aging", year="2024", month="Dec", day="17", volume="7", pages="e57899", keywords="traumatology", keywords="orthogeriatrics", keywords="older adult", keywords="elderly", keywords="older people", keywords="aging", keywords="interviews", keywords="mHealth", keywords="mobile health", keywords="mobile application", keywords="digital health", keywords="digital technology", keywords="digital intervention", keywords="CDSS", keywords="clinical decision support system", keywords="artificial intelligence", keywords="AI", keywords="algorithm", keywords="predictive model", keywords="predictive analytics", keywords="predictive system", keywords="practical model", keywords="decision support", keywords="decision support tool", abstract="Background: Geriatric comanagement has been shown to improve outcomes of older surgical inpatients. Furthermore, the choice of discharge location, that is, continuity of care, can have a fundamental impact on convalescence. These challenges and demands have led to the SURGE-Ahead project that aims to develop a clinical decision support system (CDSS) for geriatric comanagement in surgical clinics including a decision support for the best continuity of care option, supported by artificial intelligence (AI) algorithms. Objective: This qualitative study aims to explore the current challenges and demands in surgical geriatric patient care. Based on these challenges, the study explores the attitude of interviewees toward the introduction of an AI-supported CDSS (AI-CDSS) in geriatric patient care in surgery, focusing on technical and general wishes about an AI-CDSS, as well as ethical considerations. Methods: In this study, 15 personal interviews with physicians, nurses, physiotherapists, and social workers, employed in surgical departments at a university hospital in Southern Germany, were conducted in April 2022. Interviews were conducted in person, transcribed, and coded by 2 researchers (AU, LB) using content and thematic analysis. During the analysis, quotes were sorted into the main categories of geriatric patient care, use of an AI-CDSS, and ethical considerations by 2 authors (AU, LB). The main themes of the interviews were subsequently described in a narrative synthesis, citing key quotes. Results: In total, 399 quotes were extracted and categorized from the interviews. Most quotes could be assigned to the primary code challenges in geriatric patient care (111 quotes), with the most frequent subcode being medical challenges (45 quotes). More quotes were assigned to the primary code chances of an AI-CDSS (37 quotes), with its most frequent subcode being holistic patient overview (16 quotes), then to the primary code limits of an AI-CDSS (26 quotes). 
Regarding the primary code technical wishes (37 quotes), most quotes could be assigned to the subcode intuitive usability (15 quotes), followed by mobile availability and easy access (11 quotes). Regarding the main category ethical aspects of an AI-CDSS, most quotes could be assigned to the subcode critical position toward trust in an AI-CDSS (9 quotes), followed by the subcodes respecting the patient's will and individual situation (8 quotes) and responsibility remaining in the hands of humans (7 quotes). Conclusions: Support regarding medical geriatric challenges and responsible handling of AI-based recommendations, as well as necessity for a holistic approach focused on usability, were the most important topics of health care professionals in surgery regarding development of an AI-CDSS for geriatric care. These findings, together with the wish to preserve the patient-caregiver relationship, will help set the focus for the ongoing development of AI-supported CDSS. ", doi="10.2196/57899", url="https://aging.jmir.org/2024/1/e57899" } @Article{info:doi/10.2196/51409, author="Cabanillas Silva, Patricia and Sun, Hong and Rezk, Mohamed and Roccaro-Waldmeyer, M. Diana and Fliegenschmidt, Janis and Hulde, Nikolai and von Dossow, Vera and Meesseman, Laurent and Depraetere, Kristof and Stieg, Joerg and Szymanowsky, Ralph and Dahlweid, Fried-Michael", title="Longitudinal Model Shifts of Machine Learning--Based Clinical Risk Prediction Models: Evaluation Study of Multiple Use Cases Across Different Hospitals", journal="J Med Internet Res", year="2024", month="Dec", day="13", volume="26", pages="e51409", keywords="model shift", keywords="model monitoring", keywords="prediction models", keywords="acute kidney injury", keywords="AKI", keywords="sepsis", keywords="delirium", keywords="decision curve analysis", keywords="DCA", abstract="Background: In recent years, machine learning (ML)--based models have been widely used in clinical domains to predict clinical risk events. However, in production, the performances of such models heavily rely on changes in the system and data. The dynamic nature of the system environment, characterized by continuous changes, has significant implications for prediction models, leading to performance degradation and reduced clinical efficacy. Thus, monitoring model shifts and evaluating their impact on prediction models are of utmost importance. Objective: This study aimed to assess the impact of a model shift on ML-based prediction models by evaluating 3 different use cases---delirium, sepsis, and acute kidney injury (AKI)---from 2 hospitals (M and H) with different patient populations and investigate potential model deterioration during the COVID-19 pandemic period. Methods: We trained prediction models using retrospective data from earlier years and examined the presence of a model shift using data from more recent years. We used the area under the receiver operating characteristic curve (AUROC) to evaluate model performance and analyzed the calibration curves over time. We also assessed the influence on clinical decisions by evaluating the alert rate, the rates of over- and underdiagnosis, and the decision curve. Results: The 2 data sets used in this study contained 189,775 and 180,976 medical cases for hospitals M and H, respectively. Statistical analyses (Z test) revealed no significant difference (P>.05) between the AUROCs from the different years for all use cases and hospitals. 
For example, in hospital M, AKI did not show a significant difference between 2020 (AUROC=0.898) and 2021 (AUROC=0.907, Z=--1.171, P=.242). Similar results were observed in both hospitals and for all use cases (sepsis and delirium) when comparing all the different years. However, when evaluating the calibration curves at the 2 hospitals, model shifts were observed for the delirium and sepsis use cases but not for AKI. Additionally, to investigate the clinical utility of our models, we performed decision curve analysis (DCA) and compared the results across the different years. A pairwise nonparametric statistical comparison showed no differences in the net benefit at the probability thresholds of interest (P>.05). The comprehensive evaluations performed in this study ensured robust model performance of all the investigated models across the years. Moreover, neither performance deteriorations nor alert surges were observed during the COVID-19 pandemic period. Conclusions: Clinical risk prediction models were affected by the dynamic and continuous evolution of clinical practices and workflows. The performance of the models evaluated in this study appeared stable when assessed using AUROCs, showing no significant variations over the years. Additional model shift investigations suggested that a calibration shift was present for certain use cases (delirium and sepsis). However, these changes did not have any impact on the clinical utility of the models based on DCA. Consequently, it is crucial to closely monitor data changes and detect possible model shifts, along with their potential influence on clinical decision-making. ", doi="10.2196/51409", url="https://www.jmir.org/2024/1/e51409" } @Article{info:doi/10.2196/63289, author="Carbunaru, Samuel and Neshatvar, Yassamin and Do, Hyungrok and Murray, Katie and Ranganath, Rajesh and Nayan, Madhur", title="Survival After Radical Cystectomy for Bladder Cancer: Development of a Fair Machine Learning Model", journal="JMIR Med Inform", year="2024", month="Dec", day="13", volume="12", pages="e63289", keywords="machine learning", keywords="bladder cancer", keywords="survival", keywords="prediction", keywords="model", keywords="bias", keywords="fairness", keywords="radical cystectomy", keywords="mortality rate", keywords="algorithmic fairness", keywords="health equity", keywords="healthcare disparities", abstract="Background: Prediction models based on machine learning (ML) methods are being increasingly developed and adopted in health care. However, these models may be prone to bias and considered unfair if they demonstrate variable performance in population subgroups. An unfair model is of particular concern in bladder cancer, where disparities have been identified in sex and racial subgroups. Objective: This study aims (1) to develop a ML model to predict survival after radical cystectomy for bladder cancer and evaluate for potential model bias in sex and racial subgroups; and (2) to compare algorithm unfairness mitigation techniques to improve model fairness. Methods: We trained and compared various ML classification algorithms to predict 5-year survival after radical cystectomy using the National Cancer Database. The primary model performance metric was the F1-score. The primary metric for model fairness was the equalized odds ratio (eOR). We compared 3 algorithm unfairness mitigation techniques to improve eOR. 
Results: We identified 16,481 patients; 23.1\% (n=3800) were female, and 91.5\% (n=15,080) were ``White,'' 5\% (n=832) were ``Black,'' 2.3\% (n=373) were ``Hispanic,'' and 1.2\% (n=196) were ``Asian.'' The 5-year mortality rate was 75\% (n=12,290). The best naive model was extreme gradient boosting (XGBoost), which had an F1-score of 0.860 and eOR of 0.619. All unfairness mitigation techniques increased the eOR, with correlation remover showing the highest increase and resulting in a final eOR of 0.750. This mitigated model had F1-scores of 0.86, 0.904, and 0.824 in the full, Black male, and Asian female test sets, respectively. Conclusions: The ML model predicting survival after radical cystectomy exhibited bias across sex and racial subgroups. By using algorithm unfairness mitigation techniques, we improved algorithmic fairness as measured by the eOR. Our study highlights the role of not only evaluating for model bias but also actively mitigating such disparities to ensure equitable health care delivery. We also deployed the first web-based fair ML model for predicting survival after radical cystectomy. ", doi="10.2196/63289", url="https://medinform.jmir.org/2024/1/e63289" } @Article{info:doi/10.2196/57495, author="Kwok, Wing-Ping Nicholas and Pevnick, Joshua and Feldman, Keith", title="Elevated Ambient Temperature Associated With Reduced Infectious Disease Test Positivity Rates: Retrospective Observational Analysis of Statewide COVID-19 Testing and Weather Across California Counties", journal="JMIR Public Health Surveill", year="2024", month="Dec", day="12", volume="10", pages="e57495", keywords="body temperature", keywords="BT", keywords="fever", keywords="febrile", keywords="feverish", keywords="ambient temperature", keywords="environmental factor", keywords="environmental context", keywords="environmental", keywords="environment", keywords="COVID-19", keywords="SARS-CoV-2", keywords="coronavirus", keywords="respiratory", keywords="infectious", keywords="pulmonary", keywords="COVID-19 pandemic", keywords="pandemic", keywords="diagnostics", keywords="diagnostic test", keywords="diagnostic testing", keywords="public health surveillance", abstract="Background: From medication usage to the time of day, a number of external factors are known to alter human body temperature (BT), even in the absence of underlying pathology. In select cases, clinical guidance already suggests the consideration of clinical and demographic factors when interpreting BT, such as a decreased threshold for fever as age increases. Recent work has indicated factors impacting BT extend to environmental conditions including ambient temperature. However, the effect sizes of these relationships are often small, and it remains unclear if such relationships result in a meaningful impact on real-world health care practices. Objective: Temperature remains a common element in public health screening efforts. Leveraging the unique testing and reporting infrastructure developed around the COVID-19 pandemic, this paper uses a unique resource of daily-level statewide testing data to assess the relationship between ambient temperatures and positivity rates. As fever was a primary symptom that triggered diagnostic testing for COVID-19, this work hypothesizes that environmentally mediated BT increases would not reflect pathology, leading to decreased COVID-19 test positivity rates as temperature rises. 
Methods: Statewide COVID-19 polymerase chain reaction testing data curated by the California Department of Public Health were used to obtain the daily number of total tests and positivity rates for all counties across the state. These data were combined with ambient temperature data provided by the National Centers for Environmental Information for a period of 133 days between widespread testing availability and vaccine approval. A mixed-effects beta-regression model was used to estimate daily COVID-19 test positivity rate as a function of ambient temperature, population, and estimates of COVID prevalence, with nested random effects for a day of the week within unique counties across the state. Results: Considering over 19 million tests performed over 4 months and across 45 distinct counties, adjusted model results highlighted a significant negative association between daily ambient temperature and testing positivity rate (P<.001). Results of the model are strengthened as, using the same testing data, this relationship was not present in a sensitivity analysis using random daily temperatures drawn from the range of observed values (P=.52). Conclusions: These results support the underlying hypothesis and demonstrate the relationship between environmental factors and BT can impact an essential public health activity. As health care continues to operate using thresholds of BT as anchor points (ie, $\geq$100.4 as fever) it is increasingly important to develop approaches to integrate the array of factors known to influence BT measurement. Moreover, as weather data are not often readily available in the same systems as patient data, these findings present a compelling case for future research into when and how environmental context can best be used to improve the interpretation of patient data. ", doi="10.2196/57495", url="https://publichealth.jmir.org/2024/1/e57495" } @Article{info:doi/10.2196/55460, author="Veyron, Jacques-Henri and Deparis, Fran{\c{c}}ois and Al Zayat, Noel Marie and Belmin, Jo{\"e}l and Havreng-Th{\'e}ry, Charlotte", title="Postimplementation Evaluation in Assisted Living Facilities of an eHealth Medical Device Developed to Predict and Avoid Unplanned Hospitalizations: Pragmatic Trial", journal="J Med Internet Res", year="2024", month="Dec", day="10", volume="26", pages="e55460", keywords="digital technology", keywords="unplanned hospitalization", keywords="machine learning", keywords="predictive tool", keywords="assisted living facility", keywords="eHealth", keywords="pragmatic trial", keywords="artificial intelligence", keywords="AI", keywords="gerontology", keywords="older people", keywords="aging", keywords="quality of life", keywords="uncontrolled multicenter trial", keywords="France", keywords="smartphone", keywords="app", keywords="telehealth", keywords="telemonitoring", keywords="remote monitoring of patients", keywords="electronic patient-reported outcome measure", keywords="ePROM", abstract="Background: The proportion of very old adults in the population is increasing, representing a significant challenge. Due to their vulnerability, there is a higher frequency of unplanned hospitalizations in this population, leading to adverse events. Digital tools based on artificial intelligence (AI) can help to identify early signs of vulnerability and unfavorable health events and can contribute to earlier and optimized management. 
Objective: This study aims to report the implementation in assisted living facilities of an innovative monitoring system (Presage Care) for predicting the short-term risk of emergency hospitalization. We describe its use and assess its performance. Methods: An uncontrolled multicenter intervention study was conducted between March and August 2022 in 7 assisted living facilities in France that house very old and vulnerable adults. The monitoring system was set up to provide alerts in cases of a high risk of emergency hospitalization. Nurse assistants (NAs) at the assisted living facilities used a smartphone app to complete a questionnaire on the functional status of the patients, comprising electronic patient-reported outcome measures (ePROMs); these were analyzed in real time by a previously designed machine learning algorithm. This remote monitoring of patients using ePROMs allowed notification of a coordinating nurse or a coordinating NA who subsequently informed the patient's nurses or physician. The primary outcomes were the acceptability and feasibility of the monitoring system in the context and confirmation of the effectiveness and efficiency of AI in risk prevention and detection in practical, real-life scenarios. The secondary outcome was the hospitalization rate after alert-triggered interventions. Results: In this study, 118 of 194 (61\%) eligible patients were included who had at least 1 follow-up visit. A total of 38 emergency hospitalizations were documented. The system generated 92 alerts for 47 of the 118 (40\%) patients. Of these 92 alerts, 46 (50\%) led to 46 health care interventions for 14 of the 118 (12\%) patients and resulted in 4 hospitalizations. The other 46 of the 92 (50\%) alerts did not trigger a health care intervention and resulted in 25 hospitalizations (P<.001). Almost all hospitalizations were associated with a lack of alert-triggered interventions (P<.001). System performance to predict hospitalization had a high specificity (96\%) and negative predictive value (99.4\%). Conclusions: The Presage Care system has been implemented with success in assisted living facilities. It was well accepted by coordinating nurses and performed well in predicting emergency hospitalizations. However, its use by NAs was less than expected. Overall, the system performed well in terms of performance and clinical impact in this setting. Nevertheless, further work is needed to improve the moderate use rate by NAs. Trial Registration: ClinicalTrials.gov NCT05221697; https://clinicaltrials.gov/study/NCT05221697 ", doi="10.2196/55460", url="https://www.jmir.org/2024/1/e55460" } @Article{info:doi/10.2196/60176, author="Benjamin, Ellen and Giuliano, K. Karen", title="Work Systems Analysis of Emergency Nurse Patient Flow Management Using the Systems Engineering Initiative for Patient Safety Model: Applying Findings From a Grounded Theory Study", journal="JMIR Hum Factors", year="2024", month="Dec", day="10", volume="11", pages="e60176", keywords="patient flow", keywords="throughput", keywords="emergency department", keywords="nursing", keywords="emergency nursing", keywords="organizing work", keywords="cognitive work", keywords="human factors", keywords="ergonomics", keywords="SEIPS model", abstract="Background: Emergency nurses actively manage the flow of patients through emergency departments. Patient flow management is complex, cognitively demanding work that shapes the timeliness, efficiency, and safety of patient care. 
Research exploring nursing patient flow management is limited. A comprehensive analysis of emergency nursing work systems is needed to improve patient flow work processes. Objective: The aim of this paper is to describe the work system factors that impact emergency nurse patient flow management using the System Engineering Initiative for Patient Safety model. Methods: This study used grounded theory methodologies. Data were collected through multiple rounds of focus groups and interviews with 27 emergency nurse participants and 64 hours of participant observation across 4 emergency departments between August 2022 and February 2023. Data were analyzed using coding, constant comparative analysis, and memo-writing. Emergent themes were organized according to the first component of the System Engineering Initiative for Patient Safety model, the work system. Results: Patient flow management is impacted by diverse factors, including personal nursing characteristics; tools and technology; external factors; and the emergency department's physical and socio-organizational environment. Participants raised concerns about the available technology's functionality, usability, and accessibility; departmental capacity and layout; resource levels across the health care system; and interdepartmental teamwork. Other noteworthy findings include obscurity and variability across departments' staff roles titles, functions, and norms; the degree of provider involvement in patient flow management decisions; and management's enforcement of timing metrics. Conclusions: There are significant barriers to the work of emergency patient flow management. More research is needed to measure the impact of these human factors on patient flow outcomes. Collaboration between health care administrators, human factors engineers, and nurses is needed to improve emergency nurse work systems. ", doi="10.2196/60176", url="https://humanfactors.jmir.org/2024/1/e60176" } @Article{info:doi/10.2196/55827, author="Sugiura, Ayaka and Saegusa, Satoshi and Jin, Yingzi and Yoshimoto, Riki and Smith, D. Nicholas and Dohi, Koji and Higuchi, Tadashi and Kozu, Tomotake", title="Evaluation of RMES, an Automated Software Tool Utilizing AI, for Literature Screening with Reference to Published Systematic Reviews as Case-Studies: Development and Usability Study", journal="JMIR Form Res", year="2024", month="Dec", day="9", volume="8", pages="e55827", keywords="artificial intelligence", keywords="automated literature screening", keywords="natural language processing", keywords="randomized controlled trials", keywords="Rapid Medical Evidence Synthesis", keywords="RMES", keywords="systematic reviews", keywords="text mining", abstract="Background: Systematic reviews and meta-analyses are important to evidence-based medicine, but the information retrieval and literature screening procedures are burdensome tasks. Rapid Medical Evidence Synthesis (RMES; Deloitte Tohmatsu Risk Advisory LLC) is a software designed to support information retrieval, literature screening, and data extraction for evidence-based medicine. Objective: This study aimed to evaluate the accuracy of RMES for literature screening with reference to published systematic reviews. Methods: We used RMES to automatically screen the titles and abstracts of PubMed-indexed articles included in 12 systematic reviews across 6 medical fields, by applying 4 filters: (1) study type; (2) study type + disease; (3) study type + intervention; and (4) study type + disease + intervention. 
We determined the numbers of articles correctly included by each filter relative to those included by the authors of each systematic review. Only PubMed-indexed articles were assessed. Results: Across the 12 reviews, the number of articles analyzed by RMES ranged from 46 to 5612. The number of PubMed-cited articles included in the reviews ranged from 4 to 47. The median (range) percentage of articles correctly labeled by RMES using filters 1-4 were: 80.9\% (57.1\%-100\%), 65.2\% (34.1\%-81.8\%), 70.5\% (0\%-100\%), and 58.6\% (0\%-81.8\%), respectively. Conclusions: This study demonstrated good performance and accuracy of RMES for the initial screening of the titles and abstracts of articles for use in systematic reviews. RMES has the potential to reduce the workload involved in the initial screening of published studies. ", doi="10.2196/55827", url="https://formative.jmir.org/2024/1/e55827" } @Article{info:doi/10.2196/67409, author="Sorich, Joseph Michael and Mangoni, Aleksander Arduino and Bacchi, Stephen and Menz, Douglas Bradley and Hopkins, Mark Ashley", title="The Triage and Diagnostic Accuracy of Frontier Large Language Models: Updated Comparison to Physician Performance", journal="J Med Internet Res", year="2024", month="Dec", day="6", volume="26", pages="e67409", keywords="generative artificial intelligence", keywords="large language models", keywords="triage", keywords="diagnosis", keywords="accuracy", keywords="physician", keywords="ChatGPT", keywords="diagnostic", keywords="primary care", keywords="physicians", keywords="prediction", keywords="medical care", keywords="internet", keywords="LLMs", keywords="AI", doi="10.2196/67409", url="https://www.jmir.org/2024/1/e67409" } @Article{info:doi/10.2196/59045, author="Chen, Hongbo and Alfred, Myrtede and Brown, D. Andrew and Atinga, Angela and Cohen, Eldan", title="Intersection of Performance, Interpretability, and Fairness in Neural Prototype Tree for Chest X-Ray Pathology Detection: Algorithm Development and Validation Study", journal="JMIR Form Res", year="2024", month="Dec", day="5", volume="8", pages="e59045", keywords="explainable artificial intelligence", keywords="deep learning", keywords="chest x-ray", keywords="thoracic pathology", keywords="fairness", keywords="interpretability", abstract="Background: While deep learning classifiers have shown remarkable results in detecting chest X-ray (CXR) pathologies, their adoption in clinical settings is often hampered by the lack of transparency. To bridge this gap, this study introduces the neural prototype tree (NPT), an interpretable image classifier that combines the diagnostic capability of deep learning models and the interpretability of the decision tree for CXR pathology detection. Objective: This study aimed to investigate the utility of the NPT classifier in 3 dimensions, including performance, interpretability, and fairness, and subsequently examined the complex interaction between these dimensions. We highlight both local and global explanations of the NPT classifier and discuss its potential utility in clinical settings. Methods: This study used CXRs from the publicly available Chest X-ray 14, CheXpert, and MIMIC-CXR datasets. We trained 6 separate classifiers for each CXR pathology in all datasets, 1 baseline residual neural network (ResNet)--152, and 5 NPT classifiers with varying levels of interpretability. 
Performance, interpretability, and fairness were measured using the area under the receiver operating characteristic curve (ROC AUC), interpretation complexity (IC), and mean true positive rate (TPR) disparity, respectively. Linear regression analyses were performed to investigate the relationship between IC and ROC AUC, as well as between IC and mean TPR disparity. Results: The performance of the NPT classifier improved as the IC level increased, surpassing that of ResNet-152 at IC level 15 for the Chest X-ray 14 dataset and IC level 31 for the CheXpert and MIMIC-CXR datasets. The NPT classifier at IC level 1 exhibited the highest degree of unfairness, as indicated by the mean TPR disparity. The magnitude of unfairness, as measured by the mean TPR disparity, was more pronounced in groups differentiated by age (chest X-ray 14 0.112, SD 0.015; CheXpert 0.097, SD 0.010; MIMIC 0.093, SD 0.017) compared to sex (chest X-ray 14 0.054 SD 0.012; CheXpert 0.062, SD 0.008; MIMIC 0.066, SD 0.013). A significant positive relationship between interpretability (ie, IC level) and performance (ie, ROC AUC) was observed across all CXR pathologies (P<.001). Furthermore, linear regression analysis revealed a significant negative relationship between interpretability and fairness (ie, mean TPR disparity) across age and sex subgroups (P<.001). Conclusions: By illuminating the intricate relationship between performance, interpretability, and fairness of the NPT classifier, this research offers insightful perspectives that could guide future developments in effective, interpretable, and equitable deep learning classifiers for CXR pathology detection. ", doi="10.2196/59045", url="https://formative.jmir.org/2024/1/e59045" } @Article{info:doi/10.2196/63195, author="Gariepy, Genevieve and Zahan, Rifat and Osgood, D. Nathaniel and Yeoh, Benjamin and Graham, Eva and Orpana, Heather", title="Dynamic Simulation Models of Suicide and Suicide-Related Behaviors: Systematic Review", journal="JMIR Public Health Surveill", year="2024", month="Dec", day="2", volume="10", pages="e63195", keywords="suicide", keywords="agent-based modeling", keywords="complex system", keywords="complexity science", keywords="discrete-event simulation", keywords="dynamic modeling", keywords="microsimulation", keywords="system dynamics", keywords="systems science", keywords="qualitative study", keywords="dynamic simulation", keywords="database", keywords="depression", keywords="mental state", keywords="systematic review", keywords="stress", abstract="Background: Suicide remains a public health priority worldwide with over 700,000 deaths annually, ranking as a leading cause of death among young adults. Traditional research methodologies have often fallen short in capturing the multifaceted nature of suicide, focusing on isolated risk factors rather than the complex interplay of individual, social, and environmental influences. Recognizing these limitations, there is a growing recognition of the value of dynamic simulation modeling to inform suicide prevention planning. Objective: This systematic review aims to provide a comprehensive overview of existing dynamic models of population-level suicide and suicide-related behaviors, and to summarize their methodologies, applications, and outcomes. Methods: Eight databases were searched, including MEDLINE, Embase, PsycINFO, Scopus, Compendex, ACM Digital Library, IEEE Xplore, and medRxiv, from inception to July 2023. We developed a search strategy in consultation with a research librarian. 
Two reviewers independently conducted the title and abstract and full-text screenings including studies using dynamic modeling methods (eg, System Dynamics and agent-based modeling) for suicide or suicide-related behaviors at the population level, and excluding studies on microbiology, bioinformatics, pharmacology, nondynamic modeling methods, and nonprimary modeling reports (eg, editorials and reviews). Reviewers extracted the data using a standardized form and assessed the quality of reporting using the STRESS (Strengthening the Reporting of Empirical Simulation Studies) guidelines. A narrative synthesis was conducted for the included studies. Results: The search identified 1574 studies, with 22 studies meeting the inclusion criteria, including 15 System Dynamics models, 6 agent-based models, and 1 microsimulation model. The studies primarily targeted populations in Australia and the United States, with some focusing on hypothetical scenarios. The models addressed various interventions ranging from specific clinical and health service interventions, such as mental health service capacity increases, to broader social determinants, including employment programs and reduction in access to means of suicide. The studies demonstrated the utility of dynamic models in identifying the synergistic effects of combined interventions and understanding the temporal dynamics of intervention impacts. Conclusions: Dynamic modeling of suicide and suicide-related behaviors, though still an emerging area, is expanding rapidly, adapting to a range of questions, settings, and contexts. While the quality of reporting was overall adequate, some studies lacked detailed reporting on model transparency and reproducibility. This review highlights the potential of dynamic modeling as a tool to support decision-making and to further our understanding of the complex dynamics of suicide and its related behaviors. Trial Registration: PROSPERO CRD42022346617; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=346617 ", doi="10.2196/63195", url="https://publichealth.jmir.org/2024/1/e63195" } @Article{info:doi/10.2196/60258, author="Chen, You and Lehmann, U. Christoph and Malin, Bradley", title="Digital Information Ecosystems in Modern Care Coordination and Patient Care Pathways and the Challenges and Opportunities for AI Solutions", journal="J Med Internet Res", year="2024", month="Dec", day="2", volume="26", pages="e60258", keywords="patient care pathway", keywords="care journey", keywords="care coordination", keywords="digital information ecosystem", keywords="digital technologies", keywords="artificial intelligence", keywords="information interoperability", keywords="information silos", keywords="workload", keywords="information retrieval", keywords="care transitions", keywords="patient-reported outcome measures", keywords="clinical workflow", keywords="usability", keywords="user experience workflow", keywords="health care information systems", keywords="networks of health care professionals", keywords="patient information flow", doi="10.2196/60258", url="https://www.jmir.org/2024/1/e60258" } @Article{info:doi/10.2196/47856, author="Baxter-King, Ryan and Naeim, Arash and Huang, Q. Tina and Sepucha, Karen and Stanton, Annette and Rudkin, Aaron and Ryu, Rita and Sabacan, Leah and Vavreck, Lynn and Esserman, Laura and Stover Fiscalini, Allison and Wenger, S. 
Neil", title="Relationship Between Perceived COVID-19 Risk and Change in Perceived Breast Cancer Risk: Prospective Observational Study", journal="JMIR Cancer", year="2024", month="Dec", day="2", volume="10", pages="e47856", keywords="breast cancer", keywords="COVID-19 risk perception", keywords="cancer screening", keywords="anxiety", keywords="cancer", keywords="COVID-19", keywords="prevention", keywords="medical care", keywords="screening", keywords="survey", abstract="Background: Whether COVID-19 is associated with a change in risk perception about other health conditions is unknown. Because COVID-19 occurred during a breast cancer study, we evaluated the effect of COVID-19 risk perception on women's breast cancer risk perception. Objective: This study aims to evaluate the relationship between perceived risk of COVID-19 and change in perceived breast cancer risk. We hypothesized that women who perceived greater COVID-19 risk would evidence increased perceived breast cancer risk and this risk would relate to increased anxiety and missed cancer screening. Methods: Women aged 40-74 years with no breast cancer history were enrolled in a US breast cancer prevention trial in outpatient settings. They had provided breast cancer risk perception and general anxiety before COVID-19. We performed a prospective observational study of the relationship between the perceived risk of COVID-19 and the change in perceived breast cancer risk compared to before the pandemic. Each woman was surveyed up to 4 times about COVID-19 and breast cancer risk perception, general anxiety, and missed medical care early in COVID-19 (May to December 2020). Results: Among 13,002 women who completed a survey, compared to before COVID-19, anxiety was higher during COVID-19 (mean T score 53.5 vs 49.7 before COVID-19; difference 3.8, 95\% CI 3.6-4.0; P<.001) and directly related to perceived COVID-19 risk. In survey wave 1, anxiety increased by 2.3 T score points for women with very low perceived COVID-19 risk and 5.2 points for those with moderately or very high perceived COVID-19 risk. Despite no overall difference in breast cancer risk perception (mean 32.5\% vs 32.5\% before COVID-19; difference 0.24, 95\% CI --0.47 to 0.52; P=.93), there was a direct relationship between change in perceived breast cancer risk with COVID-19 risk perception, ranging in survey wave 4 from a 2.4\% decrease in breast cancer risk perception for those with very low COVID-19 risk perception to a 3.4\% increase for women with moderately to very high COVID-19 risk perception. This was not explained by the change in anxiety or missed cancer screening. After adjustment for age, race, education, and survey wave, compared to women with very low perceived COVID-19 risk, perceived breast cancer risk increased by 1.54\% (95\% CI 0.75\%-2.33\%; P<.001), 4.28\% (95\% CI 3.30\%-5.25\%; P<.001), and 3.67\% (95\% CI 1.94\%-5.40\%; P<.001) for women with moderately low, neither high nor low, and moderately or very high perceived COVID-19 risk, respectively. Conclusions: Low perceived COVID-19 risk was associated with reduced perceived breast cancer risk, and higher levels of perceived COVID-19 risk were associated with increased perceived breast cancer risk. This natural experiment suggests that a threat such as COVID-19 may have implications beyond the pandemic. Preventive health behaviors related to perceived risk may need attention as COVID-19 becomes endemic. 
", doi="10.2196/47856", url="https://cancer.jmir.org/2024/1/e47856" } @Article{info:doi/10.2196/55185, author="Van De Sijpe, Greet and Gijsen, Matthias and Van der Linden, Lorenz and Strouven, Stephanie and Simons, Eline and Martens, Emily and Persan, Nele and Grootaert, Veerle and Foulon, Veerle and Casteels, Minne and Verelst, Sandra and Vanbrabant, Peter and De Winter, Sabrina and Spriet, Isabel", title="A Prediction Model to Identify Clinically Relevant Medication Discrepancies at the Emergency Department (MED-REC Predictor): Development and Validation Study", journal="J Med Internet Res", year="2024", month="Nov", day="27", volume="26", pages="e55185", keywords="medication reconciliation", keywords="medication discrepancy", keywords="emergency department", keywords="prediction model", keywords="risk stratification", keywords="MED-REC predictor", keywords="MED-REC", keywords="predictor", keywords="patient", keywords="medication", keywords="hospital", keywords="software-implemented prediction model", keywords="software", keywords="geographic validation", keywords="geographic", abstract="Background: Many patients do not receive a comprehensive medication reconciliation, mostly owing to limited resources. We hence need an approach to identify those patients at the emergency department (ED) who are at increased risk for clinically relevant discrepancies. Objective: The aim of our study was to develop and externally validate a prediction model to identify patients at risk for at least 1 clinically relevant medication discrepancy upon ED presentation. Methods: A prospective, multicenter, observational study was conducted at the University Hospitals Leuven and General Hospital Sint-Jan Brugge-Oostende AV, Belgium. Medication histories were obtained from patients admitted to the ED between November 2017 and May 2022, and clinically relevant medication discrepancies were identified. Three distinct datasets were created for model development, temporal external validation, and geographic external validation. Multivariable logistic regression with backward stepwise selection was used to select the final model. The presence of at least 1 clinically relevant discrepancy was the dependent variable. The model was evaluated by measuring calibration, discrimination, classification, and net benefit. Results: We included 824, 350, and 119 patients in the development, temporal validation, and geographic validation dataset, respectively. The final model contained 8 predictors, for example, age, residence before admission, number of drugs, and number of drugs of certain drug classes based on Anatomical Therapeutic Chemical coding. Temporal validation showed excellent calibration with a slope of 1.09 and an intercept of 0.18. Discrimination was moderate with a c-index (concordance index) of 0.67 (95\% CI 0.61-0.73). In the geographic validation dataset, the calibration slope and intercept were 1.35 and 0.83, respectively, and the c-index was 0.68 (95\% CI 0.58-0.78). The model showed net benefit over a range of clinically reasonable threshold probabilities and outperformed other selection criteria. Conclusions: Our software-implemented prediction model shows moderate performance, outperforming random or typical selection criteria for medication reconciliation. Depending on available resources, the probability threshold can be customized to increase either the specificity or the sensitivity of the model. 
", doi="10.2196/55185", url="https://www.jmir.org/2024/1/e55185" } @Article{info:doi/10.2196/58036, author="Grechuta, Klaudia and Shokouh, Pedram and Alhussein, Ahmad and M{\"u}ller-Wieland, Dirk and Meyerhoff, Juliane and Gilbert, Jeremy and Purushotham, Sneha and Rolland, Catherine", title="Benefits of Clinical Decision Support Systems for the Management of Noncommunicable Chronic Diseases: Targeted Literature Review", journal="Interact J Med Res", year="2024", month="Nov", day="27", volume="13", pages="e58036", keywords="clinical decision support system", keywords="digital health", keywords="chronic disease management", keywords="electronic health records", keywords="noncommunicable diseases", keywords="targeted literature review", keywords="mobile phone", abstract="Background: Clinical decision support systems (CDSSs) are designed to assist in health care delivery by supporting medical practice with clinical knowledge, patient information, and other relevant types of health information. CDSSs are integral parts of health care technologies assisting in disease management, including diagnosis, treatment, and monitoring. While electronic medical records (EMRs) serve as data repositories, CDSSs are used to assist clinicians in providing personalized, context-specific recommendations derived by comparing individual patient data to evidence-based guidelines. Objective: This targeted literature review (TLR) aimed to identify characteristics and features of both stand-alone and EMR-integrated CDSSs that influence their outcomes and benefits based on published scientific literature. Methods: A TLR was conducted using the Embase, MEDLINE, and Cochrane databases to identify data on CDSSs published in a 10-year frame (2012-2022). Studies on computerized, guideline-based CDSSs used by health care practitioners with a focus on chronic disease areas and reporting outcomes for CDSS utilization were eligible for inclusion. Results: A total of 49 publications were included in the TLR. Studies predominantly reported on EMR-integrated CDSSs (ie, connected to an EMR database; n=32, 65\%). The implementation of CDSSs varied globally, with substantial utilization in the United States and within the domain of cardio-renal-metabolic diseases. CDSSs were found to positively impact ``quality assurance'' (n=35, 69\%) and provide ``clinical benefits'' (n=20, 41\%), compared to usual care. Among CDSS features, treatment guidance and flagging were consistently reported as the most frequent elements for enhancing health care, followed by risk level estimation, diagnosis, education, and data export. The effectiveness of a CDSS was evaluated most frequently in primary care settings (n=34, 69\%) across cardio-renal-metabolic disease areas (n=32, 65\%), especially in diabetes (n=13, 26\%). Studies reported CDSSs to be commonly used by a mixed group (n=27, 55\%) of users including physicians, specialists, nurses or nurse practitioners, and allied health care professionals. Conclusions: Overall, both EMR-integrated and stand-alone CDSSs showed positive results, suggesting their benefits to health care providers and potential for successful adoption. Flagging and treatment recommendation features were commonly used in CDSSs to improve patient care; other features such as risk level estimation, diagnosis, education, and data export were tailored to specific requirements and collectively contributed to the effectiveness of health care delivery. 
While this TLR demonstrated that both stand-alone and EMR-integrated CDSSs were successful in achieving clinical outcomes, the heterogeneity of included studies reflects the evolving nature of this research area, underscoring the need for further longitudinal studies to elucidate aspects that may impact their adoption in real-world scenarios. ", doi="10.2196/58036", url="https://www.i-jmr.org/2024/1/e58036" } @Article{info:doi/10.2196/54597, author="Deady, Matthew and Duncan, Raymond and Sonesen, Matthew and Estiandan, Renier and Stimpert, Kelly and Cho, Sylvia and Beers, Jeffrey and Goodness, Brian and Jones, Daniel Lance and Forshee, Richard and Anderson, A. Steven and Ezzeldin, Hussein", title="A Computable Phenotype Algorithm for Postvaccination Myocarditis/Pericarditis Detection Using Real-World Data: Validation Study", journal="J Med Internet Res", year="2024", month="Nov", day="25", volume="26", pages="e54597", keywords="adverse event", keywords="vaccine safety", keywords="interoperability", keywords="computable phenotype", keywords="postmarket surveillance system", keywords="fast healthcare interoperability resources", keywords="FHIR", keywords="real-world data", keywords="validation study", keywords="Food and Drug Administration", keywords="electronic health records", keywords="COVID-19 vaccine", abstract="Background: Adverse events (AEs) associated with vaccination have traditionally been evaluated by epidemiological studies. More recently, they have gained attention due to the emergency use authorization of several COVID-19 vaccines. As part of its responsibility to conduct postmarket surveillance, the US Food and Drug Administration continues to monitor several AEs of interest to ensure the safety of vaccines, including those for COVID-19. Objective: This study is part of the Biologics Effectiveness and Safety Initiative, which aims to improve the US Food and Drug Administration's postmarket surveillance capabilities while minimizing the burden of collecting clinical data on suspected postvaccination AEs. The objective of this study was to enhance active surveillance efforts through a pilot platform that can receive automatically reported AE cases through a health care data exchange. Methods: We detected cases by sharing and applying computable phenotype algorithms to real-world data in health care providers' electronic health records databases. Using the fast healthcare interoperability resources standard for secure data transmission, we implemented a computable phenotype algorithm on a new health care system. The study focused on the algorithm's positive predictive value, validated through clinical records, assessing both the time required for implementation and the accuracy of AE detection. Results: The algorithm required 200-250 hours to implement and optimize. Of the 6,574,420 clinical encounters across 694,151 patients, 30 cases were identified as potential myocarditis/pericarditis. Of these, 26 cases were retrievable, and 24 underwent clinical validation. In total, 14 cases were confirmed as definite or probable myocarditis/pericarditis, yielding a positive predictive value of 58.3\% (95\% CI 37.3\%-76.9\%). These findings underscore the algorithm's capability for real-time detection of AEs, though they also highlight variability in performance across different health care systems. Conclusions: The study advocates for the ongoing refinement and application of distributed computable phenotype algorithms to enhance AE detection capabilities. 
These tools are crucial for comprehensive postmarket surveillance and improved vaccine safety monitoring. The outcomes suggest the need for further optimization to achieve more consistent results across diverse health care settings. ", doi="10.2196/54597", url="https://www.jmir.org/2024/1/e54597" } @Article{info:doi/10.2196/54357, author="Cavero-Redondo, Iv{\'a}n and Martinez-Rodrigo, Arturo and Saz-Lara, Alicia and Moreno-Herraiz, Nerea and Casado-Vicente, Veronica and Gomez-Sanchez, Leticia and Garcia-Ortiz, Luis and Gomez-Marcos, A. Manuel and ", title="Antihypertensive Drug Recommendations for Reducing Arterial Stiffness in Patients With Hypertension: Machine Learning--Based Multicohort (RIGIPREV) Study", journal="J Med Internet Res", year="2024", month="Nov", day="25", volume="26", pages="e54357", keywords="antihypertensive", keywords="drugs", keywords="models", keywords="patients", keywords="pulse wave velocity", keywords="recommendations", keywords="hypertension", keywords="machine learning", keywords="drug recommendations", keywords="arterial stiffness", keywords="RIGIPREV", abstract="Background: High systolic blood pressure is one of the leading global risk factors for mortality, contributing significantly to cardiovascular diseases. Despite advances in treatment, a large proportion of patients with hypertension do not achieve optimal blood pressure control. Arterial stiffness (AS), measured by pulse wave velocity (PWV), is an independent predictor of cardiovascular events and overall mortality. Various antihypertensive drugs exhibit differential effects on PWV, but the extent to which these effects vary depending on individual patient characteristics is not well understood. Given the complexity of selecting the most appropriate antihypertensive medication for reducing PWV, machine learning (ML) techniques offer an opportunity to improve personalized treatment recommendations. Objective: This study aims to develop an ML model that provides personalized recommendations for antihypertensive medications aimed at reducing PWV. The model considers individual patient characteristics, such as demographic factors, clinical data, and cardiovascular measurements, to identify the most suitable antihypertensive agent for improving AS. Methods: This study, known as the RIGIPREV study, used data from the EVA, LOD-DIABETES, and EVIDENT studies involving individuals with hypertension with baseline and follow-up measurements. Antihypertensive drugs were grouped into classes such as angiotensin-converting enzyme inhibitors (ACEIs), angiotensin receptor blockers (ARBs), $\beta$-blockers, diuretics, and combinations of diuretics with ACEIs or ARBs. The primary outcomes were carotid-femoral and brachial-ankle PWV, while the secondary outcomes included various cardiovascular, anthropometric, and biochemical parameters. A multioutput regressor using 6 random forest models was used to predict the impact of each antihypertensive class on PWV reduction. Model performance was evaluated using the coefficient of determination (R2) and mean squared error. Results: The random forest models exhibited strong predictive capabilities, with internal validation yielding R2 values between 0.61 and 0.74, while external validation showed a range of 0.26 to 0.46. The mean squared values ranged from 0.08 to 0.22 for internal validation and from 0.29 to 0.45 for external validation. 
Variable importance analysis revealed that glycated hemoglobin and weight were the most critical predictors for ACEIs, while carotid-femoral PWV and total cholesterol were key variables for ARBs. The decision tree model achieved an accuracy of 84.02\% in identifying the most suitable antihypertensive drug based on individual patient characteristics. Furthermore, the system's recommendations for ARBs matched 55.3\% of patients' original prescriptions. Conclusions: This study demonstrates the utility of ML techniques in providing personalized treatment recommendations for antihypertensive therapy. By accounting for individual patient characteristics, the model improves the selection of drugs that control blood pressure and reduce AS. These findings could significantly aid clinicians in optimizing hypertension management and reducing cardiovascular risk. However, further studies with larger and more diverse populations are necessary to validate these results and extend the model's applicability. ", doi="10.2196/54357", url="https://www.jmir.org/2024/1/e54357", url="http://www.ncbi.nlm.nih.gov/pubmed/39585738" } @Article{info:doi/10.2196/57486, author="Mai, Haiyan and Lu, Yaxin and Fu, Yu and Luo, Tongsen and Li, Xiaoyue and Zhang, Yihan and Liu, Zifeng and Zhang, Yuenong and Zhou, Shaoli and Chen, Chaojin", title="Identification of a Susceptible and High-Risk Population for Postoperative Systemic Inflammatory Response Syndrome in Older Adults: Machine Learning--Based Predictive Model", journal="J Med Internet Res", year="2024", month="Nov", day="22", volume="26", pages="e57486", keywords="older adult patients", keywords="postoperative SIRS", keywords="sepsis", keywords="machine learning", keywords="prediction model", abstract="Background: Systemic inflammatory response syndrome (SIRS) is a serious postoperative complication among older adult surgical patients that frequently develops into sepsis or even death. Notably, the incidences of SIRS and sepsis steadily increase with age. It is important to identify the risk of postoperative SIRS for older adult patients at a sufficiently early stage, which would allow preemptive individualized enhanced therapy to be conducted to improve the prognosis of older adult patients. In recent years, machine learning (ML) models have been deployed by researchers for many tasks, including disease prediction and risk stratification, exhibiting good application potential. Objective: We aimed to develop and validate an individualized predictive model to identify susceptible and high-risk populations for SIRS in older adult patients to instruct appropriate early interventions. Methods: Data for surgical patients aged $\geq$65 years from September 2015 to September 2020 in 3 independent medical centers were retrieved and analyzed. The eligible patient cohort in the Third Affiliated Hospital of Sun Yat-sen University was randomly separated into an 80\% training set (2882 patients) and a 20\% internal validation set (720 patients). We developed 4 ML models to predict postoperative SIRS. The area under the receiver operating curve (AUC), F1 score, Brier score, and calibration curve were used to evaluate the model performance. The model with the best performance was further validated in the other 2 independent data sets involving 844 and 307 cases, respectively. Results: The incidences of SIRS in the 3 medical centers were 24.3\% (876/3602), 29.6\% (250/844), and 6.5\% (20/307), respectively. 
We identified 15 variables that were significantly associated with postoperative SIRS and used in 4 ML models to predict postoperative SIRS. A balanced cutoff between sensitivity and specificity was chosen to ensure as high a true positive as possible. The random forest classifier (RF) model showed the best overall performance to predict postoperative SIRS, with an AUC of 0.751 (95\% CI 0.709-0.793), sensitivity of 0.682, specificity of 0.681, and F1 score of 0.508 in the internal validation set and higher AUCs in the external validation-1 set (0.759, 95\% CI 0.723-0.795) and external validation-2 set (0.804, 95\% CI 0.746-0.863). Conclusions: We developed and validated a generalizable RF model to predict postoperative SIRS in older adult patients, enabling clinicians to screen susceptible and high-risk patients and implement early individualized interventions. An online risk calculator to make the RF model accessible to anesthesiologists and peers around the world was developed. ", doi="10.2196/57486", url="https://www.jmir.org/2024/1/e57486" } @Article{info:doi/10.2196/55734, author="Meng, Jian and Niu, Xiaoyu and Luo, Can and Chen, Yueyue and Li, Qiao and Wei, Dongmei", title="Development and Validation of a Machine Learning--Based Early Warning Model for Lichenoid Vulvar Disease: Prediction Model Development Study", journal="J Med Internet Res", year="2024", month="Nov", day="22", volume="26", pages="e55734", keywords="female", keywords="lichenoid vulvar disease", keywords="risk factors", keywords="evidence-based medicine", keywords="early warning model", abstract="Background: Given the complexity and diversity of lichenoid vulvar disease (LVD) risk factors, it is crucial to actively explore these factors and construct personalized warning models using relevant clinical variables to assess disease risk in patients. Yet, to date, there has been insufficient research, both nationwide and internationally, on risk factors and warning models for LVD. In light of these gaps, this study represents the first systematic exploration of the risk factors associated with LVD. Objective: The risk factors of LVD in women were explored and a medically evidence-based warning model was constructed to provide an early alert tool for the high-risk target population. The model can be applied in the clinic to identify high-risk patients and evaluate its accuracy and practicality in predicting LVD in women. Simultaneously, it can also enhance the diagnostic and treatment proficiency of medical personnel in primary community health service centers, which is of great significance in reducing overall health care spending and disease burden. Methods: A total of 2990 patients who attended West China Second Hospital of Sichuan University from January 2013 to December 2017 were selected as the study candidates and were divided into 1218 cases in the normal vulvovagina group (group 0) and 1772 cases in the lichenoid vulvar disease group (group 1) according to the results of the case examination. We investigated and collected routine examination data from patients for intergroup comparisons, included factors with significant differences in multifactorial analysis, and constructed logistic regression, random forests, gradient boosting machine (GBM), adaboost, eXtreme Gradient Boosting, and Categorical Boosting analysis models. The predictive efficacy of these six models was evaluated using receiver operating characteristic curve and area under the curve. 
Results: Univariate analysis revealed that vaginitis, urinary incontinence, humidity of the long-term residential environment, spicy dietary habits, regular intake of coffee or caffeinated beverages, daily sleep duration, diabetes mellitus, smoking history, presence of autoimmune diseases, menopausal status, and hypertension were all significant risk factors affecting female LVD. Furthermore, the area under the receiver operating characteristic curve, accuracy, sensitivity, and F1-score of the GBM warning model were notably higher than the other 5 predictive analysis models. The GBM analysis model indicated that menopausal status had the strongest impact on female LVD, showing a positive correlation, followed by the presence of autoimmune diseases, which also displayed a positive dependency. Conclusions: In accordance with evidence-based medicine, the construction of a predictive warning model for female LVD can be used to identify high-risk populations at an early stage, aiding in the formulation of effective preventive measures, which is of paramount importance for reducing the incidence of LVD in women. ", doi="10.2196/55734", url="https://www.jmir.org/2024/1/e55734" } @Article{info:doi/10.2196/59260, author="Lee, Haeun and Kim, Seok and Moon, Hui-Woun and Lee, Ho-Young and Kim, Kwangsoo and Jung, Young Se and Yoo, Sooyoung", title="Hospital Length of Stay Prediction for Planned Admissions Using Observational Medical Outcomes Partnership Common Data Model: Retrospective Study", journal="J Med Internet Res", year="2024", month="Nov", day="22", volume="26", pages="e59260", keywords="length of stay", keywords="machine learning", keywords="Observational Medical Outcomes Partnership Common Data Model", keywords="allocation of resources", keywords="reproducibility of results", keywords="hospital", keywords="admission", keywords="retrospective study", keywords="prediction model", keywords="electronic health record", keywords="EHR", keywords="South Korea", keywords="logistic regression", keywords="algorithm", keywords="Shapley Additive Explanation", keywords="health care", keywords="clinical informatics", abstract="Background: Accurate hospital length of stay (LoS) prediction enables efficient resource management. Conventional LoS prediction models with limited covariates and nonstandardized data have limited reproducibility when applied to the general population. Objective: In this study, we developed and validated a machine learning (ML)--based LoS prediction model for planned admissions using the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). Methods: Retrospective patient-level prediction models used electronic health record (EHR) data converted to the OMOP CDM (version 5.3) from Seoul National University Bundang Hospital (SNUBH) in South Korea. The study included 137,437 hospital admission episodes between January 2016 and December 2020. Covariates from the patient, condition occurrence, medication, observation, measurement, procedure, and visit occurrence tables were included in the analysis. To perform feature selection, we applied Lasso regularization in the logistic regression. The primary outcome was an LoS of 7 days or longer, while the secondary outcome was an LoS of 3 days or longer. The prediction models were developed using 6 ML algorithms, with the training and test set split in a 7:3 ratio. 
The performance of each model was evaluated based on the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Shapley Additive Explanations (SHAP) analysis measured feature importance, while calibration plots assessed the reliability of the prediction models. External validation of the developed models occurred at an independent institution, the Seoul National University Hospital. Results: The final sample included 129,938 patient entry events in the planned admissions. The Extreme Gradient Boosting (XGB) model achieved the best performance in binary classification for predicting an LoS of 7 days or longer, with an AUROC of 0.891 (95\% CI 0.887-0.894) and an AUPRC of 0.819 (95\% CI 0.813-0.826) on the internal test set. The Light Gradient Boosting (LGB) model performed the best in the multiclassification for predicting an LoS of 3 days or more, with an AUROC of 0.901 (95\% CI 0.898-0.904) and an AUPRC of 0.770 (95\% CI 0.762-0.779). The most important features contributing to the models were the operation performed, frequency of previous outpatient visits, patient admission department, age, and day of admission. The RF model showed robust performance in the external validation set, achieving an AUROC of 0.804 (95\% CI 0.802-0.807). Conclusions: The use of the OMOP CDM in predicting hospital LoS for planned admissions demonstrates promising predictive capabilities for stays of varying durations. It underscores the advantage of standardized data in achieving reproducible results. This approach should serve as a model for enhancing operational efficiency and patient care coordination across health care settings. ", doi="10.2196/59260", url="https://www.jmir.org/2024/1/e59260" } @Article{info:doi/10.2196/52514, author="Drogt, Jojanneke and Milota, Megan and Veldhuis, Wouter and Vos, Shoko and Jongsma, Karin", title="The Promise of AI for Image-Driven Medicine: Qualitative Interview Study of Radiologists' and Pathologists' Perspectives", journal="JMIR Hum Factors", year="2024", month="Nov", day="21", volume="11", pages="e52514", keywords="digital medicine", keywords="computer vision", keywords="medical AI", keywords="image-driven specialisms", keywords="qualitative interview study", keywords="digital health ethics", keywords="artificial intelligence", keywords="AI", keywords="imaging", keywords="imaging informatics", keywords="radiology", keywords="pathology", abstract="Background: Image-driven specialisms such as radiology and pathology are at the forefront of medical artificial intelligence (AI) innovation. Many believe that AI will lead to significant shifts in professional roles, so it is vital to investigate how professionals view the pending changes that AI innovation will initiate and incorporate their views in ongoing AI developments. Objective: Our study aimed to gain insights into the perspectives and wishes of radiologists and pathologists regarding the promise of AI. Methods: We have conducted the first qualitative interview study investigating the perspectives of both radiologists and pathologists regarding the integration of AI in their fields. The study design is in accordance with the consolidated criteria for reporting qualitative research (COREQ). Results: In total, 21 participants were interviewed for this study (7 pathologists, 10 radiologists, and 4 computer scientists). The interviews revealed a diverse range of perspectives on the impact of AI. 
Respondents discussed various task-specific benefits of AI; yet, both pathologists and radiologists agreed that AI had yet to live up to its hype. Overall, our study shows that AI could facilitate welcome changes in the workflows of image-driven professionals and eventually lead to better quality of care. At the same time, these professionals also admitted that many hopes and expectations for AI were unlikely to become a reality in the next decade. Conclusions: This study points to the importance of maintaining a ``healthy skepticism'' on the promise of AI in imaging specialisms and argues for more structural and inclusive discussions about whether AI is the right technology to solve current problems encountered in daily clinical practice. ", doi="10.2196/52514", url="https://humanfactors.jmir.org/2024/1/e52514" } @Article{info:doi/10.2196/63031, author="Maa{\ss}, Laura and Badino, Manuel and Iyamu, Ihoghosa and Holl, Felix", title="Assessing the Digital Advancement of Public Health Systems Using Indicators Published in Gray Literature: Narrative Review", journal="JMIR Public Health Surveill", year="2024", month="Nov", day="20", volume="10", pages="e63031", keywords="digital public health", keywords="health system", keywords="indicator", keywords="interdisciplinary", keywords="information and communications technology", keywords="maturity assessment", keywords="readiness assessment", keywords="narrative review", keywords="gray literature", keywords="digital health", keywords="mobile phone", abstract="Background: Revealing the full potential of digital public health (DiPH) systems requires a wide-ranging tool to assess their maturity and readiness for emerging technologies. Although a variety of indices exist to assess digital health systems, questions arise about the inclusion of indicators of information and communications technology maturity and readiness, digital (health) literacy, and interest in DiPH tools by the society and workforce, as well as the maturity of the legal framework and the readiness of digitalized health systems. Existing tools frequently target one of these domains while overlooking the others. In addition, no review has yet holistically investigated the available national DiPH system maturity and readiness indicators using a multidisciplinary lens. Objective: We used a narrative review to map the landscape of DiPH system maturity and readiness indicators published in the gray literature. Methods: As original indicators were not published in scientific databases, we applied predefined search strings to the DuckDuckGo and Google search engines for 11 countries from all continents that had reached level 4 of 5 in the latest Global Digital Health Monitor evaluation. In addition, we searched the literature published by 19 international organizations for maturity and readiness indicators concerning DiPH. Results: Of the 1484 identified references, 137 were included, and they yielded 15,806 indicators. We deemed 286 indicators from 90 references relevant for DiPH system maturity and readiness assessments. The majority of these indicators (133/286, 46.5\%) had legal relevance (targeting big data and artificial intelligence regulation, cybersecurity, national DiPH strategies, or health data governance), and the smallest number of indicators (37/286, 12.9\%) were related to social domains (focusing on internet use and access, digital literacy and digital health literacy, or the use of DiPH tools, smartphones, and computers). 
Another 14.3\% (41/286) of indicators analyzed the information and communications technology infrastructure (such as workforce, electricity, internet, and smartphone availability or interoperability standards). The remaining 26.2\% (75/286) of indicators described the degree to which DiPH was applied (including health data architecture, storage, and access; the implementation of DiPH interventions; or the existence of interventions promoting health literacy and digital inclusion). Conclusions: Our work is the first to conduct a multidisciplinary analysis of the gray literature on DiPH maturity and readiness assessments. Although new methods for systematically researching gray literature are needed, our study holds the potential to develop more comprehensive tools for DiPH system assessments. We contributed toward a more holistic understanding of DiPH. Further examination is required to analyze the suitability and applicability of all identified indicators in diverse health care settings. By developing a standardized method to assess DiPH system maturity and readiness, we aim to foster informed decision-making among health care planners and practitioners to improve resource distribution and continue to drive innovation in health care delivery. ", doi="10.2196/63031", url="https://publichealth.jmir.org/2024/1/e63031", url="http://www.ncbi.nlm.nih.gov/pubmed/39566910" } @Article{info:doi/10.2196/58088, author="Patel, Mohammed Ahmed and Baxter, Weston and Porat, Talya", title="Toward Guidelines for Designing Holistic Integrated Information Visualizations for Time-Critical Contexts: Systematic Review", journal="J Med Internet Res", year="2024", month="Nov", day="20", volume="26", pages="e58088", keywords="visualization", keywords="design", keywords="holistic", keywords="integrated", keywords="time-critical", keywords="guidelines", keywords="pre-attentive processing", keywords="gestalt theory", keywords="situation awareness", keywords="decision-making", keywords="mobile phone", abstract="Background: With the extensive volume of information from various and diverse data sources, it is essential to present information in a way that allows for quick understanding and interpretation. This is particularly crucial in health care, where timely insights into a patient's condition can be lifesaving. Holistic visualizations that integrate multiple data variables into a single visual representation can enhance rapid situational awareness and support informed decision-making. However, despite the existence of numerous guidelines for different types of visualizations, this study reveals that there are currently no specific guidelines or principles for designing holistic integrated information visualizations that enable quick processing and comprehensive understanding of multidimensional data in time-critical contexts. Addressing this gap is essential for enhancing decision-making in time-critical scenarios across various domains, particularly in health care. Objective: This study aims to establish a theoretical foundation supporting the argument that holistic integrated visualizations are a distinct type of visualization for time-critical contexts and identify applicable design principles and guidelines that can be used to design for such cases. Methods: We systematically searched the literature for peer-reviewed research on visualization strategies, guidelines, and taxonomies. 
The literature selection followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. The search was conducted across 6 databases: ACM Digital Library, Google Scholar, IEEE Xplore, PubMed, Scopus, and Web of Science. The search was conducted up to August 2024 using the terms (``visualisations'' OR ``visualizations'') AND (``guidelines'' OR ``taxonomy'' OR ``taxonomies''), with studies restricted to the English language. Results: Of 936 papers, 46 (4.9\%) were included in the final review. In total, 48\% (22/46) related to providing a holistic understanding and overview of multidimensional data; 28\% (13/46) focused on integrated presentation, that is, integrating or combining multidimensional data into a single visual representation; and 35\% (16/46) pertained to time and designing for rapid information processing. In total, 65\% (30/46) of the papers presented general information visualization or visual communication guidelines and principles. No specific guidelines or principles were found that addressed all the characteristics of holistic, integrated visualizations in time-critical contexts. A summary of the key guidelines and principles from the 46 papers was extracted, collated, and categorized into 60 guidelines that could aid in designing holistic integrated visualizations. These were grouped according to different characteristics identified in the systematic review (eg, gestalt principles, reduction, organization, abstraction, and task complexity) and further condensed into 5 main proposed guidelines. Conclusions: Holistic integrated information visualizations in time-critical domains are a unique use case requiring a unique set of design guidelines. Our proposed 5 main guidelines, derived from existing design theories and guidelines, can serve as a starting point to enable both holistic and rapid processing of information, facilitating better-informed decisions in time-critical contexts. ", doi="10.2196/58088", url="https://www.jmir.org/2024/1/e58088" } @Article{info:doi/10.2196/58329, author="Seo, Junhyuk and Choi, Dasol and Kim, Taerim and Cha, Chul Won and Kim, Minha and Yoo, Haanju and Oh, Namkee and Yi, YongJin and Lee, Hwa Kye and Choi, Edward", title="Evaluation Framework of Large Language Models in Medical Documentation: Development and Usability Study", journal="J Med Internet Res", year="2024", month="Nov", day="20", volume="26", pages="e58329", keywords="large language models", keywords="health care documentation", keywords="clinical evaluation", keywords="emergency department", keywords="artificial intelligence", keywords="medical record accuracy", abstract="Background: The advancement of large language models (LLMs) offers significant opportunities for health care, particularly in the generation of medical documentation. However, challenges related to ensuring the accuracy and reliability of LLM outputs, coupled with the absence of established quality standards, have raised concerns about their clinical application. Objective: This study aimed to develop and validate an evaluation framework for assessing the accuracy and clinical applicability of LLM-generated emergency department (ED) records, aiming to enhance artificial intelligence integration in health care documentation. Methods: We organized the Healthcare Prompt-a-thon, a competitive event designed to explore the capabilities of LLMs in generating accurate medical records. 
The event involved 52 participants who generated 33 initial ED records using HyperCLOVA X, a Korean-specialized LLM. We applied a dual evaluation approach. First, clinical evaluation: 4 medical professionals evaluated the records using a 5-point Likert scale across 5 criteria---appropriateness, accuracy, structure/format, conciseness, and clinical validity. Second, quantitative evaluation: We developed a framework to categorize and count errors in the LLM outputs, identifying 7 key error types. Statistical methods, including Pearson correlation and intraclass correlation coefficients (ICC), were used to assess consistency and agreement among evaluators. Results: The clinical evaluation demonstrated strong interrater reliability, with ICC values ranging from 0.653 to 0.887 (P<.001), and a test-retest reliability Pearson correlation coefficient of 0.776 (P<.001). Quantitative analysis revealed that invalid generation errors were the most common, constituting 35.38\% of total errors, while structural malformation errors had the most significant negative impact on the clinical evaluation score (Pearson r=--0.654; P<.001). A strong negative correlation was found between the number of quantitative errors and clinical evaluation scores (Pearson r=--0.633; P<.001), indicating that higher error rates corresponded to lower clinical acceptability. Conclusions: Our research provides robust support for the reliability and clinical acceptability of the proposed evaluation framework. It underscores the framework's potential to mitigate clinical burdens and foster the responsible integration of artificial intelligence technologies in health care, suggesting a promising direction for future research and practical applications in the field. ", doi="10.2196/58329", url="https://www.jmir.org/2024/1/e58329" } @Article{info:doi/10.2196/64844, author="Hirosawa, Takanobu and Harada, Yukinori and Tokumasu, Kazuki and Shiraishi, Tatsuya and Suzuki, Tomoharu and Shimizu, Taro", title="Comparative Analysis of Diagnostic Performance: Differential Diagnosis Lists by LLaMA3 Versus LLaMA2 for Case Reports", journal="JMIR Form Res", year="2024", month="Nov", day="19", volume="8", pages="e64844", keywords="artificial intelligence", keywords="clinical decision support system", keywords="generative artificial intelligence", keywords="large language models", keywords="natural language processing", keywords="NLP", keywords="AI", keywords="clinical decision making", keywords="decision support", keywords="decision making", keywords="LLM: diagnostic", keywords="case report", keywords="diagnosis", keywords="generative AI", keywords="LLaMA", abstract="Background: Generative artificial intelligence (AI), particularly in the form of large language models, has rapidly developed. The LLaMA series are popular and recently updated from LLaMA2 to LLaMA3. However, the impacts of the update on diagnostic performance have not been well documented. Objective: We conducted a comparative evaluation of the diagnostic performance in differential diagnosis lists generated by LLaMA3 and LLaMA2 for case reports. Methods: We analyzed case reports published in the American Journal of Case Reports from 2022 to 2023. After excluding nondiagnostic and pediatric cases, we input the remaining cases into LLaMA3 and LLaMA2 using the same prompt and the same adjustable parameters. Diagnostic performance was defined by whether the differential diagnosis lists included the final diagnosis. 
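To make the scoring rule concrete, a minimal Python sketch of a top-k inclusion metric is shown below. In the study this judgment was made by physicians; the exact, case-insensitive string match and the tiny hypothetical case lists here are simplifying assumptions for illustration only.

    def top_k_hit(differentials, final_diagnosis, k=10):
        # True if the gold-standard final diagnosis appears among the first k differentials.
        return final_diagnosis.lower() in [d.lower() for d in differentials[:k]]

    def top_k_rate(cases, k=10):
        # cases: list of (differential_list, final_diagnosis) pairs.
        return sum(top_k_hit(diffs, dx, k) for diffs, dx in cases) / len(cases)

    # Hypothetical toy data for two models; real inputs would be the generated differential lists.
    cases_model_a = [(['acute pancreatitis', 'cholecystitis'], 'acute pancreatitis')]
    cases_model_b = [(['peptic ulcer disease', 'gastritis'], 'acute pancreatitis')]
    print(top_k_rate(cases_model_a, k=10), top_k_rate(cases_model_b, k=10))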
Multiple physicians independently evaluated whether the final diagnosis was included in the top 10 differentials generated by LLaMA3 and LLaMA2. Results: In our comparative evaluation of the diagnostic performance between LLaMA3 and LLaMA2, we analyzed differential diagnosis lists for 392 case reports. The final diagnosis was included in the top 10 differentials generated by LLaMA3 in 79.6\% (312/392) of the cases, compared to 49.7\% (195/392) for LLaMA2, indicating a statistically significant improvement (P<.001). Additionally, LLaMA3 showed higher performance in including the final diagnosis in the top 5 differentials, observed in 63\% (247/392) of cases, compared to LLaMA2's 38\% (149/392, P<.001). Furthermore, the top diagnosis was accurately identified by LLaMA3 in 33.9\% (133/392) of cases, significantly higher than the 22.7\% (89/392) achieved by LLaMA2 (P<.001). The analysis across various medical specialties revealed variations in diagnostic performance with LLaMA3 consistently outperforming LLaMA2. Conclusions: The results reveal that the LLaMA3 model significantly outperforms LLaMA2 per diagnostic performance, with a higher percentage of case reports having the final diagnosis listed within the top 10, top 5, and as the top diagnosis. Overall diagnostic performance improved almost 1.5 times from LLaMA2 to LLaMA3. These findings support the rapid development and continuous refinement of generative AI systems to enhance diagnostic processes in medicine. However, these findings should be carefully interpreted for clinical application, as generative AI, including the LLaMA series, has not been approved for medical applications such as AI-enhanced diagnostics. ", doi="10.2196/64844", url="https://formative.jmir.org/2024/1/e64844" } @Article{info:doi/10.2196/63445, author="Ralevski, Alexandra and Taiyab, Nadaa and Nossal, Michael and Mico, Lindsay and Piekos, Samantha and Hadlock, Jennifer", title="Using Large Language Models to Abstract Complex Social Determinants of Health From Original and Deidentified Medical Notes: Development and Validation Study", journal="J Med Internet Res", year="2024", month="Nov", day="19", volume="26", pages="e63445", keywords="housing instability", keywords="housing insecurity", keywords="housing", keywords="machine learning", keywords="artificial intelligence", keywords="AI", keywords="large language model", keywords="LLM", keywords="natural language processing", keywords="NLP", keywords="electronic health record", keywords="EHR", keywords="electronic medical record", keywords="EMR", keywords="social determinants of health", keywords="exposome", keywords="pregnancy", keywords="obstetric", keywords="deidentification", abstract="Background: Social determinants of health (SDoH) such as housing insecurity are known to be intricately linked to patients' health status. More efficient methods for abstracting structured data on SDoH can help accelerate the inclusion of exposome variables in biomedical research and support health care systems in identifying patients who could benefit from proactive outreach. Large language models (LLMs) developed from Generative Pre-trained Transformers (GPTs) have shown potential for performing complex abstraction tasks on unstructured clinical notes. Objective: Here, we assess the performance of GPTs on identifying temporal aspects of housing insecurity and compare results between both original and deidentified notes. 
Methods: We compared the ability of GPT-3.5 and GPT-4 to identify instances of both current and past housing instability, as well as general housing status, from 25,217 notes from 795 pregnant women. Results were compared with manual abstraction, a named entity recognition model, and regular expressions. Results: Compared with GPT-3.5 and the named entity recognition model, GPT-4 had the highest performance and had a much higher recall (0.924) than human abstractors (0.702) in identifying patients experiencing current or past housing instability, although precision was lower (0.850) compared with human abstractors (0.971). GPT-4's precision improved slightly (0.936 original, 0.939 deidentified) on deidentified versions of the same notes, while recall dropped (0.781 original, 0.704 deidentified). Conclusions: This work demonstrates that while manual abstraction is likely to yield slightly more accurate results overall, LLMs can provide a scalable, cost-effective solution with the advantage of greater recall. This could support semiautomated abstraction, but given the potential risk for harm, human review would be essential before using results for any patient engagement or care decisions. Furthermore, recall was lower when notes were deidentified prior to LLM abstraction. ", doi="10.2196/63445", url="https://www.jmir.org/2024/1/e63445" } @Article{info:doi/10.2196/59439, author="Ke, Yuhe and Yang, Rui and Lie, An Sui and Lim, Yi Taylor Xin and Ning, Yilin and Li, Irene and Abdullah, Rizal Hairil and Ting, Wei Daniel Shu and Liu, Nan", title="Mitigating Cognitive Biases in Clinical Decision-Making Through Multi-Agent Conversations Using Large Language Models: Simulation Study", journal="J Med Internet Res", year="2024", month="Nov", day="19", volume="26", pages="e59439", keywords="clinical decision-making", keywords="cognitive bias", keywords="generative artificial intelligence", keywords="large language model", keywords="multi-agent", abstract="Background: Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes. Addressing these biases presents a formidable challenge in the medical field. Objective: This study aimed to explore the role of large language models (LLMs) in mitigating these biases through the use of the multi-agent framework. We simulate the clinical decision-making processes through multi-agent conversation and evaluate its efficacy in improving diagnostic accuracy compared with humans. Methods: A total of 16 published and unpublished case reports where cognitive biases have resulted in misdiagnoses were identified from the literature. In the multi-agent framework, we leveraged GPT-4 (OpenAI) to facilitate interactions among different simulated agents to replicate clinical team dynamics. Each agent was assigned a distinct role: (1) making the final diagnosis after considering the discussions, (2) acting as a devil's advocate to correct confirmation and anchoring biases, (3) serving as a field expert in the required medical subspecialty, (4) facilitating discussions to mitigate premature closure bias, and (5) recording and summarizing findings. We tested varying combinations of these agents within the framework to determine which configuration yielded the highest rate of correct final diagnoses. Each scenario was repeated 5 times for consistency. 
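A minimal, hypothetical Python sketch of such a role-based multi-agent loop is given below; the role prompts, the number of discussion rounds, and the call_llm placeholder (standing in for an actual GPT-4 chat call) are all assumptions and do not reproduce the authors' implementation.

    # Role prompts are assumptions; they paraphrase the roles listed in the abstract above.
    ROLES = {
        'expert': 'You are a field expert in the relevant medical subspecialty.',
        'devil': 'You argue against the leading diagnosis to counter confirmation and anchoring bias.',
        'facilitator': 'You keep the discussion open to avoid premature closure.',
        'recorder': 'You record and summarize the findings so far.',
        'diagnoser': 'Considering the whole discussion, state the final diagnosis.',
    }

    def call_llm(system_prompt, transcript):
        # Hypothetical stand-in for a GPT-4 chat completion call.
        raise NotImplementedError('Replace with an actual LLM API call.')

    def run_case(case_description, rounds=2):
        transcript = 'Case: ' + case_description + '\n'
        for _ in range(rounds):
            for role in ('expert', 'devil', 'facilitator', 'recorder'):
                reply = call_llm(ROLES[role], transcript)
                transcript += '[' + role + '] ' + reply + '\n'
        # The diagnosing agent reads the full transcript and commits to an answer.
        return call_llm(ROLES['diagnoser'], transcript)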
The accuracy of the initial diagnoses and the final differential diagnoses were evaluated, and comparisons with human-generated answers were made using the Fisher exact test. Results: A total of 240 responses were evaluated (3 different multi-agent frameworks). The initial diagnosis had an accuracy of 0\% (0/80). However, following multi-agent discussions, the accuracy for the top 2 differential diagnoses increased to 76\% (61/80) for the best-performing multi-agent framework (Framework 4-C). This was significantly higher compared with the accuracy achieved by human evaluators (odds ratio 3.49; P=.002). Conclusions: The multi-agent framework demonstrated an ability to re-evaluate and correct misconceptions, even in scenarios with misleading initial investigations. In addition, the LLM-driven, multi-agent conversation framework shows promise in enhancing diagnostic accuracy in diagnostically challenging medical scenarios. ", doi="10.2196/59439", url="https://www.jmir.org/2024/1/e59439" } @Article{info:doi/10.2196/66453, author="Tang, Ran and Qi, Shi-qin", title="The Vast Potential of ChatGPT in Pediatric Surgery", journal="J Med Internet Res", year="2024", month="Nov", day="18", volume="26", pages="e66453", keywords="ChatGPT", keywords="pediatric", keywords="surgery", keywords="artificial intelligence", keywords="AI", keywords="diagnosis", keywords="surgeon", doi="10.2196/66453", url="https://www.jmir.org/2024/1/e66453" } @Article{info:doi/10.2196/57641, author="Zhu, Jinpu and Yang, Fushuang and Wang, Yang and Wang, Zhongtian and Xiao, Yao and Wang, Lie and Sun, Liping", title="Accuracy of Machine Learning in Discriminating Kawasaki Disease and Other Febrile Illnesses: Systematic Review and Meta-Analysis", journal="J Med Internet Res", year="2024", month="Nov", day="18", volume="26", pages="e57641", keywords="machine learning", keywords="artificial intelligence", keywords="Kawasaki disease", keywords="febrile illness", keywords="coronary artery lesions", keywords="systematic review", keywords="meta-analysis", abstract="Background: Kawasaki disease (KD) is an acute pediatric vasculitis that can lead to coronary artery aneurysms and severe cardiovascular complications, often presenting with obvious fever in the early stages. In current clinical practice, distinguishing KD from other febrile illnesses remains a significant challenge. In recent years, some researchers have explored the potential of machine learning (ML) methods for the differential diagnosis of KD versus other febrile illnesses, as well as for predicting coronary artery lesions (CALs) in people with KD. However, there is still a lack of systematic evidence to validate their effectiveness. Therefore, we have conducted the first systematic review and meta-analysis to evaluate the accuracy of ML in differentiating KD from other febrile illnesses and in predicting CALs in people with KD, so as to provide evidence-based support for the application of ML in the diagnosis and treatment of KD. Objective: This study aimed to summarize the accuracy of ML in differentiating KD from other febrile illnesses and predicting CALs in people with KD. Methods: PubMed, Cochrane Library, Embase, and Web of Science were systematically searched until September 26, 2023. The risk of bias in the included original studies was appraised using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Stata (version 15.0; StataCorp) was used for the statistical analysis. Results: A total of 29 studies were incorporated. 
Of them, 20 used ML to differentiate KD from other febrile illnesses. These studies involved a total of 103,882 participants, including 12,541 people with KD. In the validation set, the pooled concordance index, sensitivity, and specificity were 0.898 (95\% CI 0.874-0.922), 0.91 (95\% CI 0.83-0.95), and 0.86 (95\% CI 0.80-0.90), respectively. Meanwhile, 9 studies used ML for early prediction of the risk of CALs in children with KD. These studies involved a total of 6503 people with KD, of whom 986 had CALs. The pooled concordance index in the validation set was 0.787 (95\% CI 0.738-0.835). Conclusions: The diagnostic and predictive factors used in the studies we included were primarily derived from common clinical data. The ML models constructed based on these clinical data demonstrated promising effectiveness in differentiating KD from other febrile illnesses and in predicting coronary artery lesions. Therefore, in future research, we can explore the use of ML methods to identify more efficient predictors and develop tools that can be applied on a broader scale for the differentiation of KD and the prediction of CALs. ", doi="10.2196/57641", url="https://www.jmir.org/2024/1/e57641" } @Article{info:doi/10.2196/49724, author="Cho, Na Ha and Jun, Joon Tae and Kim, Young-Hak and Kang, Heejun and Ahn, Imjin and Gwon, Hansle and Kim, Yunha and Seo, Jiahn and Choi, Heejung and Kim, Minkyoung and Han, Jiye and Kee, Gaeun and Park, Seohyun and Ko, Soyoung", title="Task-Specific Transformer-Based Language Models in Health Care: Scoping Review", journal="JMIR Med Inform", year="2024", month="Nov", day="18", volume="12", pages="e49724", keywords="transformer-based language models", keywords="medicine", keywords="health care", keywords="medical language model", abstract="Background: Transformer-based language models have shown great potential to revolutionize health care by advancing clinical decision support, patient interaction, and disease prediction. However, despite their rapid development, the implementation of transformer-based language models in health care settings remains limited. This is partly due to the lack of a comprehensive review, which hinders a systematic understanding of their applications and limitations. Without clear guidelines and consolidated information, both researchers and physicians face difficulties in using these models effectively, resulting in inefficient research efforts and slow integration into clinical workflows. Objective: This scoping review addresses this gap by examining studies on medical transformer-based language models and categorizing them into 6 tasks: dialogue generation, question answering, summarization, text classification, sentiment analysis, and named entity recognition. Methods: We conducted a scoping review following the Cochrane scoping review protocol. A comprehensive literature search was performed across databases, including Google Scholar and PubMed, covering publications from January 2017 to September 2024. Studies involving transformer-derived models in medical tasks were included. Data were categorized into 6 key tasks. Results: Our key findings revealed both advancements and critical challenges in applying transformer-based models to health care tasks. For example, models like MedPIR involving dialogue generation show promise but face privacy and ethical concerns, while question-answering models like BioBERT improve accuracy but struggle with the complexity of medical terminology. 
The BioBERTSum summarization model aids clinicians by condensing medical texts but needs better handling of long sequences. Conclusions: This review attempted to provide a consolidated understanding of the role of transformer-based language models in health care and to guide future research directions. By addressing current challenges and exploring the potential for real-world applications, we envision significant improvements in health care informatics. Addressing the identified challenges and implementing proposed solutions can enable transformer-based language models to significantly improve health care delivery and patient outcomes. Our review provides valuable insights for future research and practical applications, setting the stage for transformative advancements in medical informatics. ", doi="10.2196/49724", url="https://medinform.jmir.org/2024/1/e49724" } @Article{info:doi/10.2196/55865, author="Bogale, Binyam and Vesinurm, M{\"a}rt and Lillrank, Paul and Celius, Gulowsen Elisabeth and Halvorsrud, Ragnhild", title="Visual Modeling Languages in Patient Pathways: Scoping Review", journal="Interact J Med Res", year="2024", month="Nov", day="15", volume="13", pages="e55865", keywords="patient pathways", keywords="visual modeling languages", keywords="business process model and notation", keywords="BPMN", keywords="unified modeling language", keywords="UML", keywords="domain-specific modeling languages", keywords="scoping review", abstract="Background: Patient pathways (PPs) are presented as a panacea solution to enhance health system functions. It is a complex concept that needs to be described and communicated well. Modeling plays a crucial role in promoting communication, fostering a shared understanding, and streamlining processes. Only a few existing systematic reviews have focused on modeling methods and standardized modeling languages. There remains a gap in consolidated knowledge regarding the use of diverse visual modeling languages. Objective: This scoping review aimed to compile visual modeling languages used to represent PPs, including the justifications and the context in which a modeling language was adopted, adapted, combined, or developed. Methods: After initial experimentation with the keywords used to describe the concepts of PPs and visual modeling languages, we developed a search strategy that was further refined and customized to the major databases identified as topically relevant. In addition, we consulted gray literature and conducted hand searches of the referenced articles. Two reviewers independently screened the articles in 2 stages using preset inclusion criteria, and a third reviewer voted on the discordance. Data charting was done using an iteratively developed form in the Covidence software. Descriptive and thematic summaries were presented following rounds of discussion to produce the final report. Results: Of 1838 articles retrieved after deduplication, 22 satisfied our inclusion criteria. Clinical pathway is the most used phrase to represent the PP concept, and most papers discussed the concept without providing their operational definition. We categorized the visual modeling languages into five categories: (1) general purpose--modeling language (GPML) adopted without major extension or modification, (2) GPML used with formal extension recommendations, (3) combination of 2 or more modeling languages, (4) a developed domain-specific modeling language (DSML), and (5) ontological modeling languages. 
The justifications for adopting, adapting, combining, and developing visual modeling languages varied accordingly and ranged from versatility, expressiveness, tool support, and extensibility of a language to domain needs, integration, and simplification. Conclusions: Various visual modeling languages were used in PP modeling, each with varying levels of abstraction and granularity. The categorization we made could aid in a better understanding of the complex combination of PP and modeling languages. Standardized GPMLs were used with or without any modifications. The rationale to propose any modification to GPMLs evolved as more evidence was presented following requirement analyses to support domain constructs. DSMLs are infrequently used due to their resource-intensive development, often initiated at a project level. The justifications provided and the context where DSMLs were created are paramount. Future studies should assess the merits and demerits of using a visual modeling language to facilitate PP communications among stakeholders and use evaluation frameworks to identify, modify, or develop them, depending on the scope and goal of the modeling need. ", doi="10.2196/55865", url="https://www.i-jmr.org/2024/1/e55865" } @Article{info:doi/10.2196/60226, author="Ming, Shuai and Yao, Xi and Guo, Xiaohong and Guo, Qingge and Xie, Kunpeng and Chen, Dandan and Lei, Bo", title="Performance of ChatGPT in Ophthalmic Registration and Clinical Diagnosis: Cross-Sectional Study", journal="J Med Internet Res", year="2024", month="Nov", day="14", volume="26", pages="e60226", keywords="artificial intelligence", keywords="chatbot", keywords="ChatGPT", keywords="ophthalmic registration", keywords="clinical diagnosis", keywords="AI", keywords="cross-sectional study", keywords="eye disease", keywords="eye disorder", keywords="ophthalmology", keywords="health care", keywords="outpatient registration", keywords="clinical", keywords="decision-making", keywords="generative AI", keywords="vision impairment", abstract="Background: Artificial intelligence (AI) chatbots such as ChatGPT are expected to impact vision health care significantly. Their potential to optimize the consultation process and diagnostic capabilities across range of ophthalmic subspecialties have yet to be fully explored. Objective: This study aims to investigate the performance of AI chatbots in recommending ophthalmic outpatient registration and diagnosing eye diseases within clinical case profiles. Methods: This cross-sectional study used clinical cases from Chinese Standardized Resident Training--Ophthalmology (2nd Edition). For each case, 2 profiles were created: patient with history (Hx) and patient with history and examination (Hx+Ex). These profiles served as independent queries for GPT-3.5 and GPT-4.0 (accessed from March 5 to 18, 2024). Similarly, 3 ophthalmic residents were posed the same profiles in a questionnaire format. The accuracy of recommending ophthalmic subspecialty registration was primarily evaluated using Hx profiles. The accuracy of the top-ranked diagnosis and the accuracy of the diagnosis within the top 3 suggestions (do-not-miss diagnosis) were assessed using Hx+Ex profiles. The gold standard for judgment was the published, official diagnosis. Characteristics of incorrect diagnoses by ChatGPT were also analyzed. Results: A total of 208 clinical profiles from 12 ophthalmic subspecialties were analyzed (104 Hx and 104 Hx+Ex profiles). 
For Hx profiles, GPT-3.5, GPT-4.0, and residents showed comparable accuracy in registration suggestions (66/104, 63.5\%; 81/104, 77.9\%; and 72/104, 69.2\%, respectively; P=.07), with ocular trauma, retinal diseases, and strabismus and amblyopia achieving the top 3 accuracies. For Hx+Ex profiles, both GPT-4.0 and residents demonstrated higher diagnostic accuracy than GPT-3.5 (62/104, 59.6\% and 63/104, 60.6\% vs 41/104, 39.4\%; P=.003 and P=.001, respectively). Accuracy for do-not-miss diagnoses also improved (79/104, 76\% and 68/104, 65.4\% vs 51/104, 49\%; P<.001 and P=.02, respectively). The highest diagnostic accuracies were observed in glaucoma; lens diseases; and eyelid, lacrimal, and orbital diseases. GPT-4.0 recorded fewer incorrect top-3 diagnoses (25/42, 60\% vs 53/63, 84\%; P=.005) and more partially correct diagnoses (21/42, 50\% vs 7/63 11\%; P<.001) than GPT-3.5, while GPT-3.5 had more completely incorrect (27/63, 43\% vs 7/42, 17\%; P=.005) and less precise diagnoses (22/63, 35\% vs 5/42, 12\%; P=.009). Conclusions: GPT-3.5 and GPT-4.0 showed intermediate performance in recommending ophthalmic subspecialties for registration. While GPT-3.5 underperformed, GPT-4.0 approached and numerically surpassed residents in differential diagnosis. AI chatbots show promise in facilitating ophthalmic patient registration. However, their integration into diagnostic decision-making requires more validation. ", doi="10.2196/60226", url="https://www.jmir.org/2024/1/e60226" } @Article{info:doi/10.2196/65994, author="Hong, Minseok and Kang, Ri-Ra and Yang, Hun Jeong and Rhee, Jin Sang and Lee, Hyunju and Kim, Yong-gyom and Lee, KangYoon and Kim, HongGi and Lee, Sang Yu and Youn, Tak and Kim, Hyun Se and Ahn, Min Yong", title="Comprehensive Symptom Prediction in Inpatients With Acute Psychiatric Disorders Using Wearable-Based Deep Learning Models: Development and Validation Study", journal="J Med Internet Res", year="2024", month="Nov", day="13", volume="26", pages="e65994", keywords="digital phenotype", keywords="mental health monitoring", keywords="smart hospital", keywords="clinical decision support system", keywords="multitask learning", keywords="wearable sensor", keywords="local validation", keywords="mental health facility", keywords="deep learning", abstract="Background: Assessing the complex and multifaceted symptoms of patients with acute psychiatric disorders proves to be significantly challenging for clinicians. Moreover, the staff in acute psychiatric wards face high work intensity and risk of burnout, yet research on the introduction of digital technologies in this field remains limited. The combination of continuous and objective wearable sensor data acquired from patients with deep learning techniques holds the potential to overcome the limitations of traditional psychiatric assessments and support clinical decision-making. Objective: This study aimed to develop and validate wearable-based deep learning models to comprehensively predict patient symptoms across various acute psychiatric wards in South Korea. Methods: Participants diagnosed with schizophrenia and mood disorders were recruited from 4 wards across 3 hospitals and prospectively observed using wrist-worn wearable devices during their admission period. Trained raters conducted periodic clinical assessments using the Brief Psychiatric Rating Scale, Hamilton Anxiety Rating Scale, Montgomery-Asberg Depression Rating Scale, and Young Mania Rating Scale. 
Wearable devices collected patients' heart rate, accelerometer, and location data. Deep learning models were developed to predict psychiatric symptoms using 2 distinct approaches: single symptoms individually (Single) and multiple symptoms simultaneously via multitask learning (Multi). These models further addressed 2 problems: within-subject relative changes (Deterioration) and between-subject absolute severity (Score). Four configurations were consequently developed for each scale: Single-Deterioration, Single-Score, Multi-Deterioration, and Multi-Score. Data of participants recruited before May 1, 2024, underwent cross-validation, and the resulting fine-tuned models were then externally validated using data from the remaining participants. Results: Of the 244 enrolled participants, 191 (78.3\%; 3954 person-days) were included in the final analysis after applying the exclusion criteria. The demographic and clinical characteristics of participants, as well as the distribution of sensor data, showed considerable variations across wards and hospitals. Data of 139 participants were used for cross-validation, while data of 52 participants were used for external validation. The Single-Deterioration and Multi-Deterioration models achieved similar overall accuracy values of 0.75 in cross-validation and 0.73 in external validation. The Single-Score and Multi-Score models attained overall R{\texttwosuperior} values of 0.78 and 0.83 in cross-validation and 0.66 and 0.74 in external validation, respectively, with the Multi-Score model demonstrating superior performance. Conclusions: Deep learning models based on wearable sensor data effectively classified symptom deterioration and predicted symptom severity in participants in acute psychiatric wards. Despite lower computational costs, Multi models demonstrated equivalent or superior performance to Single models, suggesting that multitask learning is a promising approach for comprehensive symptom prediction. However, significant variations were observed across wards, which present a key challenge for developing clinical decision support systems in acute psychiatric wards. Future studies may benefit from recurring local validation or federated learning to address generalizability issues. ", doi="10.2196/65994", url="https://www.jmir.org/2024/1/e65994" } @Article{info:doi/10.2196/50497, author="ten Klooster, Iris and Kip, Hanneke and Beyer, L. Sina and van Gemert-Pijnen, C. Lisette J. E. W. and Kelders, M. Saskia", title="Clarifying the Concepts of Personalization and Tailoring of eHealth Technologies: Multimethod Qualitative Study", journal="J Med Internet Res", year="2024", month="Nov", day="13", volume="26", pages="e50497", keywords="eHealth", keywords="personalization", keywords="tailoring", keywords="segmentation", keywords="adaptation", keywords="interviews", keywords="definition", abstract="Background: Although personalization and tailoring have been identified as alternatives to a ``one-size-fits-all'' approach for eHealth technologies, there is no common understanding of these two concepts and how they should be applied. 
Objective: This study aims to describe (1) how tailoring and personalization are defined in the literature and by eHealth experts, and what the differences and similarities are; (2) what type of variables can be used to segment eHealth users into more homogeneous groups or at the individual level; (3) what elements of eHealth technologies are adapted to these segments; and (4) how the segments are matched with eHealth adaptations. Methods: We used a multimethod qualitative study design. To gain insights into the definitions of personalization and tailoring, definitions were collected from the literature and through interviews with eHealth experts. In addition, the interviews included questions about how users can be segmented and how eHealth can be adapted accordingly, and responses to 3 vignettes of examples of eHealth technologies, varying in personalization and tailoring strategies to elicit responses about views from stakeholders on how the two components were applied and matched in different contexts. Results: A total of 28 unique definitions of tailoring and 16 unique definitions of personalization were collected from the literature and interviews. The definitions of tailoring and personalization varied in their components, namely adaptation, individuals, user groups, preferences, symptoms, characteristics, context, behavior, content, identification, feedback, channel, design, computerization, and outcomes. During the interviews, participants mentioned 9 types of variables that can be used to segment eHealth users, namely demographics, preferences, health variables, psychological variables, behavioral variables, individual determinants, environmental information, intervention interaction, and technology variables. In total, 5 elements were mentioned that can be adapted to those segments, namely channeling, content, graphical, functionalities, and behavior change strategy. Participants mentioned substantiation methods and variable levels as two components for matching the segmentations with adaptations. Conclusions: Tailoring and personalization are multidimensional concepts, and variability and technology affordances seem to determine whether and how personalization and tailoring should be applied to eHealth technologies. On the basis of our findings, tailoring and personalization can be differentiated by the way that segmentations and adaptations are matched. Tailoring matches segmentations and adaptations based on general group characteristics using if-then algorithms, whereas personalization involves the direct insertion of user information (such as name) or adaptations based on individual-level inferences. We argue that future research should focus on how inferences can be made at the individual level to further develop the field of personalized eHealth. 
", doi="10.2196/50497", url="https://www.jmir.org/2024/1/e50497" } @Article{info:doi/10.2196/55667, author="McBride, Caroline and Hunter, Barbara and Lumsden, Natalie and Somasundaram, Kaleswari and McMorrow, Rita and Boyle, Douglas and Emery, Jon and Nelson, Craig and Manski-Nankervis, Jo-Anne", title="Clinical Acceptability of a Quality Improvement Program for Reducing Cardiovascular Disease Risk in People With Chronic Kidney Disease in Australian General Practice: Qualitative Study", journal="JMIR Hum Factors", year="2024", month="Nov", day="13", volume="11", pages="e55667", keywords="clinical decision support", keywords="general practice", keywords="GP", keywords="primary care", keywords="family medicine", keywords="general medicine", keywords="family physician", keywords="implementation science", keywords="chronic kidney disease", keywords="CKD", keywords="nephrology", keywords="nephrologist", keywords="chronic disease", keywords="cardiovascular risk", keywords="cardiology", keywords="quality improvement", keywords="EHR", keywords="electronic health record", keywords="clinical software", abstract="Background: Future Health Today (FHT) is a technology program that integrates with general practice clinical software to provide point of care (PoC) clinical decision support and a quality improvement dashboard. This qualitative study looks at the use of FHT in the context of cardiovascular disease risk in chronic kidney disease (CKD). Objective: This study aims to explore factors influencing clinical implementation of the FHT module focusing on cardiovascular risk in CKD, from the perspectives of participating general practitioner staff. Methods: Practices in Victoria were recruited to participate in a pragmatic cluster randomized controlled trial using FHT, of which 19 practices were randomly assigned to use FHT's cardiovascular risk in CKD program. A total of 13 semistructured interviews were undertaken with a nominated general practitioner (n=7) or practice nurse (n=6) from 10 participating practices. Interview questions focused on the clinical usefulness of the tool and its place in clinical workflows. Qualitative data were coded by 2 researchers and analyzed using framework analysis and Clinical Performance Feedback Intervention Theory. Results: All 13 interviewees had used the FHT PoC tool, and feedback was largely positive. Overall, clinicians described engaging with the tool as a ``prompt'' or ``reminder'' system. Themes reflected that the tool's goals and clinical content were aligned with clinician's existing priorities and knowledge, and the tool's design facilitated easy integration into existing workflows. The main barrier to implementation identified by 2 clinicians was notification fatigue. A total of 7 interviewees had used the FHT dashboard tool. The main barriers to use were its limited integration into clinical workflows, such that some participants did not know of its existence; clinicians' competing clinical priorities; and limited time to learn and use the tool. Conclusions: This study identified many facilitators for the successful use of the FHT PoC program, in the context of cardiovascular risk in CKD, and barriers to the use of the dashboard program. This work will be used to inform the wider implementation of FHT, as well as the development of future modules of FHT for other risk or disease states. 
Trial Registration: Australian New Zealand Clinical Trial Registry ACTRN12620000993998; https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=380119\&is ", doi="10.2196/55667", url="https://humanfactors.jmir.org/2024/1/e55667" } @Article{info:doi/10.2196/59634, author="Parsons, Rex and Blythe, Robin and Cramb, Susanna and Abdel-Hafez, Ahmad and McPhail, Steven", title="An Electronic Medical Record--Based Prognostic Model for Inpatient Falls: Development and Internal-External Cross-Validation", journal="J Med Internet Res", year="2024", month="Nov", day="13", volume="26", pages="e59634", keywords="clinical prediction model", keywords="falls", keywords="patient safety", keywords="prognostic", keywords="electronic medical record", keywords="EMR", keywords="intervention", keywords="hospital", keywords="risk assessment", keywords="clinical decision", keywords="support system", keywords="in-hospital fall", keywords="survival model", keywords="inpatient falls", abstract="Background: Effective fall prevention interventions in hospitals require appropriate allocation of resources early in admission. To address this, fall risk prediction tools and models have been developed with the aim to provide fall prevention strategies to patients at high risk. However, fall risk assessment tools have typically been inaccurate for prediction, ineffective in prevention, and time-consuming to complete. Accurate, dynamic, individualized estimates of fall risk for admitted patients using routinely recorded data may assist in prioritizing fall prevention efforts. Objective: The objective of this study was to develop and validate an accurate and dynamic prognostic model for inpatient falls among a cohort of patients using routinely recorded electronic medical record data. Methods: We used routinely recorded data from 5 Australian hospitals to develop and internally-externally validate a prediction model for inpatient falls using a Cox proportional hazards model with time-varying covariates. The study cohort included patients admitted during 2018-2021 to any ward, with no age restriction. Predictors used in the model included admission-related administrative data, length of stay, and number of previous falls during the admission (updated every 12 hours up to 14 days after admission). Model calibration was assessed using Poisson regression and discrimination using the area under the time-dependent receiver operating characteristic curve. Results: There were 1,107,556 inpatient admissions, 6004 falls, and 5341 unique fallers. The area under the time-dependent receiver operating characteristic curve was 0.899 (95\% CI 0.88-0.91) at 24 hours after admission and declined throughout admission (eg, 0.765, 95\% CI 0.75-0.78 on the seventh day after admission). Site-dependent overestimation and underestimation of risk was observed on the calibration plots. Conclusions: Using a large dataset from multiple hospitals and robust methods to model development and validation, we developed a prognostic model for inpatient falls. It had high discrimination, suggesting the model has the potential for operationalization in clinical decision support for prioritizing inpatients for fall prevention. Performance was site dependent, and model recalibration may lead to improved performance. ", doi="10.2196/59634", url="https://www.jmir.org/2024/1/e59634" } @Article{info:doi/10.2196/58663, author="Carter, Michela and Linton, C. Samuel and Zeineddin, Suhail and Pitt, Benjamin J. 
and De Boer, Christopher and Figueroa, Angie and Gosain, Ankush and Lanning, David and Lesher, Aaron and Islam, Saleem and Sathya, Chethan and Holl, L. Jane and Ghomrawi, MK Hassan and Abdullah, Fizan", title="Impact of Consumer Wearables Data on Pediatric Surgery Clinicians' Management: Multi-Institutional Scenario-Based Usability Study", journal="JMIR Perioper Med", year="2024", month="Nov", day="12", volume="7", pages="e58663", keywords="postoperative care", keywords="telehealth", keywords="consultation", keywords="remote", keywords="appendectomy", keywords="pediatric hospital", keywords="children", keywords="wearable device", keywords="minimally invasive surgery", keywords="pediatric surgery", keywords="remote simulation study", abstract="Background: At present, parents lack objective methods to evaluate their child's postoperative recovery following discharge from the hospital. As a result, clinicians are dependent upon a parent's subjective assessment of the child's health status and the child's ability to communicate their symptoms. This subjective nature of home monitoring contributes to unnecessary emergency department (ED) use as well as delays in treatment. However, the integration of data remotely collected using a consumer wearable device has the potential to provide clinicians with objective metrics for postoperative patients to facilitate informed longitudinal, remote assessment. Objective: This multi-institutional study aimed to evaluate the impact of adding actual and simulated objective recovery data that were collected remotely using a consumer wearable device to simulated postoperative telephone encounters on clinicians' management. Methods: In total, 3 simulated telephone scenarios of patients after an appendectomy were presented to clinicians at 5 children's hospitals. Each scenario was then supplemented with wearable data concerning or reassuring against a postoperative complication. Clinicians rated their likelihood of ED referral before and after the addition of wearable data to evaluate if it changed their recommendation. Clinicians reported confidence in their decision-making. Results: In total, 34 clinicians participated. Compared with the scenario alone, the addition of reassuring wearable data resulted in a decreased likelihood of ED referral for all 3 scenarios (P<.01). When presented with concerning wearable data, there was an increased likelihood of ED referral for 1 of 3 scenarios (P=.72, P=.17, and P<.001). At the institutional level, there was no difference between the 5 institutions in how the wearable data changed the likelihood of ED referral for all 3 scenarios. With the addition of wearable data, 76\% (19/25) to 88\% (21/24 and 22/25) of clinicians reported increased confidence in their recommendations. Conclusions: The addition of wearable data to simulated telephone scenarios for postdischarge patients who underwent pediatric surgery impacted clinicians' remote patient management at 5 pediatric institutions and increased clinician confidence. Wearable devices are capable of providing real-time measures of recovery, which can be used as a postoperative monitoring tool to reduce delays in care and avoidable health care use. 
", doi="10.2196/58663", url="https://periop.jmir.org/2024/1/e58663" } @Article{info:doi/10.2196/59556, author="Gutman, Barak and Shmilovitch, Amit-Haim and Aran, Dvir and Shelly, Shahar", title="Twenty-Five Years of AI in Neurology: The Journey of Predictive Medicine and Biological Breakthroughs", journal="JMIR Neurotech", year="2024", month="Nov", day="8", volume="3", pages="e59556", keywords="neurology", keywords="artificial intelligence", keywords="telemedicine", keywords="clinical advancements", keywords="mobile phone", doi="10.2196/59556", url="https://neuro.jmir.org/2024/1/e59556" } @Article{info:doi/10.2196/64674, author="Smith, N. Shawna and Lanham, M. Michael S. and Seagull, Jacob F. and Fabbri, Morris and Dorsch, P. Michael and Jennings, Kathleen and Barnes, Geoffrey", title="System-Wide, Electronic Health Record--Based Medication Alerts for Appropriate Prescribing of Direct Oral Anticoagulants: Pilot Randomized Controlled Trial", journal="JMIR Form Res", year="2024", month="Nov", day="8", volume="8", pages="e64674", keywords="direct oral anticoagulants", keywords="electronic health record", keywords="medication safety", keywords="prescribing errors", keywords="pilot randomized controlled trial", keywords="alert system optimization", keywords="clinical decision support", keywords="EHR", keywords="randomized controlled trial", keywords="RCT", keywords="oral anticoagulants", abstract="Background: While direct oral anticoagulants (DOACs) have improved oral anticoagulation management, inappropriate prescribing remains prevalent and leads to adverse drug events. Antithrombotic stewardship programs seek to enhance DOAC prescribing but require scalable and sustainable strategies. Objective: We present a pilot, prescriber-level randomized controlled trial to assess the effectiveness of electronic health record (EHR)--based medication alerts in a large health system. Methods: The pilot assessed prescriber responses to alerts for initial DOAC prescription errors (apixaban and rivaroxaban). A user-centered, multistage design process informed alert development, emphasizing clear indication, appropriate dosing based on renal function, and drug-drug interactions. Alerts appeared whenever a DOAC was being prescribed in a way that did not follow package label instructions. Clinician responses measured acceptability, accuracy, feasibility, and utilization of the alerts. Results: The study ran from August 1, 2022, through April 30, 2023. Only 1 prescriber requested trial exclusion, demonstrating acceptability. The error rate for false alerts due to incomplete data was 6.6\% (16/243). Two scenarios with alert design and/or execution errors occurred but were quickly identified and resolved, underlining the importance of a responsive quality assurance process in EHR-based interventions. Trial feasibility issues related to alert-data capture were identified and resolved. Trial feasibility was also assessed with balanced randomization of prescribers and the inclusion of various alerts across both medications. Assessing utilization, 34.2\% (83/243) of the encounters (with 134 prescribers) led to a prescription change. Conclusions: The pilot implementation study demonstrated the acceptability, accuracy, feasibility, and estimates of the utilization of EHR-based medication alerts for DOAC prescriptions and successfully established just-in-time randomization of prescribing clinicians. 
This pilot study sets the stage for large-scale, randomized implementation evaluations of EHR-based alerts to improve medication safety. Trial Registration: ClinicalTrials.gov NCT05351749; https://clinicaltrials.gov/study/NCT05351749 ", doi="10.2196/64674", url="https://formative.jmir.org/2024/1/e64674" } @Article{info:doi/10.2196/54022, author="Naicker, Sundresan and Tariq, Amina and Donovan, Raelene and Magon, Honor and White, Nicole and Simmons, Joshua and McPhail, M. Steven", title="Patterns and Perceptions of Standard Order Set Use Among Physicians Working Within a Multihospital System: Mixed Methods Study", journal="JMIR Form Res", year="2024", month="Nov", day="8", volume="8", pages="e54022", keywords="medical informatics", keywords="adoption and implementation", keywords="behavior", keywords="health systems", keywords="testing", keywords="electronic medical records", keywords="behavioral model", keywords="quantitative data", keywords="semistructured interview", keywords="clinical practice", keywords="user preference", keywords="user", keywords="user experience", abstract="Background: Electronic standard order sets automate the ordering of specific treatment, testing, and investigative protocols by physicians. These tools may help reduce unwarranted clinical variation and improve health care efficiency. Despite their routine implementation within electronic medical records (EMRs), little is understood about how they are used and what factors influence their adoption in practice. Objective: This study aims to (1) describe the patterns of use of standard order sets implemented in a widely used EMR (PowerPlans and Cerner Millennium) within a multihospital digital health care system; (2) explore the experiences and perceptions of implementers and users regarding the factors contributing to the use of these standard order sets; and (3) map these findings to the Capability, Opportunity, and Motivation Behavior (COM-B) model of behavior change to assist those planning to develop, improve, implement, and iterate the use of standard order sets in hospital settings. Methods: Quantitative data on standard order set usage were captured from 5 hospitals over 5-month intervals for 3 years (2019, 2020, and 2021). Qualitative data, comprising unstructured and semistructured interviews (n=15), were collected and analyzed using a reflexive thematic approach. Interview themes were then mapped to a theory-informed model of behavior change (COM-B) to identify determinants of standard order set usage in routine clinical practice. The COM-B model is an evidence-based, multicomponent framework that posits that human actions result from multiple contextual influences, which can be categorized across 3 dimensions: capability, opportunity, and motivation, all of which intersect. Results: The total count of standard order set usage across the health system during the 2019 observation period was 267,253, increasing to 293,950 in 2020 and 335,066 in 2021. There was a notable shift toward using specialty order sets that received upgrades during the study period. Four emergent themes related to order set use were derived from clinician interviews: (1) Knowledge and Skills; (2) Perceptions; (3) Technical Dependencies; and (4) Unintended Consequences, all of which were mapped to the COM-B model. Findings indicate a user preference for customized order sets that respond to local context and user experience. 
Conclusions: The study findings suggest that ongoing investment in the development and functionality of specialty order sets has the potential to enhance usage as these sets continue to be customized in response to local context and user experience. Sustained and continuous uptake of appropriate Computerized Provider Order Entry use may require implementation strategies that address the capability, opportunity, and motivational influencers of behavior. ", doi="10.2196/54022", url="https://formative.jmir.org/2024/1/e54022" } @Article{info:doi/10.2196/62641, author="Patel, Atushi and Maruthananth, Kevin and Matharu, Neha and Pinto, D. Andrew and Hosseini, Banafshe", title="Early Warning Systems for Acute Respiratory Infections: Scoping Review of Global Evidence", journal="JMIR Public Health Surveill", year="2024", month="Nov", day="7", volume="10", pages="e62641", keywords="early warning systems", keywords="acute respiratory infections", keywords="early detection systems", abstract="Background: Early warning systems (EWSs) are tools that integrate clinical observations to identify patterns indicating increased risks of clinical deterioration, thus facilitating timely and appropriate interventions. EWSs can mitigate the impact of global infectious diseases by enhancing information exchange, monitoring, and early detection. Objective: We aimed to evaluate the effectiveness of EWSs in acute respiratory infections (ARIs) through a scoping review of EWSs developed, described, and implemented for detecting novel, exotic, and re-emerging ARIs. Methods: We searched Ovid MEDLINE ALL, Embase, Cochrane Library (Wiley), and CINAHL (Ebsco). The search was conducted on October 03, 2023. Studies that implemented EWSs for the detection of acute respiratory illnesses were included. Covidence was used for citation management, and a modified Critical Appraisal Skills Programme (CASP) checklist was used for quality assessment. Results: From 5838 initial articles, 29 met the inclusion criteria for this review. Twelve studies evaluated the use of EWSs within community settings, ranging from rural community reporting networks to urban online participatory surveillance platforms. Five studies focused on EWSs that used data from hospitalization and emergency department visits. These systems leveraged clinical and admission data to effectively detect and manage local outbreaks of respiratory infections. Two studies focused on the effectiveness of existing surveillance systems, assessing their adaptability and responsiveness to emerging threats and how they could be improved based on past performance. Four studies highlighted the integration of machine learning models to improve the predictive accuracy of EWSs. Three studies explored the applications of national EWSs in different health care settings and emphasized their potential in predicting clinical deterioration and facilitating early intervention. Lastly, 3 studies addressed the use of surveillance systems in aged-care facilities, highlighting the unique challenges and needs of monitoring and responding to health threats in environments housing vulnerable populations. The CASP tool revealed that most studies were relevant, reliable, and of high value (score 6: 11/29, 38\%; score 5: 9/29, 31\%). The common limitations included result generalizability, selection bias, and small sample size for model validation. Conclusions: This scoping review confirms the critical role of EWSs in enhancing public health responses to respiratory infections. 
Although the effectiveness of these systems is evident, challenges related to generalizability and varying methodologies suggest a need for continued innovation and standardization in EWS development. ", doi="10.2196/62641", url="https://publichealth.jmir.org/2024/1/e62641" } @Article{info:doi/10.2196/58039, author="Jian, Ming-Jr and Lin, Tai-Han and Chung, Hsing-Yi and Chang, Chih-Kai and Perng, Cherng-Lih and Chang, Feng-Yee and Shang, Hung-Sheng", title="Pioneering Klebsiella Pneumoniae Antibiotic Resistance Prediction With Artificial Intelligence-Clinical Decision Support System--Enhanced Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry: Retrospective Study", journal="J Med Internet Res", year="2024", month="Nov", day="7", volume="26", pages="e58039", keywords="Klebsiella pneumoniae", keywords="multidrug resistance", keywords="AI-CDSS", keywords="quinolone", keywords="ciprofloxacin", keywords="levofloxacin", abstract="Background: The rising prevalence and swift spread of multidrug-resistant gram-negative bacteria (MDR-GNB), especially Klebsiella pneumoniae (KP), present a critical global health threat highlighted by the World Health Organization, with mortality rates soaring approximately 50\% with inappropriate antimicrobial treatment. Objective: This study aims to advance a novel strategy to develop an artificial intelligence-clinical decision support system (AI-CDSS) that combines machine learning (ML) with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), aiming to significantly improve the accuracy and speed of diagnosing antibiotic resistance, directly addressing the grave health risks posed by the widespread dissemination of pan drug-resistant gram-negative bacteria across numerous countries. Methods: A comprehensive dataset comprising 165,299 bacterial specimens and 11,996 KP isolates was meticulously analyzed using MALDI-TOF MS technology. Advanced ML algorithms were harnessed to sculpt predictive models that ascertain resistance to quintessential antibiotics, particularly levofloxacin and ciprofloxacin, by using the amassed spectral data. Results: Our ML models revealed remarkable proficiency in forecasting antibiotic resistance, with the random forest classifier emerging as particularly effective in predicting resistance to both levofloxacin and ciprofloxacin, achieving the highest area under the curve of 0.95. Performance metrics across different models, including accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1-score, were detailed, underlining the potential of these algorithms in aiding the development of precision treatment strategies. Conclusions: This investigation highlights the synergy between MALDI-TOF MS and ML as a beacon of hope against the escalating threat of antibiotic resistance. The advent of AI-CDSS heralds a new era in clinical diagnostics, promising a future in which rapid and accurate resistance prediction becomes a cornerstone in combating infectious diseases. Through this innovative approach, we answered the challenge posed by KP and other multidrug-resistant pathogens, marking a significant milestone in our journey toward global health security. 
", doi="10.2196/58039", url="https://www.jmir.org/2024/1/e58039" } @Article{info:doi/10.2196/58413, author="Chung, young Wou and Yoon, Jinsik and Yoon, Dukyong and Kim, Songsoo and Kim, Yujeong and Park, Eun Ji and Kang, Ae Young", title="Development and Validation of Deep Learning--Based Infectivity Prediction in Pulmonary Tuberculosis Through Chest Radiography: Retrospective Study", journal="J Med Internet Res", year="2024", month="Nov", day="7", volume="26", pages="e58413", keywords="pulmonary tuberculosis", keywords="chest radiography", keywords="artificial intelligence", keywords="tuberculosis", keywords="TB", keywords="smear", keywords="smear test", keywords="culture test", keywords="diagnosis", keywords="treatment", keywords="deep learning", keywords="CXR", keywords="PTB", keywords="management", keywords="cost effective", keywords="asymptomatic infection", keywords="diagnostic tools", keywords="infectivity", keywords="AI tool", keywords="cohort", abstract="Background: Pulmonary tuberculosis (PTB) poses a global health challenge owing to the time-intensive nature of traditional diagnostic tests such as smear and culture tests, which can require hours to weeks to yield results. Objective: This study aimed to use artificial intelligence (AI)--based chest radiography (CXR) to evaluate the infectivity of patients with PTB more quickly and accurately compared with traditional methods such as smear and culture tests. Methods: We used DenseNet121 and visualization techniques such as gradient-weighted class activation mapping and local interpretable model-agnostic explanations to demonstrate the decision-making process of the model. We analyzed 36,142 CXR images of 4492 patients with PTB obtained from Severance Hospital, focusing specifically on the lung region through segmentation and cropping with TransUNet. We used data from 2004 to 2020 to train the model, data from 2021 for testing, and data from 2022 to 2023 for internal validation. In addition, we used 1978 CXR images of 299 patients with PTB obtained from Yongin Severance Hospital for external validation. Results: In the internal validation, the model achieved an accuracy of 73.27\%, an area under the receiver operating characteristic curve of 0.79, and an area under the precision-recall curve of 0.77. In the external validation, it exhibited an accuracy of 70.29\%, an area under the receiver operating characteristic curve of 0.77, and an area under the precision-recall curve of 0.8. In addition, gradient-weighted class activation mapping and local interpretable model-agnostic explanations provided insights into the decision-making process of the AI model. Conclusions: This proposed AI tool offers a rapid and accurate alternative for evaluating PTB infectivity through CXR, with significant implications for enhancing screening efficiency by evaluating infectivity before sputum test results in clinical settings, compared with traditional smear and culture tests. ", doi="10.2196/58413", url="https://www.jmir.org/2024/1/e58413" } @Article{info:doi/10.2196/58276, author="Sommers, W. Stuart and Tolle, J. Heather and Trinkley, E. Katy and Johnston, G. Christine and Dietsche, L. Caitlin and Eldred, V. Stephanie and Wick, T. Abraham and Hoppe, A. 
Jason", title="Clinical Decision Support to Increase Emergency Department Naloxone Coprescribing: Implementation Report", journal="JMIR Med Inform", year="2024", month="Nov", day="6", volume="12", pages="e58276", keywords="clinical decision support systems", keywords="order sets", keywords="drug monitoring", keywords="opioid analgesic", keywords="opioid use", keywords="opioid prescribing", keywords="drug overdose", keywords="opioid overdose", keywords="naloxone", keywords="naloxone coprescribing", keywords="harm reduction", keywords="harm minimization", abstract="Background: Coprescribing naloxone with opioid analgesics is a Centers for Disease Control and Prevention (CDC) best practice to mitigate the risk of fatal opioid overdose, yet coprescription by emergency medicine clinicians is rare, occurring less than 5\% of the time it is indicated. Clinical decision support (CDS) has been associated with increased naloxone prescribing; however, key CDS design characteristics and pragmatic outcome measures necessary to understand replicability and effectiveness have not been reported. Objective: This study aimed to rigorously evaluate and quantify the impact of CDS designed to improve emergency department (ED) naloxone coprescribing. We hypothesized CDS would increase naloxone coprescribing and the number of naloxone prescriptions filled by patients discharged from EDs in a large health care system. Methods: Following user-centered design principles, we designed and implemented a fully automated, interruptive, electronic health record--based CDS to nudge clinicians to coprescribe naloxone with high-risk opioid prescriptions. ``High-risk'' opioid prescriptions were defined as any opioid analgesic prescription ?90 total morphine milligram equivalents per day or for patients with a prior diagnosis of opioid use disorder or opioid overdose. The Reach, Effectiveness, Adoption, Implementation, and Maintenance (RE-AIM) framework was used to evaluate pragmatic CDS outcomes of reach, effectiveness, adoption, implementation, and maintenance. Effectiveness was the primary outcome of interest and was assessed by (1) constructing a Bayesian structural time-series model of the number of ED visits with naloxone coprescriptions before and after CDS implementation and (2) calculating the percentage of naloxone prescriptions associated with CDS that were filled at an outpatient pharmacy. Mann-Kendall tests were used to evaluate longitudinal trends in CDS adoption. All outcomes were analyzed in R (version 4.2.2; R Core Team). Implementation (Results): Between November 2019 and July 2023, there were 1,994,994 ED visits. CDS reached clinicians in 0.83\% (16,566/1,994,994) of all visits and 15.99\% (16,566/103,606) of ED visits where an opioid was prescribed at discharge. Clinicians adopted CDS, coprescribing naloxone in 34.36\% (6613/19,246) of alerts. CDS was effective, increasing naloxone coprescribing from baseline by 18.1 (95\% CI 17.9?18.3) coprescriptions per week or 2,327\% (95\% CI 3390?3490). Patients filled 43.80\% (1989/4541) of naloxone coprescriptions. The CDS was implemented simultaneously at every ED and no adaptations were made to CDS postimplementation. CDS was maintained beyond the study period and maintained its effect, with adoption increasing over time ($\tau$=0.454; P<.001). Conclusions: Our findings advance the evidence that electronic health record--based CDS increases the number of naloxone coprescriptions and improves the distribution of naloxone. 
Our time series analysis controls for secular trends and strongly suggests that minimally interruptive CDS significantly improves process outcomes. ", doi="10.2196/58276", url="https://medinform.jmir.org/2024/1/e58276" } @Article{info:doi/10.2196/58068, author="Gorban, Carla and McKenna, Sarah and Chong, K. Min and Capon, William and Battisti, Robert and Crowley, Alison and Whitwell, Bradley and Ottavio, Antonia and Scott, M. Elizabeth and Hickie, B. Ian and Iorfino, Frank", title="Building Mutually Beneficial Collaborations Between Digital Navigators, Mental Health Professionals, and Clients: Naturalistic Observational Case Study", journal="JMIR Ment Health", year="2024", month="Nov", day="6", volume="11", pages="e58068", keywords="digital navigator", keywords="digital coach", keywords="clinical technology specialist", keywords="mental health services", keywords="shared decision-making", keywords="lived experience", keywords="implementation", keywords="poor engagement", keywords="decision-making", keywords="mental health", keywords="digital mental health", keywords="digital mental health technology", doi="10.2196/58068", url="https://mental.jmir.org/2024/1/e58068" } @Article{info:doi/10.2196/64406, author="Chow, L. James C. and Li, Kay", title="Ethical Considerations in Human-Centered AI: Advancing Oncology Chatbots Through Large Language Models", journal="JMIR Bioinform Biotech", year="2024", month="Nov", day="6", volume="5", pages="e64406", keywords="artificial intelligence", keywords="humanistic AI", keywords="ethical AI", keywords="human-centered AI", keywords="machine learning", keywords="large language models", keywords="natural language processing", keywords="oncology chatbot", keywords="transformer-based model", keywords="ChatGPT", keywords="health care", doi="10.2196/64406", url="https://bioinform.jmir.org/2024/1/e64406", url="http://www.ncbi.nlm.nih.gov/pubmed/39321336" } @Article{info:doi/10.2196/58776, author="Kim, Heon Ho and Jeong, Chan Won and Pi, Kyungran and Lee, Soeun Angela and Kim, Soo Min and Kim, Jin Hye and Kim, Hong Jae", title="A Deep Learning Model to Predict Breast Implant Texture Types Using Ultrasonography Images: Feasibility Development Study", journal="JMIR Form Res", year="2024", month="Nov", day="5", volume="8", pages="e58776", keywords="breast implants", keywords="mammoplasty", keywords="ultrasonography", keywords="AI-assisted diagnosis", keywords="shell surface topography", keywords="artificial intelligence", keywords="deep learning", keywords="machine learning", abstract="Background: Breast implants, including textured variants, have been widely used in aesthetic and reconstructive mammoplasty. However, the textured type, which is one of the shell texture types of breast implants, has been identified as a possible etiologic factor for lymphoma, specifically breast implant--associated anaplastic large cell lymphoma (BIA-ALCL). Identifying the shell texture type of the implant is critical to diagnosing BIA-ALCL. However, distinguishing the shell texture type can be difficult due to the loss of human memory and medical history. An alternative approach is to use ultrasonography, but this method also has limitations in quantitative assessment. Objective: This study aims to determine the feasibility of using a deep learning model to classify the shell texture type of breast implants and make robust predictions from ultrasonography images from heterogeneous sources. 
Methods: A total of 19,502 breast implant images were retrospectively collected from heterogeneous sources, including images captured from both Canon and GE devices, images of ruptured implants, and images without implants, as well as publicly available images. The Canon images were trained using ResNet-50. The model's performance on the Canon dataset was evaluated using stratified 5-fold cross-validation. Additionally, external validation was conducted using the GE and publicly available datasets. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (PRAUC) were calculated based on the contribution of the pixels with Gradient-weighted Class Activation Mapping (Grad-CAM). To identify the significant pixels for classification, we masked the pixels that contributed less than 10\%, up to a maximum of 100\%. To assess the model's robustness to uncertainty, Shannon entropy was calculated for 4 image groups: Canon, GE, ruptured implants, and without implants. Results: The deep learning model achieved an average AUROC of 0.98 and a PRAUC of 0.88 in the Canon dataset. The model achieved an AUROC of 0.985 and a PRAUC of 0.748 for images captured with GE devices. Additionally, the model predicted an AUROC of 0.909 and a PRAUC of 0.958 for the publicly available dataset. This model maintained the PRAUC values for quantitative validation when masking up to 90\% of the least-contributing pixels and the remnant pixels in breast shell layers. Furthermore, the prediction uncertainty increased in the following order: Canon (0.066), GE (0.072), ruptured implants (0.371), and no implants (0.777). Conclusions: We have demonstrated the feasibility of using deep learning to predict the shell texture type of breast implants. This approach quantifies the shell texture types of breast implants, supporting the first step in the diagnosis of BIA-ALCL. ", doi="10.2196/58776", url="https://formative.jmir.org/2024/1/e58776" } @Article{info:doi/10.2196/52794, author="Hwang, Ha Seung and Lee, Hayeon and Lee, Hyuk Jun and Lee, Myeongcheol and Koyanagi, Ai and Smith, Lee and Rhee, Youl Sang and Yon, Keon Dong and Lee, Jinseok", title="Machine Learning--Based Prediction for Incident Hypertension Based on Regular Health Checkup Data: Derivation and Validation in 2 Independent Nationwide Cohorts in South Korea and Japan", journal="J Med Internet Res", year="2024", month="Nov", day="5", volume="26", pages="e52794", keywords="machine learning", keywords="hypertension", keywords="cardiovascular disease", keywords="artificial intelligence", keywords="cause of death", keywords="cardiovascular risk", keywords="predictive analytics", abstract="Background: Worldwide, cardiovascular diseases are the primary cause of death, with hypertension as a key contributor. In 2019, cardiovascular diseases led to 17.9 million deaths, predicted to reach 23 million by 2030. Objective: This study presents a new method to predict hypertension using demographic data, using 6 machine learning models for enhanced reliability and applicability. The goal is to harness artificial intelligence for early and accurate hypertension diagnosis across diverse populations. 
Methods: Data from 2 national cohort studies, National Health Insurance Service-National Sample Cohort (South Korea, n=244,814), conducted between 2002 and 2013 were used to train and test machine learning models designed to anticipate incident hypertension within 5 years of a health checkup involving those aged $\geq$20 years, and Japanese Medical Data Center cohort (Japan, n=1,296,649) were used for extra validation. An ensemble from 6 diverse machine learning models was used to identify the 5 most salient features contributing to hypertension by presenting a feature importance analysis to confirm the contribution of each feature. Results: The Adaptive Boosting and logistic regression ensemble showed superior balanced accuracy (0.812, sensitivity 0.806, specificity 0.818, and area under the receiver operating characteristic curve 0.901). The 5 key hypertension indicators were age, diastolic blood pressure, BMI, systolic blood pressure, and fasting blood glucose. The Japanese Medical Data Center cohort dataset (extra validation set) corroborated these findings (balanced accuracy 0.741 and area under the receiver operating characteristic curve 0.824). The ensemble model was integrated into a public web portal for predicting hypertension onset based on health checkup data. Conclusions: Comparative evaluation of our machine learning models against classical statistical models across 2 distinct studies emphasized the former's enhanced stability, generalizability, and reproducibility in predicting hypertension onset. ", doi="10.2196/52794", url="https://www.jmir.org/2024/1/e52794" } @Article{info:doi/10.2196/55614, author="Sullivan, Sean Patrick and Mera-Giler, M. Robertino and Bush, Staci and Shvachko, Valentina and Sarkodie, Eleanor and O'Farrell, Daniel and Dubose, Stephanie and Magnuson, David", title="Claims-Based Algorithm to Identify Pre-Exposure Prophylaxis Indications for Tenofovir Disoproxil Fumarate and Emtricitabine Prescriptions (2012-2014): Validation Study", journal="JMIR Form Res", year="2024", month="Nov", day="4", volume="8", pages="e55614", keywords="pre-exposure prophylaxis", keywords="PrEP", keywords="classification", keywords="electronic medical record", keywords="EMR", keywords="algorithm", keywords="electronic health record", keywords="EHR", keywords="drug", keywords="pharmacology", keywords="pharmacotherapy", keywords="pharmaceutical", keywords="medication", keywords="monotherapy", keywords="HIV", keywords="prevention", abstract="Background: To monitor the use of tenofovir disoproxil fumarate and emtricitabine (TDF/FTC) and related medicines for pre-exposure prophylaxis (PrEP) as HIV prevention using commercial pharmacy data, it is necessary to determine whether TDF/FTC prescriptions are used for PrEP or for some other clinical indication. Objective: This study aimed to validate an algorithm to distinguish the use of TDF/FTC for HIV prevention or infectious disease treatment. Methods: An algorithm was developed to identify whether TDF/FTC prescriptions were for PrEP or for other indications from large-scale administrative databases. The algorithm identifies TDF/FTC prescriptions and then excludes patients with International Classification of Diseases (ICD)--9 diagnostic codes, medications, or procedures that suggest indications other than for PrEP (eg, documentation of HIV infection, chronic hepatitis B, or use of TDF/FTC for postexposure prophylaxis). 
For evaluation, we collected data by clinician assessment of medical records for patients with TDF/FTC prescriptions and compared the assessed indication identified by the clinician review with the assessed indication identified by the algorithm. The algorithm was then applied and evaluated in a large, urban, community-based sexual health clinic. Results: The PrEP algorithm demonstrated high sensitivity and moderate specificity (99.6\% and 49.6\%) in the electronic medical record database and high sensitivity and specificity (99\% and 87\%) in data from the urban community health clinic. Conclusions: The PrEP algorithm classified the indication for PrEP in most patients treated with TDF/FTC with sufficient accuracy to be useful for surveillance purposes. The methods described can serve as a basis for developing a robust and evolving case definition for antiretroviral prescriptions for HIV prevention purposes. ", doi="10.2196/55614", url="https://formative.jmir.org/2024/1/e55614", url="http://www.ncbi.nlm.nih.gov/pubmed/39141024" } @Article{info:doi/10.2196/55148, author="Brehmer, Alexander and Sauer, Martin Christopher and Salazar Rodr{\'i}guez, Jayson and Herrmann, Kelsey and Kim, Moon and Keyl, Julius and Bahnsen, Hendrik Fin and Frank, Benedikt and K{\"o}hrmann, Martin and Rassaf, Tienush and Mahabadi, Amir-Abbas and Hadaschik, Boris and Darr, Christopher and Herrmann, Ken and Tan, Susanne and Buer, Jan and Brenner, Thorsten and Reinhardt, Christian Hans and Nensa, Felix and Gertz, Michael and Egger, Jan and Kleesiek, Jens", title="Establishing Medical Intelligence---Leveraging Fast Healthcare Interoperability Resources to Improve Clinical Management: Retrospective Cohort and Clinical Implementation Study", journal="J Med Internet Res", year="2024", month="Oct", day="31", volume="26", pages="e55148", keywords="clinical informatics", keywords="FHIR", keywords="real-world evidence", keywords="medical intelligence", keywords="interoperability", keywords="data exchange", keywords="clinical management", keywords="clinical decision-making", keywords="electronic health records", keywords="quality of care", keywords="quality improvement", abstract="Background: FHIR (Fast Healthcare Interoperability Resources) has been proposed to enable health data interoperability. So far, its applicability has been demonstrated for selected research projects with limited data. Objective: This study aimed to design and implement a conceptual medical intelligence framework to leverage real-world care data for clinical decision-making. Methods: A Python package for the use of multimodal FHIR data (FHIRPACK [FHIR Python Analysis Conversion Kit]) was developed and pioneered in 5 real-world clinical use cases, that is, myocardial infarction, stroke, diabetes, sepsis, and prostate cancer. Patients were identified based on the ICD-10 (International Classification of Diseases, Tenth Revision) codes, and outcomes were derived from laboratory tests, prescriptions, procedures, and diagnostic reports. Results were provided as browser-based dashboards. Results: For 2022, a total of 1,302,988 patient encounters were analyzed. (1) Myocardial infarction: in 72.7\% (261/359) of cases, medication regimens fulfilled guideline recommendations. (2) Stroke: out of 1277 patients, 165 received thrombolysis and 108 thrombectomy. (3) Diabetes: in 443,866 serum glucose and 16,180 glycated hemoglobin A1c measurements from 35,494 unique patients, the prevalence of dysglycemic findings was 39\% (13,887/35,494). 
Among those with dysglycemia, diagnosis was coded in 44.2\% (6138/13,887) of the patients. (4) Sepsis: In 1803 patients, Staphylococcus epidermidis was the primarily isolated pathogen (773/2672, 28.9\%) and piperacillin and tazobactam was the primarily prescribed antibiotic (593/1593, 37.2\%). (5) PC: out of 54, three patients who received radical prostatectomy were identified as cases with prostate-specific antigen persistence or biochemical recurrence. Conclusions: Leveraging FHIR data through large-scale analytics can enhance health care quality and improve patient outcomes across 5 clinical specialties. We identified (1) patients with sepsis requiring less broad antibiotic therapy, (2) patients with myocardial infarction who could benefit from statin and antiplatelet therapy, (3) patients who had a stroke with longer than recommended times to intervention, (4) patients with hyperglycemia who could benefit from specialist referral, and (5) patients with PC with early increases in cancer markers. ", doi="10.2196/55148", url="https://www.jmir.org/2024/1/e55148", url="http://www.ncbi.nlm.nih.gov/pubmed/39240144" } @Article{info:doi/10.2196/55766, author="Ayorinde, Abimbola and Mensah, Opoku Daniel and Walsh, Julia and Ghosh, Iman and Ibrahim, Aishah Siti and Hogg, Jeffry and Peek, Niels and Griffiths, Frances", title="Health Care Professionals' Experience of Using AI: Systematic Review With Narrative Synthesis", journal="J Med Internet Res", year="2024", month="Oct", day="30", volume="26", pages="e55766", keywords="artificial intelligence", keywords="clinical decision support systems", keywords="CDSS", keywords="decision-making", keywords="quality assessment", keywords="clinician experience", keywords="health care professionals", keywords="health care delivery", abstract="Background: There has been a substantial increase in the development of artificial intelligence (AI) tools for clinical decision support. Historically, these were mostly knowledge-based systems, but recent advances include non--knowledge-based systems using some form of machine learning. The ability of health care professionals to trust technology and understand how it benefits patients or improves care delivery is known to be important for their adoption of that technology. For non--knowledge-based AI tools for clinical decision support, these issues are poorly understood. Objective: The aim of this study is to qualitatively synthesize evidence on the experiences of health care professionals in routinely using non--knowledge-based AI tools to support their clinical decision-making. Methods: In June 2023, we searched 4 electronic databases, MEDLINE, Embase, CINAHL, and Web of Science, with no language or date limit. We also contacted relevant experts and searched reference lists of the included studies. We included studies of any design that reported the experiences of health care professionals using non--knowledge-based systems for clinical decision support in their work settings. We completed double independent quality assessment for all included studies using the Mixed Methods Appraisal Tool. We used a theoretically informed thematic approach to synthesize the findings. Results: After screening 7552 titles and 182 full-text articles, we included 25 studies conducted in 9 different countries. Most of the included studies were qualitative (n=13), and the remaining were quantitative (n=9) and mixed methods (n=3). 
Overall, we identified 7 themes: health care professionals' understanding of AI applications, level of trust and confidence in AI tools, judging the value added by AI, data availability and limitations of AI, time and competing priorities, concern about governance, and collaboration to facilitate the implementation and use of AI. The most frequently occurring are the first 3 themes. For example, many studies reported that health care professionals were concerned about not understanding the AI outputs or the rationale behind them. There were issues with confidence in the accuracy of the AI applications and their recommendations. Some health care professionals believed that AI provided added value and improved decision-making, and some reported that it only served as a confirmation of their clinical judgment, while others did not find it useful at all. Conclusions: Our review identified several important issues documented in various studies on health care professionals' use of AI tools in real-world health care settings. Opinions of health care professionals regarding the added value of AI tools for supporting clinical decision-making varied widely, and many professionals had concerns about their understanding of and trust in this technology. The findings of this review emphasize the need for concerted efforts to optimize the integration of AI tools in real-world health care settings. Trial Registration: PROSPERO CRD42022336359; https://tinyurl.com/2yunvkmb ", doi="10.2196/55766", url="https://www.jmir.org/2024/1/e55766" } @Article{info:doi/10.2196/51711, author="Yang, Lingrui and Pang, Jiali and Zuo, Song and Xu, Jian and Jin, Wei and Zuo, Feng and Xue, Kui and Xiao, Zhongzhou and Peng, Xinwei and Xu, Jie and Zhang, Xiaofan and Chen, Ruiyao and Luo, Shuqing and Zhang, Shaoting and Sun, Xin", title="Evolution of the ``Internet Plus Health Care'' Mode Enabled by Artificial Intelligence: Development and Application of an Outpatient Triage System", journal="J Med Internet Res", year="2024", month="Oct", day="30", volume="26", pages="e51711", keywords="artificial intelligence", keywords="triage system", keywords="all department recommendation", keywords="subspecialty department recommendation", keywords="``internet plus healthcare''", keywords="``internet plus health care''", abstract="Background: Although new technologies have increased the efficiency and convenience of medical care, patients still struggle to identify specialized outpatient departments in Chinese tertiary hospitals due to a lack of medical knowledge. Objective: The objective of our study was to develop a precise and subdividable outpatient triage system to improve the experiences and convenience of patient care. Methods: We collected 395,790 electronic medical records (EMRs) and 500 medical dialogue groups. The EMRs were divided into 3 data sets to design and train the triage model (n=387,876, 98\%) and test (n=3957, 1\%) and validate (n=3957, 1\%) it. The triage system was altered based on the current BERT (Bidirectional Encoder Representations from Transformers) framework and evaluated by recommendation accuracies in Xinhua Hospital using the cancellation rates in 2021 and 2022, from October 29 to December 5. Finally, a prospective observational study containing 306 samples was conducted to compare the system's performance with that of triage nurses, which was evaluated by calculating precision, accuracy, recall of the top 3 recommended departments (recall@3), and time consumption. 
Results: With 3957 (1\%) records each, the testing and validation data sets achieved an accuracy of 0.8945 and 0.8941, respectively. Implemented in Xinhua Hospital, our triage system could accurately recommend 79 subspecialty departments and reduce the number of registration cancellations from 16,037 (3.83\%) of the total 418,714 to 15,338 (3.53\%) of the total 434200 (P<.05). In comparison to the triage system, the performance of the triage nurses was more accurate (0.9803 vs 0.9153) and precise (0.9213 vs 0.9049) since the system could identify subspecialty departments, whereas triage nurses or even general physicians can only recommend main departments. In addition, our triage system significantly outperformed triage nurses in recall@3 (0.6230 vs 0.5266; P<.001) and time consumption (10.11 vs 14.33 seconds; P<.001). Conclusions: The triage system demonstrates high accuracy in outpatient triage of all departments and excels in subspecialty department recommendations, which could decrease the cancellation rate and time consumption. It also improves the efficiency and convenience of clinical care to fulfill better the usage of medical resources, expand hospital effectiveness, and improve patient satisfaction in Chinese tertiary hospitals. ", doi="10.2196/51711", url="https://www.jmir.org/2024/1/e51711" } @Article{info:doi/10.2196/53636, author="Bardhan, Jayetri and Roberts, Kirk and Wang, Zhe Daisy", title="Question Answering for Electronic Health Records: Scoping Review of Datasets and Models", journal="J Med Internet Res", year="2024", month="Oct", day="30", volume="26", pages="e53636", keywords="medical question answering", keywords="electronic health record", keywords="EHR", keywords="electronic medical records", keywords="EMR", keywords="relational database", keywords="knowledge graph", abstract="Background: Question answering (QA) systems for patient-related data can assist both clinicians and patients. They can, for example, assist clinicians in decision-making and enable patients to have a better understanding of their medical history. Substantial amounts of patient data are stored in electronic health records (EHRs), making EHR QA an important research area. Because of the differences in data format and modality, this differs greatly from other medical QA tasks that use medical websites or scientific papers to retrieve answers, making it critical to research EHR QA. Objective: This study aims to provide a methodological review of existing works on QA for EHRs. The objectives of this study were to identify the existing EHR QA datasets and analyze them, study the state-of-the-art methodologies used in this task, compare the different evaluation metrics used by these state-of-the-art models, and finally elicit the various challenges and the ongoing issues in EHR QA. Methods: We searched for articles from January 1, 2005, to September 30, 2023, in 4 digital sources, including Google Scholar, ACL Anthology, ACM Digital Library, and PubMed, to collect relevant publications on EHR QA. Our systematic screening process followed PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. A total of 4111 papers were identified for our study, and after screening based on our inclusion criteria, we obtained 47 papers for further study. 
The selected studies were then classified into 2 non--mutually exclusive categories depending on their scope: ``EHR QA datasets'' and ``EHR QA models.'' Results: A systematic screening process obtained 47 papers on EHR QA for final review. Out of the 47 papers, 53\% (n=25) were about EHR QA datasets, and 79\% (n=37) were about EHR QA models. It was observed that QA on EHRs is relatively new and unexplored. Most of the works are fairly recent. In addition, it was observed that emrQA is by far the most popular EHR QA dataset, both in terms of citations and usage in other papers. We have classified the EHR QA datasets based on their modality, and we have inferred that Medical Information Mart for Intensive Care (MIMIC-III) and the National Natural Language Processing Clinical Challenges datasets (ie, n2c2 datasets) are the most popular EHR databases and corpuses used in EHR QA. Furthermore, we identified the different models used in EHR QA along with the evaluation metrics used for these models. Conclusions: EHR QA research faces multiple challenges, such as the limited availability of clinical annotations, concept normalization in EHR QA, and challenges faced in generating realistic EHR QA datasets. There are still many gaps in research that motivate further work. This study will assist future researchers in focusing on areas of EHR QA that have possible future research directions. ", doi="10.2196/53636", url="https://www.jmir.org/2024/1/e53636" } @Article{info:doi/10.2196/54839, author="Wernli, Boris and Verloo, Henk and von Gunten, Armin and Pereira, Filipa", title="Using Existing Clinical Data to Measure Older Adult Inpatients' Frailty at Admission and Discharge: Hospital Patient Register Study", journal="JMIR Aging", year="2024", month="Oct", day="28", volume="7", pages="e54839", keywords="frailty", keywords="frailty assessment", keywords="electronic patient records", keywords="functional independence measure", keywords="routinely collected data", keywords="hospital register", keywords="patient records", keywords="medical records", keywords="clinical data", keywords="older adults", keywords="cluster analysis", keywords="hierarchical clustering", abstract="Background: Frailty is a widespread geriatric syndrome among older adults, including hospitalized older inpatients. Some countries use electronic frailty measurement tools to identify frailty at the primary care level, but this method has rarely been investigated during hospitalization in acute care hospitals. An electronic frailty measurement instrument based on population-based hospital electronic health records could effectively detect frailty, frailty-related problems, and complications as well as be a clinical alert. Identifying frailty among older adults using existing patient health data would greatly aid the management and support of frailty identification and could provide a valuable public health instrument without additional costs. Objective: We aim to explore a data-driven frailty measurement instrument for older adult inpatients using data routinely collected at hospital admission and discharge. Methods: A retrospective electronic patient register study included inpatients aged $\geq$65 years admitted to and discharged from a public hospital between 2015 and 2017. A dataset of 53,690 hospitalizations was used to customize this data-driven frailty measurement instrument inspired by the Edmonton Frailty Scale developed by Rolfson et al. 
A 2-step hierarchical cluster procedure was applied to compute e-Frail-CH (Switzerland) scores at hospital admission and discharge. Prevalence, central tendency, comparative, and validation statistics were computed. Results: Mean patient age at admission was 78.4 (SD 7.9) years, with more women admitted (28,018/53,690, 52.18\%) than men (25,672/53,690, 47.81\%). Our 2-step hierarchical clustering approach computed 46,743 inputs of hospital admissions and 47,361 for discharges. Clustering solutions scored from 0.5 to 0.8 on a scale from 0 to 1. Patients considered frail comprised 42.02\% (n=19,643) of admissions and 48.23\% (n=22,845) of discharges. Within e-Frail-CH's 0-12 range, a score $\geq$6 indicated frailty. We found a statistically significant mean e-Frail-CH score change between hospital admission (5.3, SD 2.6) and discharge (5.75, SD 2.7; P<.001). Sensitivity and specificity cut point values were 0.82 and 0.88, respectively. The area under the receiver operating characteristic curve was 0.85. Comparing the e-Frail-CH instrument to the existing Functional Independence Measure (FIM) instrument, FIM scores indicating severe dependence equated to e-Frail-CH scores of $\geq$9, with a sensitivity and specificity of 0.97 and 0.88, respectively. The area under the receiver operating characteristic curve was 0.92. There was a strong negative association between e-Frail-CH scores at hospital discharge and FIM scores (rs=--0.844; P<.001). Conclusions: An electronic frailty measurement instrument was constructed and validated using patient data routinely collected during hospitalization, especially at admission and discharge. The mean e-Frail-CH score was higher at discharge than at admission. The routine calculation of e-Frail-CH scores during hospitalization could provide very useful clinical alerts on the health trajectories of older adults and help select interventions for preventing or mitigating frailty. ", doi="10.2196/54839", url="https://aging.jmir.org/2024/1/e54839" } @Article{info:doi/10.2196/59906, author="Dritsakis, Giorgos and Gallos, Ioannis and Psomiadi, Maria-Elisavet and Amditis, Angelos and Dionysiou, Dimitra", title="Data Analytics to Support Policy Making for Noncommunicable Diseases: Scoping Review", journal="Online J Public Health Inform", year="2024", month="Oct", day="25", volume="16", pages="e59906", keywords="policy making", keywords="public health", keywords="noncommunicable diseases", keywords="data analytics", keywords="digital tools", keywords="descriptive", keywords="predictive", keywords="decision support", keywords="implementation", abstract="Background: There is an emerging need for evidence-based approaches harnessing large amounts of health care data and novel technologies (such as artificial intelligence) to optimize public health policy making. Objective: The aim of this review was to explore the data analytics tools designed specifically for policy making in noncommunicable diseases (NCDs) and their implementation. Methods: A scoping review was conducted after searching the PubMed and IEEE databases for articles published in the last 10 years. Results: Nine articles that presented 7 data analytics tools designed to inform policy making for NCDs were reviewed. The tools incorporated descriptive and predictive analytics. Some tools were designed to include recommendations for decision support, but no pilot studies applying prescriptive analytics have been published. 
The tools were piloted with various conditions, with cancer being the least studied condition. Implementation of the tools included use cases, pilots, or evaluation workshops that involved policy makers. However, our findings demonstrate very limited real-world use of analytics by policy makers, which is in line with previous studies. Conclusions: Despite the availability of tools designed for different purposes and conditions, data analytics is not widely used to support policy making for NCDs. However, the review demonstrates the value and potential use of data analytics to support policy making. Based on the findings, we make suggestions for researchers developing digital tools to support public health policy making. The findings will also serve as input for the European Union--funded research project ONCODIR developing a policy analytics dashboard for the prevention of colorectal cancer as part of an integrated platform. ", doi="10.2196/59906", url="https://ojphi.jmir.org/2024/1/e59906", url="http://www.ncbi.nlm.nih.gov/pubmed/39454197" } @Article{info:doi/10.2196/63456, author="Ashimwe, Aimerence and Davoody, Nadia", title="Exploring Health Care Professionals' Perspectives on the Use of a Medication and Care Support System and Recommendations for Designing a Similar Tool for Family Caregivers: Interview Study Among Health Care Professionals", journal="JMIR Med Inform", year="2024", month="Oct", day="23", volume="12", pages="e63456", keywords="eHealth", keywords="telemedicine", keywords="mobile health", keywords="mHealth", keywords="medication management", keywords="home care", keywords="family caregivers", keywords="mobile phone", abstract="Background: With the aging population on the rise, the demand for effective health care solutions to address adverse drug events is becoming increasingly urgent. Telemedicine has emerged as a promising solution for strengthening health care delivery in home care settings and mitigating drug errors. Due to the indispensable role of family caregivers in daily patient care, integrating digital health tools has the potential to streamline medication management processes and enhance the overall quality of patient care. Objective: This study aims to explore health care professionals' perspectives on the use of a medication and care support system (MCSS) and collect recommendations for designing a similar tool for family caregivers. Methods: Fifteen interviews with health care professionals in a home care center were conducted. Thematic analysis was used, and 5 key themes highlighting the importance of using the MCSS tool to improve medication management in home care were identified. Results: All participants emphasized the necessity of direct communication between health care professionals and family caregivers and stated that family caregivers need comprehensive information about medication administration, patient conditions, and symptoms. Furthermore, the health care professionals recommended features and functions customized for family caregivers. Conclusions: This study underscored the importance of clear communication between health care professionals and family caregivers and the provision of comprehensive instructions to promote safe medication practices. By equipping family caregivers with essential information via a tool similar to the MCSS, a proactive approach to preventing errors and improving outcomes is advocated. 
", doi="10.2196/63456", url="https://medinform.jmir.org/2024/1/e63456" } @Article{info:doi/10.2196/57940, author="Ortiz-Barrios, Miguel and Cleland, Ian and Donnelly, Mark and Gul, Muhammet and Yucesan, Melih and Jim{\'e}nez-Delgado, Isabel Genett and Nugent, Chris and Madrid-Sierra, Stephany", title="Integrated Approach Using Intuitionistic Fuzzy Multicriteria Decision-Making to Support Classifier Selection for Technology Adoption in Patients with Parkinson Disease: Algorithm Development and Validation", journal="JMIR Rehabil Assist Technol", year="2024", month="Oct", day="22", volume="11", pages="e57940", keywords="Parkinson disease", keywords="technology adoption", keywords="intuitionistic fuzzy analytic hierarchy process", keywords="intuitionistic fuzzy decision-making trial and evaluation laboratory", keywords="combined compromise solution", abstract="Background: Parkinson disease (PD) is reported to be among the most prevalent neurodegenerative diseases globally, presenting ongoing challenges and increasing burden on health care systems. In an effort to support patients with PD, their carers, and the wider health care sector to manage this incurable condition, the focus has begun to shift away from traditional treatments. One of the most contemporary treatments includes prescribing assistive technologies (ATs), which are viewed as a way to promote independent living and deliver remote care. However, the uptake of these ATs is varied, with some users not ready or willing to accept all forms of AT and others only willing to adopt low-technology solutions. Consequently, to manage both the demands on resources and the efficiency with which ATs are deployed, new approaches are needed to automatically assess or predict a user's likelihood to accept and adopt a particular AT before it is prescribed. Classification algorithms can be used to automatically consider the range of factors impacting AT adoption likelihood, thereby potentially supporting more effective AT allocation. From a computational perspective, different classification algorithms and selection criteria offer various opportunities and challenges to address this need. Objective: This paper presents a novel hybrid multicriteria decision-making approach to support classifier selection in technology adoption processes involving patients with PD. Methods: First, the intuitionistic fuzzy analytic hierarchy process (IF-AHP) was implemented to calculate the relative priorities of criteria and subcriteria considering experts' knowledge and uncertainty. Second, the intuitionistic fuzzy decision-making trial and evaluation laboratory (IF-DEMATEL) was applied to evaluate the cause-effect relationships among criteria/subcriteria. Finally, the combined compromise solution (CoCoSo) was used to rank the candidate classifiers based on their capability to model the technology adoption. Results: We conducted a study involving a mobile smartphone solution to validate the proposed methodology. Structure (F5) was identified as the factor with the highest relative priority (overall weight=0.214), while adaptability (F4) (D-R=1.234) was found to be the most influencing aspect when selecting classifiers for technology adoption in patients with PD. In this case, the most appropriate algorithm for supporting technology adoption in patients with PD was the A3 - J48 decision tree (M3=2.5592). 
The results obtained by comparing the CoCoSo method in the proposed approach with 2 alternative methods (simple additive weighting and technique for order of preference by similarity to ideal solution) support the accuracy and applicability of the proposed methodology. It was observed that the final scores of the algorithms in each method were highly correlated (Pearson correlation coefficient >0.8). Conclusions: The IF-AHP-IF-DEMATEL-CoCoSo approach helped to identify classification algorithms that do not just discriminate between good and bad adopters of assistive technologies within the Parkinson population but also consider technology-specific features like design, quality, and compatibility that make these classifiers easily implementable by clinicians in the health care system. ", doi="10.2196/57940", url="https://rehab.jmir.org/2024/1/e57940" } @Article{info:doi/10.2196/50023, author="Renne, Lorenzo Salvatore and Cammelli, Manuela and Santori, Ilaria and Tassan-Mangina, Marta and Sam{\`a}, Laura and Ruspi, Laura and Sicoli, Federico and Colombo, Piergiuseppe and Terracciano, Maria Luigi and Quagliuolo, Vittorio and Cananzi, Maria Ferdinando Carlo", title="True Mitotic Count Prediction in Gastrointestinal Stromal Tumors: Bayesian Network Model and PROMETheus (Preoperative Mitosis Estimator Tool) Application Development", journal="J Med Internet Res", year="2024", month="Oct", day="22", volume="26", pages="e50023", keywords="GIST mitosis", keywords="risk classification", keywords="mHealth", keywords="mobile health", keywords="neoadjuvant therapy", keywords="patient stratification", keywords="Gastrointestinal Stroma", keywords="preoperative risk", abstract="Background: Gastrointestinal stromal tumors (GISTs) present a complex clinical landscape, where precise preoperative risk assessment plays a pivotal role in guiding therapeutic decisions. Conventional methods for evaluating mitotic count, such as biopsy-based assessments, encounter challenges stemming from tumor heterogeneity and sampling biases, thereby underscoring the urgent need for innovative approaches to enhance prognostic accuracy. Objective: The primary objective of this study was to develop a robust and reliable computational tool, PROMETheus (Preoperative Mitosis Estimator Tool), aimed at refining patient stratification through the precise estimation of mitotic count in GISTs. Methods: Using advanced Bayesian network methodologies, we constructed a directed acyclic graph (DAG) integrating pertinent clinicopathological variables essential for accurate mitotic count prediction on the surgical specimen. Key parameters identified and incorporated into the model encompassed tumor size, location, mitotic count from biopsy specimens, surface area evaluated during biopsy, and tumor response to therapy, when applicable. Rigorous testing procedures, including prior predictive simulations and validation utilizing synthetic data sets, were employed. Finally, the model was trained on a comprehensive cohort of real-world GIST cases (n=80), drawn from the repository of the Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Humanitas Research Hospital, with a total of 160 cases analyzed. Results: Our computational model exhibited excellent diagnostic performance on synthetic data. Different model architectures were selected based on lower deviance and robust out-of-sample predictive capabilities. Posterior predictive checks (retrodiction) further corroborated the model's accuracy. Subsequently, PROMETheus was developed. 
This is an intuitive tool that dynamically computes predicted mitotic count and risk assessment on surgical specimens based on tumor-specific attributes, including size, location, surface area, and biopsy-derived mitotic count, using posterior probabilities derived from the model. Conclusions: The deployment of PROMETheus represents a potential advancement in preoperative risk stratification for GISTs, offering clinicians a precise and reliable means to anticipate mitotic counts on surgical specimens and a solid base to stratify patients for clinical studies. By facilitating tailored therapeutic strategies, this innovative tool is poised to revolutionize clinical decision-making paradigms, ultimately translating into improved patient outcomes and enhanced prognostic precision in the management of GISTs. ", doi="10.2196/50023", url="https://www.jmir.org/2024/1/e50023" } @Article{info:doi/10.2196/60402, author="Fernando, Manasha and Abell, Bridget and McPhail, M. Steven and Tyack, Zephanie and Tariq, Amina and Naicker, Sundresan", title="Applying the Non-Adoption, Abandonment, Scale-up, Spread, and Sustainability Framework Across Implementation Stages to Identify Key Strategies to Facilitate Clinical Decision Support System Integration Within a Large Metropolitan Health Service: Interview and Focus Group Study", journal="JMIR Med Inform", year="2024", month="Oct", day="17", volume="12", pages="e60402", keywords="medical informatics", keywords="adoption and implementation", keywords="behavior", keywords="health systems", abstract="Background: Computerized clinical decision support systems (CDSSs) enhance patient care through real-time, evidence-based guidance for health care professionals. Despite this, the effective implementation of these systems for health services presents multifaceted challenges, leading to inappropriate use and abandonment over the course of time. Using the Non-Adoption, Abandonment, Scale-Up, Spread, and Sustainability (NASSS) framework, this qualitative study examined CDSS adoption in a metropolitan health service, identifying determinants across implementation stages to optimize CDSS integration into health care practice. Objective: This study aims to identify the theory-informed (NASSS) determinants, which included multiple CDSS interventions across a 2-year period, both at the health-service level and at the individual hospital setting, that either facilitate or hinder the application of CDSSs within a metropolitan health service. In addition, this study aimed to map these determinants onto specific stages of the implementation process, thereby developing a system-level understanding of CDSS application across implementation stages. Methods: Participants involved in various stages of the implementation process were recruited (N=30). Participants took part in interviews and focus groups. We used a hybrid inductive-deductive qualitative content analysis and a framework mapping approach to categorize findings into barriers, enablers, or neutral determinants aligned to NASSS framework domains. These determinants were also mapped to implementation stages using the Active Implementation Framework stages approach. Results: Participants comprised clinical adopters (14/30, 47\%), organizational champions (5/30, 16\%), and those with roles in organizational clinical informatics (5/30, 16\%). Most determinants were mapped to the organization level, technology, and adopter subdomains. 
However, the study findings also demonstrated a relative lack of long-term implementation planning. Consequently, determinants were not uniformly distributed across the stages of implementation, with 61.1\% (77/126) identified in the exploration stage, 30.9\% (39/126) in the full implementation stage, and 4.7\% (6/126) in the installation stage. Stakeholders engaged in more preimplementation and full-scale implementation activities, with fewer cycles of monitoring and iteration activities identified. Conclusions: These findings addressed a substantial knowledge gap in the literature using systems thinking principles to identify the interdependent dynamics of CDSS implementation. A lack of sustained implementation strategies (ie, training and longer-term, adopter-level championing) weakened the sociotechnical network between developers and adopters, leading to communication barriers. More rigorous implementation planning, encompassing all 4 implementation stages, may help address the identified barriers and enhance the enablers. ", doi="10.2196/60402", url="https://medinform.jmir.org/2024/1/e60402" } @Article{info:doi/10.2196/56192, author="Li, Chia-Yen and Huang, Mei-Hui and Lin, Yu-Shiue and Chu, Chi-Ming and Pan, Hsueh-Hsing", title="Effects of Implementing a Barcode Information Management System on Operating Room Staff: Comparative Study", journal="J Med Internet Res", year="2024", month="Oct", day="17", volume="26", pages="e56192", keywords="barcode information management system", keywords="barcode", keywords="information system", keywords="operation management information system", keywords="Agile development model", abstract="Background: Barcode information management systems (BIMS) have been implemented in operating rooms to improve the quality of medical care and administrative efficiency. Previous research has demonstrated that the Agile development model is extensively used in the development and management of information systems. However, the effect of information systems on staff acceptance has not been examined within the context of clinical medical information management systems. Objective: This study aimed to explore the effects and acceptance of implementing a BIMS in comparison to the original information system (OIS) among operating and supply room staff. Methods: This study used a comparative cohort design. A total of 80 staff members from the operating and supply rooms of a Northern Taiwan medical center were recruited. Data collection, conducted from January 2020 to August 2020 using a mobile-based structured questionnaire, included participant characteristics and the Information Management System Scale. SPSS (version 20.0, IBM Corp) for Windows (Microsoft Corporation) was used for data analysis. Descriptive statistics included mean, SD, frequency, and percentage. Differences between groups were analyzed using the Mann-Whitney U test and Kruskal-Wallis test, with a P value <.05 considered statistically significant. Results: The results indicated that the BIMS generally achieved higher scores in key elements of system success, system quality, information quality, perceived system use, perceived ease of use, perceived usefulness, and overall quality score; none of these differences were statistically significant (P>.05), with the system quality subscale being closest to significance (P=.06). Nurses showed significantly better perceived system use than technicians (1.58, SD 4.78 vs --1.19, SD 6.24; P=.02).
Significant differences in perceived usefulness were found based on educational level (P=.04) and experience with OIS (P=.03), with junior college-educated nurses and those with over 6 years of OIS experience reporting the highest perceived usefulness. Conclusions: The study demonstrates that using the Agile development model for BIMS is advantageous for clinical environments. The high acceptance among operating room staff underscores its practicality and broader adoption potential. It advocates for continued exploration of technology-driven solutions to enhance health care delivery and optimize clinical workflows. ", doi="10.2196/56192", url="https://www.jmir.org/2024/1/e56192" } @Article{info:doi/10.2196/44494, author="Liu, Siqi and Xu, Qianyi and Xu, Zhuoyang and Liu, Zhuo and Sun, Xingzhi and Xie, Guotong and Feng, Mengling and See, Choong Kay", title="Reinforcement Learning to Optimize Ventilator Settings for Patients on Invasive Mechanical Ventilation: Retrospective Study", journal="J Med Internet Res", year="2024", month="Oct", day="16", volume="26", pages="e44494", keywords="mechanical ventilation", keywords="reinforcement learning", keywords="artificial intelligence", keywords="validation study", keywords="critical care", keywords="treatment", keywords="intensive care unit", keywords="critically ill", keywords="patient", keywords="monitoring", keywords="database", keywords="mortality rate", keywords="decision support", keywords="support tool", keywords="survival", keywords="prognosis", keywords="respiratory support", abstract="Background: One of the significant changes in intensive care medicine over the past 2 decades is the acknowledgment that improper mechanical ventilation settings substantially contribute to pulmonary injury in critically ill patients. Artificial intelligence (AI) solutions can optimize mechanical ventilation settings in intensive care units (ICUs) and improve patient outcomes. Specifically, machine learning algorithms can be trained on large datasets of patient information and mechanical ventilation settings. These algorithms can then predict patient responses to different ventilation strategies and suggest personalized ventilation settings for individual patients. Objective: In this study, we aimed to design and evaluate an AI solution that could tailor an optimal ventilator strategy for each critically ill patient who requires mechanical ventilation. Methods: We proposed a reinforcement learning--based AI solution using observational data from multiple ICUs in the United States. The primary outcome was hospital mortality. Secondary outcomes were the proportion of optimal oxygen saturation and the proportion of optimal mean arterial blood pressure. We trained our AI agent to recommend low, medium, and high levels of 3 ventilator settings---positive end-expiratory pressure, fraction of inspired oxygen, and ideal body weight--adjusted tidal volume---according to patients' health conditions. We defined a policy as rules guiding ventilator setting changes given specific clinical scenarios. Off-policy evaluation metrics were applied to evaluate the AI policy. Results: We studied 21,595 and 5105 patients' ICU stays from the e-Intensive Care Unit Collaborative Research (eICU) and Medical Information Mart for Intensive Care IV (MIMIC-IV) databases, respectively. 
Using the learned AI policy, we estimated the hospital mortality rate (eICU 12.1\%, SD 3.1\%; MIMIC-IV 29.1\%, SD 0.9\%), the proportion of optimal oxygen saturation (eICU 58.7\%, SD 4.7\%; MIMIC-IV 49\%, SD 1\%), and the proportion of optimal mean arterial blood pressure (eICU 31.1\%, SD 4.5\%; MIMIC-IV 41.2\%, SD 1\%). Based on multiple quantitative and qualitative evaluation metrics, our proposed AI solution outperformed observed clinical practice. Conclusions: Our study found that customizing ventilation settings for individual patients led to lower estimated hospital mortality rates compared to actual rates. This highlights the potential effectiveness of using reinforcement learning methodology to develop AI models that analyze complex clinical data for optimizing treatment parameters. Additionally, our findings suggest the integration of this model into a clinical decision support system for refining ventilation settings, supporting the need for prospective validation trials. ", doi="10.2196/44494", url="https://www.jmir.org/2024/1/e44494", url="http://www.ncbi.nlm.nih.gov/pubmed/39219230" } @Article{info:doi/10.2196/60081, author="You, Guan-Ting Jacqueline and Leung, I. Tiffany and Pandita, Deepti and Sakumoto, Matthew", title="Primary Care Informatics: Vitalizing the Bedrock of Health Care", journal="J Med Internet Res", year="2024", month="Oct", day="15", volume="26", pages="e60081", keywords="health care delivery", keywords="primary care", keywords="primary health care", keywords="primary prevention", keywords="quality of health care", keywords="holistic care", keywords="holistic medicine", keywords="people-centric care", keywords="person-centric care", keywords="medical informatics applications", keywords="primary care informatics", keywords="medical informatics", keywords="health informatics", keywords="information science", keywords="data science", doi="10.2196/60081", url="https://www.jmir.org/2024/1/e60081" } @Article{info:doi/10.2196/52722, author="Chen, Tingting and Tang, Xiaofen and Xu, Min and Jiang, Yue and Zheng, Fengyan", title="Application of Information Link Control in Surgical Specimen Near-Miss Events in a South China Hospital: Nonrandomized Controlled Study", journal="JMIR Med Inform", year="2024", month="Oct", day="14", volume="12", pages="e52722", keywords="near misses", keywords="technical barriers", keywords="process barriers", keywords="surgical specimens", keywords="information", abstract="Background: Information control is a promising approach for managing surgical specimens. However, there is limited research evidence on surgical near misses, particularly regarding closed-loop information control at each link of the process. Objective: This study aimed to construct a new model of surgical specimen process management and to create a safe operating room nursing practice environment by intercepting specimen near-miss events through information safety barriers. Methods: In a large hospital in China, 84,289 surgical specimens collected in the conventional information specimen management mode from January to December 2021 were selected as the control group, and 99,998 surgical specimens collected in the information safety barrier control surgical specimen management mode from January to December 2022 were selected as the improvement group. The incidence of near misses, the qualified rate of pathological specimen fixation, and the average time required for specimen fixation were compared under the 2 management modes.
The causes of near misses in the 2 groups were analyzed, and near-miss events under the information safety barrier control mode were examined. Results: Under the information-based safety barrier control surgical specimen management model, the incidence of adverse events in surgical specimens was reduced, the reporting of near-miss events in surgical specimens was improved by 100\%, the quality control and management of surgical specimens was effectively improved, the pass rate of surgical pathology specimen fixation was improved, and the mean time for surgical specimen fixation was shortened, with differences considered statistically significant at P<.05. Conclusions: Our research has developed a new mode of managing the surgical specimen process. This mode can intercept specimen near-miss events by implementing information safety barriers, thereby enhancing the quality of specimen management, ensuring the safety of medical procedures, and improving the quality of hospital services. ", doi="10.2196/52722", url="https://medinform.jmir.org/2024/1/e52722" } @Article{info:doi/10.2196/55903, author="Patil, Rohan and Ashraf, Fatima and Abu Dayeh, Samer and Prakash, K. Siddharth", title="Development and Assessment of a Point-of-Care Application (Genomic Medicine Guidance) for Heritable Thoracic Aortic Disease", journal="JMIRx Med", year="2024", month="Oct", day="8", volume="5", pages="e55903", keywords="genomic medicine", keywords="point of care", keywords="thoracic aortic aneurysm", keywords="aortic dissection", keywords="decision support", abstract="Background: Genetic testing can determine familial and personal risks for heritable thoracic aortic aneurysms and dissections (TAD). The 2022 American College of Cardiology/American Heart Association guidelines for TAD recommend management decisions based on the specific gene mutation. However, many clinicians lack sufficient comfort or insight to integrate genetic information into clinical practice. Objective: We therefore developed the Genomic Medicine Guidance (GMG) application, an interactive point-of-care tool to inform clinicians and patients about TAD diagnosis, treatment, and surveillance. GMG is a REDCap-based application that combines publicly available genetic data and clinical recommendations based on the TAD guidelines into one translational education tool. Methods: TAD genetic information in GMG was sourced from the Montalcino Aortic Consortium, a worldwide collaboration of TAD centers of excellence, and the National Institutes of Health genetic repositories ClinVar and ClinGen. Results: The application streamlines data on the 13 most frequently mutated TAD genes with 2286 unique pathogenic mutations that cause TAD so that users receive comprehensive recommendations for diagnostic testing, imaging, surveillance, medical therapy, and preventative surgical repair, as well as guidance for exercise safety and management during pregnancy. The application output can be displayed in a clinician view or exported as an informative pamphlet in a patient-friendly format. Conclusions: The overall goal of the GMG application is to make genomic medicine more accessible to clinicians and patients while serving as a unifying platform for research. We anticipate that these features will be catalysts for collaborative projects aiming to understand the spectrum of genetic variants contributing to TAD.
", doi="10.2196/55903", url="https://xmed.jmir.org/2024/1/e55903" } @Article{info:doi/10.2196/56353, author="Bozkurt, Selen and Fereydooni, Soraya and Kar, Irem and Diop Chalmers, Catherine and Leslie, L. Sharon and Pathak, Ravi and Walling, Anne and Lindvall, Charlotta and Lorenz, Karl and Quest, Tammie and Giannitrapani, Karleen and Kavalieratos, Dio", title="Investigating Data Diversity and Model Robustness of AI Applications in Palliative Care and Hospice: Protocol for Scoping Review", journal="JMIR Res Protoc", year="2024", month="Oct", day="8", volume="13", pages="e56353", keywords="palliative care", keywords="artificial intelligence", keywords="ethical frameworks", keywords="AI", keywords="data diversity", keywords="model robustness", keywords="decision support", keywords="clinical settings", keywords="end-of-life care", keywords="hospice environments", keywords="hospice", keywords="methodology", keywords="thematic analysis", keywords="dissemination", abstract="Background: Artificial intelligence (AI) has become a pivotal element in health care, leading to significant advancements across various medical domains, including palliative care and hospice services. These services focus on improving the quality of life for patients with life-limiting illnesses, and AI's ability to process complex datasets can enhance decision-making and personalize care in these sensitive settings. However, incorporating AI into palliative and hospice care requires careful examination to ensure it reflects the multifaceted nature of these settings. Objective: This scoping review aims to systematically map the landscape of AI in palliative care and hospice settings, focusing on the data diversity and model robustness. The goal is to understand AI's role, its clinical integration, and the transparency of its development, ultimately providing a foundation for developing AI applications that adhere to established ethical guidelines and principles. Methods: Our scoping review involves six stages: (1) identifying the research question; (2) identifying relevant studies; (3) study selection; (4) charting the data; (5) collating, summarizing, and reporting the results; and (6) consulting with stakeholders. Searches were conducted across databases including MEDLINE through PubMed, Embase.com, IEEE Xplore, ClinicalTrials.gov, and Web of Science Core Collection, covering studies from the inception of each database up to November 1, 2023. We used a comprehensive set of search terms to capture relevant studies, and non-English records were excluded if their abstracts were not in English. Data extraction will follow a systematic approach, and stakeholder consultations will refine the findings. Results: The electronic database searches conducted in November 2023 resulted in 4614 studies. After removing duplicates, 330 studies were selected for full-text review to determine their eligibility based on predefined criteria. The extracted data will be organized into a table to aid in crafting a narrative summary. The review is expected to be completed by May 2025. Conclusions: This scoping review will advance the understanding of AI in palliative care and hospice, focusing on data diversity and model robustness. It will identify gaps and guide future research, contributing to the development of ethically responsible and effective AI applications in these settings. 
International Registered Report Identifier (IRRID): DERR1-10.2196/56353 ", doi="10.2196/56353", url="https://www.researchprotocols.org/2024/1/e56353" } @Article{info:doi/10.2196/55472, author="Trinkley, E. Katy and Maw, M. Anna and Torres, Huebner Cristina and Huebschmann, G. Amy and Glasgow, E. Russell", title="Applying Implementation Science to Advance Electronic Health Record--Driven Learning Health Systems: Case Studies, Challenges, and Recommendations", journal="J Med Internet Res", year="2024", month="Oct", day="7", volume="26", pages="e55472", keywords="learning health systems", keywords="implementation science", keywords="chronic care", keywords="electronic health record", keywords="evidence-based medicine", keywords="information technology", keywords="research and technology", doi="10.2196/55472", url="https://www.jmir.org/2024/1/e55472", url="http://www.ncbi.nlm.nih.gov/pubmed/39374069" } @Article{info:doi/10.2196/45122, author="Peiffer-Smadja, Nathan and Descousse, Sophie and Courr{\`e}ges, Elsa and Nganbou, Audrey and Jeanmougin, Pauline and Birgand, Gabriel and L{\'e}naud, S{\'e}verin and Beaumont, Anne-Lise and Durand, Claire and Delory, Tristan and Le Bel, Josselin and Bouvet, Elisabeth and Lariven, Sylvie and D'Ortenzio, Eric and Konat{\'e}, Issa and Bouyou-Akotet, Karine Marielle and Ouedraogo, Abdoul-Salam and Kouakou, Affoue Gis{\`e}le and Poda, Armel and Akpovo, Corinne and Lescure, Fran{\c{c}}ois-Xavier and Tanon, Aristophane", title="Implementation of a Clinical Decision Support System for Antimicrobial Prescribing in Sub-Saharan Africa: Multisectoral Qualitative Study", journal="J Med Internet Res", year="2024", month="Oct", day="7", volume="26", pages="e45122", keywords="antimicrobial resistance", keywords="implementation research", keywords="Consolidated Framework for Implementation Research", keywords="CDSS", keywords="mobile health", keywords="mHealth", keywords="eHealth", keywords="mobile phone", abstract="Background: Suboptimal use of antimicrobials is a driver of antimicrobial resistance in West Africa. Clinical decision support systems (CDSSs) can facilitate access to updated and reliable recommendations. Objective: This study aimed to assess contextual factors that could facilitate the implementation of a CDSS for antimicrobial prescribing in West Africa and Central Africa and to identify tailored implementation strategies. Methods: This qualitative study was conducted through 21 semistructured individual interviews via videoconference with health care professionals between September and December 2020. Participants were recruited using purposive sampling in a transnational capacity-building network for hospital preparedness in West Africa. The interview guide included multiple constructs derived from the Consolidated Framework for Implementation Research. Interviews were transcribed, and data were analyzed using thematic analysis. Results: The panel of participants included health practitioners (12/21, 57\%), health actors trained in engineering (2/21, 10\%), project managers (3/21, 14\%), antimicrobial resistance research experts (2/21, 10\%), a clinical microbiologist (1/21, 5\%), and an anthropologist (1/21, 5\%). Contextual factors influencing the implementation of eHealth tools existed at the individual, health care system, and national levels. At the individual level, the main challenge was to design a user-centered CDSS adapted to the prescriber's clinical routine and structural constraints. 
Most of the participants stated that the CDSS should not only target physicians in academic hospitals who can use their network to disseminate the tool but also general practitioners, primary care nurses, midwives, and other health care workers who are the main prescribers of antimicrobials in rural areas of West Africa. The heterogeneity in antimicrobial prescribing training among prescribers was a significant challenge to the use of a common CDSS. At the country level, weak pharmaceutical regulations, the lack of official guidelines for antimicrobial prescribing, limited access to clinical microbiology laboratories, self-medication, and disparity in health care coverage lead to inappropriate antimicrobial use and could limit the implementation and diffusion of CDSS for antimicrobial prescribing. Participants emphasized the importance of building a solid eHealth ecosystem in their countries by establishing academic partnerships, developing physician networks, and involving diverse stakeholders to address challenges. Additional implementation strategies included conducting a local needs assessment, identifying early adopters, promoting network weaving, using implementation advisers, and creating a learning collaborative. Participants noted that a CDSS for antimicrobial prescribing could be a powerful tool for the development and dissemination of official guidelines for infectious diseases in West Africa. Conclusions: These results suggest that a CDSS for antimicrobial prescribing adapted for nonspecialized prescribers could have a role in improving clinical decisions. They also confirm the relevance of adopting a cross-disciplinary approach with participants from different backgrounds to assess contextual factors, including social, political, and economic determinants. ", doi="10.2196/45122", url="https://www.jmir.org/2024/1/e45122", url="http://www.ncbi.nlm.nih.gov/pubmed/39374065" } @Article{info:doi/10.2196/54891, author="Huang, Yanqun and Chen, Siyuan and Wang, Yongfeng and Ou, Xiaohong and Yan, Huanhuan and Gan, Xin and Wei, Zhixiao", title="Analyzing Comorbidity Patterns in Patients With Thyroid Disease Using Large-Scale Electronic Medical Records: Network-Based Retrospective Observational Study", journal="Interact J Med Res", year="2024", month="Oct", day="3", volume="13", pages="e54891", keywords="thyroid disease", keywords="comorbidity patterns", keywords="prevalence", keywords="network analysis", keywords="electronic medical records", abstract="Background: Thyroid disease (TD) is a prominent endocrine disorder that raises global health concerns; however, its comorbidity patterns remain unclear. Objective: This study aims to apply a network-based method to comprehensively analyze the comorbidity patterns of TD using large-scale real-world health data. Methods: In this retrospective observational study, we extracted the comorbidities of adult patients with TD from both private and public data sets. All comorbidities were identified using ICD-10 (International Classification of Diseases, 10th Revision) codes at the 3-digit level, and those with a prevalence greater than 2\% were analyzed. Patients were categorized into several subgroups based on sex, age, and disease type. A phenotypic comorbidity network (PCN) was constructed, where comorbidities served as nodes and their significant correlations were represented as edges, encompassing all patients with TD and various subgroups. 
The associations and differences in comorbidities within the PCN of each subgroup were analyzed and compared. The PageRank algorithm was used to identify key comorbidities. Results: The final cohorts included 18,311 and 50,242 patients with TD in the private and public data sets, respectively. Patients with TD demonstrated complex comorbidity patterns, with coexistence relationships differing by sex, age, and type of TD. The number of comorbidities increased with age. The most prevalent TDs were nontoxic goiter, hypothyroidism, hyperthyroidism, and thyroid cancer, while hypertension, diabetes, and lipoprotein metabolism disorders had the highest prevalence and PageRank values among comorbidities. Males and patients with benign TD exhibited a greater number of comorbidities, increased disease diversity, and stronger comorbidity associations compared with females and patients with thyroid cancer. Conclusions: Patients with TD exhibited complex comorbidity patterns, particularly with cardiocerebrovascular diseases and diabetes. The associations among comorbidities varied across different TD subgroups. This study aims to enhance the understanding of comorbidity patterns in patients with TD and improve the integrated management of these individuals. ", doi="10.2196/54891", url="https://www.i-jmr.org/2024/1/e54891", url="http://www.ncbi.nlm.nih.gov/pubmed/39361379" } @Article{info:doi/10.2196/55267, author="Bischof, Yvonne Anja and Kuklinski, David and Salvi, Irene and Walker, Carla and Vogel, Justus and Geissler, Alexander", title="A Collection of Components to Design Clinical Dashboards Incorporating Patient-Reported Outcome Measures: Qualitative Study", journal="J Med Internet Res", year="2024", month="Oct", day="2", volume="26", pages="e55267", keywords="clinical dashboards", keywords="design components", keywords="patient-reported outcome measures (PROMs)", keywords="patient-reported outcomes (PROs)", abstract="Background: A clinical dashboard is a data-driven clinical decision support tool visualizing multiple key performance indicators in a single report while minimizing time and effort for data gathering. Studies have shown that including patient-reported outcome measures (PROMs) in clinical dashboards supports the clinician's understanding of how treatments impact patients' health status, helps identify changes in health-related quality of life at an early stage, and strengthens patient-physician communication. Objective: This study aims to determine design components for clinical dashboards incorporating PROMs to inform software producers and users (ie, physicians). Methods: We conducted interviews with software producers and users to test preselected design components. Furthermore, the interviews allowed us to derive additional components that are not outlined in existing literature. Finally, we used inductive and deductive coding to derive a guide on which design components need to be considered when building a clinical dashboard incorporating PROMs. Results: A total of 25 design components were identified, of which 16 were already surfaced during the literature search. Furthermore, 9 additional components were derived inductively during our interviews. The design components are clustered in a generic dashboard, PROM-related, adjacent information, and requirements for adoption components. Both software producers and users agreed on the primary purpose of a clinical dashboard incorporating PROMs to enhance patient communication in outpatient settings. 
Dashboard benefits include enhanced data visualization and improved workflow efficiency, while interoperability and data collection were named as adoption challenges. Consistency in dashboard design components is preferred across different episodes of care, with adaptations only for disease-specific PROMs. Conclusions: Clinical dashboards have the potential to facilitate informed treatment decisions if certain design components are followed. This study establishes a comprehensive framework of design components to guide the development of effective clinical dashboards incorporating PROMs in health care practice. ", doi="10.2196/55267", url="https://www.jmir.org/2024/1/e55267" } @Article{info:doi/10.2196/55315, author="Eguia, Hans and S{\'a}nchez-Bocanegra, Luis Carlos and Vinciarelli, Franco and Alvarez-Lopez, Fernando and Saig{\'i}-Rubi{\'o}, Francesc", title="Clinical Decision Support and Natural Language Processing in Medicine: Systematic Literature Review", journal="J Med Internet Res", year="2024", month="Sep", day="30", volume="26", pages="e55315", keywords="artificial intelligence", keywords="AI", keywords="natural language processing", keywords="clinical decision support system", keywords="CDSS", keywords="health recommender system", keywords="clinical information extraction", keywords="electronic health record", keywords="systematic literature review", keywords="patient", keywords="treatment", keywords="diagnosis", keywords="health workers", abstract="Background: Ensuring access to accurate and verified information is essential for effective patient treatment and diagnosis. Although health workers rely on the internet for clinical data, there is a need for a more streamlined approach. Objective: This systematic review aims to assess the current state of artificial intelligence (AI) and natural language processing (NLP) techniques in health care to identify their potential use in electronic health records and automated information searches. Methods: A search was conducted in the PubMed, Embase, ScienceDirect, Scopus, and Web of Science online databases for articles published between January 2000 and April 2023. The only inclusion criteria were (1) original research articles and studies on the application of AI-based medical clinical decision support using NLP techniques and (2) publications in English. A Critical Appraisal Skills Programme tool was used to assess the quality of the studies. Results: The search yielded 707 articles, from which 26 studies were included (24 original articles and 2 systematic reviews). Of the evaluated articles, 21 (81\%) explained the use of NLP as a source of data collection, 18 (69\%) used electronic health records as a data source, and a further 8 (31\%) were based on clinical data. Only 5 (19\%) of the articles showed the use of combined strategies for NLP to obtain clinical data. In total, 16 (62\%) articles presented stand-alone data review algorithms. Other studies (n=9, 35\%) showed that the clinical decision support system alternative was also a way of displaying the information obtained for immediate clinical use. Conclusions: The use of NLP engines can effectively improve clinical decision systems' accuracy, while biphasic tools combining AI algorithms and human criteria may optimize clinical diagnosis and treatment flows. 
Trial Registration: PROSPERO CRD42022373386; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=373386 ", doi="10.2196/55315", url="https://www.jmir.org/2024/1/e55315" } @Article{info:doi/10.2196/58740, author="Chen, Qi and Qin, Yuchen and Jin, Zhichao and Zhao, Xinxin and He, Jia and Wu, Cheng and Tang, Bihan", title="Enhancing Performance of the National Field Triage Guidelines Using Machine Learning: Development of a Prehospital Triage Model to Predict Severe Trauma", journal="J Med Internet Res", year="2024", month="Sep", day="30", volume="26", pages="e58740", keywords="severe trauma", keywords="field triage", keywords="machine learning", keywords="prediction model", abstract="Background: Prehospital trauma triage is essential to get the right patient to the right hospital. However, the national field triage guidelines proposed by the American College of Surgeons have proven to be relatively insensitive when identifying severe traumas. Objective: This study aimed to build a prehospital triage model to predict severe trauma and enhance the performance of the national field triage guidelines. Methods: This was a multisite prediction study, and the data were extracted from the National Trauma Data Bank between 2017 and 2019. All patients with injury, aged 16 years or older, and transported by ambulance from the injury scene to any trauma center were potentially eligible. The data were divided into training, internal, and external validation sets of 672,309; 288,134; and 508,703 patients, respectively. As the national field triage guidelines recommended, age, 7 vital signs, and 8 injury patterns at the prehospital stage were included as candidate variables for model development. Outcomes were severe trauma with an Injury Severity Score ≥16 (primary) and critical resource use within 24 hours of emergency department arrival (secondary). The triage model was developed using an extreme gradient boosting model and Shapley additive explanation analysis. The model's discrimination, calibration, and clinical benefit were assessed. Results: At a fixed specificity of 0.5, the model showed a sensitivity of 0.799 (95\% CI 0.797-0.801), an undertriage rate of 0.080 (95\% CI 0.079-0.081), and an overtriage rate of 0.743 (95\% CI 0.742-0.743) for predicting severe trauma. The model showed a sensitivity of 0.774 (95\% CI 0.772-0.776), an undertriage rate of 0.158 (95\% CI 0.157-0.159), and an overtriage rate of 0.609 (95\% CI 0.608-0.609) when predicting critical resource use, fixed at 0.5 specificity. The triage model's areas under the curve were 0.755 (95\% CI 0.753-0.757) for severe trauma prediction and 0.736 (95\% CI 0.734-0.737) for critical resource use prediction. The triage model's performance was better than those of the Glasgow Coma Score, Prehospital Index, revised trauma score, and the 2011 national field triage guidelines RED criteria. The model's performance was consistent in the 2 validation sets. Conclusions: The prehospital triage model is promising for predicting severe trauma and achieving an undertriage rate of <10\%. Moreover, machine learning enhances the performance of field triage guidelines. ", doi="10.2196/58740", url="https://www.jmir.org/2024/1/e58740" } @Article{info:doi/10.2196/48294, author="Fesshaye, Berhaun and Pandya, Shivani and Kan, Lena and Kalbarczyk, Anna and Alland, Kelsey and Rahman, Mustafizur S. M.
and Bulbul, Islam Md Mofijul and Mustaphi, Piyali and Siddique, Bakr Muhammad Abu and Tanim, Alam Md Imtiaz and Chowdhury, Mridul and Rumman, Tajkia and Labrique, B. Alain", title="Quality, Usability, and Trust Challenges to Effective Data Use in the Deployment and Use of the Bangladesh Nutrition Information System Dashboard: Qualitative Study", journal="J Med Internet Res", year="2024", month="Sep", day="30", volume="26", pages="e48294", keywords="digital health", keywords="nutrition", keywords="data for decision-making", keywords="health information systems", keywords="information system", keywords="information systems", keywords="LMIC", keywords="low- and middle-income countries", keywords="nutritional", keywords="dashboard", keywords="experience", keywords="experiences", keywords="interview", keywords="interviews", keywords="service", keywords="services", keywords="delivery", keywords="health care management", abstract="Background: Evidence-based decision-making is essential to improve public health benefits and resources, especially in low- and middle-income countries (LMICs), but the mechanisms of its implementation remain less straightforward. The availability of high-quality, reliable, and sufficient data in LMICs can be challenging due to issues such as a lack of human resource capacity and weak digital infrastructure, among others. Health information systems (HISs) have been critical for aggregating and integrating health-related data from different sources to support evidence-based decision-making. Nutrition information systems (NISs), which are nutrition-focused HISs, collect and report on nutrition-related indicators to improve issues related to malnutrition and food security---and can assist in improving populations' nutritional statuses and the integration of nutrition programming into routine health services. Data visualization tools (DVTs) such as dashboards have been recommended to support evidence-based decision-making, leveraging data from HISs or NISs. The use of such DVTs to support decision-making has largely been unexplored within LMIC contexts. In Bangladesh, the Mukto dashboard was developed to display and visualize nutrition-related performance indicators at the national and subnational levels. However, despite this effort, the current use of nutrition data to guide priorities and decisions remains relatively nascent and underused. Objective: The goal of this study is to better understand how Bangladesh's NIS, including the Mukto dashboard, has been used and areas for improvement to facilitate its use for evidence-based decision-making toward ameliorating nutrition-related service delivery and the health status of communities in Bangladesh. Methods: Primary data collection was conducted through qualitative semistructured interviews with key policy-level stakeholders (n=24). Key informants were identified through purposive sampling and were asked questions about the experiences and challenges with the NIS and related nutrition dashboards. Results: Main themes such as trust, data usability, personal power, and data use for decision-making emerged from the data. Trust in both data collection and quality was lacking among many stakeholders. Poor data usability stemmed from unstandardized indicators, irregular data collection, and differences between rural and urban data. Insufficient personal power and staff training coupled with infrastructural challenges can negatively affect data at the input stage. 
While stakeholders understood and expressed the importance of evidence-based decision-making, ultimately, they noted that the data were not being used to their maximum potential. Conclusions: Leveraging DVTs can improve the use of data for evidence-based decision-making, but decision makers must trust that the data are believable, credible, timely, and responsive. The results support the significance of a tailored data ecosystem, which has not reached its full potential in Bangladesh. Recommendations to reach this potential include ensuring a clear intended user base and accountable stakeholders are present. Systems should also have the capacity to ensure data credibility and support ongoing personal power requirements. ", doi="10.2196/48294", url="https://www.jmir.org/2024/1/e48294", url="http://www.ncbi.nlm.nih.gov/pubmed/39348172" } @Article{info:doi/10.2196/55648, author="Franc, Micheal Jeffrey and Hertelendy, Julius Attila and Cheng, Lenard and Hata, Ryan and Verde, Manuela", title="Accuracy of a Commercial Large Language Model (ChatGPT) to Perform Disaster Triage of Simulated Patients Using the Simple Triage and Rapid Treatment (START) Protocol: Gage Repeatability and Reproducibility Study", journal="J Med Internet Res", year="2024", month="Sep", day="30", volume="26", pages="e55648", keywords="disaster medicine", keywords="large language models", keywords="triage", keywords="disaster", keywords="emergency", keywords="disasters", keywords="emergencies", keywords="LLM", keywords="LLMs", keywords="GPT", keywords="ChatGPT", keywords="language model", keywords="language models", keywords="NLP", keywords="natural language processing", keywords="artificial intelligence", keywords="repeatability", keywords="reproducibility", keywords="accuracy", keywords="accurate", keywords="reproducible", keywords="repeatable", abstract="Background: The release of ChatGPT (OpenAI) in November 2022 drastically reduced the barrier to using artificial intelligence by allowing a simple web-based text interface to a large language model (LLM). One use case where ChatGPT could be useful is in triaging patients at the site of a disaster using the Simple Triage and Rapid Treatment (START) protocol. However, LLMs experience several common errors including hallucinations (also called confabulations) and prompt dependency. Objective: This study addresses the research problem: ``Can ChatGPT adequately triage simulated disaster patients using the START protocol?'' by measuring three outcomes: repeatability, reproducibility, and accuracy. Methods: Nine prompts were developed by 5 disaster medicine physicians. A Python script queried ChatGPT Version 4 for each prompt combined with 391 validated simulated patient vignettes. Ten repetitions of each combination were performed for a total of 35,190 simulated triages. A reference standard START triage code for each simulated case was assigned by 2 disaster medicine specialists (JMF and MV), with a third specialist (LC) added if the first two did not agree. Results were evaluated using a gage repeatability and reproducibility study (gage R and R). Repeatability was defined as variation due to repeated use of the same prompt. Reproducibility was defined as variation due to the use of different prompts on the same patient vignette. Accuracy was defined as agreement with the reference standard. Results: Although 35,102 (99.7\%) queries returned a valid START score, there was considerable variability. 
Repeatability (use of the same prompt repeatedly) was 14\% of the overall variation. Reproducibility (use of different prompts) was 4.1\% of the overall variation. The accuracy of ChatGPT for START was 63.9\% with a 32.9\% overtriage rate and a 3.1\% undertriage rate. Accuracy varied by prompt with a maximum of 71.8\% and a minimum of 46.7\%. Conclusions: This study indicates that ChatGPT version 4 is insufficient to triage simulated disaster patients via the START protocol. It demonstrated suboptimal repeatability and reproducibility. The overall accuracy of triage was only 63.9\%. Health care professionals are advised to exercise caution while using commercial LLMs for vital medical determinations, given that these tools may commonly produce inaccurate data, colloquially referred to as hallucinations or confabulations. Artificial intelligence--guided tools should undergo rigorous statistical evaluation---using methods such as gage R and R---before implementation into clinical settings. ", doi="10.2196/55648", url="https://www.jmir.org/2024/1/e55648", url="http://www.ncbi.nlm.nih.gov/pubmed/39348189" } @Article{info:doi/10.2196/58498, author="Luo, Ming and Gu, Yu and Zhou, Feilong and Chen, Shaohong", title="Implementation of the Observational Medical Outcomes Partnership Model in Electronic Medical Record Systems: Evaluation Study Using Factor Analysis and Decision-Making Trial and Evaluation Laboratory-Best-Worst Methods", journal="JMIR Med Inform", year="2024", month="Sep", day="27", volume="12", pages="e58498", keywords="electronic medical record", keywords="Technology Acceptance Model", keywords="external factors", keywords="perception", keywords="attitude", keywords="behavioral inclination", keywords="OMOP", abstract="Background: Electronic medical record (EMR) systems are essential in health care for collecting and storing patient medical data. They provide critical information to doctors and caregivers, facilitating improved decision-making and patient care. Despite their significance, optimizing EMR systems is crucial for enhancing health care quality. Implementing the Observational Medical Outcomes Partnership (OMOP) shared data model represents a promising approach to improve EMR performance and overall health care outcomes. Objective: This study aims to evaluate the effects of implementing the OMOP shared data model in EMR systems and to assess its impact on enhancing health care quality. Methods: In this study, 3 distinct methodologies are used to explore various aspects of health care information systems. First, factor analysis is utilized to investigate the correlations between EMR systems and attitudes toward OMOP. Second, the best-worst method (BWM) is applied to determine the weights of criteria and subcriteria. Lastly, the decision-making trial and evaluation laboratory technique is used to illustrate the interactions and interdependencies among the identified criteria. Results: In this research, we evaluated the AliHealth EMR system by surveying 98 users and practitioners to assess its effectiveness and user satisfaction. The study reveals that among all components, ``EMR resolution'' holds the highest importance with a weight of 0.31007783, highlighting its significant role in the evaluation. Conversely, ``EMR ease of use'' has the lowest weight of 0.1860467, indicating that stakeholders prioritize the resolution aspect over ease of use in their assessment of EMR systems. 
Conclusions: The findings highlight that stakeholders prioritize certain aspects of EMR systems, with ``EMR resolution'' being the most valued component. ", doi="10.2196/58498", url="https://medinform.jmir.org/2024/1/e58498" } @Article{info:doi/10.2196/57852, author="Kugic, Amila and Martin, Ingrid and Modersohn, Luise and Pallaoro, Peter and Kreuzthaler, Markus and Schulz, Stefan and Boeker, Martin", title="Processing of Short-Form Content in Clinical Narratives: Systematic Scoping Review", journal="J Med Internet Res", year="2024", month="Sep", day="26", volume="26", pages="e57852", keywords="electronic health records", keywords="EHR", keywords="clinical narratives", keywords="natural language processing", keywords="machine learning", keywords="deep learning", keywords="rule-based approach", keywords="short-form expression", keywords="disambiguation", keywords="word embedding", keywords="vector representations", keywords="language modeling", keywords="human-in-the-loop, feature extraction", abstract="Background: Clinical narratives are essential components of electronic health records. The adoption of electronic health records has increased documentation time for hospital staff, leading to the use of abbreviations and acronyms more frequently. This brevity can potentially hinder comprehension for both professionals and patients. Objective: This review aims to provide an overview of the types of short forms found in clinical narratives, as well as the natural language processing (NLP) techniques used for their identification, expansion, and disambiguation. Methods: In the databases Web of Science, Embase, MEDLINE, EBMR (Evidence-Based Medicine Reviews), and ACL Anthology, publications that met the inclusion criteria were searched according to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines for a systematic scoping review. Original, peer-reviewed publications focusing on short-form processing in human clinical narratives were included, covering the period from January 2018 to February 2023. Short-form types were extracted, and multidimensional research methodologies were assigned to each target objective (identification, expansion, and disambiguation). NLP study recommendations and study characteristics were systematically assigned occurrence rates for evaluation. Results: Out of a total of 6639 records, only 19 articles were included in the final analysis. Rule-based approaches were predominantly used for identifying short forms, while string similarity and vector representations were applied for expansion. Embeddings and deep learning approaches were used for disambiguation. Conclusions: The scope and types of what constitutes a clinical short form were often not explicitly defined by the authors. This lack of definition poses challenges for reproducibility and for determining whether specific methodologies are suitable for different types of short forms. Analysis of a subset of NLP recommendations for assessing quality and reproducibility revealed only partial adherence to these recommendations. Single-character abbreviations were underrepresented in studies on clinical narrative processing, as were investigations in languages other than English. Future research should focus on these 2 areas, and each paper should include descriptions of the types of content analyzed. 
", doi="10.2196/57852", url="https://www.jmir.org/2024/1/e57852", url="http://www.ncbi.nlm.nih.gov/pubmed/39325515" } @Article{info:doi/10.2196/49720, author="Goehringer, Jessica and Kosmin, Abigail and Laible, Natalie and Romagnoli, Katrina", title="Assessing the Utility of a Patient-Facing Diagnostic Tool Among Individuals With Hypermobile Ehlers-Danlos Syndrome: Focus Group Study", journal="JMIR Form Res", year="2024", month="Sep", day="26", volume="8", pages="e49720", keywords="diagnostic tool", keywords="hypermobile Ehlers-Danlos syndrome", keywords="patient experiences", keywords="diagnostic odyssey", keywords="affinity mapping", keywords="mobile health app", keywords="mobile phone", abstract="Background: Hypermobile Ehlers-Danlos syndrome (hEDS), characterized by joint hypermobility, skin laxity, and tissue fragility, is thought to be the most common inherited connective tissue disorder, with millions affected worldwide. Diagnosing this condition remains a challenge that can impact quality of life for individuals with hEDS. Many with hEDS describe extended diagnostic odysseys involving exorbitant time and monetary investment. This delay is due to the complexity of diagnosis, symptom overlap with other conditions, and limited access to providers. Many primary care providers are unfamiliar with hEDS, compounded by genetics clinics that do not accept referrals for hEDS evaluation and long waits for genetics clinics that do evaluate for hEDS, leaving patients without sufficient options. Objective: This study explored the user experience, quality, and utility of a prototype of a patient-facing diagnostic tool intended to support clinician diagnosis for individuals with symptoms of hEDS. The questions included within the prototype are aligned with the 2017 international classification of Ehlers-Danlos syndromes. This study explored how this tool may help patients communicate information about hEDS to their physicians, influencing the diagnosis of hEDS and affecting patient experience. Methods: Participants clinically diagnosed with hEDS were recruited from either a medical center or private groups on a social media platform. Interested participants provided verbal consent, completed questionnaires about their diagnosis, and were invited to join an internet-based focus group to share their thoughts and opinions on a diagnostic tool prototype. Participants were invited to complete the Mobile App Rating Scale (MARS) to evaluate their experience viewing the diagnostic tool. The MARS is a framework for evaluating mobile health apps across 4 dimensions: engagement, functionality, esthetics, and information quality. Qualitative data were analyzed using affinity mapping to organize information and inductively create themes that were categorized within the MARS framework dimensions to help identify strengths and weaknesses of the diagnostic tool prototype. Results: In total, 15 individuals participated in the internet-based focus groups; 3 (20\%) completed the MARS. Through affinity diagramming, 2 main categories of responses were identified, including responses related to the user interface and responses related to the application of the tool. Each category included several themes and subthemes that mapped well to the 4 MARS dimensions. The analysis showed that the tool held value and utility among the participants diagnosed with hEDS. 
The shareable ending summary sheet provided by the tool stood out as a strength for facilitating communication between patient and provider during the diagnostic evaluation. Conclusions: The results provide insights on the perceived utility and value of the tool, including preferred phrasing, layout and design preferences, and tool accessibility. The participants expressed that the tool may improve the hEDS diagnostic odyssey and help educate providers about the diagnostic process. ", doi="10.2196/49720", url="https://formative.jmir.org/2024/1/e49720", url="http://www.ncbi.nlm.nih.gov/pubmed/39325533" } @Article{info:doi/10.2196/57633, author="Rossi, S. Fernanda and Wu, Justina and Timko, Christine and Nevedal, L. Andrea and Wiltsey Stirman, Shannon", title="A Clinical Decision Support Tool for Intimate Partner Violence Screening Among Women Veterans: Development and Qualitative Evaluation of Provider Perspectives", journal="JMIR Form Res", year="2024", month="Sep", day="25", volume="8", pages="e57633", keywords="intimate partner violence", keywords="clinical decision support", keywords="intimate partner violence screening", keywords="women veterans", abstract="Background: Women veterans, compared to civilian women, are especially at risk of experiencing intimate partner violence (IPV), pointing to the critical need for IPV screening and intervention in the Veterans Health Administration (VHA). However, implementing paper-based IPV screening and intervention in the VHA has revealed substantial barriers, including health care providers' inadequate IPV training, competing demands, time constraints, and discomfort addressing IPV and making decisions about the appropriate type or level of intervention. Objective: This study aimed to address IPV screening implementation barriers and hence developed and tested a novel IPV clinical decision support (CDS) tool for physicians in the Women's Health Clinic (WHC), a primary care clinic within the Veterans Affairs Palo Alto Health Care System. This tool provides intelligent, evidence-based, step-by-step guidance on how to conduct IPV screening and intervention. Methods: Informed by existing CDS development frameworks, developing the IPV CDS tool prototype involved six steps: (1) identifying the scope of the tool, (2) identifying IPV screening and intervention content, (3) incorporating IPV-related VHA and clinic resources, (4) identifying the tool's components, (5) designing the tool, and (6) conducting initial tool revisions. We obtained preliminary physician feedback on user experience and clinical utility of the CDS tool via the System Usability Scale (SUS) and semistructured interviews with 6 WHC physicians. SUS scores were examined using descriptive statistics. Interviews were analyzed using rapid qualitative analysis to extract actionable feedback to inform design updates and improvements. Results: This study includes a detailed description of the IPV CDS tool. Findings indicated that the tool was generally well received by physicians, who indicated good tool usability (SUS score: mean 77.5, SD 12.75). They found the tool clinically useful, needed in their practice, and feasible to implement in primary care. They emphasized that it increased their confidence in managing patients reporting IPV but expressed concerns regarding its length, workflow integration, flexibility, and specificity of information. 
Several physicians, for example, found the tool too time consuming when encountering patients at high risk; they suggested multiple uses of the tool (eg, an educational tool for less-experienced health care providers and a checklist for more-experienced health care providers) and including more detailed information (eg, a list of local shelters). Conclusions: Physician feedback on the IPV CDS tool is encouraging and will be used to improve the tool. This study offers an example of an IPV CDS tool that clinics can adapt to potentially enhance the quality and efficiency of their IPV screening and intervention process. Additional research is needed to determine the tool's clinical utility in improving IPV screening and intervention rates and patient outcomes (eg, increased patient safety, reduced IPV risk, and increased referrals to mental health treatment). ", doi="10.2196/57633", url="https://formative.jmir.org/2024/1/e57633" } @Article{info:doi/10.2196/55546, author="Brunton, Lisa and Cotterill, Sarah and Wilson, Paul", title="Evaluating the National Rollout of a Type 2 Diabetes Self-Management Intervention: Qualitative Interview Study With Local National Health Service Leads Responsible for Implementation", journal="J Med Internet Res", year="2024", month="Sep", day="25", volume="26", pages="e55546", keywords="type 2 diabetes", keywords="structured education", keywords="self-management", keywords="digital interventions", keywords="implementation", keywords="qualitative methods", keywords="evaluation", abstract="Background: Approximately 4.5 million people live with type 2 diabetes mellitus (T2DM) in the United Kingdom. Evidence shows that structured education programs can improve glycemic control and reduce the risk of complications from T2DM, but they have low attendance rates. To widen access to T2DM structured education, National Health Service England commissioned a national rollout of Healthy Living, a digital self-management program. Objective: The objectives were to understand the barriers and enablers to adopting, implementing, and integrating Healthy Living into existing T2DM care pathways across England. Methods: We undertook a cross-sectional, qualitative telephone semistructured interview study to address the objectives. In total, 17 local National Health Service leads responsible for implementing Healthy Living across their locality were recruited. We conducted 16 one-time interviews across 16 case sites (1 of the interviews was conducted with 2 local leads from the same case site). Interview data were analyzed using thematic analysis. Results: Three overarching themes were generated: (1) implementation activities, (2) where Healthy Living fits within existing pathways, and (3) contextual factors affecting implementation. Of the 16 sites, 14 (88\%) were implementing Healthy Living; the barrier to not implementing it in 2 case sites was not wanting Healthy Living to compete with their current education provision for T2DM. We identified 6 categories of implementation activities across sites: communication strategies to raise awareness of Healthy Living, developing bespoke local resources to support general practices with referrals, providing financial reimbursement or incentives to general practices, promoting Healthy Living via public events, monitoring implementation across their footprint, and widening access across high-need groups. 
However, outside early engagement sites, most implementation activities were ``light touch,'' consisting mainly of one-way communications to raise awareness. Local leads were generally positive about Healthy Living as an additional part of their T2DM structured education programs, but some felt it was more suited to specific patient groups. Barriers to undertaking more prolonged, targeted implementation campaigns included implementation not being mandated, sites not receiving data on uptake across their footprint, and confusion in understanding where Healthy Living fit within existing care pathways. Conclusions: A passive process of disseminating information about Healthy Living to general practices rather than an active process of implementation occurred across most sites sampled. This study identified that there is a need for clearer communications regarding the type of patients that may benefit from the Healthy Living program, including when it should be offered and whether it should be offered instead of or in addition to other education programs. No sites other than early engagement sites received data to monitor uptake across their footprint. Understanding variability in uptake across practices may have enabled sites to plan targeted referral campaigns in practices that were not using the service. ", doi="10.2196/55546", url="https://www.jmir.org/2024/1/e55546" } @Article{info:doi/10.2196/49387, author="MacNeill, Luke A. and MacNeill, Lillian and Luke, Alison and Doucet, Shelley", title="Health Professionals' Views on the Use of Conversational Agents for Health Care: Qualitative Descriptive Study", journal="J Med Internet Res", year="2024", month="Sep", day="25", volume="26", pages="e49387", keywords="conversational agents", keywords="chatbots", keywords="health care", keywords="health professionals", keywords="health personnel", keywords="qualitative", keywords="interview", abstract="Background: In recent years, there has been an increase in the use of conversational agents for health promotion and service delivery. To date, health professionals' views on the use of this technology have received limited attention in the literature. Objective: The purpose of this study was to gain a better understanding of how health professionals view the use of conversational agents for health care. Methods: Physicians, nurses, and regulated mental health professionals were recruited using various web-based methods. Participants were interviewed individually using the Zoom (Zoom Video Communications, Inc) videoconferencing platform. Interview questions focused on the potential benefits and risks of using conversational agents for health care, as well as the best way to integrate conversational agents into the health care system. Interviews were transcribed verbatim and uploaded to NVivo (version 12; QSR International, Inc) for thematic analysis. Results: A total of 24 health professionals participated in the study (19 women, 5 men; mean age 42.75, SD 10.71 years). Participants said that the use of conversational agents for health care could have certain benefits, such as greater access to care for patients or clients and workload support for health professionals. They also discussed potential drawbacks, such as an added burden on health professionals (eg, program familiarization) and the limited capabilities of these programs. 
Participants said that conversational agents could be used for routine or basic tasks, such as screening and assessment, providing information and education, and supporting individuals between appointments. They also said that health professionals should have some oversight in terms of the development and implementation of these programs. Conclusions: The results of this study provide insight into health professionals' views on the use of conversational agents for health care, particularly in terms of the benefits and drawbacks of these programs and how they should be integrated into the health care system. These collective findings offer useful information and guidance to stakeholders who have an interest in the development and implementation of this technology. ", doi="10.2196/49387", url="https://www.jmir.org/2024/1/e49387", url="http://www.ncbi.nlm.nih.gov/pubmed/39320936" } @Article{info:doi/10.2196/58845, author="Boege, Selina and Milne-Ives, Madison and Meinert, Edward and Carroll, Camille", title="Self-Management Systems for Patients and Clinicians in Parkinson Care: Protocol for an Integrated Scoping Review, Product Search, and Evaluation", journal="JMIR Res Protoc", year="2024", month="Sep", day="24", volume="13", pages="e58845", keywords="Parkinson's disease", keywords="digital health", keywords="self-management", keywords="health care systems", keywords="self-care", keywords="Parkinson", keywords="mobile health", keywords="mHeath", keywords="evaluation", keywords="acceptability", keywords="usability", keywords="decision-making support", keywords="database", keywords="qualitative", keywords="quantitative", keywords="mixed method", keywords="perception", abstract="Background: Parkinson disease (PD) poses emotional and financial challenges to patients, families, caregivers, and health care systems. Self-management systems show promise in empowering people with PD and enabling more control over their treatment. The collaborative nature of PD care requires communication between patients and health care professionals. While past reviews explored self-management systems in PD diagnosis and symptom management with a focus on patient portals, there is limited research addressing the interconnectivity of systems catering to the needs of both patients and clinicians. A system's acceptability and usability for clinicians are pivotal for enabling comprehensive data collection and supporting clinical decision-making, which can enhance patient care and treatment outcomes. Objective: This review study aims to assess PD self-management systems that include a clinician portal and to determine which features enhance acceptability and usability for clinicians. The primary aim is to assess evidence of clinicians' acceptability and usability of self-management systems with a focus on the integration of systems into clinical workflows, data collection points, monitoring, clinical decision-making support, and extended education and training. Methods: The review will entail 3 separate stages: a literature review following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines, a product search, and an evaluation of the level of evidence for the identified products. For the first stage, 5 databases will be searched: PubMed, CINAHL, Scopus, ACM digital library, and IEEE Xplore. 
Studies eligible for inclusion will be qualitative, quantitative, and mixed methods studies examining patients' and clinician's perceptions of the acceptability and usability of digital health interventions, synthesized by a narrative qualitative analysis. A web search in the iOS Apple App Store and Android Google Play Store will identify currently available tools; the level of evidence for these will then be assessed using the Oxford Centre for Evidence-Based Medicine guidelines. Results: Literature search and screening began soon after submission of the protocol, and the review is expected to be completed by end of September 2024. Conclusions: This review will examine currently available self-management systems in PD care, focusing on their acceptability and usability. This is significant because there is limited research addressing the integration of clinicians into these systems. The findings from this study may provide critical knowledge and insight to help inform future research and will contribute to the design of self-management systems that promote collaborative efforts in PD care. International Registered Report Identifier (IRRID): PRR1-10.2196/58845 ", doi="10.2196/58845", url="https://www.researchprotocols.org/2024/1/e58845" } @Article{info:doi/10.2196/57926, author="Prakash, Ravi and Dupre, E. Matthew and {\O}stbye, Truls and Xu, Hanzhang", title="Extracting Critical Information from Unstructured Clinicians' Notes Data to Identify Dementia Severity Using a Rule-Based Approach: Feasibility Study", journal="JMIR Aging", year="2024", month="Sep", day="24", volume="7", pages="e57926", keywords="electronic health record", keywords="EHR", keywords="electric medical record", keywords="EMR", keywords="patient record", keywords="health record", keywords="personal health record", keywords="PHR", keywords="unstructured data", keywords="rule based analysis", keywords="artificial intelligence", keywords="AI", keywords="large language model", keywords="LLM", keywords="natural language processing", keywords="NLP", keywords="deep learning", keywords="Alzheimer's disease and related dementias", keywords="AD", keywords="ADRD", keywords="Alzheimer's disease", keywords="dementia", keywords="geriatric syndromes", abstract="Background: The severity of Alzheimer disease and related dementias (ADRD) is rarely documented in structured data fields in electronic health records (EHRs). Although this information is important for clinical monitoring and decision-making, it is often undocumented or ``hidden'' in unstructured text fields and not readily available for clinicians to act upon. Objective: We aimed to assess the feasibility and potential bias in using keywords and rule-based matching for obtaining information about the severity of ADRD from EHR data. Methods: We used EHR data from a large academic health care system that included patients with a primary discharge diagnosis of ADRD based on ICD-9 (International Classification of Diseases, Ninth Revision) and ICD-10 (International Statistical Classification of Diseases, Tenth Revision) codes between 2014 and 2019. We first assessed the presence of ADRD severity information and then the severity of ADRD in the EHR. Clinicians' notes were used to determine the severity of ADRD based on two criteria: (1) scores from the Mini Mental State Examination and Montreal Cognitive Assessment and (2) explicit terms for ADRD severity (eg, ``mild dementia'' and ``advanced Alzheimer disease''). 
We compiled a list of common ADRD symptoms, cognitive test names, and disease severity terms, refining it iteratively based on previous literature and clinical expertise. Subsequently, we used rule-based matching in Python using standard open-source data analysis libraries to identify the context in which specific words or phrases were mentioned. We estimated the prevalence of documented ADRD severity and assessed the performance of our rule-based algorithm. Results: We included 9115 eligible patients with over 65,000 notes from the providers. Overall, 22.93\% (2090/9115) of patients were documented with mild ADRD, 20.87\% (1902/9115) were documented with moderate or severe ADRD, and 56.20\% (5123/9115) did not have any documentation of the severity of their ADRD. For the task of determining the presence of any ADRD severity information, our algorithm achieved an accuracy of >95\%, specificity of >95\%, sensitivity of >90\%, and an F1-score of >83\%. For the specific task of identifying the actual severity of ADRD, the algorithm performed well with an accuracy of >91\%, specificity of >80\%, sensitivity of >88\%, and F1-score of >92\%. Comparing patients with mild ADRD to those with more advanced ADRD, the latter group tended to be older, more likely to be female and Black, and more likely to have received their diagnoses in primary care or in-hospital settings. Relative to patients with undocumented ADRD severity, those with documented ADRD severity had a similar distribution in terms of sex, race, and rural or urban residence. Conclusions: Our study demonstrates the feasibility of using a rule-based matching algorithm to identify ADRD severity from unstructured EHR report data. However, it is essential to acknowledge potential biases arising from differences in documentation practices across various health care systems. ", doi="10.2196/57926", url="https://aging.jmir.org/2024/1/e57926", url="http://www.ncbi.nlm.nih.gov/pubmed/39316421" } @Article{info:doi/10.2196/59392, author="Brahma, Arindam and Chatterjee, Samir and Seal, Kala and Fitzpatrick, Ben and Tao, Youyou", title="Development of a Cohort Analytics Tool for Monitoring Progression Patterns in Cardiovascular Diseases: Advanced Stochastic Modeling Approach", journal="JMIR Med Inform", year="2024", month="Sep", day="24", volume="12", pages="e59392", keywords="healthcare analytics", keywords="eHealth", keywords="disease monitoring", keywords="cardiovascular disease", keywords="disease progression model", keywords="myocardial", keywords="stroke", keywords="decision support", keywords="continuous-time Markov chain model", keywords="stochastic model", keywords="stochastic", keywords="Markov", keywords="cardiology", keywords="cardiovascular", keywords="heart", keywords="monitoring", keywords="progression", abstract="Background: The World Health Organization (WHO) reported that cardiovascular diseases (CVDs) are the leading cause of death worldwide. CVDs are chronic, with complex progression patterns involving episodes of comorbidities and multimorbidities. When dealing with chronic diseases, physicians often adopt a ``watchful waiting'' strategy, and actions are postponed until information is available. Population-level transition probabilities and progression patterns can be revealed by applying time-variant stochastic modeling methods to longitudinal patient data from cohort studies.
Inputs from CVD practitioners indicate that tools to generate and visualize cohort transition patterns have many impactful clinical applications. The resultant computational model can be embedded in digital decision support tools for clinicians. However, to date, no study has attempted to accomplish this for CVDs. Objective: This study aims to apply advanced stochastic modeling methods to uncover the transition probabilities and progression patterns from longitudinal episodic data of patient cohorts with CVD and thereafter use the computational model to build a digital clinical cohort analytics artifact demonstrating the actionability of such models. Methods: Our data were sourced from 9 epidemiological cohort studies by the National Heart Lung and Blood Institute and comprised chronological records of 1274 patients associated with 4839 CVD episodes across 16 years. We then used the continuous-time Markov chain method to develop our model, which offers a robust approach to time-variant transitions between disease states in chronic diseases. Results: Our study presents time-variant transition probabilities of CVD state changes, revealing patterns of CVD progression against time. We found that the transition from myocardial infarction (MI) to stroke has the fastest transition rate (mean transition time 3, SD 0 days, because only 1 patient had a MI-to-stroke transition in the dataset), and the transition from MI to angina is the slowest (mean transition time 1457, SD 1449 days). Congestive heart failure is the most probable first episode (371/840, 44.2\%), followed by stroke (216/840, 25.7\%). The resultant artifact is actionable as it can act as an eHealth cohort analytics tool, helping physicians gain insights into treatment and intervention strategies. Through expert panel interviews and surveys, we found 9 application use cases of our model. Conclusions: Past research does not provide actionable cohort-level decision support tools based on a comprehensive, 10-state, continuous-time Markov chain model to unveil complex CVD progression patterns from real-world patient data and support clinical decision-making. This paper aims to address this crucial limitation. Our stochastic model--embedded artifact can help clinicians in efficient disease monitoring and intervention decisions, guided by objective data-driven insights from real patient data. Furthermore, the proposed model can unveil progression patterns of any chronic disease of interest by inputting only 3 data elements: a synthetic patient identifier, episode name, and episode time in days from a baseline date. 
", doi="10.2196/59392", url="https://medinform.jmir.org/2024/1/e59392", url="http://www.ncbi.nlm.nih.gov/pubmed/39316426" } @Article{info:doi/10.2196/52678, author="Stanhope, Victoria and Yoo, Nari and Matthews, Elizabeth and Baslock, Daniel and Hu, Yuanyuan", title="The Impact of Collaborative Documentation on Person-Centered Care: Textual Analysis of Clinical Notes", journal="JMIR Med Inform", year="2024", month="Sep", day="20", volume="12", pages="e52678", keywords="person-centered care", keywords="collaborative documentation", keywords="natural language processing", keywords="concurrent documentation", keywords="clinical documentations", keywords="visit notes", keywords="community", keywords="health center", keywords="mental health center", keywords="textual analysis", keywords="clinical informatics", keywords="behavioral health", keywords="mental health", keywords="linguistic", keywords="linguistic inquiry", keywords="dictionary-based", keywords="sentence fragment", keywords="psychology", keywords="psychological", keywords="clinical information", keywords="decision-making", keywords="mental health services", keywords="clinical notes", keywords="NLP", abstract="Background: Collaborative documentation (CD) is a behavioral health practice involving shared writing of clinic visit notes by providers and consumers. Despite widespread dissemination of CD, research on its effectiveness or impact on person-centered care (PCC) has been limited. Principles of PCC planning, a recovery-based approach to service planning that operationalizes PCC, can inform the measurement of person-centeredness within clinical documentation. Objective: This study aims to use the clinical informatics approach of natural language processing (NLP) to examine the impact of CD on person-centeredness in clinic visit notes. Using a dictionary-based approach, this study conducts a textual analysis of clinic notes from a community mental health center before and after staff were trained in CD. Methods: This study used visit notes (n=1981) from 10 providers in a community mental health center 6 months before and after training in CD. LIWC-22 was used to assess all notes using the Linguistic Inquiry and Word Count (LIWC) dictionary, which categorizes over 5000 linguistic and psychological words. Twelve LIWC categories were selected and mapped onto PCC planning principles through the consensus of 3 domain experts. The LIWC-22 contextualizer was used to extract sentence fragments from notes corresponding to LIWC categories. Then, fixed-effects modeling was used to identify differences in notes before and after CD training while accounting for nesting within the provider. Results: Sentence fragments identified by the contextualizing process illustrated how visit notes demonstrated PCC. The fixed effects analysis found a significant positive shift toward person-centeredness; this was observed in 6 of the selected LIWC categories post CD. Specifically, there was a notable increase in words associated with achievement ($\beta$=.774, P<.001), power ($\beta$=.831, P<.001), money ($\beta$=.204, P<.001), physical health ($\beta$=.427, P=.03), while leisure words decreased ($\beta$=?.166, P=.002). Conclusions: By using a dictionary-based approach, the study identified how CD might influence the integration of PCC principles within clinical notes. Although the results were mixed, the findings highlight the potential effectiveness of CD in enhancing person-centeredness in clinic notes. 
By leveraging NLP techniques, this research illuminated the value of narrative clinical notes in assessing the quality of care in behavioral health contexts. These findings underscore the promise of NLP for quality assurance in health care settings and emphasize the need for refining algorithms to more accurately measure PCC. ", doi="10.2196/52678", url="https://medinform.jmir.org/2024/1/e52678" } @Article{info:doi/10.2196/58278, author="Dai, Hong-Jie and Wang, Chen-Kai and Chen, Chien-Chang and Liou, Chong-Sin and Lu, An-Tai and Lai, Chia-Hsin and Shain, Bo-Tsz and Ke, Cheng-Rong and Wang, Chung William Yu and Mir, Hussain Tatheer and Simanjuntak, Mutiara and Kao, Hao-Yun and Tsai, Ming-Ju and Tseng, S. Vincent", title="Evaluating a Natural Language Processing--Driven, AI-Assisted International Classification of Diseases, 10th Revision, Clinical Modification, Coding System for Diagnosis Related Groups in a Real Hospital Environment: Algorithm Development and Validation Study", journal="J Med Internet Res", year="2024", month="Sep", day="20", volume="26", pages="e58278", keywords="natural language processing", keywords="International Classification of Diseases", keywords="deep learning", keywords="electronic medical record", keywords="Taiwan diagnosis related groups", abstract="Background: International Classification of Diseases codes are widely used to describe diagnosis information, but manual coding relies heavily on human interpretation, which can be expensive, time consuming, and prone to errors. With the transition from the International Classification of Diseases, Ninth Revision, to the International Classification of Diseases, Tenth Revision (ICD-10), the coding process has become more complex, highlighting the need for automated approaches to enhance coding efficiency and accuracy. Inaccurate coding can result in substantial financial losses for hospitals, and a precise assessment of outcomes generated by a natural language processing (NLP)--driven autocoding system thus assumes a critical role in safeguarding the accuracy of the Taiwan diagnosis related groups (Tw-DRGs). Objective: This study aims to evaluate the feasibility of applying an International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM), autocoding system that can automatically determine diagnoses and codes based on free-text discharge summaries to facilitate the assessment of Tw-DRGs, specifically principal diagnosis and major diagnostic categories (MDCs). Methods: By using the patient discharge summaries from Kaohsiung Medical University Chung-Ho Memorial Hospital (KMUCHH) from April 2019 to December 2020 as a reference data set we developed artificial intelligence (AI)--assisted ICD-10-CM coding systems based on deep learning models. We constructed a web-based user interface for the AI-assisted coding system and deployed the system to the workflow of the certified coding specialists (CCSs) of KMUCHH. The data used for the assessment of Tw-DRGs were manually curated by a CCS with the principal diagnosis and MDC was determined from discharge summaries collected at KMUCHH from February 2023 to April 2023. Results: Both the reference data set and real hospital data were used to assess performance in determining ICD-10-CM coding, principal diagnosis, and MDC for Tw-DRGs. Among all methods, the GPT-2 (OpenAI)-based model achieved the highest F1-score, 0.667 (F1-score 0.851 for the top 50 codes), on the KMUCHH test set and a slightly lower F1-score, 0.621, in real hospital data. 
Cohen $\kappa$ evaluation for the agreement of MDC between the models and the CCS revealed that the overall average $\kappa$ value for GPT-2 ($\kappa$=0.714) was approximately 12.2 percentage points higher than that of the hierarchy attention network ($\kappa$=0.592). GPT-2 demonstrated superior agreement with the CCS across 6 categories of MDC, with an average $\kappa$ value of approximately 0.869 (SD 0.033), underscoring the effectiveness of the developed AI-assisted coding system in supporting the work of CCSs. Conclusions: An NLP-driven AI-assisted coding system can assist CCSs in ICD-10-CM coding by offering coding references via a user interface, demonstrating the potential to reduce the manual workload and expedite Tw-DRG assessment. Consistency in performance affirmed the effectiveness of the system in supporting CCSs in ICD-10-CM coding and the judgment of Tw-DRGs. ", doi="10.2196/58278", url="https://www.jmir.org/2024/1/e58278" } @Article{info:doi/10.2196/62890, author="Kim, Kwan Yun and Seo, Won-Doo and Lee, Jung Sun and Koo, Hyung Ja and Kim, Chul Gyung and Song, Seok Hee and Lee, Minji", title="Early Prediction of Cardiac Arrest in the Intensive Care Unit Using Explainable Machine Learning: Retrospective Study", journal="J Med Internet Res", year="2024", month="Sep", day="17", volume="26", pages="e62890", keywords="early cardiac arrest warning system", keywords="electric medical record", keywords="explainable clinical decision support system", keywords="pseudo-real-time evaluation", keywords="ensemble learning", keywords="cost-sensitive learning", abstract="Background: Cardiac arrest (CA) is one of the leading causes of death among patients in the intensive care unit (ICU). Although many CA prediction models with high sensitivity have been developed to anticipate CA, their practical application has been challenging due to a lack of generalization and validation. Additionally, the heterogeneity among patients in different ICU subtypes has not been adequately addressed. Objective: This study aims to propose a clinically interpretable ensemble approach for the timely and accurate prediction of CA within 24 hours, regardless of patient heterogeneity, including variations across different populations and ICU subtypes. Additionally, we conducted patient-independent evaluations to emphasize the model's generalization performance and analyzed interpretable results that can be readily adopted by clinicians in real-time. Methods: Patients were retrospectively analyzed using data from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) and the eICU-Collaborative Research Database (eICU-CRD). To address the problem of underperformance, we constructed our framework using feature sets based on vital signs, multiresolution statistical analysis, and the Gini index, with a 12-hour window to capture the unique characteristics of CA. We extracted 3 types of features from each database to compare the performance of CA prediction between high-risk patient groups from MIMIC-IV and patients without CA from eICU-CRD. After feature extraction, we developed a tabular network (TabNet) model using feature screening with cost-sensitive learning. To assess real-time CA prediction performance, we used 10-fold leave-one-patient-out cross-validation and a cross--data set method. We evaluated MIMIC-IV and eICU-CRD across different cohort populations and subtypes of ICU within each database. 
Finally, external validation using the eICU-CRD and MIMIC-IV databases was conducted to assess the model's generalization ability. The decision mask of the proposed method was used to capture the interpretability of the model. Results: The proposed method outperformed conventional approaches across different cohort populations in both MIMIC-IV and eICU-CRD. Additionally, it achieved higher accuracy than baseline models for various ICU subtypes within both databases. The interpretable prediction results can enhance clinicians' understanding of CA prediction by serving as a statistical comparison between non-CA and CA groups. Next, we tested the eICU-CRD and MIMIC-IV data sets using models trained on MIMIC-IV and eICU-CRD, respectively, to evaluate generalization ability. The results demonstrated superior performance compared with baseline models. Conclusions: Our novel framework for learning unique features provides stable predictive power across different ICU environments. Most of the interpretable global information reveals statistical differences between CA and non-CA groups, demonstrating its utility as an indicator for clinical decisions. Consequently, the proposed CA prediction system is a clinically validated algorithm that enables clinicians to intervene early based on CA prediction information and can be applied to clinical trials in digital health. ", doi="10.2196/62890", url="https://www.jmir.org/2024/1/e62890" } @Article{info:doi/10.2196/54737, author="Lin, Xinnian and Liang, Chen and Liu, Jihong and Lyu, Tianchu and Ghumman, Nadia and Campbell, Berry", title="Artificial Intelligence--Augmented Clinical Decision Support Systems for Pregnancy Care: Systematic Review", journal="J Med Internet Res", year="2024", month="Sep", day="16", volume="26", pages="e54737", keywords="artificial intelligence", keywords="biomedical ontologies", keywords="clinical decision support systems", keywords="implementation science", keywords="obstetrics", keywords="pregnancy", keywords="AI", keywords="systematic review", keywords="CDSS", keywords="functionality", keywords="methodology", keywords="implementation", keywords="database query", keywords="database queries", keywords="bibliography", keywords="record", keywords="records", keywords="eligibility", keywords="literature review", keywords="prenatal", keywords="early pregnancy", keywords="obstetric care", keywords="postpartum care", keywords="pregnancy care", keywords="diagnostic support", keywords="clinical prediction", keywords="knowledge base", keywords="therapeutic", keywords="therapeutics", keywords="recommendation", keywords="recommendations", keywords="diagnosis", keywords="abnormality", keywords="abnormalities", keywords="cost-effective", keywords="surveillance", keywords="ultrasound", keywords="ontology", abstract="Background: Despite the emerging application of clinical decision support systems (CDSS) in pregnancy care and the proliferation of artificial intelligence (AI) over the last decade, it remains understudied regarding the role of AI in CDSS specialized for pregnancy care. Objective: To identify and synthesize AI-augmented CDSS in pregnancy care, CDSS functionality, AI methodologies, and clinical implementation, we reported a systematic review based on empirical studies that examined AI-augmented CDSS in pregnancy care. Methods: We retrieved studies that examined AI-augmented CDSS in pregnancy care using database queries involved with titles, abstracts, keywords, and MeSH (Medical Subject Headings) terms. 
Bibliographic records from their inception to 2022 were retrieved from PubMed/MEDLINE (n=206), Embase (n=101), and ACM Digital Library (n=377), followed by eligibility screening and literature review. The eligibility criteria include empirical studies that (1) developed or tested AI methods, (2) developed or tested CDSS or CDSS components, and (3) focused on pregnancy care. Data of studies used for review and appraisal include title, abstract, keywords, MeSH terms, full text, and supplements. Publications with ancillary information or overlapping outcomes were synthesized as one single study. Reviewers independently reviewed and assessed the quality of selected studies. Results: We identified 30 distinct studies of 684 studies from their inception to 2022. Topics of clinical applications covered AI-augmented CDSS from prenatal, early pregnancy, obstetric care, and postpartum care. Topics of CDSS functions include diagnostic support, clinical prediction, therapeutics recommendation, and knowledge base. Conclusions: Our review acknowledged recent advances in CDSS studies including early diagnosis of prenatal abnormalities, cost-effective surveillance, prenatal ultrasound support, and ontology development. To recommend future directions, we also noted key gaps from existing studies, including (1) decision support in current childbirth deliveries without using observational data from consequential fetal or maternal outcomes in future pregnancies; (2) scarcity of studies in identifying several high-profile biases from CDSS, including social determinants of health highlighted by the American College of Obstetricians and Gynecologists; and (3) chasm between internally validated CDSS models, external validity, and clinical implementation. ", doi="10.2196/54737", url="https://www.jmir.org/2024/1/e54737" } @Article{info:doi/10.2196/52490, author="Soe, Nyi Nyi and Yu, Zhen and Latt, Mon Phyu and Lee, David and Samra, Singh Ranjit and Ge, Zongyuan and Rahman, Rashidur and Sun, Jiajun and Ong, J. Jason and Fairley, K. Christopher and Zhang, Lei", title="Using AI to Differentiate Mpox From Common Skin Lesions in a Sexual Health Clinic: Algorithm Development and Validation Study", journal="J Med Internet Res", year="2024", month="Sep", day="13", volume="26", pages="e52490", keywords="mpox", keywords="sexually transmitted infections", keywords="artificial intelligence", keywords="deep learning", keywords="skin lesion", abstract="Background: The 2022 global outbreak of mpox has significantly impacted health facilities, and necessitated additional infection prevention and control measures and alterations to clinic processes. Early identification of suspected mpox cases will assist in mitigating these impacts. Objective: We aimed to develop and evaluate an artificial intelligence (AI)--based tool to differentiate mpox lesion images from other skin lesions seen in a sexual health clinic. Methods: We used a data set with 2200 images, that included mpox and non-mpox lesions images, collected from Melbourne Sexual Health Centre and web resources. We adopted deep learning approaches which involved 6 different deep learning architectures to train our AI models. We subsequently evaluated the performance of each model using a hold-out data set and an external validation data set to determine the optimal model for differentiating between mpox and non-mpox lesions. 
Results: The DenseNet-121 model outperformed other models with an overall area under the receiver operating characteristic curve (AUC) of 0.928, an accuracy of 0.848, a precision of 0.942, a recall of 0.742, and an F1-score of 0.834. Implementation of a region of interest approach significantly improved the performance of all models, with the AUC for the DenseNet-121 model increasing to 0.982. This approach resulted in an increase in the correct classification of mpox images from 79\% (55/70) to 94\% (66/70). The effectiveness of this approach was further validated by a visual analysis with gradient-weighted class activation mapping, demonstrating a reduction in false detection within the background of lesion images. On the external validation data set, ResNet-18 and DenseNet-121 achieved the highest performance. ResNet-18 achieved an AUC of 0.990 and an accuracy of 0.947, and DenseNet-121 achieved an AUC of 0.982 and an accuracy of 0.926. Conclusions: Our study demonstrated it was possible to use an AI-based image recognition algorithm to accurately differentiate between mpox and common skin lesions. Our findings provide a foundation for future investigations aimed at refining the algorithm and establishing the place of such technology in a sexual health clinic. ", doi="10.2196/52490", url="https://www.jmir.org/2024/1/e52490" } @Article{info:doi/10.2196/56729, author="Raff, Daniel and Stewart, Kurtis and Yang, Christie Michelle and Shang, Jessie and Cressman, Sonya and Tam, Roger and Wong, Jessica and Tammem{\"a}gi, C. Martin and Ho, Kendall", title="Improving Triage Accuracy in Prehospital Emergency Telemedicine: Scoping Review of Machine Learning--Enhanced Approaches", journal="Interact J Med Res", year="2024", month="Sep", day="11", volume="13", pages="e56729", keywords="telemedicine", keywords="machine learning", keywords="emergency medicine", keywords="artificial intelligence", keywords="chatbot", keywords="triage", keywords="scoping review", keywords="prehospital", abstract="Background: Prehospital telemedicine triage systems combined with machine learning (ML) methods have the potential to improve triage accuracy and safely redirect low-acuity patients from attending the emergency department. However, research in prehospital settings is limited but needed; emergency department overcrowding and adverse patient outcomes are increasingly common. Objective: In this scoping review, we sought to characterize the existing methods for ML-enhanced telemedicine emergency triage. In order to support future research, we aimed to delineate what data sources, predictors, labels, ML models, and performance metrics were used, and in which telemedicine triage systems these methods were applied. Methods: A scoping review was conducted, querying multiple databases (MEDLINE, PubMed, Scopus, and IEEE Xplore) through February 24, 2023, to identify potential ML-enhanced methods, and for those eligible, relevant study characteristics were extracted, including prehospital triage setting, types of predictors, ground truth labeling method, ML models used, and performance metrics. Inclusion criteria were restricted to the triage of emergency telemedicine services using ML methods on an undifferentiated (disease nonspecific) population. Only primary research studies in English were considered. Furthermore, only those studies using data collected remotely (as opposed to derived from physical assessments) were included. 
In order to limit bias, we exclusively included articles identified through our predefined search criteria and had 3 researchers (DR, JS, and KS) independently screen the resulting studies. We conducted a narrative synthesis of findings to establish a knowledge base in this domain and identify potential gaps to be addressed in forthcoming ML-enhanced methods. Results: A total of 165 unique records were screened for eligibility and 15 were included in the review. Most studies applied ML methods during emergency medical dispatch (7/15, 47\%) or used chatbot applications (5/15, 33\%). Patient demographics and health status variables were the most common predictors, with a notable absence of social variables. Frequently used ML models included support vector machines and tree-based methods. ML-enhanced models typically outperformed conventional triage algorithms, and we found a wide range of methods used to establish ground truth labels. Conclusions: This scoping review observed heterogeneity in dataset size, predictors, clinical setting (triage process), and reported performance metrics. Standard structured predictors, including age, sex, and comorbidities, across articles suggest the importance of these inputs; however, there was a notable absence of other potentially useful data, including medications, social variables, and health system exposure. Ground truth labeling practices should be reported in a standard fashion as the true model performance hinges on these labels. This review calls for future work to form a standardized framework, thereby supporting consistent reporting and performance comparisons across ML-enhanced prehospital triage systems. ", doi="10.2196/56729", url="https://www.i-jmr.org/2024/1/e56729" } @Article{info:doi/10.2196/56935, author="McLoughlin, E. Daniel and Moreno Echevarria, M. Fabiola and Badawy, M. Sherif", title="Lessons Learned From Shared Decision-Making With Oral Anticoagulants: Viewpoint on Suggestions for the Development of Oral Chemotherapy Decision Aids", journal="JMIR Cancer", year="2024", month="Sep", day="11", volume="10", pages="e56935", keywords="shared decision-making", keywords="SDM", keywords="decision aids", keywords="decision aids design", keywords="oral chemotherapy", keywords="oral anticoagulants", keywords="drug delivery", keywords="chemotherapy", keywords="chemo", keywords="anticoagulants", keywords="drug deliveries", keywords="cancer", keywords="oncology", keywords="oncologist", keywords="metastases", keywords="literature review", keywords="literature reviews", doi="10.2196/56935", url="https://cancer.jmir.org/2024/1/e56935", url="http://www.ncbi.nlm.nih.gov/pubmed/39187430" } @Article{info:doi/10.2196/57949, author="Zheng, Chengyi and Ackerson, Bradley and Qiu, Sijia and Sy, S. Lina and Daily, Vega Leticia I. and Song, Jeannie and Qian, Lei and Luo, Yi and Ku, H. 
Jennifer and Cheng, Yanjun and Wu, Jun and Tseng, Fu Hung", title="Natural Language Processing Versus Diagnosis Code--Based Methods for Postherpetic Neuralgia Identification: Algorithm Development and Validation", journal="JMIR Med Inform", year="2024", month="Sep", day="10", volume="12", pages="e57949", keywords="postherpetic neuralgia", keywords="herpes zoster", keywords="natural language processing", keywords="electronic health record", keywords="real-world data", keywords="artificial intelligence", keywords="development", keywords="validation", keywords="diagnosis", keywords="EHR", keywords="algorithm", keywords="EHR data", keywords="sensitivity", keywords="specificity", keywords="validation data", keywords="neuralgia", keywords="recombinant zoster vaccine", abstract="Background: Diagnosis codes and prescription data are used in algorithms to identify postherpetic neuralgia (PHN), a debilitating complication of herpes zoster (HZ). Because of the questionable accuracy of codes and prescription data, manual chart review is sometimes used to identify PHN in electronic health records (EHRs), which can be costly and time-consuming. Objective: This study aims to develop and validate a natural language processing (NLP) algorithm for automatically identifying PHN from unstructured EHR data and to compare its performance with that of code-based methods. Methods: This retrospective study used EHR data from Kaiser Permanente Southern California, a large integrated health care system that serves over 4.8 million members. The source population included members aged $\geq$50 years who received an incident HZ diagnosis and accompanying antiviral prescription between 2018 and 2020 and had $\geq$1 encounter within 90-180 days of the incident HZ diagnosis. The study team manually reviewed the EHR and identified PHN cases. For NLP development and validation, 500 and 800 random samples from the source population were selected, respectively. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F-score, and Matthews correlation coefficient (MCC) of NLP and the code-based methods were evaluated using chart-reviewed results as the reference standard. Results: The NLP algorithm identified PHN cases with a 90.9\% sensitivity, 98.5\% specificity, 82\% PPV, and 99.3\% NPV. The composite scores of the NLP algorithm were 0.89 (F-score) and 0.85 (MCC). The prevalences of PHN in the validation data were 6.9\% (reference standard), 7.6\% (NLP), and 5.4\%-13.1\% (code-based). The code-based methods achieved a 52.7\%-61.8\% sensitivity, 89.8\%-98.4\% specificity, 27.6\%-72.1\% PPV, and 96.3\%-97.1\% NPV. The F-scores and MCCs ranged between 0.45 and 0.59 and between 0.32 and 0.61, respectively. Conclusions: The automated NLP-based approach identified PHN cases from the EHR with good accuracy. This method could be useful in population-based PHN research. 
", doi="10.2196/57949", url="https://medinform.jmir.org/2024/1/e57949" } @Article{info:doi/10.2196/59711, author="Wang, Zhaoxin and Yang, Wenwen and Li, Zhengyu and Rong, Ze and Wang, Xing and Han, Jincong and Ma, Lei", title="A 25-Year Retrospective of the Use of AI for Diagnosing Acute Stroke: Systematic Review", journal="J Med Internet Res", year="2024", month="Sep", day="10", volume="26", pages="e59711", keywords="acute stroke", keywords="artificial intelligence", keywords="AI", keywords="machine learning", keywords="deep learning", keywords="stroke lesion segmentation and classification", keywords="stroke prediction", keywords="stroke prognosis", abstract="Background: Stroke is a leading cause of death and disability worldwide. Rapid and accurate diagnosis is crucial for minimizing brain damage and optimizing treatment plans. Objective: This review aims to summarize the methods of artificial intelligence (AI)--assisted stroke diagnosis over the past 25 years, providing an overview of performance metrics and algorithm development trends. It also delves into existing issues and future prospects, intending to offer a comprehensive reference for clinical practice. Methods: A total of 50 representative articles published between 1999 and 2024 on using AI technology for stroke prevention and diagnosis were systematically selected and analyzed in detail. Results: AI-assisted stroke diagnosis has made significant advances in stroke lesion segmentation and classification, stroke risk prediction, and stroke prognosis. Before 2012, research mainly focused on segmentation using traditional thresholding and heuristic techniques. From 2012 to 2016, the focus shifted to machine learning (ML)--based approaches. After 2016, the emphasis moved to deep learning (DL), which brought significant improvements in accuracy. In stroke lesion segmentation and classification as well as stroke risk prediction, DL has shown superiority over ML. In stroke prognosis, both DL and ML have shown good performance. Conclusions: Over the past 25 years, AI technology has shown promising performance in stroke diagnosis. ", doi="10.2196/59711", url="https://www.jmir.org/2024/1/e59711" } @Article{info:doi/10.2196/54985, author="Liu, Jiayu and Liang, Xiuting and Fang, Dandong and Zheng, Jiqi and Yin, Chengliang and Xie, Hui and Li, Yanteng and Sun, Xiaochun and Tong, Yue and Che, Hebin and Hu, Ping and Yang, Fan and Wang, Bingxian and Chen, Yuanyuan and Cheng, Gang and Zhang, Jianning", title="The Diagnostic Ability of GPT-3.5 and GPT-4.0 in Surgery: Comparative Analysis", journal="J Med Internet Res", year="2024", month="Sep", day="10", volume="26", pages="e54985", keywords="ChatGPT", keywords="accuracy rates", keywords="artificial intelligence", keywords="diagnosis", keywords="surgeon", abstract="Background: ChatGPT (OpenAI) has shown great potential in clinical diagnosis and could become an excellent auxiliary tool in clinical practice. This study investigates and evaluates ChatGPT in diagnostic capabilities by comparing the performance of GPT-3.5 and GPT-4.0 across model iterations. Objective: This study aims to evaluate the precise diagnostic ability of GPT-3.5 and GPT-4.0 for colon cancer and its potential as an auxiliary diagnostic tool for surgeons and compare the diagnostic accuracy rates between GTP-3.5 and GPT-4.0. 
We precisely assess the accuracy of primary and secondary diagnoses and analyze the causes of misdiagnoses in GPT-3.5 and GPT-4.0 according to 7 categories: patient histories, symptoms, physical signs, laboratory examinations, imaging examinations, pathological examinations, and intraoperative findings. Methods: We retrieved 316 case reports for intestinal cancer from the Chinese Medical Association Publishing House database, of which 286 cases were deemed valid after data cleansing. The cases were translated from Mandarin to English and then input into GPT-3.5 and GPT-4.0 using a simple, direct prompt to elicit primary and secondary diagnoses. We conducted a comparative study to evaluate the diagnostic accuracy of GPT-4.0 and GPT-3.5. Three senior surgeons from the General Surgery Department, specializing in Colorectal Surgery, assessed the diagnostic information at the Chinese PLA (People's Liberation Army) General Hospital. The accuracy of primary and secondary diagnoses was scored based on predefined criteria. Additionally, we analyzed and compared the causes of misdiagnoses in both models according to 7 categories: patient histories, symptoms, physical signs, laboratory examinations, imaging examinations, pathological examinations, and intraoperative findings. Results: Out of 286 cases, GPT-4.0 and GPT-3.5 both demonstrated high diagnostic accuracy for primary diagnoses, but the accuracy rates of GPT-4.0 were significantly higher than GPT-3.5 (mean 0.972, SD 0.137 vs mean 0.855, SD 0.335; t285=5.753; P<.001). For secondary diagnoses, the accuracy rates of GPT-4.0 were also significantly higher than GPT-3.5 (mean 0.908, SD 0.159 vs mean 0.617, SD 0.349; t285=--7.727; P<.001). GPT-3.5 showed limitations in processing patient history, symptom presentation, laboratory tests, and imaging data. While GPT-4.0 improved upon GPT-3.5, it still has limitations in identifying symptoms and laboratory test data. For both primary and secondary diagnoses, there was no significant difference in accuracy related to age, gender, or system group between GPT-4.0 and GPT-3.5. Conclusions: This study demonstrates that ChatGPT, particularly GPT-4.0, possesses significant diagnostic potential, with GPT-4.0 exhibiting higher accuracy than GPT-3.5. However, GPT-4.0 still has limitations, particularly in recognizing patient symptoms and laboratory data, indicating a need for more research in real-world clinical settings to enhance its diagnostic capabilities. ", doi="10.2196/54985", url="https://www.jmir.org/2024/1/e54985", url="http://www.ncbi.nlm.nih.gov/pubmed/39255016" } @Article{info:doi/10.2196/57195, author="van der Meijden, Lise Siri and van Boekel, M. Anna and van Goor, Harry and Nelissen, GHH Rob and Schoones, W. Jan and Steyerberg, W. Ewout and Geerts, F. 
Bart and de Boer, GJ Mark and Arbous, Sesmu M.", title="Automated Identification of Postoperative Infections to Allow Prediction and Surveillance Based on Electronic Health Record Data: Scoping Review", journal="JMIR Med Inform", year="2024", month="Sep", day="10", volume="12", pages="e57195", keywords="postoperative infections", keywords="surveillance", keywords="prediction", keywords="surgery", keywords="artificial intelligence", keywords="chart review", keywords="electronic health record", keywords="scoping review", keywords="postoperative", keywords="surgical", keywords="infection", keywords="infections", keywords="predictions", keywords="predict", keywords="predictive", keywords="bacterial", keywords="machine learning", keywords="record", keywords="records", keywords="EHR", keywords="EHRs", keywords="synthesis", keywords="review methods", keywords="review methodology", keywords="search", keywords="searches", keywords="searching", keywords="scoping", abstract="Background: Postoperative infections remain a crucial challenge in health care, resulting in high morbidity, mortality, and costs. Accurate identification and labeling of patients with postoperative bacterial infections is crucial for developing prediction models, validating biomarkers, and implementing surveillance systems in clinical practice. Objective: This scoping review aimed to explore methods for identifying patients with postoperative infections using electronic health record (EHR) data to go beyond the reference standard of manual chart review. Methods: We performed a systematic search strategy across PubMed, Embase, Web of Science (Core Collection), the Cochrane Library, and Emcare (Ovid), targeting studies addressing the prediction and fully automated surveillance (ie, without manual check) of diverse bacterial infections in the postoperative setting. For prediction modeling studies, we assessed the labeling methods used, categorizing them as either manual or automated. We evaluated the different types of EHR data needed for the surveillance and labeling of postoperative infections, as well as the performance of fully automated surveillance systems compared with manual chart review. Results: We identified 75 different methods and definitions used to identify patients with postoperative infections in studies published between 2003 and 2023. Manual labeling was the predominant method in prediction modeling research, 65\% (49/75) of the identified methods use structured data, and 45\% (34/75) use free text and clinical notes as one of their data sources. Fully automated surveillance systems should be used with caution because the reported positive predictive values are between 0.31 and 0.76. Conclusions: There is currently no evidence to support fully automated labeling and identification of patients with infections based solely on structured EHR data. Future research should focus on defining uniform definitions, as well as prioritizing the development of more scalable, automated methods for infection detection using structured EHR data. ", doi="10.2196/57195", url="https://medinform.jmir.org/2024/1/e57195", url="http://www.ncbi.nlm.nih.gov/pubmed/39255011" } @Article{info:doi/10.2196/44662, author="Slovis, Heritier Benjamin and Huang, Soonyip and McArthur, Melanie and Martino, Cara and Beers, Tasia and Labella, Meghan and Riggio, M. 
Jeffrey and Pribitkin, deAzevedo Edmund", title="Design and Implementation of an Opioid Scorecard for Hospital System--Wide Peer Comparison of Opioid Prescribing Habits: Observational Study", journal="JMIR Hum Factors", year="2024", month="Sep", day="9", volume="11", pages="e44662", keywords="opioids", keywords="peer comparison", keywords="quality", keywords="scorecard", keywords="prescribing", keywords="design", keywords="implementation", keywords="opioid", keywords="morbidity", keywords="mortality", keywords="opioid usage", keywords="opioid dependence", keywords="drug habits", abstract="Background: Reductions in opioid prescribing by health care providers can lead to a decreased risk of opioid dependence in patients. Peer comparison has been demonstrated to impact providers' prescribing habits, though its effect on opioid prescribing has predominantly been studied in the emergency department setting. Objective: The purpose of this study is to describe the development of an enterprise-wide opioid scorecard, the architecture of its implementation, and plans for future research on its effects. Methods: Using data generated by the author's enterprise vendor--based electronic health record, the enterprise analytics software, and expertise from a dedicated group of informaticists, physicians, and analysts, the authors developed an opioid scorecard that was released on a quarterly basis via email to all opioid prescribers at our institution. These scorecards compare providers' opioid prescribing habits on the basis of established metrics to those of their peers within their specialty throughout the enterprise. Results: At the time of this study's completion, 2034 providers have received at least 1 scorecard over a 5-quarter period ending in September 2021. Poisson regression demonstrated a 1.6\% quarterly reduction in opioid prescribing, and chi-square analysis demonstrated pre-post reductions in the proportion of prescriptions longer than 5 days' duration and a morphine equivalent daily dose of >50. Conclusions: To our knowledge, this is the first peer comparison effort with high-quality evidence-based metrics of this scale published in the literature. By sharing this process for designing the metrics and the process of distribution, the authors hope to influence other health systems to attempt to curb the opioid pandemic through peer comparison. Future research examining the effects of this intervention could demonstrate significant reductions in opioid prescribing, thus potentially reducing the progression of individual patients to opioid use disorder and the associated increased risk of morbidity and mortality. 
", doi="10.2196/44662", url="https://humanfactors.jmir.org/2024/1/e44662" } @Article{info:doi/10.2196/56121, author="Si, Yafei and Yang, Yuyi and Wang, Xi and Zu, Jiaqi and Chen, Xi and Fan, Xiaojing and An, Ruopeng and Gong, Sen", title="Quality and Accountability of ChatGPT in Health Care in Low- and Middle-Income Countries: Simulated Patient Study", journal="J Med Internet Res", year="2024", month="Sep", day="9", volume="26", pages="e56121", keywords="ChatGPT", keywords="generative AI", keywords="simulated patient", keywords="health care", keywords="quality and safety", keywords="low- and middle-income countries", keywords="quality", keywords="LMIC", keywords="patient study", keywords="effectiveness", keywords="reliability", keywords="medication prescription", keywords="prescription", keywords="noncommunicable diseases", keywords="AI integration", keywords="AI", keywords="artificial intelligence", doi="10.2196/56121", url="https://www.jmir.org/2024/1/e56121", url="http://www.ncbi.nlm.nih.gov/pubmed/39250188" } @Article{info:doi/10.2196/55506, author="Mir, Hassan and Cullen, J. Katelyn and Mosleh, Karen and Setrak, Rafi and Jolly, Sanjit and Tsang, Michael and Rutledge, Gregory and Ibrahim, Quazi and Welsford, Michelle and Mercuri, Mathew and Schwalm, JD and Natarajan, K. Madhu", title="Smartphone App for Prehospital ECG Transmission in ST-Elevation Myocardial Infarction Activation: Protocol for a Mixed Methods Study", journal="JMIR Res Protoc", year="2024", month="Sep", day="6", volume="13", pages="e55506", keywords="ST-elevation myocardial infarction", keywords="m-health", keywords="cardiac systems of care", keywords="knowledge mobilization", keywords="digital health", keywords="smartphone technology", keywords="technology", keywords="STEMI", keywords="Canada", keywords="implementation", keywords="mobile phone", abstract="Background: Timely diagnosis and treatment for ST-elevation myocardial infarction (STEMI) requires a coordinated response from multiple providers. Rapid intervention is key to reducing mortality and morbidity. Activation of the cardiac catheterization laboratory may occur through verbal communication and may also involve the secure sharing of electrocardiographic images between frontline health care providers and interventional cardiologists. To improve this response, we developed a quick, easy-to-use, privacy-compliant smartphone app, that is SMART AMI-ACS (Strategic Management of Acute Reperfusion and Therapies in Acute Myocardial Infarction Acute Coronary Syndromes), for real-time verbal communication and sharing of electrocardiographic images among health care providers in Ontario, Canada. The app further provides information about diagnosis, management, and risk calculators for patients presenting with acute coronary syndrome. Objective: This study aims to integrate the app into workflow processes to improve communication for STEMI activation, resulting in decreased treatment times, improved patient outcomes, and reduced unnecessary catheterization laboratory activation and transfer. Methods: Implementation of the app will be guided by the Reach, Effectiveness, Acceptability, Implementation, and Maintenance (RE-AIM) framework to measure impact. The study will use quantitative registry data already being collected through the SMART AMI project (STEMI registry), the use of the SMART AMI app, and quantitative and qualitative survey data from physicians. 
Survey questions will be based on the Consolidated Framework for Implementation Research. Descriptive quantitative analysis and thematic qualitative analysis of survey results will be conducted. Continuous variables will be described using either mean and SD or median and IQR values at pre- and postintervention periods by the study sites. Categorical variables, such as false activation, will be described as frequencies (percentages). For each outcome, an interrupted time series regression model will be fitted to evaluate the impact of the app. Results: The primary outcomes of this study include the usability, acceptability, and functionality of the app for physicians. This will be measured using electronic surveys to identify barriers and facilitators to app use. Other key outcomes will measure the implementation of the app by reviewing the timing-of-care intervals, false ``avoidable'' catheterization laboratory activation rates, and uptake and use of the app by physicians. Prospective evaluation will be conducted between April 1, 2022, and March 31, 2023. However, for the timing- and accuracy-of-care outcomes, registry data will be compared from January 1, 2019, to March 31, 2023. Data analysis is expected to be completed in Fall 2024, with the completion of a paper for publication anticipated by the end of 2024. Conclusions: Smartphone technology is well integrated into clinical practice and widely accessible. The proposed solution being tested is secure and leverages the accessibility of smartphones. Emergency medicine physicians can use this app to quickly, securely, and accurately transmit information ensuring faster and more appropriate decision-making for STEMI activation. Trial Registration: ClinicalTrials.gov NCT05290389; https://clinicaltrials.gov/study/NCT05290389 International Registered Report Identifier (IRRID): DERR1-10.2196/55506 ", doi="10.2196/55506", url="https://www.researchprotocols.org/2024/1/e55506", url="http://www.ncbi.nlm.nih.gov/pubmed/39240681" } @Article{info:doi/10.2196/56022, author="Lin, Tai-Han and Chung, Hsing-Yi and Jian, Ming-Jr and Chang, Chih-Kai and Perng, Cherng-Lih and Liao, Guo-Shiou and Yu, Jyh-Cherng and Dai, Ming-Shen and Yu, Cheng-Ping and Shang, Hung-Sheng", title="An Advanced Machine Learning Model for a Web-Based Artificial Intelligence--Based Clinical Decision Support System Application: Model Development and Validation Study", journal="J Med Internet Res", year="2024", month="Sep", day="4", volume="26", pages="e56022", keywords="breast cancer recurrence", keywords="artificial intelligence--based clinical decision support system", keywords="machine learning", keywords="personalized treatment planning", keywords="ChatGPT", keywords="predictive model accuracy", abstract="Background: Breast cancer is a leading global health concern, necessitating advancements in recurrence prediction and management. The development of an artificial intelligence (AI)--based clinical decision support system (AI-CDSS) using ChatGPT addresses this need with the aim of enhancing both prediction accuracy and user accessibility. Objective: This study aims to develop and validate an advanced machine learning model for a web-based AI-CDSS application, leveraging the question-and-answer guidance capabilities of ChatGPT to enhance data preprocessing and model development, thereby improving the prediction of breast cancer recurrence. 
Methods: This study focused on developing an advanced machine learning model by leveraging data from the Tri-Service General Hospital breast cancer registry of 3577 patients (2004-2016). As a tertiary medical center, it accepts referrals from 4 branches---3 branches in the northern region and 1 branch on an offshore island in our country---that manage chronic diseases but refer complex surgical cases, including breast cancer, to the main center, enriching our study population's diversity. Model training used patient data from 2004 to 2012, with subsequent validation using data from 2013 to 2016, ensuring comprehensive assessment and robustness of our predictive models. ChatGPT was integral to preprocessing and model development, aiding in hormone receptor categorization, age binning, and one-hot encoding. Techniques such as the synthetic minority oversampling technique addressed the imbalance of the data sets. Various algorithms, including light gradient-boosting machine, gradient boosting, and extreme gradient boosting, were used, and their performance was evaluated using metrics such as the area under the curve, accuracy, sensitivity, and F1-score. Results: The light gradient-boosting machine model demonstrated superior performance, with an area under the curve of 0.80, followed closely by the gradient boosting and extreme gradient boosting models. The web interface of the AI-CDSS tool was effectively tested in clinical decision-making scenarios, demonstrating its utility in personalized treatment planning and patient involvement. Conclusions: The AI-CDSS tool, enhanced by ChatGPT, marks a significant advancement in breast cancer recurrence prediction, offering a more individualized and accessible approach for clinicians and patients. Although promising, further validation in diverse clinical settings is recommended to confirm its efficacy and expand its use. ", doi="10.2196/56022", url="https://www.jmir.org/2024/1/e56022" } @Article{info:doi/10.2196/54621, author="Yang, Meicheng and Zhuang, Jinqiang and Hu, Wenhan and Li, Jianqing and Wang, Yu and Zhang, Zhongheng and Liu, Chengyu and Chen, Hui", title="Enhancing Patient Selection in Sepsis Clinical Trials Design Through an AI Enrichment Strategy: Algorithm Development and Validation", journal="J Med Internet Res", year="2024", month="Sep", day="4", volume="26", pages="e54621", keywords="sepsis", keywords="enrichment strategy", keywords="disease progression trajectories", keywords="artificial intelligence", keywords="predictive modeling", keywords="conformal prediction", abstract="Background: Sepsis is a heterogeneous syndrome, and enrollment of more homogeneous patients is essential to improve the efficiency of clinical trials. Artificial intelligence (AI) has facilitated the identification of homogeneous subgroups, but how to estimate the uncertainty of the model outputs when applying AI to clinical decision-making remains unknown. Objective: We aimed to design an AI-based model for purposeful patient enrollment, ensuring that a patient with sepsis recruited into a trial would still be persistently ill by the time the proposed therapy could impact patient outcome. We also expected that the model could provide interpretable factors and estimate the uncertainty of the model outputs at a customized confidence level. Methods: In this retrospective study, 9135 patients with sepsis requiring vasopressor treatment within 24 hours after sepsis onset were enrolled from Beth Israel Deaconess Medical Center. 
This cohort was used for model development, and 10-fold cross-validation with 50 repeats was used for internal validation. In total, 3743 patients with sepsis from the eICU Collaborative Research Database were used as the external validation cohort. All included patients with sepsis were stratified based on disease progression trajectories: rapid death, recovery, and persistent ill. A total of 148 variables were selected for predicting the 3 trajectories. Four machine learning algorithms with 3 different setups were used. We estimated the uncertainty of the model outputs using conformal prediction (CP). The Shapley Additive Explanations method was used to explain the model. Results: The multiclass gradient boosting machine was identified as the best-performing model with good discrimination and calibration performance in both validation cohorts. The mean area under the receiver operating characteristic curve with SD was 0.906 (0.018) for rapid death, 0.843 (0.008) for recovery, and 0.807 (0.010) for persistent ill in the internal validation cohort. In the external validation cohort, the mean area under the receiver operating characteristic curve (SD) was 0.878 (0.003) for rapid death, 0.764 (0.008) for recovery, and 0.696 (0.007) for persistent ill. The maximum norepinephrine equivalence, total urine output, Acute Physiology Score III, mean systolic blood pressure, and the coefficient of variation of oxygen saturation contributed the most. Compared to the model without CP, using the model with CP at a mixed confidence approach reduced overall prediction errors by 27.6\% (n=62) and 30.7\% (n=412) in the internal and external validation cohorts, respectively, as well as enabled the identification of more potentially persistent ill patients. Conclusions: The implementation of our model has the potential to reduce heterogeneity and enroll more homogeneous patients in sepsis clinical trials. The use of CP for estimating the uncertainty of the model outputs allows for a more comprehensive understanding of the model's reliability and assists in making informed decisions based on the predicted outcomes. ", doi="10.2196/54621", url="https://www.jmir.org/2024/1/e54621" } @Article{info:doi/10.2196/59952, author="Hawkins, T. Alexander and Fa, Andrea and Younan, A. Samuel and Ivatury, Joga Srinivas and Bonnet, Kemberlee and Schlundt, David and Gordon, J. Elisa and Cavanaugh, L. Kerri", title="Decision Aid for Colectomy in Recurrent Diverticulitis: Development and Usability Study", journal="JMIR Form Res", year="2024", month="Sep", day="3", volume="8", pages="e59952", keywords="design sprint", keywords="diverticulitis", keywords="decision aid", keywords="shared decision-making", keywords="colectomy", keywords="decision-making", keywords="diverticular diseases", keywords="gastrointestinal diagnosis", keywords="American", keywords="America", keywords="tools", keywords="tool", keywords="effectiveness", keywords="surgeon", keywords="patients", keywords="patient", keywords="communication", keywords="synopsis", abstract="Background: Diverticular disease is a common gastrointestinal diagnosis with over 2.7 million clinic visits yearly. National guidelines from the American Society of Colon and Rectal Surgeons state that ``the decision to recommend elective sigmoid colectomy after recovery from uncomplicated acute diverticulitis should be individualized.'' However, tools to individualize this decision are lacking. 
Objective: This study aimed to develop an online educational decision aid (DA) to facilitate effective surgeon and patient communication about treatment options for recurrent left-sided diverticulitis. Methods: We used a modified design sprint methodology to create a prototype DA. We engaged a multidisciplinary team and adapted elements from the Ottawa Personal Decision Guide. We then iteratively refined the prototype by conducting a mixed methods assessment of content and usability testing, involving cognitive interviews with patients and surgeons. The findings informed the refinement of the DA. Further testing included an in-clinic feasibility review. Results: Over a 4-day in-person rapid design sprint, including patients, surgeons, and health communication experts, we developed a prototype of a diverticulitis DA, comprising an interactive website and handout with 3 discrete sections. The first section contains education about diverticulitis and treatment options. The second section clarifies the potential risks and benefits of both clinical treatment options (medical management vs colectomy). The third section invites patients to participate in a value clarification exercise. After navigating the DA, the patient prints a synopsis that they bring to their clinic appointment, which serves as a guide for shared decision-making. Conclusions: Design sprint methodology, emphasizing stakeholder co-design and complemented by extensive user testing, is an effective and efficient strategy to create a DA for patients living with recurrent diverticulitis facing critical treatment decisions. ", doi="10.2196/59952", url="https://formative.jmir.org/2024/1/e59952", url="http://www.ncbi.nlm.nih.gov/pubmed/39226090" } @Article{info:doi/10.2196/62866, author="Zhou, Huan and Fang, Cheng and Pan, Yifeng", title="Development of a System for Predicting Hospitalization Time for Patients With Traumatic Brain Injury Based on Machine Learning Algorithms: User-Centered Design Case Study", journal="JMIR Hum Factors", year="2024", month="Aug", day="30", volume="11", pages="e62866", keywords="machine learning", keywords="traumatic brain injury", keywords="support vector regression machine", keywords="predictive model", keywords="hospitalization", abstract="Background: Currently, the treatment and care of patients with traumatic brain injury (TBI) are intractable health problems worldwide and greatly increase the medical burden in society. However, machine learning--based algorithms and the use of a large amount of data accumulated in the clinic in the past can predict the hospitalization time of patients with brain injury in advance, so as to design a reasonable arrangement of resources and effectively reduce the medical burden of society. Especially in China, where medical resources are so tight, this method has important application value. Objective: We aimed to develop a system based on a machine learning model for predicting the length of hospitalization of patients with TBI, which is available to patients, nurses, and physicians. Methods: We collected information on 1128 patients who received treatment at the Neurosurgery Center of the Second Affiliated Hospital of Anhui Medical University from May 2017 to May 2022, and we trained and tested the machine learning model using 5 cross-validations to avoid overfitting; 28 types of independent variables were used as input variables in the machine learning model, and the length of hospitalization was used as the output variables. 
Once the models were trained, we obtained the error and goodness of fit ($R^2$) of each machine learning model from the 5 rounds of cross-validation and compared them to select the best predictive model to be encapsulated in the developed system. In addition, we externally tested the models using clinical data related to patients treated at the First Affiliated Hospital of Anhui Medical University from June 2021 to February 2022. Results: Six machine learning models were built, including support vector regression machine, convolutional neural network, back propagation neural network, random forest, logistic regression, and multilayer perceptron. Among them, the support vector regression machine had the smallest error (10.22\%) and the highest goodness of fit (90.4\%) on the test set, performing best among the 6 models. In addition, we verified the results of all 6 models on external data sets to guard against chance findings, and the support vector regression machine again performed best. Therefore, we chose to encapsulate the support vector regression machine into our system for predicting the length of stay of patients with TBI. Finally, we made the developed system available to patients, nurses, and physicians, and a satisfaction questionnaire showed that all 3 groups agreed that the system was effective in providing clinical decision support. Conclusions: This study shows that the support vector regression machine model developed using machine learning methods can accurately predict the length of hospitalization of patients with TBI, and the developed prediction system has strong clinical utility. ", doi="10.2196/62866", url="https://humanfactors.jmir.org/2024/1/e62866" } @Article{info:doi/10.2196/54944, author="Fang, Cheng and Ji, Xiao and Pan, Yifeng and Xie, Guanchao and Zhang, Hongsheng and Li, Sai and Wan, Jinghai", title="Combining Clinical-Radiomics Features With Machine Learning Methods for Building Models to Predict Postoperative Recurrence in Patients With Chronic Subdural Hematoma: Retrospective Cohort Study", journal="J Med Internet Res", year="2024", month="Aug", day="28", volume="26", pages="e54944", keywords="chronic subdural hematoma", keywords="convolutional neural network", keywords="machine learning", keywords="neurosurgery", keywords="radiomics", keywords="support vector machine", abstract="Background: Chronic subdural hematoma (CSDH) represents a prevalent medical condition, posing substantial challenges in postoperative management due to risks of recurrence. Such recurrences not only cause physical suffering to the patient but also add to the financial burden on the family and the health care system. Currently, prognosis determination largely depends on clinician expertise, revealing a dearth of precise prediction models in clinical settings. Objective: This study aims to use machine learning (ML) techniques for the construction of predictive models to assess the likelihood of CSDH recurrence after surgery, which leads to greater benefits for patients and the health care system. Methods: Data from 133 patients were amassed and partitioned into a training set (n=93) and a test set (n=40). Radiomics features were extracted from preoperative cranial computed tomography scans using 3D Slicer software. 
These features, in conjunction with clinical data and composite clinical-radiomics features, served as input variables for model development. Four distinct ML algorithms were used to build predictive models, and their performance was rigorously evaluated via accuracy, area under the curve (AUC), and recall metrics. The optimal model was identified, followed by recursive feature elimination for feature selection, leading to enhanced predictive efficacy. External validation was conducted using data sets from additional health care facilities. Results: Following rigorous experimental analysis, the support vector machine model, predicated on clinical-radiomics features, emerged as the most efficacious for predicting postoperative recurrence in patients with CSDH. Subsequent to feature selection, key variables exerting significant impact on the model were incorporated as the input set, thereby augmenting its predictive accuracy. The model demonstrated robust performance, with metrics including accuracy of 92.72\%, AUC of 91.34\%, and recall of 93.16\%. External validation further substantiated its effectiveness, yielding an accuracy of 90.32\%, AUC of 91.32\%, and recall of 88.37\%, affirming its clinical applicability. Conclusions: This study substantiates the feasibility and clinical relevance of an ML-based predictive model, using clinical-radiomics features, for relatively accurate prognostication of postoperative recurrence in patients with CSDH. If the model is integrated into clinical practice, it will be of great significance in enhancing the quality and efficiency of clinical decision-making processes, which can improve the accuracy of diagnosis and treatment, reduce unnecessary tests and surgeries, and reduce the waste of medical resources. ", doi="10.2196/54944", url="https://www.jmir.org/2024/1/e54944" } @Article{info:doi/10.2196/55476, author="Lee, Haedeun and Oh, Bumjo and Kim, Seung-Chan", title="Recognition of Forward Head Posture Through 3D Human Pose Estimation With a Graph Convolutional Network: Development and Feasibility Study", journal="JMIR Form Res", year="2024", month="Aug", day="26", volume="8", pages="e55476", keywords="posture correction", keywords="injury prediction", keywords="human pose estimation", keywords="forward head posture", keywords="machine learning", keywords="graph convolutional networks", keywords="posture", keywords="graph neural network", keywords="graph", keywords="pose", keywords="postural", keywords="deep learning", keywords="neural network", keywords="neural networks", keywords="upper", keywords="algorithms", abstract="Background: Prolonged improper posture can lead to forward head posture (FHP), causing headaches, impaired respiratory function, and fatigue. This is especially relevant in sedentary scenarios, where individuals often maintain static postures for extended periods---a significant part of daily life for many. The development of a system capable of detecting FHP is crucial, as it would not only alert users to correct their posture but also serve the broader goal of contributing to public health by preventing the progression of chronic injuries associated with this condition. However, despite significant advancements in estimating human poses from standard 2D images, most computational pose models do not include measurements of the craniovertebral angle, which involves the C7 vertebra, crucial for diagnosing FHP. 
Objective: Accurate diagnosis of FHP typically requires dedicated devices, such as clinical postural assessments or specialized imaging equipment, but their use is impractical for continuous, real-time monitoring in everyday settings. Therefore, developing an accessible, efficient method for regular posture assessment that can be easily integrated into daily activities, providing real-time feedback, and promoting corrective action, is necessary. Methods: The system sequentially estimates 2D and 3D human anatomical key points from a provided 2D image, using the Detectron2D and VideoPose3D algorithms, respectively. It then uses a graph convolutional network (GCN), explicitly crafted to analyze the spatial configuration and alignment of the upper body's anatomical key points in 3D space. This GCN aims to implicitly learn the intricate relationship between the estimated 3D key points and the correct posture, specifically to identify FHP. Results: The test accuracy was 78.27\% when inputs included all joints corresponding to the upper body key points. The GCN model demonstrated slightly superior balanced performance across classes with an F1-score (macro) of 77.54\%, compared to the baseline feedforward neural network (FFNN) model's 75.88\%. Specifically, the GCN model showed a more balanced precision and recall between the classes, suggesting its potential for better generalization in FHP detection across diverse postures. Meanwhile, the baseline FFNN model demonstrates a higher precision for FHP cases but at the cost of lower recall, indicating that while it is more accurate in confirming FHP when detected, it misses a significant number of actual FHP instances. This assertion is further substantiated by the examination of the latent feature space using t-distributed stochastic neighbor embedding, where the GCN model presented an isotropic distribution, unlike the FFNN model, which showed an anisotropic distribution. Conclusions: Based on 2D image input using 3D human pose estimation joint inputs, it was found that it is possible to learn FHP-related features using the proposed GCN-based network to develop a posture correction system. We conclude the paper by addressing the limitations of our current system and proposing potential avenues for future work in this area. ", doi="10.2196/55476", url="https://formative.jmir.org/2024/1/e55476" } @Article{info:doi/10.2196/56042, author="Kar, Debasish and Taylor, S. Kathryn and Joy, Mark and Venkatesan, Sudhir and Meeraus, Wilhelmine and Taylor, Sylvia and Anand, N. Sneha and Ferreira, Filipa and Jamie, Gavin and Fan, Xuejuan and de Lusignan, Simon", title="Creating a Modified Version of the Cambridge Multimorbidity Score to Predict Mortality in People Older Than 16 Years: Model Development and Validation", journal="J Med Internet Res", year="2024", month="Aug", day="26", volume="26", pages="e56042", keywords="pandemics", keywords="COVID-19", keywords="multimorbidity", keywords="prevalence", keywords="predictive model", keywords="discrimination", keywords="calibration", keywords="systematized nomenclature of medicine", keywords="computerized medical records", keywords="systems", abstract="Background: No single multimorbidity measure is validated for use in NHS (National Health Service) England's General Practice Extraction Service Data for Pandemic Planning and Research (GDPPR), the nationwide primary care data set created for COVID-19 pandemic research. 
The Cambridge Multimorbidity Score (CMMS) is a validated tool for predicting mortality risk, with 37 conditions defined by Read Codes. The GDPPR instead uses the internationally adopted Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). We previously developed a modified version of the CMMS using SNOMED CT, but the number of terms available in the GDPPR data set is limited, making it impossible to use this version. Objective: We aimed to develop and validate a modified version of the CMMS using the clinical terms available for the GDPPR. Methods: We used pseudonymized data from the Oxford-Royal College of General Practitioners Research and Surveillance Centre (RSC), which has an extensive SNOMED CT list. From the 37 conditions in the original CMMS model, we selected conditions with either (1) a high prevalence ratio ($\geq$85\%), calculated as the prevalence in the RSC data set but using the GDPPR set of SNOMED CT codes, divided by the prevalence based on the full RSC SNOMED CT codes, or (2) lower prevalence ratios but high predictive value. The resulting set of conditions was included in Cox proportional hazard models to determine the 1-year mortality risk in a development data set (n=500,000) and construct a new CMMS model, following the methods of the original CMMS study, with variable reduction and parsimony achieved by backward elimination and the Akaike information stopping criterion. Model validation involved obtaining 1-year mortality estimates for a synchronous data set (n=250,000) and 1-year and 5-year mortality estimates for an asynchronous data set (n=250,000). We compared the performance with that of the original CMMS and the modified CMMS that we previously developed using RSC data. Results: The initial model contained 22 conditions, and our final model included 17 conditions. The conditions overlapped with those of the modified CMMS using the more extensive SNOMED CT list. For 1-year mortality, discrimination was high in both the derivation and validation data sets (Harrell C=0.92), and discrimination for 5-year mortality was slightly lower (Harrell C=0.90). Calibration was reasonable following an adjustment for overfitting. The performance was similar to that of both the original CMMS and the previously developed modified CMMS. Conclusions: The new modified version of the CMMS can be used on the GDPPR, a nationwide primary care data set of 54 million people, to enable adjustment for multimorbidity when predicting mortality in real-world vaccine effectiveness, pandemic planning, and other research studies. It requires 17 variables to produce performance comparable with our previous modification of the CMMS, enabling its use in routine data coded with SNOMED CT. 
", doi="10.2196/56042", url="https://www.jmir.org/2024/1/e56042" } @Article{info:doi/10.2196/46936, author="Straw, Isabel and Rees, Geraint and Nachev, Parashkev", title="Sex-Based Performance Disparities in Machine Learning Algorithms for Cardiac Disease Prediction: Exploratory Study", journal="J Med Internet Res", year="2024", month="Aug", day="26", volume="26", pages="e46936", keywords="artificial intelligence", keywords="machine learning", keywords="cardiology", keywords="health care", keywords="health equity", keywords="medicine", keywords="cardiac", keywords="quantitative evaluation", keywords="inequality", keywords="cardiac disease", keywords="performance", keywords="sex", keywords="management", keywords="heart failure", abstract="Background: The presence of bias in artificial intelligence has garnered increased attention, with inequities in algorithmic performance being exposed across the fields of criminal justice, education, and welfare services. In health care, the inequitable performance of algorithms across demographic groups may widen health inequalities. Objective: Here, we identify and characterize bias in cardiology algorithms, looking specifically at algorithms used in the management of heart failure. Methods: Stage 1 involved a literature search of PubMed and Web of Science for key terms relating to cardiac machine learning (ML) algorithms. Papers that built ML models to predict cardiac disease were evaluated for their focus on demographic bias in model performance, and open-source data sets were retained for our investigation. Two open-source data sets were identified: (1) the University of California Irvine Heart Failure data set and (2) the University of California Irvine Coronary Artery Disease data set. We reproduced existing algorithms that have been reported for these data sets, tested them for sex biases in algorithm performance, and assessed a range of remediation techniques for their efficacy in reducing inequities. Particular attention was paid to the false negative rate (FNR), due to the clinical significance of underdiagnosis and missed opportunities for treatment. Results: In stage 1, our literature search returned 127 papers, with 60 meeting the criteria for a full review and only 3 papers highlighting sex differences in algorithm performance. In the papers that reported sex, there was a consistent underrepresentation of female patients in the data sets. No papers investigated racial or ethnic differences. In stage 2, we reproduced algorithms reported in the literature, achieving mean accuracies of 84.24\% (SD 3.51\%) for data set 1 and 85.72\% (SD 1.75\%) for data set 2 (random forest models). For data set 1, the FNR was significantly higher for female patients in 13 out of 16 experiments, meeting the threshold of statistical significance (--17.81\% to --3.37\%; P<.05). A smaller disparity in the false positive rate was significant for male patients in 13 out of 16 experiments (--0.48\% to +9.77\%; P<.05). We observed an overprediction of disease for male patients (higher false positive rate) and an underprediction of disease for female patients (higher FNR). Sex differences in feature importance suggest that feature selection needs to be demographically tailored. Conclusions: Our research exposes a significant gap in cardiac ML research, highlighting that the underperformance of algorithms for female patients has been overlooked in the published literature. 
Our study quantifies sex disparities in algorithmic performance and explores several sources of bias. We found an underrepresentation of female patients in the data sets used to train algorithms, identified sex biases in model error rates, and demonstrated that a series of remediation techniques were unable to address the inequities present. ", doi="10.2196/46936", url="https://www.jmir.org/2024/1/e46936" } @Article{info:doi/10.2196/54616, author="Zou, Xuan and He, Weijie and Huang, Yu and Ouyang, Yi and Zhang, Zhen and Wu, Yu and Wu, Yongsheng and Feng, Lili and Wu, Sheng and Yang, Mengqi and Chen, Xuyan and Zheng, Yefeng and Jiang, Rui and Chen, Ting", title="AI-Driven Diagnostic Assistance in Medical Inquiry: Reinforcement Learning Algorithm Development and Validation", journal="J Med Internet Res", year="2024", month="Aug", day="23", volume="26", pages="e54616", keywords="inquiry and diagnosis", keywords="electronic health record", keywords="reinforcement learning", keywords="natural language processing", keywords="artificial intelligence", abstract="Background: For medical diagnosis, clinicians typically begin with a patient's chief concerns, followed by questions about symptoms and medical history, physical examinations, and requests for necessary auxiliary examinations to gather comprehensive medical information. This complex medical investigation process has yet to be modeled by existing artificial intelligence (AI) methodologies. Objective: The aim of this study was to develop an AI-driven medical inquiry assistant for clinical diagnosis that provides inquiry recommendations by simulating clinicians' medical investigating logic via reinforcement learning. Methods: We compiled multicenter, deidentified outpatient electronic health records from 76 hospitals in Shenzhen, China, spanning the period from July to November 2021. These records consisted of both unstructured textual information and structured laboratory test results. We first performed feature extraction and standardization using natural language processing techniques and then used a reinforcement learning actor-critic framework to explore the rational and effective inquiry logic. To align the inquiry process with actual clinical practice, we segmented the inquiry into 4 stages: inquiring about symptoms and medical history, conducting physical examinations, requesting auxiliary examinations, and terminating the inquiry with a diagnosis. External validation was conducted to validate the inquiry logic of the AI model. Results: This study focused on 2 retrospective inquiry-and-diagnosis tasks in the emergency and pediatrics departments. The emergency departments provided records of 339,020 consultations including mainly children (median age 5.2, IQR 2.6-26.1 years) with various types of upper respiratory tract infections (250,638/339,020, 73.93\%). The pediatrics department provided records of 561,659 consultations, mainly of children (median age 3.8, IQR 2.0-5.7 years) with various types of upper respiratory tract infections (498,408/561,659, 88.73\%). When conducting its own inquiries in both scenarios, the AI model demonstrated high diagnostic performance, with areas under the receiver operating characteristic curve of 0.955 (95\% CI 0.953-0.956) and 0.943 (95\% CI 0.941-0.944), respectively. 
When the AI model was used in a simulated collaboration with physicians, it notably reduced the average number of physicians' inquiries to 46\% (6.037/13.26; 95\% CI 6.009-6.064) and 43\% (6.245/14.364; 95\% CI 6.225-6.269) while achieving areas under the receiver operating characteristic curve of 0.972 (95\% CI 0.970-0.973) and 0.968 (95\% CI 0.967-0.969) in the scenarios. External validation revealed a normalized Kendall $\tau$ distance of 0.323 (95\% CI 0.301-0.346), indicating the inquiry consistency of the AI model with physicians. Conclusions: This retrospective analysis of predominantly respiratory pediatric presentations in emergency and pediatrics departments demonstrated that an AI-driven diagnostic assistant had high diagnostic performance both in stand-alone use and in simulated collaboration with clinicians. Its investigation process was found to be consistent with the clinicians' medical investigation logic. These findings highlight the diagnostic assistant's promise in assisting the decision-making processes of health care professionals. ", doi="10.2196/54616", url="https://www.jmir.org/2024/1/e54616" } @Article{info:doi/10.2196/53662, author="Ruchonnet-M{\'e}trailler, Isabelle and Siebert, N. Johan and Hartley, Mary-Anne and Lacroix, Laurence", title="Automated Interpretation of Lung Sounds by Deep Learning in Children With Asthma: Scoping Review and Strengths, Weaknesses, Opportunities, and Threats Analysis", journal="J Med Internet Res", year="2024", month="Aug", day="23", volume="26", pages="e53662", keywords="asthma", keywords="wheezing disorders", keywords="artificial intelligence", keywords="deep learning", keywords="machine learning", keywords="respiratory sounds", keywords="auscultation", keywords="stethoscope", keywords="pediatric", keywords="mobile phone", abstract="Background: The interpretation of lung sounds plays a crucial role in the appropriate diagnosis and management of pediatric asthma. Applying artificial intelligence (AI) to this task has the potential to better standardize assessment and may even improve its predictive potential. Objective: This study aims to objectively review the literature on AI-assisted lung auscultation for pediatric asthma and provide a balanced assessment of its strengths, weaknesses, opportunities, and threats. Methods: A scoping review on AI-assisted lung sound analysis in children with asthma was conducted across 4 major scientific databases (PubMed, MEDLINE Ovid, Embase, and Web of Science), supplemented by a gray literature search on Google Scholar, to identify relevant studies published from January 1, 2000, until May 23, 2023. The search strategy incorporated a combination of keywords related to AI, pulmonary auscultation, children, and asthma. The quality of eligible studies was assessed using the ChAMAI (Checklist for the Assessment of Medical Artificial Intelligence). Results: The search identified 7 relevant studies out of 82 (9\%) to be included through an academic literature search, while 11 of 250 (4.4\%) studies from the gray literature search were considered but not included in the subsequent review and quality assessment. All had poor to medium ChAMAI scores, mostly due to the absence of external validation. Identified strengths were improved predictive accuracy of AI to allow for prompt and early diagnosis, personalized management strategies, and remote monitoring capabilities. Weaknesses were the heterogeneity between studies and the lack of standardization in data collection and interpretation. 
Opportunities were the potential of coordinated surveillance, growing data sets, and new ways of collaboratively learning from distributed data. Threats were both generic to the field of medical AI (loss of interpretability) and specific to the use case, as clinicians might lose the skill of auscultation. Conclusions: To realize the opportunities of automated lung auscultation, there is a need to address the weaknesses and threats through large-scale coordinated data collection in globally representative populations and by leveraging new approaches to collaborative learning. ", doi="10.2196/53662", url="https://www.jmir.org/2024/1/e53662", url="http://www.ncbi.nlm.nih.gov/pubmed/39178033" } @Article{info:doi/10.2196/56035, author="Ni, Jiali and Huang, Yong and Xiang, Qiangqiang and Zheng, Qi and Xu, Xiang and Qin, Zhiwen and Sheng, Guoping and Li, Lanjuan", title="Establishment and Evaluation of a Noninvasive Metabolism-Related Fatty Liver Screening and Dynamic Monitoring Model: Cross-Sectional Study", journal="Interact J Med Res", year="2024", month="Aug", day="22", volume="13", pages="e56035", keywords="metabolic-associated fatty liver disease", keywords="nonalcoholic fatty liver disease", keywords="nonalcoholic steatohepatitis", keywords="body fat mass", keywords="waist-height ratio", keywords="basal metabolic rate", keywords="liver", abstract="Background: Metabolically associated fatty liver disease (MAFLD) insidiously affects people's health, and many models have been proposed for the evaluation of liver fibrosis. However, there is still a lack of noninvasive and sensitive models to screen for MAFLD in high-risk populations. Objective: The purpose of this study was to explore a new method for early screening of the public and establish a home-based tool for regular self-assessment and monitoring of MAFLD. Methods: In this cross-sectional study, there were 1758 eligible participants in the training set and 200 eligible participants in the testing set. Routine blood, blood biochemistry, and FibroScan tests were performed, and body composition was analyzed using a body composition instrument. Additionally, we recorded multiple factors including disease-related risk factors, the Forns index score, the hepatic steatosis index (HSI), the triglyceride glucose index, total body water (TBW), body fat mass (BFM), visceral fat area, waist-height ratio (WHtR), and basal metabolic rate. Binary logistic regression analysis was performed to explore the potential anthropometric indicators that have a predictive ability to screen for MAFLD. A new model, named the MAFLD Screening Index (MFSI), was established using binary logistic regression analysis, and BFM, WHtR, and TBW were included. A simple rating table, named the MAFLD Rating Table (MRT), was also established using these indicators. Results: The performance of the HSI (area under the curve [AUC]=0.873, specificity=76.8\%, sensitivity=81.4\%), WHtR (AUC=0.866, specificity=79.8\%, sensitivity=80.8\%), and BFM (AUC=0.842, specificity=76.9\%, sensitivity=76.2\%) in discriminating between the MAFLD group and non-fatty liver group was evaluated (P<.001). The AUC of the combined model including WHtR, HSI, and BFM values was 0.900 (specificity=81.8\%, sensitivity=85.6\%; P<.001). The MFSI was established based on its better performance at screening for MAFLD in the training set (AUC=0.896, specificity=83.8\%, sensitivity=82.1\%) and was confirmed in the testing set (AUC=0.917, specificity=89.8\%, sensitivity=84.4\%; P<.001). 
Conclusions: The novel MFSI model was built using WHtR, BFM, and TBW to screen for early MAFLD. These body parameters can be easily obtained using a body fat scale at home, and the mobile device software can record specific values and perform calculations. MFSI had better performance than other models for early MAFLD screening. The new model showed strong power and stability and shows promise in the area of MAFLD detection and self-assessment. The MRT was a practical tool to assess disease alterations in real time. ", doi="10.2196/56035", url="https://www.i-jmr.org/2024/1/e56035", url="http://www.ncbi.nlm.nih.gov/pubmed/39172506" } @Article{info:doi/10.2196/54740, author="Islam, Nazmul and Reuben, S. Jamie and Dale, Justin and Coates, W. James and Sapiah, Karan and Markson, R. Frank and Jordan, T. Craig and Smith, Clay", title="Predictive Models for Long Term Survival of AML Patients Treated with Venetoclax and Azacitidine or 7+3 Based on Post Treatment Events and Responses: Retrospective Cohort Study", journal="JMIR Cancer", year="2024", month="Aug", day="21", volume="10", pages="e54740", keywords="Leukemia, Myeloid, Acute", keywords="Venetoclax", keywords="Azacitidine", keywords="Anthracycline", keywords="Arabinoside, Cytosine", keywords="Clinical Decision Support", keywords="Clinical Informatics", keywords="Machine Learning", keywords="Predictive Model", keywords="Overall Survival", abstract="Background: The treatment of acute myeloid leukemia (AML) in older or unfit patients typically involves a regimen of venetoclax plus azacitidine (ven/aza). Toxicity and treatment responses are highly variable following treatment initiation and clinical decision-making continually evolves in response to these as treatment progresses. To improve clinical decision support (CDS) following treatment initiation, predictive models based on evolving and dynamic toxicities, disease responses, and other features should be developed. Objective: This study aims to generate machine learning (ML)--based predictive models that incorporate individual predictors of overall survival (OS) for patients with AML, based on clinical events occurring after the initiation of ven/aza or 7+3 regimen. Methods: Data from 221 patients with AML, who received either the ven/aza (n=101 patients) or 7+3 regimen (n=120 patients) as their initial induction therapy, were retrospectively analyzed. We performed stratified univariate and multivariate analyses to quantify the association between toxicities, hospital events, and short-term disease responses and OS for the 7+3 and ven/aza subgroups separately. We compared the estimates of confounders to assess potential effect modifications by treatment. 17 ML-based predictive models were developed. The optimal predictive models were selected based on their predictability and discriminability using cross-validation. Uncertainty in the estimation was assessed through bootstrapping. Results: The cumulative incidence of posttreatment toxicities varies between the ven/aza and 7+3 regimen. A variety of laboratory features and clinical events during the first 30 days were differentially associated with OS for the two treatments. An initial transfer to intensive care unit (ICU) worsened OS for 7+3 patients (aHR 1.18, 95\% CI 1.10-1.28), while ICU readmission adversely affected OS for those on ven/aza (aHR 1.24, 95\% CI 1.12-1.37). 
At the initial follow-up, achieving a morphologic leukemia free state (MLFS) did not affect OS for ven/aza (aHR 0.99, 95\% CI 0.94-1.05), but worsened OS following 7+3 (aHR 1.16, 95\% CI 1.01-1.31) compared to that of complete remission (CR). Having blasts over 5\% at the initial follow-up negatively impacted OS for both 7+3 (P<.001) and ven/aza (P<.001) treated patients. A best response of CR and CR with incomplete recovery (CRi) was superior to MLFS and refractory disease after ven/aza (P<.001), whereas for 7+3, CR was superior to CRi, MLFS, and refractory disease (P<.001), indicating unequal outcomes. Treatment-specific predictive models, trained on 120 7+3 and 101 ven/aza patients using over 114 features, achieved survival AUCs over 0.70. Conclusions: Our findings indicate that toxicities, clinical events, and responses evolve differently in patients receiving ven/aza compared with that of 7+3 regimen. ML-based predictive models were shown to be a feasible strategy for CDS in both forms of AML treatment. If validated with larger and more diverse data sets, these findings could offer valuable insights for developing AML-CDS tools that leverage posttreatment clinical data. ", doi="10.2196/54740", url="https://cancer.jmir.org/2024/1/e54740" } @Article{info:doi/10.2196/54097, author="Tsai, Feng-Fang and Chang, Yung-Chun and Chiu, Yu-Wen and Sheu, Bor-Ching and Hsu, Min-Huei and Yeh, Huei-Ming", title="Machine Learning Model for Anesthetic Risk Stratification for Gynecologic and Obstetric Patients: Cross-Sectional Study Outlining a Novel Approach for Early Detection", journal="JMIR Form Res", year="2024", month="Aug", day="21", volume="8", pages="e54097", keywords="gradient boosting machine", keywords="comorbidity", keywords="gynecological and obstetric procedure", keywords="ASA classification", keywords="American Society of Anesthesiologists", keywords="preoperative evaluation", keywords="machine learning", keywords="machine learning model", keywords="gynecology", keywords="obstetrics", keywords="early detection", keywords="artificial intelligence", keywords="physiological", keywords="gestational", keywords="anesthetic risk", keywords="clinical laboratory data", keywords="laboratory data", keywords="risk", keywords="risk classification", abstract="Background: Preoperative evaluation is important, and this study explored the application of machine learning methods for anesthetic risk classification and the evaluation of the contributions of various factors. To minimize the effects of confounding variables during model training, we used a homogenous group with similar physiological states and ages undergoing similar pelvic organ--related procedures not involving malignancies. Objective: Data on women of reproductive age (age 20-50 years) who underwent gestational or gynecological surgery between January 1, 2017, and December 31, 2021, were obtained from the National Taiwan University Hospital Integrated Medical Database. Methods: We first performed an exploratory analysis and selected key features. We then performed data preprocessing to acquire relevant features related to preoperative examination. To further enhance predictive performance, we used the log-likelihood ratio algorithm to generate comorbidity patterns. Finally, we input the processed features into the light gradient boosting machine (LightGBM) model for training and subsequent prediction. Results: A total of 10,892 patients were included. 
Within this data set, 9893 patients were classified as having low anesthetic risk (American Society of Anesthesiologists physical status score of 1-2), and 999 patients were classified as having high anesthetic risk (American Society of Anesthesiologists physical status score of >2). The area under the receiver operating characteristic curve of the proposed model was 0.6831. Conclusions: By combining comorbidity information and clinical laboratory data, our methodology based on the LightGBM model provides more accurate predictions for anesthetic risk classification. Trial Registration: Research Ethics Committee of the National Taiwan University Hospital 202204010RINB; https://www.ntuh.gov.tw/RECO/Index.action ", doi="10.2196/54097", url="https://formative.jmir.org/2024/1/e54097", url="http://www.ncbi.nlm.nih.gov/pubmed/38991090" } @Article{info:doi/10.2196/52730, author="Mutnuri, Kumar Maruthi and Stelfox, Thomas Henry and Forkert, Daniel Nils and Lee, Joon", title="Using Domain Adaptation and Inductive Transfer Learning to Improve Patient Outcome Prediction in the Intensive Care Unit: Retrospective Observational Study", journal="J Med Internet Res", year="2024", month="Aug", day="21", volume="26", pages="e52730", keywords="transfer learning", keywords="patient outcome prediction", keywords="intensive care", keywords="deep learning", keywords="electronic health record", abstract="Background: Accurate patient outcome prediction in the intensive care unit (ICU) can potentially lead to more effective and efficient patient care. Deep learning models are capable of learning from data to accurately predict patient outcomes, but they typically require large amounts of data and computational resources. Transfer learning (TL) can help in scenarios where data and computational resources are scarce by leveraging pretrained models. While TL has been widely used in medical imaging and natural language processing, it has been rare in electronic health record (EHR) analysis. Furthermore, domain adaptation (DA) has been the most common TL method in general, whereas inductive transfer learning (ITL) has been rare. To the best of our knowledge, DA and ITL have never been studied in-depth in the context of EHR-based ICU patient outcome prediction. Objective: This study investigated DA, as well as rarely researched ITL, in EHR-based ICU patient outcome prediction under simulated, varying levels of data scarcity. Methods: Two patient cohorts were used in this study: (1) eCritical, a multicenter ICU data from 55,689 unique admission records from 48,672 unique patients admitted to 15 medical-surgical ICUs in Alberta, Canada, between March 2013 and December 2019, and (2) Medical Information Mart for Intensive Care III, a single-center, publicly available ICU data set from Boston, Massachusetts, acquired between 2001 and 2012 containing 61,532 admission records from 46,476 patients. We compared DA and ITL models with baseline models (without TL) of fully connected neural networks, logistic regression, and lasso regression in the prediction of 30-day mortality, acute kidney injury, ICU length of stay, and hospital length of stay. Random subsets of training data, ranging from 1\% to 75\%, as well as the full data set, were used to compare the performances of DA and ITL with the baseline models at various levels of data scarcity. Results: Overall, the ITL models outperformed the baseline models in 55 of 56 comparisons (all P values <.001). 
The DA models outperformed the baseline models in 45 of 56 comparisons (all P values <.001). ITL resulted in better performance than DA in terms of the number of times and the margin with which it outperformed the baseline models. In 11 of 16 cases (8 of 8 for ITL and 3 of 8 for DA), TL models outperformed baseline models when trained using 1\% data subset. Conclusions: TL-based ICU patient outcome prediction models are useful in data-scarce scenarios. The results of this study can be used to estimate ICU outcome prediction performance at different levels of data scarcity, with and without TL. The publicly available pretrained models from this study can serve as building blocks in further research for the development and validation of models in other ICU cohorts and outcomes. ", doi="10.2196/52730", url="https://www.jmir.org/2024/1/e52730" } @Article{info:doi/10.2196/46946, author="Rahman, Jessica and Brankovic, Aida and Tracy, Mark and Khanna, Sankalp", title="Exploring Computational Techniques in Preprocessing Neonatal Physiological Signals for Detecting Adverse Outcomes: Scoping Review", journal="Interact J Med Res", year="2024", month="Aug", day="20", volume="13", pages="e46946", keywords="physiological signals", keywords="preterm", keywords="neonatal intensive care unit", keywords="morbidity", keywords="signal processing", keywords="signal analysis", keywords="adverse outcomes", keywords="predictive and diagnostic models", abstract="Background: Computational signal preprocessing is a prerequisite for developing data-driven predictive models for clinical decision support. Thus, identifying the best practices that adhere to clinical principles is critical to ensure transparency and reproducibility to drive clinical adoption. It further fosters reproducible, ethical, and reliable conduct of studies. This procedure is also crucial for setting up a software quality management system to ensure regulatory compliance in developing software as a medical device aimed at early preclinical detection of clinical deterioration. Objective: This scoping review focuses on the neonatal intensive care unit setting and summarizes the state-of-the-art computational methods used for preprocessing neonatal clinical physiological signals; these signals are used for the development of machine learning models to predict the risk of adverse outcomes. Methods: Five databases (PubMed, Web of Science, Scopus, IEEE, and ACM Digital Library) were searched using a combination of keywords and MeSH (Medical Subject Headings) terms. A total of 3585 papers from 2013 to January 2023 were identified based on the defined search terms and inclusion criteria. After removing duplicates, 2994 (83.51\%) papers were screened by title and abstract, and 81 (0.03\%) were selected for full-text review. Of these, 52 (64\%) were eligible for inclusion in the detailed analysis. Results: Of the 52 articles reviewed, 24 (46\%) studies focused on diagnostic models, while the remainder (n=28, 54\%) focused on prognostic models. The analysis conducted in these studies involved various physiological signals, with electrocardiograms being the most prevalent. Different programming languages were used, with MATLAB and Python being notable. The monitoring and capturing of physiological data used diverse systems, impacting data quality and introducing study heterogeneity. 
Outcomes of interest included sepsis, apnea, bradycardia, mortality, necrotizing enterocolitis, and hypoxic-ischemic encephalopathy, with some studies analyzing combinations of adverse outcomes. We found a partial or complete lack of transparency in reporting the setting and the methods used for signal preprocessing. This includes reporting methods to handle missing data, segment size for considered analysis, and details regarding the modification of the state-of-the-art methods for physiological signal processing to align with the clinical principles for neonates. Only 7 (13\%) of the 52 reviewed studies reported all the recommended preprocessing steps, which could have impacts on the downstream analysis. Conclusions: The review found heterogeneity in the techniques used and inconsistent reporting of parameters and procedures used for preprocessing neonatal physiological signals, which is necessary to confirm adherence to clinical and software quality management system practices, usefulness, and choice of best practices. Enhancing transparency in reporting and standardizing procedures will boost study interpretation and reproducibility and expedite clinical adoption, instilling confidence in the research findings and streamlining the translation of research outcomes into clinical practice, ultimately contributing to the advancement of neonatal care and patient outcomes. ", doi="10.2196/46946", url="https://www.i-jmr.org/2024/1/e46946" } @Article{info:doi/10.2196/56514, author="Knitza, Johannes and Hasanaj, Ragip and Beyer, Jonathan and Ganzer, Franziska and Slagman, Anna and Bolanaki, Myrto and Napierala, Hendrik and Schmieding, L. Malte and Al-Zaher, Nizam and Orlemann, Till and Muehlensiepen, Felix and Greenfield, Julia and Vuillerme, Nicolas and Kuhn, Sebastian and Schett, Georg and Achenbach, Stephan and Dechant, Katharina", title="Comparison of Two Symptom Checkers (Ada and Symptoma) in the Emergency Department: Randomized, Crossover, Head-to-Head, Double-Blinded Study", journal="J Med Internet Res", year="2024", month="Aug", day="20", volume="26", pages="e56514", keywords="symptom checker", keywords="triage", keywords="emergency", keywords="eHealth", keywords="diagnostic accuracy", keywords="apps, health service research", keywords="decision support system", abstract="Background: Emergency departments (EDs) are frequently overcrowded and increasingly used by nonurgent patients. Symptom checkers (SCs) offer on-demand access to disease suggestions and recommended actions, potentially improving overall patient flow. Contrary to the increasing use of SCs, there is a lack of supporting evidence based on direct patient use. Objective: This study aimed to compare the diagnostic accuracy, safety, usability, and acceptance of 2 SCs, Ada and Symptoma. Methods: A randomized, crossover, head-to-head, double-blinded study including consecutive adult patients presenting to the ED at University Hospital Erlangen. Patients completed both SCs, Ada and Symptoma. The primary outcome was the diagnostic accuracy of SCs. In total, 6 blinded independent expert raters classified diagnostic concordance of SC suggestions with the final discharge diagnosis as (1) identical, (2) plausible, or (3) diagnostically different. SC suggestions per patient were additionally classified as safe or potentially life-threatening, and the concordance of Ada's and physician-based triage category was assessed. 
Secondary outcomes were SC usability (5-point Likert-scale: 1=very easy to use to 5=very difficult to use) and SC acceptance net promoter score (NPS). Results: A total of 450 patients completed the study between April and November 2021. The most common chief complaint was chest pain (160/437, 37\%). The identical diagnosis was ranked first (or within the top 5 diagnoses) by Ada and Symptoma in 14\% (59/437; 27\%, 117/437) and 4\% (16/437; 13\%, 55/437) of patients, respectively. An identical or plausible diagnosis was ranked first (or within the top 5 diagnoses) by Ada and Symptoma in 58\% (253/437; 75\%, 329/437) and 38\% (164/437; 64\%, 281/437) of patients, respectively. Ada and Symptoma did not suggest potentially life-threatening diagnoses in 13\% (56/437) and 14\% (61/437) of patients, respectively. Ada correctly triaged, undertriaged, and overtriaged 34\% (149/437), 13\% (58/437), and 53\% (230/437) of patients, respectively. A total of 88\% (385/437) and 78\% (342/437) of participants rated Ada and Symptoma as very easy or easy to use, respectively. Ada's NPS was --34 (55\% [239/437] detractors; 21\% [93/437] promoters) and Symptoma's NPS was --47 (63\% [275/437] detractors and 16\% [70/437]) promoters. Conclusions: Ada demonstrated a higher diagnostic accuracy than Symptoma, and substantially more patients would recommend Ada and assessed Ada as easy to use. The high number of unrecognized potentially life-threatening diagnoses by both SCs and inappropriate triage advice by Ada was alarming. Overall, the trustworthiness of SC recommendations appears questionable. SC authorization should necessitate rigorous clinical evaluation studies to prevent misdiagnoses, fatal triage advice, and misuse of scarce medical resources. Trial Registration: German Register of Clinical Trials DRKS00024830; https://drks.de/search/en/trial/DRKS00024830 ", doi="10.2196/56514", url="https://www.jmir.org/2024/1/e56514" } @Article{info:doi/10.2196/50217, author="Bradshaw, Andy and Birtwistle, Jacqueline and Evans, J. Catherine and Sleeman, E. Katherine and Richards, Suzanne and Foy, Robbie and Millares Martin, Pablo and Carder, Paul and Allsop, J. Matthew and Twiddy, Maureen", title="Factors Influencing the Implementation of Digital Advance Care Planning: Qualitative Interview Study", journal="J Med Internet Res", year="2024", month="Aug", day="16", volume="26", pages="e50217", keywords="palliative care", keywords="electronic palliative care coordination systems", keywords="electronic health record systems", keywords="advance care planning", keywords="end of life care", keywords="technology", keywords="Normalization Process Theory", keywords="NPT", keywords="qualitative", abstract="Background: Palliative care aims to improve the quality of life for people with life-limiting illnesses. Advance care planning conversations that establish a patient's wishes and preferences for care are part of a person-centered approach. Internationally, electronic health record systems are digital interventions used to record and share patients' advance care plans across health care services and settings. They aim to provide tools that support electronic information sharing and care coordination. Within the United Kingdom, Electronic Palliative Care Coordination Systems (EPaCCS) are an example of this. Despite over a decade of policy promoting EPaCCS nationally, there has been limited implementation and consistently low levels of use by health professionals. 
Objective: The aim of this study is to explore the factors that influence the implementation of EPaCCS into routine clinical practice across different care services and settings in 2 major regions of England. Methods: A qualitative interview study design was used, guided by Normalization Process Theory (NPT). NPT explores factors affecting the implementation of complex interventions and consists of 4 primary components (coherence, cognitive participation, collective action, and reflexive monitoring). Health care and social care practitioners were purposively sampled based on their professional role and work setting. Individual web-based semistructured interviews were conducted. Data were analyzed using thematic framework analysis to explore issues which affected the implementation of EPaCCS across different settings at individual, team, organizational, and technical levels. Results: Participants (N=52) representing a range of professional roles were recruited across 6 care settings (hospice, primary care, care home, hospital, ambulatory, and community). In total, 6 themes were developed which mapped onto the 4 primary components of NPT and represented the multilevel influences affecting implementation. At an individual level, these included (1) EPaCCS providing a clear and distinct way of working and (2) collective contributions and buy-in. At a team and organizational level, these included (3) embedding EPaCCS into everyday practice and (4) championing driving implementation. At a technical level, these included (5) electronic functionality, interoperability, and access. Breakdowns in implementation at different levels led to variations in (6) confidence and trust in EPaCCS in terms of record accuracy and availability of access. Conclusions: EPaCCS implementation is influenced by individual, organizational, and technical factors. Key challenges include problems with access alongside inconsistent use and engagement across care settings. EPaCCS, in their current format as digital advance care planning systems are not consistently facilitating electronic information sharing and care coordination. A redesign of EPaCCS is likely to be necessary to determine configurations for their optimal implementation across different settings and locations. This includes supporting health care practitioners to document, access, use, and share information across multiple care settings. Lessons learned are relevant to other forms of digital advance care planning approaches being developed internationally. ", doi="10.2196/50217", url="https://www.jmir.org/2024/1/e50217" } @Article{info:doi/10.2196/57670, author="Zheng, Yingbin and Cai, Yunping and Yan, Yiwei and Chen, Sai and Gong, Kai", title="Novel Approach to Personalized Physician Recommendations Using Semantic Features and Response Metrics: Model Evaluation Study", journal="JMIR Hum Factors", year="2024", month="Aug", day="15", volume="11", pages="e57670", keywords="web-based medical service", keywords="text analysis", keywords="Sentence Bidirectional Encoder Representations From Transformers", keywords="SBERT", keywords="smart triage systems", keywords="patient-physician hybrid recommendation", keywords="PPHR", keywords="PPHR model", abstract="Background: The rapid growth of web-based medical services has highlighted the significance of smart triage systems in helping patients find the most appropriate physicians. 
However, traditional triage methods often rely on department recommendations and are insufficient to accurately match patients' textual questions with physicians' specialties. Therefore, there is an urgent need to develop algorithms for recommending physicians. Objective: This study aims to develop and validate a patient-physician hybrid recommendation (PPHR) model with response metrics for better triage performance. Methods: A total of 646,383 web-based medical consultation records from the Internet Hospital of the First Affiliated Hospital of Xiamen University were collected. Semantic features representing patients and physicians were developed to identify the set of most similar questions and semantically expand the pool of recommended physician candidates, respectively. The physicians' response rate feature was designed to improve candidate rankings. These 3 characteristics combine to create the PPHR model. Overall, 5 physicians participated in the evaluation of the efficiency of the PPHR model through multiple metrics and questionnaires as well as the performance of Sentence Bidirectional Encoder Representations from Transformers and Doc2Vec in text embedding. Results: The PPHR model reaches the best recommendation performance when the number of recommended physicians is 14. At this point, the model has an F1-score of 76.25\%, a proportion of high-quality services of 41.05\%, and a rating of 3.90. After removing physicians' characteristics and response rates from the PPHR model, the F1-score decreased by 12.05\%, the proportion of high-quality services fell by 10.87\%, the average hit ratio dropped by 1.06\%, and the rating declined by 11.43\%. According to whether those 5 physicians were recommended by the PPHR model, Sentence Bidirectional Encoder Representations from Transformers achieved an average hit ratio of 88.6\%, while Doc2Vec achieved an average hit ratio of 53.4\%. Conclusions: The PPHR model uses semantic features and response metrics to enable patients to accurately find the physician who best suits their needs. ", doi="10.2196/57670", url="https://humanfactors.jmir.org/2024/1/e57670", url="http://www.ncbi.nlm.nih.gov/pubmed/39146009" } @Article{info:doi/10.2196/57162, author="Szumilas, Dawid and Ochmann, Anna and Zi{\k{e}}ba, Katarzyna and Bartoszewicz, Bart{\l}omiej and Kubrak, Anna and Makuch, Sebastian and Agrawal, Siddarth and Mazur, Grzegorz and Chudek, Jerzy", title="Evaluation of AI-Driven LabTest Checker for Diagnostic Accuracy and Safety: Prospective Cohort Study", journal="JMIR Med Inform", year="2024", month="Aug", day="14", volume="12", pages="e57162", keywords="LabTest Checker", keywords="CDSS", keywords="symptom checker", keywords="laboratory testing", keywords="AI", keywords="assessment", keywords="accuracy", keywords="artificial intelligence", keywords="health care", keywords="medical fields", keywords="clinical decision support systems", keywords="application", keywords="applications", keywords="diagnoses", keywords="patients", keywords="patient", keywords="medical history", keywords="tool", keywords="tools", abstract="Background: In recent years, the implementation of artificial intelligence (AI) in health care has been progressively transforming medical fields, with the use of clinical decision support systems (CDSSs) as a notable application. Laboratory tests are vital for accurate diagnoses, but the increasing reliance on them presents challenges.
The need for effective strategies for managing laboratory test interpretation is evident from the millions of monthly searches on test results' significance. As the potential role of CDSSs in laboratory diagnostics gains significance, however, more research is needed to explore this area. Objective: The primary objective of our study was to assess the accuracy and safety of LabTest Checker (LTC), a CDSS designed to support medical diagnoses by analyzing both laboratory test results and patients' medical histories. Methods: This cohort study embraced a prospective data collection approach. A total of 101 patients aged $\geq$18 years, in stable condition, and requiring comprehensive diagnosis were enrolled. A panel of blood laboratory tests was conducted for each participant. Participants used LTC for test result interpretation. The accuracy and safety of the tool were assessed by comparing AI-generated suggestions to experienced doctor (consultant) recommendations, which are considered the gold standard. Results: The system achieved a 74.3\% accuracy and 100\% sensitivity for emergency safety and 92.3\% sensitivity for urgent cases. It potentially reduced unnecessary medical visits by 41.6\% (42/101) and achieved an 82.9\% accuracy in identifying underlying pathologies. Conclusions: This study underscores the transformative potential of AI-based CDSSs in laboratory diagnostics, contributing to enhanced patient care, efficient health care systems, and improved medical outcomes. LTC's performance evaluation highlights the advancements in AI's role in laboratory medicine. Trial Registration: ClinicalTrials.gov NCT05813938; https://clinicaltrials.gov/study/NCT05813938 ", doi="10.2196/57162", url="https://medinform.jmir.org/2024/1/e57162" } @Article{info:doi/10.2196/52506, author="Wang, Yueye and Han, Xiaotong and Li, Cong and Luo, Lixia and Yin, Qiuxia and Zhang, Jian and Peng, Guankai and Shi, Danli and He, Mingguang", title="Impact of Gold-Standard Label Errors on Evaluating Performance of Deep Learning Models in Diabetic Retinopathy Screening: Nationwide Real-World Validation Study", journal="J Med Internet Res", year="2024", month="Aug", day="14", volume="26", pages="e52506", keywords="artificial intelligence", keywords="diabetic retinopathy", keywords="diabetes", keywords="real world", keywords="deep learning", abstract="Background: For medical artificial intelligence (AI) training and validation, human expert labels are considered the gold standard that represents the correct answers or desired outputs for a given data set. These labels serve as a reference or benchmark against which the model's predictions are compared. Objective: This study aimed to assess the accuracy of a custom deep learning (DL) algorithm on classifying diabetic retinopathy (DR) and further demonstrate how label errors may contribute to this assessment in a nationwide DR-screening program. Methods: Fundus photographs from the Lifeline Express, a nationwide DR-screening program, were analyzed to identify the presence of referable DR using both (1) manual grading by National Health Service England--certificated graders and (2) a DL-based DR-screening algorithm with validated good lab performance. To assess the accuracy of labels, a random sample of images with disagreement between the DL algorithm and the labels was adjudicated by ophthalmologists who were masked to the previous grading results.
The error rates of labels in this sample were then used to correct the number of negative and positive cases in the entire data set, serving as postcorrection labels. The DL algorithm's performance was evaluated against both pre- and postcorrection labels. Results: The analysis included 736,083 images from 237,824 participants. The DL algorithm exhibited a gap between the real-world performance and the lab-reported performance in this nationwide data set, with a sensitivity increase of 12.5\% (from 79.6\% to 92.5\%, P<.001) and a specificity increase of 6.9\% (from 91.6\% to 98.5\%, P<.001). In the random sample, 63.6\% (560/880) of negative images and 5.2\% (140/2710) of positive images were misclassified in the precorrection human labels. High myopia was the primary reason for misclassifying non-DR images as referable DR images, while laser spots were predominantly responsible for misclassified referable cases. The estimated label error rate for the entire data set was 1.2\%. The label correction was estimated to bring about a 12.5\% enhancement in the estimated sensitivity of the DL algorithm (P<.001). Conclusions: Label errors based on human image grading, although in a small percentage, can significantly affect the performance evaluation of DL algorithms in real-world DR screening. ", doi="10.2196/52506", url="https://www.jmir.org/2024/1/e52506" } @Article{info:doi/10.2196/51706, author="Chen, Binjun and Li, Yike and Sun, Yu and Sun, Haojie and Wang, Yanmei and Lyu, Jihan and Guo, Jiajie and Bao, Shunxing and Cheng, Yushu and Niu, Xun and Yang, Lian and Xu, Jianghong and Yang, Juanmei and Huang, Yibo and Chi, Fanglu and Liang, Bo and Ren, Dongdong", title="A 3D and Explainable Artificial Intelligence Model for Evaluation of Chronic Otitis Media Based on Temporal Bone Computed Tomography: Model Development, Validation, and Clinical Application", journal="J Med Internet Res", year="2024", month="Aug", day="8", volume="26", pages="e51706", keywords="artificial intelligence", keywords="cholesteatoma", keywords="deep learning", keywords="otitis media", keywords="tomography, x-ray computed", keywords="machine learning", keywords="mastoidectomy", keywords="convolutional neural networks", keywords="temporal bone", abstract="Background: Temporal bone computed tomography (CT) helps diagnose chronic otitis media (COM). However, its interpretation requires training and expertise. Artificial intelligence (AI) can help clinicians evaluate COM through CT scans, but existing models lack transparency and may not fully leverage multidimensional diagnostic information. Objective: We aimed to develop an explainable AI system based on 3D convolutional neural networks (CNNs) for automatic CT-based evaluation of COM. Methods: Temporal bone CT scans were retrospectively obtained from patients operated for COM between December 2015 and July 2021 at 2 independent institutes. A region of interest encompassing the middle ear was automatically segmented, and 3D CNNs were subsequently trained to identify pathological ears and cholesteatoma. An ablation study was performed to refine model architecture. Benchmark tests were conducted against a baseline 2D model and 7 clinical experts. Model performance was measured through cross-validation and external validation. Heat maps, generated using Gradient-Weighted Class Activation Mapping, were used to highlight critical decision-making regions. Finally, the AI system was assessed with a prospective cohort to aid clinicians in preoperative COM assessment. 
Results: Internal and external data sets contained 1661 and 108 patients (3153 and 211 eligible ears), respectively. The 3D model exhibited decent performance with mean areas under the receiver operating characteristic curves of 0.96 (SD 0.01) and 0.93 (SD 0.01), and mean accuracies of 0.878 (SD 0.017) and 0.843 (SD 0.015), respectively, for detecting pathological ears on the 2 data sets. Similar outcomes were observed for cholesteatoma identification (mean area under the receiver operating characteristic curve 0.85, SD 0.03 and 0.83, SD 0.05; mean accuracies 0.783, SD 0.04 and 0.813, SD 0.033, respectively). The proposed 3D model achieved a commendable balance between performance and network size relative to alternative models. It significantly outperformed the 2D approach in detecting COM (P$\leq$.05) and exhibited a substantial gain in identifying cholesteatoma (P<.001). The model also demonstrated superior diagnostic capabilities over resident fellows and the attending otologist (P<.05), rivaling all senior clinicians in both tasks. The generated heat maps properly highlighted the middle ear and mastoid regions, aligning with human knowledge in interpreting temporal bone CT. The resulting AI system achieved an accuracy of 81.8\% in generating preoperative diagnoses for 121 patients and contributed to clinical decision-making in 90.1\% of cases. Conclusions: We present a 3D CNN model trained to detect pathological changes and identify cholesteatoma via temporal bone CT scans. In both tasks, this model significantly outperforms the baseline 2D approach, achieving levels comparable with or surpassing those of human experts. The model also exhibits decent generalizability and enhanced comprehensibility. This AI system facilitates automatic COM assessment and shows promising viability in real-world clinical settings. These findings underscore AI's potential as a valuable aid for clinicians in COM evaluation. Trial Registration: Chinese Clinical Trial Registry ChiCTR2000036300; https://www.chictr.org.cn/showprojEN.html?proj=58685 ", doi="10.2196/51706", url="https://www.jmir.org/2024/1/e51706" } @Article{info:doi/10.2196/53338, author="Subramanian, Devika and Sonabend, Rona and Singh, Ila", title="A Machine Learning Model for Risk Stratification of Postdiagnosis Diabetic Ketoacidosis Hospitalization in Pediatric Type 1 Diabetes: Retrospective Study", journal="JMIR Diabetes", year="2024", month="Aug", day="7", volume="9", pages="e53338", keywords="pediatric type 1 diabetes", keywords="postdiagnosis diabetic ketoacidosis", keywords="risk prediction and stratification", keywords="XGBoost", keywords="Shapley values", keywords="ketoacidosis", keywords="risks", keywords="predict", keywords="prediction", keywords="predictive", keywords="gradient-boosted ensemble model", keywords="diabetes", keywords="pediatrics", keywords="children", keywords="machine learning", abstract="Background: Diabetic ketoacidosis (DKA) is the leading cause of morbidity and mortality in pediatric type 1 diabetes (T1D), occurring in approximately 20\% of patients, with an economic cost of \$5.1 billion/year in the United States. Despite multiple risk factors for postdiagnosis DKA, there is still a need for explainable, clinic-ready models that accurately predict DKA hospitalization in established patients with pediatric T1D.
Objective: We aimed to develop an interpretable machine learning model to predict the risk of postdiagnosis DKA hospitalization in children with T1D using routinely collected time-series of electronic health record (EHR) data. Methods: We conducted a retrospective case-control study using EHR data from 1787 patients from among 3794 patients with T1D treated at a large tertiary care US pediatric health system from January 2010 to June 2018. We trained a state-of-the-art, explainable, gradient-boosted ensemble (XGBoost) of decision trees with 44 regularly collected EHR features to predict postdiagnosis DKA. We measured the model's predictive performance using the area under the receiver operating characteristic curve, weighted F1-score, weighted precision, and recall in a 5-fold cross-validation setting. We analyzed Shapley values to interpret the learned model and gain insight into its predictions. Results: Our model distinguished the cohort that develops DKA postdiagnosis from the one that does not (P<.001). It predicted postdiagnosis DKA risk with an area under the receiver operating characteristic curve of 0.80 (SD 0.04), a weighted F1-score of 0.78 (SD 0.04), and a weighted precision and recall of 0.83 (SD 0.03) and 0.76 (SD 0.05), respectively, using a relatively short history of data from routine clinic follow-ups post diagnosis. On analyzing Shapley values of the model output, we identified key risk factors predicting postdiagnosis DKA both at the cohort and individual levels. We observed sharp changes in postdiagnosis DKA risk with respect to 2 key features (diabetes age and glycated hemoglobin at 12 months), yielding time intervals and glycated hemoglobin cutoffs for potential intervention. By clustering model-generated Shapley values, we automatically stratified the cohort into 3 groups with 5\%, 20\%, and 48\% risk of postdiagnosis DKA. Conclusions: We have built an explainable, predictive, machine learning model with potential for integration into clinical workflow. The model risk-stratifies patients with pediatric T1D and identifies patients with the highest postdiagnosis DKA risk using limited follow-up data starting from the time of diagnosis. The model identifies key time points and risk factors to direct clinical interventions at both the individual and cohort levels. Further research with data from multiple hospital systems can help us assess how well our model generalizes to other populations. The clinical importance of our work is that the model can predict patients most at risk for postdiagnosis DKA and identify preventive interventions based on mitigation of individualized risk factors. ", doi="10.2196/53338", url="https://diabetes.jmir.org/2024/1/e53338", url="http://www.ncbi.nlm.nih.gov/pubmed/39110490" } @Article{info:doi/10.2196/48584, author="Nare, Matthew and Jurewicz, Katherina", title="Assessing Patient Trust in Automation in Health Care Systems: Within-Subjects Experimental Study", journal="JMIR Hum Factors", year="2024", month="Aug", day="6", volume="11", pages="e48584", keywords="automation", keywords="emergency department", keywords="trust", keywords="health care", keywords="artificial intelligence", keywords="emergency", keywords="perceptions", keywords="attitude", keywords="opinions", keywords="belief", keywords="automated", keywords="trust ratings", abstract="Background: Health care technology has the ability to change patient outcomes for the better when designed appropriately.
Automation is becoming smarter and is increasingly being integrated into health care work systems. Objective: This study focuses on investigating trust between patients and an automated cardiac risk assessment tool (CRAT) in a simulated emergency department setting. Methods: A within-subjects experimental study was performed to investigate differences in automation modes for the CRAT: (1) no automation, (2) automation only, and (3) semiautomation. Participants were asked to enter their simulated symptoms for each scenario into the CRAT as instructed by the experimenter, and they would automatically be classified as high, medium, or low risk depending on the symptoms entered. Participants were asked to provide their trust ratings for each combination of risk classification and automation mode on a scale of 1 to 10 (1=absolutely no trust and 10=complete trust). Results: Results from this study indicate that the participants significantly trusted the semiautomation condition more compared to the automation-only condition (P=.002), and they trusted the no automation condition significantly more than the automation-only condition (P=.03). Additionally, participants significantly trusted the CRAT more in the high-severity scenario compared to the medium-severity scenario (P=.004). Conclusions: The findings from this study emphasize the importance of the human component of automation when designing automated technology in health care systems. Automation and artificially intelligent systems are becoming more prevalent in health care systems, and this work emphasizes the need to consider the human element when designing automation into care delivery. ", doi="10.2196/48584", url="https://humanfactors.jmir.org/2024/1/e48584", url="http://www.ncbi.nlm.nih.gov/pubmed/39106096" } @Article{info:doi/10.2196/57224, author="Dingel, Julius and Kleine, Anne-Kathrin and Cecil, Julia and Sigl, Leonie Anna and Lermer, Eva and Gaube, Susanne", title="Predictors of Health Care Practitioners' Intention to Use AI-Enabled Clinical Decision Support Systems: Meta-Analysis Based on the Unified Theory of Acceptance and Use of Technology", journal="J Med Internet Res", year="2024", month="Aug", day="5", volume="26", pages="e57224", keywords="Unified Theory of Acceptance and Use of Technology", keywords="UTAUT", keywords="artificial intelligence--enabled clinical decision support systems", keywords="AI-CDSSs", keywords="meta-analysis", keywords="health care practitioners", abstract="Background: Artificial intelligence--enabled clinical decision support systems (AI-CDSSs) offer potential for improving health care outcomes, but their adoption among health care practitioners remains limited. Objective: This meta-analysis identified predictors influencing health care practitioners' intention to use AI-CDSSs based on the Unified Theory of Acceptance and Use of Technology (UTAUT). Additional predictors were examined based on existing empirical evidence. Methods: The literature search using electronic databases, forward searches, conference programs, and personal correspondence yielded 7731 results, of which 17 (0.22\%) studies met the inclusion criteria. Random-effects meta-analysis, relative weight analyses, and meta-analytic moderation and mediation analyses were used to examine the relationships between relevant predictor variables and the intention to use AI-CDSSs. Results: The meta-analysis results supported the application of the UTAUT to the context of the intention to use AI-CDSSs. 
The results showed that performance expectancy (r=0.66), effort expectancy (r=0.55), social influence (r=0.66), and facilitating conditions (r=0.66) were positively associated with the intention to use AI-CDSSs, in line with the predictions of the UTAUT. The meta-analysis further identified positive attitude (r=0.63), trust (r=0.73), anxiety (r=--0.41), perceived risk (r=--0.21), and innovativeness (r=0.54) as additional relevant predictors. Trust emerged as the most influential predictor overall. The results of the moderation analyses show that the relationship between social influence and use intention becomes weaker with increasing age. In addition, the relationship between effort expectancy and use intention was stronger for diagnostic AI-CDSSs than for devices that combined diagnostic and treatment recommendations. Finally, the relationship between facilitating conditions and use intention was mediated through performance and effort expectancy. Conclusions: This meta-analysis contributes to the understanding of the predictors of intention to use AI-CDSSs based on an extended UTAUT model. More research is needed to substantiate the identified relationships and explain the observed variations in effect sizes by identifying relevant moderating factors. The research findings bear important implications for the design and implementation of training programs for health care practitioners to ease the adoption of AI-CDSSs into their practice. ", doi="10.2196/57224", url="https://www.jmir.org/2024/1/e57224" } @Article{info:doi/10.2196/49655, author="Kamel Rahimi, Amir and Pienaar, Oliver and Ghadimi, Moji and Canfell, J. Oliver and Pole, D. Jason and Shrapnel, Sally and van der Vegt, H. Anton and Sullivan, Clair", title="Implementing AI in Hospitals to Achieve a Learning Health System: Systematic Review of Current Enablers and Barriers", journal="J Med Internet Res", year="2024", month="Aug", day="2", volume="26", pages="e49655", keywords="life cycle", keywords="medical informatics", keywords="decision support system", keywords="clinical", keywords="electronic health records", keywords="artificial intelligence", keywords="machine learning", keywords="routinely collected health data", abstract="Background: Efforts are underway to capitalize on the computational power of the data collected in electronic medical records (EMRs) to achieve a learning health system (LHS). Artificial intelligence (AI) in health care has promised to improve clinical outcomes, and many researchers are developing AI algorithms on retrospective data sets. Integrating these algorithms with real-time EMR data is rare. There is a poor understanding of the current enablers and barriers to empower this shift from data set--based use to real-time implementation of AI in health systems. Exploring these factors holds promise for uncovering actionable insights toward the successful integration of AI into clinical workflows. Objective: The first objective was to conduct a systematic literature review to identify the evidence of enablers and barriers regarding the real-world implementation of AI in hospital settings. The second objective was to map the identified enablers and barriers to a 3-horizon framework to enable the successful digital health transformation of hospitals to achieve an LHS. Methods: The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines were adhered to. 
PubMed, Scopus, Web of Science, and IEEE Xplore were searched for studies published between January 2010 and January 2022. Articles with case studies and guidelines on the implementation of AI analytics in hospital settings using EMR data were included. We excluded studies conducted in primary and community care settings. Quality assessment of the identified papers was conducted using the Mixed Methods Appraisal Tool and ADAPTE frameworks. We coded evidence from the included studies that related to enablers of and barriers to AI implementation. The findings were mapped to the 3-horizon framework to provide a road map for hospitals to integrate AI analytics. Results: Of the 1247 studies screened, 26 (2.09\%) met the inclusion criteria. In total, 65\% (17/26) of the studies implemented AI analytics for enhancing the care of hospitalized patients, whereas the remaining 35\% (9/26) provided implementation guidelines. Of the final 26 papers, the quality of 21 (81\%) was assessed as poor. A total of 28 enablers was identified; 8 (29\%) were new in this study. A total of 18 barriers was identified; 5 (28\%) were newly found. Most of these newly identified factors were related to information and technology. Actionable recommendations for the implementation of AI toward achieving an LHS were provided by mapping the findings to a 3-horizon framework. Conclusions: Significant issues exist in implementing AI in health care. Shifting from validating data sets to working with live data is challenging. This review incorporated the identified enablers and barriers into a 3-horizon framework, offering actionable recommendations for implementing AI analytics to achieve an LHS. The findings of this study can assist hospitals in steering their strategic planning toward successful adoption of AI. ", doi="10.2196/49655", url="https://www.jmir.org/2024/1/e49655" } @Article{info:doi/10.2196/56924, author="Katzburg, Omer and Roimi, Michael and Frenkel, Amit and Ilan, Roy and Bitan, Yuval", title="The Impact of Information Relevancy and Interactivity on Intensivists' Trust in a Machine Learning--Based Bacteremia Prediction System: Simulation Study", journal="JMIR Hum Factors", year="2024", month="Aug", day="1", volume="11", pages="e56924", keywords="user-interface design", keywords="user-interface designs", keywords="user interface", keywords="human-automation interaction", keywords="human-automation interactions", keywords="trust in automation", keywords="automation", keywords="human-computer interaction", keywords="human-computer interactions", keywords="human-ML", keywords="human-ML interaction", keywords="human-ML interactions", keywords="decision making", keywords="decision support system", keywords="clinical decision support", keywords="decision support", keywords="decision support systems", keywords="machine learning", keywords="ML", keywords="artificial intelligence", keywords="AI", keywords="machine learning algorithm", keywords="machine learning algorithms", keywords="digitization", keywords="digitization of information", abstract="Background: The exponential growth in computing power and the increasing digitization of information have substantially advanced the machine learning (ML) research field. However, ML algorithms are often considered ``black boxes,'' and this fosters distrust. In medical domains, in which mistakes can result in fatal outcomes, practitioners may be especially reluctant to trust ML algorithms. 
Objective: The aim of this study is to explore the effect of user-interface design features on intensivists' trust in an ML-based clinical decision support system. Methods: A total of 47 physicians from critical care specialties were presented with 3 patient cases of bacteremia in the setting of an ML-based simulation system. Three conditions of the simulation were tested according to combinations of information relevancy and interactivity. Participants' trust in the system was assessed by their agreement with the system's prediction and a postexperiment questionnaire. Linear regression models were applied to measure the effects. Results: Participants' agreement with the system's prediction did not differ according to the experimental conditions. However, in the postexperiment questionnaire, higher information relevancy ratings and interactivity ratings were associated with higher perceived trust in the system (P<.001 for both). The explicit visual presentation of the features of the ML algorithm on the user interface resulted in lower trust among the participants (P=.05). Conclusions: Information relevancy and interactivity features should be considered in the design of the user interface of ML-based clinical decision support systems to enhance intensivists' trust. This study sheds light on the connection between information relevancy, interactivity, and trust in human-ML interaction, specifically in the intensive care unit environment. ", doi="10.2196/56924", url="https://humanfactors.jmir.org/2024/1/e56924" } @Article{info:doi/10.2196/54009, author="Hassan, Ayman and Benlamri, Rachid and Diner, Trina and Cristofaro, Keli and Dillistone, Lucas and Khallouki, Hajar and Ahghari, Mahvareh and Littlefield, Shalyn and Siddiqui, Rabail and MacDonald, Russell and Savage, W. David", title="An App for Navigating Patient Transportation and Acute Stroke Care in Northwestern Ontario Using Machine Learning: Retrospective Study", journal="JMIR Form Res", year="2024", month="Aug", day="1", volume="8", pages="e54009", keywords="stroke care", keywords="acute stroke", keywords="northwestern", keywords="Ontario", keywords="prediction", keywords="models", keywords="machine learning", keywords="stroke", keywords="cardiovascular", keywords="brain", keywords="neuroscience", keywords="TIA", keywords="transient ischemic attack", keywords="coordinated care", keywords="navigation", keywords="navigating", keywords="mHealth", keywords="mobile health", keywords="app", keywords="apps", keywords="applications", keywords="geomapping", keywords="geography", keywords="geographical", keywords="location", keywords="spatial", keywords="predict", keywords="predictions", keywords="predictive", abstract="Background: A coordinated care system helps provide timely access to treatment for suspected acute stroke. In Northwestern Ontario (NWO), Canada, communities are widespread with several hospitals offering various diagnostic equipment and services. Thus, resources are limited, and health care providers must often transfer patients with stroke to different hospital locations to ensure the most appropriate care access within recommended time frames. However, health care providers frequently situated temporarily (locum) in NWO or providing care remotely from other areas of Ontario may lack sufficient information and experience in the region to access care for a patient with a time-sensitive condition. 
Suboptimal decision-making may lead to multiple transfers before definitive stroke care is obtained, resulting in poor outcomes and additional health care system costs. Objective: We aimed to develop a tool to inform and assist NWO health care providers in determining the best transfer options for patients with stroke to provide the most efficient care access. We aimed to develop an app using a comprehensive geomapping navigation and estimation system based on machine learning algorithms. This app uses key stroke-related timelines including the last time the patient was known to be well, patient location, treatment options, and imaging availability at different health care facilities. Methods: Using historical data (2008-2020), an accurate prediction model using machine learning methods was developed and incorporated into a mobile app. These data contained parameters regarding air (Ornge) and land medical transport (3 services), which were preprocessed and cleaned. For cases in which Ornge air services and land ambulance medical transport were both involved in a patient transport process, data were merged and time intervals of the transport journey were determined. The data were distributed for training (35\%), testing (35\%), and validation (30\%) of the prediction model. Results: In total, 70,623 records were collected in the data set from Ornge and land medical transport services to develop a prediction model. Various learning models were analyzed; all learning models perform better than the simple average of all points in predicting output variables. The decision tree model provided more accurate results than the other models. The decision tree model performed remarkably well, with the values from testing, validation, and the model within a close range. This model was used to develop the ``NWO Navigate Stroke'' system. The system provides accurate results and demonstrates that a mobile app can be a significant tool for health care providers navigating stroke care in NWO, potentially impacting patient care and outcomes. Conclusions: The NWO Navigate Stroke system uses a data-driven, reliable, accurate prediction model while considering all variations and is simultaneously linked to all required acute stroke management pathways and tools. It was tested using historical data, and the next step will to involve usability testing with end users. ", doi="10.2196/54009", url="https://formative.jmir.org/2024/1/e54009" } @Article{info:doi/10.2196/53562, author="Pellemans, Mathijs and Salmi, Salim and M{\'e}relle, Saskia and Janssen, Wilco and van der Mei, Rob", title="Automated Behavioral Coding to Enhance the Effectiveness of Motivational Interviewing in a Chat-Based Suicide Prevention Helpline: Secondary Analysis of a Clinical Trial", journal="J Med Internet Res", year="2024", month="Aug", day="1", volume="26", pages="e53562", keywords="motivational interviewing", keywords="behavioral coding", keywords="suicide prevention", keywords="artificial intelligence", keywords="effectiveness", keywords="counseling", keywords="support tool", keywords="online help", keywords="mental health", abstract="Background: With the rise of computer science and artificial intelligence, analyzing large data sets promises enormous potential in gaining insights for developing and improving evidence-based health interventions. One such intervention is the counseling strategy motivational interviewing (MI), which has been found effective in improving a wide range of health-related behaviors. 
Despite the simplicity of its principles, MI can be a challenging skill to learn and requires expertise to apply effectively. Objective: This study aims to investigate the performance of artificial intelligence models in classifying MI behavior and explore the feasibility of using these models in online helplines for mental health as an automated support tool for counselors in clinical practice. Methods: We used a coded data set of 253 MI counseling chat sessions from the 113 Suicide Prevention helpline. With 23,982 messages coded with the MI Sequential Code for Observing Process Exchanges codebook, we trained and evaluated 4 machine learning models and 1 deep learning model to classify client- and counselor MI behavior based on language use. Results: The deep learning model BERTje outperformed all machine learning models, accurately predicting counselor behavior (accuracy=0.72, area under the curve [AUC]=0.95, Cohen $\kappa$=0.69). It differentiated MI congruent and incongruent counselor behavior (AUC=0.92, $\kappa$=0.65) and evocative and nonevocative language (AUC=0.92, $\kappa$=0.66). For client behavior, the model achieved an accuracy of 0.70 (AUC=0.89, $\kappa$=0.55). The model's interpretable predictions discerned client change talk and sustain talk, counselor affirmations, and reflection types, facilitating valuable counselor feedback. Conclusions: The results of this study demonstrate that artificial intelligence techniques can accurately classify MI behavior, indicating their potential as a valuable tool for enhancing MI proficiency in online helplines for mental health. Provided that the data set size is sufficiently large with enough training samples for each behavioral code, these methods can be trained and applied to other domains and languages, offering a scalable and cost-effective way to evaluate MI adherence, accelerate behavioral coding, and provide therapists with personalized, quick, and objective feedback. ", doi="10.2196/53562", url="https://www.jmir.org/2024/1/e53562", url="http://www.ncbi.nlm.nih.gov/pubmed/39088244" } @Article{info:doi/10.2196/50849, author="den Braber, Niala and Braem, R. Carlijn I. and Vollenbroek-Hutten, R. Miriam M. and Hermens, J. Hermie and Urgert, Thomas and Yavuz, S. Utku and Veltink, H. Peter and Laverman, D. Gozewijn", title="Consequences of Data Loss on Clinical Decision-Making in Continuous Glucose Monitoring: Retrospective Cohort Study", journal="Interact J Med Res", year="2024", month="Jul", day="31", volume="13", pages="e50849", keywords="continuous glucose monitoring", keywords="missing data", keywords="clinical decision-making", keywords="clinical targets", keywords="time below range", keywords="TBR", keywords="diabetes mellitus", keywords="data interpretation", keywords="clinical practice", keywords="data analysis", keywords="continuous glucose monitoring metrics", keywords="glucose", keywords="diabetes", keywords="diabetic", keywords="metrics", keywords="data loss", keywords="decision-making", keywords="decision support", keywords="missing values", keywords="data science", abstract="Background: The impact of missing data on individual continuous glucose monitoring (CGM) data is unknown but can influence clinical decision-making for patients. Objective: We aimed to investigate the consequences of data loss on glucose metrics in individual patient recordings from continuous glucose monitors and assess its implications on clinical decision-making. 
Methods: The CGM data were collected from patients with type 1 and 2 diabetes using the FreeStyle Libre sensor (Abbott Diabetes Care). We selected 7-28 days of 24 hours of continuous data without any missing values from each individual patient. To mimic real-world data loss, missing data ranging from 5\% to 50\% were introduced into the data set. From this modified data set, clinical metrics including time below range (TBR), TBR level 2 (TBR2), and other common glucose metrics were calculated in the data sets with and without data loss. Recordings in which glucose metrics deviated relevantly due to data loss, as determined by clinical experts, were defined as expert panel boundary error ($\epsilon$EPB). These errors were expressed as a percentage of the total number of recordings. The errors for the recordings with glucose management indicator <53 mmol/mol were investigated. Results: A total of 84 patients contributed to 798 recordings over 28 days. With 5\%-50\% data loss for 7-28 days recordings, the $\epsilon$EPB varied from 0 out of 798 (0.0\%) to 147 out of 736 (20.0\%) for TBR and 0 out of 612 (0.0\%) to 22 out of 408 (5.4\%) recordings for TBR2. In the case of 14-day recordings, TBR and TBR2 episodes completely disappeared due to 30\% data loss in 2 out of 786 (0.3\%) and 32 out of 522 (6.1\%) of the cases, respectively. However, the initial values of the disappeared TBR and TBR2 were relatively small (<0.1\%). In the recordings with glucose management indicator <53 mmol/mol, the $\epsilon$EPB was 9.6\% for 14 days with 30\% data loss. Conclusions: With a maximum of 30\% data loss in 14-day CGM recordings, there is minimal impact of missing data on the clinical interpretation of various glucose metrics. Trial Registration: ClinicalTrials.gov NCT05584293; https://clinicaltrials.gov/study/NCT05584293 ", doi="10.2196/50849", url="https://www.i-jmr.org/2024/1/e50849" } @Article{info:doi/10.2196/50067, author="Yang, Jingang and Li, Yingxue and Li, Xiang and Tao, Shuiying and Zhang, Yuan and Chen, Tiange and Xie, Guotong and Xu, Haiyan and Gao, Xiaojin and Yang, Yuejin", title="A Machine Learning Model for Predicting In-Hospital Mortality in Chinese Patients With ST-Segment Elevation Myocardial Infarction: Findings From the China Myocardial Infarction Registry", journal="J Med Internet Res", year="2024", month="Jul", day="30", volume="26", pages="e50067", keywords="ST-elevation myocardial infarction", keywords="in-hospital mortality", keywords="risk prediction", keywords="explainable machine learning", keywords="machine learning", keywords="acute myocardial infarction", keywords="myocardial infarction", keywords="mortality", keywords="risk", keywords="predication model", keywords="china", keywords="clinical practice", keywords="validation", keywords="patient management", keywords="management", abstract="Background: Machine learning (ML) risk prediction models, although much more accurate than traditional statistical methods, are inconvenient to use in clinical practice due to their nontransparency and requirement of a large number of input variables. Objective: We aimed to develop a precise, explainable, and flexible ML model to predict the risk of in-hospital mortality in patients with ST-segment elevation myocardial infarction (STEMI).
Methods: This study recruited 18,744 patients enrolled in the 2013 China Acute Myocardial Infarction (CAMI) registry and 12,018 patients from the China Patient-Centered Evaluative Assessment of Cardiac Events (PEACE)-Retrospective Acute Myocardial Infarction Study. The Extreme Gradient Boosting (XGBoost) model was derived from 9616 patients in the CAMI registry (2014, 89 variables) with 5-fold cross-validation and validated on both the 9125 patients in the CAMI registry (89 variables) and the independent China PEACE cohort (10 variables). The Shapley Additive Explanations (SHAP) approach was employed to interpret the complex relationships embedded in the proposed model. Results: In the XGBoost model for predicting all-cause in-hospital mortality, the variables with the top 8 most important scores were age, left ventricular ejection fraction, Killip class, heart rate, creatinine, blood glucose, white blood cell count, and use of angiotensin-converting enzyme inhibitors (ACEIs) and angiotensin II receptor blockers (ARBs). The area under the curve (AUC) on the CAMI validation set was 0.896 (95\% CI 0.884-0.909), significantly higher than the previous models. The AUC for the Global Registry of Acute Coronary Events (GRACE) model was 0.809 (95\% CI 0.790-0.828), and for the TIMI model, it was 0.782 (95\% CI 0.763-0.800). Despite the China PEACE validation set only having 10 available variables, the AUC reached 0.840 (0.829-0.852), showing a substantial improvement to the GRACE (0.762, 95\% CI 0.748-0.776) and TIMI (0.789, 95\% CI 0.776-0.803) scores. Several novel and nonlinear relationships were discovered between patients' characteristics and in-hospital mortality, including a U-shape pattern of high-density lipoprotein cholesterol (HDL-C). Conclusions: The proposed ML risk prediction model was highly accurate in predicting in-hospital mortality. Its flexible and explainable characteristics make the model convenient to use in clinical practice and could help guide patient management. Trial Registration: ClinicalTrials.gov NCT01874691; https://clinicaltrials.gov/study/NCT01874691 ", doi="10.2196/50067", url="https://www.jmir.org/2024/1/e50067" } @Article{info:doi/10.2196/48595, author="Ben Yehuda, Ori and Itelman, Edward and Vaisman, Adva and Segal, Gad and Lerner, Boaz", title="Early Detection of Pulmonary Embolism in a General Patient Population Immediately Upon Hospital Admission Using Machine Learning to Identify New, Unidentified Risk Factors: Model Development Study", journal="J Med Internet Res", year="2024", month="Jul", day="30", volume="26", pages="e48595", keywords="pulmonary embolism", keywords="deep vein thrombosis", keywords="venous thromboembolism", keywords="imbalanced data", keywords="clustering", keywords="risk factors", keywords="Wells score", keywords="revised Genova score", keywords="hospital admission", keywords="machine learning", abstract="Background: Under- or late identification of pulmonary embolism (PE)---a thrombosis of 1 or more pulmonary arteries that seriously threatens patients' lives---is a major challenge confronting modern medicine. Objective: We aimed to establish accurate and informative machine learning (ML) models to identify patients at high risk for PE as they are admitted to the hospital, before their initial clinical checkup, by using only the information in their medical records. Methods: We collected demographics, comorbidities, and medications data for 2568 patients with PE and 52,598 control patients. 
We focused on data available prior to emergency department admission, as these are the most universally accessible data. We trained an ML random forest algorithm to detect PE at the earliest possible time during a patient's hospitalization---at the time of his or her admission. We developed and applied 2 ML-based methods specifically to address the data imbalance between PE and non-PE patients, which causes misdiagnosis of PE. Results: The resulting models predicted PE based on age, sex, BMI, past clinical PE events, chronic lung disease, past thrombotic events, and usage of anticoagulants, obtaining an 80\% geometric mean value for the PE and non-PE classification accuracies. Although on hospital admission only 4\% (1942/46,639) of the patients had a diagnosis of PE, we identified 2 clustering schemes comprising subgroups with more than 61\% (705/1120 in clustering scheme 1; 427/701 and 340/549 in clustering scheme 2) positive patients for PE. One subgroup in the first clustering scheme included 36\% (705/1942) of all patients with PE who were characterized by a definite past PE diagnosis, a 6-fold higher prevalence of deep vein thrombosis, and a 3-fold higher prevalence of pneumonia, compared with patients of the other subgroups in this scheme. In the second clustering scheme, 2 subgroups (1 of only men and 1 of only women) included patients who all had a past PE diagnosis and a relatively high prevalence of pneumonia, and a third subgroup included only those patients with a past diagnosis of pneumonia. Conclusions: This study established an ML tool for early diagnosis of PE almost immediately upon hospital admission. Despite the highly imbalanced scenario undermining accurate PE prediction and using information available only from the patient's medical history, our models were both accurate and informative, enabling the identification of patients already at high risk for PE upon hospital admission, even before the initial clinical checkup was performed. The fact that we did not restrict our patients to those at high risk for PE according to previously published scales (eg, Wells or revised Genova scores) enabled us to accurately assess the application of ML on raw medical data and identify new, previously unidentified risk factors for PE, such as previous pulmonary disease, in general populations. ", doi="10.2196/48595", url="https://www.jmir.org/2024/1/e48595" } @Article{info:doi/10.2196/56715, author="Lowe, Cabella and Sephton, Ruth and Marsh, William and Morrissey, Dylan", title="Evaluation of a Musculoskeletal Digital Assessment Routing Tool (DART): Crossover Noninferiority Randomized Pilot Trial", journal="JMIR Form Res", year="2024", month="Jul", day="30", volume="8", pages="e56715", keywords="mHealth", keywords="eHealth", keywords="digital health", keywords="digital technology", keywords="digital triage", keywords="musculoskeletal", keywords="triage", keywords="physiotherapy triage", keywords="validation", keywords="acceptability", keywords="physiotherapy", keywords="primary care", keywords="randomized controlled trial", keywords="usability", keywords="assess", keywords="assessment", keywords="triaging", keywords="referrals", keywords="crossover", abstract="Background: Musculoskeletal conditions account for 16\% of global disability, resulting in a negative effect on patients and increasing demand for health care use. Triage directing patients to appropriate level intervention improving health outcomes and efficiency has been prioritized. 
We developed a musculoskeletal digital assessment routing tool (DART) mobile health (mHealth) system, which requires evaluation prior to implementation. Such innovations are rarely rigorously tested in clinical trials---considered the gold standard for evaluating safety and efficacy. This pilot study is a precursor to a trial assessing DART performance with a physiotherapist-led triage assessment. Objective: The study aims to evaluate trial design, assess procedures, and collect exploratory data to establish the feasibility of delivering an adequately powered, definitive randomized trial, assessing DART safety and efficacy in an NHS primary care setting. Methods: A crossover, noninferiority pilot trial using an integrated knowledge translation approach within a National Health Service England primary care setting. Participants were patients seeking assessment for a musculoskeletal condition, completing a DART assessment and the history-taking element of a face-to-face physiotherapist-led triage in a randomized order. The primary outcome was agreement between DART and physiotherapist triage recommendation. Data allowed analysis of participant recruitment and retention, randomization, blinding, study burden, and potential barriers to intervention delivery. Participant satisfaction was measured using the System Usability Scale. Results: Over 8 weeks, 129 patients were invited to participate. Of these, 92\% (119/129) proceeded to eligibility assessment, with 60\% (78/129) meeting the inclusion criteria and being randomized into each intervention arm (39/39). There were no dropouts and data were analyzed for all 78 participants. Agreement between physiotherapist and DART across all participants and all primary triage outcomes was 41\% (32/78; 95\% CI 22-45), intraclass correlation coefficient 0.37 (95\% CI 0.16-0.55), indicating that the reliability of DART was poor to moderate. Feedback from the clinical service team led to an adjusted analysis yielding agreement of 78\% (61/78; 95\% CI 47-78) and an intraclass correlation coefficient of 0.57 (95\% CI 0.40-0.70). Participant satisfaction was measured quantitatively using amalgamated System Usability Scale scores (n=78; mean score 84.0; 90\% CI +2.94 to --2.94), equating to an ``excellent'' system. There were no study incidents, and the trial burden was acceptable. Conclusions: Physiotherapist-DART agreement of 78\%, with no adverse triage decisions and high patient satisfaction, was sufficient to conclude DART had the potential to improve the musculoskeletal pathway. Study validity was enhanced by the recruitment of real-world patients and using an integrated knowledge translation approach. Completion of a context-specific consensus process is recommended to provide definitive definitions of safety criteria, range of appropriateness, noninferiority margin, and sample size. This pilot demonstrated that an adequately powered definitive trial is feasible, which would provide evidence of DART safety and efficacy, ultimately informing potential for DART implementation. Trial Registration: ClinicalTrials.gov NCT04904029; http://clinicaltrials.gov/ct2/show/NCT04904029 International Registered Report Identifier (IRRID): RR2-10.2196/31541 ", doi="10.2196/56715", url="https://formative.jmir.org/2024/1/e56715", url="http://www.ncbi.nlm.nih.gov/pubmed/39078682" } @Article{info:doi/10.2196/45780, author="Akhter-Khan, C. Samia and Tao, Qiushan and Ang, Alvin Ting Fang and Karjadi, Cody and Itchapurapu, Swetha Indira and Libon, J. 
David and Alosco, Michael and Mez, Jesse and Qiu, Qiao Wei and Au, Rhoda", title="Cerebral Microbleeds in Different Brain Regions and Their Associations With the Digital Clock-Drawing Test: Secondary Analysis of the Framingham Heart Study", journal="J Med Internet Res", year="2024", month="Jul", day="29", volume="26", pages="e45780", keywords="cerebral microbleeds", keywords="CMB", keywords="digital clock-drawing test", keywords="DCT", keywords="Alzheimer disease", keywords="dementia", keywords="early screening", keywords="Boston Process Approach", keywords="cerebral microbleed", keywords="neuroimaging", keywords="cerebrovascular diseases", keywords="aging", keywords="MRI", keywords="magnetic resonance imaging", keywords="clock-drawing test", keywords="cognitive function", abstract="Background: Cerebral microbleeds (CMB) increase the risk for Alzheimer disease. Current neuroimaging methods that are used to detect CMB are costly and not always accessible. Objective: This study aimed to explore whether the digital clock-drawing test (DCT) may provide a behavioral indicator of CMB. Methods: In this study, we analyzed data from participants in the Framingham Heart Study offspring cohort who underwent both brain magnetic resonance imaging scans (Siemens 1.5T, Siemens Healthcare Private Limited; T2*-GRE weighted sequences) for CMB diagnosis and the DCT as a predictor. Additionally, paper-based clock-drawing tests were also collected during the DCT. Individuals with a history of dementia or stroke were excluded. Robust multivariable linear regression models were used to examine the association between DCT facet scores with CMB prevalence, adjusting for relevant covariates. Receiver operating characteristic (ROC) curve analyses were used to evaluate DCT facet scores as predictors of CMB prevalence. Sensitivity analyses were conducted by further including participants with stroke and dementia. Results: The study sample consisted of 1020 (n=585, 57.35\% female) individuals aged 45 years and older (mean 72, SD 7.9 years). Among them, 64 (6.27\%) participants exhibited CMB, comprising 46 with lobar-only, 11 with deep-only, and 7 with mixed (lobar+deep) CMB. Individuals with CMB tended to be older and had a higher prevalence of mild cognitive impairment and higher white matter hyperintensities compared to those without CMB (P<.05). While CMB were not associated with the paper-based clock-drawing test, participants with CMB had a lower overall DCT score (CMB: mean 68, SD 23 vs non-CMB: mean 76, SD 20; P=.009) in the univariate comparison. In the robust multiple regression model adjusted for covariates, deep CMB were significantly associated with lower scores on the drawing efficiency ($\beta$=--0.65, 95\% CI --1.15 to --0.15; P=.01) and simple motor ($\beta$=--0.86, 95\% CI --1.43 to --0.30; P=.003) domains of the command DCT. In the ROC curve analysis, DCT facets discriminated between no CMB and the CMB subtypes. The area under the ROC curve was 0.76 (95\% CI 0.69-0.83) for lobar CMB, 0.88 (95\% CI 0.78-0.98) for deep CMB, and 0.98 (95\% CI 0.96-1.00) for mixed CMB, where the area under the ROC curve value nearing 1 indicated an accurate model. Conclusions: The study indicates a significant association between CMB, especially deep and mixed types, and reduced performance in drawing efficiency and motor skills as assessed by the DCT. 
This highlights the potential of the DCT for early detection of CMB and their subtypes, providing a reliable alternative for cognitive assessment and making it a valuable tool for primary care screening before neuroimaging referral. ", doi="10.2196/45780", url="https://www.jmir.org/2024/1/e45780", url="http://www.ncbi.nlm.nih.gov/pubmed/39073857" } @Article{info:doi/10.2196/54577, author="Grazioli, Silvia and Crippa, Alessandro and Buo, Noemi and Busti Ceccarelli, Silvia and Molteni, Massimo and Nobile, Maria and Salandi, Antonio and Trabattoni, Sara and Caselli, Gabriele and Colombo, Paola", title="Use of Machine Learning Models to Differentiate Neurodevelopment Conditions Through Digitally Collected Data: Cross-Sectional Questionnaire Study", journal="JMIR Form Res", year="2024", month="Jul", day="29", volume="8", pages="e54577", keywords="digital-aided clinical assessment", keywords="machine learning", keywords="random forest", keywords="logistic regression", keywords="computational psychometrics", keywords="telemedicine", keywords="neurodevelopmental conditions", keywords="parent-report questionnaires", keywords="attention-deficit/hyperactivity disorder", keywords="autism spectrum disorder", keywords="ASD", keywords="autism", keywords="autistic", keywords="attention deficit", keywords="hyperactivity", keywords="classification", abstract="Background: Diagnosis of child and adolescent psychopathologies involves a multifaceted approach, integrating clinical observations, behavioral assessments, medical history, cognitive testing, and familial context information. Digital technologies, especially internet-based platforms for administering caregiver-rated questionnaires, are increasingly used in this field, particularly during the screening phase. The ascent of digital platforms for data collection has propelled advanced psychopathology classification methods such as supervised machine learning (ML) into the forefront of both research and clinical environments. This shift, recently called psycho-informatics, has been facilitated by gradually incorporating computational devices into clinical workflows. However, an actual integration between telemedicine and the ML approach has yet to be fulfilled. Objective: Under these premises, exploring the potential of ML applications for analyzing digitally collected data may have significant implications for supporting the clinical practice of diagnosing early psychopathology. The purpose of this study was, therefore, to exploit ML models for the classification of attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorder (ASD) using internet-based parent-reported socio-anamnestic data, aiming at obtaining accurate predictive models for new help-seeking families. Methods: In this retrospective, single-center observational study, socio-anamnestic data were collected from 1688 children and adolescents referred for suspected neurodevelopmental conditions. The data included sociodemographic, clinical, environmental, and developmental factors, collected remotely through the first Italian internet-based screening tool for neurodevelopmental disorders, the Medea Information and Clinical Assessment On-Line (MedicalBIT). Random forest (RF), decision tree, and logistic regression models were developed and evaluated using classification accuracy, sensitivity, specificity, and importance of independent variables. 
Results: The RF model demonstrated robust accuracy, achieving 84\% (95\% CI 82-85; P<.001) for ADHD and 86\% (95\% CI 84-87; P<.001) for ASD classifications. Sensitivities were also high, with 93\% for ADHD and 95\% for ASD. In contrast, the DT and LR models exhibited lower accuracy (DT 74\%, 95\% CI 71-77; P<.001 for ADHD; DT 79\%, 95\% CI 77-82; P<.001 for ASD; LR 61\%, 95\% CI 57-64; P<.001 for ADHD; LR 63\%, 95\% CI 60-67; P<.001 for ASD) and sensitivities (DT: 82\% for ADHD and 88\% for ASD; LR: 62\% for ADHD and 68\% for ASD). The independent variables considered for classification differed in importance between the 2 models, reflecting the distinct characteristics of the 3 ML approaches. Conclusions: This study highlights the potential of ML models, particularly RF, in enhancing the diagnostic process of child and adolescent psychopathology. Altogether, the current findings underscore the significance of leveraging digital platforms and computational techniques in the diagnostic process. While interpretability remains crucial, the developed approach might provide valuable screening tools for clinicians, highlighting the significance of embedding computational techniques in the diagnostic process. ", doi="10.2196/54577", url="https://formative.jmir.org/2024/1/e54577", url="http://www.ncbi.nlm.nih.gov/pubmed/39073858" } @Article{info:doi/10.2196/54872, author="Liu, Chang and Zhang, Kai and Yang, Xiaodong and Meng, Bingbing and Lou, Jingsheng and Liu, Yanhong and Cao, Jiangbei and Liu, Kexuan and Mi, Weidong and Li, Hao", title="Development and Validation of an Explainable Machine Learning Model for Predicting Myocardial Injury After Noncardiac Surgery in Two Centers in China: Retrospective Study", journal="JMIR Aging", year="2024", month="Jul", day="26", volume="7", pages="e54872", keywords="myocardial injury after noncardiac surgery", keywords="older patients", keywords="machine learning", keywords="personalized prediction", keywords="myocardial injury", keywords="risk prediction", keywords="noncardiac surgery", abstract="Background: Myocardial injury after noncardiac surgery (MINS) is an easily overlooked complication but closely related to postoperative cardiovascular adverse outcomes; therefore, the early diagnosis and prediction are particularly important. Objective: We aimed to develop and validate an explainable machine learning (ML) model for predicting MINS among older patients undergoing noncardiac surgery. Methods: The retrospective cohort study included older patients who had noncardiac surgery from 1 northern center and 1 southern center in China. The data sets from center 1 were divided into a training set and an internal validation set. The data set from center 2 was used as an external validation set. Before modeling, the least absolute shrinkage and selection operator and recursive feature elimination methods were used to reduce dimensions of data and select key features from all variables. Prediction models were developed based on the extracted features using several ML algorithms, including category boosting, random forest, logistic regression, na{\"i}ve Bayes, light gradient boosting machine, extreme gradient boosting, support vector machine, and decision tree. Prediction performance was assessed by the area under the receiver operating characteristic (AUROC) curve as the main evaluation metric to select the best algorithms. 
The model performance was verified by internal and external validation data sets with the best algorithm and compared to the Revised Cardiac Risk Index. The Shapley Additive Explanations (SHAP) method was applied to calculate values for each feature, representing the contribution to the predicted risk of complication, and generate personalized explanations. Results: A total of 19,463 eligible patients were included; among those, 12,464 patients in center 1 were included as the training set; 4754 patients in center 1 were included as the internal validation set; and 2245 in center 2 were included as the external validation set. The best-performing model for prediction was the CatBoost algorithm, achieving the highest AUROC of 0.805 (95\% CI 0.778-0.831) in the training set, validating with an AUROC of 0.780 in the internal validation set and 0.70 in the external validation set. Additionally, CatBoost demonstrated superior performance compared to the Revised Cardiac Risk Index (AUROC 0.636; P<.001). The SHAP values indicated the ranking of the level of importance of each variable, with preoperative serum creatinine concentration, red blood cell distribution width, and age accounting for the top three. The results from the SHAP method can predict events with positive values or nonevents with negative values, providing an explicit explanation of individualized risk predictions. Conclusions: The ML models can provide a personalized and fairly accurate risk prediction of MINS, and the explainable perspective can help identify potentially modifiable sources of risk at the patient level. ", doi="10.2196/54872", url="https://aging.jmir.org/2024/1/e54872" } @Article{info:doi/10.2196/47645, author="Cai, Yu-Qing and Gong, Da-Xin and Tang, Li-Ying and Cai, Yue and Li, Hui-Jun and Jing, Tian-Ci and Gong, Mengchun and Hu, Wei and Zhang, Zhen-Wei and Zhang, Xingang and Zhang, Guang-Wei", title="Pitfalls in Developing Machine Learning Models for Predicting Cardiovascular Diseases: Challenge and Solutions", journal="J Med Internet Res", year="2024", month="Jul", day="26", volume="26", pages="e47645", keywords="cardiovascular diseases", keywords="risk prediction models", keywords="machine learning", keywords="problem", keywords="solution", doi="10.2196/47645", url="https://www.jmir.org/2024/1/e47645" } @Article{info:doi/10.2196/58886, author="Weile, Synne Kathrine and Mathiasen, Ren{\'e} and Winther, Falck Jeanette and Hasle, Henrik and Henriksen, Tram Louise", title="Hjernetegn.dk---The Danish Central Nervous System Tumor Awareness Initiative Digital Decision Support Tool: Design and Implementation Report", journal="JMIR Med Inform", year="2024", month="Jul", day="25", volume="12", pages="e58886", keywords="digital health initiative", keywords="digital health initiatives", keywords="clinical decision support", keywords="decision support", keywords="decision support system", keywords="decision support systems", keywords="decision support tool", keywords="decision support tools", keywords="diagnostic delay", keywords="awareness initiative", keywords="pediatric neurology", keywords="pediatric CNS tumors", keywords="CNS tumor", keywords="CNS tumour", keywords="CNS tumours", keywords="co-creation", keywords="health systems and services", keywords="communication", keywords="central nervous system", abstract="Background: Childhood tumors in the central nervous system (CNS) have longer diagnostic delays than other pediatric tumors. 
Vague presenting symptoms pose a challenge in the diagnostic process; it has been indicated that patients and parents may be hesitant to seek help, and health care professionals (HCPs) may lack awareness and knowledge about clinical presentation. To raise awareness among HCPs, the Danish CNS tumor awareness initiative hjernetegn.dk was launched. Objective: This study aims to present the learnings from designing and implementing a decision support tool for HCPs to reduce diagnostic delay in childhood CNS tumors. The aims also include decisions regarding strategies for dissemination and use of social media, and an evaluation of the digital impact 6 months after launch. Methods: The phases of developing and implementing the tool include participatory co-creation workshops, designing the website and digital platforms, and implementing a press and media strategy. The digital impact of hjernetegn.dk was evaluated through website analytics and social media engagement. Implementation (Results): hjernetegn.dk was launched in August 2023. The results after 6 months exceeded key performance indicators. The analysis showed a high number of website visitors and engagement, with a plateau reached 3 months after the initial launch. The LinkedIn campaign and Google Search strategy also generated a high number of impressions and clicks. Conclusions: The findings suggest that the initiative has been successfully integrated, raising awareness and providing a valuable tool for HCPs in diagnosing childhood CNS tumors. The study highlights the importance of interdisciplinary collaboration, co-creation, and ongoing community management, as well as broad dissemination strategies when introducing a digital support tool. ", doi="10.2196/58886", url="https://medinform.jmir.org/2024/1/e58886" } @Article{info:doi/10.2196/49142, author="Lee, Hsin-Ying and Kuo, Po-Chih and Qian, Frank and Li, Chien-Hung and Hu, Jiun-Ruey and Hsu, Wan-Ting and Jhou, Hong-Jie and Chen, Po-Huang and Lee, Cho-Hao and Su, Chin-Hua and Liao, Po-Chun and Wu, I-Ju and Lee, Chien-Chang", title="Prediction of In-Hospital Cardiac Arrest in the Intensive Care Unit: Machine Learning--Based Multimodal Approach", journal="JMIR Med Inform", year="2024", month="Jul", day="23", volume="12", pages="e49142", keywords="cardiac arrest", keywords="machine learning", keywords="intensive care", keywords="mortality", keywords="medical emergency team", keywords="early warning scores", abstract="Background: Early identification of impending in-hospital cardiac arrest (IHCA) improves clinical outcomes but remains elusive for practicing clinicians. Objective: We aimed to develop a multimodal machine learning algorithm based on ensemble techniques to predict the occurrence of IHCA. Methods: Our model was developed by the Multiparameter Intelligent Monitoring of Intensive Care (MIMIC)--IV database and validated in the Electronic Intensive Care Unit Collaborative Research Database (eICU-CRD). Baseline features consisting of patient demographics, presenting illness, and comorbidities were collected to train a random forest model. Next, vital signs were extracted to train a long short-term memory model. A support vector machine algorithm then stacked the results to form the final prediction model. Results: Of 23,909 patients in the MIMIC-IV database and 10,049 patients in the eICU-CRD database, 452 and 85 patients, respectively, had IHCA. 
At 13 hours in advance of an IHCA event, our algorithm had already demonstrated an area under the receiver operating characteristic curve of 0.85 (95\% CI 0.815-0.885) in the MIMIC-IV database. External validation with the eICU-CRD and National Taiwan University Hospital databases also presented satisfactory results, showing area under the receiver operating characteristic curve values of 0.81 (95\% CI 0.763-0.851) and 0.945 (95\% CI 0.934-0.956), respectively. Conclusions: Using only vital signs and information available in the electronic medical record, our model demonstrates it is possible to detect a trajectory of clinical deterioration up to 13 hours in advance. This predictive tool, which has undergone external validation, could forewarn and help clinicians identify patients in need of assessment to improve their overall prognosis. ", doi="10.2196/49142", url="https://medinform.jmir.org/2024/1/e49142" } @Article{info:doi/10.2196/49230, author="Sharma, Videha and McDermott, John and Keen, Jessica and Foster, Simon and Whelan, Pauline and Newman, William", title="Pharmacogenetics Clinical Decision Support Systems for Primary Care in England: Co-Design Study", journal="J Med Internet Res", year="2024", month="Jul", day="23", volume="26", pages="e49230", keywords="personalized medicine", keywords="genomic medicine", keywords="pharmacogenetics", keywords="user-centred design", keywords="medical informatics", keywords="clinical decision support systems", keywords="side effect", keywords="information technology", keywords="data", keywords="primary care", keywords="health informatic", abstract="Background: Pharmacogenetics can impact patient care and outcomes through personalizing the selection of medicines, resulting in improved efficacy and a reduction in harmful side effects. Despite the existence of compelling clinical evidence and international guidelines highlighting the benefits of pharmacogenetics in clinical practice, implementation within the National Health Service in the United Kingdom is limited. An important barrier to overcome is the development of IT solutions that support the integration of pharmacogenetic data into health care systems. This necessitates a better understanding of the role of electronic health records (EHRs) and the design of clinical decision support systems that are acceptable to clinicians, particularly those in primary care. Objective: Explore the needs and requirements of a pharmacogenetic service from the perspective of primary care clinicians with a view to co-design a prototype solution. Methods: We used ethnographic and think-aloud observations, user research workshops, and prototyping. The participants for this study included general practitioners and pharmacists. In total, we undertook 5 sessions of ethnographic observation to understand current practices and workflows. This was followed by 3 user research workshops, each with its own topic guide starting with personas and early ideation, through to exploring the potential of clinical decision support systems and prototype design. We subsequently analyzed workshop data using affinity diagramming and refined the key requirements for the solution collaboratively as a multidisciplinary project team. Results: User research results identified that pharmacogenetic data must be incorporated within existing EHRs rather than through a stand-alone portal. 
The information presented through clinical decision support systems must be clear, accessible, and user-friendly as the service will be used by a range of end users. Critically, the information should be displayed within the prescribing workflow, rather than discrete results stored statically in the EHR. Finally, the prescribing recommendations should be authoritative to provide confidence in the validity of the results. Based on these findings we co-designed an interactive prototype, demonstrating pharmacogenetic clinical decision support integrated within the prescribing workflow of an EHR. Conclusions: This study marks a significant step forward in the design of systems that support pharmacogenetic-guided prescribing in primary care settings. Clinical decision support systems have the potential to enhance the personalization of medicines, provided they are effectively implemented within EHRs and present pharmacogenetic data in a user-friendly, actionable, and standardized format. Achieving this requires the development of a decoupled, standards-based architecture that allows for the separation of data from application, facilitating integration across various EHRs through the use of application programming interfaces (APIs). More globally, this study demonstrates the role of health informatics and user-centered design in realizing the potential of personalized medicine at scale and ensuring that the benefits of genomic innovation reach patients and populations effectively. ", doi="10.2196/49230", url="https://www.jmir.org/2024/1/e49230" } @Article{info:doi/10.2196/58599, author="Tsai, Chung-You and Tian, Jing-Hui and Lee, Chien-Cheng and Kuo, Hann-Chorng", title="Building Dual AI Models and Nomograms Using Noninvasive Parameters for Aiding Male Bladder Outlet Obstruction Diagnosis and Minimizing the Need for Invasive Video-Urodynamic Studies: Development and Validation Study", journal="J Med Internet Res", year="2024", month="Jul", day="23", volume="26", pages="e58599", keywords="bladder outlet obstruction", keywords="lower urinary tract symptoms", keywords="machine learning", keywords="nomogram", keywords="artificial intelligence", keywords="video urodynamic study", abstract="Background: Diagnosing underlying causes of nonneurogenic male lower urinary tract symptoms associated with bladder outlet obstruction (BOO) is challenging. Video-urodynamic studies (VUDS) and pressure-flow studies (PFS) are both invasive diagnostic methods for BOO. VUDS can more precisely differentiate etiologies of male BOO, such as benign prostatic obstruction, primary bladder neck obstruction, and dysfunctional voiding, potentially outperforming PFS. Objective: These examinations' invasive nature highlights the need for developing noninvasive predictive models to facilitate BOO diagnosis and reduce the necessity for invasive procedures. Methods: We conducted a retrospective study with a cohort of men with medication-refractory, nonneurogenic lower urinary tract symptoms suspected of BOO who underwent VUDS from 2001 to 2022. In total, 2 BOO predictive models were developed---1 based on the International Continence Society's definition (International Continence Society--defined bladder outlet obstruction; ICS-BOO) and the other on video-urodynamic studies--diagnosed bladder outlet obstruction (VBOO). The patient cohort was randomly split into training and test sets for analysis. A total of 6 machine learning algorithms, including logistic regression, were used for model development. 
During model development, we first performed development validation using repeated 5-fold cross-validation on the training set and then test validation to assess the model's performance on an independent test set. Both models were implemented as paper-based nomograms and integrated into a web-based artificial intelligence prediction tool to aid clinical decision-making. Results: Among 307 patients, 26.7\% (n=82) met the ICS-BOO criteria, while 82.1\% (n=252) were diagnosed with VBOO. The ICS-BOO prediction model had a mean area under the receiver operating characteristic curve (AUC) of 0.74 (SD 0.09) and mean accuracy of 0.76 (SD 0.04) in development validation and AUC and accuracy of 0.86 and 0.77, respectively, in test validation. The VBOO prediction model yielded a mean AUC of 0.71 (SD 0.06) and mean accuracy of 0.77 (SD 0.06) internally, with AUC and accuracy of 0.72 and 0.76, respectively, externally. When both models' predictions are applied to the same patient, their combined insights can significantly enhance clinical decision-making and simplify the diagnostic pathway. By the dual-model prediction approach, if both models positively predict BOO, suggesting all cases actually resulted from medication-refractory primary bladder neck obstruction or benign prostatic obstruction, surgical intervention may be considered. Thus, VUDS might be unnecessary for 100 (32.6\%) patients. Conversely, when ICS-BOO predictions are negative but VBOO predictions are positive, indicating varied etiology, VUDS rather than PFS is advised for precise diagnosis and guiding subsequent therapy, accurately identifying 51.1\% (47/92) of patients for VUDS. Conclusions: The 2 machine learning models predicting ICS-BOO and VBOO, based on 6 noninvasive clinical parameters, demonstrate commendable discrimination performance. Using the dual-model prediction approach, when both models predict positively, VUDS may be avoided, assisting in male BOO diagnosis and reducing the need for such invasive procedures. ", doi="10.2196/58599", url="https://www.jmir.org/2024/1/e58599" } @Article{info:doi/10.2196/55542, author="Knitza, Johannes and Tascilar, Koray and Fuchs, Franziska and Mohn, Jacob and Kuhn, Sebastian and Bohr, Daniela and Muehlensiepen, Felix and Bergmann, Christina and Labinsky, Hannah and Morf, Harriet and Araujo, Elizabeth and Englbrecht, Matthias and Vorbr{\"u}ggen, Wolfgang and von der Decken, Cay-Benedict and Kleinert, Stefan and Ramming, Andreas and Distler, W. J{\"o}rg H. and Bartz-Bazzanella, Peter and Vuillerme, Nicolas and Schett, Georg and Welcker, Martin and Hueber, Axel", title="Diagnostic Accuracy of a Mobile AI-Based Symptom Checker and a Web-Based Self-Referral Tool in Rheumatology: Multicenter Randomized Controlled Trial", journal="J Med Internet Res", year="2024", month="Jul", day="23", volume="26", pages="e55542", keywords="symptom checker", keywords="artificial intelligence", keywords="eHealth", keywords="diagnostic decision support system", keywords="rheumatology", keywords="decision support", keywords="decision", keywords="diagnostic", keywords="tool", keywords="rheumatologists", keywords="symptom assessment", keywords="resources", keywords="randomized controlled trial", keywords="diagnosis", keywords="decision support system", keywords="support system", keywords="support", abstract="Background: The diagnosis of inflammatory rheumatic diseases (IRDs) is often delayed due to unspecific symptoms and a shortage of rheumatologists. 
Digital diagnostic decision support systems (DDSSs) have the potential to expedite diagnosis and help patients navigate the health care system more efficiently. Objective: The aim of this study was to assess the diagnostic accuracy of a mobile artificial intelligence (AI)--based symptom checker (Ada) and a web-based self-referral tool (Rheport) regarding IRDs. Methods: A prospective, multicenter, open-label, crossover randomized controlled trial was conducted with patients newly presenting to 3 rheumatology centers. Participants were randomly assigned to complete a symptom assessment using either Ada or Rheport. The primary outcome was the correct identification of IRDs by the DDSSs, defined as the presence of any IRD in the list of suggested diagnoses by Ada or achieving a prespecified threshold score with Rheport. The gold standard was the diagnosis made by rheumatologists. Results: A total of 600 patients were included, among whom 214 (35.7\%) were diagnosed with an IRD. Most frequent IRD was rheumatoid arthritis with 69 (11.5\%) patients. Rheport's disease suggestion and Ada's top 1 (D1) and top 5 (D5) disease suggestions demonstrated overall diagnostic accuracies of 52\%, 63\%, and 58\%, respectively, for IRDs. Rheport showed a sensitivity of 62\% and a specificity of 47\% for IRDs. Ada's D1 and D5 disease suggestions showed a sensitivity of 52\% and 66\%, respectively, and a specificity of 68\% and 54\%, respectively, concerning IRDs. Ada's diagnostic accuracy regarding individual diagnoses was heterogenous, and Ada performed considerably better in identifying rheumatoid arthritis in comparison to other diagnoses (D1: 42\%; D5: 64\%). The Cohen $\kappa$ statistic of Rheport for agreement on any rheumatic disease diagnosis with Ada D1 was 0.15 (95\% CI 0.08-0.18) and with Ada D5 was 0.08 (95\% CI 0.00-0.16), indicating poor agreement for the presence of any rheumatic disease between the 2 DDSSs. Conclusions: To our knowledge, this is the largest comparative DDSS trial with actual use of DDSSs by patients. The diagnostic accuracies of both DDSSs for IRDs were not promising in this high-prevalence patient population. DDSSs may lead to a misuse of scarce health care resources. Our results underscore the need for stringent regulation and drastic improvements to ensure the safety and efficacy of DDSSs. Trial Registration: German Register of Clinical Trials DRKS00017642; https://drks.de/search/en/trial/DRKS00017642 ", doi="10.2196/55542", url="https://www.jmir.org/2024/1/e55542" } @Article{info:doi/10.2196/54994, author="Levinson, T. Rebecca and Paul, Cinara and Meid, D. Andreas and Schultz, Jobst-Hendrik and Wild, Beate", title="Identifying Predictors of Heart Failure Readmission in Patients From a Statutory Health Insurance Database: Retrospective Machine Learning Study", journal="JMIR Cardio", year="2024", month="Jul", day="23", volume="8", pages="e54994", keywords="statutory health insurance", keywords="readmission", keywords="machine learning", keywords="heart failure", keywords="heart", keywords="cardiology", keywords="cardiac", keywords="hospitalization", keywords="insurance", keywords="predict", keywords="predictive", keywords="prediction", keywords="predictions", keywords="predictor", keywords="predictors", keywords="all cause", abstract="Background: Patients with heart failure (HF) are the most commonly readmitted group of adult patients in Germany. Most patients with HF are readmitted for noncardiovascular reasons. 
Understanding the relevance of HF management outside the hospital setting is critical to understanding HF and factors that lead to readmission. Application of machine learning (ML) on data from statutory health insurance (SHI) allows the evaluation of large longitudinal data sets representative of the general population to support clinical decision-making. Objective: This study aims to evaluate the ability of ML methods to predict 1-year all-cause and HF-specific readmission after initial HF-related admission of patients with HF in outpatient SHI data and identify important predictors. Methods: We identified individuals with HF using outpatient data from 2012 to 2018 from the AOK Baden-W{\"u}rttemberg SHI in Germany. We then trained and applied regression and ML algorithms to predict the first all-cause and HF-specific readmission in the year after the first admission for HF. We fitted a random forest, an elastic net, a stepwise regression, and a logistic regression to predict readmission by using diagnosis codes, drug exposures, demographics (age, sex, nationality, and type of coverage within SHI), degree of rurality for residence, and participation in disease management programs for common chronic conditions (diabetes mellitus type 1 and 2, breast cancer, chronic obstructive pulmonary disease, and coronary heart disease). We then evaluated the predictors of HF readmission according to their importance and direction to predict readmission. Results: Our final data set consisted of 97,529 individuals with HF, and 78,044 (80\%) were readmitted within the observation period. Of the tested modeling approaches, the random forest approach best predicted 1-year all-cause and HF-specific readmission with a C-statistic of 0.68 and 0.69, respectively. Important predictors for 1-year all-cause readmission included prescription of pantoprazole, chronic obstructive pulmonary disease, atherosclerosis, sex, rurality, and participation in disease management programs for type 2 diabetes mellitus and coronary heart disease. Relevant features for HF-specific readmission included a large number of canonical HF comorbidities. Conclusions: While many of the predictors we identified were known to be relevant comorbidities for HF, we also uncovered several novel associations. Disease management programs have widely been shown to be effective at managing chronic disease; however, our results indicate that in the short term they may be useful for targeting patients with HF with comorbidity at increased risk of readmission. Our results also show that living in a more rural location increases the risk of readmission. Overall, factors beyond comorbid disease were relevant for risk of HF readmission. This finding may impact how outpatient physicians identify and monitor patients at risk of HF readmission. 
", doi="10.2196/54994", url="https://cardio.jmir.org/2024/1/e54994" } @Article{info:doi/10.2196/50130, author="Bienefeld, Nadine and Keller, Emanuela and Grote, Gudela", title="Human-AI Teaming in Critical Care: A Comparative Analysis of Data Scientists' and Clinicians' Perspectives on AI Augmentation and Automation", journal="J Med Internet Res", year="2024", month="Jul", day="22", volume="26", pages="e50130", keywords="AI in health care", keywords="human-AI teaming", keywords="sociotechnical systems", keywords="intensive care", keywords="ICU", keywords="AI adoption", keywords="AI implementation", keywords="augmentation", keywords="automation, health care policy and regulatory foresight", keywords="explainable AI", keywords="explainable", keywords="human-AI", keywords="human-computer", keywords="human-machine", keywords="ethical implications of AI in health care", keywords="ethical", keywords="ethic", keywords="ethics", keywords="artificial intelligence", keywords="policy", keywords="foresight", keywords="policies", keywords="recommendation", keywords="recommendations", keywords="policy maker", keywords="policy makers", keywords="Delphi", keywords="sociotechnical", abstract="Background: Artificial intelligence (AI) holds immense potential for enhancing clinical and administrative health care tasks. However, slow adoption and implementation challenges highlight the need to consider how humans can effectively collaborate with AI within broader socio-technical systems in health care. Objective: In the example of intensive care units (ICUs), we compare data scientists' and clinicians' assessments of the optimal utilization of human and AI capabilities by determining suitable levels of human-AI teaming for safely and meaningfully augmenting or automating 6 core tasks. The goal is to provide actionable recommendations for policy makers and health care practitioners regarding AI design and implementation. Methods: In this multimethod study, we combine a systematic task analysis across 6 ICUs with an international Delphi survey involving 19 health data scientists from the industry and academia and 61 ICU clinicians (25 physicians and 36 nurses) to define and assess optimal levels of human-AI teaming (level 1=no performance benefits; level 2=AI augments human performance; level 3=humans augment AI performance; level 4=AI performs without human input). Stakeholder groups also considered ethical and social implications. Results: Both stakeholder groups chose level 2 and 3 human-AI teaming for 4 out of 6 core tasks in the ICU. For one task (monitoring), level 4 was the preferred design choice. For the task of patient interactions, both data scientists and clinicians agreed that AI should not be used regardless of technological feasibility due to the importance of the physician-patient and nurse-patient relationship and ethical concerns. Human-AI design choices rely on interpretability, predictability, and control over AI systems. If these conditions are not met and AI performs below human-level reliability, a reduction to level 1 or shifting accountability away from human end users is advised. If AI performs at or beyond human-level reliability and these conditions are not met, shifting to level 4 automation should be considered to ensure safe and efficient human-AI teaming. 
Conclusions: By considering the sociotechnical system and determining appropriate levels of human-AI teaming, our study showcases the potential for improving the safety and effectiveness of AI usage in ICUs and broader health care settings. Regulatory measures should prioritize interpretability, predictability, and control if clinicians hold full accountability. Ethical and social implications must be carefully evaluated to ensure effective collaboration between humans and AI, particularly considering the most recent advancements in generative AI. ", doi="10.2196/50130", url="https://www.jmir.org/2024/1/e50130" } @Article{info:doi/10.2196/58158, author="Chen, Xi and Wang, Li and You, MingKe and Liu, WeiZhi and Fu, Yu and Xu, Jie and Zhang, Shaoting and Chen, Gang and Li, Kang and Li, Jian", title="Evaluating and Enhancing Large Language Models' Performance in Domain-Specific Medicine: Development and Usability Study With DocOA", journal="J Med Internet Res", year="2024", month="Jul", day="22", volume="26", pages="e58158", keywords="large language model", keywords="retrieval-augmented generation", keywords="domain-specific benchmark framework", keywords="osteoarthritis management", abstract="Background: The efficacy of large language models (LLMs) in domain-specific medicine, particularly for managing complex diseases such as osteoarthritis (OA), remains largely unexplored. Objective: This study focused on evaluating and enhancing the clinical capabilities and explainability of LLMs in specific domains, using OA management as a case study. Methods: A domain-specific benchmark framework was developed to evaluate LLMs across a spectrum from domain-specific knowledge to clinical applications in real-world clinical scenarios. DocOA, a specialized LLM designed for OA management integrating retrieval-augmented generation and instructional prompts, was developed. It can identify the clinical evidence upon which its answers are based through retrieval-augmented generation, thereby demonstrating the explainability of those answers. The study compared the performance of GPT-3.5, GPT-4, and a specialized assistant, DocOA, using objective and human evaluations. Results: Results showed that general LLMs such as GPT-3.5 and GPT-4 were less effective in the specialized domain of OA management, particularly in providing personalized treatment recommendations. However, DocOA showed significant improvements. Conclusions: This study introduces a novel benchmark framework that assesses the domain-specific abilities of LLMs in multiple aspects, highlights the limitations of generalized LLMs in clinical contexts, and demonstrates the potential of tailored approaches for developing domain-specific medical LLMs. 
", doi="10.2196/58158", url="https://www.jmir.org/2024/1/e58158", url="http://www.ncbi.nlm.nih.gov/pubmed/38833165" } @Article{info:doi/10.2196/48600, author="Wichmann, Johannes and Gesk, Sophie Tanja and Leyer, Michael", title="Acceptance of AI in Health Care for Short- and Long-Term Treatments: Pilot Development Study of an Integrated Theoretical Model", journal="JMIR Form Res", year="2024", month="Jul", day="18", volume="8", pages="e48600", keywords="health information systems", keywords="integrated theoretical model", keywords="artificial intelligence", keywords="health care", keywords="technology acceptance", keywords="long-term treatments", keywords="short-term treatments", keywords="mobile phone", abstract="Background: As digital technologies and especially artificial intelligence (AI) become increasingly important in health care, it is essential to determine whether and why potential users intend to use related health information systems (HIS). Several theories exist, but they focus mainly on aspects of health care or information systems, in addition to general psychological theories, and hence provide a small number of variables to explain future behavior. Thus, research that provides a larger number of variables by combining several theories from health care, information systems, and psychology is necessary. Objective: This study aims to investigate the intention to use new HIS for decisions concerning short- and long-term medical treatments using an integrated approach with several variables to explain future behavior. Methods: We developed an integrated theoretical model based on theories from health care, information systems, and psychology that allowed us to analyze the duality approach of adaptive and nonadaptive appraisals and their influence on the intention to use HIS. We applied the integrated theoretical model to the short-term treatment using AI-based HIS for surgery and the long-term treatment of diabetes tracking using survey data with structured equation modeling. To differentiate between certain levels of AI involvement, we used several scenarios that include treatments by physicians only, physicians with AI support, and AI only to understand how individuals perceive the influence of AI. Results: Our results showed that for short- and long-term treatments, the variables perceived threats, fear (disease), perceived efficacy, attitude (HIS), and perceived norms are important to consider when determining the intention to use AI-based HIS. Furthermore, the results revealed that perceived efficacy and attitude (HIS) are the most important variables to determine intention to use for all treatments and scenarios. In contrast, abilities (HIS) were important for short-term treatments only. For our 9 scenarios, adaptive and nonadaptive appraisals were both important to determine intention to use, depending on whether the treatment is known. Furthermore, we determined R{\texttwosuperior} values that varied between 57.9\% and 81.7\% for our scenarios, which showed that the explanation power of our model is medium to good. Conclusions: We contribute to HIS literature by highlighting the importance of integrating disease- and technology-related factors and by providing an integrated theoretical model. As such, we show how adaptive and nonadaptive appraisals should be arranged to report on medical decisions in the future, especially in the short and long terms. 
Physicians and HIS developers can use our insights to identify promising rationale for HIS adoption concerning short- and long-term treatments and adapt and develop HIS accordingly. Specifically, HIS developers should ensure that future HIS act in terms of HIS functions, as our study shows that efficient HIS lead to a positive attitude toward the HIS and ultimately to a higher intention to use. ", doi="10.2196/48600", url="https://formative.jmir.org/2024/1/e48600" } @Article{info:doi/10.2196/56361, author="Zha, Bowen and Cai, Angshu and Wang, Guiqi", title="Diagnostic Accuracy of Artificial Intelligence in Endoscopy: Umbrella Review", journal="JMIR Med Inform", year="2024", month="Jul", day="15", volume="12", pages="e56361", keywords="endoscopy", keywords="artificial intelligence", keywords="umbrella review", keywords="meta-analyses", keywords="AI", keywords="diagnostic", keywords="researchers", keywords="researcher", keywords="tools", keywords="tool", keywords="assessment", abstract="Background: Some research has already reported the diagnostic value of artificial intelligence (AI) in different endoscopy outcomes. However, the evidence is confusing and of varying quality. Objective: This review aimed to comprehensively evaluate the credibility of the evidence of AI's diagnostic accuracy in endoscopy. Methods: Before the study began, the protocol was registered on PROSPERO (CRD42023483073). First, 2 researchers searched PubMed, Web of Science, Embase, and Cochrane Library using comprehensive search terms. Then, researchers screened the articles and extracted information. We used A Measurement Tool to Assess Systematic Reviews 2 (AMSTAR2) to evaluate the quality of the articles. When there were multiple studies aiming at the same result, we chose the study with higher-quality evaluations for further analysis. To ensure the reliability of the conclusions, we recalculated each outcome. Finally, the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) was used to evaluate the credibility of the outcomes. Results: A total of 21 studies were included for analysis. Through AMSTAR2, it was found that 8 research methodologies were of moderate quality, while other studies were regarded as having low or critically low quality. The sensitivity and specificity of 17 different outcomes were analyzed. There were 4 studies on esophagus, 4 studies on stomach, and 4 studies on colorectal regions. Two studies were associated with capsule endoscopy, two were related to laryngoscopy, and one was related to ultrasonic endoscopy. In terms of sensitivity, gastroesophageal reflux disease had the highest accuracy rate, reaching 97\%, while the invasion depth of colon neoplasia, with 71\%, had the lowest accuracy rate. On the other hand, the specificity of colorectal cancer was the highest, reaching 98\%, while the gastrointestinal stromal tumor, with only 80\%, had the lowest specificity. The GRADE evaluation suggested that the reliability of most outcomes was low or very low. Conclusions: AI proved valuable in endoscopic diagnoses, especially in esophageal and colorectal diseases. These findings provide a theoretical basis for developing and evaluating AI-assisted systems, which are aimed at assisting endoscopists in carrying out examinations, leading to improved patient health outcomes. However, further high-quality research is needed in the future to fully validate AI's effectiveness. 
", doi="10.2196/56361", url="https://medinform.jmir.org/2024/1/e56361" } @Article{info:doi/10.2196/49811, author="Holdefer, A. Ashley and Pizarro, Jeno and Saunders-Hastings, Patrick and Beers, Jeffrey and Sang, Arianna and Hettinger, Zachary Aaron and Blumenthal, Joseph and Martinez, Erik and Jones, Daniel Lance and Deady, Matthew and Ezzeldin, Hussein and Anderson, A. Steven", title="Development of Interoperable Computable Phenotype Algorithms for Adverse Events of Special Interest to Be Used for Biologics Safety Surveillance: Validation Study", journal="JMIR Public Health Surveill", year="2024", month="Jul", day="15", volume="10", pages="e49811", keywords="adverse event", keywords="vaccine safety", keywords="computable phenotype", keywords="postmarket surveillance system", keywords="real-world data", keywords="validation study", keywords="Food and Drug Administration", keywords="electronic health records", keywords="COVID-19 vaccine", abstract="Background: Adverse events associated with vaccination have been evaluated by epidemiological studies and more recently have gained additional attention with the emergency use authorization of several COVID-19 vaccines. As part of its responsibility to conduct postmarket surveillance, the US Food and Drug Administration continues to monitor several adverse events of special interest (AESIs) to ensure vaccine safety, including for COVID-19. Objective: This study is part of the Biologics Effectiveness and Safety Initiative, which aims to improve the Food and Drug Administration's postmarket surveillance capabilities while minimizing public burden. This study aimed to enhance active surveillance efforts through a rules-based, computable phenotype algorithm to identify 5 AESIs being monitored by the Center for Disease Control and Prevention for COVID-19 or other vaccines: anaphylaxis, Guillain-Barr{\'e} syndrome, myocarditis/pericarditis, thrombosis with thrombocytopenia syndrome, and febrile seizure. This study examined whether these phenotypes have sufficiently high positive predictive value (PPV) to ensure that the cases selected for surveillance are reasonably likely to be a postbiologic adverse event. This allows patient privacy, and security concerns for the data sharing of patients who had nonadverse events can be properly accounted for when evaluating the cost-benefit aspect of our approach. Methods: AESI phenotype algorithms were developed to apply to electronic health record data at health provider organizations across the country by querying for standard and interoperable codes. The codes queried in the rules represent symptoms, diagnoses, or treatments of the AESI sourced from published case definitions and input from clinicians. To validate the performance of the algorithms, we applied them to electronic health record data from a US academic health system and provided a sample of cases for clinicians to evaluate. Performance was assessed using PPV. Results: With a PPV of 93.3\%, our anaphylaxis algorithm performed the best. The PPVs for our febrile seizure, myocarditis/pericarditis, thrombocytopenia syndrome, and Guillain-Barr{\'e} syndrome algorithms were 89\%, 83.5\%, 70.2\%, and 47.2\%, respectively. Conclusions: Given our algorithm design and performance, our results support continued research into using interoperable algorithms for widespread AESI postmarket detection. 
", doi="10.2196/49811", url="https://publichealth.jmir.org/2024/1/e49811", url="http://www.ncbi.nlm.nih.gov/pubmed/39008361" } @Article{info:doi/10.2196/50402, author="Cheng, Vienna and Sayre, C. Eric and Cheng, Vicki and Garg, Ria and Gill, Sharlene and Farooq, Ameer and De Vera, A. Mary", title="Patterns of Prescription Medication Use Before Diagnosis of Early Age-Onset Colorectal Cancer: Population-Based Descriptive Study", journal="JMIR Cancer", year="2024", month="Jul", day="12", volume="10", pages="e50402", keywords="colorectal cancer", keywords="medications", keywords="medication patterns", keywords="cancer diagnosis", keywords="prediagnosis", keywords="prescriptions", keywords="patterns", keywords="early-onset", keywords="population-based", keywords="incidence", keywords="male individuals", keywords="female individuals", keywords="health databases", keywords="pharmacology", keywords="diagnostic", keywords="descriptive study", keywords="gastroenterology", keywords="cancers", abstract="Background: Colorectal cancer (CRC) is estimated to be the fourth most common cancer diagnosis in Canada (except for nonmelanoma skin cancers) and the second and third leading cause of cancer-related death in male and female individuals, respectively. Objective: The rising incidence of early age-onset colorectal cancer (EAO-CRC; diagnosis at less than 50 years) calls for a better understanding of patients' pathway to diagnosis. Therefore, we evaluated patterns of prescription medication use before EAO-CRC diagnosis. Methods: We used linked administrative health databases in British Columbia (BC), Canada, to identify individuals diagnosed with EAO-CRC between January 1, 2010, and December 31, 2016 (hereinafter referred to as ``cases''), along with cancer-free controls (1:10), matched by age and sex. We identified all prescriptions dispensed from community pharmacies during the year prior to diagnosis and used the Anatomical Therapeutic Chemical Classification system Level 3 to group prescriptions according to the drug class. A parallel assessment was conducted for individuals diagnosed with average age-onset CRC (diagnosis at age 50 years and older). Results: We included 1001 EAO-CRC cases (n=450, 45\% female participants; mean 41.0, SD 6.1 years), and 12,989 prescriptions were filled in the year before diagnosis by 797 (79.7\%) individuals. Top-filled drugs were antidepressants (first; n=1698, 13.1\%). Drugs for peptic ulcer disease and gastroesophageal reflux disease (third; n=795, 6.1\%) were more likely filled by EAO-CRC cases than controls (odds ratio [OR] 1.4, 95\% CI 1.2-1.7) and with more frequent fills (OR 1.8, 95\% CI 1.7-1.9). We noted similar patterns for topical agents for hemorrhoids and anal fissures, which were more likely filled by EAO-CRC cases than controls (OR 7.4, 95\% CI 5.8-9.4) and with more frequent fills (OR 15.6, 95\% CI 13.1-18.6). Conclusions: We observed frequent prescription medication use in the year before diagnosis of EAO-CRC, including for drugs to treat commonly reported symptoms of EAO-CRC. 
", doi="10.2196/50402", url="https://cancer.jmir.org/2024/1/e50402" } @Article{info:doi/10.2196/48535, author="Kong, Hye Sung and Cho, Wonwoo and Park, Bae Sung and Choo, Jaegul and Kim, Hee Jung and Kim, Wan Sang and Shin, Soo Chan", title="A Computed Tomography--Based Fracture Prediction Model With Images of Vertebral Bones and Muscles by Employing Deep Learning: Development and Validation Study", journal="J Med Internet Res", year="2024", month="Jul", day="12", volume="26", pages="e48535", keywords="fracture", keywords="bone", keywords="bones", keywords="muscle", keywords="muscles", keywords="musculoskeletal", keywords="prediction", keywords="deep learning", keywords="prospective cohort", keywords="fracture risk assessment", keywords="predict", keywords="predictive", keywords="machine learning", keywords="develop", keywords="development", keywords="validate", keywords="validation", keywords="imaging", keywords="tomography", keywords="scanning", abstract="Background: With the progressive increase in aging populations, the use of opportunistic computed tomography (CT) scanning is increasing, which could be a valuable method for acquiring information on both muscles and bones of aging populations. Objective: The aim of this study was to develop and externally validate opportunistic CT-based fracture prediction models by using images of vertebral bones and paravertebral muscles. Methods: The models were developed based on a retrospective longitudinal cohort study of 1214 patients with abdominal CT images between 2010 and 2019. The models were externally validated in 495 patients. The primary outcome of this study was defined as the predictive accuracy for identifying vertebral fracture events within a 5-year follow-up. The image models were developed using an attention convolutional neural network--recurrent neural network model from images of the vertebral bone and paravertebral muscles. Results: The mean ages of the patients in the development and validation sets were 73 years and 68 years, and 69.1\% (839/1214) and 78.8\% (390/495) of them were females, respectively. The areas under the receiver operator curve (AUROCs) for predicting vertebral fractures were superior in images of the vertebral bone and paravertebral muscles than those in the bone-only images in the external validation cohort (0.827, 95\% CI 0.821-0.833 vs 0.815, 95\% CI 0.806-0.824, respectively; P<.001). The AUROCs of these image models were higher than those of the fracture risk assessment models (0.810 for major osteoporotic risk, 0.780 for hip fracture risk). For the clinical model using age, sex, BMI, use of steroids, smoking, possible secondary osteoporosis, type 2 diabetes mellitus, HIV, hepatitis C, and renal failure, the AUROC value in the external validation cohort was 0.749 (95\% CI 0.736-0.762), which was lower than that of the image model using vertebral bones and muscles (P<.001). Conclusions: The model using the images of the vertebral bone and paravertebral muscle showed better performance than that using the images of the bone-only or clinical variables. Opportunistic CT screening may contribute to identifying patients with a high fracture risk in the future. 
", doi="10.2196/48535", url="https://www.jmir.org/2024/1/e48535", url="http://www.ncbi.nlm.nih.gov/pubmed/38995678" } @Article{info:doi/10.2196/47785, author="Ortiz, Fernanda and Grasberger, Juulia and Ekstrand, Agneta and Helanter{\"a}, Ilkka and Giunti, Guido", title="Interactive Health Technology Tool for Kidney Living Donor Assessment to Standardize the Informed Consent Process: Usability and Qualitative Content Analysis", journal="JMIR Form Res", year="2024", month="Jul", day="9", volume="8", pages="e47785", keywords="eHealth", keywords="kidney living donor", keywords="informed consent", keywords="telemedicine", keywords="process standardization", keywords="kidney", keywords="donor", keywords="tool", keywords="usability", keywords="psychological impact", keywords="utility", keywords="smartphone", keywords="coping", keywords="surgery", abstract="Background: Kidney living donation carries risks, yet standardized information provision regarding nephrectomy risks and psychological impacts for candidates remains lacking. Objective: This study assesses the benefit of interactive health technology in improving the informed consent process for kidney living donation. Methods: The Kidney Hub institutional open portal offers comprehensive information on kidney disease and donation. Individuals willing to start the kidney living donation process at Helsinki University Hospital (January 2019-January 2022) were invited to use the patient-tailored digital care path (Living Donor Digital Care Path) included in the Kidney Hub. This platform provides detailed donation process information and facilitates communication between health care professionals and patients. eHealth literacy was evaluated via the eHealth Literacy Scale (eHEALS), usability with the System Usability Scale (SUS), and system utility through Likert-scale surveys with scores of 1-5. Qualitative content analysis addressed an open-ended question. Results: The Kidney Hub portal received over 8000 monthly visits, including to its sections on donation benefits (n=1629 views) and impact on donors' lives (n=4850 views). Of 127 living kidney donation candidates, 7 did not use Living Donor Digital Care Path. Users' ages ranged from 20 to 79 years, and they exchanged over 3500 messages. A total of 74 living donor candidates participated in the survey. Female candidates more commonly searched the internet about kidney donation (n=79 female candidates vs n=48 male candidates; P=.04). The mean eHEALS score correlated with internet use for health decisions (r=0.45; P<.001) and its importance (r=0.40; P=.01). Participants found that the Living Donor Digital Care Path was technically satisfactory (mean SUS score 4.4, SD 0.54) and useful but not pivotal in donation decision-making. Concerns focused on postsurgery coping for donors and recipients. Conclusions: Telemedicine effectively educates living kidney donor candidates on the donation process. The Living Donor Digital Care Path serves as a valuable eHealth tool, aiding clinicians in standardizing steps toward informed consent. Trial Registration: ClinicalTrials.gov NCT04791670; https://clinicaltrials.gov/study/NCT04791670 International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2021-051166 ", doi="10.2196/47785", url="https://formative.jmir.org/2024/1/e47785" } @Article{info:doi/10.2196/56110, author="Hoppe, Michael John and Auer, K. 
Matthias and Str{\"u}ven, Anna and Massberg, Steffen and Stremmel, Christopher", title="ChatGPT With GPT-4 Outperforms Emergency Department Physicians in Diagnostic Accuracy: Retrospective Analysis", journal="J Med Internet Res", year="2024", month="Jul", day="8", volume="26", pages="e56110", keywords="emergency department", keywords="diagnosis", keywords="accuracy", keywords="artificial intelligence", keywords="ChatGPT", keywords="internal medicine", keywords="AI", keywords="natural language processing", keywords="NLP", keywords="emergency medicine triage", keywords="triage", keywords="physicians", keywords="physician", keywords="diagnostic accuracy", keywords="OpenAI", abstract="Background: OpenAI's ChatGPT is a pioneering artificial intelligence (AI) in the field of natural language processing, and it holds significant potential in medicine for providing treatment advice. Additionally, recent studies have demonstrated promising results using ChatGPT for emergency medicine triage. However, its diagnostic accuracy in the emergency department (ED) has not yet been evaluated. Objective: This study compares the diagnostic accuracy of ChatGPT with GPT-3.5 and GPT-4 and primary treating resident physicians in an ED setting. Methods: Among 100 adults admitted to our ED in January 2023 with internal medicine issues, the diagnostic accuracy was assessed by comparing the diagnoses made by ED resident physicians and those made by ChatGPT with GPT-3.5 or GPT-4 against the final hospital discharge diagnosis, using a point system for grading accuracy. Results: The study enrolled 100 patients with a median age of 72 (IQR 58.5-82.0) years who were admitted to our internal medicine ED primarily for cardiovascular, endocrine, gastrointestinal, or infectious diseases. GPT-4 outperformed both GPT-3.5 (P<.001) and ED resident physicians (P=.01) in diagnostic accuracy for internal medicine emergencies. Furthermore, across various disease subgroups, GPT-4 consistently outperformed GPT-3.5 and resident physicians. It demonstrated significant superiority in cardiovascular (GPT-4 vs ED physicians: P=.03) and endocrine or gastrointestinal diseases (GPT-4 vs GPT-3.5: P=.01). However, in other categories, the differences were not statistically significant. Conclusions: In this study, which compared the diagnostic accuracy of GPT-3.5, GPT-4, and ED resident physicians against a discharge diagnosis gold standard, GPT-4 outperformed both the resident physicians and its predecessor, GPT-3.5. Despite the retrospective design of the study and its limited sample size, the results underscore the potential of AI as a supportive diagnostic tool in ED settings. ", doi="10.2196/56110", url="https://www.jmir.org/2024/1/e56110" } @Article{info:doi/10.2196/54748, author="Hu, Xinyue and Sun, Zenan and Nian, Yi and Wang, Yichen and Dang, Yifang and Li, Fang and Feng, Jingna and Yu, Evan and Tao, Cui", title="Self-Explainable Graph Neural Network for Alzheimer Disease and Related Dementias Risk Prediction: Algorithm Development and Validation Study", journal="JMIR Aging", year="2024", month="Jul", day="8", volume="7", pages="e54748", keywords="Alzheimer disease and related dementias", keywords="risk prediction", keywords="graph neural network", keywords="relation importance", keywords="machine learning", abstract="Background: Alzheimer disease and related dementias (ADRD) rank as the sixth leading cause of death in the United States, underlining the importance of accurate ADRD risk prediction. 
While recent advancements in ADRD risk prediction have primarily relied on imaging analysis, not all patients undergo medical imaging before an ADRD diagnosis. Merging machine learning with claims data can reveal additional risk factors and uncover interconnections among diverse medical codes. Objective: The study aims to use graph neural networks (GNNs) with claim data for ADRD risk prediction. Addressing the lack of human-interpretable reasons behind these predictions, we introduce an innovative, self-explainable method to evaluate relationship importance and its influence on ADRD risk prediction. Methods: We used a variationally regularized encoder-decoder GNN (variational GNN [VGNN]) integrated with our proposed relation importance method for estimating ADRD likelihood. This self-explainable method can provide a feature-important explanation in the context of ADRD risk prediction, leveraging relational information within a graph. Three scenarios with 1-year, 2-year, and 3-year prediction windows were created to assess the model's efficiency. Random forest (RF) and light gradient boost machine (LGBM) were used as baselines. By using this method, we further clarify the key relationships for ADRD risk prediction. Results: In scenario 1, the VGNN model showed area under the receiver operating characteristic (AUROC) scores of 0.7272 and 0.7480 for the small subset and the matched cohort data set. It outperformed RF and LGBM by 10.6\% and 9.1\%, respectively, on average. In scenario 2, it achieved AUROC scores of 0.7125 and 0.7281, surpassing the other models by 10.5\% and 8.9\%, respectively. Similarly, in scenario 3, AUROC scores of 0.7001 and 0.7187 were obtained, exceeding the baseline models by 10.1\% and 8.5\%, respectively. These results clearly demonstrate the significant superiority of the graph-based approach over the tree-based models (RF and LGBM) in predicting ADRD. Furthermore, the integration of the VGNN model and our relation importance interpretation could provide valuable insight into paired factors that may contribute to or delay ADRD progression. Conclusions: Using our innovative self-explainable method with claims data enhances ADRD risk prediction and provides insights into the impact of interconnected medical code relationships. This methodology not only enables ADRD risk modeling but also shows potential for other image analysis predictions using claims data. ", doi="10.2196/54748", url="https://aging.jmir.org/2024/1/e54748" } @Article{info:doi/10.2196/57981, author="Xu, Jie and Talankar, Sankalp and Pan, Jinqian and Harmon, Ira and Wu, Yonghui and Fedele, A. David and Brailsford, Jennifer and Fishe, Noel Jennifer", title="Combining Federated Machine Learning and Qualitative Methods to Investigate Novel Pediatric Asthma Subtypes: Protocol for a Mixed Methods Study", journal="JMIR Res Protoc", year="2024", month="Jul", day="8", volume="13", pages="e57981", keywords="pediatric asthma", keywords="machine learning", keywords="federated learning", keywords="qualitative research", abstract="Background: Pediatric asthma is a heterogeneous disease; however, current characterizations of its subtypes are limited. Machine learning (ML) methods are well-suited for identifying subtypes. In particular, deep neural networks can learn patient representations by leveraging longitudinal information captured in electronic health records (EHRs) while considering future outcomes. 
However, the traditional approach for subtype analysis requires large amounts of EHR data, which may contain protected health information causing potential concerns regarding patient privacy. Federated learning is the key technology to address privacy concerns while preserving the accuracy and performance of ML algorithms. Federated learning could enable multisite development and implementation of ML algorithms to facilitate the translation of artificial intelligence into clinical practice. Objective: The aim of this study is to develop a research protocol for implementation of federated ML across a large clinical research network to identify and discover pediatric asthma subtypes and their progression over time. Methods: This mixed methods study uses data and clinicians from the OneFlorida+ clinical research network, which is a large regional network covering linked and longitudinal patient-level real-world data (RWD) of over 20 million patients from Florida, Georgia, and Alabama in the United States. To characterize the subtypes, we will use OneFlorida+ data from 2011 to 2023 and develop a research-grade pediatric asthma computable phenotype and clinical natural language processing pipeline to identify pediatric patients with asthma aged 2-18 years. We will then apply federated learning to characterize pediatric asthma subtypes and their temporal progression. Using the Promoting Action on Research Implementation in Health Services framework, we will conduct focus groups with practicing pediatric asthma clinicians within the OneFlorida+ network to investigate the clinical utility of the subtypes. With a user-centered design, we will create prototypes to visualize the subtypes in the EHR to best assist with the clinical management of children with asthma. Results: OneFlorida+ data from 2011 to 2023 have been collected for 411,628 patients aged 2-18 years along with 11,156,148 clinical notes. We expect to complete the computable phenotyping within the first year of the project, followed by subtyping during the second and third years, and then will perform the focus groups and establish the user-centered design in the fourth and fifth years of the project. Conclusions: Pediatric asthma subtypes incorporating RWD from diverse populations could improve patient outcomes by moving the field closer to precision pediatric asthma care. Our privacy-preserving federated learning methodology and qualitative implementation work will address several challenges of applying ML to large, multicenter RWD data. 
International Registered Report Identifier (IRRID): DERR1-10.2196/57981 ", doi="10.2196/57981", url="https://www.researchprotocols.org/2024/1/e57981", url="http://www.ncbi.nlm.nih.gov/pubmed/38976313" } @Article{info:doi/10.2196/54263, author="Shang, Yong and Tian, Yu and Lyu, Kewei and Zhou, Tianshu and Zhang, Ping and Chen, Jianghua and Li, Jingsong", title="Electronic Health Record--Oriented Knowledge Graph System for Collaborative Clinical Decision Support Using Multicenter Fragmented Medical Data: Design and Application Study", journal="J Med Internet Res", year="2024", month="Jul", day="5", volume="26", pages="e54263", keywords="knowledge graph", keywords="electronic health record", keywords="ontology", keywords="data fragmentation", keywords="data privacy", keywords="knowledge graphs", keywords="visualization", keywords="ontologies", keywords="data science", keywords="privacy", keywords="security", keywords="collaborative", keywords="collaboration", keywords="kidney", keywords="CKD", keywords="nephrology", keywords="EHR", keywords="health record", keywords="hypernym", keywords="encryption", keywords="encrypt", keywords="encrypted", keywords="decision support", keywords="semantic", keywords="vocabulary", keywords="blockchain", abstract="Background: The medical knowledge graph provides explainable decision support, helping clinicians with prompt diagnosis and treatment suggestions. However, in real-world clinical practice, patients visit different hospitals seeking various medical services, resulting in fragmented patient data across hospitals. With data security issues, data fragmentation limits the application of knowledge graphs because single-hospital data cannot provide complete evidence for generating precise decision support and comprehensive explanations. It is important to study new methods for knowledge graph systems to integrate into multicenter, information-sensitive medical environments, using fragmented patient records for decision support while maintaining data privacy and security. Objective: This study aims to propose an electronic health record (EHR)--oriented knowledge graph system for collaborative reasoning with multicenter fragmented patient medical data, all the while preserving data privacy. Methods: The study introduced an EHR knowledge graph framework and a novel collaborative reasoning process for utilizing multicenter fragmented information. The system was deployed in each hospital and used a unified semantic structure and Observational Medical Outcomes Partnership (OMOP) vocabulary to standardize the local EHR data set. The system transforms local EHR data into semantic formats and performs semantic reasoning to generate intermediate reasoning findings. The generated intermediate findings used hypernym concepts to isolate original medical data. The intermediate findings and hash-encrypted patient identities were synchronized through a blockchain network. The multicenter intermediate findings were collaborated for final reasoning and clinical decision support without gathering original EHR data. Results: The system underwent evaluation through an application study involving the utilization of multicenter fragmented EHR data to alert non-nephrology clinicians about overlooked patients with chronic kidney disease (CKD). The study covered 1185 patients in nonnephrology departments from 3 hospitals. The patients visited at least two of the hospitals. 
Of these, 124 patients were identified as meeting CKD diagnosis criteria through collaborative reasoning using multicenter EHR data, whereas the data from individual hospitals alone could not facilitate the identification of CKD in these patients. The assessment by clinicians indicated that 78/91 (86\%) patients were CKD positive. Conclusions: The proposed system was able to effectively utilize multicenter fragmented EHR data for clinical application. The application study showed the clinical benefits of the system with prompt and comprehensive decision support. ", doi="10.2196/54263", url="https://www.jmir.org/2024/1/e54263" } @Article{info:doi/10.2196/51397, author="Duggan, M. Nicole and Jin, Mike and Duran Mendicuti, Alejandra Maria and Hallisey, Stephen and Bernier, Denie and Selame, A. Lauren and Asgari-Targhi, Ameneh and Fischetti, E. Chanel and Lucassen, Ruben and Samir, E. Anthony and Duhaime, Erik and Kapur, Tina and Goldsmith, J. Andrew", title="Gamified Crowdsourcing as a Novel Approach to Lung Ultrasound Data Set Labeling: Prospective Analysis", journal="J Med Internet Res", year="2024", month="Jul", day="4", volume="26", pages="e51397", keywords="crowdsource", keywords="crowdsourced", keywords="crowdsourcing", keywords="machine learning", keywords="artificial intelligence", keywords="point-of-care ultrasound", keywords="POCUS", keywords="lung ultrasound", keywords="B-lines", keywords="gamification", keywords="gamify", keywords="gamified", keywords="label", keywords="labels", keywords="labeling", keywords="classification", keywords="lung", keywords="pulmonary", keywords="respiratory", keywords="ultrasound", keywords="imaging", keywords="medical image", keywords="diagnostic", keywords="diagnose", keywords="diagnosis", keywords="data science", abstract="Background: Machine learning (ML) models can yield faster and more accurate medical diagnoses; however, developing ML models is limited by a lack of high-quality labeled training data. Crowdsourced labeling is a potential solution but can be constrained by concerns about label quality. Objective: This study aims to examine whether a gamified crowdsourcing platform with continuous performance assessment, user feedback, and performance-based incentives could produce expert-quality labels on medical imaging data. Methods: In this diagnostic comparison study, 2384 lung ultrasound clips were retrospectively collected from 203 emergency department patients. A total of 6 lung ultrasound experts classified 393 of these clips as having no B-lines, one or more discrete B-lines, or confluent B-lines to create 2 sets of reference standard data sets (195 training clips and 198 test clips). Sets were respectively used to (1) train users on a gamified crowdsourcing platform and (2) compare the concordance of the resulting crowd labels to the concordance of individual experts to reference standards. Crowd opinions were sourced from DiagnosUs (Centaur Labs) iOS app users over 8 days, filtered based on past performance, aggregated using majority rule, and analyzed for label concordance compared with a hold-out test set of expert-labeled clips. The primary outcome was comparing the labeling concordance of collated crowd opinions to trained experts in classifying B-lines on lung ultrasound clips. Results: Our clinical data set included patients with a mean age of 60.0 (SD 19.0) years; 105 (51.7\%) patients were female and 114 (56.1\%) patients were White. 
Over the 195 training clips, the expert-consensus label distribution was 114 (58\%) no B-lines, 56 (29\%) discrete B-lines, and 25 (13\%) confluent B-lines. Over the 198 test clips, expert-consensus label distribution was 138 (70\%) no B-lines, 36 (18\%) discrete B-lines, and 24 (12\%) confluent B-lines. In total, 99,238 opinions were collected from 426 unique users. On a test set of 198 clips, the mean labeling concordance of individual experts relative to the reference standard was 85.0\% (SE 2.0), compared with 87.9\% crowdsourced label concordance (P=.15). When individual experts' opinions were compared with reference standard labels created by majority vote excluding their own opinion, crowd concordance was higher than the mean concordance of individual experts to reference standards (87.4\% vs 80.8\%, SE 1.6 for expert concordance; P<.001). Clips with discrete B-lines had the most disagreement from both the crowd consensus and individual experts with the expert consensus. Using randomly sampled subsets of crowd opinions, 7 quality-filtered opinions were sufficient to achieve near the maximum crowd concordance. Conclusions: Crowdsourced labels for B-line classification on lung ultrasound clips via a gamified approach achieved expert-level accuracy. This suggests a strategic role for gamified crowdsourcing in efficiently generating labeled image data sets for training ML systems. ", doi="10.2196/51397", url="https://www.jmir.org/2024/1/e51397" } @Article{info:doi/10.2196/52045, author="Maekawa, Eduardo and Grua, Martino Eoin and Nakamura, Akemi Carina and Scazufca, Marcia and Araya, Ricardo and Peters, Tim and van de Ven, Pepijn", title="Bayesian Networks for Prescreening in Depression: Algorithm Development and Validation", journal="JMIR Ment Health", year="2024", month="Jul", day="4", volume="11", pages="e52045", keywords="Bayesian network", keywords="target depressive symptomatology", keywords="probabilistic machine learning", keywords="stochastic gradient descent", keywords="patient screening", keywords="depressive symptom", keywords="machine learning model", keywords="machine learning", keywords="survey", keywords="prediction", keywords="socioeconomic data sets", keywords="utilization", keywords="depression", keywords="mental health", keywords="digital mental health", keywords="artificial intelligence", keywords="AI", keywords="prediction modeling", keywords="patient", keywords="mood", keywords="anxiety", keywords="mood disorders", keywords="mood disorder", keywords="eHealth", keywords="mobile health", keywords="mHealth", keywords="telehealth", abstract="Background: Identifying individuals with depressive symptomatology (DS) promptly and effectively is of paramount importance for providing timely treatment. Machine learning models have shown promise in this area; however, studies often fall short in demonstrating the practical benefits of using these models and fail to provide tangible real-world applications. Objective: This study aims to establish a novel methodology for identifying individuals likely to exhibit DS, identify the most influential features in a more explainable way via probabilistic measures, and propose tools that can be used in real-world applications. Methods: The study used 3 data sets: PROACTIVE, the Brazilian National Health Survey (Pesquisa Nacional de Sa{\'u}de [PNS]) 2013, and PNS 2019, comprising sociodemographic and health-related features. A Bayesian network was used for feature selection. 
Selected features were then used to train machine learning models to predict DS, operationalized as a score of $\geq$10 on the 9-item Patient Health Questionnaire. The study also analyzed the impact of varying sensitivity rates on the reduction of screening interviews compared to a random approach. Results: The methodology allows the users to make an informed trade-off among sensitivity, specificity, and a reduction in the number of interviews. At the thresholds of 0.444, 0.412, and 0.472, determined by maximizing the Youden index, the models achieved sensitivities of 0.717, 0.741, and 0.718, and specificities of 0.644, 0.737, and 0.766 for PROACTIVE, PNS 2013, and PNS 2019, respectively. The area under the receiver operating characteristic curve was 0.736, 0.801, and 0.809 for these 3 data sets, respectively. For the PROACTIVE data set, the most influential features identified were postural balance, shortness of breath, and how old people feel they are. In the PNS 2013 data set, the features were the ability to do usual activities, chest pain, sleep problems, and chronic back problems. The PNS 2019 data set shared 3 of the most influential features with the PNS 2013 data set. However, the difference was the replacement of chronic back problems with verbal abuse. It is important to note that the features contained in the PNS data sets differ from those found in the PROACTIVE data set. An empirical analysis demonstrated that using the proposed model led to a potential reduction in screening interviews of up to 52\% while maintaining a sensitivity of 0.80. Conclusions: This study developed a novel methodology for identifying individuals with DS, demonstrating the utility of using Bayesian networks to identify the most significant features. Moreover, this approach has the potential to substantially reduce the number of screening interviews while maintaining high sensitivity, thereby facilitating improved early identification and intervention strategies for individuals experiencing DS. ", doi="10.2196/52045", url="https://mental.jmir.org/2024/1/e52045" } @Article{info:doi/10.2196/56127, author="Wong, Chia-En and Chen, Pei-Wen and Hsu, Heng-Jui and Cheng, Shao-Yang and Fan, Chen-Che and Chen, Yen-Chang and Chiu, Yi-Pei and Lee, Jung-Shun and Liang, Sheng-Fu", title="Collaborative Human--Computer Vision Operative Video Analysis Algorithm for Analyzing Surgical Fluency and Surgical Interruptions in Endonasal Endoscopic Pituitary Surgery: Cohort Study", journal="J Med Internet Res", year="2024", month="Jul", day="4", volume="26", pages="e56127", keywords="algorithm", keywords="computer vision", keywords="endonasal endoscopic approach", keywords="pituitary", keywords="transsphenoidal surgery", abstract="Background: The endonasal endoscopic approach (EEA) is effective for pituitary adenoma resection. However, manual review of operative videos is time-consuming. The application of a computer vision (CV) algorithm could potentially reduce the time required for operative video review and facilitate the training of surgeons to overcome the learning curve of EEA. Objective: This study aimed to evaluate the performance of a CV-based video analysis system, based on OpenCV algorithm, to detect surgical interruptions and analyze surgical fluency in EEA. The accuracy of the CV-based video analysis was investigated, and the time required for operative video review using CV-based analysis was compared to that of manual review. 
Methods: The dominant color of each frame in the EEA video was determined using OpenCV. We developed an algorithm to identify events of surgical interruption if the alterations in the dominant color pixels reached certain thresholds. The thresholds were determined by training the current algorithm using EEA videos. The accuracy of the CV analysis was determined by manual review, and the time spent was reported. Results: A total of 46 EEA operative videos were analyzed, with 93.6\%, 95.1\%, and 93.3\% accuracies in the training, test 1, and test 2 data sets, respectively. Compared with manual review, CV-based analysis reduced the time required for operative video review by 86\% (manual review: 166.8 and CV analysis: 22.6 minutes; P<.001). The application of a human-computer collaborative strategy increased the overall accuracy to 98.5\%, with a 74\% reduction in the review time (manual review: 166.8 and human-CV collaboration: 43.4 minutes; P<.001). Analysis of the different surgical phases showed that the sellar phase had the lowest frequency (nasal phase: 14.9, sphenoidal phase: 15.9, and sellar phase: 4.9 interruptions/10 minutes; P<.001) and duration (nasal phase: 67.4, sphenoidal phase: 77.9, and sellar phase: 31.1 seconds/10 minutes; P<.001) of surgical interruptions. A comparison of the early and late EEA videos showed that increased surgical experience was associated with a decreased number (early: 4.9 and late: 2.9 interruptions/10 minutes; P=.03) and duration (early: 41.1 and late: 19.8 seconds/10 minutes; P=.02) of surgical interruptions during the sellar phase. Conclusions: CV-based analysis had a 93\% to 98\% accuracy in detecting the number, frequency, and duration of surgical interruptions occurring during EEA. Moreover, CV-based analysis reduced the time required to analyze the surgical fluency in EEA videos compared to manual review. The application of CV can facilitate the training of surgeons to overcome the learning curve of endoscopic skull base surgery. Trial Registration: ClinicalTrials.gov NCT06156020; https://clinicaltrials.gov/study/NCT06156020 ", doi="10.2196/56127", url="https://www.jmir.org/2024/1/e56127", url="http://www.ncbi.nlm.nih.gov/pubmed/38963694" } @Article{info:doi/10.2196/52139, author="Cho, Youngjin and Yoon, Minjae and Kim, Joonghee and Lee, Hyun Ji and Oh, Il-Young and Lee, Joo Chan and Kang, Seok-Min and Choi, Dong-Ju", title="Artificial Intelligence--Based Electrocardiographic Biomarker for Outcome Prediction in Patients With Acute Heart Failure: Prospective Cohort Study", journal="J Med Internet Res", year="2024", month="Jul", day="3", volume="26", pages="e52139", keywords="acute heart failure", keywords="electrocardiography", keywords="artificial intelligence", keywords="deep learning", abstract="Background: Although several biomarkers exist for patients with heart failure (HF), their use in routine clinical practice is often constrained by high costs and limited availability. Objective: We examined the utility of an artificial intelligence (AI) algorithm that analyzes printed electrocardiograms (ECGs) for outcome prediction in patients with acute HF. Methods: We retrospectively analyzed prospectively collected data of patients with acute HF at two tertiary centers in Korea. Baseline ECGs were analyzed using a deep-learning system called Quantitative ECG (QCG), which was trained to detect several urgent clinical conditions, including shock, cardiac arrest, and reduced left ventricular ejection fraction (LVEF). 
Results: Among the 1254 patients enrolled, in-hospital cardiac death occurred in 53 (4.2\%) patients, and the QCG score for critical events (QCG-Critical) was significantly higher in these patients than in survivors (mean 0.57, SD 0.23 vs mean 0.29, SD 0.20; P<.001). The QCG-Critical score was an independent predictor of in-hospital cardiac death after adjustment for age, sex, comorbidities, HF etiology/type, atrial fibrillation, and QRS widening (adjusted odds ratio [OR] 1.68, 95\% CI 1.47-1.92 per 0.1 increase; P<.001), and remained a significant predictor after additional adjustments for echocardiographic LVEF and N-terminal prohormone of brain natriuretic peptide level (adjusted OR 1.59, 95\% CI 1.36-1.87 per 0.1 increase; P<.001). During long-term follow-up, patients with higher QCG-Critical scores (>0.5) had higher mortality rates than those with low QCG-Critical scores (<0.25) (adjusted hazard ratio 2.69, 95\% CI 2.14-3.38; P<.001). Conclusions: Predicting outcomes in patients with acute HF using the QCG-Critical score is feasible, indicating that this AI-based ECG score may be a novel biomarker for these patients. Trial Registration: ClinicalTrials.gov NCT01389843; https://clinicaltrials.gov/study/NCT01389843 ", doi="10.2196/52139", url="https://www.jmir.org/2024/1/e52139" } @Article{info:doi/10.2196/50295, author="S{\'a}ez, Carlos and Ferri, Pablo and Garc{\'i}a-G{\'o}mez, M. Juan", title="Resilient Artificial Intelligence in Health: Synthesis and Research Agenda Toward Next-Generation Trustworthy Clinical Decision Support", journal="J Med Internet Res", year="2024", month="Jun", day="28", volume="26", pages="e50295", keywords="artificial intelligence", keywords="clinical decision support", keywords="resilience", keywords="clinical medicine", keywords="machine learning", keywords="data quality", keywords="fairness", keywords="trustworthy AI", keywords="regulation", keywords="AI regulation", keywords="AI Act", keywords="EHDS", keywords="European Health Data Space", keywords="emergency medical dispatch", keywords="clinical decision support systems", doi="10.2196/50295", url="https://www.jmir.org/2024/1/e50295", url="http://www.ncbi.nlm.nih.gov/pubmed/38941134" } @Article{info:doi/10.2196/58491, author="Lu, Linken and Lu, Tangsheng and Tian, Chunyu and Zhang, Xiujun", title="AI: Bridging Ancient Wisdom and Modern Innovation in Traditional Chinese Medicine", journal="JMIR Med Inform", year="2024", month="Jun", day="28", volume="12", pages="e58491", keywords="traditional Chinese medicine", keywords="TCM", keywords="artificial intelligence", keywords="AI", keywords="diagnosis", doi="10.2196/58491", url="https://medinform.jmir.org/2024/1/e58491", url="http://www.ncbi.nlm.nih.gov/pubmed/38941141" } @Article{info:doi/10.2196/56345, author="Razjouyan, Javad and Orkaby, R. Ariela and Horstman, J. Molly and Goyal, Parag and Intrator, Orna and Naik, D. 
Aanand", title="The Frailty Trajectory's Additional Edge Over the Frailty Index: Retrospective Cohort Study of Veterans With Heart Failure", journal="JMIR Aging", year="2024", month="Jun", day="27", volume="7", pages="e56345", keywords="gerontology", keywords="geriatric", keywords="geriatrics", keywords="older adult", keywords="older adults", keywords="elder", keywords="elderly", keywords="older person", keywords="older people", keywords="ageing", keywords="aging", keywords="frailty", keywords="frailty index", keywords="frailty trajectory", keywords="frail", keywords="weak", keywords="weakness", keywords="heart failure", keywords="HF", keywords="cardiovascular disease", keywords="CVD", keywords="congestive heart failure", keywords="CHF", keywords="myocardial infarction", keywords="MI", keywords="unstable angina", keywords="angina", keywords="cardiac arrest", keywords="atherosclerosis", keywords="cardiology", keywords="cardiac", keywords="cardiologist", keywords="cardiologists", doi="10.2196/56345", url="https://aging.jmir.org/2024/1/e56345" } @Article{info:doi/10.2196/55855, author="Schaffert, Daniel and Bibi, Igor and Blauth, Mara and Lull, Christian and von Ahnen, Alwin Jan and Gross, Georg and Schulze-Hagen, Theresa and Knitza, Johannes and Kuhn, Sebastian and Benecke, Johannes and Schmieder, Astrid and Leipe, Jan and Olsavszky, Victor", title="Using Automated Machine Learning to Predict Necessary Upcoming Therapy Changes in Patients With Psoriasis Vulgaris and Psoriatic Arthritis and Uncover New Influences on Disease Progression: Retrospective Study", journal="JMIR Form Res", year="2024", month="Jun", day="27", volume="8", pages="e55855", keywords="psoriasis vulgaris", keywords="psoriatic arthritis", keywords="automated machine learning", keywords="therapy change", keywords="Psoriasis Area and Severity Index", keywords="PASI score change", keywords="Bath Ankylosing Spondylitis Disease Activity Index", keywords="BASDAI classification", keywords="mobile phone", abstract="Background: Psoriasis vulgaris (PsV) and psoriatic arthritis (PsA) are complex, multifactorial diseases significantly impacting health and quality of life. Predicting treatment response and disease progression is crucial for optimizing therapeutic interventions, yet challenging. Automated machine learning (AutoML) technology shows promise for rapidly creating accurate predictive models based on patient features and treatment data. Objective: This study aims to develop highly accurate machine learning (ML) models using AutoML to address key clinical questions for PsV and PsA patients, including predicting therapy changes, identifying reasons for therapy changes, and factors influencing skin lesion progression or an abnormal Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) score. Methods: Clinical study data from 309 PsV and PsA patients were extensively prepared and analyzed using AutoML to build and select the most accurate predictive models for each variable of interest. Results: Therapy change at 24 weeks follow-up was modeled using the extreme gradient boosted trees classifier with early stopping (area under the receiver operating characteristic curve [AUC] of 0.9078 and logarithmic loss [LogLoss] of 0.3955 for the holdout partition). Key influencing factors included the initial systemic therapeutic agent, the Classification Criteria for Psoriatic Arthritis score at baseline, and changes in quality of life. 
An average blender incorporating three models (gradient boosted trees classifier, ExtraTrees classifier, and Eureqa generalized additive model classifier) with an AUC of 0.8750 and LogLoss of 0.4603 was used to predict therapy changes for 2 hypothetical patients, highlighting the significance of these factors. Treatments such as methotrexate or specific biologicals showed a lower propensity for change. An average blender of a random forest classifier, an extreme gradient boosted trees classifier, and a Eureqa classifier (AUC of 0.9241 and LogLoss of 0.4498) was used to estimate PASI (Psoriasis Area and Severity Index) change after 24 weeks. Primary predictors included the initial PASI score, change in pruritus levels, and change in therapy. A lower initial PASI score and consistently low pruritus were associated with better outcomes. BASDAI classification at onset was analyzed using an average blender of a Eureqa generalized additive model classifier, an extreme gradient boosted trees classifier with early stopping, and a dropout additive regression trees classifier with an AUC of 0.8274 and LogLoss of 0.5037. Influential factors included initial pain, disease activity, and Hospital Anxiety and Depression Scale scores for depression and anxiety. Increased pain, disease activity, and psychological distress generally led to higher BASDAI scores. Conclusions: The practical implications of these models for clinical decision-making in PsV and PsA can guide early investigation and treatment, contributing to improved patient outcomes. ", doi="10.2196/55855", url="https://formative.jmir.org/2024/1/e55855", url="http://www.ncbi.nlm.nih.gov/pubmed/38738977" } @Article{info:doi/10.2196/52993, author="DeLaRosby, Anna and Mulcahy, Julie and Norwood, Todd", title="A Proposed Decision-Making Framework for the Translation of In-Person Clinical Care to Digital Care: Tutorial", journal="JMIR Med Educ", year="2024", month="Jun", day="26", volume="10", pages="e52993", keywords="clinical decision-making", keywords="digital health", keywords="telehealth", keywords="telerehab", keywords="framework", keywords="digital medicine", keywords="cognitive process", keywords="telemedicine", keywords="clinical training", doi="10.2196/52993", url="https://mededu.jmir.org/2024/1/e52993" } @Article{info:doi/10.2196/59267, author="Hirosawa, Takanobu and Harada, Yukinori and Mizuta, Kazuya and Sakamoto, Tetsu and Tokumasu, Kazuki and Shimizu, Taro", title="Evaluating ChatGPT-4's Accuracy in Identifying Final Diagnoses Within Differential Diagnoses Compared With Those of Physicians: Experimental Study for Diagnostic Cases", journal="JMIR Form Res", year="2024", month="Jun", day="26", volume="8", pages="e59267", keywords="decision support system", keywords="diagnostic errors", keywords="diagnostic excellence", keywords="diagnosis", keywords="large language model", keywords="LLM", keywords="natural language processing", keywords="GPT-4", keywords="ChatGPT", keywords="diagnoses", keywords="physicians", keywords="artificial intelligence", keywords="AI", keywords="chatbots", keywords="medical diagnosis", keywords="assessment", keywords="decision-making support", keywords="application", keywords="applications", keywords="app", keywords="apps", abstract="Background: The potential of artificial intelligence (AI) chatbots, particularly ChatGPT with GPT-4 (OpenAI), in assisting with medical diagnosis is an emerging research area. 
However, it is not yet clear how well AI chatbots can evaluate whether the final diagnosis is included in differential diagnosis lists. Objective: This study aims to assess the capability of GPT-4 in identifying the final diagnosis from differential-diagnosis lists and to compare its performance with that of physicians for case report series. Methods: We used a database of differential-diagnosis lists from case reports in the American Journal of Case Reports, corresponding to final diagnoses. These lists were generated by 3 AI systems: GPT-4, Google Bard (currently Google Gemini), and Large Language Models by Meta AI 2 (LLaMA2). The primary outcome was focused on whether GPT-4's evaluations identified the final diagnosis within these lists. None of these AIs received additional medical training or reinforcement. For comparison, 2 independent physicians also evaluated the lists, with any inconsistencies resolved by another physician. Results: The 3 AIs generated a total of 1176 differential diagnosis lists from 392 case descriptions. GPT-4's evaluations concurred with those of the physicians in 966 out of 1176 lists (82.1\%). The Cohen $\kappa$ coefficient was 0.63 (95\% CI 0.56-0.69), indicating a fair to good agreement between GPT-4 and the physicians' evaluations. Conclusions: GPT-4 demonstrated a fair to good agreement in identifying the final diagnosis from differential-diagnosis lists, comparable to physicians for case report series. Its ability to compare differential diagnosis lists with final diagnoses suggests its potential to aid clinical decision-making support through diagnostic feedback. While GPT-4 showed a fair to good agreement for evaluation, its application in real-world scenarios and further validation in diverse clinical environments are essential to fully understand its utility in the diagnostic process. ", doi="10.2196/59267", url="https://formative.jmir.org/2024/1/e59267" } @Article{info:doi/10.2196/50980, author="Spoladore, Daniele and Colombo, Vera and Fumagalli, Alessia and Tosi, Martina and Lorenzini, Cecilia Erna and Sacco, Marco", title="An Ontology-Based Decision Support System for Tailored Clinical Nutrition Recommendations for Patients With Chronic Obstructive Pulmonary Disease: Development and Acceptability Study", journal="JMIR Med Inform", year="2024", month="Jun", day="26", volume="12", pages="e50980", keywords="ontology-based decision support system", keywords="nutritional recommendation", keywords="chronic obstructive pulmonary disease", keywords="clinical decision support system", keywords="pulmonary rehabilitation", abstract="Background: Chronic obstructive pulmonary disease (COPD) is a chronic condition among the main causes of morbidity and mortality worldwide, representing a burden on health care systems. Scientific literature highlights that nutrition is pivotal in respiratory inflammatory processes connected to COPD, including exacerbations. Patients with COPD have an increased risk of developing nutrition-related comorbidities, such as diabetes, cardiovascular diseases, and malnutrition. Moreover, these patients often manifest sarcopenia and cachexia. Therefore, an adequate nutritional assessment and therapy are essential to help individuals with COPD in managing the progress of the disease. However, the role of nutrition in pulmonary rehabilitation (PR) programs is often underestimated due to a lack of resources and dedicated services, mostly because pneumologists may lack the specialized training for such a discipline. 
Objective: This work proposes a novel knowledge-based decision support system to support pneumologists in considering nutritional aspects in PR. The system provides clinicians with patient-tailored dietary recommendations leveraging expert knowledge. Methods: The expert knowledge---acquired from experts and clinical literature---was formalized in domain ontologies and rules, which were developed leveraging the support of Italian clinicians with expertise in the rehabilitation of patients with COPD. Thus, by following an agile ontology engineering methodology, the relevant formal ontologies were developed to act as a backbone for an application targeted at pneumologists. The recommendations provided by the decision support system were validated by a group of nutrition experts, whereas the acceptability of such an application in the context of PR was evaluated by pneumologists. Results: A total of 7 dieticians (mean age 46.60, SD 13.35 years) were interviewed to assess their level of agreement with the decision support system's recommendations by evaluating 5 patients' health conditions. The preliminary results indicate that the system performed more than adequately (with an overall average score of 4.23, SD 0.52 out of 5 points), providing meaningful and safe recommendations in compliance with clinical practice. With regard to the acceptability of the system by lung specialists (mean age 44.71, SD 11.94 years), the usefulness and relevance of the proposed solution were extremely positive---the scores on each of the perceived usefulness subscales of the technology acceptance model 3 were 4.86 (SD 0.38) out of 5 points, whereas the score on the intention to use subscale was 4.14 (SD 0.38) out of 5 points. Conclusions: Although designed for the Italian clinical context, the proposed system can be adapted for any other national clinical context by modifying the domain ontologies, thus providing a multidisciplinary approach to the management of patients with COPD. ", doi="10.2196/50980", url="https://medinform.jmir.org/2024/1/e50980", url="http://www.ncbi.nlm.nih.gov/pubmed/38922666" } @Article{info:doi/10.2196/56241, author="O'Driscoll, Fiona and O'Brien, Niki and Guo, Chaohui and Prime, Matthew and Darzi, Ara and Ghafur, Saira", title="Clinical Simulation in the Regulation of Software as a Medical Device: An eDelphi Study", journal="JMIR Form Res", year="2024", month="Jun", day="25", volume="8", pages="e56241", keywords="digital health technology", keywords="software as a medical device", keywords="clinical simulation", keywords="Delphi study", keywords="eDelphi study", keywords="artificial intelligence", keywords="digital health", abstract="Background: Accelerated digitalization in the health sector requires the development of appropriate evaluation methods to ensure that digital health technologies (DHTs) are safe and effective. Software as a medical device (SaMD) is a commonly used DHT by clinicians to provide care to patients. Traditional research methods for evaluating health care products, such as randomized clinical trials, may not be suitable for DHTs, such as SaMD. However, evidence to show their safety and efficacy is needed by regulators before they can be used in practice. Clinical simulation can be used by researchers to test SaMD in an agile and low-cost way; yet, there is limited research on criteria to assess the robustness of simulations and, subsequently, their relevance for a regulatory decision. 
Objective: The objective of this study was to gain consensus on the criteria that should be used to assess clinical simulation from a regulatory perspective when it is used to generate evidence for SaMD. Methods: An eDelphi study approach was chosen to develop a set of criteria to assess clinical simulation when used to evaluate SaMD. Participants were recruited through purposive and snowball sampling based on their experience and knowledge in relevant sectors. They were guided through an initial scoping questionnaire with key themes identified from the literature to obtain a comprehensive list of criteria. Participants voted upon these criteria in 2 Delphi rounds, with criteria being excluded if consensus was not met. Participants were invited to add qualitative comments during rounds and qualitative analysis was performed on the comments gathered during the first round. Consensus was predefined by 2 criteria: if <10\% of the panelists deemed the criteria as ``not important'' or ``not important at all'' and >60\% ``important'' or ``very important.'' Results: In total, 33 international experts in the digital health field, including academics, regulators, policy makers, and industry representatives, completed both Delphi rounds, and 43 criteria gained consensus from the participants. The research team grouped these criteria into 7 domains---background and context, overall study design, study population, delivery of the simulation, fidelity, software and artificial intelligence, and study analysis. These 7 domains were formulated into the simulation for regulation of SaMD framework. There were key areas of concern identified by participants regarding the framework criteria, such as the importance of how simulation fidelity is achieved and reported and the avoidance of bias throughout all stages. Conclusions: This study proposes the simulation for regulation of SaMD framework, developed through an eDelphi consensus process, to evaluate clinical simulation when used to assess SaMD. Future research should prioritize the development of safe and effective SaMD, while implementing and refining the framework criteria to adapt to new challenges. ", doi="10.2196/56241", url="https://formative.jmir.org/2024/1/e56241" } @Article{info:doi/10.2196/54265, author="Soares, Andrey and Schilling, M. Lisa and Richardson, Joshua and Kommadi, Bhagvan and Subbian, Vignesh and Dehnbostel, Joanne and Shahin, Khalid and Robinson, A. Karen and Afzal, Muhammad and Lehmann, P. Harold and Kunnamo, Ilkka and Alper, S. Brian", title="Making Science Computable Using Evidence-Based Medicine on Fast Healthcare Interoperability Resources: Standards Development Project", journal="J Med Internet Res", year="2024", month="Jun", day="25", volume="26", pages="e54265", keywords="evidence-based medicine", keywords="FHIR", keywords="Fast Healthcare Interoperability Resources", keywords="computable evidence", keywords="EBMonFHIR", keywords="evidence-based medicine on Fast Healthcare Interoperability Resources", abstract="Background: Evidence-based medicine (EBM) has the potential to improve health outcomes, but EBM has not been widely integrated into the systems used for research or clinical decision-making. There has not been a scalable and reusable computer-readable standard for distributing research results and synthesized evidence among creators, implementers, and the ultimate users of that evidence. 
Evidence that is more rapidly updated, synthesized, disseminated, and implemented would improve both the delivery of EBM and evidence-based health care policy. Objective: This study aimed to introduce the EBM on Fast Healthcare Interoperability Resources (FHIR) project (EBMonFHIR), which is extending the methods and infrastructure of Health Level Seven (HL7) FHIR to provide an interoperability standard for the electronic exchange of health-related scientific knowledge. Methods: As an ongoing process, the project creates and refines FHIR resources to represent evidence from clinical studies and syntheses of those studies and develops tools to assist with the creation and visualization of FHIR resources. Results: The EBMonFHIR project created FHIR resources (ie, ArtifactAssessment, Citation, Evidence, EvidenceReport, and EvidenceVariable) for representing evidence. The COVID-19 Knowledge Accelerator (COKA) project, now Health Evidence Knowledge Accelerator (HEvKA), took this work further and created FHIR resources that express EvidenceReport, Citation, and ArtifactAssessment concepts. The group is (1) continually refining FHIR resources to support the representation of EBM; (2) developing controlled terminology related to EBM (ie, study design, statistic type, statistical model, and risk of bias); and (3) developing tools to facilitate the visualization and data entry of EBM information into FHIR resources, including human-readable interfaces and JSON viewers. Conclusions: EBMonFHIR resources in conjunction with other FHIR resources can support relaying EBM components in a manner that is interoperable and consumable by downstream tools and health information technology systems to support the users of evidence. ", doi="10.2196/54265", url="https://www.jmir.org/2024/1/e54265", url="http://www.ncbi.nlm.nih.gov/pubmed/38916936" } @Article{info:doi/10.2196/53162, author="Gonz{\'a}lez-Colom, Rub{\`e}n and Mitra, Kangkana and Vela, Emili and Gezsi, Andras and Paajanen, Teemu and G{\'a}l, Zs{\'o}fia and Hullam, Gabor and M{\"a}kinen, Hannu and Nagy, Tamas and Kuokkanen, Mikko and Piera-Jim{\'e}nez, Jordi and Roca, Josep and Antal, Peter and Juhasz, Gabriella and Cano, Isaac", title="Multicentric Assessment of a Multimorbidity-Adjusted Disability Score to Stratify Depression-Related Risks Using Temporal Disease Maps: Instrument Validation Study", journal="J Med Internet Res", year="2024", month="Jun", day="24", volume="26", pages="e53162", keywords="health risk assessment", keywords="multimorbidity", keywords="disease trajectories", keywords="major depressive disorder", abstract="Background: Comprehensive management of multimorbidity can significantly benefit from advanced health risk assessment tools that facilitate value-based interventions, allowing for the assessment and prediction of disease progression. Our study proposes a novel methodology, the Multimorbidity-Adjusted Disability Score (MADS), which integrates disease trajectory methodologies with advanced techniques for assessing interdependencies among concurrent diseases. This approach is designed to better assess the clinical burden of clusters of interrelated diseases and enhance our ability to anticipate disease progression, thereby potentially informing targeted preventive care interventions. 
Objective: This study aims to evaluate the effectiveness of the MADS in stratifying patients into clinically relevant risk groups based on their multimorbidity profiles, which accurately reflect their clinical complexity and the probabilities of developing new associated disease conditions. Methods: In a retrospective multicentric cohort study, we developed the MADS by analyzing disease trajectories and applying Bayesian statistics to determine disease-disease probabilities combined with well-established disability weights. We used major depressive disorder (MDD) as a primary case study for this evaluation. We stratified patients into different risk levels corresponding to different percentiles of MADS distribution. We statistically assessed the association of MADS risk strata with mortality, health care resource use, and disease progression across 1 million individuals from Spain, the United Kingdom, and Finland. Results: The results revealed significantly different distributions of the assessed outcomes across the MADS risk tiers, including mortality rates; primary care visits; specialized care outpatient consultations; visits in mental health specialized centers; emergency room visits; hospitalizations; pharmacological and nonpharmacological expenditures; and dispensation of antipsychotics, anxiolytics, sedatives, and antidepressants (P<.001 in all cases). Moreover, the results of the pairwise comparisons between adjacent risk tiers illustrate a substantial and gradual pattern of increased mortality rate, heightened health care use, increased health care expenditures, and a raised pharmacological burden as individuals progress from lower MADS risk tiers to higher-risk tiers. The analysis also revealed an augmented risk of multimorbidity progression within the high-risk groups, aligned with a higher incidence of new onsets of MDD-related diseases. Conclusions: The MADS seems to be a promising approach for predicting health risks associated with multimorbidity. It might complement current risk assessment state-of-the-art tools by providing valuable insights for tailored epidemiological impact analyses of clusters of interrelated diseases and by accurately assessing multimorbidity progression risks. This study paves the way for innovative digital developments to support advanced health risk assessment strategies. Further validation is required to generalize its use beyond the initial case study of MDD. ", doi="10.2196/53162", url="https://www.jmir.org/2024/1/e53162", url="http://www.ncbi.nlm.nih.gov/pubmed/38913991" } @Article{info:doi/10.2196/49613, author="Lin, Z. Rebecca and Amith, Tuan Muhammad and Wang, X.
Cynthia and Strickley, John and Tao, Cui", title="Dermoscopy Differential Diagnosis Explorer (D3X) Ontology to Aggregate and Link Dermoscopic Patterns to Differential Diagnoses: Development and Usability Study", journal="JMIR Med Inform", year="2024", month="Jun", day="21", volume="12", pages="e49613", keywords="medical informatics", keywords="biomedical ontology", keywords="ontology", keywords="ontologies", keywords="vocabulary", keywords="OWL", keywords="web ontology language", keywords="skin", keywords="semiotic", keywords="web app", keywords="web application", keywords="visual", keywords="visualization", keywords="dermoscopic", keywords="diagnosis", keywords="diagnoses", keywords="diagnostic", keywords="information storage", keywords="information retrieval", keywords="skin lesion", keywords="skin diseases", keywords="dermoscopy differential diagnosis explorer", keywords="dermatology", keywords="dermoscopy", keywords="differential diagnosis", keywords="information storage and retrieval", abstract="Background: Dermoscopy is a growing field that uses microscopy to allow dermatologists and primary care physicians to identify skin lesions. For a given skin lesion, a wide variety of differential diagnoses exist, which may be challenging for inexperienced users to name and understand. Objective: In this study, we describe the creation of the dermoscopy differential diagnosis explorer (D3X), an ontology linking dermoscopic patterns to differential diagnoses. Methods: Existing ontologies that were incorporated into D3X include the elements of visuals ontology and dermoscopy elements of visuals ontology, which connect visual features to dermoscopic patterns. A list of differential diagnoses for each pattern was generated from the literature and in consultation with domain experts. Open-source images were incorporated from DermNet, Dermoscopedia, and open-access research papers. Results: D3X was encoded in the OWL 2 web ontology language and includes 3041 logical axioms, 1519 classes, 103 object properties, and 20 data properties. We compared D3X with publicly available ontologies in the dermatology domain using a semiotic theory--driven metric to measure the innate qualities of D3X with others. The results indicate that D3X is adequately comparable with other ontologies of the dermatology domain. Conclusions: The D3X ontology is a resource that can link and integrate dermoscopic differential diagnoses and supplementary information with existing ontology-based resources. Future directions include developing a web application based on D3X for dermoscopy education and clinical practice. ", doi="10.2196/49613", url="https://medinform.jmir.org/2024/1/e49613", url="http://www.ncbi.nlm.nih.gov/pubmed/38904996" } @Article{info:doi/10.2196/46691, author="Wong, Chi-Wai David and Bonnici, Timothy and Gerry, Stephen and Birks, Jacqueline and Watkinson, J. 
Peter", title="Effect of Digital Early Warning Scores on Hospital Vital Sign Observation Protocol Adherence: Stepped-Wedge Evaluation", journal="J Med Internet Res", year="2024", month="Jun", day="20", volume="26", pages="e46691", keywords="vital signs", keywords="early warning score", keywords="track and trigger", keywords="electronic charting", keywords="stepped-wedge", keywords="vital", keywords="charting", keywords="documentation", keywords="deterioration", keywords="hospital management", keywords="clinical intervention", keywords="decision-making", keywords="patient risk", keywords="hospital", keywords="ICU", keywords="intensive care unit", keywords="UK", keywords="United Kingdom", keywords="intervention", abstract="Background: Early warning scores (EWS) are routinely used in hospitals to assess a patient's risk of deterioration. EWS are traditionally recorded on paper observation charts but are increasingly recorded digitally. In either case, evidence for the clinical effectiveness of such scores is mixed, and previous studies have not considered whether EWS leads to changes in how deteriorating patients are managed. Objective: This study aims to examine whether the introduction of a digital EWS system was associated with more frequent observation of patients with abnormal vital signs, a precursor to earlier clinical intervention. Methods: We conducted a 2-armed stepped-wedge study from February 2015 to December 2016, over 4 hospitals in 1 UK hospital trust. In the control arm, vital signs were recorded using paper observation charts. In the intervention arm, a digital EWS system was used. The primary outcome measure was time to next observation (TTNO), defined as the time between a patient's first elevated EWS (EWS ?3) and subsequent observations set. Secondary outcomes were time to death in the hospital, length of stay, and time to unplanned intensive care unit admission. Differences between the 2 arms were analyzed using a mixed-effects Cox model. The usability of the system was assessed using the system usability score survey. Results: We included 12,802 admissions, 1084 in the paper (control) arm and 11,718 in the digital EWS (intervention) arm. The system usability score was 77.6, indicating good usability. The median TTNO in the control and intervention arms were 128 (IQR 73-218) minutes and 131 (IQR 73-223) minutes, respectively. The corresponding hazard ratio for TTNO was 0.99 (95\% CI 0.91-1.07; P=.73). Conclusions: We demonstrated strong clinical engagement with the system. We found no difference in any of the predefined patient outcomes, suggesting that the introduction of a highly usable electronic system can be achieved without impacting clinical care. Our findings contrast with previous claims that digital EWS systems are associated with improvement in clinical outcomes. Future research should investigate how digital EWS systems can be integrated with new clinical pathways adjusting staff behaviors to improve patient outcomes. 
", doi="10.2196/46691", url="https://www.jmir.org/2024/1/e46691", url="http://www.ncbi.nlm.nih.gov/pubmed/38900529" } @Article{info:doi/10.2196/50209, author="Abdullahi, Tassallah and Mercurio, Laura and Singh, Ritambhara and Eickhoff, Carsten", title="Retrieval-Based Diagnostic Decision Support: Mixed Methods Study", journal="JMIR Med Inform", year="2024", month="Jun", day="19", volume="12", pages="e50209", keywords="clinical decision support", keywords="rare diseases", keywords="ensemble learning", keywords="retrieval-augmented learning", keywords="machine learning", keywords="electronic health records", keywords="natural language processing", keywords="retrieval augmented generation", keywords="RAG", keywords="electronic health record", keywords="EHR", keywords="data sparsity", keywords="information retrieval", abstract="Background: Diagnostic errors pose significant health risks and contribute to patient mortality. With the growing accessibility of electronic health records, machine learning models offer a promising avenue for enhancing diagnosis quality. Current research has primarily focused on a limited set of diseases with ample training data, neglecting diagnostic scenarios with limited data availability. Objective: This study aims to develop an information retrieval (IR)--based framework that accommodates data sparsity to facilitate broader diagnostic decision support. Methods: We introduced an IR-based diagnostic decision support framework called CliniqIR. It uses clinical text records, the Unified Medical Language System Metathesaurus, and 33 million PubMed abstracts to classify a broad spectrum of diagnoses independent of training data availability. CliniqIR is designed to be compatible with any IR framework. Therefore, we implemented it using both dense and sparse retrieval approaches. We compared CliniqIR's performance to that of pretrained clinical transformer models such as Clinical Bidirectional Encoder Representations from Transformers (ClinicalBERT) in supervised and zero-shot settings. Subsequently, we combined the strength of supervised fine-tuned ClinicalBERT and CliniqIR to build an ensemble framework that delivers state-of-the-art diagnostic predictions. Results: On a complex diagnosis data set (DC3) without any training data, CliniqIR models returned the correct diagnosis within their top 3 predictions. On the Medical Information Mart for Intensive Care III data set, CliniqIR models surpassed ClinicalBERT in predicting diagnoses with <5 training samples by an average difference in mean reciprocal rank of 0.10. In a zero-shot setting where models received no disease-specific training, CliniqIR still outperformed the pretrained transformer models with a greater mean reciprocal rank of at least 0.10. Furthermore, in most conditions, our ensemble framework surpassed the performance of its individual components, demonstrating its enhanced ability to make precise diagnostic predictions. Conclusions: Our experiments highlight the importance of IR in leveraging unstructured knowledge resources to identify infrequently encountered diagnoses. In addition, our ensemble framework benefits from combining the complementary strengths of the supervised and retrieval-based models to diagnose a broad spectrum of diseases. ", doi="10.2196/50209", url="https://medinform.jmir.org/2024/1/e50209", url="http://www.ncbi.nlm.nih.gov/pubmed/38896468" } @Article{info:doi/10.2196/52290, author="Doll, Joy and Anzalone, Jerrod A. 
and Clarke, Martina and Cooper, Kathryn and Polich, Ann and Siedlik, Jacob", title="A Call for a Health Data--Informed Workforce Among Clinicians", journal="JMIR Med Educ", year="2024", month="Jun", day="17", volume="10", pages="e52290", keywords="health data--informed workforce", keywords="health data", keywords="health informaticist", keywords="data literacy", keywords="workforce development", doi="10.2196/52290", url="https://mededu.jmir.org/2024/1/e52290" } @Article{info:doi/10.2196/53354, author="Zheng, Yue and Zhao, Ailin and Yang, Yuqi and Wang, Laduona and Hu, Yifei and Luo, Ren and Wu, Yijun", title="Real-World Survival Comparisons Between Radiotherapy and Surgery for Metachronous Second Primary Lung Cancer and Predictions of Lung Cancer--Specific Outcomes Using Machine Learning: Population-Based Study", journal="JMIR Cancer", year="2024", month="Jun", day="12", volume="10", pages="e53354", keywords="metachronous second primary lung cancer", keywords="radiotherapy", keywords="surgical resection", keywords="propensity score matching analysis", keywords="machine learning", abstract="Background: Metachronous second primary lung cancer (MSPLC) is not that rare but is seldom studied. Objective: We aim to compare real-world survival outcomes between different surgery strategies and radiotherapy for MSPLC. Methods: This retrospective study analyzed data collected from patients with MSPLC between 1988 and 2012 in the Surveillance, Epidemiology, and End Results (SEER) database. Propensity score matching (PSM) analyses and machine learning were performed to compare variables between patients with MSPLC. Survival curves were plotted using the Kaplan-Meier method and were compared using log-rank tests. Results: A total of 2451 MSPLC patients were categorized into the following treatment groups: 864 (35.3\%) received radiotherapy, 759 (31\%) underwent surgery, 89 (3.6\%) had surgery plus radiotherapy, and 739 (30.2\%) had neither treatment. After PSM, 470 pairs each for radiotherapy and surgery were generated. The surgery group had significantly better survival than the radiotherapy group (P<.001) and the untreated group (563 pairs; P<.001). Further analysis revealed that both wedge resection (85 pairs; P=.004) and lobectomy (71 pairs; P=.002) outperformed radiotherapy in overall survival for MSPLC patients. Machine learning models (extreme gradient boosting, random forest classifier, adaptive boosting) demonstrated high predictive performance based on area under the curve (AUC) values. Least absolute shrinkage and selection operator (LASSO) regression analysis identified 9 significant variables impacting cancer-specific survival, emphasizing surgery's consistent influence across 1 year to 10 years. These variables encompassed age at diagnosis, sex, year of diagnosis, radiotherapy of initial primary lung cancer (IPLC), primary site, histology, surgery, chemotherapy, and radiotherapy of MSPLC. Competing risk analysis highlighted lower mortality for female MSPLC patients (hazard ratio [HR]=0.79, 95\% CI 0.71-0.87) and recent IPLC diagnoses (HR=0.79, 95\% CI 0.73-0.85), while radiotherapy for IPLC increased mortality (HR=1.31, 95\% CI 1.16-1.50). Surgery alone had the lowest cancer-specific mortality (HR=0.83, 95\% CI 0.81-0.85), with sublevel resection having the lowest mortality rate among the surgical approaches (HR=0.26, 95\% CI 0.21-0.31). The findings provide valuable insights into the factors that influence cumulative cancer-specific mortality. 
Conclusions: Surgical resections such as wedge resection and lobectomy confer better survival than radiation therapy for MSPLC, but radiation can be a valid alternative for the treatment of MSPLC. ", doi="10.2196/53354", url="https://cancer.jmir.org/2024/1/e53354", url="http://www.ncbi.nlm.nih.gov/pubmed/38865182" } @Article{info:doi/10.2196/57678, author="Yin, Ziming and Kuang, Zhongling and Zhang, Haopeng and Guo, Yu and Li, Ting and Wu, Zhengkun and Wang, Lihua", title="Explainable AI Method for Tinnitus Diagnosis via Neighbor-Augmented Knowledge Graph and Traditional Chinese Medicine: Development and Validation Study", journal="JMIR Med Inform", year="2024", month="Jun", day="10", volume="12", pages="e57678", keywords="knowledge graph", keywords="syndrome differentiation", keywords="tinnitus", keywords="traditional Chinese medicine", keywords="explainable", keywords="ear", keywords="audiology", keywords="TCM", keywords="algorithm", keywords="diagnosis", keywords="AI", keywords="artificial intelligence", abstract="Background: Tinnitus diagnosis poses a challenge in otolaryngology owing to an extremely complex pathogenesis, lack of effective objectification methods, and factor-affected diagnosis. There is currently a lack of explainable auxiliary diagnostic tools for tinnitus in clinical practice. Objective: This study aims to develop a diagnostic model using an explainable artificial intelligence (AI) method to address the issue of low accuracy in tinnitus diagnosis. Methods: In this study, a knowledge graph--based tinnitus diagnostic method was developed by combining clinical medical knowledge with electronic medical records. Electronic medical record data from 1267 patients were integrated with traditional Chinese clinical medical knowledge to construct a tinnitus knowledge graph. Subsequently, weights were introduced, which measured patient similarity in the knowledge graph based on mutual information values. Finally, a collaborative neighbor algorithm was proposed, which scored patient similarity to obtain the recommended diagnosis. We conducted 2 group experiments and 1 case derivation to explore the effectiveness of our models and compared the models with state-of-the-art graph algorithms and other explainable machine learning models. Results: The experimental results indicate that the method achieved 99.4\% accuracy, 98.5\% sensitivity, 99.6\% specificity, 98.7\% precision, 98.6\% F1-score, and 99\% area under the receiver operating characteristic curve for the inference of 5 tinnitus subtypes among 253 test patients. Additionally, it demonstrated good interpretability. The topological structure of knowledge graphs provides transparency that can explain the reasons for the similarity between patients. Conclusions: This method provides doctors with a reliable and explainable diagnostic tool that is expected to improve tinnitus diagnosis accuracy. 
", doi="10.2196/57678", url="https://medinform.jmir.org/2024/1/e57678", url="http://www.ncbi.nlm.nih.gov/pubmed/38857077" } @Article{info:doi/10.2196/56271, author="Leston, Meredith and Ord{\'o}{\~n}ez-Mena, Jos{\'e} and Joy, Mark and de Lusignan, Simon and Hobbs, Richard and McInnes, Iain and Lee, Lennard", title="Defining and Risk-Stratifying Immunosuppression (the DESTINIES Study): Protocol for an Electronic Delphi Study", journal="JMIR Res Protoc", year="2024", month="Jun", day="6", volume="13", pages="e56271", keywords="immunosuppressed", keywords="immunocompromised", keywords="COVID", keywords="vaccines", keywords="COVID-19", keywords="surveillance", keywords="phenotype", keywords="adult", keywords="immunosuppression", keywords="clinical risk", keywords="disease surveillance", keywords="clinical consensus", keywords="eDelphi", keywords="immunosuppressed patient", keywords="immunosuppressed patients", keywords="study design", keywords="Delphi", keywords="methods", keywords="methodology", keywords="statistic", keywords="statistics", keywords="statistical", keywords="consensus", keywords="immune", keywords="immunity", keywords="immunology", keywords="immunological", abstract="Background: Globally, there are marked inconsistencies in how immunosuppression is characterized and subdivided into clinical risk groups. This is detrimental to the precision and comparability of disease surveillance efforts---which has negative implications for the care of those who are immunosuppressed and their health outcomes. This was particularly apparent during the COVID-19 pandemic; despite collective motivation to protect these patients, conflicting clinical definitions created international rifts in how those who were immunosuppressed were monitored and managed during this period. We propose that international clinical consensus be built around the conditions that lead to immunosuppression and their gradations of severity concerning COVID-19. Such information can then be formalized into a digital phenotype to enhance disease surveillance and provide much-needed intelligence on risk-prioritizing these patients. Objective: We aim to demonstrate how electronic Delphi objectives, methodology, and statistical approaches will help address this lack of consensus internationally and deliver a COVID-19 risk-stratified phenotype for ``adult immunosuppression.'' Methods: Leveraging existing evidence for heterogeneous COVID-19 outcomes in adults who are immunosuppressed, this work will recruit over 50 world-leading clinical, research, or policy experts in the area of immunology or clinical risk prioritization. After 2 rounds of clinical consensus building and 1 round of concluding debate, these panelists will confirm the medical conditions that should be classed as immunosuppressed and their differential vulnerability to COVID-19. Consensus statements on the time and dose dependencies of these risks will also be presented. This work will be conducted iteratively, with opportunities for panelists to ask clarifying questions between rounds and provide ongoing feedback to improve questionnaire items. Statistical analysis will focus on levels of agreement between responses. Results: This protocol outlines a robust method for improving consensus on the definition and meaningful subdivision of adult immunosuppression concerning COVID-19. Panelist recruitment took place between April and May of 2024; the target set for over 50 panelists was achieved. 
The study launched at the end of May and data collection is projected to end in July 2024. Conclusions: This protocol, if fully implemented, will deliver a universally acceptable, clinically relevant, and electronic health record--compatible phenotype for adult immunosuppression. As well as having immediate value for COVID-19 resource prioritization, this exercise and its output hold prospective value for clinical decision-making across all diseases that disproportionately affect those who are immunosuppressed. International Registered Report Identifier (IRRID): PRR1-10.2196/56271 ", doi="10.2196/56271", url="https://www.researchprotocols.org/2024/1/e56271", url="http://www.ncbi.nlm.nih.gov/pubmed/38842925" } @Article{info:doi/10.2196/50344, author="Zawati, H. Ma'n and Lang, Michael", title="Does an App a Day Keep the Doctor Away? AI Symptom Checker Applications, Entrenched Bias, and Professional Responsibility", journal="J Med Internet Res", year="2024", month="Jun", day="5", volume="26", pages="e50344", keywords="artificial intelligence", keywords="applications", keywords="mobile health", keywords="mHealth", keywords="bias", keywords="biases", keywords="professional obligations", keywords="professional obligation", keywords="app", keywords="apps", keywords="application", keywords="symptom checker", keywords="symptom checkers", keywords="diagnose", keywords="diagnosis", keywords="self-diagnose", keywords="self-diagnosis", keywords="ethic", keywords="ethics", keywords="ethical", keywords="regulation", keywords="regulations", keywords="legal", keywords="law", keywords="laws", keywords="safety", keywords="mobile phone", doi="10.2196/50344", url="https://www.jmir.org/2024/1/e50344", url="http://www.ncbi.nlm.nih.gov/pubmed/38838309" } @Article{info:doi/10.2196/53918, author="Li, Ming and Xiong, XiaoMin and Xu, Bo and Dickson, Conan", title="Chinese Oncologists' Perspectives on Integrating AI into Clinical Practice: Cross-Sectional Survey Study", journal="JMIR Form Res", year="2024", month="Jun", day="5", volume="8", pages="e53918", keywords="artificial intelligence", keywords="AI", keywords="machine learning", keywords="oncologist", keywords="concern", keywords="clinical practice", abstract="Background: The rapid development of artificial intelligence (AI) has brought significant interest to its potential applications in oncology. Although AI-powered tools are already being implemented in some Chinese hospitals, their integration into clinical practice raises several concerns for Chinese oncologists. Objective: This study aims to explore the concerns of Chinese oncologists regarding the integration of AI into clinical practice and to identify the factors influencing these concerns. Methods: A total of 228 Chinese oncologists participated in a cross-sectional web-based survey from April to June in 2023 in mainland China. The survey gauged their worries about AI with multiple-choice questions. The survey evaluated their views on the statements of ``The impact of AI on the doctor-patient relationship'' and ``AI will replace doctors.'' The data were analyzed using descriptive statistics, and variate analyses were used to find correlations between the oncologists' backgrounds and their concerns. 
Results: The study revealed that the most prominent concerns were the potential for AI to mislead diagnosis and treatment (163/228, 71.5\%); an overreliance on AI (162/228, 71\%); data and algorithm bias (123/228, 54\%); issues with data security and patient privacy (123/228, 54\%); and a lag in the adaptation of laws, regulations, and policies in keeping up with AI's development (115/228, 50.4\%). Oncologists with a bachelor's degree expressed heightened concerns related to data and algorithm bias (34/49, 69\%; P=.03) and the lagging nature of legal, regulatory, and policy issues (32/49, 65\%; P=.046). Regarding AI's impact on doctor-patient relationships, 53.1\% (121/228) saw a positive impact, whereas 35.5\% (81/228) found it difficult to judge, 9.2\% (21/228) feared increased disputes, and 2.2\% (5/228) believed that there is no impact. Although sex differences were not significant (P=.08), perceptions varied---male oncologists tended to be more positive than female oncologists (74/135, 54.8\% vs 47/93, 50\%). Oncologists with a bachelor's degree (26/49, 53\%; P=.03) and experienced clinicians (≥21 years; 28/56, 50\%; P=.054) found it the hardest to judge. Those with IT experience were significantly more positive (25/35, 71\%) than those without (96/193, 49.7\%; P=.02). Opinions regarding the possibility of AI replacing doctors were diverse, with 23.2\% (53/228) strongly disagreeing, 14\% (32/228) disagreeing, 29.8\% (68/228) being neutral, 16.2\% (37/228) agreeing, and 16.7\% (38/228) strongly agreeing. There were no significant correlations with demographic and professional factors (all P>.05). Conclusions: Addressing oncologists' concerns about AI requires collaborative efforts from policy makers, developers, health care professionals, and legal experts. Emphasizing transparency, human-centered design, bias mitigation, and education about AI's potential and limitations is crucial. Through close collaboration and a multidisciplinary strategy, AI can be effectively integrated into oncology, balancing benefits with ethical considerations and enhancing patient care. ", doi="10.2196/53918", url="https://formative.jmir.org/2024/1/e53918", url="http://www.ncbi.nlm.nih.gov/pubmed/38838307" } @Article{info:doi/10.2196/46737, author="Meng, Fan-Tsui and Jhuang, Jing-Rong and Peng, Yan-Teng and Chiang, Chun-Ju and Yang, Ya-Wen and Huang, Chi-Yen and Huang, Kuo-Ping and Lee, Wen-Chung", title="Predicting Lung Cancer Survival to the Future: Population-Based Cancer Survival Modeling Study", journal="JMIR Public Health Surveill", year="2024", month="May", day="31", volume="10", pages="e46737", keywords="lung cancer", keywords="survival", keywords="survivorship-period-cohort model", keywords="prediction", keywords="prognosis", keywords="early diagnosis", keywords="lung cancer screening", keywords="survival trend", keywords="population-based", keywords="population health", keywords="public health", keywords="surveillance", keywords="low-dose computed tomography", abstract="Background: Lung cancer remains the leading cause of cancer-related mortality globally, with late diagnoses often resulting in poor prognosis. In response, the Lung Ambition Alliance aims to double the 5-year survival rate by 2025. Objective: Using the Taiwan Cancer Registry, this study uses the survivorship-period-cohort model to assess the feasibility of achieving this goal by predicting future survival rates of patients with lung cancer in Taiwan. 
Methods: This retrospective study analyzed data from 205,104 patients with lung cancer registered between 1997 and 2018. Survival rates were calculated using the survivorship-period-cohort model, focusing on 1-year interval survival rates and extrapolating to predict 5-year outcomes for diagnoses up to 2020, as viewed from 2025. Model validation involved comparing predicted rates with actual data using symmetric mean absolute percentage error. Results: The study identified notable improvements in survival rates beginning in 2004, with the predicted 5-year survival rate for 2020 reaching 38.7\%, marking a considerable increase from the most recent available data of 23.8\% for patients diagnosed in 2013. Subgroup analysis revealed varied survival improvements across different demographics and histological types. Predictions based on current trends indicate that achieving the Lung Ambition Alliance's goal could be within reach. Conclusions: The analysis demonstrates notable improvements in lung cancer survival rates in Taiwan, driven by the adoption of low-dose computed tomography screening, alongside advances in diagnostic technologies and treatment strategies. While the ambitious target set by the Lung Ambition Alliance appears achievable, ongoing advancements in medical technology and health policies will be crucial. The study underscores the potential impact of continued enhancements in lung cancer management and the importance of strategic health interventions to further improve survival outcomes. ", doi="10.2196/46737", url="https://publichealth.jmir.org/2024/1/e46737", url="http://www.ncbi.nlm.nih.gov/pubmed/38819904" } @Article{info:doi/10.2196/51234, author="Claggett, Jennifer and Petter, Stacie and Joshi, Amol and Ponzio, Todd and Kirkendall, Eric", title="An Infrastructure Framework for Remote Patient Monitoring Interventions and Research", journal="J Med Internet Res", year="2024", month="May", day="30", volume="26", pages="e51234", keywords="remote patient monitoring", keywords="eHealth", keywords="telehealth", keywords="telemonitoring", keywords="telemedicine", keywords="digital infrastructure", keywords="clinical decision-making", doi="10.2196/51234", url="https://www.jmir.org/2024/1/e51234", url="http://www.ncbi.nlm.nih.gov/pubmed/38815263" } @Article{info:doi/10.2196/56909, author="Shao, Jian and Pan, Ying and Kou, Wei-Bin and Feng, Huyi and Zhao, Yu and Zhou, Kaixin and Zhong, Shao", title="Generalization of a Deep Learning Model for Continuous Glucose Monitoring--Based Hypoglycemia Prediction: Algorithm Development and Validation Study", journal="JMIR Med Inform", year="2024", month="May", day="24", volume="12", pages="e56909", keywords="hypoglycemia prediction", keywords="hypoglycemia", keywords="hypoglycemic", keywords="blood sugar", keywords="prediction", keywords="predictive", keywords="deep learning", keywords="generalization", keywords="machine learning", keywords="glucose", keywords="diabetes", keywords="continuous glucose monitoring", keywords="type 1 diabetes", keywords="type 2 diabetes", keywords="LSTM", keywords="long short-term memory", abstract="Background: Predicting hypoglycemia while maintaining a low false alarm rate is a challenge for the wide adoption of continuous glucose monitoring (CGM) devices in diabetes management. 
One small study suggested that a deep learning model based on the long short-term memory (LSTM) network had better performance in hypoglycemia prediction than traditional machine learning algorithms in European patients with type 1 diabetes. However, given that many well-recognized deep learning models perform poorly outside the training setting, it remains unclear whether the LSTM model could be generalized to different populations or patients with other diabetes subtypes. Objective: The aim of this study was to validate LSTM hypoglycemia prediction models in more diverse populations and across a wide spectrum of patients with different subtypes of diabetes. Methods: We assembled two large data sets of patients with type 1 and type 2 diabetes. The primary data set including CGM data from 192 Chinese patients with diabetes was used to develop the LSTM, support vector machine (SVM), and random forest (RF) models for hypoglycemia prediction with a prediction horizon of 30 minutes. Hypoglycemia was categorized into mild (glucose=54-70 mg/dL) and severe (glucose<54 mg/dL) levels. The validation data set of 427 patients of European-American ancestry in the United States was used to validate the models and examine their generalizations. The predictive performance of the models was evaluated according to the sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). Results: For the difficult-to-predict mild hypoglycemia events, the LSTM model consistently achieved AUC values greater than 97\% in the primary data set, with a less than 3\% AUC reduction in the validation data set, indicating that the model was robust and generalizable across populations. AUC values above 93\% were also achieved when the LSTM model was applied to both type 1 and type 2 diabetes in the validation data set, further strengthening the generalizability of the model. Under different satisfactory levels of sensitivity for mild and severe hypoglycemia prediction, the LSTM model achieved higher specificity than the SVM and RF models, thereby reducing false alarms. Conclusions: Our results demonstrate that the LSTM model is robust for hypoglycemia prediction and is generalizable across populations or diabetes subtypes. Given its additional advantage of false-alarm reduction, the LSTM model is a strong candidate to be widely implemented in future CGM devices for hypoglycemia prediction. ", doi="10.2196/56909", url="https://medinform.jmir.org/2024/1/e56909" } @Article{info:doi/10.2196/57001, author="Wang, Anan and Wu, Yunong and Ji, Xiaojian and Wang, Xiangyang and Hu, Jiawen and Zhang, Fazhan and Zhang, Zhanchao and Pu, Dong and Tang, Lulu and Ma, Shikui and Liu, Qiang and Dong, Jing and He, Kunlun and Li, Kunpeng and Teng, Da and Li, Tao", title="Assessing and Optimizing Large Language Models on Spondyloarthritis Multi-Choice Question Answering: Protocol for Enhancement and Assessment", journal="JMIR Res Protoc", year="2024", month="May", day="24", volume="13", pages="e57001", keywords="spondyloarthritis", keywords="benchmark", keywords="large language model", keywords="artificial intelligence", keywords="AI", keywords="AI chatbot", keywords="AI-assistant diagnosis", abstract="Background: Spondyloarthritis (SpA), a chronic inflammatory disorder, predominantly impacts the sacroiliac joints and spine, significantly escalating the risk of disability. 
SpA's complexity, as evidenced by its diverse clinical presentations and symptoms that often mimic other diseases, presents substantial challenges in its accurate diagnosis and differentiation. This complexity becomes even more pronounced in nonspecialist health care environments due to limited resources, resulting in delayed referrals, increased misdiagnosis rates, and exacerbated disability outcomes for patients with SpA. The emergence of large language models (LLMs) in medical diagnostics introduces a revolutionary potential to overcome these diagnostic hurdles. Despite recent advancements in artificial intelligence and LLMs demonstrating effectiveness in diagnosing and treating various diseases, their application in SpA remains underdeveloped. Currently, there is a notable absence of SpA-specific LLMs and an established benchmark for assessing the performance of such models in this particular field. Objective: Our objective is to develop a foundational medical model, creating a comprehensive evaluation benchmark tailored to the essential medical knowledge of SpA and its unique diagnostic and treatment protocols. The model, post-pretraining, will be subject to further enhancement through supervised fine-tuning. It is projected to significantly aid physicians in SpA diagnosis and treatment, especially in settings with limited access to specialized care. Furthermore, this initiative is poised to promote early and accurate SpA detection at the primary care level, thereby diminishing the risks associated with delayed or incorrect diagnoses. Methods: A rigorous benchmark, comprising 222 meticulously formulated multiple-choice questions on SpA, will be established and developed. These questions will be extensively revised to ensure their suitability for accurately evaluating LLMs' performance in real-world diagnostic and therapeutic scenarios. Our methodology involves selecting and refining top foundational models using public data sets. The best-performing model in our benchmark will undergo further training. Subsequently, more than 80,000 real-world inpatient and outpatient cases from hospitals will enhance LLM training, incorporating techniques such as supervised fine-tuning and low-rank adaptation. We will rigorously assess the models' generated responses for accuracy and evaluate their reasoning processes using the metrics of fluency, relevance, completeness, and medical proficiency. Results: Development of the model is progressing, with significant enhancements anticipated by early 2024. The benchmark, along with the results of evaluations, is expected to be released in the second quarter of 2024. Conclusions: Our trained model aims to capitalize on the capabilities of LLMs in analyzing complex clinical data, thereby enabling precise detection, diagnosis, and treatment of SpA. This innovation is anticipated to play a vital role in diminishing the disabilities arising from delayed or incorrect SpA diagnoses. By promoting this model across diverse health care settings, we anticipate a significant improvement in SpA management, culminating in enhanced patient outcomes and a reduced overall burden of the disease. 
International Registered Report Identifier (IRRID): DERR1-10.2196/57001 ", doi="10.2196/57001", url="https://www.researchprotocols.org/2024/1/e57001", url="http://www.ncbi.nlm.nih.gov/pubmed/38788208" } @Article{info:doi/10.2196/52185, author="Lin, Chien-Chung and Shen, Jian-Hong and Chen, Shu-Fang and Chen, Hung-Ming and Huang, Hung-Meng", title="Developing a Cost-Effective Surgical Scheduling System Applying Lean Thinking and Toyota's Methods for Surgery-Related Big Data for Improved Data Use in Hospitals: User-Centered Design Approach", journal="JMIR Form Res", year="2024", month="May", day="24", volume="8", pages="e52185", keywords="algorithm", keywords="process", keywords="computational thinking", keywords="continuous improvement", keywords="customer needs", keywords="lean principles", keywords="problem solving", keywords="Toyota Production System", keywords="value stream map", keywords="need", keywords="needs", keywords="operating room", abstract="Background: Surgical scheduling is pivotal in managing daily surgical sequences, impacting patient experience and hospital resources significantly. With operating rooms costing approximately US \$36 per minute, efficient scheduling is vital. However, global practices in surgical scheduling vary, largely due to challenges in predicting individual surgeon times for diverse patient conditions. Inspired by the Toyota Production System's efficiency in addressing similar logistical challenges, we applied its principles as detailed in the book ``Lean Thinking'' by Womack and Jones, which identifies processes that do not meet customer needs as wasteful. This insight is critical in health care, where waste can compromise patient safety and medical quality. Objective: This study aims to use lean thinking and Toyota methods to develop a more efficient surgical scheduling system that better aligns with user needs without additional financial burdens. Methods: We implemented the 5 principles of the Toyota system: specifying value, identifying the value stream, enabling flow, establishing pull, and pursuing perfection. Value was defined in terms of meeting the customer's needs, which in this context involved developing a responsive and efficient scheduling system. Our approach included 2 subsystems: one handling presurgery patient data and another for intraoperative and postoperative data. We identified inefficiencies in the presurgery data subsystem and responded by creating a comprehensive value stream map of the surgical process. We developed 2 Excel (Microsoft Corporation) macros using Visual Basic for Applications. The first calculated average surgery times from intra- or postoperative historic data, while the second estimated surgery durations and generated concise, visually engaging scheduling reports from presurgery data. We assessed the effectiveness of the new system by comparing task completion times and user satisfaction between the old and new systems. Results: The implementation of the revised scheduling system significantly reduced the overall scheduling time from 301 seconds to 261 seconds (P=.02), with significant time reductions in the revised process from 99 seconds to 62 seconds (P<.001). Despite these improvements, approximately 21\% of nurses preferred the older system for its familiarity. The new system protects patient data privacy and streamlines schedule dissemination through a secure LINE group (LY Corp), ensuring seamless flow. 
The design of the system allows for real-time updates and has been effectively monitoring surgical durations daily for over 3 years. The ``pull'' principle was demonstrated when an unplanned software issue prompted immediate, user-led troubleshooting, enhancing system reliability. Continuous improvement efforts are ongoing, except for the preoperative patient confirmation step, which requires further enhancement to ensure optimal patient safety. Conclusions: Lean principles and Toyota's methods, combined with computer programming, can revitalize surgical scheduling processes. They offer effective solutions for surgical scheduling challenges and enable the creation of a novel surgical scheduling system without incurring additional costs. ", doi="10.2196/52185", url="https://formative.jmir.org/2024/1/e52185", url="http://www.ncbi.nlm.nih.gov/pubmed/38787610" } @Article{info:doi/10.2196/54283, author="Takagi, Soshi and Koda, Masahide and Watari, Takashi", title="The Performance of ChatGPT-4V in Interpreting Images and Tables in the Japanese Medical Licensing Exam", journal="JMIR Med Educ", year="2024", month="May", day="23", volume="10", pages="e54283", keywords="ChatGPT", keywords="medical licensing examination", keywords="generative artificial intelligence", keywords="medical education", keywords="large language model", keywords="images", keywords="tables", keywords="artificial intelligence", keywords="AI", keywords="Japanese", keywords="reliability", keywords="medical application", keywords="medical applications", keywords="diagnostic", keywords="diagnostics", keywords="online data", keywords="web-based data", doi="10.2196/54283", url="https://mededu.jmir.org/2024/1/e54283" } @Article{info:doi/10.2196/54996, author="Stevens, R. Elizabeth and Xu, Lynn and Kwon, JaeEun and Tasneem, Sumaiya and Henning, Natalie and Feldthouse, Dawn and Kim, Ji Eun and Hess, Rachel and Dauber-Decker, L. Katherine and Smith, D. Paul and Halm, Wendy and Gautam-Goyal, Pranisha and Feldstein, A. David and Mann, M. 
Devin", title="Barriers to Implementing Registered Nurse--Driven Clinical Decision Support for Antibiotic Stewardship: Retrospective Case Study", journal="JMIR Form Res", year="2024", month="May", day="23", volume="8", pages="e54996", keywords="integrated clinical prediction rules", keywords="EHR", keywords="electronic health record", keywords="implementation", keywords="barriers", keywords="acute respiratory infections", keywords="antibiotics", keywords="CDS", keywords="clinical decision support", keywords="decision support", keywords="antibiotic", keywords="prescribe", keywords="prescription", keywords="acute respiratory infection", keywords="barrier", keywords="effectiveness", keywords="registered nurse", keywords="RN", keywords="RN-driven intervention", keywords="personnel availability", keywords="workflow variability", keywords="infrastructure", keywords="infrastructures", keywords="law", keywords="laws", keywords="policy", keywords="policies", keywords="clinical-care setting", keywords="clinical setting", keywords="electronic health records", keywords="RN-driven", keywords="antibiotic stewardship", keywords="retrospective analysis", keywords="Consolidated Framework for Implementation Research", keywords="CFIR", keywords="CDS-based intervention", keywords="urgent care", keywords="New York", keywords="chart review", keywords="interview", keywords="interviews", keywords="staff change", keywords="staff changes", keywords="RN shortage", keywords="RN shortages", keywords="turnover", keywords="health system", keywords="nurse", keywords="nurses", keywords="researcher", keywords="researchers", abstract="Background: Up to 50\% of antibiotic prescriptions for upper respiratory infections (URIs) are inappropriate. Clinical decision support (CDS) systems to mitigate unnecessary antibiotic prescriptions have been implemented into electronic health records, but their use by providers has been limited. Objective: As a delegation protocol, we adapted a validated electronic health record--integrated clinical prediction rule (iCPR) CDS-based intervention for registered nurses (RNs), consisting of triage to identify patients with low-acuity URI followed by CDS-guided RN visits. It was implemented in February 2022 as a randomized controlled stepped-wedge trial in 43 primary and urgent care practices within 4 academic health systems in New York, Wisconsin, and Utah. While issues were pragmatically addressed as they arose, a systematic assessment of the barriers to implementation is needed to better understand and address these barriers. Methods: We performed a retrospective case study, collecting quantitative and qualitative data regarding clinical workflows and triage-template use from expert interviews, study surveys, routine check-ins with practice personnel, and chart reviews over the first year of implementation of the iCPR intervention. Guided by the updated CFIR (Consolidated Framework for Implementation Research), we characterized the initial barriers to implementing a URI iCPR intervention for RNs in ambulatory care. CFIR constructs were coded as missing, neutral, weak, or strong implementation factors. Results: Barriers were identified within all implementation domains. The strongest barriers were found in the outer setting, with those factors trickling down to impact the inner setting. 
Local conditions driven by COVID-19 served as one of the strongest barriers, impacting attitudes among practice staff and ultimately contributing to a work infrastructure characterized by staff changes, RN shortages and turnover, and competing responsibilities. Policies and laws regarding scope of practice of RNs varied by state and institutional application of those laws, with some allowing more clinical autonomy for RNs. This necessitated different study procedures at each study site to meet practice requirements, increasing innovation complexity. Similarly, institutional policies led to varying levels of compatibility with existing triage, rooming, and documentation workflows. These workflow conflicts were compounded by limited available resources, as well as an implementation climate of optional participation, few participation incentives, and thus low relative priority compared to other clinical duties. Conclusions: Both between and within health care systems, significant variability existed in workflows for patient intake and triage. Even in a relatively straightforward clinical workflow, workflow and cultural differences appreciably impacted intervention adoption. Takeaways from this study can be applied to other RN delegation protocol implementations of new and innovative CDS tools within existing workflows to support integration and improve uptake. When implementing a system-wide clinical care intervention, considerations must be made for variability in culture and workflows at the state, health system, practice, and individual levels. Trial Registration: ClinicalTrials.gov NCT04255303; https://clinicaltrials.gov/ct2/show/NCT04255303 ", doi="10.2196/54996", url="https://formative.jmir.org/2024/1/e54996", url="http://www.ncbi.nlm.nih.gov/pubmed/38781006" } @Article{info:doi/10.2196/56607, author="Flores, Ericka Joan and Trambas, Christina and Jovanovic, Natasha and Thompson, J. Alexander and Howell, Jessica", title="Impact of an Automated Population-Level Cirrhosis Screening Program Using Common Pathology Tests on Rates of Cirrhosis Diagnosis and Linkage to Specialist Care (CAPRISE): Protocol for a Pilot Prospective Single-Arm Intervention Study", journal="JMIR Res Protoc", year="2024", month="May", day="22", volume="13", pages="e56607", keywords="noninvasive tests", keywords="cirrhosis", keywords="population health", keywords="screening", keywords="liver cirrhosis", keywords="liver", keywords="compensated", keywords="risk factor", keywords="pathology", keywords="population based", keywords="liver screening", keywords="prevalence", keywords="hepatocellular carcinoma", keywords="transient elastography", keywords="FibroScan", abstract="Background: People with compensated cirrhosis receive the greatest benefit from risk factor modification and prevention programs to reduce liver decompensation and improve early liver cancer detection. Blood-based liver fibrosis algorithms such as the Aspartate Transaminase--to-Platelet Ratio Index (APRI) and Fibrosis-4 (FIB-4) index are calculated using routinely ordered blood tests and are effective screening tests to exclude cirrhosis in people with chronic liver disease, triaging the need for further investigations to confirm cirrhosis and linkage to specialist care. 
Objective: This pilot study aims to evaluate the impact of a population screening program for liver cirrhosis (CAPRISE [Cirrhosis Automated APRI and FIB-4 Screening Evaluation]), which uses automated APRI and FIB-4 calculation and reporting on routinely ordered blood tests, on monthly rates of referral for transient elastography, cirrhosis diagnosis, and linkage to specialist care. Methods: We have partnered with a large pathology service in Victoria, Australia, to pilot a population-level liver cirrhosis screening package, which comprises (1) automated calculation and reporting of APRI and FIB-4 on routinely ordered blood tests; (2) provision of brief information about liver cirrhosis; and (3) a web link for transient elastography referral. APRI and FIB-4 will be prospectively calculated on all community-ordered pathology results in adults attending a single pathology service. This single-center, prospective, single-arm, pre-post study will compare the monthly rates of transient elastography (FibroScan) referral, liver cirrhosis diagnosis, and the proportion linked to specialist care in the 6 months after intervention to the 6 months prior to the intervention. Results: As of January 2024, in the preintervention phase of this study, a total of 120,972 tests were performed by the laboratory. Of these tests, 78,947 (65.3\%) tests were excluded, with the remaining 42,025 (34.7\%) tests on 37,872 individuals meeting inclusion criteria with APRI and FIB-4 being able to be calculated. Of these 42,025 tests, 1.3\% (n=531) had elevated APRI>1 occurring in 446 individuals, and 2.3\% (n=985) had elevated FIB-4>2.67 occurring in 816 individuals. Linking these data with FibroScan referral and appointment attendance is ongoing and will continue during the intervention phase, which is expected to commence on February 1, 2024. Conclusions: We will determine the feasibility and effectiveness of automated APRI and FIB-4 reporting on the monthly rate of transient elastography referrals, liver cirrhosis diagnosis, and linkage to specialist care. Trial Registration: Australian New Zealand Clinical Trials Registry ACTRN12623000295640; https://tinyurl.com/58dv9ypp International Registered Report Identifier (IRRID): DERR1-10.2196/56607 ", doi="10.2196/56607", url="https://www.researchprotocols.org/2024/1/e56607", url="http://www.ncbi.nlm.nih.gov/pubmed/38776541" } @Article{info:doi/10.2196/51952, author="Ray, Jessica and Finn, Benjamin Emily and Tyrrell, Hollyce and Aloe, F. Carlin and Perrin, M. Eliana and Wood, T. Charles and Miner, S. Dean and Grout, Randall and Michel, J. Jeremy and Damschroder, J. 
Laura and Sharifi, Mona", title="User-Centered Framework for Implementation of Technology (UFIT): Development of an Integrated Framework for Designing Clinical Decision Support Tools Packaged With Tailored Implementation Strategies", journal="J Med Internet Res", year="2024", month="May", day="21", volume="26", pages="e51952", keywords="user-centered design", keywords="implementation science", keywords="clinical decision support", keywords="human factors", keywords="implementation", keywords="decision support", keywords="develop", keywords="development", keywords="framework", keywords="frameworks", keywords="design", keywords="user-centered", keywords="digital health", keywords="health technology", keywords="health technologies", keywords="need", keywords="needs", keywords="tailor", keywords="tailoring", keywords="guidance", keywords="guideline", keywords="guidelines", keywords="pediatric", keywords="pediatrics", keywords="child", keywords="children", keywords="obese", keywords="obesity", keywords="weight", keywords="overweight", keywords="primary care", abstract="Background: Electronic health record--based clinical decision support (CDS) tools can facilitate the adoption of evidence into practice. Yet, the impact of CDS beyond single-site implementation is often limited by dissemination and implementation barriers related to site- and user-specific variation in workflows and behaviors. The translation of evidence-based CDS from initial development to implementation in heterogeneous environments requires a framework that assures careful balancing of fidelity to core functional elements with adaptations to ensure compatibility with new contexts. Objective: This study aims to develop and apply a framework to guide tailoring and implementing CDS across diverse clinical settings. Methods: In preparation for a multisite trial implementing CDS for pediatric overweight or obesity in primary care, we developed the User-Centered Framework for Implementation of Technology (UFIT), a framework that integrates principles from user-centered design (UCD), human factors/ergonomics theories, and implementation science to guide both CDS adaptation and tailoring of related implementation strategies. Our transdisciplinary study team conducted semistructured interviews with pediatric primary care clinicians and a diverse group of stakeholders from 3 health systems in the northeastern, midwestern, and southeastern United States to inform and apply the framework for our formative evaluation. Results: We conducted 41 qualitative interviews with primary care clinicians (n=21) and other stakeholders (n=20). Our workflow analysis found 3 primary ways in which clinicians interact with the electronic health record during primary care well-child visits identifying opportunities for decision support. Additionally, we identified differences in practice patterns across contexts necessitating a multiprong design approach to support a variety of workflows, user needs, preferences, and implementation strategies. Conclusions: UFIT integrates theories and guidance from UCD, human factors/ergonomics, and implementation science to promote fit with local contexts for optimal outcomes. The components of UFIT were used to guide the development of Improving Pediatric Obesity Practice Using Prompts, an integrated package comprising CDS for obesity or overweight treatment with tailored implementation strategies. 
Trial Registration: ClinicalTrials.gov NCT05627011; https://clinicaltrials.gov/study/NCT05627011 ", doi="10.2196/51952", url="https://www.jmir.org/2024/1/e51952", url="http://www.ncbi.nlm.nih.gov/pubmed/38771622" } @Article{info:doi/10.2196/50117, author="Tabashum, Thasina and Snyder, Cooper Robert and O'Brien, K. Megan and Albert, V. Mark", title="Machine Learning Models for Parkinson Disease: Systematic Review", journal="JMIR Med Inform", year="2024", month="May", day="17", volume="12", pages="e50117", keywords="Parkinson disease", keywords="machine learning", keywords="systematic review", keywords="deep learning", keywords="clinical adoption", keywords="validation techniques", keywords="PRISMA", keywords="Preferred Reporting Items for Systematic Reviews and Meta-Analyses", abstract="Background: With the increasing availability of data, computing resources, and easier-to-use software libraries, machine learning (ML) is increasingly used in disease detection and prediction, including for Parkinson disease (PD). Despite the large number of studies published every year, very few ML systems have been adopted for real-world use. In particular, a lack of external validity may result in poor performance of these systems in clinical practice. Additional methodological issues in ML design and reporting can also hinder clinical adoption, even for applications that would benefit from such data-driven systems. Objective: To sample the current ML practices in PD applications, we conducted a systematic review of studies published in 2020 and 2021 that used ML models to diagnose PD or track PD progression. Methods: We conducted a systematic literature review in accordance with PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines in PubMed between January 2020 and April 2021, using the following exact string: ``Parkinson's'' AND (``ML'' OR ``prediction'' OR ``classification'' OR ``detection'' or ``artificial intelligence'' OR ``AI''). The search resulted in 1085 publications. After a search query and review, we found 113 publications that used ML for the classification or regression-based prediction of PD or PD-related symptoms. Results: Only 65.5\% (74/113) of studies used a holdout test set to avoid potentially inflated accuracies, and approximately half (25/46, 54\%) of the studies without a holdout test set did not state this as a potential concern. Surprisingly, 38.9\% (44/113) of studies did not report on how or if models were tuned, and an additional 27.4\% (31/113) used ad hoc model tuning, which is generally frowned upon in ML model optimization. Only 15\% (17/113) of studies performed direct comparisons of results with other models, severely limiting the interpretation of results. Conclusions: This review highlights the notable limitations of current ML systems and techniques that may contribute to a gap between reported performance in research and the real-life applicability of ML models aiming to detect and predict diseases such as PD. 
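As an aside to the Parkinson disease review above, the two reporting practices it checks for (a holdout test set kept out of tuning, and principled rather than ad hoc hyperparameter selection) can be illustrated with a minimal scikit-learn sketch on synthetic data; the estimator and parameter grid below are illustrative assumptions, not taken from any reviewed study.

# Minimal sketch of a holdout test set plus cross-validated tuning (synthetic data).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for PD-vs-control feature data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# The holdout test set is split off once and never used for tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Hyperparameters are tuned by cross-validation on the training set only.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5]},
    cv=5)
search.fit(X_train, y_train)

# The holdout set is touched exactly once, for the final performance estimate.
print("holdout accuracy:", accuracy_score(y_test, search.predict(X_test)))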
", doi="10.2196/50117", url="https://medinform.jmir.org/2024/1/e50117" } @Article{info:doi/10.2196/53985, author="Harada, Yukinori and Sakamoto, Tetsu and Sugimoto, Shu and Shimizu, Taro", title="Longitudinal Changes in Diagnostic Accuracy of a Differential Diagnosis List Developed by an AI-Based Symptom Checker: Retrospective Observational Study", journal="JMIR Form Res", year="2024", month="May", day="17", volume="8", pages="e53985", keywords="atypical presentations", keywords="diagnostic accuracy", keywords="diagnosis", keywords="diagnostics", keywords="symptom checker", keywords="uncommon diseases", keywords="symptom checkers", keywords="uncommon", keywords="rare", keywords="artificial intelligence", abstract="Background: Artificial intelligence (AI) symptom checker models should be trained using real-world patient data to improve their diagnostic accuracy. Given that AI-based symptom checkers are currently used in clinical practice, their performance should improve over time. However, longitudinal evaluations of the diagnostic accuracy of these symptom checkers are limited. Objective: This study aimed to assess the longitudinal changes in the accuracy of differential diagnosis lists created by an AI-based symptom checker used in the real world. Methods: This was a single-center, retrospective, observational study. Patients who visited an outpatient clinic without an appointment between May 1, 2019, and April 30, 2022, and who were admitted to a community hospital in Japan within 30 days of their index visit were considered eligible. We only included patients who underwent an AI-based symptom checkup at the index visit, and the diagnosis was finally confirmed during follow-up. Final diagnoses were categorized as common or uncommon, and all cases were categorized as typical or atypical. The primary outcome measure was the accuracy of the differential diagnosis list created by the AI-based symptom checker, defined as the final diagnosis in a list of 10 differential diagnoses created by the symptom checker. To assess the change in the symptom checker's diagnostic accuracy over 3 years, we used a chi-square test to compare the primary outcome over 3 periods: from May 1, 2019, to April 30, 2020 (first year); from May 1, 2020, to April 30, 2021 (second year); and from May 1, 2021, to April 30, 2022 (third year). Results: A total of 381 patients were included. Common diseases comprised 257 (67.5\%) cases, and typical presentations were observed in 298 (78.2\%) cases. Overall, the accuracy of the differential diagnosis list created by the AI-based symptom checker was 172 (45.1\%), which did not differ across the 3 years (first year: 97/219, 44.3\%; second year: 32/72, 44.4\%; and third year: 43/90, 47.7\%; P=.85). The accuracy of the differential diagnosis list created by the symptom checker was low in those with uncommon diseases (30/124, 24.2\%) and atypical presentations (12/83, 14.5\%). In the multivariate logistic regression model, common disease (P<.001; odds ratio 4.13, 95\% CI 2.50-6.98) and typical presentation (P<.001; odds ratio 6.92, 95\% CI 3.62-14.2) were significantly associated with the accuracy of the differential diagnosis list created by the symptom checker. Conclusions: A 3-year longitudinal survey of the diagnostic accuracy of differential diagnosis lists developed by an AI-based symptom checker, which has been implemented in real-world clinical practice settings, showed no improvement over time. 
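The year-by-year comparison reported in the symptom-checker study above can be reproduced from the published counts with a standard chi-square test on a hit/miss contingency table; this is a minimal sketch assuming the abstract's counts (97/219, 32/72, and 43/90 correct lists), not the authors' analysis code.

# Chi-square test over the three study years, using the counts reported above.
from scipy.stats import chi2_contingency

hits = [97, 32, 43]          # final diagnosis within the top-10 list, per year
totals = [219, 72, 90]
misses = [t - h for h, t in zip(hits, totals)]

chi2, p, dof, _ = chi2_contingency([hits, misses])
print(f"chi2={chi2:.2f}, dof={dof}, P={p:.2f}")   # P is approximately .85, as reported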
Uncommon diseases and atypical presentations were independently associated with a lower diagnostic accuracy. In the future, symptom checkers should be trained to recognize uncommon conditions. ", doi="10.2196/53985", url="https://formative.jmir.org/2024/1/e53985", url="http://www.ncbi.nlm.nih.gov/pubmed/38758588" } @Article{info:doi/10.2196/51187, author="Gwon, Nam Yong and Kim, Heon Jae and Chung, Soo Hyun and Jung, Jee Eun and Chun, Joey and Lee, Serin and Shim, Ryul Sung", title="The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation", journal="JMIR Med Inform", year="2024", month="May", day="14", volume="12", pages="e51187", keywords="artificial intelligence", keywords="search engine", keywords="systematic review", keywords="evidence-based medicine", keywords="ChatGPT", keywords="language model", keywords="education", keywords="tool", keywords="clinical decision support system", keywords="decision support", keywords="support", keywords="treatment", abstract="Background: A large language model is a type of artificial intelligence (AI) model that opens up great possibilities for health care practice, research, and education, although scholars have emphasized the need to proactively address the issue of unvalidated and inaccurate information regarding its use. One of the best-known large language models is ChatGPT (OpenAI). It is believed to be of great help to medical research, as it facilitates more efficient data set analysis, code generation, and literature review, allowing researchers to focus on experimental design as well as drug discovery and development. Objective: This study aims to explore the potential of ChatGPT as a real-time literature search tool for systematic reviews and clinical decision support systems, to enhance their efficiency and accuracy in health care settings. Methods: The search results of a published systematic review by human experts on the treatment of Peyronie disease were selected as a benchmark, and the literature search formula of the study was applied to ChatGPT and Microsoft Bing AI as a comparison to human researchers. Peyronie disease typically presents with discomfort, curvature, or deformity of the penis in association with palpable plaques and erectile dysfunction. To evaluate the quality of individual studies derived from AI answers, we created a structured rating system based on bibliographic information related to the publications. We classified its answers into 4 grades if the title existed: A, B, C, and F. No grade was given for a fake title or no answer. Results: From ChatGPT, 7 (0.5\%) out of 1287 identified studies were directly relevant, whereas Bing AI resulted in 19 (40\%) relevant studies out of 48, compared to the human benchmark of 24 studies. In the qualitative evaluation, ChatGPT had 7 grade A, 18 grade B, 167 grade C, and 211 grade F studies, and Bing AI had 19 grade A and 28 grade C studies. Conclusions: This is the first study to compare AI and conventional human systematic review methods as a real-time literature collection tool for evidence-based medicine. The results suggest that the use of ChatGPT as a tool for real-time evidence generation is not yet accurate and feasible. Therefore, researchers should be cautious about using such AI. 
The limitations of this study using the generative pre-trained transformer model are that the search for research topics was not diverse and that it did not prevent the hallucination of generative AI. However, this study will serve as a standard for future studies by providing an index to verify the reliability and consistency of generative AI from a user's point of view. If the reliability and consistency of AI literature search services are verified, then the use of these technologies will help medical research greatly. ", doi="10.2196/51187", url="https://medinform.jmir.org/2024/1/e51187" } @Article{info:doi/10.2196/57026, author="Zhang, Jinbo and Yang, Pingping and Zeng, Lu and Li, Shan and Zhou, Jiamei", title="Ventilator-Associated Pneumonia Prediction Models Based on AI: Scoping Review", journal="JMIR Med Inform", year="2024", month="May", day="14", volume="12", pages="e57026", keywords="artificial intelligence", keywords="machine learning", keywords="ventilator-associated pneumonia", keywords="prediction", keywords="scoping", keywords="PRISMA", keywords="Preferred Reporting Items for Systematic Reviews and Meta-Analyses", abstract="Background: Ventilator-associated pneumonia (VAP) is a serious complication of mechanical ventilation therapy that affects patients' treatments and prognoses. Owing to its excellent data mining capabilities, artificial intelligence (AI) has been increasingly used to predict VAP. Objective: This paper reviews VAP prediction models that are based on AI, providing a reference for the early identification of high-risk groups in future clinical practice. Methods: A scoping review was conducted in accordance with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. The Wanfang database, the Chinese Biomedical Literature Database, Cochrane Library, Web of Science, PubMed, MEDLINE, and Embase were searched to identify relevant articles. Study selection and data extraction were independently conducted by 2 reviewers. The data extracted from the included studies were synthesized narratively. Results: Of the 137 publications retrieved, 11 were included in this scoping review. The included studies reported the use of AI for predicting VAP. All 11 studies predicted VAP occurrence, and studies on VAP prognosis were excluded. Further, these studies used text data, and none of them involved imaging data. Public databases were the primary sources of data for model building (studies: 6/11, 55\%), and 5 studies had sample sizes of <1000. Machine learning was the primary algorithm for studying the VAP prediction models. However, deep learning and large language models were not used to construct VAP prediction models. The random forest model was the most commonly used model (studies: 5/11, 45\%). All studies only performed internal validations, and none of them addressed how to implement and apply the final model in real-life clinical settings. Conclusions: This review presents an overview of studies that used AI to predict and diagnose VAP. AI models have better predictive performance than traditional methods and are expected to provide indispensable tools for VAP risk prediction in the future. However, the current research is in the model construction and validation stage, and the implementation of and guidance for clinical VAP prediction require further research. 
", doi="10.2196/57026", url="https://medinform.jmir.org/2024/1/e57026" } @Article{info:doi/10.2196/45593, author="Bienzeisler, Jonas and Becker, Guido and Erdmann, Bernadett and Kombeiz, Alexander and Majeed, W. Raphael and R{\"o}hrig, Rainer and Greiner, Felix and Otto, Ronny and Otto-Sobotka, Fabian and ", title="The Effects of Displaying the Time Targets of the Manchester Triage System to Emergency Department Personnel: Prospective Crossover Study", journal="J Med Internet Res", year="2024", month="May", day="14", volume="26", pages="e45593", keywords="EHR", keywords="emergency medicine", keywords="AKTIN, process management", keywords="crowding", keywords="triage system", keywords="electronic health record", keywords="health care", keywords="treatment", keywords="emergency department", abstract="Background: The use of triage systems such as the Manchester Triage System (MTS) is a standard procedure to determine the sequence of treatment in emergency departments (EDs). When using the MTS, time targets for treatment are determined. These are commonly displayed in the ED information system (EDIS) to ED staff. Using measurements as targets has been associated with a decline in meeting those targets. Objective: This study investigated the impact of displaying time targets for treatment to physicians on processing times in the ED. Methods: We analyzed the effects of displaying time targets to ED staff on waiting times in a prospective crossover study, during the introduction of a new EDIS in a large regional hospital in Germany. The old information system version used a module that showed the time target determined by the MTS, while the new system version used a priority list instead. Evaluation was based on 35,167 routinely collected electronic health records from the preintervention period and 10,655 records from the postintervention period. Electronic health records were extracted from the EDIS, and data were analyzed using descriptive statistics and generalized additive models. We evaluated the effects of the intervention on waiting times and the odds of achieving timely treatment according to the time targets set by the MTS. Results: The average ED length of stay and waiting times increased when the EDIS that did not display time targets was used (average time from admission to treatment: preintervention phase=median 15, IQR 6-39 min; postintervention phase=median 11, IQR 5-23 min). However, severe cases with high acuity (as indicated by the triage score) benefited from lower waiting times (0.15 times as high as in the preintervention period for MTS1, only 0.49 as high for MTS2). Furthermore, these patients were less likely to receive delayed treatment, and we observed reduced odds of late treatment when crowding occurred. Conclusions: Our results suggest that it is beneficial to use a priority list instead of displaying time targets to ED personnel. These time targets may lead to false incentives. Our work highlights that working better is not the same as working faster. 
", doi="10.2196/45593", url="https://www.jmir.org/2024/1/e45593", url="http://www.ncbi.nlm.nih.gov/pubmed/38743464" } @Article{info:doi/10.2196/52399, author="Denecke, Kerstin and May, Richard and and Rivera Romero, Octavio", title="Potential of Large Language Models in Health Care: Delphi Study", journal="J Med Internet Res", year="2024", month="May", day="13", volume="26", pages="e52399", keywords="large language models", keywords="LLMs", keywords="health care", keywords="Delphi study", keywords="natural language processing", keywords="NLP", keywords="artificial intelligence", keywords="language model", keywords="Delphi", keywords="future", keywords="innovation", keywords="interview", keywords="interviews", keywords="informatics", keywords="experience", keywords="experiences", keywords="attitude", keywords="attitudes", keywords="opinion", keywords="perception", keywords="perceptions", keywords="perspective", keywords="perspectives", keywords="implementation", abstract="Background: A large language model (LLM) is a machine learning model inferred from text data that captures subtle patterns of language use in context. Modern LLMs are based on neural network architectures that incorporate transformer methods. They allow the model to relate words together through attention to multiple words in a text sequence. LLMs have been shown to be highly effective for a range of tasks in natural language processing (NLP), including classification and information extraction tasks and generative applications. Objective: The aim of this adapted Delphi study was to collect researchers' opinions on how LLMs might influence health care and on the strengths, weaknesses, opportunities, and threats of LLM use in health care. Methods: We invited researchers in the fields of health informatics, nursing informatics, and medical NLP to share their opinions on LLM use in health care. We started the first round with open questions based on our strengths, weaknesses, opportunities, and threats framework. In the second and third round, the participants scored these items. Results: The first, second, and third rounds had 28, 23, and 21 participants, respectively. Almost all participants (26/28, 93\% in round 1 and 20/21, 95\% in round 3) were affiliated with academic institutions. Agreement was reached on 103 items related to use cases, benefits, risks, reliability, adoption aspects, and the future of LLMs in health care. Participants offered several use cases, including supporting clinical tasks, documentation tasks, and medical research and education, and agreed that LLM-based systems will act as health assistants for patient education. The agreed-upon benefits included increased efficiency in data handling and extraction, improved automation of processes, improved quality of health care services and overall health outcomes, provision of personalized care, accelerated diagnosis and treatment processes, and improved interaction between patients and health care professionals. In total, 5 risks to health care in general were identified: cybersecurity breaches, the potential for patient misinformation, ethical concerns, the likelihood of biased decision-making, and the risk associated with inaccurate communication. Overconfidence in LLM-based systems was recognized as a risk to the medical profession. 
The 6 agreed-upon privacy risks included the use of unregulated cloud services that compromise data security, exposure of sensitive patient data, breaches of confidentiality, fraudulent use of information, vulnerabilities in data storage and communication, and inappropriate access or use of patient data. Conclusions: Future research related to LLMs should not only focus on testing their possibilities for NLP-related tasks but also consider the workflows the models could contribute to and the requirements regarding quality, integration, and regulations needed for successful implementation in practice. ", doi="10.2196/52399", url="https://www.jmir.org/2024/1/e52399", url="http://www.ncbi.nlm.nih.gov/pubmed/38739445" } @Article{info:doi/10.2196/54042, author="Seeger, Nico and Gutknecht, Stefan and Zschokke, Irin and Fleischmann, Isabella and Roth, Nadja and Metzger, J{\"u}rg and Weber, Markus and Breitenstein, Stefan and Grochola, Filip Lukasz", title="A Predictive Noninvasive Single-Nucleotide Variation--Based Biomarker Signature for Resectable Pancreatic Cancer: Protocol for a Prospective Validation Study", journal="JMIR Res Protoc", year="2024", month="May", day="13", volume="13", pages="e54042", keywords="single-nucleotide polymorphism", keywords="SNP", keywords="single-nucleotide variation", keywords="SNV", keywords="pancreatic ductal adenocarcinoma", keywords="PDAC", keywords="noninvasive biomarker", keywords="survival", keywords="resection", keywords="prospective validation", abstract="Background: Single-nucleotide variations (SNVs; formerly SNPs) are inherited genetic variants that can be easily determined in routine clinical practice using a simple blood or saliva test. SNVs have potential to serve as noninvasive biomarkers for predicting cancer-specific patient outcomes after resection of pancreatic ductal adenocarcinoma (PDAC). Two recent analyses led to the identification and validation of three SNVs in the CD44 and CHI3L2 genes (rs187115, rs353630, and rs684559), which can be used as predictive biomarkers to help select patients most likely to benefit from pancreatic resection. These variants were associated with an over 2-fold increased risk for tumor-related death in three independent PDAC study cohorts from Europe and the United States, including The Cancer Genome Atlas cohorts (reaching a P value of 1{\texttimes}10--8). However, these analyses were limited by the inherent biases of a retrospective study design, such as selection and publication biases, thereby limiting the clinical use of these promising biomarkers in guiding PDAC therapy. Objective: To overcome the limitations of previous retrospectively designed studies and translate the findings into clinical practice, we aim to validate the association of the identified SNVs with survival in a controlled setting using a prospective cohort of patients with PDAC following pancreatic resection. Methods: All patients with PDAC who will undergo pancreatic resection at three participating hospitals in Switzerland and fulfill the inclusion criteria will be included in the study consecutively. The SNV genotypes will be determined using standard genotyping techniques from patient blood samples. For each genotyped locus, log-rank and Cox multivariate regression tests will be performed, accounting for the relevant covariates American Joint Committee on Cancer stage and resection status. Clinical follow-up data will be collected for at least 3 years. 
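For the prospective SNV validation protocol above, a minimal sketch of the planned per-locus analysis (log-rank test by genotype plus a Cox model adjusted for AJCC stage and resection status) is shown below; the column names, genotype coding, and synthetic data are hypothetical placeholders, not the study's dataset or code.

# Hedged sketch of a per-locus survival analysis with lifelines (synthetic data).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(0)
n = 150  # the protocol's planned sample size
df = pd.DataFrame({
    "months":        rng.exponential(24, n),   # follow-up time (synthetic)
    "death":         rng.integers(0, 2, n),    # tumor-related death indicator
    "risk_genotype": rng.integers(0, 2, n),    # e.g., carrier of a risk allele (hypothetical coding)
    "ajcc_stage":    rng.integers(1, 4, n),
    "r0_resection":  rng.integers(0, 2, n),
})

# Log-rank test between genotype groups.
g = df["risk_genotype"] == 1
lr = logrank_test(df.loc[g, "months"], df.loc[~g, "months"],
                  event_observed_A=df.loc[g, "death"],
                  event_observed_B=df.loc[~g, "death"])
print("log-rank P =", lr.p_value)

# Multivariate Cox regression with the protocol's named covariates.
cph = CoxPHFitter()
cph.fit(df, duration_col="months", event_col="death")
cph.print_summary()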
Sample size calculation resulted in a required sample of 150 patients to sufficiently power the analysis. Results: The follow-up data collection started in August 2019 and the estimated end of data collection will be in May 2027. The study is still recruiting participants and 142 patients have been recruited as of November 2023. The DNA extraction and genotyping of the SNVs will be performed after inclusion of the last patient. Since no SNV genotypes have been determined, no data analysis has been performed to date. The results are expected to be published in 2027. Conclusions: This is the first prospective study of the CD44 and CHI3L2 SNV--based biomarker signature in PDAC. A prospective validation of this signature would enable its clinical use as a noninvasive predictive biomarker of survival after pancreatic resection that is readily available at the time of diagnosis and can assist in guiding PDAC therapy. The results of this study may help to individualize treatment decisions and potentially improve patient outcomes. International Registered Report Identifier (IRRID): DERR1-10.2196/54042 ", doi="10.2196/54042", url="https://www.researchprotocols.org/2024/1/e54042", url="http://www.ncbi.nlm.nih.gov/pubmed/38635586" } @Article{info:doi/10.2196/53787, author="Preiksaitis, Carl and Ashenburg, Nicholas and Bunney, Gabrielle and Chu, Andrew and Kabeer, Rana and Riley, Fran and Ribeira, Ryan and Rose, Christian", title="The Role of Large Language Models in Transforming Emergency Medicine: Scoping Review", journal="JMIR Med Inform", year="2024", month="May", day="10", volume="12", pages="e53787", keywords="large language model", keywords="LLM", keywords="emergency medicine", keywords="clinical decision support", keywords="workflow efficiency", keywords="medical education", keywords="artificial intelligence", keywords="AI", keywords="natural language processing", keywords="NLP", keywords="AI literacy", keywords="ChatGPT", keywords="Bard", keywords="Pathways Language Model", keywords="Med-PaLM", keywords="Bidirectional Encoder Representations from Transformers", keywords="BERT", keywords="generative pretrained transformer", keywords="GPT", keywords="United States", keywords="US", keywords="China", keywords="scoping review", keywords="Preferred Reporting Items for Systematic Reviews and Meta-Analyses", keywords="PRISMA", keywords="decision support", keywords="risk", keywords="ethics", keywords="education", keywords="communication", keywords="medical training", keywords="physician", keywords="health literacy", keywords="emergency care", abstract="Background: Artificial intelligence (AI), more specifically large language models (LLMs), holds significant potential in revolutionizing emergency care delivery by optimizing clinical workflows and enhancing the quality of decision-making. Although enthusiasm for integrating LLMs into emergency medicine (EM) is growing, the existing literature is characterized by a disparate collection of individual studies, conceptual analyses, and preliminary implementations. Given these complexities and gaps in understanding, a cohesive framework is needed to comprehend the existing body of knowledge on the application of LLMs in EM. Objective: Given the absence of a comprehensive framework for exploring the roles of LLMs in EM, this scoping review aims to systematically map the existing literature on LLMs' potential applications within EM and identify directions for future research. 
Addressing this gap will allow for informed advancements in the field. Methods: Using PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) criteria, we searched Ovid MEDLINE, Embase, Web of Science, and Google Scholar for papers published between January 2018 and August 2023 that discussed LLMs' use in EM. We excluded other forms of AI. A total of 1994 unique titles and abstracts were screened, and each full-text paper was independently reviewed by 2 authors. Data were abstracted independently, and 5 authors performed a collaborative quantitative and qualitative synthesis of the data. Results: A total of 43 papers were included. Studies were predominantly from 2022 to 2023 and conducted in the United States and China. We uncovered four major themes: (1) clinical decision-making and support was highlighted as a pivotal area, with LLMs playing a substantial role in enhancing patient care, notably through their application in real-time triage, allowing early recognition of patient urgency; (2) efficiency, workflow, and information management demonstrated the capacity of LLMs to significantly boost operational efficiency, particularly through the automation of patient record synthesis, which could reduce administrative burden and enhance patient-centric care; (3) risks, ethics, and transparency were identified as areas of concern, especially regarding the reliability of LLMs' outputs, and specific studies highlighted the challenges of ensuring unbiased decision-making amidst potentially flawed training data sets, stressing the importance of thorough validation and ethical oversight; and (4) education and communication possibilities included LLMs' capacity to enrich medical training, such as through using simulated patient interactions that enhance communication skills. Conclusions: LLMs have the potential to fundamentally transform EM, enhancing clinical decision-making, optimizing workflows, and improving patient outcomes. This review sets the stage for future advancements by identifying key research areas: prospective validation of LLM applications, establishing standards for responsible use, understanding provider and patient perceptions, and improving physicians' AI literacy. Effective integration of LLMs into EM will require collaborative efforts and thorough evaluation to ensure these technologies can be safely and effectively applied. ", doi="10.2196/53787", url="https://medinform.jmir.org/2024/1/e53787", url="http://www.ncbi.nlm.nih.gov/pubmed/38728687" } @Article{info:doi/10.2196/49848, author="Xie, Puguang and Wang, Hao and Xiao, Jun and Xu, Fan and Liu, Jingyang and Chen, Zihang and Zhao, Weijie and Hou, Siyu and Wu, Dongdong and Ma, Yu and Xiao, Jingjing", title="Development and Validation of an Explainable Deep Learning Model to Predict In-Hospital Mortality for Patients With Acute Myocardial Infarction: Algorithm Development and Validation Study", journal="J Med Internet Res", year="2024", month="May", day="10", volume="26", pages="e49848", keywords="acute myocardial infarction", keywords="mortality", keywords="deep learning", keywords="explainable model", keywords="prediction", abstract="Background: Acute myocardial infarction (AMI) is one of the most severe cardiovascular diseases and is associated with a high risk of in-hospital mortality. However, the current deep learning models for in-hospital mortality prediction lack interpretability. 
Objective: This study aims to establish an explainable deep learning model to provide individualized in-hospital mortality prediction and risk factor assessment for patients with AMI. Methods: In this retrospective multicenter study, we used data for consecutive patients hospitalized with AMI from the Chongqing University Central Hospital between July 2016 and December 2022 and the Electronic Intensive Care Unit Collaborative Research Database. These patients were randomly divided into training (7668/10,955, 70\%) and internal test (3287/10,955, 30\%) data sets. In addition, data of patients with AMI from the Medical Information Mart for Intensive Care database were used for external validation. Deep learning models were used to predict in-hospital mortality in patients with AMI, and they were compared with linear and tree-based models. The Shapley Additive Explanations method was used to explain the model with the highest area under the receiver operating characteristic curve in both the internal test and external validation data sets to quantify and visualize the features that drive predictions. Results: A total of 10,955 patients with AMI who were admitted to Chongqing University Central Hospital or included in the Electronic Intensive Care Unit Collaborative Research Database were randomly divided into a training data set of 7668 (70\%) patients and an internal test data set of 3287 (30\%) patients. A total of 9355 patients from the Medical Information Mart for Intensive Care database were included for independent external validation. In-hospital mortality occurred in 8.74\% (670/7668), 8.73\% (287/3287), and 9.12\% (853/9355) of the patients in the training, internal test, and external validation cohorts, respectively. The Self-Attention and Intersample Attention Transformer model performed best in both the internal test data set and the external validation data set among the 9 prediction models, with the highest area under the receiver operating characteristic curve of 0.86 (95\% CI 0.84-0.88) and 0.85 (95\% CI 0.84-0.87), respectively. Older age, high heart rate, and low body temperature were the 3 most important predictors of increased mortality, according to the explanations of the Self-Attention and Intersample Attention Transformer model. Conclusions: The explainable deep learning model that we developed could provide estimates of mortality and visual contribution of the features to the prediction for a patient with AMI. The explanations suggested that older age, unstable vital signs, and metabolic disorders may increase the risk of mortality in patients with AMI. ", doi="10.2196/49848", url="https://www.jmir.org/2024/1/e49848", url="http://www.ncbi.nlm.nih.gov/pubmed/38728685" } @Article{info:doi/10.2196/44805, author="Granviken, Fredrik and Vasseljen, Ottar and Bach, Kerstin and Jaiswal, Amar and Meisingset, Ingebrigt", title="Decision Support for Managing Common Musculoskeletal Pain Disorders: Development of a Case-Based Reasoning Application", journal="JMIR Form Res", year="2024", month="May", day="10", volume="8", pages="e44805", keywords="case-based reasoning", keywords="musculoskeletal pain", keywords="physiotherapy", keywords="decision support", keywords="primary care", keywords="artificial intelligence", abstract="Background: Common interventions for musculoskeletal pain disorders either lack evidence to support their use or have small to modest or short-term effects. 
Given the heterogeneity of patients with musculoskeletal pain disorders, treatment guidelines and systematic reviews have limited transferability to clinical practice. A problem-solving method in artificial intelligence, case-based reasoning (CBR), where new problems are solved based on experiences from past similar problems, might offer guidance in such situations. Objective: This study aims to use CBR to build a decision support system for patients with musculoskeletal pain disorders seeking physiotherapy care. This study describes the development of the CBR system SupportPrim PT and demonstrates its ability to identify similar patients. Methods: Data from physiotherapy patients in primary care in Norway were collected to build a case base for SupportPrim PT. We used the local-global principle in CBR to identify similar patients. The global similarity measures are attributes used to identify similar patients and consisted of prognostic attributes. They were weighted in terms of prognostic importance and choice of treatment, where the weighting represents the relevance of the different attributes. For the local similarity measures, the degree of similarity within each attribute was based on minimal clinically important differences and expert knowledge. The SupportPrim PT's ability to identify similar patients was assessed by comparing the similarity scores of all patients in the case base with the scores on an established screening tool (the short form {\"O}rebro Musculoskeletal Pain Screening Questionnaire [{\"O}MSPQ]) and an outcome measure (the Musculoskeletal Health Questionnaire [MSK-HQ]) used in musculoskeletal pain. We also assessed the same in a more extensive case base. Results: The original case base contained 105 patients with musculoskeletal pain (mean age 46, SD 15 years; 77/105, 73.3\% women). The SupportPrim PT consisted of 29 weighted attributes with local similarities. When comparing the similarity scores for all patients in the case base, one at a time, with the {\"O}MSPQ and MSK-HQ, the most similar patients had a mean absolute difference from the query patient of 9.3 (95\% CI 8.0-10.6) points on the {\"O}MSPQ and a mean absolute difference of 5.6 (95\% CI 4.6-6.6) points on the MSK-HQ. For both {\"O}MSPQ and MSK-HQ, the absolute score difference increased as the rank of most similar patients decreased. Patients retrieved from a more extensive case base (N=486) had a higher mean similarity score and were slightly more similar to the query patients in {\"O}MSPQ and MSK-HQ compared with the original smaller case base. Conclusions: This study describes the development of a CBR system, SupportPrim PT, for musculoskeletal pain in primary care. The SupportPrim PT identified similar patients according to an established screening tool and an outcome measure for patients with musculoskeletal pain. 
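The local-global similarity principle described for SupportPrim PT above can be sketched in a few lines: each attribute contributes a local similarity (here banded around a minimal clinically important difference, MCID) and the global score is a weighted average. The attributes, weights, and MCID values below are hypothetical illustrations, not the SupportPrim PT configuration.

# Hedged sketch of local-global case similarity (hypothetical attributes and weights).
def local_similarity(a, b, mcid):
    """Similarity within one attribute: 1 when the difference is within the MCID,
    decaying linearly to 0 at three times the MCID (an illustrative choice)."""
    diff = abs(a - b)
    if diff <= mcid:
        return 1.0
    return max(0.0, 1.0 - (diff - mcid) / (2 * mcid))

ATTRIBUTES = {                      # name: (weight, MCID), all hypothetical
    "pain_intensity_0_10": (3.0, 2.0),
    "function_score":      (2.0, 5.0),
    "age_years":           (1.0, 10.0),
}

def global_similarity(query, case):
    """Weighted average of local similarities over the shared attributes."""
    num = sum(w * local_similarity(query[k], case[k], mcid)
              for k, (w, mcid) in ATTRIBUTES.items())
    return num / sum(w for w, _ in ATTRIBUTES.values())

query = {"pain_intensity_0_10": 7, "function_score": 30, "age_years": 46}
case = {"pain_intensity_0_10": 5, "function_score": 42, "age_years": 60}
print(f"similarity = {global_similarity(query, case):.2f}")

Ranking all cases in the case base by this score and retrieving the top matches yields the "most similar patients" compared against the query patient in the study.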
", doi="10.2196/44805", url="https://formative.jmir.org/2024/1/e44805", url="http://www.ncbi.nlm.nih.gov/pubmed/38728686" } @Article{info:doi/10.2196/51274, author="Senior, Rashaud and Tsai, Timothy and Ratliff, William and Nadler, Lisa and Balu, Suresh and Malcolm, Elizabeth and McPeek Hinz, Eugenia", title="Evaluation of SNOMED CT Grouper Accuracy and Coverage in Organizing the Electronic Health Record Problem List by Clinical System: Observational Study", journal="JMIR Med Inform", year="2024", month="May", day="9", volume="12", pages="e51274", keywords="electronic health record", keywords="problem List", keywords="problem list organization", keywords="problem list management", keywords="SNOMED CT", keywords="SNOMED CT Groupers", keywords="Systematized Nomenclature of Medicine", keywords="clinical term", keywords="ICD-10", keywords="International Classification of Diseases", abstract="Background: The problem list (PL) is a repository of diagnoses for patients' medical conditions and health-related issues. Unfortunately, over time, our PLs have become overloaded with duplications, conflicting entries, and no-longer-valid diagnoses. The lack of a standardized structure for review adds to the challenges of clinical use. Previously, our default electronic health record (EHR) organized the PL primarily via alphabetization, with other options available, for example, organization by clinical systems or priority settings. The system's PL was built with limited groupers, resulting in many diagnoses that were inconsistent with the expected clinical systems or not associated with any clinical systems at all. As a consequence of these limited EHR configuration options, our PL organization has poorly supported clinical use over time, particularly as the number of diagnoses on the PL has increased. Objective: We aimed to measure the accuracy of sorting PL diagnoses into PL system groupers based on Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) concept groupers implemented in our EHR. Methods: We transformed and developed 21 system- or condition-based groupers, using 1211 SNOMED CT hierarchal concepts refined with Boolean logic, to reorganize the PL in our EHR. To evaluate the clinical utility of our new groupers, we extracted all diagnoses on the PLs from a convenience sample of 50 patients with 3 or more encounters in the previous year. To provide a spectrum of clinical diagnoses, we included patients from all ages and divided them by sex in a deidentified format. Two physicians independently determined whether each diagnosis was correctly attributed to the expected clinical system grouper. Discrepancies were discussed, and if no consensus was reached, they were adjudicated by a third physician. Descriptive statistics and Cohen $\kappa$ statistics for interrater reliability were calculated. Results: Our 50-patient sample had a total of 869 diagnoses (range 4-59; median 12, IQR 9-24). The reviewers initially agreed on 821 system attributions. Of the remaining 48 items, 16 required adjudication with the tie-breaking third physician. The calculated $\kappa$ statistic was 0.7. The PL groupers appropriately associated diagnoses to the expected clinical system with a sensitivity of 97.6\%, a specificity of 58.7\%, a positive predictive value of 96.8\%, and an F1-score of 0.972. Conclusions: We found that PL organization by clinical specialty or condition using SNOMED CT concept groupers accurately reflects clinical systems. 
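The F1-score reported for the SNOMED CT groupers above can be checked directly from the published sensitivity and positive predictive value, since F1 is their harmonic mean; the values below are taken from the abstract.

# Worked check of the reported F1-score.
sensitivity = 0.976   # reported 97.6%
ppv = 0.968           # reported 96.8% (precision)

f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
print(f"F1 = {f1:.3f}")   # ~0.972, matching the reported value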
Our system groupers were subsequently adopted by our vendor EHR in their foundation system for PL organization. ", doi="10.2196/51274", url="https://medinform.jmir.org/2024/1/e51274" } @Article{info:doi/10.2196/52700, author="Hacking, Sean", title="ChatGPT and Medicine: Together We Embrace the AI Renaissance", journal="JMIR Bioinform Biotech", year="2024", month="May", day="7", volume="5", pages="e52700", keywords="ChatGPT", keywords="generative AI", keywords="NLP", keywords="medicine", keywords="bioinformatics", keywords="AI democratization", keywords="AI renaissance", keywords="artificial intelligence", keywords="natural language processing", doi="10.2196/52700", url="https://bioinform.jmir.org/2024/1/e52700", url="http://www.ncbi.nlm.nih.gov/pubmed/38935938" } @Article{info:doi/10.2196/56884, author="Bui, Thu Huong Thi and Nguyễn Thị Phương, Quỳnh and Cam Tu, Ho and Nguyen Phuong, Sinh and Pham, Thi Thuy and Vu, Thu and Nguyen Thi Thu, Huyen and Khanh Ho, Lam and Nguyen Tien, Dung", title="The Roles of NOTCH3 p.R544C and Thrombophilia Genes in Vietnamese Patients With Ischemic Stroke: Study Involving a Hierarchical Cluster Analysis", journal="JMIR Bioinform Biotech", year="2024", month="May", day="7", volume="5", pages="e56884", keywords="Glasgow Coma Scale", keywords="ischemic stroke", keywords="hierarchical cluster analysis", keywords="clustering", keywords="machine learning", keywords="MTHFR", keywords="NOTCH3", keywords="modified Rankin scale", keywords="National Institutes of Health Stroke Scale", keywords="prothrombin", keywords="thrombophilia", keywords="mutations", keywords="genetics", keywords="genomics", keywords="ischemia", keywords="risk", keywords="risk analysis", abstract="Background: The etiology of ischemic stroke is multifactorial. Several gene mutations have been identified as leading causes of cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL), a hereditary disease that causes stroke and other neurological symptoms. Objective: We aimed to identify the variants of NOTCH3 and thrombophilia genes, and their complex interactions with other factors. Methods: We conducted a hierarchical cluster analysis (HCA) on the data of 100 patients diagnosed with ischemic stroke. The variants of NOTCH3 and thrombophilia genes were identified by polymerase chain reaction with confronting 2-pair primers and real-time polymerase chain reaction. The overall preclinical characteristics, cumulative cutpoint values, and factors associated with these somatic mutations were analyzed in unidimensional and multidimensional scaling models. Results: We identified the following optimal cutpoints: creatinine, 83.67 (SD 9.19) {\textmu}mol/L; age, 54 (SD 5) years; prothrombin (PT) time, 13.25 (SD 0.17) seconds; and international normalized ratio (INR), 1.02 (SD 0.03). Using the Nagelkerke method, cutpoint 50\% values of the Glasgow Coma Scale score; modified Rankin scale score; and National Institutes of Health Stroke Scale scores at admission, after 24 hours, and at discharge were 12.77, 2.86 (SD 1.21), 9.83 (SD 2.85), 7.29 (SD 2.04), and 6.85 (SD 2.90), respectively. Conclusions: The variants of MTHFR (C677T and A1298C) and NOTCH3 p.R544C may influence the stroke severity under specific conditions of PT, creatinine, INR, and BMI, with risk ratios of 4.8 (95\% CI 1.53-15.04) and 3.13 (95\% CI 1.60-6.11), respectively (Pfisher<.05). 
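A minimal sketch of a hierarchical cluster analysis of standardized clinical variables, in the spirit of the HCA described in the stroke study above, is given below; the variable set, Ward linkage, and three-cluster cut are illustrative assumptions, and the synthetic data are merely centered on values reported in the abstract.

# Hedged sketch of hierarchical clustering on z-scored clinical variables (synthetic data).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

rng = np.random.default_rng(0)
# columns: age, creatinine, PT time, INR, BMI (synthetic values around reported cutpoints)
X = np.column_stack([
    rng.normal(54, 5, 100),
    rng.normal(84, 9, 100),
    rng.normal(13.2, 0.2, 100),
    rng.normal(1.02, 0.03, 100),
    rng.normal(23, 3, 100),
])

Z = linkage(zscore(X, axis=0), method="ward")        # Ward linkage on standardized data
clusters = fcluster(Z, t=3, criterion="maxclust")    # cut the dendrogram into 3 clusters
print(np.bincount(clusters)[1:])                     # cluster sizes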
It is interesting that although there are many genes linked to increased atrial fibrillation risk, not all of them are associated with ischemic stroke risk. With the detection of stroke risk loci, more information can be gained on their impacts and interconnections, especially in young patients. ", doi="10.2196/56884", url="https://bioinform.jmir.org/2024/1/e56884", url="http://www.ncbi.nlm.nih.gov/pubmed/38935968" } @Article{info:doi/10.2196/54363, author="Gao, Zhenyue and Liu, Xiaoli and Kang, Yu and Hu, Pan and Zhang, Xiu and Yan, Wei and Yan, Muyang and Yu, Pengming and Zhang, Qing and Xiao, Wendong and Zhang, Zhengbo", title="Improving the Prognostic Evaluation Precision of Hospital Outcomes for Heart Failure Using Admission Notes and Clinical Tabular Data: Multimodal Deep Learning Model", journal="J Med Internet Res", year="2024", month="May", day="2", volume="26", pages="e54363", keywords="heart failure", keywords="multimodal deep learning", keywords="mortality prediction", keywords="admission notes", keywords="clinical tabular data", keywords="tabular", keywords="notes", keywords="deep learning", keywords="machine learning", keywords="cardiology", keywords="heart", keywords="cardiac", keywords="documentation", keywords="prognostic", keywords="prognosis", keywords="prognoses", keywords="predict", keywords="prediction", keywords="predictions", keywords="predictive", abstract="Background: Clinical notes contain contextualized information beyond structured data related to patients' past and current health status. Objective: This study aimed to design a multimodal deep learning approach to improve the evaluation precision of hospital outcomes for heart failure (HF) using admission clinical notes and easily collected tabular data. Methods: Data for the development and validation of the multimodal model were retrospectively derived from 3 open-access US databases, including the Medical Information Mart for Intensive Care III v1.4 (MIMIC-III) and MIMIC-IV v1.0, collected from a teaching hospital from 2001 to 2019, and the eICU Collaborative Research Database v1.2, collected from 208 hospitals from 2014 to 2015. The study cohorts consisted of all patients with critical HF. The clinical notes, including chief complaint, history of present illness, physical examination, medical history, and admission medication, as well as clinical variables recorded in electronic health records, were analyzed. We developed a deep learning mortality prediction model for in-hospital patients, which underwent complete internal, prospective, and external evaluation. The Integrated Gradients and SHapley Additive exPlanations (SHAP) methods were used to analyze the importance of risk factors. Results: The study included 9989 (16.4\%) patients in the development set, 2497 (14.1\%) patients in the internal validation set, 1896 (18.3\%) in the prospective validation set, and 7432 (15\%) patients in the external validation set. The area under the receiver operating characteristic curve of the models was 0.838 (95\% CI 0.827-0.851), 0.849 (95\% CI 0.841-0.856), and 0.767 (95\% CI 0.762-0.772), for the internal, prospective, and external validation sets, respectively. The area under the receiver operating characteristic curve of the multimodal model outperformed that of the unimodal models in all test sets, and tabular data contributed to higher discrimination. The medical history and physical examination were more useful than other factors in early assessments. 
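The note-plus-tabular fusion idea described in the heart failure study above can be sketched as a small PyTorch module that concatenates a note encoder with a tabular branch; the bag-of-tokens encoder, layer sizes, and inputs below are illustrative assumptions, not the authors' architecture.

# Hedged sketch of a multimodal (admission notes + tabular) risk model in PyTorch.
import torch
import torch.nn as nn

class NotesTabularFusion(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, n_tabular=20):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)   # simple bag-of-tokens note encoder
        self.tabular = nn.Sequential(nn.Linear(n_tabular, 32), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(embed_dim + 32, 32),
                                  nn.ReLU(),
                                  nn.Linear(32, 1))            # logit for in-hospital mortality

    def forward(self, note_tokens, tabular):
        fused = torch.cat([self.embed(note_tokens), self.tabular(tabular)], dim=1)
        return self.head(fused)

model = NotesTabularFusion()
note_tokens = torch.randint(0, 5000, (8, 50))   # 8 patients, 50 token ids per note
tabular = torch.randn(8, 20)                    # 8 patients, 20 tabular features
risk = torch.sigmoid(model(note_tokens, tabular))
print(risk.shape)                               # torch.Size([8, 1])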
Conclusions: The multimodal deep learning model for combining admission notes and clinical tabular data showed promising efficacy as a potentially novel method in evaluating the risk of mortality in patients with HF, providing more accurate and timely decision support. ", doi="10.2196/54363", url="https://www.jmir.org/2024/1/e54363", url="http://www.ncbi.nlm.nih.gov/pubmed/38696251" } @Article{info:doi/10.2196/49910, author="de Beijer, E. Ismay A. and van den Oever, R. Selina and Charalambous, Eliana and Cangioli, Giorgio and Balaguer, Julia and Bardi, Edit and Alfes, Marie and Ca{\~n}ete Nieto, Adela and Correcher, Marisa and Pinto da Costa, Tiago and Degelsegger-M{\'a}rquez, Alexander and D{\"u}ster, Vanessa and Filbert, Anna-Liesa and Grabow, Desiree and Gredinger, Gerald and Gsell, Hannah and Haupt, Riccardo and van Helvoirt, Maria and Ladenstein, Ruth and Langer, Thorsten and Laschkolnig, Anja and Muraca, Monica and Pluijm, F. Saskia M. and Rascon, Jelena and Schreier, G{\"u}nter and Tom{\'a}{\vs}ikova, Zuzana and Trauner, Florian and Trink?nas, Justas and Trunner, Kathrin and Uyttebroeck, Anne and Kremer, M. Leontien C. and van der Pal, H. Helena J. and Chronaki, Catherine and ", title="IT-Related Barriers and Facilitators to the Implementation of a New European eHealth Solution, the Digital Survivorship Passport (SurPass Version 2.0): Semistructured Digital Survey", journal="J Med Internet Res", year="2024", month="May", day="2", volume="26", pages="e49910", keywords="pediatric oncology", keywords="long-term follow up care", keywords="survivorship", keywords="cancer survivors", keywords="Survivorship Passport", keywords="SurPass, eHealth", keywords="information and technology", abstract="Background: To overcome knowledge gaps and optimize long-term follow-up (LTFU) care for childhood cancer survivors, the concept of the Survivorship Passport (SurPass) has been invented. Within the European PanCareSurPass project, the semiautomated and interoperable SurPass (version 2.0) will be optimized, implemented, and evaluated at 6 LTFU care centers representing 6 European countries and 3 distinct health system scenarios: (1) national electronic health information systems (EHISs) in Austria and Lithuania, (2) regional or local EHISs in Italy and Spain, and (3) cancer registries or hospital-based EHISs in Belgium and Germany. Objective: We aimed to identify and describe barriers and facilitators for SurPass (version 2.0) implementation concerning semiautomation of data input, interoperability, data protection, privacy, and cybersecurity. Methods: IT specialists from the 6 LTFU care centers participated in a semistructured digital survey focusing on IT-related barriers and facilitators to SurPass (version 2.0) implementation. We used the fit-viability model to assess the compatibility and feasibility of integrating SurPass into existing EHISs. Results: In total, 13/20 (65\%) invited IT specialists participated. The main barriers and facilitators in all 3 health system scenarios related to semiautomated data input and interoperability included unaligned EHIS infrastructure and the use of interoperability frameworks and international coding systems. The main barriers and facilitators related to data protection or privacy and cybersecurity included pseudonymization of personal health data and data retention. According to the fit-viability model, the first health system scenario provides the best fit for SurPass implementation, followed by the second and third scenarios. 
Conclusions: This study provides essential insights into the information and IT-related influencing factors that need to be considered when implementing the SurPass (version 2.0) in clinical practice. We recommend the adoption of Health Level Seven Fast Healthcare Interoperability Resources and data security measures such as encryption, pseudonymization, and multifactor authentication to protect personal health data where applicable. In sum, this study offers practical insights into integrating digital health solutions into existing EHISs. ", doi="10.2196/49910", url="https://www.jmir.org/2024/1/e49910", url="http://www.ncbi.nlm.nih.gov/pubmed/38696248" } @Article{info:doi/10.2196/51354, author="Li, Mingxia and Han, Shuzhe and Liang, Fang and Hu, Chenghuan and Zhang, Buyao and Hou, Qinlan and Zhao, Shuangping", title="Machine Learning for Predicting Risk and Prognosis of Acute Kidney Disease in Critically Ill Elderly Patients During Hospitalization: Internet-Based and Interpretable Model Study", journal="J Med Internet Res", year="2024", month="May", day="1", volume="26", pages="e51354", keywords="acute kidney disease", keywords="AKD", keywords="machine learning", keywords="critically ill patients", keywords="elderly patients", keywords="Shapley additive explanation", keywords="SHAP", abstract="Background: Acute kidney disease (AKD) affects more than half of critically ill elderly patients with acute kidney injury (AKI), which leads to worse short-term outcomes. Objective: We aimed to establish 2 machine learning models to predict the risk and prognosis of AKD in the elderly and to deploy the models as online apps. Methods: Data on elderly patients with AKI (n=3542) and AKD (n=2661) from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database were used to develop 2 models for predicting the AKD risk and in-hospital mortality, respectively. Data collected from Xiangya Hospital of Central South University were for external validation. A bootstrap method was used for internal validation to obtain relatively stable results. We extracted the indicators within 24 hours of the first diagnosis of AKI and the fluctuation range of some indicators, namely delta (day 3 after AKI minus day 1), as features. Six machine learning algorithms were used for modeling; the area under the receiver operating characteristic curve (AUROC), decision curve analysis, and calibration curve for evaluating; Shapley additive explanation (SHAP) analysis for visually interpreting; and the Heroku platform for deploying the best-performing models as web-based apps. Results: For the model of predicting the risk of AKD in elderly patients with AKI during hospitalization, the Light Gradient Boosting Machine (LightGBM) showed the best overall performance in the training (AUROC=0.844, 95\% CI 0.831-0.857), internal validation (AUROC=0.853, 95\% CI 0.841-0.865), and external (AUROC=0.755, 95\% CI 0.699--0.811) cohorts. In addition, LightGBM performed well for the AKD prognostic prediction in the training (AUROC=0.861, 95\% CI 0.843-0.878), internal validation (AUROC=0.868, 95\% CI 0.851-0.885), and external (AUROC=0.746, 95\% CI 0.673-0.820) cohorts. The models deployed as online prediction apps allowed users to predict and provide feedback to submit new data for model iteration. In the importance ranking and correlation visualization of the model's top 10 influencing factors conducted based on the SHAP value, partial dependence plots revealed the optimal cutoff of some interventionable indicators. 
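A minimal sketch of the modeling step described in the AKD study above (gradient-boosted trees, AUROC evaluation, and SHAP attributions) is shown below on synthetic, imbalanced data; the hyperparameters are illustrative choices, not the authors' pipeline.

# Hedged sketch: LightGBM classifier, AUROC, and SHAP values on synthetic data.
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
import shap

X, y = make_classification(n_samples=2000, n_features=25, weights=[0.8],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = LGBMClassifier(n_estimators=300, learning_rate=0.05, random_state=0)
model.fit(X_train, y_train)

auroc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUROC = {auroc:.3f}")

# Feature attributions in the spirit of the SHAP analysis described above.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)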
The top 5 factors predicting the risk of AKD were creatinine on day 3, sepsis, delta blood urea nitrogen (BUN), diastolic blood pressure (DBP), and heart rate, while the top 5 factors determining in-hospital mortality were age, BUN on day 1, vasopressor use, BUN on day 3, and partial pressure of carbon dioxide (PaCO2). Conclusions: We developed and validated 2 online apps for predicting the risk of AKD and its prognostic mortality in elderly patients, respectively. The top 10 factors that influenced the AKD risk and mortality during hospitalization were identified and explained visually, which might provide useful applications for intelligent management and suggestions for future prospective research. ", doi="10.2196/51354", url="https://www.jmir.org/2024/1/e51354", url="http://www.ncbi.nlm.nih.gov/pubmed/38691403" } @Article{info:doi/10.2196/54948, author="Busch, Felix and Han, Tianyu and Makowski, R. Marcus and Truhn, Daniel and Bressem, K. Keno and Adams, Lisa", title="Integrating Text and Image Analysis: Exploring GPT-4V's Capabilities in Advanced Radiological Applications Across Subspecialties", journal="J Med Internet Res", year="2024", month="May", day="1", volume="26", pages="e54948", keywords="GPT-4", keywords="ChatGPT", keywords="Generative Pre-Trained Transformer", keywords="multimodal large language models", keywords="artificial intelligence", keywords="AI applications in medicine", keywords="diagnostic radiology", keywords="clinical decision support systems", keywords="generative AI", keywords="medical image analysis", doi="10.2196/54948", url="https://www.jmir.org/2024/1/e54948", url="http://www.ncbi.nlm.nih.gov/pubmed/38691404" } @Article{info:doi/10.2196/51092, author="Pinho, Xavier and Meijer, Willemijn and de Graaf, Albert", title="Deriving Treatment Decision Support From Dutch Electronic Health Records by Exploring the Applicability of a Precision Cohort--Based Procedure for Patients With Type 2 Diabetes Mellitus: Precision Cohort Study", journal="Online J Public Health Inform", year="2024", month="May", day="1", volume="16", pages="e51092", keywords="personalized care", keywords="electronic health records", keywords="EHRs", keywords="machine learning", keywords="type 2 diabetes mellitus", keywords="T2DM", keywords="decision-making", abstract="Background: The rapidly increasing availability of medical data in electronic health records (EHRs) may contribute to the concept of learning health systems, allowing for better personalized care. Type 2 diabetes mellitus was chosen as the use case in this study. Objective: This study aims to explore the applicability of a recently developed patient similarity--based analytics approach based on EHRs as a candidate data analytical decision support tool. Methods: A previously published precision cohort analytics workflow was adapted for the Dutch primary care setting using EHR data from the Nivel Primary Care Database. The workflow consisted of extracting patient data from the Nivel Primary Care Database to retrospectively generate decision points for treatment change, training a similarity model, generating a precision cohort of the most similar patients, and analyzing treatment options. This analysis showed the treatment options that led to a better outcome for the precision cohort in terms of clinical readouts for glycemic control. Results: Data from 11,490 registered patients diagnosed with type 2 diabetes mellitus were extracted from the database. 
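The precision-cohort retrieval step described in the type 2 diabetes study above can be sketched with a standard nearest-neighbor search over standardized patient features; the feature set and cohort data below are synthetic placeholders, not the Nivel Primary Care Database, and only the cohort size echoes the abstract.

# Hedged sketch: retrieve a "precision cohort" of the most similar patients.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
# columns: age, HbA1c, fasting glucose, BMI, diabetes duration (synthetic values)
cohort = rng.normal(size=(11490, 5))

scaler = StandardScaler().fit(cohort)
nn = NearestNeighbors(n_neighbors=200).fit(scaler.transform(cohort))

query = cohort[0:1]                         # one patient at a treatment decision point
dist, idx = nn.kneighbors(scaler.transform(query))
precision_cohort = idx[0]                   # indices of the 200 most similar patients
print(precision_cohort[:10], dist[0, :3])

Outcomes under the different past treatment choices within this retrieved cohort can then be compared, which is the analysis the study reports as feasible but underpowered.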
Treatment-specific filter cohorts of patient groups were generated, and the effect of past treatment choices in these cohorts was assessed separately for glycated hemoglobin and fasting glucose as clinical outcome variables. Precision cohorts were generated for several individual patients from the filter cohorts. Analyses of treatment options and outcomes were technically feasible but generally lacked the statistical power to demonstrate statistical significance for treatment options with better outcomes. Conclusions: The precision cohort analytics workflow was successfully adapted for the Dutch primary care setting, proving its potential for use as a learning health system component. Although the approach proved technically feasible, data size limitations need to be overcome before application for clinical decision support becomes realistically possible. ", doi="10.2196/51092", url="https://ojphi.jmir.org/2024/1/e51092", url="http://www.ncbi.nlm.nih.gov/pubmed/38691393" } @Article{info:doi/10.2196/54026, author="Tang, Haiyang and Tian, Yijia and Fang, Jing and Yuan, Xiaoying and Yao, Minli and Wang, Yujia and Feng, Yan and Shu, Jia and Ni, Yan and Yu, Ying and Wang, Yuanhe and Liang, Ping and Li, Xingmin and Bai, Xiaoxia", title="Detection of Urinary Misfolded Proteins for Imminent Prediction of Preeclampsia in Pregnant Women With Suspected Cases: Protocol for a Prospective Noninterventional Study", journal="JMIR Res Protoc", year="2024", month="Apr", day="26", volume="13", pages="e54026", keywords="preeclampsia", keywords="misfolded protein", keywords="congophilia", keywords="noninvasive", keywords="prospective", abstract="Background: Preeclampsia (PE) is one of the most common hypertensive diseases, affecting 2\%-8\% of all pregnancies. The high maternal and fetal mortality rates of PE are due to a lack of early identification of affected pregnant women that would have led to closer monitoring and care. Recent data suggest that misfolded proteins might be a promising biomarker for PE prediction, which can be detected in urine samples of pregnant women according to their congophilia (aggregated) characteristic. Objective: The main purpose of this trial is to evaluate the value of the urine congophilia-based detection of misfolded proteins for the imminent prediction of PE in women presenting with suspected PE. The secondary objectives are to demonstrate that the presence of urine misfolded proteins correlates with PE-related maternal or neonatal adverse outcomes, and to establish an accurate PE prediction model by combining misfolded proteins with multiple indicators. Methods: At least 300 pregnant women with clinical suspicion of PE will be enrolled in this prospective cohort study. Participants should meet the following inclusion criteria in addition to a suspicion of PE: ≥18 years old, gestational week between 20+0 and 33+6, and single pregnancy. Consecutive urine samples will be collected, blinded, and tested for misfolded proteins and other PE-related biomarkers at enrollment and at 4 follow-up visits. Clinical assessments of PE status and related complications for all participants will be performed at regular intervals using strict diagnostic criteria. Investigators and participants will remain blinded to the results. Follow-up will be performed until 42 days postpartum. Data from medical records, including maternal and fetal outcomes, will be collected. 
The performance of urine misfolded proteins alone and combined with other biomarkers or clinical variables for the prediction of PE will be statistically analyzed. Results: Enrollment started in July 2023 and was still open upon manuscript submission. As of March 2024, a total of 251 eligible women have been enrolled in the study and enrollment is expected to continue until August 2024. Results analysis is scheduled to start after all participants reach the follow-up endpoint and complete clinical data are collected. Conclusions: Upon completion of the study, we expect to derive an accurate PE prediction model, which will allow for proactive management of pregnant women with clinical suspicion of PE and possibly reduce the associated adverse pregnancy outcomes. The additional prognostic value of misfolded proteins is also expected to be confirmed. Trial Registration: Chinese Clinical Trials Registry ChiCTR2300074878; https://www.chictr.org.cn/showproj.html?proj=202096 International Registered Report Identifier (IRRID): PRR1-10.2196/54026 ", doi="10.2196/54026", url="https://www.researchprotocols.org/2024/1/e54026", url="http://www.ncbi.nlm.nih.gov/pubmed/38669061" } @Article{info:doi/10.2196/56764, author="Choudhury, Avishek and Chaudhry, Zaira", title="Large Language Models and User Trust: Consequence of Self-Referential Learning Loop and the Deskilling of Health Care Professionals", journal="J Med Internet Res", year="2024", month="Apr", day="25", volume="26", pages="e56764", keywords="trust", keywords="ChatGPT", keywords="human factors", keywords="healthcare", keywords="LLMs", keywords="large language models", keywords="LLM user trust", keywords="AI accountability", keywords="artificial intelligence", keywords="AI technology", keywords="technologies", keywords="effectiveness", keywords="policy", keywords="medical student", keywords="medical students", keywords="risk factor", keywords="quality of care", keywords="healthcare professional", keywords="healthcare professionals", keywords="human element", doi="10.2196/56764", url="https://www.jmir.org/2024/1/e56764", url="http://www.ncbi.nlm.nih.gov/pubmed/38662419" } @Article{info:doi/10.2196/54388, author="Isangula, Ganka Kahabi and Haule, John Rogers", title="Leveraging AI and Machine Learning to Develop and Evaluate a Contextualized User-Friendly Cough Audio Classifier for Detecting Respiratory Diseases: Protocol for a Diagnostic Study in Rural Tanzania", journal="JMIR Res Protoc", year="2024", month="Apr", day="23", volume="13", pages="e54388", keywords="artificial intelligence", keywords="machine learning", keywords="respiratory diseases", keywords="cough classifiers", keywords="Tanzania", keywords="Africa", keywords="mobile phone", keywords="user-friendly", keywords="cough", keywords="detecting respiratory disease", keywords="diagnostic study", keywords="tuberculosis", keywords="asthma", keywords="chronic obstructive pulmonary disease", keywords="treatment", keywords="management", keywords="noninvasive", keywords="rural", keywords="cross-sectional research", keywords="analysis", keywords="cough sound", abstract="Background: Respiratory diseases, including active tuberculosis (TB), asthma, and chronic obstructive pulmonary disease (COPD), constitute substantial global health challenges, necessitating timely and accurate diagnosis for effective treatment and management. 
Objective: This research seeks to develop and evaluate a noninvasive user-friendly artificial intelligence (AI)--powered cough audio classifier for detecting these respiratory conditions in rural Tanzania. Methods: This is a nonexperimental cross-sectional study with the primary objective of collecting and analyzing cough sounds from patients with active TB, asthma, and COPD in outpatient clinics to generate and evaluate a noninvasive cough audio classifier. Specialized cough sound recording devices, designed to be nonintrusive and user-friendly, will facilitate the collection of diverse cough sound samples from patients attending outpatient clinics in 20 health care facilities in the Shinyanga region. The collected cough sound data will undergo rigorous analysis, using advanced AI signal processing and machine learning techniques. By comparing acoustic features and patterns associated with TB, asthma, and COPD, a robust algorithm capable of automated disease discrimination will be generated, facilitating the development of a smartphone-based cough sound classifier. The classifier will be evaluated against the calculated reference standards, including clinical assessments, sputum smear, GeneXpert, chest x-ray, culture and sensitivity, spirometry and peak expiratory flow, and sensitivity and predictive values. Results: This research represents a vital step toward enhancing the diagnostic capabilities available in outpatient clinics, with the potential to revolutionize the field of respiratory disease diagnosis. Findings from the 4 phases of the study will be presented as descriptions supported by relevant images, tables, and figures. The anticipated outcome of this research is the creation of a reliable, noninvasive diagnostic cough classifier that empowers health care professionals and patients themselves to identify and differentiate these respiratory diseases based on cough sound patterns. Conclusions: Cough sound classifiers use advanced technology for early detection and management of respiratory conditions, offering a less invasive and more efficient alternative to traditional diagnostics. This technology promises to ease public health burdens, improve patient outcomes, and enhance health care access in under-resourced areas, potentially transforming respiratory disease management globally. 
International Registered Report Identifier (IRRID): PRR1-10.2196/54388 ", doi="10.2196/54388", url="https://www.researchprotocols.org/2024/1/e54388", url="http://www.ncbi.nlm.nih.gov/pubmed/38652526" } @Article{info:doi/10.2196/53091, author="Shara, Nawar and Mirabal-Beltran, Roxanne and Talmadge, Bethany and Falah, Noor and Ahmad, Maryam and Dempers, Ramon and Crovatt, Samantha and Eisenberg, Steven and Anderson, Kelley", title="Use of Machine Learning for Early Detection of Maternal Cardiovascular Conditions: Retrospective Study Using Electronic Health Record Data", journal="JMIR Cardio", year="2024", month="Apr", day="22", volume="8", pages="e53091", keywords="machine learning", keywords="preeclampsia", keywords="cardiovascular", keywords="maternal", keywords="obstetrics", keywords="health disparities", keywords="woman", keywords="women", keywords="pregnancy", keywords="pregnant", keywords="cardiovascular condition", keywords="retrospective study", keywords="electronic health record", keywords="EHR", keywords="technology", keywords="decision-making", keywords="health disparity", keywords="virtual server", keywords="thromboembolism", keywords="kidney failure", keywords="HOPE-CAT", abstract="Background: Cardiovascular conditions (eg, cardiac and coronary conditions, hypertensive disorders of pregnancy, and cardiomyopathies) were the leading cause of maternal mortality between 2017 and 2019. The United States has the highest maternal mortality rate of any high-income nation, disproportionately impacting those who identify as non-Hispanic Black or Hispanic. Novel clinical approaches to the detection and diagnosis of cardiovascular conditions are therefore imperative. Emerging research is demonstrating that machine learning (ML) is a promising tool for detecting patients at increased risk for hypertensive disorders during pregnancy. However, additional studies are required to determine how integrating ML and big data, such as electronic health records (EHRs), can improve the identification of obstetric patients at higher risk of cardiovascular conditions. Objective: This study aimed to evaluate the capability and timing of a proprietary ML algorithm, Healthy Outcomes for all Pregnancy Experiences-Cardiovascular-Risk Assessment Technology (HOPE-CAT), to detect maternal-related cardiovascular conditions and outcomes. Methods: Retrospective data from the EHRs of a large health care system were investigated by HOPE-CAT in a virtual server environment. Deidentification of EHR data and standardization enabled HOPE-CAT to analyze data without pre-existing biases. The ML algorithm assessed risk factors selected by clinical experts in cardio-obstetrics, and the algorithm was iteratively trained using relevant literature and current standards of risk identification. After refinement of the algorithm's learned risk factors, risk profiles were generated for every patient including a designation of standard versus high risk. The profiles were individually paired with clinical outcomes pertaining to cardiovascular pregnancy conditions and complications, wherein a delta was calculated between the date of the risk profile and the actual diagnosis or intervention in the EHR. Results: In total, 604 pregnancies resulting in birth had records or diagnoses that could be compared against the risk profile; the majority of patients identified as Black (n=482, 79.8\%) and aged between 21 and 34 years (n=509, 84.4\%). 
Preeclampsia (n=547, 90.6\%) was the most common condition, followed by thromboembolism (n=16, 2.7\%) and acute kidney disease or failure (n=13, 2.2\%). The average delta was 56.8 (SD 69.7) days between the identification of risk factors by HOPE-CAT and the first date of diagnosis or intervention of a related condition reported in the EHR. HOPE-CAT showed the strongest performance in early risk detection of myocardial infarction at a delta of 65.7 (SD 81.4) days. Conclusions: This study provides additional evidence to support ML in obstetrical patients to enhance the early detection of cardiovascular conditions during pregnancy. ML can synthesize multiday patient presentations to enhance provider decision-making and potentially reduce maternal health disparities. ", doi="10.2196/53091", url="https://cardio.jmir.org/2024/1/e53091", url="http://www.ncbi.nlm.nih.gov/pubmed/38648629" } @Article{info:doi/10.2196/54419, author="Kernberg, Annessa and Gold, A. Jeffrey and Mohan, Vishnu", title="Using ChatGPT-4 to Create Structured Medical Notes From Audio Recordings of Physician-Patient Encounters: Comparative Study", journal="J Med Internet Res", year="2024", month="Apr", day="22", volume="26", pages="e54419", keywords="generative AI", keywords="generative artificial intelligence", keywords="ChatGPT", keywords="simulation", keywords="large language model", keywords="clinical documentation", keywords="quality", keywords="accuracy", keywords="reproducibility", keywords="publicly available", keywords="medical note", keywords="medical notes", keywords="generation", keywords="medical documentation", keywords="documentation", keywords="documentations", keywords="AI", keywords="artificial intelligence", keywords="transcript", keywords="transcripts", keywords="ChatGPT-4", abstract="Background: Medical documentation plays a crucial role in clinical practice, facilitating accurate patient management and communication among health care professionals. However, inaccuracies in medical notes can lead to miscommunication and diagnostic errors. Additionally, the demands of documentation contribute to physician burnout. Although intermediaries like medical scribes and speech recognition software have been used to ease this burden, they have limitations in terms of accuracy and addressing provider-specific metrics. The integration of ambient artificial intelligence (AI)--powered solutions offers a promising way to improve documentation while fitting seamlessly into existing workflows. Objective: This study aims to assess the accuracy and quality of Subjective, Objective, Assessment, and Plan (SOAP) notes generated by ChatGPT-4, an AI model, using established transcripts of History and Physical Examination as the gold standard. We seek to identify potential errors and evaluate the model's performance across different categories. Methods: We conducted simulated patient-provider encounters representing various ambulatory specialties and transcribed the audio files. Key reportable elements were identified, and ChatGPT-4 was used to generate SOAP notes based on these transcripts. Three versions of each note were created and compared to the gold standard via chart review; errors generated from the comparison were categorized as omissions, incorrect information, or additions. We compared the accuracy of data elements across versions, transcript length, and data categories. Additionally, we assessed note quality using the Physician Documentation Quality Instrument (PDQI) scoring system. 
Results: Although ChatGPT-4 consistently generated SOAP-style notes, there were, on average, 23.6 errors per clinical case, with errors of omission (86\%) being the most common, followed by addition errors (10.5\%) and inclusion of incorrect facts (3.2\%). There was significant variance between replicates of the same case, with only 52.9\% of data elements reported correctly across all 3 replicates. The accuracy of data elements varied across cases, with the highest accuracy observed in the ``Objective'' section. Consequently, the measure of note quality, assessed by PDQI, demonstrated intra- and intercase variance. Finally, the accuracy of ChatGPT-4 was inversely correlated to both the transcript length (P=.05) and the number of scorable data elements (P=.05). Conclusions: Our study reveals substantial variability in errors, accuracy, and note quality generated by ChatGPT-4. Errors were not limited to specific sections, and the inconsistency in error types across replicates complicated predictability. Transcript length and data complexity were inversely correlated with note accuracy, raising concerns about the model's effectiveness in handling complex medical cases. The quality and reliability of clinical notes produced by ChatGPT-4 do not meet the standards required for clinical use. Although AI holds promise in health care, caution should be exercised before widespread adoption. Further research is needed to address accuracy, variability, and potential errors. ChatGPT-4, while valuable in various applications, should not be considered a safe alternative to human-generated clinical documentation at this time. ", doi="10.2196/54419", url="https://www.jmir.org/2024/1/e54419", url="http://www.ncbi.nlm.nih.gov/pubmed/38648636" } @Article{info:doi/10.2196/55037, author="Pham, Cecilia and Govender, Romi and Tehami, Salik and Chavez, Summer and Adepoju, E. Omolola and Liaw, Winston", title="ChatGPT's Performance in Cardiac Arrest and Bradycardia Simulations Using the American Heart Association's Advanced Cardiovascular Life Support Guidelines: Exploratory Study", journal="J Med Internet Res", year="2024", month="Apr", day="22", volume="26", pages="e55037", keywords="ChatGPT", keywords="artificial intelligence", keywords="AI", keywords="large language model", keywords="LLM", keywords="cardiac arrest", keywords="bradycardia", keywords="simulation", keywords="advanced cardiovascular life support", keywords="ACLS", keywords="bradycardia simulations", keywords="America", keywords="American", keywords="heart association", keywords="cardiac", keywords="life support", keywords="exploratory study", keywords="heart", keywords="heart attack", keywords="clinical decision support", keywords="diagnostics", keywords="algorithms", abstract="Background: ChatGPT is the most advanced large language model to date, with prior iterations having passed medical licensing examinations, providing clinical decision support, and improved diagnostics. Although limited, past studies of ChatGPT's performance found that artificial intelligence could pass the American Heart Association's advanced cardiovascular life support (ACLS) examinations with modifications. ChatGPT's accuracy has not been studied in more complex clinical scenarios. As heart disease and cardiac arrest remain leading causes of morbidity and mortality in the United States, finding technologies that help increase adherence to ACLS algorithms, which improves survival outcomes, is critical. 
Objective: This study aims to examine the accuracy of ChatGPT in following ACLS guidelines for bradycardia and cardiac arrest. Methods: We evaluated the accuracy of ChatGPT's responses to 2 simulations based on the 2020 American Heart Association ACLS guidelines with 3 primary outcomes of interest: the mean individual step accuracy, the accuracy score per simulation attempt, and the accuracy score for each algorithm. For each simulation step, ChatGPT was scored for correctness (1 point) or incorrectness (0 points). Each simulation was conducted 20 times. Results: ChatGPT's median accuracy for each step was 85\% (IQR 40\%-100\%) for cardiac arrest and 30\% (IQR 13\%-81\%) for bradycardia. ChatGPT's median accuracy over 20 simulation attempts for cardiac arrest was 69\% (IQR 67\%-74\%) and for bradycardia was 42\% (IQR 33\%-50\%). We found that ChatGPT's outputs varied despite consistent input, the same actions were persistently missed, repetitive overemphasis hindered guidance, and erroneous medication information was presented. Conclusions: This study highlights the need for consistent and reliable guidance to prevent potential medical errors and optimize the application of ChatGPT to enhance its reliability and effectiveness in clinical practice. ", doi="10.2196/55037", url="https://www.jmir.org/2024/1/e55037", url="http://www.ncbi.nlm.nih.gov/pubmed/38648098" } @Article{info:doi/10.2196/52343, author="Van den Eynde, Jef", title="CHDmap: One Step Further Toward Integrating Medicine-Based Evidence Into Practice", journal="JMIR Med Inform", year="2024", month="Apr", day="19", volume="12", pages="e52343", keywords="artificial intelligence", keywords="clinical practice", keywords="congenital heart disease", keywords="decision-making", keywords="evidence-based medicine", keywords="machine learning", keywords="medicine-based evidence", keywords="patient similarity networks", keywords="precision medicine", keywords="randomized controlled trials", doi="10.2196/52343", url="https://medinform.jmir.org/2024/1/e52343" } @Article{info:doi/10.2196/52344, author="Alvarez-Romero, Celia and Polo-Molina, Alejandro and S{\'a}nchez-{\'U}beda, Francisco Eugenio and Jimenez-De-Juan, Carlos and Cuadri-Benitez, Pastora Maria and Rivas-Gonzalez, Antonio Jose and Portela, Jose and Palacios, Rafael and Rodriguez-Morcillo, Carlos and Mu{\~n}oz, Antonio and Parra-Calderon, Luis Carlos and Nieto-Martin, Dolores Maria and Ollero-Baturone, Manuel and Hern{\'a}ndez-Quiles, Carlos", title="Machine Learning--Based Prediction of Changes in the Clinical Condition of Patients With Complex Chronic Diseases: 2-Phase Pilot Prospective Single-Center Observational Study", journal="JMIR Form Res", year="2024", month="Apr", day="19", volume="8", pages="e52344", keywords="patients with complex chronic diseases", keywords="functional impairment", keywords="Barthel Index", keywords="artificial intelligence", keywords="machine learning", keywords="prediction model", keywords="pilot study", keywords="chronic patients", keywords="chronic", keywords="development study", keywords="prognostic", keywords="diagnostic", keywords="therapeutic", keywords="wearable", keywords="wearables", keywords="wearable activity tracker", keywords="mobility device", keywords="device", keywords="physical activity", keywords="caregiver", abstract="Background: Functional impairment is one of the most decisive prognostic factors in patients with complex chronic diseases. 
A more significant functional impairment indicates that the disease is progressing, which requires implementing diagnostic and therapeutic actions that stop the exacerbation of the disease. Objective: This study aimed to predict alterations in the clinical condition of patients with complex chronic diseases by predicting the Barthel Index (BI), to assess their clinical and functional status using an artificial intelligence model and data collected through an internet of things mobility device. Methods: A 2-phase pilot prospective single-center observational study was designed. During both phases, patients were recruited, and a wearable activity tracker was allocated to gather physical activity data. Patients were categorized into class A (BI≤20; total dependence), class B (20<BI≤60; severe dependence), or class C (BI>60; moderate or mild dependence, or independent). Data preprocessing and machine learning techniques were used to analyze mobility data. A decision tree was used to achieve a robust and interpretable model. To assess the quality of the predictions, several metrics including the mean absolute error, median absolute error, and root mean squared error were considered. Statistical analysis was performed using SPSS and Python for the machine learning modeling. Results: Overall, 90 patients with complex chronic diseases were included: 50 during phase 1 (class A: n=10; class B: n=20; and class C: n=20) and 40 during phase 2 (class B: n=20 and class C: n=20). Most patients (n=85, 94\%) had a caregiver. The mean value of the BI was 58.31 (SD 24.5). Concerning mobility aids, 60\% (n=52) of patients required no aids, whereas the others required walkers (n=18, 20\%), wheelchairs (n=15, 17\%), canes (n=4, 7\%), and crutches (n=1, 1\%). Regarding clinical complexity, 85\% (n=76) met the criteria for patients with polypathology, with a mean of 2.7 (SD 1.25) categories, 69\% (n=61) met the frailty criteria, and 21\% (n=19) met the criteria for patients with complex chronic diseases. The most characteristic symptoms were dyspnea (n=73, 82\%), chronic pain (n=63, 70\%), asthenia (n=62, 68\%), and anxiety (n=41, 46\%). Polypharmacy was present in 87\% (n=78) of patients. The most important variables for predicting the BI were identified as the maximum step count during evening and morning periods and the absence of a mobility device. The model exhibited consistency in the median prediction error with a median absolute error close to 5 in the training, validation, and production-like test sets. The model accuracy for identifying the BI class was 91\%, 88\%, and 90\% in the training, validation, and test sets, respectively. Conclusions: Using commercially available mobility recording devices makes it possible to identify different mobility patterns and relate them to functional capacity in patients with polypathology according to the BI without using clinical parameters. ", doi="10.2196/52344", url="https://formative.jmir.org/2024/1/e52344", url="http://www.ncbi.nlm.nih.gov/pubmed/38640473" } @Article{info:doi/10.2196/55202, author="Baus, Adam and Boatman, D. 
Dannell and Calkins, Andrea and Pollard, Cecil and Conn, Ellen Mary and Subramanian, Sujha and Kennedy-Rea, Stephenie", title="A Health Information Technology Protocol to Enhance Colorectal Cancer Screening", journal="JMIR Form Res", year="2024", month="Apr", day="19", volume="8", pages="e55202", keywords="electronic health record", keywords="EHR", keywords="colorectal cancer screening", keywords="health information technology", keywords="cancer", keywords="colorectal cancer", doi="10.2196/55202", url="https://formative.jmir.org/2024/1/e55202", url="http://www.ncbi.nlm.nih.gov/pubmed/38640474" } @Article{info:doi/10.2196/47125, author="Wang, Echo H. and Weiner, P. Jonathan and Saria, Suchi and Kharrazi, Hadi", title="Evaluating Algorithmic Bias in 30-Day Hospital Readmission Models: Retrospective Analysis", journal="J Med Internet Res", year="2024", month="Apr", day="18", volume="26", pages="e47125", keywords="algorithmic bias", keywords="model bias", keywords="predictive models", keywords="model fairness", keywords="health disparity", keywords="hospital readmission", keywords="retrospective analysis", abstract="Background: The adoption of predictive algorithms in health care comes with the potential for algorithmic bias, which could exacerbate existing disparities. Fairness metrics have been proposed to measure algorithmic bias, but their application to real-world tasks is limited. Objective: This study aims to evaluate the algorithmic bias associated with the application of common 30-day hospital readmission models and assess the usefulness and interpretability of selected fairness metrics. Methods: We used 10.6 million adult inpatient discharges from Maryland and Florida from 2016 to 2019 in this retrospective study. Models predicting 30-day hospital readmissions were evaluated: LACE Index, modified HOSPITAL score, and modified Centers for Medicare \& Medicaid Services (CMS) readmission measure, which were applied as-is (using existing coefficients) and retrained (recalibrated with 50\% of the data). Predictive performances and bias measures were evaluated for all, between Black and White populations, and between low- and other-income groups. Bias measures included the parity of false negative rate (FNR), false positive rate (FPR), 0-1 loss, and generalized entropy index. Racial bias represented by FNR and FPR differences was stratified to explore shifts in algorithmic bias in different populations. Results: The retrained CMS model demonstrated the best predictive performance (area under the curve: 0.74 in Maryland and 0.68-0.70 in Florida), and the modified HOSPITAL score demonstrated the best calibration (Brier score: 0.16-0.19 in Maryland and 0.19-0.21 in Florida). Calibration was better in White (compared to Black) populations and other-income (compared to low-income) groups, and the area under the curve was higher or similar in the Black (compared to White) populations. The retrained CMS and modified HOSPITAL score had the lowest racial and income bias in Maryland. In Florida, both of these models overall had the lowest income bias and the modified HOSPITAL score showed the lowest racial bias. In both states, the White and higher-income populations showed a higher FNR, while the Black and low-income populations resulted in a higher FPR and a higher 0-1 loss. When stratified by hospital and population composition, these models demonstrated heterogeneous algorithmic bias in different contexts and populations. 
Conclusions: Caution must be taken when interpreting fairness measures' face value. A higher FNR or FPR could potentially reflect missed opportunities or wasted resources, but these measures could also reflect health care use patterns and gaps in care. Simply relying on the statistical notions of bias could obscure or underplay the causes of health disparity. The imperfect health data, analytic frameworks, and the underlying health systems must be carefully considered. Fairness measures can serve as a useful routine assessment to detect disparate model performances but are insufficient to inform mechanisms or policy changes. However, such an assessment is an important first step toward data-driven improvement to address existing health disparities. ", doi="10.2196/47125", url="https://www.jmir.org/2024/1/e47125", url="http://www.ncbi.nlm.nih.gov/pubmed/38422347" } @Article{info:doi/10.2196/52592, author="Barton, J. Hanna and Maru, Apoorva and Leaf, A. Margaret and Hekman, J. Daniel and Wiegmann, A. Douglas and Shah, N. Manish and Patterson, W. Brian", title="Academic Detailing as a Health Information Technology Implementation Method: Supporting the Design and Implementation of an Emergency Department--Based Clinical Decision Support Tool to Prevent Future Falls", journal="JMIR Hum Factors", year="2024", month="Apr", day="18", volume="11", pages="e52592", keywords="emergency medicine", keywords="clinical decision support", keywords="health IT", keywords="human factors", keywords="work systems", keywords="SEIPS", keywords="Systems Engineering Initiative for Patient Safety", keywords="educational outreach", keywords="academic detailing", keywords="implementation method", keywords="department-based", keywords="CDS", keywords="clinical care", keywords="evidence-based", keywords="CDS tool", keywords="gerontology", keywords="geriatric", keywords="geriatrics", keywords="older adult", keywords="older adults", keywords="elder", keywords="elderly", keywords="older person", keywords="older people", keywords="preventative intervention", keywords="team-based analysis", keywords="machine learning", keywords="high-risk patient", keywords="high-risk patients", keywords="pharmaceutical", keywords="pharmaceutical sales", keywords="United States", keywords="fall-risk prediction", keywords="EHR", keywords="electronic health record", keywords="interview", keywords="ED environment", keywords="emergency department", abstract="Background: Clinical decision support (CDS) tools that incorporate machine learning--derived content have the potential to transform clinical care by augmenting clinicians' expertise. To realize this potential, such tools must be designed to fit the dynamic work systems of the clinicians who use them. We propose the use of academic detailing---personal visits to clinicians by an expert in a specific health IT tool---as a method for both ensuring the correct understanding of that tool and its evidence base and identifying factors influencing the tool's implementation. Objective: This study aimed to assess academic detailing as a method for simultaneously ensuring the correct understanding of an emergency department--based CDS tool to prevent future falls and identifying factors impacting clinicians' use of the tool through an analysis of the resultant qualitative data. 
Methods: Previously, our team designed a CDS tool to identify patients aged 65 years and older who are at the highest risk of future falls and prompt an interruptive alert to clinicians, suggesting the patient be referred to a mobility and falls clinic for an evidence-based preventative intervention. We conducted 10-minute academic detailing interviews (n=16) with resident emergency medicine physicians and advanced practice providers who had encountered our CDS tool in practice. We conducted an inductive, team-based content analysis to identify factors that influenced clinicians' use of the CDS tool. Results: The following categories of factors that impacted clinicians' use of the CDS were identified: (1) aspects of the CDS tool's design (2) clinicians' understanding (or misunderstanding) of the CDS or referral process, (3) the busy nature of the emergency department environment, (4) clinicians' perceptions of the patient and their associated fall risk, and (5) the opacity of the referral process. Additionally, clinician education was done to address any misconceptions about the CDS tool or referral process, for example, demonstrating how simple it is to place a referral via the CDS and clarifying which clinic the referral goes to. Conclusions: Our study demonstrates the use of academic detailing for supporting the implementation of health information technologies, allowing us to identify factors that impacted clinicians' use of the CDS while concurrently educating clinicians to ensure the correct understanding of the CDS tool and intervention. Thus, academic detailing can inform both real-time adjustments of a tool's implementation, for example, refinement of the language used to introduce the tool, and larger scale redesign of the CDS tool to better fit the dynamic work environment of clinicians. ", doi="10.2196/52592", url="https://humanfactors.jmir.org/2024/1/e52592", url="http://www.ncbi.nlm.nih.gov/pubmed/38635318" } @Article{info:doi/10.2196/56572, author="Nkoy, L. Flory and Stone, L. 
Bryan and Zhang, Yue and Luo, Gang", title="A Roadmap for Using Causal Inference and Machine Learning to Personalize Asthma Medication Selection", journal="JMIR Med Inform", year="2024", month="Apr", day="17", volume="12", pages="e56572", keywords="asthma", keywords="causal inference", keywords="forecasting", keywords="machine learning", keywords="decision support", keywords="drug", keywords="drugs", keywords="pharmacy", keywords="pharmacies", keywords="pharmacology", keywords="pharmacotherapy", keywords="pharmaceutic", keywords="pharmaceutics", keywords="pharmaceuticals", keywords="pharmaceutical", keywords="medication", keywords="medications", keywords="medication selection", keywords="respiratory", keywords="pulmonary", keywords="forecast", keywords="ICS", keywords="inhaled corticosteroid", keywords="inhaler", keywords="inhaled", keywords="corticosteroid", keywords="corticosteroids", keywords="artificial intelligence", keywords="personalized", keywords="customized", doi="10.2196/56572", url="https://medinform.jmir.org/2024/1/e56572", url="http://www.ncbi.nlm.nih.gov/pubmed/38630536" } @Article{info:doi/10.2196/57778, author="Herrmann-Werner, Anne and Festl-Wietek, Teresa and Holderried, Friederike and Herschbach, Lea and Griewatz, Jan and Masters, Ken and Zipfel, Stephan and Mahling, Moritz", title="Authors' Reply: ``Evaluating GPT-4's Cognitive Functions Through the Bloom Taxonomy: Insights and Clarifications''", journal="J Med Internet Res", year="2024", month="Apr", day="16", volume="26", pages="e57778", keywords="answer", keywords="artificial intelligence", keywords="assessment", keywords="Bloom's taxonomy", keywords="ChatGPT", keywords="classification", keywords="error", keywords="exam", keywords="examination", keywords="generative", keywords="GPT-4", keywords="Generative Pre-trained Transformer 4", keywords="language model", keywords="learning outcome", keywords="LLM", keywords="MCQ", keywords="medical education", keywords="medical exam", keywords="multiple-choice question", keywords="natural language processing", keywords="NLP", keywords="psychosomatic", keywords="question", keywords="response", keywords="taxonomy", doi="10.2196/57778", url="https://www.jmir.org/2024/1/e57778", url="http://www.ncbi.nlm.nih.gov/pubmed/38625723" } @Article{info:doi/10.2196/56997, author="Huang, Kuan-Ju", title="Evaluating GPT-4's Cognitive Functions Through the Bloom Taxonomy: Insights and Clarifications", journal="J Med Internet Res", year="2024", month="Apr", day="16", volume="26", pages="e56997", keywords="artificial intelligence", keywords="ChatGPT", keywords="Bloom taxonomy", keywords="AI", keywords="cognition", doi="10.2196/56997", url="https://www.jmir.org/2024/1/e56997", url="http://www.ncbi.nlm.nih.gov/pubmed/38625725" } @Article{info:doi/10.2196/50475, author="Shulha, Michael and Hovdebo, Jordan and D'Souza, Vinita and Thibault, Francis and Harmouche, Rola", title="Integrating Explainable Machine Learning in Clinical Decision Support Systems: Study Involving a Modified Design Thinking Approach", journal="JMIR Form Res", year="2024", month="Apr", day="16", volume="8", pages="e50475", keywords="explainable machine learning", keywords="XML", keywords="design thinking approach", keywords="NASSS framework", keywords="clinical decision support", keywords="clinician engagement", keywords="clinician-facing interface", keywords="clinician trust in machine learning", keywords="COVID-19", keywords="chest x-ray", keywords="severity prediction", 
abstract="Background: Though there has been considerable effort to implement machine learning (ML) methods for health care, clinical implementation has lagged. Incorporating explainable machine learning (XML) methods through the development of a decision support tool using a design thinking approach is expected to lead to greater uptake of such tools. Objective: This work aimed to explore how constant engagement of clinician end users can address the lack of adoption of ML tools in clinical contexts due to their lack of transparency and address challenges related to presenting explainability in a decision support interface. Methods: We used a design thinking approach augmented with additional theoretical frameworks to provide more robust approaches to different phases of design. In particular, in the problem definition phase, we incorporated the nonadoption, abandonment, scale-up, spread, and sustainability of technology in health care (NASSS) framework to assess these aspects in a health care network. This process helped focus on the development of a prognostic tool that predicted the likelihood of admission to an intensive care ward based on disease severity in chest x-ray images. In the ideate, prototype, and test phases, we incorporated a metric framework to assess physician trust in artificial intelligence (AI) tools. This allowed us to compare physicians' assessments of the domain representation, action ability, and consistency of the tool. Results: Physicians found the design of the prototype elegant, and domain appropriate representation of data was displayed in the tool. They appreciated the simplified explainability overlay, which only displayed the most predictive patches that cumulatively explained 90\% of the final admission risk score. Finally, in terms of consistency, physicians unanimously appreciated the capacity to compare multiple x-ray images in the same view. They also appreciated the ability to toggle the explainability overlay so that both options made it easier for them to assess how consistently the tool was identifying elements of the x-ray image they felt would contribute to overall disease severity. Conclusions: The adopted approach is situated in an evolving space concerned with incorporating XML or AI technologies into health care software. We addressed the alignment of AI as it relates to clinician trust, describing an approach to wire framing and prototyping, which incorporates the use of a theoretical framework for trust in the design process itself. Moreover, we proposed that alignment of AI is dependent upon integration of end users throughout the larger design process. Our work shows the importance and value of engaging end users prior to tool development. We believe that the described approach is a unique and valuable contribution that outlines a direction for ML experts, user experience designers, and clinician end users on how to collaborate in the creation of trustworthy and usable XML-based clinical decision support tools. 
", doi="10.2196/50475", url="https://formative.jmir.org/2024/1/e50475", url="http://www.ncbi.nlm.nih.gov/pubmed/38625728" } @Article{info:doi/10.2196/53000, author="Karki, Saugat and Shaw, Sarah and Lieberman, Michael and P{\'e}rez, Alejandro and Pincus, Jonathan and Jakhmola, Priya and Tailor, Amrita and Ogunrinde, Bukky Oyinkansola and Sill, Danielle and Morgan, Shane and Alvarez, Miguel and Todd, Jonathan and Smith, Dawn and Mishra, Ninad", title="Clinical Decision Support System for Guidelines-Based Treatment of Gonococcal Infections, Screening for HIV, and Prescription of Pre-Exposure Prophylaxis: Design and Implementation Study", journal="JMIR Form Res", year="2024", month="Apr", day="15", volume="8", pages="e53000", keywords="clinical decision support systems", keywords="CDS", keywords="gonorrhea", keywords="pre-exposure prophylaxis", keywords="PrEP", keywords="HIV", keywords="sexually transmitted infections", keywords="electronic health records", keywords="guideline adherence", abstract="Background: The syndemic nature of gonococcal infections and HIV provides an opportunity to develop a synergistic intervention tool that could address the need for adequate treatment for gonorrhea, screen for HIV infections, and offer pre-exposure prophylaxis (PrEP) for persons who meet the criteria. By leveraging information available on electronic health records, a clinical decision support (CDS) system tool could fulfill this need and improve adherence to Centers for Disease Control and Prevention (CDC) treatment and screening guidelines for gonorrhea, HIV, and PrEP. Objective: The goal of this study was to translate portions of CDC treatment guidelines for gonorrhea and relevant portions of HIV screening and prescribing PrEP that stem from a diagnosis of gonorrhea as an electronic health record--based CDS intervention. We also assessed whether this CDS solution worked in real-world clinic. Methods: We developed 4 tools for this CDS intervention: a form for capturing sexual history information (SmartForm), rule-based alerts (best practice advisory), an enhanced sexually transmitted infection (STI) order set (SmartSet), and a documentation template (SmartText). A mixed methods pre-post design was used to measure the feasibility, use, and usability of the CDS solution. The study period was 12 weeks with a baseline patient sample of 12 weeks immediately prior to the intervention period for comparison. While the entire clinic had access to the CDS solution, we focused on a subset of clinicians who frequently engage in the screening and treatment of STIs within the clinical site under the name ``X-Clinic.'' We measured the use of the CDS solution within the population of patients who had either a confirmed gonococcal infection or an STI-related chief complaint. We conducted 4 midpoint surveys and 3 key informant interviews to quantify perception and impact of the CDS solution and solicit suggestions for potential future enhancements. The findings from qualitative data were determined using a combination of explorative and comparative analysis. Statistical analysis was conducted to compare the differences between patient populations in the baseline and intervention periods. Results: Within the X-Clinic, the CDS alerted clinicians (as a best practice advisory) in one-tenth (348/3451, 10.08\%) of clinical encounters. These 348 encounters represented 300 patients; SmartForms were opened for half of these patients (157/300, 52.33\%) and was completed for most for them (147/300, 89.81\%). 
STI test orders (SmartSet) were initiated by clinical providers in half of those patients (162/300, 54\%). HIV screening was performed during about half of those patient encounters (191/348, 54.89\%). Conclusions: We successfully built and implemented multiple CDC treatment and screening guidelines into a single cohesive CDS solution. The CDS solution was integrated into the clinical workflow and had a high rate of use. ", doi="10.2196/53000", url="https://formative.jmir.org/2024/1/e53000", url="http://www.ncbi.nlm.nih.gov/pubmed/38621237" } @Article{info:doi/10.2196/52412, author="Kawamoto, Shota and Morikawa, Yoshihiko and Yahagi, Naohisa", title="Novel Approach for Detecting Respiratory Syncytial Virus in Pediatric Patients Using Machine Learning Models Based on Patient-Reported Symptoms: Model Development and Validation Study", journal="JMIR Form Res", year="2024", month="Apr", day="12", volume="8", pages="e52412", keywords="respiratory syncytial virus", keywords="machine learning", keywords="self-reported information", keywords="clinical decision support system", keywords="decision support", keywords="decision-making", keywords="artificial intelligence", keywords="model development", keywords="evaluation study", keywords="detection", keywords="respiratory", keywords="respiratory virus", keywords="virus", keywords="machine learning model", keywords="pediatric", keywords="Japan", keywords="detection model", abstract="Background: Respiratory syncytial virus (RSV) affects children, causing serious infections, particularly in high-risk groups. Given the seasonality of RSV and the importance of rapid isolation of infected individuals, there is an urgent need for more efficient diagnostic methods to expedite this process. Objective: This study aimed to investigate the performance of a machine learning model that leverages the temporal diversity of symptom onset for detecting RSV infections and elucidate its discriminatory ability. Methods: The study was conducted in pediatric and emergency outpatient settings in Japan. We developed a detection model that remotely confirms RSV infection based on patient-reported symptom information obtained using a structured electronic template incorporating the differential points of skilled pediatricians. An extreme gradient boosting--based machine learning model was developed using the data of 4174 patients aged ≤24 months who underwent RSV rapid antigen testing. These patients visited either the pediatric or emergency department of Yokohama City Municipal Hospital between January 1, 2009, and December 31, 2015. The primary outcome was the diagnostic accuracy of the machine learning model for RSV infection, as determined by rapid antigen testing, measured using the area under the receiver operating characteristic curve. The clinical efficacy was evaluated by calculating the discriminative performance based on the number of days elapsed since the onset of the first symptom and exclusion rates based on thresholds of reasonable sensitivity and specificity. Results: Our model demonstrated an area under the receiver operating characteristic curve of 0.811 (95\% CI 0.784-0.833) with good calibration and 0.746 (95\% CI 0.694-0.794) for patients within 3 days of onset. 
It accurately captured the temporal evolution of symptoms; based on adjusted thresholds equivalent to those of a rapid antigen test, our model predicted that 6.9\% (95\% CI 5.4\%-8.5\%) of patients in the entire cohort would be positive and 68.7\% (95\% CI 65.4\%-71.9\%) would be negative. Our model could eliminate the need for additional testing in approximately three-quarters of all patients. Conclusions: Our model may facilitate the immediate detection of RSV infection in outpatient settings and, potentially, in home environments. This approach could streamline the diagnostic process, reduce discomfort caused by invasive tests in children, and allow rapid implementation of appropriate treatments and isolation at home. The findings underscore the potential of machine learning in augmenting clinical decision-making in the early detection of RSV infection. ", doi="10.2196/52412", url="https://formative.jmir.org/2024/1/e52412", url="http://www.ncbi.nlm.nih.gov/pubmed/38608268" } @Article{info:doi/10.2196/52612, author="Ahmadzadeh, Bahareh and Patey, Christopher and Hurley, Oliver and Knight, John and Norman, Paul and Farrell, Alison and Czarnuch, Stephen and Asghari, Shabnam", title="Applications of Artificial Intelligence in Emergency Departments to Improve Wait Times: Protocol for an Integrative Living Review", journal="JMIR Res Protoc", year="2024", month="Apr", day="12", volume="13", pages="e52612", keywords="emergency department", keywords="ED", keywords="wait time", keywords="artificial intelligence", keywords="AI", keywords="living systematic review", keywords="LSR", abstract="Background: Long wait times in the emergency department (ED) are a major issue for health care systems all over the world. The application of artificial intelligence (AI) is a novel strategy to reduce ED wait times when compared to the interventions included in previous research endeavors. To date, comprehensive systematic reviews that include studies involving AI applications in the context of EDs have covered a wide range of AI implementation issues. However, the lack of an iterative update strategy limits the use of these reviews. Since the subject of AI development is cutting edge and is continuously changing, reviews in this area must be frequently updated to remain relevant. Objective: This study aims to provide a summary of the evidence that is currently available regarding how AI can affect ED wait times; discuss the applications of AI in improving wait times; and periodically assess the depth, breadth, and quality of the evidence supporting the application of AI in reducing ED wait times. Methods: We plan to conduct a living systematic review (LSR). Our strategy involves conducting continuous monitoring of evidence, with biannual search updates and annual review updates. Upon completing the initial round of the review, we will refine the search strategy and establish clear schedules for updating the LSR. An interpretive synthesis using Whittemore and Knafl's framework will be performed to compile and summarize the findings. The review will be carried out using an integrated knowledge translation strategy, and knowledge users will be involved at all stages of the review to guarantee applicability, usability, and clarity of purpose. Results: The literature search was completed by September 22, 2023, and identified 17,569 articles. The title and abstract screening were completed by December 9, 2023. In total, 70 papers were eligible. The full-text screening is in progress. 
Conclusions: The review will summarize AI applications that improve ED wait time. The LSR enables researchers to maintain high methodological rigor while enhancing the timeliness, applicability, and value of the review. International Registered Report Identifier (IRRID): DERR1-10.2196/52612 ", doi="10.2196/52612", url="https://www.researchprotocols.org/2024/1/e52612", url="http://www.ncbi.nlm.nih.gov/pubmed/38607662" } @Article{info:doi/10.2196/51138, author="Washington, Peter", title="A Perspective on Crowdsourcing and Human-in-the-Loop Workflows in Precision Health", journal="J Med Internet Res", year="2024", month="Apr", day="11", volume="26", pages="e51138", keywords="crowdsourcing", keywords="digital medicine", keywords="human-in-the-loop", keywords="human in the loop", keywords="human-AI collaboration", keywords="machine learning", keywords="precision health", keywords="artificial intelligence", keywords="AI", doi="10.2196/51138", url="https://www.jmir.org/2024/1/e51138", url="http://www.ncbi.nlm.nih.gov/pubmed/38602750" } @Article{info:doi/10.2196/46698, author="Hoffmann, Christin and Avery, Kerry and Macefield, Rhiannon and Dvo?{\'a}k, Tade{\'a}{\vs} and Snelgrove, Val and Blazeby, Jane and Hopkins, Della and Hickey, Shireen and Gibbison, Ben and Rooshenas, Leila and Williams, Adam and Aning, Jonathan and Bekker, L. Hilary and McNair, GK Angus and ", title="Usability of an Automated System for Real-Time Monitoring of Shared Decision-Making for Surgery: Mixed Methods Evaluation", journal="JMIR Hum Factors", year="2024", month="Apr", day="10", volume="11", pages="e46698", keywords="surgery", keywords="shared decision-making", keywords="patient participation", keywords="mixed methods", keywords="real-time measurement", keywords="patient-reported measure", keywords="electronic data collection", keywords="usability", keywords="data collection", keywords="patient reported", keywords="satisfaction", keywords="mobile phone", abstract="Background: Improving shared decision-making (SDM) for patients has become a health policy priority in many countries. Achieving high-quality SDM is particularly important for approximately 313 million surgical treatment decisions patients make globally every year. Large-scale monitoring of surgical patients' experience of SDM in real time is needed to identify the failings of SDM before surgery is performed. We developed a novel approach to automating real-time data collection using an electronic measurement system to address this. Examining usability will facilitate its optimization and wider implementation to inform interventions aimed at improving SDM. Objective: This study examined the usability of an electronic real-time measurement system to monitor surgical patients' experience of SDM. We aimed to evaluate the metrics and indicators relevant to system effectiveness, system efficiency, and user satisfaction. Methods: We performed a mixed methods usability evaluation using multiple participant cohorts. The measurement system was implemented in a large UK hospital to measure patients' experience of SDM electronically before surgery using 2 validated measures (CollaboRATE and SDM-Q-9). Quantitative data (collected between April 1 and December 31, 2021) provided measurement system metrics to assess system effectiveness and efficiency. 
We included adult patients booked for urgent and elective surgery across 7 specialties and excluded patients without the capacity to consent for medical procedures, those without access to an internet-enabled device, and those undergoing emergency or endoscopic procedures. Additional groups of service users (group 1: public members who had not engaged with the system; group 2: a subset of patients who completed the measurement system) completed user-testing sessions and semistructured interviews to assess system effectiveness and user satisfaction. We conducted quantitative data analysis using descriptive statistics and calculated the task completion rate and survey response rate (system effectiveness) as well as the task completion time, task efficiency, and relative efficiency (system efficiency). Qualitative thematic analysis identified indicators of and barriers to good usability (user satisfaction). Results: A total of 2254 completed surveys were returned to the measurement system. A total of 25 service users (group 1: n=9; group 2: n=16) participated in user-testing sessions and interviews. The task completion rate was high (169/171, 98.8\%) and the survey response rate was good (2254/5794, 38.9\%). The median task completion time was 3 (IQR 2-13) minutes, suggesting good system efficiency and effectiveness. The qualitative findings emphasized good user satisfaction. The identified themes suggested that the measurement system is acceptable, easy to use, and easy to access. Service users identified potential barriers and solutions to acceptability and ease of access. Conclusions: A mixed methods evaluation of an electronic measurement system for automated, real-time monitoring of patients' experience of SDM showed that usability among patients was high. Future pilot work will optimize the system for wider implementation to ultimately inform intervention development to improve SDM. International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2023-079155 ", doi="10.2196/46698", url="https://humanfactors.jmir.org/2024/1/e46698", url="http://www.ncbi.nlm.nih.gov/pubmed/38598276" } @Article{info:doi/10.2196/45978, author="Haslam-Larmer, Lynn and Grigorovich, Alisa and Shum, Leia and Bianchi, Andria and Newman, Kristine and Iaboni, Andrea and McMurray, Josephine", title="Factors That Influence Successful Adoption of Real-Time Location Systems for Use in a Dementia Care Setting: Mixed Methods Study", journal="JMIR Aging", year="2024", month="Apr", day="8", volume="7", pages="e45978", keywords="remote sensing technologies", keywords="dementia", keywords="real-time location systems", keywords="Fit between Individuals, Tasks, and Technology framework", keywords="FITT framework", keywords="technology implementation", abstract="Background: Technology has been identified as a potential solution to alleviate resource gaps and augment care delivery in dementia care settings such as hospitals, long-term care, and retirement homes. There has been an increasing interest in using real-time location systems (RTLS) across health care settings for older adults with dementia, specifically related to the ability to track a person's movement and location. Objective: In this study, we aimed to explore the factors that influence the adoption or nonadoption of an RTLS during its implementation in a specialized inpatient dementia unit in a tertiary care rehabilitation hospital. 
Methods: The study included data from a brief quantitative survey and interviews from a convenience sample of frontline participants. Our deductive analysis of the interviews used the 3 categories of the Fit Between Individuals, Task, and Technology framework as follows: individual and task, individual and technology, and task and technology. The purpose of using this framework was to assess the quality of the fit between technology attributes and an individual's self-reported intentions to adopt RTLS technology. Results: A total of 20 health care providers (HCPs) completed the survey, of whom 16 (80\%) participated in interviews. Coding and subsequent analysis identified 2 conceptual subthemes in the individual-task fit category, including the identification of the task and the perception that participants were missing at-risk patient events. The task-technology fit category consisted of 3 subthemes, including reorganization of the task, personal control in relation to the task, and efficiency or resource allocation. A total of 4 subthemes were identified in the individual-technology fit category, including privacy and personal agency, trust in the technology, user interfaces, and perceptions of increased safety. Conclusions: By the end of the study, most of the unit's HCPs were using the tablet app based on their perception of its usefulness, its alignment with their comfort level with technology, and its ability to help them perform job responsibilities. HCPs perceived that they were able to reduce patient search time dramatically, yet any improvements in care were noted to be implied, as this was not measured. There was limited anecdotal evidence of reduced patient risk or adverse events, but greater reported peace of mind for HCPs overseeing patients' activity levels. ", doi="10.2196/45978", url="https://aging.jmir.org/2024/1/e45978", url="http://www.ncbi.nlm.nih.gov/pubmed/38587884" } @Article{info:doi/10.2196/55318, author="Sivarajkumar, Sonish and Kelley, Mark and Samolyk-Mazzanti, Alyssa and Visweswaran, Shyam and Wang, Yanshan", title="An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study", journal="JMIR Med Inform", year="2024", month="Apr", day="8", volume="12", pages="e55318", keywords="large language model", keywords="LLM", keywords="LLMs", keywords="natural language processing", keywords="NLP", keywords="in-context learning", keywords="prompt engineering", keywords="evaluation", keywords="zero-shot", keywords="few shot", keywords="prompting", keywords="GPT", keywords="language model", keywords="language", keywords="models", keywords="machine learning", keywords="clinical data", keywords="clinical information", keywords="extraction", keywords="BARD", keywords="Gemini", keywords="LLaMA-2", keywords="heuristic", keywords="prompt", keywords="prompts", keywords="ensemble", abstract="Background: Large language models (LLMs) have shown remarkable capabilities in natural language processing (NLP), especially in domains where labeled data are scarce or expensive, such as the clinical domain. However, to unlock the clinical knowledge hidden in these LLMs, we need to design effective prompts that can guide them to perform specific clinical NLP tasks without any task-specific training data.
This is known as in-context learning, which is an art and science that requires understanding the strengths and weaknesses of different LLMs and prompt engineering approaches. Objective: The objective of this study is to assess the effectiveness of various prompt engineering techniques, including 2 newly introduced types---heuristic and ensemble prompts---for zero-shot and few-shot clinical information extraction using pretrained language models. Methods: This comprehensive experimental study evaluated different prompt types (simple prefix, simple cloze, chain of thought, anticipatory, heuristic, and ensemble) across 5 clinical NLP tasks: clinical sense disambiguation, biomedical evidence extraction, coreference resolution, medication status extraction, and medication attribute extraction. The performance of these prompts was assessed using 3 state-of-the-art language models: GPT-3.5 (OpenAI), Gemini (Google), and LLaMA-2 (Meta). The study contrasted zero-shot with few-shot prompting and explored the effectiveness of ensemble approaches. Results: The study revealed that task-specific prompt tailoring is vital for the high performance of LLMs for zero-shot clinical NLP. In clinical sense disambiguation, GPT-3.5 achieved an accuracy of 0.96 with heuristic prompts and 0.94 in biomedical evidence extraction. Heuristic prompts, alongside chain of thought prompts, were highly effective across tasks. Few-shot prompting improved performance in complex scenarios, and ensemble approaches capitalized on multiple prompt strengths. GPT-3.5 consistently outperformed Gemini and LLaMA-2 across tasks and prompt types. Conclusions: This study provides a rigorous evaluation of prompt engineering methodologies and introduces innovative techniques for clinical information extraction, demonstrating the potential of in-context learning in the clinical domain. These findings offer clear guidelines for future prompt-based clinical NLP research, facilitating engagement by non-NLP experts in clinical NLP advancements. To the best of our knowledge, this is one of the first works on the empirical evaluation of different prompt engineering approaches for clinical NLP in this era of generative artificial intelligence, and we hope that it will inspire and inform future research in this area. ", doi="10.2196/55318", url="https://medinform.jmir.org/2024/1/e55318", url="http://www.ncbi.nlm.nih.gov/pubmed/38587879" } @Article{info:doi/10.2196/54109, author="Coutinho-Almeida, Jo{\~a}o and Cardoso, Alexandrina and Cruz-Correia, Ricardo and Pereira-Rodrigues, Pedro", title="Fast Healthcare Interoperability Resources--Based Support System for Predicting Delivery Type: Model Development and Evaluation Study", journal="JMIR Form Res", year="2024", month="Apr", day="8", volume="8", pages="e54109", keywords="obstetrics", keywords="machine-learning", keywords="clinical decision support", keywords="interoperability", keywords="interoperable", keywords="obstetric", keywords="cesarean delivery", keywords="cesarean", keywords="cesarean deliveries", keywords="decision support", keywords="pregnant", keywords="pregnancy", keywords="maternal", keywords="algorithm", keywords="algorithms", keywords="simulation", keywords="simulations", abstract="Background: The escalating prevalence of cesarean delivery globally poses significant health impacts on mothers and newborns. Despite this trend, the underlying reasons for increased cesarean delivery rates, which have risen to 36.3\% in Portugal as of 2020, remain unclear.
This study delves into these issues within the Portuguese health care context, where national efforts are underway to reduce cesarean delivery occurrences. Objective: This paper aims to introduce a machine learning algorithm-based support system designed to assist clinical teams in identifying potentially unnecessary cesarean deliveries. Key objectives include developing clinical decision support systems for cesarean deliveries using interoperability standards, identifying predictive factors influencing delivery type, assessing the economic impact of implementing this tool, and comparing system outputs with clinicians' decisions. Methods: This study used retrospective data collected from 9 public Portuguese hospitals, encompassing maternal and fetal data and delivery methods from 2019 to 2020. We used various machine learning algorithms for model development, with light gradient-boosting machine (LightGBM) selected for deployment due to its efficiency. The model's performance was compared with clinician assessments through questionnaires. Additionally, an economic simulation was conducted to evaluate the financial impact on Portuguese public hospitals. Results: The deployed model, based on LightGBM, achieved an area under the receiver operating characteristic curve of 88\%. In the trial deployment phase at a single hospital, 3.8\% (123/3231) of cases triggered alarms for potentially unnecessary cesarean deliveries. Financial simulation results indicated potential benefits for 30\% (15/48) of Portuguese public hospitals with the implementation of our tool. However, this study acknowledges biases in the model, such as combining different vaginal delivery types and focusing on potentially unwarranted cesarean deliveries. Conclusions: This study presents a promising system capable of identifying potentially incorrect cesarean delivery decisions, with potentially positive implications for medical practice and health care economics. However, it also highlights the challenges and considerations necessary for real-world application, including further evaluation of clinical decision-making impacts and understanding the diverse reasons behind delivery type choices. This study underscores the need for careful implementation and further robust analysis to realize the full potential and real-world applicability of such clinical support systems. ", doi="10.2196/54109", url="https://formative.jmir.org/2024/1/e54109", url="http://www.ncbi.nlm.nih.gov/pubmed/38587885" } @Article{info:doi/10.2196/49548, author="Patel, Dipen and Msosa, Joseph Yamiko and Wang, Tao and Williams, Julie and Mustafa, G. Omar and Gee, Siobhan and Arroyo, Barbara and Larkin, Damian and Tiedt, Trevor and Roberts, Angus and Dobson, B. Richard J.
and Gaughran, Fiona", title="Implementation of an Electronic Clinical Decision Support System for the Early Recognition and Management of Dysglycemia in an Inpatient Mental Health Setting Using CogStack: Protocol for a Pilot Hybrid Type 3 Effectiveness-Implementation Randomized Controlled Cluster Trial", journal="JMIR Res Protoc", year="2024", month="Apr", day="5", volume="13", pages="e49548", keywords="blood sugar", keywords="CDSS", keywords="clinical decision support system", keywords="decision support", keywords="diabetes", keywords="diabetic", keywords="dysglycemia", keywords="electronic clinical decision support", keywords="hyperglycemia", keywords="hypoglycemia", keywords="implementation", keywords="medical informatics", keywords="mental health", keywords="mental healthcare", keywords="mental illness", keywords="metabolic health", keywords="randomized controlled trial", keywords="RCT", abstract="Background: Severe mental illnesses (SMIs), including schizophrenia, bipolar affective disorder, and major depressive disorder, are associated with an increased risk of physical health comorbidities and premature mortality from conditions including cardiovascular disease and diabetes. Digital technologies such as electronic clinical decision support systems (eCDSSs) could play a crucial role in improving the clinician-led management of conditions such as dysglycemia (deranged blood sugar levels) and associated conditions such as diabetes in people with a diagnosis of SMI in mental health settings. Objective: We have developed a real-time eCDSS using CogStack, an information retrieval and extraction platform, to automatically alert clinicians with National Health Service Trust--approved, guideline-based recommendations for dysglycemia monitoring and management in secondary mental health care. This novel system aims to improve the management of dysglycemia and associated conditions, such as diabetes, in SMI. This protocol describes a pilot study to explore the acceptability, feasibility, and evaluation of its implementation in a mental health inpatient setting. Methods: This will be a pilot hybrid type 3 effectiveness-implementation randomized controlled cluster trial in inpatient mental health wards. A ward will be the unit of recruitment, where it will be randomly allocated to receive either access to the eCDSS plus usual care or usual care alone over a 4-month period. We will measure implementation outcomes, including the feasibility and acceptability of the eCDSS to clinicians, as primary outcomes, alongside secondary outcomes relating to the process of care measures such as dysglycemia screening rates. An evaluation of other implementation outcomes relating to the eCDSS will be conducted, identifying facilitators and barriers based on established implementation science frameworks. Results: Enrollment of wards began in April 2022, after which clinical staff were recruited to take part in surveys and interviews. The intervention period of the trial began in February 2023, and subsequent data collection was completed in August 2023. Data are currently being analyzed, and results are expected to be available in June 2024. Conclusions: An eCDSS can have the potential to improve clinician-led management of dysglycemia in inpatient mental health settings. If found to be feasible and acceptable, then, in combination with the results of the implementation evaluation, the system can be refined and improved to support future successful implementation. 
A larger and more definitive effectiveness trial should then be conducted to assess its impact on clinical outcomes and to inform scalability and application to other conditions in wider mental health care settings. Trial Registration: ClinicalTrials.gov NCT04792268; https://clinicaltrials.gov/study/NCT04792268 International Registered Report Identifier (IRRID): DERR1-10.2196/49548 ", doi="10.2196/49548", url="https://www.researchprotocols.org/2024/1/e49548", url="http://www.ncbi.nlm.nih.gov/pubmed/38578666" } @Article{info:doi/10.2196/54787, author="Kim, Paik Jane and Yang, Hyun-Joon and Kim, Bohye and Ryan, Katie and Roberts, Weiss Laura", title="Understanding Physician's Perspectives on AI in Health Care: Protocol for a Sequential Multiple Assignment Randomized Vignette Study", journal="JMIR Res Protoc", year="2024", month="Apr", day="4", volume="13", pages="e54787", keywords="AI-based clinical decision support", keywords="decision-making", keywords="hypothetical vignettes", keywords="physician perspective", keywords="web-based survey", keywords="hypothesis-driven research", keywords="ethics", keywords="stakeholder attitudes", abstract="Background: As the availability and performance of artificial intelligence (AI)--based clinical decision support (CDS) systems improve, physicians and other care providers poised to be on the front lines will be increasingly tasked with using these tools in patient care and incorporating their outputs into clinical decision-making processes. Vignette studies provide a means to explore emerging hypotheses regarding how context-specific factors, such as clinical risk, the amount of information provided about the AI, and the AI result, may impact physician acceptance and use of AI-based CDS tools. To best anticipate how such factors influence the decision-making of frontline physicians in clinical scenarios involving AI decision-support tools, hypothesis-driven research is needed that enables scenario testing before the implementation and deployment of these tools. Objective: This study's objectives are to (1) design an original, web-based vignette-based survey that features hypothetical scenarios based on emerging or real-world applications of AI-based CDS systems that will vary systematically by features related to clinical risk, the amount of information provided about the AI, and the AI result; and (2) test and determine causal effects of specific factors on the judgments and perceptions salient to physicians' clinical decision-making. Methods: US-based physicians with specialties in family or internal medicine will be recruited through email and mail (target n=420). Through a web-based survey, participants will be randomized to a 3-part ``sequential multiple assignment randomization trial (SMART) vignette'' detailing a hypothetical clinical scenario involving an AI decision support tool. The SMART vignette design is similar to the SMART design but adapted to a survey design. Each respondent will be randomly assigned to 1 of the possible vignette variations of the factors we are testing at each stage, which include the level of clinical risk, the amount of information provided about the AI, and the certainty of the AI output. Respondents will be given questions regarding their hypothetical decision-making in response to the hypothetical scenarios. Results: The study is currently in progress and data collection is anticipated to be completed in 2024. 
Conclusions: The web-based vignette study will provide information on how contextual factors such as clinical risk, the amount of information provided about an AI tool, and the AI result influence physicians' reactions to hypothetical scenarios that are based on emerging applications of AI in frontline health care settings. Our newly proposed ``SMART vignette'' design offers several benefits not afforded by the extensively used traditional vignette design, due to the 2 aforementioned features. These advantages are (1) increased validity of analyses targeted at understanding the impact of a factor on the decision outcome, given previous outcomes and other contextual factors; and (2) balanced sample sizes across groups. This study will generate a better understanding of physician decision-making within this context. International Registered Report Identifier (IRRID): DERR1-10.2196/54787 ", doi="10.2196/54787", url="https://www.researchprotocols.org/2024/1/e54787", url="http://www.ncbi.nlm.nih.gov/pubmed/38573756" } @Article{info:doi/10.2196/48862, author="Lin, Yu-Ting and Deng, Yuan-Xiang and Tsai, Chu-Lin and Huang, Chien-Hua and Fu, Li-Chen", title="Interpretable Deep Learning System for Identifying Critical Patients Through the Prediction of Triage Level, Hospitalization, and Length of Stay: Prospective Study", journal="JMIR Med Inform", year="2024", month="Apr", day="1", volume="12", pages="e48862", keywords="emergency department", keywords="triage system", keywords="hospital admission", keywords="length of stay", keywords="multimodal integration", abstract="Background: Triage is the process of accurately assessing patients' symptoms and providing them with proper clinical treatment in the emergency department (ED). While many countries have developed their triage process to stratify patients' clinical severity and thus distribute medical resources, there are still some limitations of the current triage process. Since the triage level is mainly identified by experienced nurses based on a mix of subjective and objective criteria, mis-triage often occurs in the ED. It can not only cause adverse effects on patients, but also impose an undue burden on the health care delivery system. Objective: Our study aimed to design a prediction system based on triage information, including demographics, vital signs, and chief complaints. The proposed system can not only handle heterogeneous data, including tabular data and free-text data, but also provide interpretability for better acceptance by the ED staff in the hospital. Methods: In this study, we proposed a system comprising 3 subsystems, with each of them handling a single task, including triage level prediction, hospitalization prediction, and length of stay prediction. We used a large amount of retrospective data to pretrain the model, and then, we fine-tuned the model on a prospective data set with a golden label. The proposed deep learning framework was built with TabNet and MacBERT (Chinese version of bidirectional encoder representations from transformers [BERT]). Results: The performance of our proposed model was evaluated on data collected from the National Taiwan University Hospital (901 patients were included). The model achieved promising results on the collected data set, with accuracy values of 63\%, 82\%, and 71\% for triage level prediction, hospitalization prediction, and length of stay prediction, respectively. 
Conclusions: Our system improved the prediction of 3 different medical outcomes when compared with other machine learning methods. With the pretrained vital sign encoder and repretrained mask language modeling MacBERT encoder, our multimodality model can provide a deeper insight into the characteristics of electronic health records. Additionally, by providing interpretability, we believe that the proposed system can assist nursing staff and physicians in taking appropriate medical decisions. ", doi="10.2196/48862", url="https://medinform.jmir.org/2024/1/e48862", url="http://www.ncbi.nlm.nih.gov/pubmed/38557661" } @Article{info:doi/10.2196/47017, author="Sun, Wan-Na and Kao, Chi-Yin", title="The Challenges in Using eHealth Decision Resources for Surrogate Decision-Making in the Intensive Care Unit", journal="J Med Internet Res", year="2024", month="Apr", day="1", volume="26", pages="e47017", keywords="decision-making", keywords="eHealth", keywords="intensive care unit", keywords="literacy", keywords="surrogate", keywords="mobile phone", doi="10.2196/47017", url="https://www.jmir.org/2024/1/e47017", url="http://www.ncbi.nlm.nih.gov/pubmed/38557504" } @Article{info:doi/10.2196/53343, author="Sung, Sumi and Kim, Youlim and Kim, Hwan Su and Jung, Hyesil", title="Identification of Predictors for Clinical Deterioration in Patients With COVID-19 via Electronic Nursing Records: Retrospective Observational Study", journal="J Med Internet Res", year="2024", month="Mar", day="29", volume="26", pages="e53343", keywords="COVID-19", keywords="infectious", keywords="respiratory", keywords="SARS-CoV-2", keywords="nursing records", keywords="SNOMED CT", keywords="random forest", keywords="logistic regression", keywords="EHR", keywords="EHRs", keywords="machine learning", keywords="documentation", keywords="deterioration", keywords="health records", keywords="health record", keywords="patient record", keywords="patient records", keywords="nursing", keywords="standardization", keywords="standard", keywords="standards", keywords="standardized", keywords="standardize", keywords="nomenclature", keywords="term", keywords="terms", keywords="terminologies", keywords="terminology", abstract="Background: Few studies have used standardized nursing records with Systematized Nomenclature of Medicine--Clinical Terms (SNOMED CT) to identify predictors of clinical deterioration. Objective: This study aims to standardize the nursing documentation records of patients with COVID-19 using SNOMED CT and identify predictive factors of clinical deterioration in patients with COVID-19 via standardized nursing records. Methods: In this study, 57,558 nursing statements from 226 patients with COVID-19 were analyzed. Among these, 45,852 statements were from 207 patients in the stable (control) group and 11,706 from 19 patients in the exacerbated (case) group who were transferred to the intensive care unit within 7 days. The data were collected between December 2019 and June 2022. These nursing statements were standardized using the SNOMED CT International Edition released on November 30, 2022. The 260 unique nursing statements that accounted for the top 90\% of 57,558 statements were selected as the mapping source and mapped into SNOMED CT concepts based on their meaning by 2 experts with more than 5 years of SNOMED CT mapping experience. 
To identify the main features of nursing statements associated with the exacerbation of patient condition, random forest algorithms were used, and optimal hyperparameters were selected for nursing problems or outcomes and nursing procedure--related statements. Additionally, logistic regression analysis was conducted to identify features that determine clinical deterioration in patients with COVID-19. Results: All nursing statements were semantically mapped to SNOMED CT concepts for ``clinical finding,'' ``situation with explicit context,'' and ``procedure'' hierarchies. The interrater reliability of the mapping results was 87.7\%. The most important features calculated by random forest were ``oxygen saturation below reference range,'' ``dyspnea,'' ``tachypnea,'' and ``cough'' in ``clinical finding,'' and ``oxygen therapy,'' ``pulse oximetry monitoring,'' ``temperature taking,'' ``notification of physician,'' and ``education about isolation for infection control'' in ``procedure.'' Among these, ``dyspnea'' and ``inadequate food diet'' in ``clinical finding'' increased clinical deterioration risk (dyspnea: odds ratio [OR] 5.99, 95\% CI 2.25-20.29; inadequate food diet: OR 10.0, 95\% CI 2.71-40.84), and ``oxygen therapy'' and ``notification of physician'' in ``procedure'' also increased the risk of clinical deterioration in patients with COVID-19 (oxygen therapy: OR 1.89, 95\% CI 1.25-3.05; notification of physician: OR 1.72, 95\% CI 1.02-2.97). Conclusions: The study used SNOMED CT to express and standardize nursing statements. Further, it revealed the importance of standardized nursing records as predictive variables for clinical deterioration in patients. ", doi="10.2196/53343", url="https://www.jmir.org/2024/1/e53343", url="http://www.ncbi.nlm.nih.gov/pubmed/38414056" } @Article{info:doi/10.2196/51058, author="Thomas, Amy and Asnes, Andrea and Libby, Kyle and Hsiao, Allen and Tiyyagura, Gunjan", title="Developing and Testing the Usability of a Novel Child Abuse Clinical Decision Support System: Mixed Methods Study", journal="J Med Internet Res", year="2024", month="Mar", day="29", volume="26", pages="e51058", keywords="child abuse", keywords="clinical decision support", keywords="CDS", keywords="pediatrics", keywords="child", keywords="children", keywords="natural language processing", keywords="usability", keywords="clinical decision support system", keywords="physical abuse", abstract="Background: Despite the impact of physical abuse on children, it is often underdiagnosed, especially among children evaluated in emergency departments (EDs). Electronic clinical decision support (CDS) can improve the recognition of child physical abuse. Objective: We aimed to develop and test the usability of a natural language processing--based child abuse CDS system, known as the Child Abuse Clinical Decision Support (CA-CDS), to alert ED clinicians about high-risk injuries suggestive of abuse in infants' charts. Methods: Informed by available evidence, a multidisciplinary team, including an expert in user design, developed the CA-CDS prototype that provided evidence-based recommendations for the evaluation and management of suspected child abuse when triggered by documentation of a high-risk injury. Content was customized for medical versus nursing providers and initial versus subsequent exposure to the alert. To assess the usability of and refine the CA-CDS, we interviewed 24 clinicians from 4 EDs about their interactions with the prototype. 
Interview transcripts were coded and analyzed using conventional content analysis. Results: Overall, 5 main categories of themes emerged from the study. CA-CDS benefits included providing an extra layer of protection, providing evidence-based recommendations, and alerting the entire clinical ED team. The user-centered, workflow-compatible design included soft-stop alert configuration, editable and automatic documentation, and attention-grabbing formatting. Recommendations for improvement included consolidating content, clearer design elements, and adding a hyperlink with additional resources. Barriers to future implementation included alert fatigue, hesitancy to change, and concerns regarding documentation. Facilitators of future implementation included stakeholder buy-in, provider education, and sharing the test characteristics. On the basis of user feedback, iterative modifications were made to the prototype. Conclusions: With its user-centered design and evidence-based content, the CA-CDS can aid providers in the real-time recognition and evaluation of infant physical abuse and has the potential to reduce the number of missed cases. ", doi="10.2196/51058", url="https://www.jmir.org/2024/1/e51058", url="http://www.ncbi.nlm.nih.gov/pubmed/38551639" } @Article{info:doi/10.2196/52566, author="Aoki, Nozomi and Miyagami, Taiju and Saita, Mizue and Naito, Toshio", title="AI Analysis of General Medicine in Japan: Present and Future Considerations", journal="JMIR Form Res", year="2024", month="Mar", day="29", volume="8", pages="e52566", keywords="artificial intelligence", keywords="physicians", keywords="hospitalists", keywords="polypharmacy", keywords="sexism", keywords="Japan", keywords="AI", keywords="medicine", keywords="gender-biased", keywords="physician", keywords="medical care", keywords="gender", keywords="women", keywords="Pharmacology", keywords="older adults", keywords="geriatric", keywords="elderly", keywords="Japanese", doi="10.2196/52566", url="https://formative.jmir.org/2024/1/e52566", url="http://www.ncbi.nlm.nih.gov/pubmed/38551640" } @Article{info:doi/10.2196/41065, author="He, Feng and Ng Yin Ling, Clarissa and Nusinovici, Simon and Cheng, Ching-Yu and Wong, Yin Tien and Li, Jialiang and Sabanayagam, Charumathi", title="Development and External Validation of Machine Learning Models for Diabetic Microvascular Complications: Cross-Sectional Study With Metabolites", journal="J Med Internet Res", year="2024", month="Mar", day="28", volume="26", pages="e41065", keywords="machine learning", keywords="diabetic microvascular complication", keywords="diabetic kidney disease", keywords="diabetic retinopathy", keywords="biomarkers", keywords="metabolomics", keywords="complication", keywords="adult", keywords="cardiovascular disease", keywords="metabolites", keywords="biomedical big data", keywords="kidney disease", abstract="Background: Diabetic kidney disease (DKD) and diabetic retinopathy (DR) are major diabetic microvascular complications, contributing significantly to morbidity, disability, and mortality worldwide. The kidney and the eye, having similar microvascular structures and physiological and pathogenic features, may experience similar metabolic changes in diabetes. 
Objective: This study aimed to use machine learning (ML) methods integrated with metabolic data to identify biomarkers associated with DKD and DR in a multiethnic Asian population with diabetes, as well as to improve the performance of DKD and DR detection models beyond traditional risk factors. Methods: We used ML algorithms (logistic regression [LR] with Least Absolute Shrinkage and Selection Operator and gradient-boosting decision tree) to analyze 2772 adults with diabetes from the Singapore Epidemiology of Eye Diseases study, a population-based cross-sectional study conducted in Singapore (2004-2011). From 220 circulating metabolites and 19 risk factors, we selected the most important variables associated with DKD (defined as an estimated glomerular filtration rate <60 mL/min/1.73 m2) and DR (defined as an Early Treatment Diabetic Retinopathy Study severity level $\geq$20). DKD and DR detection models were developed based on the variable selection results and externally validated on a sample of 5843 participants with diabetes from the UK biobank (2007-2010). Machine-learned model performance (area under the receiver operating characteristic curve [AUC] with 95\% CI, sensitivity, and specificity) was compared to that of traditional LR adjusted for age, sex, diabetes duration, hemoglobin A1c, systolic blood pressure, and BMI. Results: Singapore Epidemiology of Eye Diseases participants had a median age of 61.7 (IQR 53.5-69.4) years, with 49.1\% (1361/2772) being women, 20.2\% (555/2753) having DKD, and 25.4\% (685/2693) having DR. UK biobank participants had a median age of 61.0 (IQR 55.0-65.0) years, with 35.8\% (2090/5843) being women, 6.7\% (374/5570) having DKD, and 6.1\% (355/5843) having DR. The ML algorithms identified diabetes duration, insulin usage, age, and tyrosine as the most important factors of both DKD and DR. DKD was additionally associated with cardiovascular disease history, antihypertensive medication use, and 3 metabolites (lactate, citrate, and cholesterol esters to total lipids ratio in intermediate-density lipoprotein), while DR was additionally associated with hemoglobin A1c, blood glucose, pulse pressure, and alanine. Machine-learned models for DKD and DR detection outperformed traditional LR models in both internal (AUC 0.838 vs 0.743 for DKD and 0.790 vs 0.764 for DR) and external validation (AUC 0.791 vs 0.691 for DKD and 0.778 vs 0.760 for DR). Conclusions: This study highlighted diabetes duration, insulin usage, age, and circulating tyrosine as important factors in detecting DKD and DR. The integration of ML with biomedical big data enables biomarker discovery and improves disease detection beyond traditional risk factors.
", doi="10.2196/41065", url="https://www.jmir.org/2024/1/e41065", url="http://www.ncbi.nlm.nih.gov/pubmed/38546730" } @Article{info:doi/10.2196/55802, author="Hu, Zhao and Wang, Min and Zheng, Si and Xu, Xiaowei and Zhang, Zhuxin and Ge, Qiaoyue and Li, Jiao and Yao, Yan", title="Clinical Decision Support Requirements for Ventricular Tachycardia Diagnosis Within the Frameworks of Knowledge and Practice: Survey Study", journal="JMIR Hum Factors", year="2024", month="Mar", day="26", volume="11", pages="e55802", keywords="clinical decision support system", keywords="requirements analysis", keywords="ventricular tachycardia", keywords="knowledge", keywords="clinical practice", keywords="questionnaires", abstract="Background: Ventricular tachycardia (VT) diagnosis is challenging due to the similarity between VT and some forms of supraventricular tachycardia, complexity of clinical manifestations, heterogeneity of underlying diseases, and potential for life-threatening hemodynamic instability. Clinical decision support systems (CDSSs) have emerged as promising tools to augment the diagnostic capabilities of cardiologists. However, a requirements analysis is acknowledged to be vital for the success of a CDSS, especially for complex clinical tasks such as VT diagnosis. Objective: The aims of this study were to analyze the requirements for a VT diagnosis CDSS within the frameworks of knowledge and practice and to determine the clinical decision support (CDS) needs. Methods: Our multidisciplinary team first conducted semistructured interviews with seven cardiologists related to the clinical challenges of VT and expected decision support. A questionnaire was designed by the multidisciplinary team based on the results of interviews. The questionnaire was divided into four sections: demographic information, knowledge assessment, practice assessment, and CDS needs. The practice section consisted of two simulated cases for a total score of 10 marks. Online questionnaires were disseminated to registered cardiologists across China from December 2022 to February 2023. The scores for the practice section were summarized as continuous variables, using the mean, median, and range. The knowledge and CDS needs sections were assessed using a 4-point Likert scale without a neutral option. Kruskal-Wallis tests were performed to investigate the relationship between scores and practice years or specialty. Results: Of the 687 cardiologists who completed the questionnaire, 567 responses were eligible for further analysis. The results of the knowledge assessment showed that 383 cardiologists (68\%) lacked knowledge in diagnostic evaluation. The overall average score of the practice assessment was 6.11 (SD 0.55); the etiological diagnosis section had the highest overall scores (mean 6.74, SD 1.75), whereas the diagnostic evaluation section had the lowest scores (mean 5.78, SD 1.19). A majority of cardiologists (344/567, 60.7\%) reported the need for a CDSS. There was a significant difference in practice competency scores between general cardiologists and arrhythmia specialists (P=.02). Conclusions: There was a notable deficiency in the knowledge and practice of VT among Chinese cardiologists. Specific knowledge and practice support requirements were identified, which provide a foundation for further development and optimization of a CDSS. Moreover, it is important to consider clinicians' specialization levels and years of practice for effective and personalized support. 
", doi="10.2196/55802", url="https://humanfactors.jmir.org/2024/1/e55802", url="http://www.ncbi.nlm.nih.gov/pubmed/38530337" } @Article{info:doi/10.2196/56933, author="Harada, Yukinori and Kawamura, Ren and Yokose, Masashi and Shimizu, Taro and Singh, Hardeep", title="Definitions and Measurements for Atypical Presentations at Risk for Diagnostic Errors in Internal Medicine: Protocol for a Scoping Review", journal="JMIR Res Protoc", year="2024", month="Mar", day="25", volume="13", pages="e56933", keywords="atypical presentations", keywords="diagnostic errors", keywords="internal medicine", keywords="scoping review protocol", keywords="atypical presentation", keywords="high risk", keywords="data extraction", keywords="descriptive statistics", keywords="criteria", keywords="qualitative", keywords="content analysis", keywords="inductive approach", keywords="clinical informatics", keywords="clinical decision support", abstract="Background: Atypical presentations have been increasingly recognized as a significant contributing factor to diagnostic errors in internal medicine. However, research to address associations between atypical presentations and diagnostic errors has not been evaluated due to the lack of widely applicable definitions and criteria for what is considered an atypical presentation. Objective: The aim of the study is to describe how atypical presentations are defined and measured in studies of diagnostic errors in internal medicine and use this new information to develop new criteria to identify atypical presentations at high risk for diagnostic errors. Methods: This study will follow an established framework for conducting scoping reviews. Inclusion criteria are developed according to the participants, concept, and context framework. This review will consider studies that fulfill all of the following criteria: include adult patients (participants); explore the association between atypical presentations and diagnostic errors using any definition, criteria, or measurement to identify atypical presentations and diagnostic errors (concept); and focus on internal medicine (context). Regarding the type of sources, this scoping review will consider quantitative, qualitative, and mixed methods study designs; systematic reviews; and opinion papers for inclusion. Case reports, case series, and conference abstracts will be excluded. The data will be extracted through MEDLINE, Web of Science, CINAHL, Embase, Cochrane Library, and Google Scholar searches. No limits will be applied to language, and papers indexed from database inception to December 31, 2023, will be included. Two independent reviewers (YH and RK) will conduct study selection and data extraction. The data extracted will include specific details about the patient characteristics (eg, age, sex, and disease), the definitions and measuring methods for atypical presentations and diagnostic errors, clinical settings (eg, department and outpatient or inpatient), type of evidence source, and the association between atypical presentations and diagnostic errors relevant to the review question. The extracted data will be presented in tabular format with descriptive statistics, allowing us to identify the key components or types of atypical presentations and develop new criteria to identify atypical presentations for future studies of diagnostic errors. Developing the new criteria will follow guidance for a basic qualitative content analysis with an inductive approach. 
Results: As of January 2024, a literature search through multiple databases is ongoing. We will complete this study by December 2024. Conclusions: This scoping review aims to provide rigorous evidence to develop new criteria to identify atypical presentations at high risk for diagnostic errors in internal medicine. Such criteria could facilitate the development of a comprehensive conceptual model to understand the associations between atypical presentations and diagnostic errors in internal medicine. Trial Registration: Open Science Framework; www.osf.io/27d5m International Registered Report Identifier (IRRID): DERR1-10.2196/56933 ", doi="10.2196/56933", url="https://www.researchprotocols.org/2024/1/e56933", url="http://www.ncbi.nlm.nih.gov/pubmed/38526541" } @Article{info:doi/10.2196/52462, author="Raja, Hina and Munawar, Asim and Mylonas, Nikolaos and Delsoz, Mohammad and Madadi, Yeganeh and Elahi, Muhammad and Hassan, Amr and Abu Serhan, Hashem and Inam, Onur and Hernandez, Luis and Chen, Hao and Tran, Sang and Munir, Wuqaas and Abd-Alrazaq, Alaa and Yousefi, Siamak", title="Automated Category and Trend Analysis of Scientific Articles on Ophthalmology Using Large Language Models: Development and Usability Study", journal="JMIR Form Res", year="2024", month="Mar", day="22", volume="8", pages="e52462", keywords="Bidirectional and Auto-Regressive Transformers", keywords="BART", keywords="bidirectional encoder representations from transformers", keywords="BERT", keywords="ophthalmology", keywords="text classification", keywords="large language model", keywords="LLM", keywords="trend analysis", abstract="Background: In this paper, we present an automated method for article classification, leveraging the power of large language models (LLMs). Objective: The aim of this study is to evaluate the applicability of various LLMs based on textual content of scientific ophthalmology papers. Methods: We developed a model based on natural language processing techniques, including advanced LLMs, to process and analyze the textual content of scientific papers. Specifically, we used zero-shot learning LLMs and compared Bidirectional and Auto-Regressive Transformers (BART) and its variants with Bidirectional Encoder Representations from Transformers (BERT) and its variants, such as distilBERT, SciBERT, PubmedBERT, and BioBERT. To evaluate the LLMs, we compiled a data set (retinal diseases [RenD] ) of 1000 ocular disease--related articles, which were expertly annotated by a panel of 6 specialists into 19 distinct categories. In addition to the classification of articles, we also performed analysis on different classified groups to find the patterns and trends in the field. Results: The classification results demonstrate the effectiveness of LLMs in categorizing a large number of ophthalmology papers without human intervention. The model achieved a mean accuracy of 0.86 and a mean F1-score of 0.85 based on the RenD data set. Conclusions: The proposed framework achieves notable improvements in both accuracy and efficiency. Its application in the domain of ophthalmology showcases its potential for knowledge organization and retrieval. We performed a trend analysis that enables researchers and clinicians to easily categorize and retrieve relevant papers, saving time and effort in literature review and information gathering as well as identification of emerging scientific trends within different disciplines. 
Moreover, the extendibility of the model to other scientific fields broadens its impact in facilitating research and trend analysis across diverse disciplines. ", doi="10.2196/52462", url="https://formative.jmir.org/2024/1/e52462", url="http://www.ncbi.nlm.nih.gov/pubmed/38517457" } @Article{info:doi/10.2196/53400, author="Seo, Hyeram and Ahn, Imjin and Gwon, Hansle and Kang, Heejun and Kim, Yunha and Choi, Heejung and Kim, Minkyoung and Han, Jiye and Kee, Gaeun and Park, Seohyun and Ko, Soyoung and Jung, HyoJe and Kim, Byeolhee and Oh, Jungsik and Jun, Joon Tae and Kim, Young-Hak", title="Forecasting Hospital Room and Ward Occupancy Using Static and Dynamic Information Concurrently: Retrospective Single-Center Cohort Study", journal="JMIR Med Inform", year="2024", month="Mar", day="21", volume="12", pages="e53400", keywords="hospital bed occupancy", keywords="electronic medical records", keywords="time series forecasting", keywords="short-term memory", keywords="combining static and dynamic variables", abstract="Background: Predicting the bed occupancy rate (BOR) is essential for efficient hospital resource management, long-term budget planning, and patient care planning. Although macro-level BOR prediction for the entire hospital is crucial, predicting occupancy at a detailed level, such as specific wards and rooms, is more practical and useful for hospital scheduling. Objective: The aim of this study was to develop a web-based support tool that allows hospital administrators to grasp the BOR for each ward and room according to different time periods. Methods: We trained time-series models based on long short-term memory (LSTM) using individual bed data aggregated hourly each day to predict the BOR for each ward and room in the hospital. Ward training involved 2 models with 7- and 30-day time windows, and room training involved models with 3- and 7-day time windows for shorter-term planning. To further improve prediction performance, we added 2 models trained by concatenating dynamic data with static data representing room-specific details. Results: We confirmed the results of a total of 12 models using bidirectional long short-term memory (Bi-LSTM) and LSTM, and the model based on Bi-LSTM showed better performance. The ward-level prediction model had a mean absolute error (MAE) of 0.067, mean square error (MSE) of 0.009, root mean square error (RMSE) of 0.094, and R2 score of 0.544. Among the room-level prediction models, the model that combined static data exhibited superior performance, with a MAE of 0.129, MSE of 0.050, RMSE of 0.227, and R2 score of 0.600. Model results can be displayed on an electronic dashboard for easy access via the web. Conclusions: We have proposed predictive BOR models for individual wards and rooms that demonstrate high performance. The results can be visualized through a web-based dashboard, aiding hospital administrators in bed operation planning. This contributes to resource optimization and the reduction of hospital resource use. 
", doi="10.2196/53400", url="https://medinform.jmir.org/2024/1/e53400", url="http://www.ncbi.nlm.nih.gov/pubmed/38513229" } @Article{info:doi/10.2196/52073, author="Yim, Dobin and Khuntia, Jiban and Parameswaran, Vijaya and Meyers, Arlen", title="Preliminary Evidence of the Use of Generative AI in Health Care Clinical Services: Systematic Narrative Review", journal="JMIR Med Inform", year="2024", month="Mar", day="20", volume="12", pages="e52073", keywords="generative artificial intelligence tools and applications", keywords="GenAI", keywords="service", keywords="clinical", keywords="health care", keywords="transformation", keywords="digital", abstract="Background: Generative artificial intelligence tools and applications (GenAI) are being increasingly used in health care. Physicians, specialists, and other providers have started primarily using GenAI as an aid or tool to gather knowledge, provide information, train, or generate suggestive dialogue between physicians and patients or between physicians and patients' families or friends. However, unless the use of GenAI is oriented to be helpful in clinical service encounters that can improve the accuracy of diagnosis, treatment, and patient outcomes, the expected potential will not be achieved. As adoption continues, it is essential to validate the effectiveness of the infusion of GenAI as an intelligent technology in service encounters to understand the gap in actual clinical service use of GenAI. Objective: This study synthesizes preliminary evidence on how GenAI assists, guides, and automates clinical service rendering and encounters in health care. The review scope was limited to articles published in peer-reviewed medical journals. Methods: We screened and selected 0.38\% (161/42,459) of articles published between January 1, 2020, and May 31, 2023, identified from PubMed. We followed the protocols outlined in the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to select highly relevant studies with at least 1 element on clinical use, evaluation, and validation to provide evidence of GenAI use in clinical services. The articles were classified based on their relevance to clinical service functions or activities using the descriptive and analytical information presented in the articles. Results: Of 161 articles, 141 (87.6\%) reported using GenAI to assist services through knowledge access, collation, and filtering. GenAI was used for disease detection (19/161, 11.8\%), diagnosis (14/161, 8.7\%), and screening processes (12/161, 7.5\%) in the areas of radiology (17/161, 10.6\%), cardiology (12/161, 7.5\%), gastrointestinal medicine (4/161, 2.5\%), and diabetes (6/161, 3.7\%). The literature synthesis in this study suggests that GenAI is mainly used for diagnostic processes, improvement of diagnosis accuracy, and screening and diagnostic purposes using knowledge access. Although this solves the problem of knowledge access and may improve diagnostic accuracy, it is oriented toward higher value creation in health care. Conclusions: GenAI informs rather than assists or automates clinical service functions in health care. There is potential in clinical service, but it has yet to be actualized for GenAI. More clinical service--level evidence that GenAI is used to streamline some functions or provides more automated help than only information retrieval is needed.
To transform health care as purported, more studies related to GenAI applications must automate and guide human-performed services and keep up with the optimism that forward-thinking health care organizations will take advantage of GenAI. ", doi="10.2196/52073", url="https://medinform.jmir.org/2024/1/e52073", url="http://www.ncbi.nlm.nih.gov/pubmed/38506918" } @Article{info:doi/10.2196/53951, author="Toh, An Zheng and Berg, Bj{\o}rnar and Han, Claudia Qin Yun and Hey, Dennis Hwee Weng and Pikkarainen, Minna and Grotle, Margreth and He, Hong-Gu", title="Clinical Decision Support System Used in Spinal Disorders: Scoping Review", journal="J Med Internet Res", year="2024", month="Mar", day="19", volume="26", pages="e53951", keywords="back pain", keywords="clinical decision support systems", keywords="CDSS", keywords="diagnosis", keywords="imaging", keywords="predictive", keywords="prognosis", keywords="spine", abstract="Background: Spinal disorders are highly prevalent worldwide with high socioeconomic costs. This cost is associated with the demand for treatment and productivity loss, prompting the exploration of technologies to improve patient outcomes. Clinical decision support systems (CDSSs) are computerized systems that are increasingly used to facilitate safe and efficient health care. Their applications range in depth and can be found across health care specialties. Objective: This scoping review aims to explore the use of CDSSs in patients with spinal disorders. Methods: We used the Joanna Briggs Institute methodological guidance for this scoping review and reported according to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) statement. Databases, including PubMed, Embase, Cochrane, CINAHL, Web of Science, Scopus, ProQuest, and PsycINFO, were searched from inception until October 11, 2022. The included studies examined the use of digitalized CDSSs in patients with spinal disorders. Results: A total of 4 major CDSS functions were identified from 31 studies: preventing unnecessary imaging (n=8, 26\%), aiding diagnosis (n=6, 19\%), aiding prognosis (n=11, 35\%), and recommending treatment options (n=6, 20\%). Most studies used the knowledge-based system. Logistic regression was the most commonly used method, followed by decision tree algorithms. The use of CDSSs to aid in the management of spinal disorders was generally accepted over the threat to physicians' clinical decision-making autonomy. Conclusions: Although the effectiveness was frequently evaluated by examining the agreement between the decisions made by the CDSSs and the health care providers, comparing the CDSS recommendations with actual clinical outcomes would be preferable. In addition, future studies on CDSS development should focus on system integration, considering end user's needs and preferences, and external validation and impact studies to assess effectiveness and generalizability. Trial Registration: OSF Registries osf.io/dyz3f; https://osf.io/dyz3f ", doi="10.2196/53951", url="https://www.jmir.org/2024/1/e53951", url="http://www.ncbi.nlm.nih.gov/pubmed/38502157" } @Article{info:doi/10.2196/52071, author="Ganeshan, Smitha and Liu, W. Andrew and Kroeger, Anne and Anand, Prerna and Seefeldt, Richard and Regner, Alexis and Vaughn, Diana and Odisho, Y. 
Anobel and Mourad, Michelle", title="An Electronic Health Record--Based Automated Self-Rescheduling Tool to Improve Patient Access: Retrospective Cohort Study", journal="J Med Internet Res", year="2024", month="Mar", day="19", volume="26", pages="e52071", keywords="appointment", keywords="consultation", keywords="cost", keywords="digital health", keywords="digital tools", keywords="electronic health record", keywords="EHR", keywords="informatics", keywords="patient access", keywords="retrospective review", keywords="revenue", keywords="self-rescheduling tool", keywords="self-scheduling", keywords="waiting time", abstract="Background: In many large health centers, patients face long appointment wait times and difficulties accessing care. Last-minute cancellations and patient no-shows leave unfilled slots in a clinician's schedule, exacerbating delays in care from poor access. The mismatch between the supply of outpatient appointments and patient demand has led health systems to adopt many tools and strategies to minimize appointment no-show rates and fill open slots left by patient cancellations. Objective: We evaluated an electronic health record (EHR)--based self-scheduling tool, Fast Pass, at a large academic medical center to understand the impacts of the tool on the ability to fill cancelled appointment slots, patient access to earlier appointments, and clinical revenue from visits that may otherwise have gone unscheduled. Methods: In this retrospective cohort study, we extracted Fast Pass appointment offers and scheduling data, including patient demographics, from the EHR between June 18, 2022, and March 9, 2023. We analyzed the outcomes of Fast Pass offers (accepted, declined, expired, and unavailable) and the outcomes of scheduled appointments resulting from accepted Fast Pass offers (completed, canceled, and no-show). We stratified outcomes based on appointment specialty. For each specialty, the patient service revenue from appointments filled by Fast Pass was calculated using the visit slots filled, the payer mix of the appointments, and the contribution margin by payer. Results: From June 18, 2022, to March 9, 2023, there were a total of 60,660 Fast Pass offers sent to patients for 21,978 available appointments. Of these offers, 6603 (11\%) were accepted across all departments, and 5399 (8.9\%) visits were completed. Patients were seen a median (IQR) of 14 (4-33) days sooner for their appointments. In a multivariate logistic regression model with primary outcome Fast Pass offer acceptance, patients who were aged 65 years or older (vs 20-40 years; P=.005; odds ratio [OR] 0.86, 95\% CI 0.78-0.96), other ethnicity (vs White; P<.001, OR 0.84, 95\% CI 0.77-0.91), primarily Chinese speakers (P<.001; OR 0.62, 95\% CI 0.49-0.79), and other language speakers (vs English speakers; P=.001; OR 0.71, 95\% CI 0.57-0.87) were less likely to accept an offer. Fast Pass added 2576 patient service hours to the clinical schedule, with a median (IQR) of 251 (216-322) hours per month. The estimated value of the visits scheduled through 9 months of Fast Pass use was US \$3 million in professional fees at our institution. Conclusions: Self-scheduling tools that provide patients with an opportunity to schedule into cancelled or unfilled appointment slots have the potential to improve patient access and efficiently capture additional revenue from filling unfilled slots.
The demographics of the patients accepting these offers suggest that such digital tools may exacerbate inequities in access. ", doi="10.2196/52071", url="https://www.jmir.org/2024/1/e52071", url="http://www.ncbi.nlm.nih.gov/pubmed/38502159" } @Article{info:doi/10.2196/52322, author="Zeinali, Nahid and Youn, Nayung and Albashayreh, Alaa and Fan, Weiguo and Gilbertson White, St{\'e}phanie", title="Machine Learning Approaches to Predict Symptoms in People With Cancer: Systematic Review", journal="JMIR Cancer", year="2024", month="Mar", day="19", volume="10", pages="e52322", keywords="machine learning", keywords="ML", keywords="deep learning", keywords="DL", keywords="cancer symptoms", keywords="prediction model", abstract="Background: People with cancer frequently experience severe and distressing symptoms associated with cancer and its treatments. Predicting symptoms in patients with cancer continues to be a significant challenge for both clinicians and researchers. The rapid evolution of machine learning (ML) highlights the need for a current systematic review to improve cancer symptom prediction. Objective: This systematic review aims to synthesize the literature that has used ML algorithms to predict the development of cancer symptoms and to identify the predictors of these symptoms. This is essential for integrating new developments and identifying gaps in existing literature. Methods: We conducted this systematic review in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist. We conducted a systematic search of CINAHL, Embase, and PubMed for English records published from 1984 to August 11, 2023, using the following search terms: cancer, neoplasm, specific symptoms, neural networks, machine learning, specific algorithm names, and deep learning. All records that met the eligibility criteria were individually reviewed by 2 coauthors, and key findings were extracted and synthesized. We focused on studies using ML algorithms to predict cancer symptoms, excluding nonhuman research, technical reports, reviews, book chapters, conference proceedings, and inaccessible full texts. Results: A total of 42 studies were included, the majority of which were published after 2017. Most studies were conducted in North America (18/42, 43\%) and Asia (16/42, 38\%). The sample sizes in most studies (27/42, 64\%) typically ranged from 100 to 1000 participants. The most prevalent category of algorithms was supervised ML, accounting for 39 (93\%) of the 42 studies. Each of the methods---deep learning, ensemble classifiers, and unsupervised ML---constituted 3 (3\%) of the 42 studies. The ML algorithms with the best performance were logistic regression (9/42, 17\%), random forest (7/42, 13\%), artificial neural networks (5/42, 9\%), and decision trees (5/42, 9\%). The most commonly included primary cancer sites were the head and neck (9/42, 22\%) and breast (8/42, 19\%), with 17 (41\%) of the 42 studies not specifying the site. The most frequently studied symptoms were xerostomia (9/42, 14\%), depression (8/42, 13\%), pain (8/42, 13\%), and fatigue (6/42, 10\%). The significant predictors were age, gender, treatment type, treatment number, cancer site, cancer stage, chemotherapy, radiotherapy, chronic diseases, comorbidities, physical factors, and psychological factors. Conclusions: This review outlines the algorithms used for predicting symptoms in individuals with cancer. 
Given the diversity of symptoms people with cancer experience, analytic approaches that can handle complex and nonlinear relationships are critical. This knowledge can pave the way for crafting algorithms tailored to a specific symptom. In addition, to improve prediction precision, future research should compare cutting-edge ML strategies such as deep learning and ensemble methods with traditional statistical models. ", doi="10.2196/52322", url="https://cancer.jmir.org/2024/1/e52322", url="http://www.ncbi.nlm.nih.gov/pubmed/38502171" } @Article{info:doi/10.2196/50369, author="Yang, Meicheng and Chen, Hui and Hu, Wenhan and Mischi, Massimo and Shan, Caifeng and Li, Jianqing and Long, Xi and Liu, Chengyu", title="Development and Validation of an Interpretable Conformal Predictor to Predict Sepsis Mortality Risk: Retrospective Cohort Study", journal="J Med Internet Res", year="2024", month="Mar", day="18", volume="26", pages="e50369", keywords="sepsis", keywords="critical care", keywords="clinical decision-making", keywords="mortality prediction", keywords="conformal prediction", abstract="Background: Early and reliable identification of patients with sepsis who are at high risk of mortality is important to improve clinical outcomes. However, 3 major barriers to artificial intelligence (AI) models, including the lack of interpretability, the difficulty in generalizability, and the risk of automation bias, hinder the widespread adoption of AI models for use in clinical practice. Objective: This study aimed to develop and validate (internally and externally) a conformal predictor of sepsis mortality risk in patients who are critically ill, leveraging AI-assisted prediction modeling. The proposed approach enables explaining the model output and assessing its confidence level. Methods: We retrospectively extracted data on adult patients with sepsis from a database collected in a teaching hospital at Beth Israel Deaconess Medical Center for model training and internal validation. A large multicenter critical care database from the Philips eICU Research Institute was used for external validation. A total of 103 clinical features were extracted from the first day after admission. We developed an AI model using gradient-boosting machines to predict the mortality risk of sepsis and used Mondrian conformal prediction to estimate the prediction uncertainty. The Shapley additive explanation method was used to explain the model. Results: A total of 16,746 (80\%) patients from Beth Israel Deaconess Medical Center were used to train the model. When tested on the internal validation population of 4187 (20\%) patients, the model achieved an area under the receiver operating characteristic curve of 0.858 (95\% CI 0.845-0.871), which was reduced to 0.800 (95\% CI 0.789-0.811) when externally validated on 10,362 patients from the Philips eICU database. At a specified confidence level of 90\% for the internal validation cohort the percentage of error predictions (n=438) out of all predictions (n=4187) was 10.5\%, with 1229 (29.4\%) predictions requiring clinician review. In contrast, the AI model without conformal prediction made 1449 (34.6\%) errors. When externally validated, more predictions (n=4004, 38.6\%) were flagged for clinician review due to interdatabase heterogeneity. Nevertheless, the model still produced significantly lower error rates compared to the point predictions by AI (n=1221, 11.8\% vs n=4540, 43.8\%). 
The most important predictors identified in this predictive model were Acute Physiology Score III, age, urine output, vasopressors, and pulmonary infection. Clinically relevant risk factors contributing to a single patient were also examined to show how the risk arose. Conclusions: By combining model explanation and conformal prediction, AI-based systems can be better translated into medical practice for clinical decision-making. ", doi="10.2196/50369", url="https://www.jmir.org/2024/1/e50369", url="http://www.ncbi.nlm.nih.gov/pubmed/38498038" } @Article{info:doi/10.2196/33868, author="C{\'a}ceres Rivera, Isabel Diana and Rojas, Jaimes Luz Mileyde and Rojas, Z. Lyda and Gomez, Canon Diana and Castro Ruiz, Andr{\'e}s David and L{\'o}pez Romero, Alberto Luis", title="Using Principles of Digital Development for a Smartphone App to Support Data Collection in Patients With Acute Myocardial Infarction and Physical Activity Intolerance: Case Study", journal="JMIR Form Res", year="2024", month="Mar", day="18", volume="8", pages="e33868", keywords="app", keywords="applications of medical informatics", keywords="coronary disease", keywords="data collection", keywords="development", keywords="health care reform", keywords="health data", keywords="medical informatics", keywords="medical informatics apps", keywords="mobile app", keywords="mobile applications", keywords="nursing diagnosis", keywords="nursing research", keywords="research data", keywords="software", keywords="validation", abstract="Background: Advances in health have highlighted the need to implement technologies as a fundamental part of the diagnosis, treatment, and recovery of patients at risk of or with health alterations. For this purpose, digital platforms have demonstrated their applicability in the identification of care needs. Nursing is a fundamental component in the care of patients with cardiovascular disorders and plays a crucial role in diagnosing human responses to these health conditions. Consequently, the validation of nursing diagnoses through ongoing research processes has become a necessity that can significantly impact both patients and health care professionals. Objective: We aimed to describe the process of developing a mobile app to validate the nursing diagnosis ``intolerance to physical activity'' in patients with acute myocardial infarction. Methods: We describe the development and pilot-testing of a mobile system to support data collection for validating the nursing diagnosis of activity intolerance. This was a descriptive study conducted with 11 adults (aged $\geq$18 years) who attended a health institution for highly complex needs with a suspected diagnosis of coronary syndrome between August and September 2019 in Floridablanca, Colombia. An app for the clinical validation of activity intolerance (North American Nursing Diagnosis Association [NANDA] code 00092) in patients with acute coronary syndrome was developed in two steps: (1) operationalization of the nursing diagnosis and (2) the app development process, which included an evaluation of the initial requirements, development and digitization of the forms, and a pilot test. The agreement level between the 2 evaluating nurses was evaluated with the $\kappa$ index. Results: We developed a form that included sociodemographic data, hospital admission data, medical history, current pharmacological treatment, and thrombolysis in myocardial infarction risk score (TIMI-RS) and GRACE (Global Registry of Acute Coronary Events) scores. 
To identify the defining characteristics, we included official guidelines, physiological measurements, and scales such as the Piper fatigue scale and Borg scale. Participants in the pilot test (n=11) had an average age of 63.2 (SD 4.0) years and were 82\% (9/11) men; 18\% (2/11) had incomplete primary schooling. The agreement between the evaluators was approximately 80\% for most of the defining characteristics. The most prevalent characteristics were exercise discomfort (10/11, 91\%), weakness (7/11, 64\%), dyspnea (3/11, 27\%), abnormal heart rate in response to exercise (2/10, 20\%), electrocardiogram abnormalities (1/10, 9\%), and abnormal blood pressure in response to activity (1/10, 10\%). Conclusions: We developed a mobile app for validating the diagnosis of ``activity intolerance.'' Its use will guarantee not only optimal data collection, minimizing errors to perform validation, but will also allow the identification of individual care needs. ", doi="10.2196/33868", url="https://formative.jmir.org/2024/1/e33868", url="http://www.ncbi.nlm.nih.gov/pubmed/38498019" } @Article{info:doi/10.2196/52688, author="Killian, A. Jackson and Jain, Manish and Jia, Yugang and Amar, Jonathan and Huang, Erich and Tambe, Milind", title="New Approach to Equitable Intervention Planning to Improve Engagement and Outcomes in a Digital Health Program: Simulation Study", journal="JMIR Diabetes", year="2024", month="Mar", day="15", volume="9", pages="e52688", keywords="chronic disease", keywords="type-2 diabetes", keywords="T2D", keywords="restless multiarmed bandits", keywords="multi-armed bandit", keywords="multi-armed bandits", keywords="machine learning", keywords="resource allocation", keywords="digital health", keywords="equity", abstract="Background: Digital health programs provide individualized support to patients with chronic diseases and their effectiveness is measured by the extent to which patients achieve target individual clinical outcomes and the program's ability to sustain patient engagement. However, patient dropout and inequitable intervention delivery strategies, which may unintentionally penalize certain patient subgroups, represent challenges to maximizing effectiveness. Therefore, methodologies that optimize the balance between success factors (achievement of target clinical outcomes and sustained engagement) equitably would be desirable, particularly when there are resource constraints. Objective: Our objectives were to propose a model for digital health program resource management that accounts jointly for the interaction between individual clinical outcomes and patient engagement, ensures equitable allocation as well as allows for capacity planning, and conducts extensive simulations using publicly available data on type 2 diabetes, a chronic disease. Methods: We propose a restless multiarmed bandit (RMAB) model to plan interventions that jointly optimize long-term engagement and individual clinical outcomes (in this case measured as the achievement of target healthy glucose levels). To mitigate the tendency of RMAB to achieve good aggregate performance by exacerbating disparities between groups, we propose new equitable objectives for RMAB and apply bilevel optimization algorithms to solve them. We formulated a model for the joint evolution of patient engagement and individual clinical outcome trajectory to capture the key dynamics of interest in digital chronic disease management programs. 
Results: In simulation exercises, our optimized intervention policies lead to up to 10\% more patients reaching healthy glucose levels after 12 months, with a 10\% reduction in dropout compared to standard-of-care baselines. Further, our new equitable policies reduce the mean absolute difference of engagement and health outcomes across 6 demographic groups by up to 85\% compared to the state-of-the-art. Conclusions: Planning digital health interventions with individual clinical outcome objectives and long-term engagement dynamics as considerations can be both feasible and effective. We propose using an RMAB sequential decision-making framework, which may offer additional capabilities in capacity planning as well. The integration of an equitable RMAB algorithm further enhances the potential for reaching equitable solutions. This approach provides program designers with the flexibility to switch between different priorities and balance trade-offs across various objectives according to their preferences. ", doi="10.2196/52688", url="https://diabetes.jmir.org/2024/1/e52688", url="http://www.ncbi.nlm.nih.gov/pubmed/38488828" } @Article{info:doi/10.2196/50882, author="Xue, Zhaowen and Zhang, Yiming and Gan, Wenyi and Wang, Huajun and She, Guorong and Zheng, Xiaofei", title="Quality and Dependability of ChatGPT and DingXiangYuan Forums for Remote Orthopedic Consultations: Comparative Analysis", journal="J Med Internet Res", year="2024", month="Mar", day="14", volume="26", pages="e50882", keywords="artificial intelligence", keywords="ChatGPT", keywords="consultation", keywords="musculoskeletal", keywords="natural language processing", keywords="remote medical consultation", keywords="orthopaedic", keywords="orthopaedics", abstract="Background: The widespread use of artificial intelligence, such as ChatGPT (OpenAI), is transforming sectors, including health care, while separate advancements of the internet have enabled platforms such as China's DingXiangYuan to offer remote medical services. Objective: This study evaluates ChatGPT-4's responses against those of professional health care providers in telemedicine, assessing artificial intelligence's capability to support the surge in remote medical consultations and its impact on health care delivery. Methods: We sourced remote orthopedic consultations from ``Doctor DingXiang,'' with responses from its certified physicians as the control and ChatGPT's responses as the experimental group. In all, 3 blindfolded, experienced orthopedic surgeons assessed responses against 7 criteria: ``logical reasoning,'' ``internal information,'' ``external information,'' ``guiding function,'' ``therapeutic effect,'' ``medical knowledge popularization education,'' and ``overall satisfaction.'' We used Fleiss $\kappa$ to measure agreement among multiple raters. Results: Initially, consultation records for a cumulative count of 8 maladies (equivalent to 800 cases) were gathered. We ultimately included 73 consultation records by May 2023, following primary and rescreening, in which no communication records containing private information, images, or voice messages were transmitted. After statistical scoring, we discovered that ChatGPT's ``internal information'' score (mean 4.61, SD 0.52 points vs mean 4.66, SD 0.49 points; P=.43) and ``therapeutic effect'' score (mean 4.43, SD 0.75 points vs mean 4.55, SD 0.62 points; P=.32) were lower than those of the control group, but the differences were not statistically significant. 
ChatGPT showed better performance with a higher ``logical reasoning'' score (mean 4.81, SD 0.36 points vs mean 4.75, SD 0.39 points; P=.38), ``external information'' score (mean 4.06, SD 0.72 points vs mean 3.92, SD 0.77 points; P=.25), and ``guiding function'' score (mean 4.73, SD 0.51 points vs mean 4.72, SD 0.54 points; P=.96), although the differences were not statistically significant. Meanwhile, the ``medical knowledge popularization education'' score of ChatGPT was better than that of the control group (mean 4.49, SD 0.67 points vs mean 3.87, SD 1.01 points; P<.001), and the difference was statistically significant. In terms of ``overall satisfaction,'' the difference was not statistically significant between the groups (mean 8.35, SD 1.38 points vs mean 8.37, SD 1.24 points; P=.92). According to how Fleiss $\kappa$ values were interpreted, 6 of the control group's score points were classified as displaying ``fair agreement'' (P<.001), and 1 was classified as showing ``substantial agreement'' (P<.001). In the experimental group, 3 points were classified as indicating ``fair agreement,'' while 4 suggested ``moderate agreement'' (P<.001). Conclusions: ChatGPT-4 matches the expertise found in DingXiangYuan forums' paid consultations, excelling particularly in scientific education. It presents a promising alternative for remote health advice. For health care professionals, it could act as an aid in patient education, while patients may use it as a convenient tool for health inquiries. ", doi="10.2196/50882", url="https://www.jmir.org/2024/1/e50882", url="http://www.ncbi.nlm.nih.gov/pubmed/38483451" } @Article{info:doi/10.2196/52602, author="Ong, Yuhan Ariel and Hogg, Jeffry Henry David and Kale, U. Aditya and Taribagil, Priyal and Kras, Ashley and Dow, Eliot and Macdonald, Trystan and Liu, Xiaoxuan and Keane, A. Pearse and Denniston, K. Alastair", title="AI as a Medical Device for Ophthalmic Imaging in Europe, Australia, and the United States: Protocol for a Systematic Scoping Review of Regulated Devices", journal="JMIR Res Protoc", year="2024", month="Mar", day="14", volume="13", pages="e52602", keywords="AIaMD", keywords="artificial intelligence as a medical device", keywords="artificial intelligence", keywords="deep learning", keywords="machine learning", keywords="ophthalmic imaging", keywords="regulatory approval", abstract="Background: Artificial intelligence as a medical device (AIaMD) has the potential to transform many aspects of ophthalmic care, such as improving accuracy and speed of diagnosis, addressing capacity issues in high-volume areas such as screening, and detecting novel biomarkers of systemic disease in the eye (oculomics). In order to ensure that such tools are safe for the target population and achieve their intended purpose, it is important that these AIaMD have adequate clinical evaluation to support any regulatory decision. Currently, the evidential requirements for regulatory approval are less clear for AIaMD compared to more established interventions such as drugs or medical devices. There is therefore value in understanding the level of evidence that underpins AIaMD currently on the market, as a step toward identifying what the best practices might be in this area. In this systematic scoping review, we will focus on AIaMD that contributes to clinical decision-making (relating to screening, diagnosis, prognosis, and treatment) in the context of ophthalmic imaging. 
Objective: This study aims to identify regulator-approved AIaMD for ophthalmic imaging in Europe, Australia, and the United States; report the characteristics of these devices and their regulatory approvals; and report the available evidence underpinning these AIaMD. Methods: The Food and Drug Administration (United States), the Australian Register of Therapeutic Goods (Australia), the Medicines and Healthcare products Regulatory Agency (United Kingdom), and the European Database on Medical Devices (European Union) regulatory databases will be searched for ophthalmic imaging AIaMD through a snowballing approach. PubMed and clinical trial registries will be systematically searched, and manufacturers will be directly contacted for studies investigating the effectiveness of eligible AIaMD. Preliminary regulatory database searches, evidence searches, screening, data extraction, and methodological quality assessment will be undertaken by 2 independent review authors and arbitrated by a third at each stage of the process. Results: Preliminary searches were conducted in February 2023. Data extraction, data synthesis, and assessment of methodological quality commenced in October 2023. The review is on track to be completed and submitted for peer review by April 2024. Conclusions: This systematic review will provide greater clarity on ophthalmic imaging AIaMD that have achieved regulatory approval as well as the evidence that underpins them. This should help adopters understand the range of tools available and whether they can be safely incorporated into their clinical workflow, and it should also support developers in navigating regulatory approval more efficiently. International Registered Report Identifier (IRRID): DERR1-10.2196/52602 ", doi="10.2196/52602", url="https://www.researchprotocols.org/2024/1/e52602", url="http://www.ncbi.nlm.nih.gov/pubmed/38483456" } @Article{info:doi/10.2196/55508, author="Cirone, Katrina and Akrout, Mohamed and Abid, Latif and Oakley, Amanda", title="Assessing the Utility of Multimodal Large Language Models (GPT-4 Vision and Large Language and Vision Assistant) in Identifying Melanoma Across Different Skin Tones", journal="JMIR Dermatol", year="2024", month="Mar", day="13", volume="7", pages="e55508", keywords="melanoma", keywords="nevus", keywords="skin pigmentation", keywords="artificial intelligence", keywords="AI", keywords="multimodal large language models", keywords="large language model", keywords="large language models", keywords="LLM", keywords="LLMs", keywords="machine learning", keywords="expert systems", keywords="natural language processing", keywords="NLP", keywords="GPT", keywords="GPT-4V", keywords="dermatology", keywords="skin", keywords="lesion", keywords="lesions", keywords="cancer", keywords="oncology", keywords="visual", doi="10.2196/55508", url="https://derma.jmir.org/2024/1/e55508", url="http://www.ncbi.nlm.nih.gov/pubmed/38477960" } @Article{info:doi/10.2196/50737, author="Suen, Matthew and Manski-Nankervis, Jo-Anne and McBride, Caroline and Lumsden, Natalie and Hunter, Barbara", title="Implementing a Sodium-Glucose Cotransporter 2 Inhibitor Module With a Software Tool (Future Health Today): Qualitative Study", journal="JMIR Form Res", year="2024", month="Mar", day="13", volume="8", pages="e50737", keywords="type 2 diabetes", keywords="CP-FIT", keywords="electronic health", keywords="clinical decision support tool", keywords="primary care", keywords="SGLT2 inhibitor", keywords="complication", 
keywords="tool", keywords="digital health intervention", keywords="thematic analysis", keywords="decision support", keywords="diabetes management", abstract="Background: Primary care plays a key role in the management of type 2 diabetes. Sodium-glucose cotransporter 2 (SGLT2) inhibitors have been demonstrated to reduce hospitalization and cardiac and renal complications. Tools that optimize management, including appropriate prescribing, are a priority for treating chronic diseases. Future Health Today (FHT) is software that facilitates clinical decision support and quality improvement. FHT applies algorithms to data stored in electronic medical records in general practice to identify patients who are at risk of a chronic disease or who have a chronic disease that may benefit from intensification of management. The platform continues to evolve because of rigorous evaluation, continuous improvement, and expansion of the conditions hosted on the platform. FHT currently displays recommendations for the identification and management of chronic kidney disease, cardiovascular disease, type 2 diabetes, and cancer risk. A new module will be introduced to FHT focusing on SGLT2 inhibitors in patients with type 2 diabetes who have chronic kidney diseases, cardiovascular diseases, or risk factors for cardiovascular disease. Objective: The study aims to explore the barriers and enablers to the implementation of an SGLT2 inhibitor module within the Future Health Today software. Methods: Clinic staff were recruited to participate in interviews on their experience in their use of a tool to improve prescribing behavior for SGLT2 inhibitors. Thematic analysis was guided by Clinical Performance Feedback Intervention Theory. Results: In total, 16 interviews were completed. Identified enablers of use included workflow alignment, clinical appropriateness, and active delivery of the module. Key barriers to use were competing priorities, staff engagement, and knowledge of the clinical topic. Conclusions: There is a recognized benefit to the use of a clinical decision support tool to support type 2 diabetes management, but barriers were identified that impeded the usability and actionability of the module. Successful and effective implementation of this tool could support the optimization of patient management of type 2 diabetes in primary care. 
", doi="10.2196/50737", url="https://formative.jmir.org/2024/1/e50737", url="http://www.ncbi.nlm.nih.gov/pubmed/38477973" } @Article{info:doi/10.2196/54593, author="Chomutare, Taridzo and Lamproudis, Anastasios and Budrionis, Andrius and Svenning, Olsen Therese and Hind, Irene Lill and Ngo, Dinh Phuong and Mikalsen, {\O}yvind Karl and Dalianis, Hercules", title="Improving Quality of ICD-10 (International Statistical Classification of Diseases, Tenth Revision) Coding Using AI: Protocol for a Crossover Randomized Controlled Trial", journal="JMIR Res Protoc", year="2024", month="Mar", day="12", volume="13", pages="e54593", keywords="International Classification of Diseases, Tenth Revision", keywords="ICD-10", keywords="International Classification of Diseases, Eleventh Revision", keywords="ICD-11", keywords="Easy-ICD", keywords="clinical coding", keywords="artificial intelligence", keywords="machine learning", keywords="deep learning", abstract="Background: Computer-assisted clinical coding (CAC) tools are designed to help clinical coders assign standardized codes, such as the ICD-10 (International Statistical Classification of Diseases, Tenth Revision), to clinical texts, such as discharge summaries. Maintaining the integrity of these standardized codes is important both for the functioning of health systems and for ensuring data used for secondary purposes are of high quality. Clinical coding is an error-prone cumbersome task, and the complexity of modern classification systems such as the ICD-11 (International Classification of Diseases, Eleventh Revision) presents significant barriers to implementation. To date, there have only been a few user studies; therefore, our understanding is still limited regarding the role CAC systems can play in reducing the burden of coding and improving the overall quality of coding. Objective: The objective of the user study is to generate both qualitative and quantitative data for measuring the usefulness of a CAC system, Easy-ICD, that was developed for recommending ICD-10 codes. Specifically, our goal is to assess whether our tool can reduce the burden on clinical coders and also improve coding quality. Methods: The user study is based on a crossover randomized controlled trial study design, where we measure the performance of clinical coders when they use our CAC tool versus when they do not. Performance is measured by the time it takes them to assign codes to both simple and complex clinical texts as well as the coding quality, that is, the accuracy of code assignment. Results: We expect the study to provide us with a measurement of the effectiveness of the CAC system compared to manual coding processes, both in terms of time use and coding quality. Positive outcomes from this study will imply that CAC tools hold the potential to reduce the burden on health care staff and will have major implications for the adoption of artificial intelligence--based CAC innovations to improve coding practice. Expected results to be published summer 2024. Conclusions: The planned user study promises a greater understanding of the impact CAC systems might have on clinical coding in real-life settings, especially with regard to coding time and quality. Further, the study may add new insights on how to meaningfully exploit current clinical text mining capabilities, with a view to reducing the burden on clinical coders, thus lowering the barriers and paving a more sustainable path to the adoption of modern coding systems, such as the new ICD-11. 
Trial Registration: clinicaltrials.gov NCT06286865; https://clinicaltrials.gov/study/NCT06286865 International Registered Report Identifier (IRRID): DERR1-10.2196/54593 ", doi="10.2196/54593", url="https://www.researchprotocols.org/2024/1/e54593", url="http://www.ncbi.nlm.nih.gov/pubmed/38470476" } @Article{info:doi/10.2196/47803, author="Chuang, Bo-Sheng Beau and Yang, C. Albert", title="Optimization of Using Multiple Machine Learning Approaches in Atrial Fibrillation Detection Based on a Large-Scale Data Set of 12-Lead Electrocardiograms: Cross-Sectional Study", journal="JMIR Form Res", year="2024", month="Mar", day="11", volume="8", pages="e47803", keywords="machine learning", keywords="atrial fibrillation", keywords="light gradient boosting machine", keywords="power spectral density", keywords="digital health", keywords="electrocardiogram", keywords="machine learning algorithm", keywords="atrial fibrillation detection", keywords="real-time", keywords="detection", keywords="electrocardiography leads", keywords="clinical outcome", abstract="Background: Atrial fibrillation (AF) represents a hazardous cardiac arrhythmia that significantly elevates the risk of stroke and heart failure. Despite its severity, its diagnosis largely relies on the proficiency of health care professionals. At present, the real-time identification of paroxysmal AF is hindered by the lack of automated techniques. Consequently, a highly effective machine learning algorithm specifically designed for AF detection could offer substantial clinical benefits. We hypothesized that machine learning algorithms have the potential to identify and extract features of AF with a high degree of accuracy, given the intricate and distinctive patterns present in electrocardiogram (ECG) recordings of AF. Objective: This study aims to develop a clinically valuable machine learning algorithm that can accurately detect AF and compare different leads' performances of AF detection. Methods: We used 12-lead ECG recordings sourced from the 2020 PhysioNet Challenge data sets. The Welch method was used to extract power spectral features of the 12-lead ECGs within a frequency range of 0.083 to 24.92 Hz. Subsequently, various machine learning techniques were evaluated and optimized to classify sinus rhythm (SR) and AF based on these power spectral features. Furthermore, we compared the effects of different frequency subbands and different lead selections on machine learning performances. Results: The light gradient boosting machine (LightGBM) was found to be the most effective in classifying AF and SR, achieving an average F1-score of 0.988 across all ECG leads. Among the frequency subbands, the 0.083 to 4.92 Hz range yielded the highest F1-score of 0.985. In interlead comparisons, aVR had the highest performance (F1=0.993), with minimal differences observed between leads. Conclusions: In conclusion, this study successfully used machine learning methodologies, particularly the LightGBM model, to differentiate SR and AF based on power spectral features derived from 12-lead ECGs. The performance marked by an average F1-score of 0.988 and minimal interlead variation underscores the potential of machine learning algorithms to bolster real-time AF detection. This advancement could significantly improve patient care in intensive care units as well as facilitate remote monitoring through wearable devices, ultimately enhancing clinical outcomes. 
", doi="10.2196/47803", url="https://formative.jmir.org/2024/1/e47803", url="http://www.ncbi.nlm.nih.gov/pubmed/38466973" } @Article{info:doi/10.2196/52744, author="Nair, Monika and Lundgren, E. Lina and Soliman, Amira and Dryselius, Petra and Fogelberg, Ebba and Petersson, Marcus and Hamed, Omar and Triantafyllou, Miltiadis and Nygren, Jens", title="Machine Learning Model for Readmission Prediction of Patients With Heart Failure Based on Electronic Health Records: Protocol for a Quasi-Experimental Study for Impact Assessment", journal="JMIR Res Protoc", year="2024", month="Mar", day="11", volume="13", pages="e52744", keywords="artificial intelligence", keywords="machine learning", keywords="readmission prediction", keywords="heart failure", keywords="clinical decision support", keywords="machine learning model", keywords="CHF", keywords="congestive heart failure", keywords="readmission", keywords="prediction", keywords="electronic health records", keywords="electronic health record", keywords="EHR", keywords="quasi-experimental study", keywords="decision-making process", keywords="risk assessment", keywords="risk assessment tool", keywords="predictive models", keywords="predictive model", keywords="Sweden", keywords="physician", keywords="nurse", keywords="nurses", keywords="clinician", keywords="clinicians", abstract="Background: Care for patients with heart failure (HF) causes a substantial load on health care systems where a prominent challenge is the elevated rate of readmissions within 30 days following initial discharge. Clinical professionals face high levels of uncertainty and subjectivity in the decision-making process on the optimal timing of discharge. Unwanted hospital stays generate costs and cause stress to patients and potentially have an impact on care outcomes. Recent studies have aimed to mitigate the uncertainty by developing and testing risk assessment tools and predictive models to identify patients at risk of readmission, often using novel methods such as machine learning (ML). Objective: This study aims to investigate how a developed clinical decision support (CDS) tool alters the decision-making processes of health care professionals in the specific context of discharging patients with HF, and if so, in which ways. Additionally, the aim is to capture the experiences of health care practitioners as they engage with the system's outputs to analyze usability aspects and obtain insights related to future implementation. Methods: A quasi-experimental design with randomized crossover assessment will be conducted with health care professionals on HF patients' scenarios in a region located in the South of Sweden. In total, 12 physicians and nurses will be randomized into control and test groups. The groups shall be provided with 20 scenarios of purposefully sampled patients. The clinicians will be asked to take decisions on the next action regarding a patient. The test group will be provided with the 10 scenarios containing patient data from electronic health records and an outcome from an ML-based CDS model on the risk level for readmission of the same patients. The control group will have 10 other scenarios without the CDS model output and containing only the patients' data from electronic medical records. The groups will switch roles for the next 10 scenarios. This study will collect data through interviews and observations. 
The key outcome measures are decision consistency, decision quality, work efficiency, perceived benefits of using the CDS model, reliability, validity, and confidence in the CDS model outcome, integrability in the routine workflow, ease of use, and intention to use. This study will be carried out in collaboration with Cambio Healthcare Systems. Results: The project is part of the Center for Applied Intelligent Systems Research Health research profile, funded by the Knowledge Foundation (2021-2028). Ethical approval for this study was granted by the Swedish ethical review authority (2022-07287-02). The recruitment process of the clinicians and the patient scenario selection will start in September 2023 and last till March 2024. Conclusions: This study protocol will contribute to the development of future formative evaluation studies to test ML models with clinical professionals. International Registered Report Identifier (IRRID): PRR1-10.2196/52744 ", doi="10.2196/52744", url="https://www.researchprotocols.org/2024/1/e52744", url="http://www.ncbi.nlm.nih.gov/pubmed/38466983" } @Article{info:doi/10.2196/53008, author="Chen, Yan and Esmaeilzadeh, Pouyan", title="Generative AI in Medical Practice: In-Depth Exploration of Privacy and Security Challenges", journal="J Med Internet Res", year="2024", month="Mar", day="8", volume="26", pages="e53008", keywords="artificial intelligence", keywords="AI", keywords="generative artificial intelligence", keywords="generative AI", keywords="medical practices", keywords="potential benefits", keywords="security and privacy threats", doi="10.2196/53008", url="https://www.jmir.org/2024/1/e53008", url="http://www.ncbi.nlm.nih.gov/pubmed/38457208" } @Article{info:doi/10.2196/57012, author="Auron, Moises", title="Blood Management: A Current Opportunity in Perioperative Medicine", journal="JMIR Perioper Med", year="2024", month="Mar", day="8", volume="7", pages="e57012", keywords="blood management", keywords="perioperative", keywords="anemia", keywords="plasma", keywords="transfusion", doi="10.2196/57012", url="https://periop.jmir.org/2024/1/e57012", url="http://www.ncbi.nlm.nih.gov/pubmed/38457232" } @Article{info:doi/10.2196/45202, author="Kim, Dohyun and Choi, Hyun-Soo and Lee, DongHoon and Kim, Minkyu and Kim, Yoon and Han, Seon-Sook and Heo, Yeonjeong and Park, Ju-Hee and Park, Jinkyeong", title="A Deep Learning--Based Approach for Prediction of Vancomycin Treatment Monitoring: Retrospective Study Among Patients With Critical Illness", journal="JMIR Form Res", year="2024", month="Mar", day="8", volume="8", pages="e45202", keywords="critically ill", keywords="deep learning", keywords="inflammation", keywords="machine learning", keywords="pharmacokinetic", keywords="therapeutic drug monitoring", keywords="vancomycin", abstract="Background: Vancomycin pharmacokinetics are highly variable in patients with critical illnesses, and clinicians commonly use population pharmacokinetic (PPK) models based on a Bayesian approach to dose. However, these models are population-dependent, may only sometimes meet the needs of individual patients, and are only used by experienced clinicians as a reference for making treatment decisions. To assist real-world clinicians, we developed a deep learning--based decision-making system that predicts vancomycin therapeutic drug monitoring (TDM) levels in patients in intensive care unit. 
Objective: This study aimed to establish joint multilayer perceptron (JointMLP), a new deep-learning model for predicting vancomycin TDM levels, and compare its performance with the PPK models, extreme gradient boosting (XGBoost), and TabNet. Methods: We used a 977-case data set split into training and testing groups in a 9:1 ratio. We performed external validation of the model using 1429 cases from Kangwon National University Hospital and 2394 cases from the Medical Information Mart for Intensive Care--IV (MIMIC-IV). In addition, we performed 10-fold cross-validation on the internal training data set and calculated the 95\% CIs using the metric. Finally, we evaluated the generalization ability of the JointMLP model using the MIMIC-IV data set. Results: Our JointMLP model outperformed other models in predicting vancomycin TDM levels in internal and external data sets. Compared to PPK, the JointMLP model improved predictive power by up to 31\% (mean absolute error [MAE] 6.68 vs 5.11) on the internal data set and 81\% (MAE 11.87 vs 6.56) on the external data set. In addition, the JointMLP model significantly outperforms XGBoost and TabNet, with a 13\% (MAE 5.75 vs 5.11) and 14\% (MAE 5.85 vs 5.11) improvement in predictive accuracy on the inner data set, respectively. On both the internal and external data sets, our JointMLP model performed well compared to XGBoost and TabNet, achieving prediction accuracy improvements of 34\% and 14\%, respectively. Additionally, our JointMLP model showed higher robustness to outlier data than the other models, as evidenced by its higher root mean squared error performance across all data sets. The mean errors and variances of the JointMLP model were close to zero and smaller than those of the PPK model in internal and external data sets. Conclusions: Our JointMLP approach can help optimize treatment outcomes in patients with critical illnesses in an intensive care unit setting, reducing side effects associated with suboptimal vancomycin administration. These include increased risk of bacterial resistance, extended hospital stays, and increased health care costs. In addition, the superior performance of our model compared to existing models highlights its potential to help real-world clinicians. ", doi="10.2196/45202", url="https://formative.jmir.org/2024/1/e45202", url="http://www.ncbi.nlm.nih.gov/pubmed/38152042" } @Article{info:doi/10.2196/46817, author="Tenda, Daniel Eric and Yunus, Eddy Reyhan and Zulkarnaen, Benny and Yugo, Reynalzi Muhammad and Pitoyo, Wicaksono Ceva and Asaf, Mazmur Moses and Islamiyati, Nur Tiara and Pujitresnani, Arierta and Setiadharma, Andry and Henrina, Joshua and Rumende, Martin Cleopas and Wulani, Vally and Harimurti, Kuntjoro and Lydia, Aida and Shatri, Hamzah and Soewondo, Pradana and Yusuf, Astagiri Prasandhya", title="Comparison of the Discrimination Performance of AI Scoring and the Brixia Score in Predicting COVID-19 Severity on Chest X-Ray Imaging: Diagnostic Accuracy Study", journal="JMIR Form Res", year="2024", month="Mar", day="7", volume="8", pages="e46817", keywords="artificial intelligence", keywords="Brixia", keywords="chest x-ray", keywords="COVID-19", keywords="CAD4COVID", keywords="pneumonia", keywords="radiograph", keywords="artificial intelligence scoring system", keywords="AI scoring system", keywords="prediction", keywords="disease severity", abstract="Background: The artificial intelligence (AI) analysis of chest x-rays can increase the precision of binary COVID-19 diagnosis. 
However, it is unknown if AI-based chest x-rays can predict who will develop severe COVID-19, especially in low- and middle-income countries. Objective: The study aims to compare the performance of human radiologist Brixia scores versus 2 AI scoring systems in predicting the severity of COVID-19 pneumonia. Methods: We performed a cross-sectional study of 300 patients with suspected or confirmed COVID-19 infection in Jakarta, Indonesia. A total of 2 AI scores were generated using CAD4COVID x-ray software. Results: The AI probability score had slightly lower discrimination than the other two scores (area under the curve [AUC] 0.787, 95\% CI 0.722-0.852). The AI score for the affected lung area (AUC 0.857, 95\% CI 0.809-0.905) was almost as good as the human Brixia score (AUC 0.863, 95\% CI 0.818-0.908). Conclusions: The AI score for the affected lung area and the human radiologist Brixia score had similar and good discrimination performance in predicting COVID-19 severity. Our study demonstrated that using AI-based diagnostic tools is possible, even in low-resource settings. However, before it is widely adopted in daily practice, larger, prospective studies are needed to confirm our findings. ", doi="10.2196/46817", url="https://formative.jmir.org/2024/1/e46817", url="http://www.ncbi.nlm.nih.gov/pubmed/38451633" } @Article{info:doi/10.2196/47744, author="Ru, Boshu and Sillah, Arthur and Desai, Kaushal and Chandwani, Sheenu and Yao, Lixia and Kothari, Smita", title="Real-World Data Quality Framework for Oncology Time to Treatment Discontinuation Use Case: Implementation and Evaluation Study", journal="JMIR Med Inform", year="2024", month="Mar", day="6", volume="12", pages="e47744", keywords="data quality assessment", keywords="real-world data", keywords="real-world time to treatment discontinuation", keywords="systemic anticancer therapy", keywords="Use Case Specific Relevance and Quality Assessment", keywords="UReQA framework", abstract="Background: The importance of real-world evidence is widely recognized in observational oncology studies. However, the lack of interoperable data quality standards in the fragmented health information technology landscape represents an important challenge. Therefore, adopting validated systematic methods for evaluating data quality is important for oncology outcomes research leveraging real-world data (RWD). Objective: This study aims to implement real-world time to treatment discontinuation (rwTTD) for a systemic anticancer therapy (SACT) as a new use case for the Use Case Specific Relevance and Quality Assessment, a framework linking data quality and relevance in fit-for-purpose RWD assessment. Methods: To define the rwTTD use case, we mapped the operational definition of rwTTD to RWD elements commonly available from oncology electronic health record--derived data sets. We identified 20 tasks to check the completeness and plausibility of data elements concerning SACT use, line of therapy (LOT), death date, and length of follow-up. Using descriptive statistics, we illustrated how to implement the Use Case Specific Relevance and Quality Assessment on 2 oncology databases (Data sets A and B) to estimate the rwTTD of an SACT drug (target SACT) for patients with advanced head and neck cancer diagnosed on or after January 1, 2015. 
Results: A total of 1200 (24.96\%) of 4808 patients in Data set A and 237 (5.92\%) of 4003 patients in Data set B received the target SACT, suggesting better relevance of the former in estimating the rwTTD of the target SACT. The 2 data sets differed with regard to the terminology used for SACT drugs, LOT format, and target SACT LOT distribution over time. Data set B appeared to have less complete SACT records, longer lags in incorporating the latest data, and incomplete mortality data, suggesting a lack of fitness for estimating rwTTD. Conclusions: The fit-for-purpose data quality assessment demonstrated substantial variability in the quality of the 2 real-world data sets. The data quality specifications applied for rwTTD estimation can be expanded to support a broad spectrum of oncology use cases. ", doi="10.2196/47744", url="https://medinform.jmir.org/2024/1/e47744", url="http://www.ncbi.nlm.nih.gov/pubmed/38446504" } @Article{info:doi/10.2196/45130, author="Yang, C. Phillip and Jha, Alokkumar and Xu, William and Song, Zitao and Jamp, Patrick and Teuteberg, J. Jeffrey", title="Cloud-Based Machine Learning Platform to Predict Clinical Outcomes at Home for Patients With Cardiovascular Conditions Discharged From Hospital: Clinical Trial", journal="JMIR Cardio", year="2024", month="Mar", day="1", volume="8", pages="e45130", keywords="smart sensor", keywords="wearable technology", keywords="moving average", keywords="physical activity", keywords="artificial intelligence", keywords="AI", abstract="Background: Hospitalizations account for almost one-third of the US \$4.1 trillion health care cost in the United States. A substantial portion of these hospitalizations are attributed to readmissions, which led to the establishment of the Hospital Readmissions Reduction Program (HRRP) in 2012. The HRRP reduces payments to hospitals with excess readmissions. In 2018, >US \$700 million was withheld; this is expected to exceed US \$1 billion by 2022. More importantly, there is nothing more physically and emotionally taxing for readmitted patients and demoralizing for hospital physicians, nurses, and administrators. Given this high uncertainty of proper home recovery, intelligent monitoring is needed to predict the outcome of discharged patients to reduce readmissions. Physical activity (PA) is one of the major determinants for overall clinical outcomes in diabetes, hypertension, hyperlipidemia, heart failure, cancer, and mental health issues. These are the exact comorbidities that increase readmission rates, underlining the importance of PA in assessing the recovery of patients by quantitative measurement beyond the questionnaire and survey methods. Objective: This study aims to develop a remote, low-cost, and cloud-based machine learning (ML) platform to enable the precision health monitoring of PA, which may fundamentally alter the delivery of home health care. To validate this technology, we conducted a clinical trial to test the ability of our platform to predict clinical outcomes in discharged patients. Methods: Our platform consists of a wearable device, which includes an accelerometer and a Bluetooth sensor, and an iPhone connected to our cloud-based ML interface to analyze PA remotely and predict clinical outcomes. This system was deployed at a skilled nursing facility where we collected >17,000 person-day data points over 2 years, generating a solid training database. 
We used these data to train our extreme gradient boosting (XGBoost)--based ML environment to conduct a clinical trial, Activity Assessment of Patients Discharged from Hospital-I, to test the hypothesis that a comprehensive profile of PA would predict clinical outcome. We developed an advanced data-driven analytic platform that predicts the clinical outcome based on accurate measurements of PA. Artificial intelligence or an ML algorithm was used to analyze the data to predict short-term health outcome. Results: We enrolled 52 patients discharged from Stanford Hospital. Our data demonstrated a robust predictive system to forecast health outcome in the enrolled patients based on their PA data. We achieved precise prediction of the patients' clinical outcomes with a sensitivity of 87\%, a specificity of 79\%, and an accuracy of 85\%. Conclusions: To date, there are no reliable clinical data, using a wearable device, regarding monitoring discharged patients to predict their recovery. We conducted a clinical trial to assess outcome data rigorously to be used reliably for remote home care by patients, health care professionals, and caretakers. ", doi="10.2196/45130", url="https://cardio.jmir.org/2024/1/e45130", url="http://www.ncbi.nlm.nih.gov/pubmed/38427393" } @Article{info:doi/10.2196/49022, author="Bhargava, Hansa and Salomon, Carmela and Suresh, Srinivasan and Chang, Anthony and Kilian, Rachel and Stijn, van Diana and Oriol, Albert and Low, Daniel and Knebel, Ashley and Taraman, Sharief", title="Promises, Pitfalls, and Clinical Applications of Artificial Intelligence in Pediatrics", journal="J Med Internet Res", year="2024", month="Feb", day="29", volume="26", pages="e49022", keywords="artificial intelligence", keywords="pediatrics", keywords="autism spectrum disorder", keywords="ASD", keywords="disparities", keywords="pediatric", keywords="youth", keywords="child", keywords="children", keywords="autism", keywords="autistic", keywords="barrier", keywords="barriers", keywords="clinical application", keywords="clinical applications", keywords="professional development", keywords="continuing education", keywords="continuing medical education", keywords="CME", keywords="implementation", doi="10.2196/49022", url="https://www.jmir.org/2024/1/e49022", url="http://www.ncbi.nlm.nih.gov/pubmed/38421690" } @Article{info:doi/10.2196/51326, author="{\.I}lhanl{\i}, Nevruz and Park, Yoon Se and Kim, Jaewoong and Ryu, An Jee and Yard{\i}mc{\i}, Ahmet and Yoon, Dukyong", title="Prediction of Antibiotic Resistance in Patients With a Urinary Tract Infection: Algorithm Development and Validation", journal="JMIR Med Inform", year="2024", month="Feb", day="29", volume="12", pages="e51326", keywords="antibiotic resistance", keywords="machine learning", keywords="urinary tract infections", keywords="UTI", keywords="decision support", abstract="Background: The early prediction of antibiotic resistance in patients with a urinary tract infection (UTI) is important to guide appropriate antibiotic therapy selection. Objective: In this study, we aimed to predict antibiotic resistance in patients with a UTI. Additionally, we aimed to interpret the machine learning models we developed. Methods: The electronic medical records of patients who were admitted to Yongin Severance Hospital, South Korea were used. A total of 71 features extracted from patients' admission, diagnosis, prescription, and microbiology records were used for classification. 
UTI pathogens were classified as either sensitive or resistant to cephalosporin, piperacillin-tazobactam (TZP), carbapenem, trimethoprim-sulfamethoxazole (TMP-SMX), and fluoroquinolone. To analyze how each variable contributed to the machine learning model's predictions of antibiotic resistance, we used the Shapley Additive Explanations method. Finally, a prototype machine learning--based clinical decision support system was proposed to provide clinicians the resistance probabilities for each antibiotic. Results: The data set included 3535, 737, 708, 1582, and 1365 samples for cephalosporin, TZP, TMP-SMX, fluoroquinolone, and carbapenem resistance prediction models, respectively. The area under the receiver operating characteristic curve values of the random forest models were 0.777 (95\% CI 0.775-0.779), 0.864 (95\% CI 0.862-0.867), 0.877 (95\% CI 0.874-0.880), 0.881 (95\% CI 0.879-0.882), and 0.884 (95\% CI 0.884-0.885) in the training set and 0.638 (95\% CI 0.635-0.642), 0.630 (95\% CI 0.626-0.634), 0.665 (95\% CI 0.659-0.671), 0.670 (95\% CI 0.666-0.673), and 0.721 (95\% CI 0.718-0.724) in the test set for predicting resistance to cephalosporin, TZP, carbapenem, TMP-SMX, and fluoroquinolone, respectively. The number of previous visits, first culture after admission, chronic lower respiratory diseases, administration of drugs before infection, and exposure time to these drugs were found to be important variables for predicting antibiotic resistance. Conclusions: The study results demonstrated the potential of machine learning to predict antibiotic resistance in patients with a UTI. Machine learning can assist clinicians in making decisions regarding the selection of appropriate antibiotic therapy in patients with a UTI. ", doi="10.2196/51326", url="https://medinform.jmir.org/2024/1/e51326", url="http://www.ncbi.nlm.nih.gov/pubmed/38421718" } @Article{info:doi/10.2196/45126, author="Nguyen, B. Thy and Weitzel, Nathaen and Hogan, Craig and Kacmar, M. Rachel and Williamson, M. Kayla and Pattee, Jack and Jevtovic-Todorovic, Vesna and Simmons, G. Colby and Faruki, Ahmad Adeel", title="Comparing Anesthesia and Surgery Controlled Time for Primary Total Knee and Hip Arthroplasty Between an Academic Medical Center and a Community Hospital: Retrospective Cohort Study", journal="JMIR Perioper Med", year="2024", month="Feb", day="26", volume="7", pages="e45126", keywords="anesthesia controlled time", keywords="surgery-controlled time", keywords="total joint arthroplasty", keywords="healthcare operations", keywords="efficiency", keywords="total joint replacement", keywords="knee", keywords="hip", keywords="arthroplasty", keywords="anesthesia", keywords="surgery", keywords="surgical duration", keywords="community hospital", keywords="surgeon", keywords="reliability", keywords="operating room", keywords="anesthesiology", keywords="orthopedics", keywords="perioperative", keywords="medicine", abstract="Background: Osteoarthritis is a significant cause of disability, resulting in increased joint replacement surgeries and health care costs. Establishing benchmarks that more accurately predict surgical duration could help to decrease costs, maximize efficiency, and improve patient experience. We compared the anesthesia-controlled time (ACT) and surgery-controlled time (SCT) of primary total knee (TKA) and total hip arthroplasties (THA) between an academic medical center (AMC) and a community hospital (CH) for 2 orthopedic surgeons. 
Objective: This study aims to validate and compare benchmarking times for ACT and SCT in a single patient population at both an AMC and a CH. Methods: This retrospective 2-center observational cohort study was conducted at the University of Colorado Hospital (AMC) and UCHealth Broomfield Hospital (CH). Cases with current procedural terminology codes for THA and TKA between January 1, 2019, and December 31, 2020, were assessed. Cases with missing data were excluded. The primary outcomes were ACT and SCT. Primary outcomes were tested for association with covariates of interest. The primary covariate of interest was the location of the procedure (CH vs AMC); secondary covariates of interest included the American Society of Anesthesiologists (ASA) classification and anesthetic type. Linear regression models were used to assess the relationships. Results: Two surgeons performed 1256 cases at the AMC and CH. A total of 10 THA cases and 12 TKA cases were excluded due to missing data. After controlling for surgeon, the ACT was greater at the AMC for THA by 3.77 minutes and for TKA by 3.58 minutes (P<.001). SCT was greater at the AMC for THA by 11.14 minutes and for TKA by 14.04 minutes (P<.001). ASA III/IV classification increased ACT for THA by 3.76 minutes (P<.001) and increased SCT for THA by 6.33 minutes after controlling for surgeon and location (P=.008). General anesthesia use was higher at the AMC for both THA (29.2\% vs 7.3\%) and TKA (23.8\% vs 4.2\%). No statistically significant association was observed between either ACT or SCT and anesthetic type (neuraxial or general) after adjusting for surgeon and location (all P>.05). Conclusions: We observed lower ACT and SCT at the CH for both TKA and THA after controlling for the surgeon of record and ASA classification. These findings underscore the efficiency advantages of performing primary joint replacements at the CH, showcasing an average reduction of 16 minutes in SCT and 4 minutes in ACT per case. Overall, establishing more accurate benchmarks to improve the prediction of surgical duration for THA and TKA in different perioperative environments can increase the reliability of surgical duration predictions and optimize scheduling. Future studies with study populations at multiple community hospitals and academic medical centers are needed before extrapolating these findings. ", doi="10.2196/45126", url="https://periop.jmir.org/2024/1/e45126", url="http://www.ncbi.nlm.nih.gov/pubmed/38407957" } @Article{info:doi/10.2196/51002, author="Cheng, Hao-Shen and You, Weir-Chiang and Chen, Ni-Wei and Hsieh, Mu-Chih and Tsai, Che-Fu and Ho, Chia-Jing and Chen, Chien-Chih", title="Enhancing the Efficiency of a Radiation Oncology Department Using Electronic Medical Records: Protocol for Preparing Radiotherapy", journal="JMIR Res Protoc", year="2024", month="Feb", day="23", volume="13", pages="e51002", keywords="efficiency", keywords="electronic medical records", keywords="Hospital Information System", keywords="protocol", keywords="radiation oncology", abstract="Background: Electronic medical records (EMRs) streamline medical processes, improve quality control, and facilitate data sharing among hospital departments. They also reduce maintenance costs and storage space needed for paper records, while saving time and providing structured data for future research. 
Objective: This study aimed to investigate whether the integration of the radiation oncology information system and the hospital information system enhances the efficiency of the department of radiation oncology. Methods: We held multidisciplinary discussions among physicians, physicists, medical radiation technologists, nurses, and engineers. We integrated paper records from the radiation oncology department into the existing hospital information system within the hospital. A new electronic interface was designed. A comparison was made between the time taken to retrieve information from either the paper records or the EMRs for radiation preparation. A total of 30 cases were randomly allocated in both the old paper-based system and the new EMR system. The time spent was calculated manually at every step during the process, and we performed an independent 1-tailed t test to evaluate the difference between the 2 systems. Results: Since the system was launched in August 2020, more than 1000 medical records have been entered into the system, and this figure continues to increase. The total time needed for the radiation preparation process was reduced from 286.8 minutes to 154.3 minutes (P<.001)---a reduction of 46.2\%. There was no longer any need to arrange for a nurse to organize the radiotherapy paper records, saving a workload of 16 hours per month. Conclusions: The implementation of the integrated EMR system has resulted in a significant reduction in the number of steps involved in radiotherapy preparation, as well as a decrease in the amount of time required for the process. The new EMR system has provided numerous benefits for the department, including a decrease in workload, a simplified workflow, and conserving more patient data within a confined space. ", doi="10.2196/51002", url="https://www.researchprotocols.org/2024/1/e51002", url="http://www.ncbi.nlm.nih.gov/pubmed/38393753" } @Article{info:doi/10.2196/52155, author="Kumar, Ajay and Burr, Pierce and Young, Michael Tim", title="Using AI Text-to-Image Generation to Create Novel Illustrations for Medical Education: Current Limitations as Illustrated by Hypothyroidism and Horner Syndrome", journal="JMIR Med Educ", year="2024", month="Feb", day="22", volume="10", pages="e52155", keywords="artificial intelligence", keywords="AI", keywords="medical illustration", keywords="medical images", keywords="medical education", keywords="image", keywords="images", keywords="illustration", keywords="illustrations", keywords="photo", keywords="photos", keywords="photographs", keywords="face", keywords="facial", keywords="paralysis", keywords="photograph", keywords="photography", keywords="Horner's syndrome", keywords="Horner syndrome", keywords="Bernard syndrome", keywords="Bernard's syndrome", keywords="miosis", keywords="oculosympathetic", keywords="ptosis", keywords="ophthalmoplegia", keywords="nervous system", keywords="autonomic", keywords="eye", keywords="eyes", keywords="pupil", keywords="pupils", keywords="neurologic", keywords="neurological", doi="10.2196/52155", url="https://mededu.jmir.org/2024/1/e52155", url="http://www.ncbi.nlm.nih.gov/pubmed/38386400" } @Article{info:doi/10.2196/48445, author="{\O}stervang, Christina and Jensen, Myhre Charlotte and Coyne, Elisabeth and Dieperink, B. 
Karin and Lassen, Annmarie", title="Usability and Evaluation of a Health Information System in the Emergency Department: Mixed Methods Study", journal="JMIR Hum Factors", year="2024", month="Feb", day="21", volume="11", pages="e48445", keywords="consumer", keywords="eHealth", keywords="elderly", keywords="emergency department", keywords="emergency", keywords="family members", keywords="healthcare professionals", keywords="information system", keywords="mixed methods research: patients", keywords="qualitative interview", keywords="questionnaire", keywords="technology", keywords="usability", keywords="usable", abstract="Background: A lack of information during an emergency visit leads to the experience of powerlessness for patients and their family members, who may also feel unprepared to cope with acute symptoms. The ever-changing nature and fast-paced workflow in the emergency department (ED) often affect how health care professionals can tailor information and communication to the needs of the patient. Objective: This study aimed to evaluate the usability and experience of a newly developed information system. The system was developed together with patients and their family members to help provide the information needed in the ED. Methods: We conducted a mixed methods study consisting of quantitative data obtained from the System Usability Scale questionnaire and qualitative interview data obtained from purposively selected participants included in the quantitative part of the study. Results: A total of 106 patients and 14 family members (N=120) answered the questionnaire. A total of 10 patients and 3 family members participated in the interviews. Based on the System Usability Scale score, the information system was rated close to excellent, with a mean score of 83.6 (SD 12.8). Most of the participants found the information system easy to use and would like to use it again. The participants reported that the system helped them feel in control, and the information was useful. Simplifications were needed to improve the user experience for the older individuals. Conclusions: This study demonstrates that the usability of the information system is rated close to excellent. It was perceived to be useful as it enabled understanding and predictability of the patient's trajectory in the ED. Areas for improvement include making the system more usable by older individuals. The study provides an example of how a technological solution can be used to diminish the information gap in an ED context. ", doi="10.2196/48445", url="https://humanfactors.jmir.org/2024/1/e48445", url="http://www.ncbi.nlm.nih.gov/pubmed/38381502" } @Article{info:doi/10.2196/51727, author="Hashtarkhani, Soheil and Schwartz, L. David and Shaban-Nejad, Arash", title="Enhancing Health Care Accessibility and Equity Through a Geoprocessing Toolbox for Spatial Accessibility Analysis: Development and Case Study", journal="JMIR Form Res", year="2024", month="Feb", day="21", volume="8", pages="e51727", keywords="geographical information system", keywords="geoprocessing tool", keywords="health disparities", keywords="health equity", keywords="health services management", keywords="hemodialysis services", keywords="spatial accessibility", abstract="Background: Access to health care services is a critical determinant of population health and well-being. Measuring spatial accessibility to health services is essential for understanding health care distribution and addressing potential inequities. 
Objective: In this study, we developed a geoprocessing toolbox including Python script tools for the ArcGIS Pro environment to measure the spatial accessibility of health services using both classic and enhanced versions of the 2-step floating catchment area method. Methods: Each of our tools incorporated both distance buffers and travel time catchments to calculate accessibility scores based on users' choices. Additionally, we developed a separate tool to create travel time catchments that is compatible with both locally available network data sets and ArcGIS Online data sources. We conducted a case study focusing on the accessibility of hemodialysis services in the state of Tennessee using the 4 versions of the accessibility tools. Notably, the calculation of the target population considered age as a significant nonspatial factor influencing hemodialysis service accessibility. Weighted populations were calculated using end-stage renal disease incidence rates in different age groups. Results: The implemented tools are made accessible through ArcGIS Online for free use by the research community. The case study revealed disparities in the accessibility of hemodialysis services, with urban areas demonstrating higher scores compared to rural and suburban regions. Conclusions: These geoprocessing tools can serve as valuable decision-support resources for health care providers, organizations, and policy makers to improve equitable access to health care services. This comprehensive approach to measuring spatial accessibility can empower health care stakeholders to address health care distribution challenges effectively. ", doi="10.2196/51727", url="https://formative.jmir.org/2024/1/e51727", url="http://www.ncbi.nlm.nih.gov/pubmed/38381503" } @Article{info:doi/10.2196/51473, author="Lee, Hojae and Cho, Ki Joong and Park, Jaeyu and Lee, Hyeri and Fond, Guillaume and Boyer, Laurent and Kim, Jin Hyeon and Park, Seoyoung and Cho, Wonyoung and Lee, Hayeon and Lee, Jinseok and Yon, Keon Dong", title="Machine Learning--Based Prediction of Suicidality in Adolescents With Allergic Rhinitis: Derivation and Validation in 2 Independent Nationwide Cohorts", journal="J Med Internet Res", year="2024", month="Feb", day="14", volume="26", pages="e51473", keywords="machine learning", keywords="allergic rhinitis", keywords="prediction", keywords="random forest", keywords="suicidality", abstract="Background: Given the additional risk of suicide-related behaviors in adolescents with allergic rhinitis (AR), it is important to use the growing field of machine learning (ML) to evaluate this risk. Objective: This study aims to evaluate the validity and usefulness of an ML model for predicting suicide risk in patients with AR. Methods: We used data from 2 independent survey studies, Korea Youth Risk Behavior Web-based Survey (KYRBS; n=299,468) for the original data set and Korea National Health and Nutrition Examination Survey (KNHANES; n=833) for the external validation data set, to predict suicide risks of AR in adolescents aged 13 to 18 years, with 3.45\% (10,341/299,468) and 1.4\% (12/833) of the patients attempting suicide in the KYRBS and KNHANES studies, respectively. The outcome of interest was the suicide attempt risks. We selected various ML-based models with hyperparameter tuning in the discovery and performed an area under the receiver operating characteristic curve (AUROC) analysis in the train, test, and external validation data. 
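As an illustration of the model-tuning and AUROC workflow described in the preceding abstract, here is a minimal sketch using scikit-learn on synthetic stand-in data; the hyperparameter grid and class balance are assumptions, not the study's settings:

```python
# Minimal sketch (assumed workflow, not the study's code): tune a random forest with
# cross-validation and report AUROC on a held-out split, using synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for a rare binary outcome such as suicide attempt.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      {"n_estimators": [200, 500], "max_depth": [5, None]},
                      scoring="roc_auc", cv=5)
search.fit(X_train, y_train)

auroc = roc_auc_score(y_test, search.best_estimator_.predict_proba(X_test)[:, 1])
print(f"held-out AUROC: {auroc:.3f}")
```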
Results: The study data set included 299,468 (KYRBS; original data set) and 833 (KNHANES; external validation data set) patients with AR recruited between 2005 and 2022. The best-performing ML model was the random forest model with a mean AUROC of 84.12\% (95\% CI 83.98\%-84.27\%) in the original data set. Applying this result to the external validation data set revealed the best performance among the models, with an AUROC of 89.87\% (sensitivity 83.33\%, specificity 82.58\%, accuracy 82.59\%, and balanced accuracy 82.96\%). While looking at feature importance, the 5 most important features in predicting suicide attempts in adolescent patients with AR are depression, stress status, academic achievement, age, and alcohol consumption. Conclusions: This study emphasizes the potential of ML models in predicting suicide risks in patients with AR, encouraging further application of these models in other conditions to enhance adolescent health and decrease suicide rates. ", doi="10.2196/51473", url="https://www.jmir.org/2024/1/e51473", url="http://www.ncbi.nlm.nih.gov/pubmed/38354043" } @Article{info:doi/10.2196/42271, author="Li, Angie and Mullin, Sarah and Elkin, L. Peter", title="Improving Prediction of Survival for Extremely Premature Infants Born at 23 to 29 Weeks Gestational Age in the Neonatal Intensive Care Unit: Development and Evaluation of Machine Learning Models", journal="JMIR Med Inform", year="2024", month="Feb", day="14", volume="12", pages="e42271", keywords="reproductive informatics", keywords="pregnancy complications", keywords="premature birth", keywords="neonatal mortality", keywords="machine learning", keywords="clinical decision support", keywords="preterm", keywords="pediatrics", keywords="intensive care unit outcome", keywords="health care outcome", keywords="survival prediction", keywords="maternal health", keywords="decision tree model", keywords="socioeconomic", abstract="Background: Infants born at extremely preterm gestational ages are typically admitted to the neonatal intensive care unit (NICU) after initial resuscitation. The subsequent hospital course can be highly variable, and despite counseling aided by available risk calculators, there are significant challenges with shared decision-making regarding life support and transition to end-of-life care. Improving predictive models can help providers and families navigate these unique challenges. Objective: Machine learning methods have previously demonstrated added predictive value for determining intensive care unit outcomes, and their use allows consideration of a greater number of factors that potentially influence newborn outcomes, such as maternal characteristics. Machine learning--based models were analyzed for their ability to predict the survival of extremely preterm neonates at initial admission. Methods: Maternal and newborn information was extracted from the health records of infants born between 23 and 29 weeks of gestation in the Medical Information Mart for Intensive Care III (MIMIC-III) critical care database. Applicable machine learning models predicting survival during the initial NICU admission were developed and compared. The same type of model was also examined using only features that would be available prepartum for the purpose of survival prediction prior to an anticipated preterm birth. Features most correlated with the predicted outcome were determined when possible for each model. Results: Of included patients, 37 of 459 (8.1\%) expired. 
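A minimal sketch of the kind of model comparison described in the preceding abstract, including a restriction to a hypothetical prepartum-only feature subset; the data are synthetic stand-ins, not MIMIC-III records:

```python
# Minimal sketch (assumptions, not the study's code): compare several classifier types
# with cross-validated AUROC on all features and on a "prepartum-only" subset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=12, weights=[0.92], random_state=1)
prepartum_cols = [0, 1, 2, 3]  # hypothetical indices of prepartum-available features

models = {"logistic": LogisticRegression(max_iter=1000),
          "random_forest": RandomForestClassifier(random_state=1),
          "gradient_boosting": GradientBoostingClassifier(random_state=1)}

for label, cols in [("all features", slice(None)), ("prepartum only", prepartum_cols)]:
    for name, model in models.items():
        scores = cross_val_score(model, X[:, cols], y, cv=5, scoring="roc_auc")
        print(f"{label:15s} {name:18s} AUROC {np.mean(scores):.3f}")
```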
The resulting random forest model showed higher predictive performance than the frequently used Score for Neonatal Acute Physiology With Perinatal Extension II (SNAPPE-II) NICU model when considering extremely preterm infants of very low birth weight. Several other machine learning models were found to have good performance but did not show a statistically significant difference from previously available models in this study. Feature importance varied by model, and those of greater importance included gestational age; birth weight; initial oxygenation level; elements of the APGAR (appearance, pulse, grimace, activity, and respiration) score; and amount of blood pressure support. Important prepartum features also included maternal age, steroid administration, and the presence of pregnancy complications. Conclusions: Machine learning methods have the potential to provide robust prediction of survival in the context of extremely preterm births and allow for consideration of additional factors such as maternal clinical and socioeconomic information. Evaluation of larger, more diverse data sets may provide additional clarity on comparative performance. ", doi="10.2196/42271", url="https://medinform.jmir.org/2024/1/e42271", url="http://www.ncbi.nlm.nih.gov/pubmed/38354033" } @Article{info:doi/10.2196/47739, author="Laurentiev, John and Kim, Hyun Dae and Mahesri, Mufaddal and Wang, Kuan-Yuan and Bessette, G. Lily and York, Cassandra and Zakoul, Heidi and Lee, Been Su and Zhou, Li and Lin, Joshua Kueiyu", title="Identifying Functional Status Impairment in People Living With Dementia Through Natural Language Processing of Clinical Documents: Cross-Sectional Study", journal="J Med Internet Res", year="2024", month="Feb", day="13", volume="26", pages="e47739", keywords="activities of daily living", keywords="ADLs", keywords="clinical note", keywords="dementia", keywords="electronic health record", keywords="EHR", keywords="functional impairment", keywords="instrumental activities of daily living", keywords="iADLs", keywords="machine learning", keywords="natural language processing", keywords="NLP", abstract="Background: Assessment of activities of daily living (ADLs) and instrumental ADLs (iADLs) is key to determining the severity of dementia and care needs among older adults. However, such information is often only documented in free-text clinical notes within the electronic health record and can be challenging to find. Objective: This study aims to develop and validate machine learning models to determine the status of ADL and iADL impairments based on clinical notes. Methods: This cross-sectional study leveraged electronic health record clinical notes from Mass General Brigham's Research Patient Data Repository linked with Medicare fee-for-service claims data from 2007 to 2017 to identify individuals aged 65 years or older with at least 1 diagnosis of dementia. Notes for encounters both 180 days before and after the first date of dementia diagnosis were randomly sampled. Models were trained and validated using note sentences filtered by expert-curated keywords (filtered cohort) and further evaluated using unfiltered sentences (unfiltered cohort). The model's performance was compared using area under the receiver operating characteristic curve and area under the precision-recall curve (AUPRC). 
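The note-sentence classification and AUROC/AUPRC evaluation just described could look roughly like the following sketch; the pipeline, sentences, and labels are illustrative assumptions rather than the study's models or data:

```python
# Minimal sketch (assumed setup): classify clinical-note sentences for functional
# impairment with a TF-IDF + logistic regression pipeline and report AUROC and AUPRC.
# Scoring on the training sentences here is purely illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.pipeline import make_pipeline

sentences = ["needs assistance with bathing and dressing",
             "independent in all activities of daily living",
             "unable to manage medications or finances",
             "ambulates without difficulty",
             "requires help with meal preparation",
             "no functional limitations reported"] * 10
labels = [1, 0, 1, 0, 1, 0] * 10  # toy labels: 1 = impairment documented

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pipeline.fit(sentences, labels)

probs = pipeline.predict_proba(sentences)[:, 1]
print("AUROC:", roc_auc_score(labels, probs))
print("AUPRC:", average_precision_score(labels, probs))
```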
Results: The study included 10,000 key-term--filtered sentences representing 441 people (n=283, 64.2\% women; mean age 82.7, SD 7.9 years) and 1000 unfiltered sentences representing 80 people (n=56, 70\% women; mean age 82.8, SD 7.5 years). Area under the receiver operating characteristic curve was high for the best-performing ADL and iADL models on both cohorts (>0.97). For ADL impairment identification, the random forest model achieved the best AUPRC (0.89, 95\% CI 0.86-0.91) on the filtered cohort; the support vector machine model achieved the highest AUPRC (0.82, 95\% CI 0.75-0.89) for the unfiltered cohort. For iADL impairment, the Bio+Clinical bidirectional encoder representations from transformers (BERT) model had the highest AUPRC (filtered: 0.76, 95\% CI 0.68-0.82; unfiltered: 0.58, 95\% CI 0.001-1.0). Compared with a keyword-search approach on the unfiltered cohort, machine learning reduced false-positive rates from 4.5\% to 0.2\% for ADL and 1.8\% to 0.1\% for iADL. Conclusions: In this study, we demonstrated the ability of machine learning models to accurately identify ADL and iADL impairment based on free-text clinical notes, which could be useful in determining the severity of dementia. ", doi="10.2196/47739", url="https://www.jmir.org/2024/1/e47739", url="http://www.ncbi.nlm.nih.gov/pubmed/38349732" } @Article{info:doi/10.2196/51391, author="Abdullahi, Tassallah and Singh, Ritambhara and Eickhoff, Carsten", title="Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models", journal="JMIR Med Educ", year="2024", month="Feb", day="13", volume="10", pages="e51391", keywords="clinical decision support", keywords="rare diseases", keywords="complex diseases", keywords="prompt engineering", keywords="reliability", keywords="consistency", keywords="natural language processing", keywords="language model", keywords="Bard", keywords="ChatGPT 3.5", keywords="GPT-4", keywords="MedAlpaca", keywords="medical education", keywords="complex diagnosis", keywords="artificial intelligence", keywords="AI assistance", keywords="medical training", keywords="prediction model", abstract="Background: Patients with rare and complex diseases often experience delayed diagnoses and misdiagnoses because comprehensive knowledge about these diseases is limited to only a few medical experts. In this context, large language models (LLMs) have emerged as powerful knowledge aggregation tools with applications in clinical decision support and education domains. Objective: This study aims to explore the potential of 3 popular LLMs, namely Bard (Google LLC), ChatGPT-3.5 (OpenAI), and GPT-4 (OpenAI), in medical education to enhance the diagnosis of rare and complex diseases while investigating the impact of prompt engineering on their performance. Methods: We conducted experiments on publicly available complex and rare cases to achieve these objectives. We implemented various prompt strategies to evaluate the performance of these models using both open-ended and multiple-choice prompts. In addition, we used a majority voting strategy to leverage diverse reasoning paths within language models, aiming to enhance their reliability. Furthermore, we compared their performance with the performance of human respondents and MedAlpaca, a generative LLM specifically designed for medical tasks. 
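The majority voting strategy mentioned in the preceding abstract can be illustrated with a small sketch; the sampled answers are hypothetical:

```python
# Minimal sketch (an assumed implementation of the voting idea, not the authors' code):
# aggregate repeated model answers to a multiple-choice case by majority vote.
from collections import Counter

def majority_vote(answers):
    """Return the most frequent answer among repeated LLM completions and its share."""
    counts = Counter(a.strip().upper() for a in answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Hypothetical completions sampled from the same model for one case:
sampled = ["B", "B", "A", "B", "C"]
print(majority_vote(sampled))  # ('B', 0.6)
```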
Results: Notably, all LLMs outperformed the average human consensus and MedAlpaca, with a minimum margin of 5\% and 13\%, respectively, across all 30 cases from the diagnostic case challenge collection. On the frequently misdiagnosed cases category, Bard tied with MedAlpaca but surpassed the human average consensus by 14\%, whereas GPT-4 and ChatGPT-3.5 outperformed MedAlpaca and the human respondents on the moderately often misdiagnosed cases category with minimum accuracy scores of 28\% and 11\%, respectively. The majority voting strategy, particularly with GPT-4, demonstrated the highest overall score across all cases from the diagnostic complex case collection, surpassing that of other LLMs. On the Medical Information Mart for Intensive Care-III data sets, Bard and GPT-4 achieved the highest diagnostic accuracy scores, with multiple-choice prompts scoring 93\%, whereas ChatGPT-3.5 and MedAlpaca scored 73\% and 47\%, respectively. Furthermore, our results demonstrate that there is no one-size-fits-all prompting approach for improving the performance of LLMs and that a single strategy does not universally apply to all LLMs. Conclusions: Our findings shed light on the diagnostic capabilities of LLMs and the challenges associated with identifying an optimal prompting strategy that aligns with each language model's characteristics and specific task requirements. The significance of prompt engineering is highlighted, providing valuable insights for researchers and practitioners who use these language models for medical training. Furthermore, this study represents a crucial step toward understanding how LLMs can enhance diagnostic reasoning in rare and complex medical cases, paving the way for developing effective educational tools and accurate diagnostic aids to improve patient care and outcomes. 
", doi="10.2196/51391", url="https://mededu.jmir.org/2024/1/e51391", url="http://www.ncbi.nlm.nih.gov/pubmed/38349725" } @Article{info:doi/10.2196/55368, author="Weidener, Lukas and Fischer, Michael", title="Proposing a Principle-Based Approach for Teaching AI Ethics in Medical Education", journal="JMIR Med Educ", year="2024", month="Feb", day="9", volume="10", pages="e55368", keywords="artificial intelligence", keywords="AI", keywords="ethics", keywords="artificial intelligence ethics", keywords="AI ethics", keywords="medical education", keywords="medicine", keywords="medical artificial intelligence ethics", keywords="medical AI ethics", keywords="medical ethics", keywords="public health ethics", doi="10.2196/55368", url="https://mededu.jmir.org/2024/1/e55368", url="http://www.ncbi.nlm.nih.gov/pubmed/38285931" } @Article{info:doi/10.2196/48514, author="Yu, Peng and Fang, Changchang and Liu, Xiaolin and Fu, Wanying and Ling, Jitao and Yan, Zhiwei and Jiang, Yuan and Cao, Zhengyu and Wu, Maoxiong and Chen, Zhiteng and Zhu, Wengen and Zhang, Yuling and Abudukeremu, Ayiguli and Wang, Yue and Liu, Xiao and Wang, Jingfeng", title="Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study", journal="JMIR Med Educ", year="2024", month="Feb", day="9", volume="10", pages="e48514", keywords="ChatGPT", keywords="Chinese Postgraduate Examination for Clinical Medicine", keywords="medical student", keywords="performance", keywords="artificial intelligence", keywords="medical care", keywords="qualitative feedback", keywords="medical education", keywords="clinical decision-making", abstract="Background: ChatGPT, an artificial intelligence (AI) based on large-scale language models, has sparked interest in the field of health care. Nonetheless, the capabilities of AI in text comprehension and generation are constrained by the quality and volume of available training data for a specific language, and the performance of AI across different languages requires further investigation. While AI harbors substantial potential in medicine, it is imperative to tackle challenges such as the formulation of clinical care standards; facilitating cultural transitions in medical education and practice; and managing ethical issues including data privacy, consent, and bias. Objective: The study aimed to evaluate ChatGPT's performance in processing Chinese Postgraduate Examination for Clinical Medicine questions, assess its clinical reasoning ability, investigate potential limitations with the Chinese language, and explore its potential as a valuable tool for medical professionals in the Chinese context. Methods: A data set of Chinese Postgraduate Examination for Clinical Medicine questions was used to assess the effectiveness of ChatGPT's (version 3.5) medical knowledge in the Chinese language; this data set comprises 165 medical questions divided into three categories: (1) common questions (n=90) assessing basic medical knowledge, (2) case analysis questions (n=45) focusing on clinical decision-making through patient case evaluations, and (3) multichoice questions (n=30) requiring the selection of multiple correct answers. First, we assessed whether ChatGPT could meet the stringent cutoff score defined by the government agency, which requires a performance within the top 20\% of candidates.
Additionally, in our evaluation of ChatGPT's performance on both original and encoded medical questions, 3 primary indicators were used: accuracy, concordance (which validates the answer), and the frequency of insights. Results: Our evaluation revealed that ChatGPT scored 153.5 out of 300 for original questions in Chinese, which signifies the minimum score set to ensure that at least 20\% more candidates pass than the enrollment quota. However, ChatGPT had low accuracy in answering open-ended medical questions, with only 31.5\% total accuracy. The accuracy for common questions, multichoice questions, and case analysis questions was 42\%, 37\%, and 17\%, respectively. ChatGPT achieved a 90\% concordance across all questions. Among correct responses, the concordance was 100\%, significantly exceeding that of incorrect responses (n=57, 50\%; P<.001). ChatGPT provided innovative insights for 80\% (n=132) of all questions, with an average of 2.95 insights per accurate response. Conclusions: Although ChatGPT surpassed the passing threshold for the Chinese Postgraduate Examination for Clinical Medicine, its performance in answering open-ended medical questions was suboptimal. Nonetheless, ChatGPT exhibited high internal concordance and the ability to generate multiple insights in the Chinese language. Future research should investigate the language-based discrepancies in ChatGPT's performance within the health care context. ", doi="10.2196/48514", url="https://mededu.jmir.org/2024/1/e48514", url="http://www.ncbi.nlm.nih.gov/pubmed/38335017" } @Article{info:doi/10.2196/32690, author="Ji, Jia and Hou, Yongshuai and Chen, Xinyu and Pan, Youcheng and Xiang, Yang", title="Vision-Language Model for Generating Textual Descriptions From Clinical Images: Model Development and Validation Study", journal="JMIR Form Res", year="2024", month="Feb", day="8", volume="8", pages="e32690", keywords="clinical image", keywords="radiology report generation", keywords="vision-language model", keywords="multistage fine-tuning", keywords="prior knowledge", abstract="Background: The automatic generation of radiology reports, which seeks to create a free-text description from a clinical radiograph, is emerging as a pivotal intersection between clinical medicine and artificial intelligence. Leveraging natural language processing technologies can accelerate report creation, enhancing health care quality and standardization. However, most existing studies have not yet fully tapped into the combined potential of advanced language and vision models. Objective: The purpose of this study was to explore the integration of pretrained vision-language models into radiology report generation. This would enable the vision-language model to automatically convert clinical images into high-quality textual reports. Methods: In our research, we introduced a radiology report generation model named ClinicalBLIP, building upon the foundational InstructBLIP model and refining it using clinical image-to-text data sets. A multistage fine-tuning approach via low-rank adaptation was proposed to deepen the semantic comprehension of the visual encoder and the large language model for clinical imagery. Furthermore, prior knowledge was integrated through prompt learning to enhance the precision of the reports generated. Experiments were conducted on both the IU X-RAY and MIMIC-CXR data sets, with ClinicalBLIP compared to several leading methods. 
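The METEOR and ROUGE evaluation mentioned in the preceding abstract could be computed roughly as in the sketch below, which assumes the rouge-score and nltk packages and toy report text rather than the IU X-RAY or MIMIC-CXR data:

```python
# Minimal sketch (assumed evaluation code, not the ClinicalBLIP pipeline): score a
# generated report against a reference with ROUGE-L and METEOR.
# Requires the `rouge-score` and `nltk` packages; METEOR needs the WordNet data.
import nltk
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

nltk.download("wordnet", quiet=True)

reference = "no acute cardiopulmonary abnormality heart size is normal"
candidate = "heart size is normal with no acute cardiopulmonary findings"

rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
print("ROUGE-L F1:", rouge.score(reference, candidate)["rougeL"].fmeasure)

# Recent NLTK versions expect pre-tokenized inputs for METEOR.
print("METEOR:", meteor_score([reference.split()], candidate.split()))
```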
Results: Experimental results revealed that ClinicalBLIP obtained superior scores of 0.570/0.365 and 0.534/0.313 on the IU X-RAY/MIMIC-CXR test sets for the Metric for Evaluation of Translation with Explicit Ordering (METEOR) and the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) evaluations, respectively. This performance notably surpasses that of existing state-of-the-art methods. Further evaluations confirmed the effectiveness of the multistage fine-tuning and the integration of prior information, leading to substantial improvements. Conclusions: The proposed ClinicalBLIP model demonstrated robustness and effectiveness in enhancing clinical radiology report generation, suggesting significant promise for real-world clinical applications. ", doi="10.2196/32690", url="https://formative.jmir.org/2024/1/e32690", url="http://www.ncbi.nlm.nih.gov/pubmed/38329788" } @Article{info:doi/10.2196/52059, author="Ahmadzia, Khorrami Homa and Dzienny, C. Alexa and Bopf, Mike and Phillips, M. Jaclyn and Federspiel, Jeffrey Jerome and Amdur, Richard and Rice, Murguia Madeline and Rodriguez, Laritza", title="Machine Learning Models for Prediction of Maternal Hemorrhage and Transfusion: Model Development Study", journal="JMIR Bioinform Biotech", year="2024", month="Feb", day="5", volume="5", pages="e52059", keywords="postpartum hemorrhage", keywords="machine learning", keywords="prediction", keywords="maternal", keywords="predict", keywords="predictive", keywords="bleeding", keywords="hemorrhage", keywords="hemorrhaging", keywords="birth", keywords="postnatal", keywords="blood", keywords="transfusion", keywords="antepartum", keywords="obstetric", keywords="obstetrics", keywords="women's health", keywords="gynecology", keywords="gynecological", abstract="Background: Current postpartum hemorrhage (PPH) risk stratification is based on traditional statistical models or expert opinion. Machine learning could optimize PPH prediction by allowing for more complex modeling. Objective: We sought to improve PPH prediction and compare machine learning and traditional statistical methods. Methods: We developed models using the Consortium for Safe Labor data set (2002-2008) from 12 US hospitals. The primary outcome was a transfusion of blood products or PPH (estimated blood loss of ?1000 mL). The secondary outcome was a transfusion of any blood product. Fifty antepartum and intrapartum characteristics and hospital characteristics were included. Logistic regression, support vector machines, multilayer perceptron, random forest, and gradient boosting (GB) were used to generate prediction models. The area under the receiver operating characteristic curve (ROC-AUC) and area under the precision/recall curve (PR-AUC) were used to compare performance. Results: Among 228,438 births, 5760 (3.1\%) women had a postpartum hemorrhage, 5170 (2.8\%) had a transfusion, and 10,344 (5.6\%) met the criteria for the transfusion-PPH composite. Models predicting the transfusion-PPH composite using antepartum and intrapartum features had the best positive predictive values, with the GB machine learning model performing best overall (ROC-AUC=0.833, 95\% CI 0.828-0.838; PR-AUC=0.210, 95\% CI 0.201-0.220). The most predictive features in the GB model predicting the transfusion-PPH composite were the mode of delivery, oxytocin incremental dose for labor (mU/minute), intrapartum tocolytic use, presence of anesthesia nurse, and hospital type. 
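A minimal sketch of the gradient boosting workflow summarized in the preceding abstract, reporting ROC-AUC, PR-AUC, and feature importances on synthetic stand-in data rather than the Consortium for Safe Labor data set:

```python
# Minimal sketch (assumed workflow, not the authors' analysis): fit a gradient boosting
# classifier on an imbalanced binary outcome and report ROC-AUC, PR-AUC, and the most
# important features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=15, weights=[0.94], random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=2)

gb = GradientBoostingClassifier(random_state=2).fit(X_train, y_train)
probs = gb.predict_proba(X_test)[:, 1]
print("ROC-AUC:", roc_auc_score(y_test, probs))
print("PR-AUC :", average_precision_score(y_test, probs))

top = np.argsort(gb.feature_importances_)[::-1][:5]
print("top feature indices by importance:", top)
```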
Conclusions: Machine learning offers higher discriminability than logistic regression in predicting PPH. The Consortium for Safe Labor data set may not be optimal for analyzing risk due to strong subgroup effects, which decreases accuracy and limits generalizability. ", doi="10.2196/52059", url="https://bioinform.jmir.org/2024/1/e52059", url="http://www.ncbi.nlm.nih.gov/pubmed/38935950" } @Article{info:doi/10.2196/49497, author="Yakob, Najia and Lalibert{\'e}, Sandrine and Doyon-Poulin, Philippe and Jouvet, Philippe and Noumeir, Rita", title="Data Representation Structure to Support Clinical Decision-Making in the Pediatric Intensive Care Unit: Interview Study and Preliminary Decision Support Interface Design", journal="JMIR Form Res", year="2024", month="Feb", day="1", volume="8", pages="e49497", keywords="data representation", keywords="decision support", keywords="critical care", keywords="clinical workflow", keywords="clinical decision-making", keywords="prototype", keywords="design", keywords="intensive care unit", abstract="Background: Clinical decision-making is a complex cognitive process that relies on the interpretation of a large variety of data from different sources and involves the use of knowledge bases and scientific recommendations. The representation of clinical data plays a key role in the speed and efficiency of its interpretation. In addition, the increasing use of clinical decision support systems (CDSSs) provides assistance to clinicians in their practice, allowing them to improve patient outcomes. In the pediatric intensive care unit (PICU), clinicians must process high volumes of data and deal with ever-growing workloads. As they use multiple systems daily to assess patients' status and to adjust the health care plan, including electronic health records (EHR), clinical systems (eg, laboratory, imaging and pharmacy), and connected devices (eg, bedside monitors, mechanical ventilators, intravenous pumps, and syringes), clinicians rely mostly on their judgment and ability to trace relevant data for decision-making. In these circumstances, the lack of optimal data structure and adapted visual representation hinder clinician's cognitive processes and clinical decision-making skills. Objective: In this study, we designed a prototype to optimize the representation of clinical data collected from existing sources (eg, EHR, clinical systems, and devices) via a structure that supports the integration of a home-developed CDSS in the PICU. This study was based on analyzing end user needs and their clinical workflow. Methods: First, we observed clinical activities in a PICU to secure a better understanding of the workflow in terms of staff tasks and their use of EHR on a typical work shift. Second, we conducted interviews with 11 clinicians from different staff categories (eg, intensivists, fellows, nurses, and nurse practitioners) to compile their needs for decision support. Third, we structured the data to design a prototype that illustrates the proposed representation. We used a brain injury care scenario to validate the relevance of integrated data and the utility of main functionalities in a clinical context. Fourth, we held design meetings with 5 clinicians to present, revise, and adapt the prototype to meet their needs. 
Results: We created a structure with 3 levels of abstraction---unit level, patient level, and system level---to optimize clinical data representation and display for efficient patient assessment and to provide a flexible platform to host the internally developed CDSS. Subsequently, we designed a preliminary prototype based on this structure. Conclusions: The data representation structure allows prioritizing patients via criticality indicators, assessing their conditions using a personalized dashboard, and monitoring their courses based on the evolution of clinical values. Further research is required to define and model the concepts of criticality, problem recognition, and evolution. Furthermore, feasibility tests will be conducted to ensure user satisfaction. ", doi="10.2196/49497", url="https://formative.jmir.org/2024/1/e49497", url="http://www.ncbi.nlm.nih.gov/pubmed/38300695" } @Article{info:doi/10.2196/49347, author="Blasini, Romina and Strantz, Cosima and Gulden, Christian and Helfer, Sven and Lidke, Jakub and Prokosch, Hans-Ulrich and Sohrabi, Keywan and Schneider, Henning", title="Evaluation of Eligibility Criteria Relevance for the Purpose of IT-Supported Trial Recruitment: Descriptive Quantitative Analysis", journal="JMIR Form Res", year="2024", month="Jan", day="31", volume="8", pages="e49347", keywords="CTRSS", keywords="clinical trial recruitment support system", keywords="PRS", keywords="patient recruitment system", keywords="clinical trials", keywords="classifications", keywords="data groups", keywords="data elements", keywords="data classification", keywords="criteria", keywords="relevance", keywords="automated clinical trials", keywords="participants", keywords="clinical trial", abstract="Background: Clinical trials (CTs) are crucial for medical research; however, they frequently fall short of the requisite number of participants who meet all eligibility criteria (EC). A clinical trial recruitment support system (CTRSS) is developed to help identify potential participants by performing a search on a specific data pool. The accuracy of the search results is directly related to the quality of the data used for comparison. Data accessibility can present challenges, making it crucial to identify the necessary data for a CTRSS to query. Prior research has examined the data elements frequently used in CT EC but has not evaluated which criteria are actually used to search for participants. Although all EC must be met to enroll a person in a CT, not all criteria have the same importance when searching for potential participants in an existing data pool, such as an electronic health record, because some of the criteria are only relevant at the time of enrollment. Objective: In this study, we investigated which groups of data elements are relevant in practice for finding suitable participants and whether there are typical elements that are not relevant and can therefore be omitted. Methods: We asked trial experts and CTRSS developers to first categorize the EC of their CTs according to data element groups and then to classify them into 1 of 3 categories: necessary, complementary, and irrelevant. In addition, the experts assessed whether a criterion was documented (on paper or digitally) or whether it was information known only to the treating physicians or patients. Results: We reviewed 82 CTs with 1132 unique EC. Of these 1132 EC, 350 (30.9\%) were considered necessary, 224 (19.8\%) complementary, and 341 (30.1\%) total irrelevant. 
To identify the most relevant data elements, we introduced the data element relevance index (DERI). This describes the percentage of studies in which the corresponding data element occurs and is also classified as necessary or supplementary. We found that the query of ``diagnosis'' was relevant for finding participants in 79 (96.3\%) of the CTs. This group was followed by ``date of birth/age'' with a DERI of 85.4\% (n=70) and ``procedure'' with a DERI of 35.4\% (n=29). Conclusions: The distribution of data element groups in CTs has been heterogeneously described in previous works. Therefore, we recommend identifying the percentage of CTs in which data element groups can be found as a more reliable way to determine the relevance of EC. Only necessary and complementary criteria should be included in this DERI. ", doi="10.2196/49347", url="https://formative.jmir.org/2024/1/e49347", url="http://www.ncbi.nlm.nih.gov/pubmed/38294862" } @Article{info:doi/10.2196/50890, author="Cho, Hunyong and She, Jane and De Marchi, Daniel and El-Zaatari, Helal and Barnes, L. Edward and Kahkoska, R. Anna and Kosorok, R. Michael and Virkud, V. Arti", title="Machine Learning and Health Science Research: Tutorial", journal="J Med Internet Res", year="2024", month="Jan", day="30", volume="26", pages="e50890", keywords="health science researcher", keywords="machine learning pipeline", keywords="machine learning", keywords="medical machine learning", keywords="precision medicine", keywords="reproducibility", keywords="unsupervised learning", doi="10.2196/50890", url="https://www.jmir.org/2024/1/e50890", url="http://www.ncbi.nlm.nih.gov/pubmed/38289657" } @Article{info:doi/10.2196/46857, author="Fudickar, Sebastian and Bantel, Carsten and Spieker, Jannik and T{\"o}pfer, Heinrich and Stegeman, Patrick and Schiphorst Preuper, R. Henrica and Reneman, F. Michiel and Wolff, P. Andr{\'e} and Soer, Remko", title="Natural Language Processing of Referral Letters for Machine Learning--Based Triaging of Patients With Low Back Pain to the Most Appropriate Intervention: Retrospective Study", journal="J Med Internet Res", year="2024", month="Jan", day="30", volume="26", pages="e46857", keywords="decision support", keywords="triaging", keywords="NLP", keywords="natural language processing", keywords="neural network", keywords="LBP", keywords="low back pain", keywords="back", keywords="pain", keywords="decision-making", keywords="machine learning", keywords="artificial intelligence", keywords="clinical application", keywords="patient records", keywords="qualitative data", keywords="support system", keywords="questionnaire", keywords="quality of life", keywords="psychosocial", abstract="Background: Decision support systems (DSSs) for suggesting optimal treatments for individual patients with low back pain (LBP) are currently insufficiently accurate for clinical application. Most of the input provided to train these systems is based on patient-reported outcome measures. However, with the appearance of electronic health records (EHRs), additional qualitative data on reasons for referrals and patients' goals become available for DSSs. Currently, no decision support tools cover a wide range of biopsychosocial factors, including referral letter information to help clinicians triage patients to the optimal LBP treatment. 
Objective: The objective of this study was to investigate the added value of including qualitative data from EHRs and referral letters to the accuracy of a quantitative DSS for patients with LBP. Methods: A retrospective study was conducted in a clinical cohort of Dutch patients with LBP. Patients filled out a baseline questionnaire about demographics, pain, disability, work status, quality of life, medication, psychosocial functioning, comorbidity, history, and duration of pain. Referral reasons and patient requests for help (patient goals) were extracted via natural language processing (NLP) and enriched in the data set. For decision support, these data were considered independent factors for triage to neurosurgery, anesthesiology, rehabilitation, or minimal intervention. Support vector machine, k-nearest neighbor, and multilayer perceptron models were trained for 2 conditions: with and without consideration of the referral letter content. The models' accuracies were evaluated via F1-scores, and confusion matrices were used to predict the treatment path (out of 4 paths) with and without additional referral parameters. Results: Data from 1608 patients were evaluated. The evaluation indicated that 2 referral reasons from the referral letters (for anesthesiology and rehabilitation intervention) increased the F1-score accuracy by up to 19.5\% for triaging. The confusion matrices confirmed the results. Conclusions: This study indicates that data enriching by adding NLP-based extraction of the content of referral letters increases the model accuracy of DSSs in suggesting optimal treatments for individual patients with LBP. Overall model accuracies were considered low and insufficient for clinical application. ", doi="10.2196/46857", url="https://www.jmir.org/2024/1/e46857", url="http://www.ncbi.nlm.nih.gov/pubmed/38289669" } @Article{info:doi/10.2196/53516, author="Koonce, Y. Taneya and Giuse, A. Dario and Williams, M. Annette and Blasingame, N. Mallory and Krump, A. Poppy and Su, Jing and Giuse, B. Nunzia", title="Using a Natural Language Processing Approach to Support Rapid Knowledge Acquisition", journal="JMIR Med Inform", year="2024", month="Jan", day="30", volume="12", pages="e53516", keywords="natural language processing", keywords="electronic health records", keywords="machine learning", keywords="data mining", keywords="knowledge management", keywords="NLP", doi="10.2196/53516", url="https://medinform.jmir.org/2024/1/e53516", url="http://www.ncbi.nlm.nih.gov/pubmed/38289670" } @Article{info:doi/10.2196/48995, author="Cheligeer, Cheligeer and Wu, Guosong and Lee, Seungwon and Pan, Jie and Southern, A. Danielle and Martin, A. Elliot and Sapiro, Natalie and Eastwood, A. Cathy and Quan, Hude and Xu, Yuan", title="BERT-Based Neural Network for Inpatient Fall Detection From Electronic Medical Records: Retrospective Cohort Study", journal="JMIR Med Inform", year="2024", month="Jan", day="30", volume="12", pages="e48995", keywords="accidental falls", keywords="electronic medical records", keywords="data mining", keywords="machine learning", keywords="patient safety", keywords="natural language processing", keywords="adverse event", abstract="Background: Inpatient falls are a substantial concern for health care providers and are associated with negative outcomes for patients. Automated detection of falls using machine learning (ML) algorithms may aid in improving patient safety and reducing the occurrence of falls. 
Objective: This study aims to develop and evaluate an ML algorithm for inpatient fall detection using multidisciplinary progress record notes and a pretrained Bidirectional Encoder Representation from Transformers (BERT) language model. Methods: A cohort of 4323 adult patients admitted to 3 acute care hospitals in Calgary, Alberta, Canada from 2016 to 2021 were randomly sampled. Trained reviewers determined falls from patient charts, which were linked to electronic medical records and administrative data. The BERT-based language model was pretrained on clinical notes, and a fall detection algorithm was developed based on a neural network binary classification architecture. Results: To address various use scenarios, we developed 3 different Alberta hospital notes-specific BERT models: a high sensitivity model (sensitivity 97.7, IQR 87.7-99.9), a high positive predictive value model (positive predictive value 85.7, IQR 57.2-98.2), and the high F1-score model (F1=64.4). Our proposed method outperformed 3 classical ML algorithms and an International Classification of Diseases code--based algorithm for fall detection, showing its potential for improved performance in diverse clinical settings. Conclusions: The developed algorithm provides an automated and accurate method for inpatient fall detection using multidisciplinary progress record notes and a pretrained BERT language model. This method could be implemented in clinical practice to improve patient safety and reduce the occurrence of falls in hospitals. ", doi="10.2196/48995", url="https://medinform.jmir.org/2024/1/e48995", url="http://www.ncbi.nlm.nih.gov/pubmed/38289643" } @Article{info:doi/10.2196/52200, author="Wang, Andrew and Fulton, Rachel and Hwang, Sy and Margolis, J. David and Mowery, Danielle", title="Patient Phenotyping for Atopic Dermatitis With Transformers and Machine Learning: Algorithm Development and Validation Study", journal="JMIR Form Res", year="2024", month="Jan", day="26", volume="8", pages="e52200", keywords="atopic dermatitis", keywords="classification", keywords="classifier", keywords="dermatitis", keywords="dermatology", keywords="EHR", keywords="electronic health record", keywords="health records", keywords="health", keywords="informatics", keywords="machine learning", keywords="natural language processing", keywords="NLP", keywords="patient phenotyping", keywords="phenotype", keywords="skin", keywords="transformer", keywords="transformers", abstract="Background: Atopic dermatitis (AD) is a chronic skin condition that millions of people around the world live with each day. Performing research into identifying the causes and treatment for this disease has great potential to provide benefits for these individuals. However, AD clinical trial recruitment is not a trivial task due to the variance in diagnostic precision and phenotypic definitions leveraged by different clinicians, as well as the time spent finding, recruiting, and enrolling patients by clinicians to become study participants. Thus, there is a need for automatic and effective patient phenotyping for cohort recruitment. Objective: This study aims to present an approach for identifying patients whose electronic health records suggest that they may have AD. Methods: We created a vectorized representation of each patient and trained various supervised machine learning methods to classify when a patient has AD. 
Each patient is represented by a vector of either probabilities or binary values, where each value indicates whether they meet a different criterion for AD diagnosis. Results: The most accurate AD classifier performed with a class-balanced accuracy of 0.8036, a precision of 0.8400, and a recall of 0.7500 when using XGBoost (Extreme Gradient Boosting). Conclusions: Creating an automated approach for identifying patient cohorts has the potential to accelerate, standardize, and automate the process of patient recruitment for AD studies; thereby reducing clinician burden and informing the discovery of better treatment options for AD. ", doi="10.2196/52200", url="https://formative.jmir.org/2024/1/e52200", url="http://www.ncbi.nlm.nih.gov/pubmed/38277207" } @Article{info:doi/10.2196/54274, author="Gannon, Hannah and Larsson, Leyla and Chimhuya, Simbarashe and Mangiza, Marcia and Wilson, Emma and Kesler, Erin and Chimhini, Gwendoline and Fitzgerald, Felicity and Zailani, Gloria and Crehan, Caroline and Khan, Nushrat and Hull-Bailey, Tim and Sassoon, Yali and Baradza, Morris and Heys, Michelle and Chiume, Msandeni", title="Development and Implementation of Digital Diagnostic Algorithms for Neonatal Units in Zimbabwe and Malawi: Development and Usability Study", journal="JMIR Form Res", year="2024", month="Jan", day="26", volume="8", pages="e54274", keywords="mobile health", keywords="mHealth", keywords="neonatology", keywords="digital health", keywords="mobile apps", keywords="newborn", keywords="Malawi, Zimbabwe", keywords="usability", keywords="clinical decision support", abstract="Background: Despite an increase in hospital-based deliveries, neonatal mortality remains high in low-resource settings. Due to limited laboratory diagnostics, there is significant reliance on clinical findings to inform diagnoses. Accurate, evidence-based identification and management of neonatal conditions could improve outcomes by standardizing care. This could be achieved through digital clinical decision support (CDS) tools. Neotree is a digital, quality improvement platform that incorporates CDS, aiming to improve neonatal care in low-resource health care facilities. Before this study, first-phase CDS development included developing and implementing neonatal resuscitation algorithms, creating initial versions of CDS to address a range of neonatal conditions, and a Delphi study to review key algorithms. Objective: This second-phase study aims to codevelop and implement neonatal digital CDS algorithms in Malawi and Zimbabwe. Methods: Overall, 11 diagnosis-specific web-based workshops with Zimbabwean, Malawian, and UK neonatal experts were conducted (August 2021 to April 2022) encompassing the following: (1) review of available evidence, (2) review of country-specific guidelines (Essential Medicines List and Standard Treatment Guidelines for Zimbabwe and Care of the Infant and Newborn, Malawi), and (3) identification of uncertainties within the literature for future studies. After agreement of clinical content, the algorithms were programmed into a test script, tested with the respective hospital's health care professionals (HCPs), and refined according to their feedback. Once finalized, the algorithms were programmed into the Neotree software and implemented at the tertiary-level implementation sites: Sally Mugabe Central Hospital in Zimbabwe and Kamuzu Central Hospital in Malawi, in December 2021 and May 2022, respectively.
In Zimbabwe, usability was evaluated through 2 usability workshops and usability questionnaires: Post-Study System Usability Questionnaire (PSSUQ) and System Usability Scale (SUS). Results: Overall, 11 evidence-based diagnostic and management algorithms were tailored to local resource availability. These refined algorithms were then integrated into Neotree. Where national management guidelines differed, country-specific guidelines were created. In total, 9 HCPs attended the usability workshops and completed the SUS, among whom 8 (89\%) completed the PSSUQ. Both usability scores (SUS mean score 75.8 out of 100 [higher score is better]; PSSUQ overall score 2.28 out of 7 [lower score is better]) demonstrated high usability of the CDS function but highlighted issues around technical complexity, which continue to be addressed iteratively. Conclusions: This study describes the successful development and implementation of the only known neonatal CDS system, incorporated within a bedside data capture system with the ability to deliver up-to-date management guidelines, tailored to local resource availability. This study highlighted the importance of collaborative participatory design. Further implementation evaluation is planned to guide and inform the development of health system and program strategies to support newborn HCPs, with the ultimate goal of reducing preventable neonatal morbidity and mortality in low-resource settings. ", doi="10.2196/54274", url="https://formative.jmir.org/2024/1/e54274", url="http://www.ncbi.nlm.nih.gov/pubmed/38277198" } @Article{info:doi/10.2196/48443, author="Lee, You-Qian and Chen, Ching-Tai and Chen, Chien-Chang and Lee, Chung-Hong and Chen, Peitsz and Wu, Chi-Shin and Dai, Hong-Jie", title="Unlocking the Secrets Behind Advanced Artificial Intelligence Language Models in Deidentifying Chinese-English Mixed Clinical Text: Development and Validation Study", journal="J Med Internet Res", year="2024", month="Jan", day="25", volume="26", pages="e48443", keywords="code mixing", keywords="electronic health record", keywords="deidentification", keywords="pretrained language model", keywords="large language model", keywords="ChatGPT", abstract="Background: The widespread use of electronic health records in the clinical and biomedical fields makes the removal of protected health information (PHI) essential to maintain privacy. However, a significant portion of information is recorded in unstructured textual forms, posing a challenge for deidentification. In multilingual countries, medical records could be written in a mixture of more than one language, referred to as code mixing. Most current clinical natural language processing techniques are designed for monolingual text, and there is a need to address the deidentification of code-mixed text. Objective: The aim of this study was to investigate the effectiveness and underlying mechanism of fine-tuned pretrained language models (PLMs) in identifying PHI in the code-mixed context. Additionally, we aimed to evaluate the potential of prompting large language models (LLMs) for recognizing PHI in a zero-shot manner. Methods: We compiled the first clinical code-mixed deidentification data set consisting of text written in Chinese and English. 
We explored the effectiveness of fine-tuned PLMs for recognizing PHI in code-mixed content, with a focus on whether PLMs exploit naming regularity and mention coverage to achieve superior performance, by probing the developed models' outputs to examine their decision-making process. Furthermore, we investigated the potential of prompt-based in-context learning of LLMs for recognizing PHI in code-mixed text. Results: The developed methods were evaluated on a code-mixed deidentification corpus of 1700 discharge summaries. We observed that different PHI types had preferences in their occurrences within the different types of language-mixed sentences, and PLMs could effectively recognize PHI by exploiting the learned name regularity. However, the models may exhibit suboptimal results when regularity is weak or mentions contain unknown words that the representations cannot generate well. We also found that the availability of code-mixed training instances is essential for the model's performance. Furthermore, the LLM-based deidentification method was a feasible and appealing approach that can be controlled and enhanced through natural language prompts. Conclusions: The study contributes to understanding the underlying mechanism of PLMs in addressing the deidentification process in the code-mixed context and highlights the significance of incorporating code-mixed training instances into the model training phase. To support the advancement of research, we created a manipulated subset of the resynthesized data set available for research purposes. Based on the compiled data set, we found that the LLM-based deidentification method is a feasible approach, but carefully crafted prompts are essential to avoid unwanted output. However, the use of such methods in the hospital setting requires careful consideration of data security and privacy concerns. Further research could explore the augmentation of PLMs and LLMs with external knowledge to improve their strength in recognizing rare PHI. ", doi="10.2196/48443", url="https://www.jmir.org/2024/1/e48443", url="http://www.ncbi.nlm.nih.gov/pubmed/38271060" } @Article{info:doi/10.2196/53378, author="Chen, Hongbo and Cohen, Eldan and Wilson, Dulaney and Alfred, Myrtede", title="A Machine Learning Approach with Human-AI Collaboration for Automated Classification of Patient Safety Event Reports: Algorithm Development and Validation Study", journal="JMIR Hum Factors", year="2024", month="Jan", day="25", volume="11", pages="e53378", keywords="accident", keywords="accidents", keywords="black box", keywords="classification", keywords="classifier", keywords="collaboration", keywords="design", keywords="document", keywords="documentation", keywords="documents", keywords="explainability", keywords="explainable", keywords="human-AI collaboration", keywords="human-AI", keywords="human-computer", keywords="human-machine", keywords="incident reporting", keywords="interface design", keywords="interface", keywords="interpretable", keywords="LIME", keywords="machine learning", keywords="patient safety", keywords="predict", keywords="prediction", keywords="predictions", keywords="predictive", keywords="report", keywords="reporting", keywords="safety", keywords="text", keywords="texts", keywords="textual", keywords="artificial intelligence", abstract="Background: Adverse events refer to incidents with potential or actual harm to patients in hospitals. 
These events are typically documented through patient safety event (PSE) reports, which consist of detailed narratives providing contextual information on the occurrences. Accurate classification of PSE reports is crucial for patient safety monitoring. However, this process faces challenges due to inconsistencies in classifications and the sheer volume of reports. Recent advancements in text representation, particularly contextual text representation derived from transformer-based language models, offer a promising solution for more precise PSE report classification. Integrating the machine learning (ML) classifier necessitates a balance between human expertise and artificial intelligence (AI). Central to this integration is the concept of explainability, which is crucial for building trust and ensuring effective human-AI collaboration. Objective: This study aims to investigate the efficacy of ML classifiers trained using contextual text representation in automatically classifying PSE reports. Furthermore, the study presents an interface that integrates the ML classifier with the explainability technique to facilitate human-AI collaboration for PSE report classification. Methods: This study used a data set of 861 PSE reports from a large academic hospital's maternity units in the Southeastern United States. Various ML classifiers were trained with both static and contextual text representations of PSE reports. The trained ML classifiers were evaluated with multiclass classification metrics and the confusion matrix. The local interpretable model-agnostic explanations (LIME) technique was used to provide the rationale for the ML classifier's predictions. An interface that integrates the ML classifier with the LIME technique was designed for incident reporting systems. Results: The top-performing classifier using contextual representation was able to obtain an accuracy of 75.4\% (95/126) compared to an accuracy of 66.7\% (84/126) by the top-performing classifier trained using static text representation. A PSE reporting interface has been designed to facilitate human-AI collaboration in PSE report classification. In this design, the ML classifier recommends the top 2 most probable event types, along with the explanations for the prediction, enabling PSE reporters and patient safety analysts to choose the most suitable one. The LIME technique showed that the classifier occasionally relies on arbitrary words for classification, emphasizing the necessity of human oversight. Conclusions: This study demonstrates that training ML classifiers with contextual text representations can significantly enhance the accuracy of PSE report classification. The interface designed in this study lays the foundation for human-AI collaboration in the classification of PSE reports. The insights gained from this research enhance the decision-making process in PSE report classification, enabling hospitals to more efficiently identify potential risks and hazards and enabling patient safety analysts to take timely actions to prevent patient harm. 
", doi="10.2196/53378", url="https://humanfactors.jmir.org/2024/1/e53378", url="http://www.ncbi.nlm.nih.gov/pubmed/38271086" } @Article{info:doi/10.2196/52113, author="Herrmann-Werner, Anne and Festl-Wietek, Teresa and Holderried, Friederike and Herschbach, Lea and Griewatz, Jan and Masters, Ken and Zipfel, Stephan and Mahling, Moritz", title="Assessing ChatGPT's Mastery of Bloom's Taxonomy Using Psychosomatic Medicine Exam Questions: Mixed-Methods Study", journal="J Med Internet Res", year="2024", month="Jan", day="23", volume="26", pages="e52113", keywords="answer", keywords="artificial intelligence", keywords="assessment", keywords="Bloom's taxonomy", keywords="ChatGPT", keywords="classification", keywords="error", keywords="exam", keywords="examination", keywords="generative", keywords="GPT-4", keywords="Generative Pre-trained Transformer 4", keywords="language model", keywords="learning outcome", keywords="LLM", keywords="MCQ", keywords="medical education", keywords="medical exam", keywords="multiple-choice question", keywords="natural language processing", keywords="NLP", keywords="psychosomatic", keywords="question", keywords="response", keywords="taxonomy", abstract="Background: Large language models such as GPT-4 (Generative Pre-trained Transformer 4) are being increasingly used in medicine and medical education. However, these models are prone to ``hallucinations'' (ie, outputs that seem convincing while being factually incorrect). It is currently unknown how these errors by large language models relate to the different cognitive levels defined in Bloom's taxonomy. Objective: This study aims to explore how GPT-4 performs in terms of Bloom's taxonomy using psychosomatic medicine exam questions. Methods: We used a large data set of psychosomatic medicine multiple-choice questions (N=307) with real-world results derived from medical school exams. GPT-4 answered the multiple-choice questions using 2 distinct prompt versions: detailed and short. The answers were analyzed using a quantitative approach and a qualitative approach. Focusing on incorrectly answered questions, we categorized reasoning errors according to the hierarchical framework of Bloom's taxonomy. Results: GPT-4's performance in answering exam questions yielded a high success rate: 93\% (284/307) for the detailed prompt and 91\% (278/307) for the short prompt. Questions answered correctly by GPT-4 had a statistically significant higher difficulty than questions answered incorrectly (P=.002 for the detailed prompt and P<.001 for the short prompt). Independent of the prompt, GPT-4's lowest exam performance was 78.9\% (15/19), thereby always surpassing the ``pass'' threshold. Our qualitative analysis of incorrect answers, based on Bloom's taxonomy, showed that errors were primarily in the ``remember'' (29/68) and ``understand'' (23/68) cognitive levels; specific issues arose in recalling details, understanding conceptual relationships, and adhering to standardized guidelines. Conclusions: GPT-4 demonstrated a remarkable success rate when confronted with psychosomatic medicine multiple-choice exam questions, aligning with previous findings. When evaluated through Bloom's taxonomy, our data revealed that GPT-4 occasionally ignored specific facts (remember), provided illogical reasoning (understand), or failed to apply concepts to a new situation (apply). These errors, which were confidently presented, could be attributed to inherent model biases and the tendency to generate outputs that maximize likelihood. 
", doi="10.2196/52113", url="https://www.jmir.org/2024/1/e52113", url="http://www.ncbi.nlm.nih.gov/pubmed/38261378" } @Article{info:doi/10.2196/48842, author="Zhou, Linyun and Jiang, Minghuan and Duan, Ran and Zuo, Feng and Li, Zongfang and Xu, Songhua", title="Barriers and Implications of 5G Technology Adoption for Hospitals in Western China: Integrated Interpretive Structural Modeling and Decision-Making Trial and Evaluation Laboratory Analysis", journal="JMIR Mhealth Uhealth", year="2024", month="Jan", day="23", volume="12", pages="e48842", keywords="5G health care", keywords="5G adoption barriers", keywords="5G adoption strategy", keywords="smart health care", keywords="Western China hospitals", abstract="Background: 5G technology is gaining traction in Chinese hospitals for its potential to enhance patient care and internal management. However, various barriers hinder its implementation in clinical settings, and studies on their relevance and importance are scarce. Objective: This study aimed to identify critical barriers hampering the effective implementation of 5G in hospitals in Western China, to identify interaction relationships and priorities of the above-identified barriers, and to assess the intensity of the relationships and cause-and-effect relations between the adoption barriers. Methods: This paper uses the Delphi expert consultation method to determine key barriers to 5G adoption in Western China hospitals, the interpretive structural modeling to uncover interaction relationships and priorities, and the decision-making trial and evaluation laboratory method to reveal cause-and-effect relationships and their intensity levels. Results: In total, 14 barriers were determined by literature review and the Delphi method. Among these, ``lack of policies on ethics, rights, and responsibilities in core health care scenarios'' emerged as the fundamental influencing factor in the entire system, as it was the only factor at the bottom level of the interpretive structural model. Overall, 8 barriers were classified as the ``cause group,'' and 6 as the ``effect group'' by the decision-making trial and evaluation laboratory method. ``High expense'' and ``organizational barriers within hospitals'' were determined as the most significant driving barrier (the highest R--C value of 1.361) and the most critical barrier (the highest R+C value of 4.317), respectively. Conclusions: Promoting the integration of 5G in hospitals in Western China faces multiple complex and interrelated barriers. The study provides valuable quantitative evidence and a comprehensive approach for regulatory authorities, hospitals, and telecom operators, helping them develop strategic pathways for promoting widespread 5G adoption in health care. It is suggested that the stakeholders cooperate to explore and solve the problems in the 5G medical care era, aiming to achieve the coverage of 5G medical care across the country. To our best knowledge, this study is the first academic exploration systematically analyzing factors resisting 5G integration in Chinese hospitals, and it may give subsequent researchers a solid foundation for further studying the application and development of 5G in health care. 
", doi="10.2196/48842", url="https://mhealth.jmir.org/2024/1/e48842", url="http://www.ncbi.nlm.nih.gov/pubmed/38261368" } @Article{info:doi/10.2196/51926, author="Liu, Xiaocong and Wu, Jiageng and Shao, An and Shen, Wenyue and Ye, Panpan and Wang, Yao and Ye, Juan and Jin, Kai and Yang, Jie", title="Uncovering Language Disparity of ChatGPT on Retinal Vascular Disease Classification: Cross-Sectional Study", journal="J Med Internet Res", year="2024", month="Jan", day="22", volume="26", pages="e51926", keywords="large language models", keywords="ChatGPT", keywords="clinical decision support", keywords="retinal vascular disease", keywords="artificial intelligence", abstract="Background: Benefiting from rich knowledge and the exceptional ability to understand text, large language models like ChatGPT have shown great potential in English clinical environments. However, the performance of ChatGPT in non-English clinical settings, as well as its reasoning, have not been explored in depth. Objective: This study aimed to evaluate ChatGPT's diagnostic performance and inference abilities for retinal vascular diseases in a non-English clinical environment. Methods: In this cross-sectional study, we collected 1226 fundus fluorescein angiography reports and corresponding diagnoses written in Chinese and tested ChatGPT with 4 prompting strategies (direct diagnosis or diagnosis with a step-by-step reasoning process and in Chinese or English). Results: Compared with ChatGPT using Chinese prompts for direct diagnosis that achieved an F1-score of 70.47\%, ChatGPT using English prompts for direct diagnosis achieved the best diagnostic performance (80.05\%), which was inferior to ophthalmologists (89.35\%) but close to ophthalmologist interns (82.69\%). As for its inference abilities, although ChatGPT can derive a reasoning process with a low error rate (0.4 per report) for both Chinese and English prompts, ophthalmologists identified that the latter brought more reasoning steps with less incompleteness (44.31\%), misinformation (1.96\%), and hallucinations (0.59\%) (all P<.001). Also, analysis of the robustness of ChatGPT with different language prompts indicated significant differences in the recall (P=.03) and F1-score (P=.04) between Chinese and English prompts. In short, when prompted in English, ChatGPT exhibited enhanced diagnostic and inference capabilities for retinal vascular disease classification based on Chinese fundus fluorescein angiography reports. Conclusions: ChatGPT can serve as a helpful medical assistant to provide diagnosis in non-English clinical environments, but there are still performance gaps, language disparities, and errors compared to professionals, which demonstrate the potential limitations and the need to continually explore more robust large language models in ophthalmology practice. 
", doi="10.2196/51926", url="https://www.jmir.org/2024/1/e51926", url="http://www.ncbi.nlm.nih.gov/pubmed/38252483" } @Article{info:doi/10.2196/48527, author="Wang, Qingyi and Chang, Zhuo and Liu, Xiaofang and Wang, Yunrui and Feng, Chuwen and Ping, Yunlu and Feng, Xiaoling", title="Predictive Value of Machine Learning for Platinum Chemotherapy Responses in Ovarian Cancer: Systematic Review and Meta-Analysis", journal="J Med Internet Res", year="2024", month="Jan", day="22", volume="26", pages="e48527", keywords="ovarian cancer", keywords="platinum chemotherapy response", keywords="machine learning", keywords="platinum-based therapy", keywords="predictive potential", abstract="Background: Machine learning is a potentially effective method for predicting the response to platinum-based treatment for ovarian cancer. However, the predictive performance of various machine learning methods and variables is still a matter of controversy and debate. Objective: This study aims to systematically review relevant literature on the predictive value of machine learning for platinum-based chemotherapy responses in patients with ovarian cancer. Methods: Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, we systematically searched the PubMed, Embase, Web of Science, and Cochrane databases for relevant studies on predictive models for platinum-based therapies for the treatment of ovarian cancer published before April 26, 2023. The Prediction Model Risk of Bias Assessment tool was used to evaluate the risk of bias in the included articles. Concordance index (C-index), sensitivity, and specificity were used to evaluate the performance of the prediction models to investigate the predictive value of machine learning for platinum chemotherapy responses in patients with ovarian cancer. Results: A total of 1749 articles were examined, and 19 of them involving 39 models were eligible for this study. The most commonly used modeling methods were logistic regression (16/39, 41\%), Extreme Gradient Boosting (4/39, 10\%), and support vector machine (4/39, 10\%). The training cohort reported C-index in 39 predictive models, with a pooled value of 0.806; the validation cohort reported C-index in 12 predictive models, with a pooled value of 0.831. Support vector machine performed well in both the training and validation cohorts, with a C-index of 0.942 and 0.879, respectively. The pooled sensitivity was 0.890, and the pooled specificity was 0.790 in the training cohort. Conclusions: Machine learning can effectively predict how patients with ovarian cancer respond to platinum-based chemotherapy and may provide a reference for the development or updating of subsequent scoring systems. ", doi="10.2196/48527", url="https://www.jmir.org/2024/1/e48527", url="http://www.ncbi.nlm.nih.gov/pubmed/38252469" } @Article{info:doi/10.2196/49221, author="Schnoor, Kyma and Versluis, Anke and Chavannes, H. Niels and Talboom-Kamp, A. Esther P. 
W.", title="Digital Triage Tools for Sexually Transmitted Infection Testing Compared With General Practitioners' Advice: Vignette-Based Qualitative Study With Interviews Among General Practitioners", journal="JMIR Hum Factors", year="2024", month="Jan", day="22", volume="11", pages="e49221", keywords="eHealth", keywords="digital triage tool", keywords="sexually transmitted infection", keywords="STI", keywords="human immunodeficiency virus", keywords="general practitioners", keywords="GPs decision-making", keywords="digital health", keywords="diagnostic", keywords="sexually transmitted disease", keywords="STD", keywords="sexually transmitted", keywords="sexual transmission", keywords="triage", keywords="artificial intelligence", keywords="HIV", keywords="diagnostics", keywords="diagnosis", keywords="vignette", keywords="vignettes", keywords="interview", keywords="interviews", keywords="best practice", keywords="best practices", keywords="thematic analysis", keywords="referral", keywords="medical advice", abstract="Background: Digital triage tools for sexually transmitted infection (STI) testing can potentially be used as a substitute for the triage that general practitioners (GPs) perform to lower their work pressure. The studied tool is based on medical guidelines. The same guidelines support GPs' decision-making process. However, research has shown that GPs make decisions from a holistic perspective and, therefore, do not always adhere to those guidelines. To have a high-quality digital triage tool that results in an efficient care process, it is important to learn more about GPs' decision-making process. Objective: The first objective was to identify whether the advice of the studied digital triage tool aligned with GPs' daily medical practice. The second objective was to learn which factors influence GPs' decisions regarding referral for diagnostic testing. In addition, this study provides insights into GPs' decision-making process. Methods: A qualitative vignette-based study using semistructured interviews was conducted. In total, 6 vignettes representing patient cases were discussed with the participants (GPs). The participants needed to think aloud whether they would advise an STI test for the patient and why. A thematic analysis was conducted on the transcripts of the interviews. The vignette patient cases were also passed through the digital triage tool, resulting in advice to test or not for an STI. A comparison was made between the advice of the tool and that of the participants. Results: In total, 10 interviews were conducted. Participants (GPs) had a mean age of 48.30 (SD 11.88) years. For 3 vignettes, the advice of the digital triage tool and of all participants was the same. In those vignettes, the patients' risk factors were sufficiently clear for the participants to advise the same as the digital tool. For 3 vignettes, the advice of the digital tool differed from that of the participants. Patient-related factors that influenced the participants' decision-making process were the patient's anxiety, young age, and willingness to be tested. Participants would test at a lower threshold than the triage tool because of those factors. Sometimes, participants wanted more information than was provided in the vignette or would like to conduct a physical examination. These elements were not part of the digital triage tool. Conclusions: The advice to conduct a diagnostic STI test differed between a digital triage tool and GPs. 
The digital triage tool considered only medical guidelines, whereas GPs were open to discussion, reasoning from a holistic perspective. The GPs' decision-making process was influenced by patients' anxiety, willingness to be tested, and age. On the basis of these results, we believe that the digital triage tool for STI testing could support GPs and even replace consultations in the future. Further research must substantiate how this can be done safely. ", doi="10.2196/49221", url="https://humanfactors.jmir.org/2024/1/e49221", url="http://www.ncbi.nlm.nih.gov/pubmed/38252474" } @Article{info:doi/10.2196/49986, author="Ulgu, Mahir Mustafa and Laleci Erturkmen, Banu Gokce and Yuksel, Mustafa and Namli, Tuncay and Postac{\i}, {\c S}enan and Gencturk, Mert and Kabak, Yildiray and Sinaci, Anil A. and Gonul, Suat and Dogac, Asuman and {\"O}zkan Altunay, Z{\"u}beyde and Ekinci, Banu and Aydin, Sahin and Birinci, Suayip", title="A Nationwide Chronic Disease Management Solution via Clinical Decision Support Services: Software Development and Real-Life Implementation Report", journal="JMIR Med Inform", year="2024", month="Jan", day="19", volume="12", pages="e49986", keywords="chronic disease management", keywords="clinical decision support services", keywords="integrated care", keywords="interoperability", keywords="evidence-based medicine", keywords="medicine", keywords="disease management", keywords="management", keywords="implementation", keywords="decision support", keywords="clinical decision", keywords="support", keywords="chronic disease", keywords="physician-centered", keywords="risk assessment", keywords="tracking", keywords="diagnosis", abstract="Background: The increasing population of older adults has led to a rise in the demand for health care services, with chronic diseases being a major burden. Person-centered integrated care is required to address these challenges; hence, the Turkish Ministry of Health has initiated strategies to implement an integrated health care model for chronic disease management. We aim to present the design, development, nationwide implementation, and initial performance results of the national Disease Management Platform (DMP). Objective: This paper's objective is to present the design decisions taken and technical solutions provided to ensure successful nationwide implementation by addressing several challenges, including interoperability with existing IT systems, integration with clinical workflow, enabling transition of care, ease of use by health care professionals, scalability, high performance, and adaptability. Methods: The DMP is implemented as an integrated care solution that heavily uses clinical decision support services to coordinate effective screening and management of chronic diseases in adherence to evidence-based clinical guidelines and, hence, to increase the quality of health care delivery. The DMP is designed and implemented to be easily integrated with the existing regional and national health IT systems via conformance to international health IT standards, such as Health Level Seven Fast Healthcare Interoperability Resources. A repeatable cocreation strategy has been used to design and develop new disease modules to ensure extensibility while ensuring ease of use and seamless integration into the regular clinical workflow during patient encounters. The DMP is horizontally scalable in case of high load to ensure high performance.
Results: As of September 2023, the DMP has been used by 25,568 health professionals to perform 73,715,269 encounters for 16,058,904 unique citizens. It has been used to screen and monitor chronic diseases such as obesity, cardiovascular risk, diabetes, and hypertension, resulting in the diagnosis of 3,545,573 patients with obesity, 534,423 patients with high cardiovascular risk, 490,346 patients with diabetes, and 144,768 patients with hypertension. Conclusions: It has been demonstrated that the platform can scale horizontally and efficiently provides services to thousands of family medicine practitioners without performance problems. The system seamlessly interoperates with existing health IT solutions and runs as a part of the clinical workflow of physicians at the point of care. By automatically accessing and processing patient data from various sources to provide personalized care plan guidance, it maximizes the effect of evidence-based decision support services by seamless integration with point-of-care electronic health record systems. As the system is built on international code systems and standards, adaptation and deployment to additional regional and national settings become easily possible. The nationwide DMP as an integrated care solution has been operational since January 2020, coordinating effective screening and management of chronic diseases in adherence to evidence-based clinical guidelines. ", doi="10.2196/49986", url="https://medinform.jmir.org/2024/1/e49986", url="http://www.ncbi.nlm.nih.gov/pubmed/38241077" } @Article{info:doi/10.2196/53002, author="Zhan, Siyi and Ding, Liping and Li, Hui and Su, Aonan", title="Application of Failure Mode and Effects Analysis to Improve the Quality of the Front Page of Electronic Medical Records in China: Cross-Sectional Data Mapping Analysis", journal="JMIR Med Inform", year="2024", month="Jan", day="19", volume="12", pages="e53002", keywords="front page", keywords="EMR system", keywords="electronic medical record", keywords="failure mode and effects analysis", keywords="FMEA", keywords="measures", abstract="Background: The completeness and accuracy of the front pages of electronic medical records (EMRs) are crucial for evaluating hospital performance and for health insurance payments to inpatients. However, the quality of the first page of EMRs in China's medical system is not satisfactory, which can be partly attributed to deficiencies in the EMR system. Failure mode and effects analysis (FMEA) is a proactive risk management tool that can be used to investigate the potential failure modes in an EMR system and analyze the possible consequences. Objective: The purpose of this study was to preemptively identify the potential failures of the EMR system in China and their causes and effects in order to prevent such failures from recurring. Further, we aimed to implement corresponding improvements to minimize system failure modes. Methods: From January 1, 2020, to May 31, 2022, 10 experts, including clinicians, engineers, administrators, and medical record coders, in Zhejiang People's Hospital conducted FMEA to improve the quality of the front page of the EMR. The completeness and accuracy of the front page and the risk priority numbers were compared before and after the implementation of specific improvement measures. Results: We identified 2 main processes and 6 subprocesses for improving the EMR system. 
We found that there were 13 potential failure modes, including data messaging errors, data completion errors, incomplete quality control, and coding errors. A questionnaire survey administered to random physicians and coders showed 7 major causes for these failure modes. Therefore, we established quality control rules for medical records and embedded them in the system. We also integrated the medical insurance system and the front page of the EMR on the same interface and established a set of intelligent front pages in the EMR management system. Further, we revamped the quality management systems such as communicating with physicians regularly and conducting special training seminars. The overall accuracy and integrity rate of the front page (P<.001) of the EMR increased significantly after implementation of the improvement measures, while the risk priority number decreased. Conclusions: In this study, we were able to identify the potential failure modes in the front page of the EMR system by using the FMEA method and implement corresponding improvement measures in order to minimize recurring errors in the health care services in China. ", doi="10.2196/53002", url="https://medinform.jmir.org/2024/1/e53002", url="http://www.ncbi.nlm.nih.gov/pubmed/38241064" } @Article{info:doi/10.2196/52880, author="Tabja Bortesi, Pablo Juan and Ranisau, Jonathan and Di, Shuang and McGillion, Michael and Rosella, Laura and Johnson, Alistair and Devereaux, PJ and Petch, Jeremy", title="Machine Learning Approaches for the Image-Based Identification of Surgical Wound Infections: Scoping Review", journal="J Med Internet Res", year="2024", month="Jan", day="18", volume="26", pages="e52880", keywords="surgical site infection", keywords="machine learning", keywords="postoperative surveillance", keywords="wound imaging", keywords="mobile phone", abstract="Background: Surgical site infections (SSIs) occur frequently and impact patients and health care systems. Remote surveillance of surgical wounds is currently limited by the need for manual assessment by clinicians. Machine learning (ML)--based methods have recently been used to address various aspects of the postoperative wound healing process and may be used to improve the scalability and cost-effectiveness of remote surgical wound assessment. Objective: The objective of this review was to provide an overview of the ML methods that have been used to identify surgical wound infections from images. Methods: We conducted a scoping review of ML approaches for visual detection of SSIs following the JBI (Joanna Briggs Institute) methodology. Reports of participants in any postoperative context focusing on identification of surgical wound infections were included. Studies that did not address SSI identification, surgical wounds, or did not use image or video data were excluded. We searched MEDLINE, Embase, CINAHL, CENTRAL, Web of Science Core Collection, IEEE Xplore, Compendex, and arXiv for relevant studies in November 2022. The records retrieved were double screened for eligibility. A data extraction tool was used to chart the relevant data, which was described narratively and presented using tables. Employment of TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines was evaluated and PROBAST (Prediction Model Risk of Bias Assessment Tool) was used to assess risk of bias (RoB). Results: In total, 10 of the 715 unique records screened met the eligibility criteria. 
In these studies, the clinical contexts and surgical procedures were diverse. All papers developed diagnostic models, though none performed external validation. Both traditional ML and deep learning methods were used to identify SSIs from mostly color images, and the volume of images used ranged from under 50 to thousands. Further, 10 TRIPOD items were reported in at least 4 studies, though 15 items were reported in fewer than 4 studies. PROBAST assessment led to 9 studies being identified as having an overall high RoB, with 1 study having overall unclear RoB. Conclusions: Research on the image-based identification of surgical wound infections using ML remains novel, and there is a need for standardized reporting. Limitations related to variability in image capture, model building, and data sources should be addressed in the future. ", doi="10.2196/52880", url="https://www.jmir.org/2024/1/e52880", url="http://www.ncbi.nlm.nih.gov/pubmed/38236623" } @Article{info:doi/10.2196/51925, author="de Hond, Anne and van Buchem, Marieke and Fanconi, Claudio and Roy, Mohana and Blayney, Douglas and Kant, Ilse and Steyerberg, Ewout and Hernandez-Boussard, Tina", title="Predicting Depression Risk in Patients With Cancer Using Multimodal Data: Algorithm Development Study", journal="JMIR Med Inform", year="2024", month="Jan", day="18", volume="12", pages="e51925", keywords="natural language processing", keywords="machine learning", keywords="artificial intelligence", keywords="oncology", keywords="depression", keywords="clinical decision support", keywords="decision support", keywords="cancer", keywords="patients with cancer", keywords="chemotherapy", keywords="mental health", keywords="prediction model", keywords="depression risk", keywords="cancer treatment", keywords="radiotherapy", keywords="diagnosis", keywords="validation", keywords="cancer care", keywords="care", abstract="Background: Patients with cancer starting systemic treatment programs, such as chemotherapy, often develop depression. A prediction model may assist physicians and health care workers in the early identification of these vulnerable patients. Objective: This study aimed to develop a prediction model for depression risk within the first month of cancer treatment. Methods: We included 16,159 patients diagnosed with cancer starting chemo- or radiotherapy treatment between 2008 and 2021. Machine learning models (eg, least absolute shrinkage and selection operator [LASSO] logistic regression) and natural language processing models (Bidirectional Encoder Representations from Transformers [BERT]) were used to develop multimodal prediction models using both electronic health record data and unstructured text (patient emails and clinician notes). Model performance was assessed in an independent test set (n=5387, 33\%) using area under the receiver operating characteristic curve (AUROC), calibration curves, and decision curve analysis to assess initial clinical impact use. Results: Among 16,159 patients, 437 (2.7\%) received a depression diagnosis within the first month of treatment. The LASSO logistic regression models based on the structured data (AUROC 0.74, 95\% CI 0.71-0.78) and structured data with email classification scores (AUROC 0.74, 95\% CI 0.71-0.78) had the best discriminative performance. The BERT models based on clinician notes and structured data with email classification scores had AUROCs around 0.71. 
The logistic regression model based on email classification scores alone performed poorly (AUROC 0.54, 95\% CI 0.52-0.56), and the model based solely on clinician notes had the worst performance (AUROC 0.50, 95\% CI 0.49-0.52). Calibration was good for the logistic regression models, whereas the BERT models produced overly extreme risk estimates even after recalibration. There was a small range of decision thresholds for which the best-performing model showed promising clinical effectiveness use. The risks were underestimated for female and Black patients. Conclusions: The results demonstrated the potential and limitations of machine learning and multimodal models for predicting depression risk in patients with cancer. Future research is needed to further validate these models, refine the outcome label and predictors related to mental health, and address biases across subgroups. ", doi="10.2196/51925", url="https://medinform.jmir.org/2024/1/e51925", url="http://www.ncbi.nlm.nih.gov/pubmed/38236635" } @Article{info:doi/10.2196/49007, author="Mehra, Tarun and Wekhof, Tobias and Keller, Iris Dagmar", title="Additional Value From Free-Text Diagnoses in Electronic Health Records: Hybrid Dictionary and Machine Learning Classification Study", journal="JMIR Med Inform", year="2024", month="Jan", day="17", volume="12", pages="e49007", keywords="electronic health records", keywords="free text", keywords="natural language processing", keywords="NLP", keywords="artificial intelligence", keywords="AI", abstract="Background: Physicians are hesitant to forgo the opportunity of entering unstructured clinical notes for structured data entry in electronic health records. Does free text increase informational value in comparison with structured data? Objective: This study aims to compare information from unstructured text-based chief complaints harvested and processed by a natural language processing (NLP) algorithm with clinician-entered structured diagnoses in terms of their potential utility for automated improvement of patient workflows. Methods: Electronic health records of 293,298 patient visits at the emergency department of a Swiss university hospital from January 2014 to October 2021 were analyzed. Using emergency department overcrowding as a case in point, we compared supervised NLP-based keyword dictionaries of symptom clusters from unstructured clinical notes and clinician-entered chief complaints from a structured drop-down menu with the following 2 outcomes: hospitalization and high Emergency Severity Index (ESI) score. Results: Of 12 symptom clusters, the NLP cluster was substantial in predicting hospitalization in 11 (92\%) clusters; 8 (67\%) clusters remained significant even after controlling for the cluster of clinician-determined chief complaints in the model. All 12 NLP symptom clusters were significant in predicting a low ESI score, of which 9 (75\%) remained significant when controlling for clinician-determined chief complaints. The correlation between NLP clusters and chief complaints was low (r=-0.04 to 0.6), indicating complementarity of information. Conclusions: The NLP-derived features and clinicians' knowledge were complementary in explaining patient outcome heterogeneity. They can provide an efficient approach to patient flow management, for example, in an emergency medicine setting. We further demonstrated the feasibility of creating extensive and precise keyword dictionaries with NLP by medical experts without requiring programming knowledge.
Using the dictionary, we could classify short and unstructured clinical texts into diagnostic categories defined by the clinician. ", doi="10.2196/49007", url="https://medinform.jmir.org/2024/1/e49007", url="http://www.ncbi.nlm.nih.gov/pubmed/38231569" } @Article{info:doi/10.2196/50174, author="Nguyen, Tina", title="ChatGPT in Medical Education: A Precursor for Automation Bias?", journal="JMIR Med Educ", year="2024", month="Jan", day="17", volume="10", pages="e50174", keywords="ChatGPT", keywords="artificial intelligence", keywords="AI", keywords="medical students", keywords="residents", keywords="medical school curriculum", keywords="medical education", keywords="automation bias", keywords="large language models", keywords="LLMs", keywords="bias", doi="10.2196/50174", url="https://mededu.jmir.org/2024/1/e50174", url="http://www.ncbi.nlm.nih.gov/pubmed/38231545" } @Article{info:doi/10.2196/44653, author="Pang, MengWei and Dong, YanLing and Zhao, XiaoHan and Wan, JiaWu and Jiang, Li and Song, JinLin and Ji, Ping and Jiang, Lin", title="Virtual and Interprofessional Objective Structured Clinical Examination in Dentistry and Dental Technology: Development and User Evaluations", journal="JMIR Form Res", year="2024", month="Jan", day="17", volume="8", pages="e44653", keywords="dentist", keywords="dental technician", keywords="objective structured clinical examination", keywords="OSCE", keywords="interprofessional education", keywords="interprofessional collaborative practice", abstract="Background: Interprofessional education (IPE) facilitates interprofessional collaborative practice (IPCP) to encourage teamwork among dental care professionals and is increasingly becoming a part of training programs for dental and dental technology students. However, the focus of previous IPE and IPCP studies has largely been on subjective student and instructor perceptions without including objective assessments of collaborative practice as an outcome measure. Objective: The purposes of this study were to develop the framework for a novel virtual and interprofessional objective structured clinical examination (viOSCE) applicable to dental and dental technology students, to assess the effectiveness of the framework as a tool for measuring the outcomes of IPE, and to promote IPCP among dental and dental technology students. Methods: The framework of the proposed novel viOSCE was developed using the modified Delphi method and then piloted. The lead researcher and a group of experts determined the content and scoring system. Subjective data were collected using the Readiness for Interprofessional Learning Scale and a self-made scale, and objective data were collected using examiner ratings. Data were analyzed using nonparametric tests. Results: We successfully developed a viOSCE framework applicable to dental and dental technology students. Of 50 students, 32 (64\%) participated in the pilot study and completed the questionnaires. On the basis of the Readiness for Interprofessional Learning Scale, the subjective evaluation indicated that teamwork skills were improved, and the only statistically significant difference in participant motivation between the 2 professional groups was in the mutual evaluation scale (P=.004). 
For the viOSCE evaluation scale, the difference between the professional groups in removable prosthodontics was statistically significant, and a trend for negative correlation between subjective and objective scores was noted, but it was not statistically significant. Conclusions: The results confirm that viOSCE can be used as an objective evaluation tool to assess the outcomes of IPE and IPCP. This study also revealed an interesting relationship between mutual evaluation and IPCP results, further demonstrating that the IPE and IPCP results urgently need to be supplemented with objective evaluation tools. Therefore, the implementation of viOSCE as part of a large and more complete objective structured clinical examination to test the ability of students to meet undergraduate graduation requirements will be the focus of our future studies. ", doi="10.2196/44653", url="https://formative.jmir.org/2024/1/e44653", url="http://www.ncbi.nlm.nih.gov/pubmed/38231556" } @Article{info:doi/10.2196/49970, author="Long, Cai and Lowe, Kayle and Zhang, Jessica and Santos, dos Andr{\'e} and Alanazi, Alaa and O'Brien, Daniel and Wright, D. Erin and Cote, David", title="A Novel Evaluation Model for Assessing ChatGPT on Otolaryngology--Head and Neck Surgery Certification Examinations: Performance Study", journal="JMIR Med Educ", year="2024", month="Jan", day="16", volume="10", pages="e49970", keywords="medical licensing", keywords="otolaryngology", keywords="otology", keywords="laryngology", keywords="ear", keywords="nose", keywords="throat", keywords="ENT", keywords="surgery", keywords="surgical", keywords="exam", keywords="exams", keywords="response", keywords="responses", keywords="answer", keywords="answers", keywords="chatbot", keywords="chatbots", keywords="examination", keywords="examinations", keywords="medical education", keywords="otolaryngology/head and neck surgery", keywords="OHNS", keywords="artificial intelligence", keywords="AI", keywords="ChatGPT", keywords="medical examination", keywords="large language models", keywords="language model", keywords="LLM", keywords="LLMs", keywords="wide range information", keywords="patient safety", keywords="clinical implementation", keywords="safety", keywords="machine learning", keywords="NLP", keywords="natural language processing", abstract="Background: ChatGPT is among the most popular large language models (LLMs), exhibiting proficiency in various standardized tests, including multiple-choice medical board examinations. However, its performance on otolaryngology--head and neck surgery (OHNS) certification examinations and open-ended medical board certification examinations has not been reported. Objective: We aimed to evaluate the performance of ChatGPT on OHNS board examinations and propose a novel method to assess an AI model's performance on open-ended medical board examination questions. Methods: Twenty-one open-ended questions were adopted from the Royal College of Physicians and Surgeons of Canada's sample examination to query ChatGPT on April 11, 2023, with and without prompts. A new model, named Concordance, Validity, Safety, Competency (CVSC), was developed to evaluate its performance. Results: In an open-ended question assessment, ChatGPT achieved a passing mark (an average of 75\% across 3 trials) in the attempts and demonstrated higher accuracy with prompts. The model demonstrated high concordance (92.06\%) and satisfactory validity. 
While demonstrating considerable consistency in regenerating answers, it often provided only partially correct responses. Notably, concerning features such as hallucinations and self-conflicting answers were observed. Conclusions: ChatGPT achieved a passing score in the sample examination and demonstrated the potential to pass the OHNS certification examination of the Royal College of Physicians and Surgeons of Canada. Some concerns remain due to its hallucinations, which could pose risks to patient safety. Further adjustments are necessary to yield safer and more accurate answers for clinical implementation. ", doi="10.2196/49970", url="https://mededu.jmir.org/2024/1/e49970", url="http://www.ncbi.nlm.nih.gov/pubmed/38227351" } @Article{info:doi/10.2196/49573, author="Mohta, Alpana and Mohta, Achala and Kumari, Pramila", title="An Unusual Case of Anderson-Fabry Disease: Case Report", journal="JMIR Dermatol", year="2024", month="Jan", day="16", volume="7", pages="e49573", keywords="angiokeratoma", keywords="Fabry disease", keywords="angiokeratoma corporis diffusum", keywords="vascular", keywords="capillary", keywords="capillaries", keywords="blood vessel", keywords="lysosome", keywords="lysosomal", keywords="enzyme", keywords="enzymatic", keywords="case report", keywords="circulatory", keywords="skin", keywords="dermatology", keywords="dermatological", doi="10.2196/49573", url="https://derma.jmir.org/2024/1/e49573", url="http://www.ncbi.nlm.nih.gov/pubmed/38227354" } @Article{info:doi/10.2196/53366, author="Poulsen, N. Melissa and Freda, J. Philip and Troiani, Vanessa and Mowery, L. Danielle", title="Developing a Framework to Infer Opioid Use Disorder Severity From Clinical Notes to Inform Natural Language Processing Methods: Characterization Study", journal="JMIR Ment Health", year="2024", month="Jan", day="15", volume="11", pages="e53366", keywords="annotation", keywords="clinical notes", keywords="natural language processing", keywords="opioid related disorders", keywords="opioid use disorder", keywords="substance use disorders", keywords="adult", keywords="adults", keywords="opioid", keywords="annotation schema", keywords="severity score", keywords="substance misuse", keywords="mental health", abstract="Background: Information regarding opioid use disorder (OUD) status and severity is important for patient care. Clinical notes provide valuable information for detecting and characterizing problematic opioid use, necessitating development of natural language processing (NLP) tools, which in turn requires reliably labeled OUD-relevant text and understanding of documentation patterns. Objective: To inform automated NLP methods, we aimed to develop and evaluate an annotation schema for characterizing OUD and its severity, and to document patterns of OUD-relevant information within clinical notes of heterogeneous patient cohorts. Methods: We developed an annotation schema to characterize OUD severity based on criteria from the Diagnostic and Statistical Manual of Mental Disorders, 5th edition. In total, 2 annotators reviewed clinical notes from key encounters of 100 adult patients with varied evidence of OUD, including patients with and those without chronic pain, with and without medication treatment for OUD, and a control group. We completed annotations at the sentence level. We calculated severity scores based on annotation of note text with 18 classes aligned with criteria for OUD severity and determined positive predictive values for OUD severity. 
Results: The annotation schema contained 27 classes. We annotated 1436 sentences from 82 patients; notes of 18 patients (11 of whom were controls) contained no relevant information. Interannotator agreement was above 70\% for 11 of 15 batches of reviewed notes. Severity scores for control group patients were all 0. Among noncontrol patients, the mean severity score was 5.1 (SD 3.2), indicating moderate OUD, and the positive predictive value for detecting moderate or severe OUD was 0.71. Progress notes and notes from emergency department and outpatient settings contained the most and greatest diversity of information. Substance misuse and psychiatric classes were most prevalent and highly correlated across note types with high co-occurrence across patients. Conclusions: Implementation of the annotation schema demonstrated strong potential for inferring OUD severity based on key information in a small set of clinical notes and highlighting where such information is documented. These advancements will facilitate NLP tool development to improve OUD prevention, diagnosis, and treatment. ", doi="10.2196/53366", url="https://mental.jmir.org/2024/1/e53366", url="http://www.ncbi.nlm.nih.gov/pubmed/38224481" } @Article{info:doi/10.2196/45391, author="Zheng, Lu and Ohde, W. Joshua and Overgaard, M. Shauna and Brereton, A. Tracey and Jose, Kristelle and Wi, Chung-Il and Peterson, J. Kevin and Juhn, J. Young", title="Clinical Needs Assessment of a Machine Learning--Based Asthma Management Tool: User-Centered Design Approach", journal="JMIR Form Res", year="2024", month="Jan", day="15", volume="8", pages="e45391", keywords="asthma", keywords="formative research", keywords="user-centered design", keywords="machine learning (ML)", keywords="artificial intelligence (AI)", keywords="qualitative", keywords="user needs.", abstract="Background: Personalized asthma management depends on a clinician's ability to efficiently review patient's data and make timely clinical decisions. Unfortunately, efficient and effective review of these data is impeded by the varied format, location, and workflow of data acquisition, storage, and processing in the electronic health record. While machine learning (ML) and clinical decision support tools are well-positioned as potential solutions, the translation of such frameworks requires that barriers to implementation be addressed in the formative research stages. Objective: We aimed to use a structured user-centered design approach (double-diamond design framework) to (1) qualitatively explore clinicians' experience with the current asthma management system, (2) identify user requirements to improve algorithm explainability and Asthma Guidance and Prediction System prototype, and (3) identify potential barriers to ML-based clinical decision support system use. Methods: At the ``discovery'' phase, we first shadowed to understand the practice context. Then, semistructured interviews were conducted digitally with 14 clinicians who encountered pediatric asthma patients at 2 outpatient facilities. Participants were asked about their current difficulties in gathering information for patients with pediatric asthma, their expectations of ideal workflows and tools, and suggestions on user-centered interfaces and features. At the ``define'' phase, a synthesis analysis was conducted to converge key results from interviewees' insights into themes, eventually forming critical ``how might we'' research questions to guide model development and implementation. 
Results: We identified user requirements and potential barriers associated with three overarching themes: (1) usability and workflow aspects of the ML system, (2) user expectations and algorithm explainability, and (3) barriers to implementation in context. Even though the responsibilities and workflows vary among different roles, the core asthma-related information and functions they requested were highly cohesive, which allows for a shared information view of the tool. Clinicians hope to perceive the usability of the model with the ability to note patients' high risks and take proactive actions to manage asthma efficiently and effectively. For optimal ML algorithm explainability, requirements included documentation to support the validity of algorithm development and output logic, and a request for increased transparency to build trust and validate how the algorithm arrived at the decision. Acceptability, adoption, and sustainability of the asthma management tool are implementation outcomes that are reliant on the proper design and training as suggested by participants. Conclusions: As part of our comprehensive informatics-based process centered on clinical usability, we approach the problem using a theoretical framework grounded in user experience research leveraging semistructured interviews. Our focus on meeting the needs of the practice with ML technology is emphasized by a user-centered approach to clinician engagement through upstream technology design. ", doi="10.2196/45391", url="https://formative.jmir.org/2024/1/e45391", url="http://www.ncbi.nlm.nih.gov/pubmed/38224482" } @Article{info:doi/10.2196/48273, author="Yang, Yi and Madanian, Samaneh and Parry, David", title="Enhancing Health Equity by Predicting Missed Appointments in Health Care: Machine Learning Study", journal="JMIR Med Inform", year="2024", month="Jan", day="12", volume="12", pages="e48273", keywords="Did Not Show", keywords="Did Not Attend", keywords="machine learning", keywords="prediction", keywords="decision support system", keywords="health care operation", keywords="data analytics", keywords="patients no-show", keywords="predictive modeling", keywords="appointment nonadherence", keywords="health equity", abstract="Background: The phenomenon of patients missing booked appointments without canceling them---known as Did Not Show (DNS), Did Not Attend (DNA), or Failed To Attend (FTA)---has a detrimental effect on patients' health and results in massive health care resource wastage. Objective: Our objective was to develop machine learning (ML) models and evaluate their performance in predicting the likelihood of DNS for hospital outpatient appointments at the MidCentral District Health Board (MDHB) in New Zealand. Methods: We sourced 5 years of MDHB outpatient records (a total of 1,080,566 outpatient visits) to build the ML prediction models. We developed 3 ML models using logistic regression, random forest, and Extreme Gradient Boosting (XGBoost). Subsequently, 10-fold cross-validation and hyperparameter tuning were deployed to minimize model bias and boost the algorithms' prediction strength. All models were evaluated against accuracy, sensitivity, specificity, and area under the receiver operating characteristic (AUROC) curve metrics. Results: Based on 5 years of MDHB data, the best prediction classifier was XGBoost, with an area under the curve (AUC) of 0.92, sensitivity of 0.83, and specificity of 0.85. 
The patients' DNS history, age, ethnicity, and appointment lead time significantly contributed to DNS prediction. An ML system trained on a large data set can produce useful levels of DNS prediction. Conclusions: This research is one of the very first published studies that use ML technologies to assist with DNS management in New Zealand. It is a proof of concept and could be used to benchmark DNS predictions for the MDHB and other district health boards. We encourage conducting additional qualitative research to investigate the root cause of DNS issues and potential solutions. Addressing DNS using better strategies potentially can result in better utilization of health care resources and improve health equity. ", doi="10.2196/48273", url="https://medinform.jmir.org/2024/1/e48273", url="http://www.ncbi.nlm.nih.gov/pubmed/38214974" } @Article{info:doi/10.2196/52134, author="Baek, Sangwon and Jeong, joo Yeon and Kim, Yun-Hyeon and Kim, Young Jin and Kim, Hwan Jin and Kim, Young Eun and Lim, Jae-Kwang and Kim, Jungok and Kim, Zero and Kim, Kyunga and Chung, Jin Myung", title="Development and Validation of a Robust and Interpretable Early Triaging Support System for Patients Hospitalized With COVID-19: Predictive Algorithm Modeling and Interpretation Study", journal="J Med Internet Res", year="2024", month="Jan", day="11", volume="26", pages="e52134", keywords="COVID-19", keywords="prognosis", keywords="prognostic", keywords="prognostics", keywords="prediction model", keywords="early triaging", keywords="interpretability", keywords="machine learning", keywords="predict", keywords="prediction", keywords="predictive", keywords="triage", keywords="triaging", keywords="emergency", keywords="severity", keywords="biomarker", keywords="biomarkers", keywords="SHAP", keywords="Shapley", keywords="clustering", keywords="hospital admission", keywords="hospital admissions", keywords="hospitalize", keywords="hospitalization", keywords="hospitalizations", keywords="neural network", keywords="neural networks", keywords="deep learning", keywords="Omicron", keywords="SARS-CoV-2", keywords="coronavirus", abstract="Background: Robust and accurate prediction of severity for patients with COVID-19 is crucial for patient triaging decisions. Many proposed models were prone to either high bias risk or low-to-moderate discrimination. Some also suffered from a lack of clinical interpretability and were developed based on early pandemic period data. Hence, there has been a compelling need for advancements in prediction models for better clinical applicability. Objective: The primary objective of this study was to develop and validate a machine learning--based Robust and Interpretable Early Triaging Support (RIETS) system that predicts severity progression (involving any of the following events: intensive care unit admission, in-hospital death, mechanical ventilation required, or extracorporeal membrane oxygenation required) within 15 days upon hospitalization based on routinely available clinical and laboratory biomarkers. Methods: We included data from 5945 hospitalized patients with COVID-19 from 19 hospitals in South Korea collected between January 2020 and August 2022. For model development and external validation, the whole data set was partitioned into 2 independent cohorts by stratified random cluster sampling according to hospital type (general and tertiary care) and geographical location (metropolitan and nonmetropolitan). 
Machine learning models were trained and internally validated through a cross-validation technique on the development cohort. They were externally validated using a bootstrapped sampling technique on the external validation cohort. The best-performing model was selected primarily based on the area under the receiver operating characteristic curve (AUROC), and its robustness was evaluated using bias risk assessment. For model interpretability, we used Shapley and patient clustering methods. Results: Our final model, RIETS, was developed based on a deep neural network of 11 clinical and laboratory biomarkers that are readily available within the first day of hospitalization. The features predictive of severity included lactate dehydrogenase, age, absolute lymphocyte count, dyspnea, respiratory rate, diabetes mellitus, c-reactive protein, absolute neutrophil count, platelet count, white blood cell count, and saturation of peripheral oxygen. RIETS demonstrated excellent discrimination (AUROC=0.937; 95\% CI 0.935-0.938) with high calibration (integrated calibration index=0.041), satisfied all the criteria of low bias risk in a risk assessment tool, and provided detailed interpretations of model parameters and patient clusters. In addition, RIETS showed potential for transportability across variant periods with its sustainable prediction on Omicron cases (AUROC=0.903, 95\% CI 0.897-0.910). Conclusions: RIETS was developed and validated to assist early triaging by promptly predicting the severity of hospitalized patients with COVID-19. Its high performance with low bias risk ensures considerably reliable prediction. The use of a nationwide multicenter cohort in the model development and validation implicates generalizability. The use of routinely collected features may enable wide adaptability. Interpretations of model parameters and patients can promote clinical applicability. Together, we anticipate that RIETS will facilitate the patient triaging workflow and efficient resource allocation when incorporated into a routine clinical practice. ", doi="10.2196/52134", url="https://www.jmir.org/2024/1/e52134", url="http://www.ncbi.nlm.nih.gov/pubmed/38206673" } @Article{info:doi/10.2196/46364, author="Bentley, H. Kate and Madsen, M. Emily and Song, Eugene and Zhou, Yu and Castro, Victor and Lee, Hyunjoon and Lee, H. Younga and Smoller, W. Jordan", title="Determining Distinct Suicide Attempts From Recurrent Electronic Health Record Codes: Classification Study", journal="JMIR Form Res", year="2024", month="Jan", day="8", volume="8", pages="e46364", keywords="suicide", keywords="suicide attempt", keywords="self-injury", keywords="electronic health record", keywords="EHR", keywords="prediction", keywords="predictive model", keywords="predict", keywords="model", keywords="suicidal", keywords="informatics", keywords="automated rule", keywords="psychiatry", keywords="machine learning", abstract="Background: Prior suicide attempts are a relatively strong risk factor for future suicide attempts. There is growing interest in using longitudinal electronic health record (EHR) data to derive statistical risk prediction models for future suicide attempts and other suicidal behavior outcomes. However, model performance may be inflated by a largely unrecognized form of ``data leakage'' during model training: diagnostic codes for suicide attempt outcomes may refer to prior attempts that are also included in the model as predictors. 
Objective: We aimed to develop an automated rule for determining when documented suicide attempt diagnostic codes identify distinct suicide attempt events. Methods: From a large health care system's EHR, we randomly sampled suicide attempt codes for 300 patients with at least one pair of suicide attempt codes documented at least one but no more than 90 days apart. Supervised chart reviewers assigned the clinical settings (ie, emergency department [ED] versus non-ED), methods of suicide attempt, and intercode interval (number of days). The probability (or positive predictive value) that the second suicide attempt code in a given pair of codes referred to a distinct suicide attempt event from its preceding suicide attempt code was calculated by clinical setting, method, and intercode interval. Results: Of 1015 code pairs reviewed, 835 (82.3\%) were nonindependent (ie, the 2 codes referred to the same suicide attempt event). When the second code in a pair was documented in a clinical setting other than the ED, it represented a distinct suicide attempt 3.3\% of the time. The more time elapsed between codes, the more likely the second code in a pair referred to a distinct suicide attempt event from its preceding code. Code pairs in which the second suicide attempt code was assigned in an ED at least 5 days after its preceding suicide attempt code had a positive predictive value of 0.90. Conclusions: EHR-based suicide risk prediction models that include International Classification of Diseases codes for prior suicide attempts as a predictor may be highly susceptible to bias due to data leakage in model training. We derived a simple rule to distinguish codes that reflect new, independent suicide attempts: suicide attempt codes documented in an ED setting at least 5 days after a preceding suicide attempt code can be confidently treated as new events in EHR-based suicide risk prediction models. This rule has the potential to minimize upward bias in model performance when prior suicide attempts are included as predictors in EHR-based suicide risk prediction models. ", doi="10.2196/46364", url="https://formative.jmir.org/2024/1/e46364", url="http://www.ncbi.nlm.nih.gov/pubmed/38190236" } @Article{info:doi/10.2196/46744, author="Cheung, Kin and Yip, Sum Chak", title="Documentation Completeness and Nurses' Perceptions of a Novel Electronic App for Medical Resuscitation in the Emergency Room: Mixed Methods Approach", journal="JMIR Mhealth Uhealth", year="2024", month="Jan", day="5", volume="12", pages="e46744", keywords="tablet computer", keywords="nursing documentation", keywords="paper resuscitation record", keywords="electronic resuscitation record", keywords="medical resuscitation", keywords="electronic medical record", keywords="documentation", keywords="resuscitation", keywords="electronic health record", keywords="nurses' perception", keywords="traditional paper record", keywords="nurse", abstract="Background: Complete documentation of critical care events in the accident and emergency department (AED) is essential. Due to the fast-paced and complex nature of resuscitation cases, missing data is a common issue during emergency situations. Objective: This study aimed to evaluate the impact of a tablet-based resuscitation record on documentation completeness during medical resuscitations and nurses' perceptions of the use of the tablet app. Methods: A mixed methods approach was adopted. 
To collect quantitative data, randomized retrospective reviews of paper-based resuscitation records before implementation of the tablet (Pre-App Paper; n=176), paper-based resuscitation records after implementation of the tablet (Post-App Paper; n=176), and electronic tablet-based resuscitation records (Post-App Electronic; n=176) using a documentation completeness checklist were conducted. The checklist was validated by 4 experts in the emergency medicine field. The content validity index (CVI) was calculated using the scale CVI (S-CVI). The universal agreement S-CVI was 0.822, and the average S-CVI was 0.939. The checklist consisted of the following 5 domains: basic information, vital signs, procedures, investigations, and medications. To collect qualitative data, nurses' perceptions of the app for electronic resuscitation documentation were obtained using individual interviews. Reporting of the qualitative data was guided by Consolidated Criteria for Reporting Qualitative Studies (COREQ) to enhance rigor. Results: A significantly higher documentation rate in all 5 domains (ie, basic information, vital signs, procedures, investigations, and medications) was present with Post-App Electronic than with Post-App Paper, but there were no significant differences in the 5 domains between Pre-App Paper and Post-App Paper. The qualitative analysis resulted in main categories of ``advantages of tablet-based documentation of resuscitation records,'' ``challenges with tablet-based documentation of resuscitation records,'' and ``areas for improvement of tablet-based resuscitation records.'' Conclusions: This study demonstrated that higher documentation completion rates are achieved with electronic tablet-based resuscitation records than with traditional paper records. During the transition period, the nurse documenters faced general problems with resuscitation documentation such as multitasking and unique challenges such as software updates and a need to familiarize themselves with the app's layout. Automation should be considered during future app development to improve documentation and redistribute more time for patient care. Nurses should continue to provide feedback on the app's usability and functionality during app refinement to ensure a successful transition and future development of electronic documentation records. ", doi="10.2196/46744", url="https://mhealth.jmir.org/2024/1/e46744", url="http://www.ncbi.nlm.nih.gov/pubmed/38180801" } @Article{info:doi/10.2196/48487, author="Zhang, Pin and Wu, Lei and Zou, Ting-Ting and Zou, ZiXuan and Tu, JiaXin and Gong, Ren and Kuang, Jie", title="Machine Learning for Early Prediction of Major Adverse Cardiovascular Events After First Percutaneous Coronary Intervention in Patients With Acute Myocardial Infarction: Retrospective Cohort Study", journal="JMIR Form Res", year="2024", month="Jan", day="3", volume="8", pages="e48487", keywords="acute myocardial infarction", keywords="percutaneous coronary intervention", keywords="machine learning", keywords="early prediction", keywords="cardiovascular event", abstract="Background: The incidence of major adverse cardiovascular events (MACEs) remains high in patients with acute myocardial infarction (AMI) who undergo percutaneous coronary intervention (PCI), and early prediction models to guide their clinical management are lacking. Objective: This study aimed to develop machine learning--based early prediction models for MACEs in patients with newly diagnosed AMI who underwent PCI. 
Methods: A total of 1531 patients with AMI who underwent PCI from January 2018 to December 2019 were enrolled in this consecutive cohort. The data comprised demographic characteristics, clinical investigations, laboratory tests, and disease-related events. Four machine learning models---artificial neural network (ANN), k-nearest neighbors, support vector machine, and random forest---were developed and compared with the logistic regression model. Our primary outcome was the model performance that predicted the MACEs, which was determined by accuracy, area under the receiver operating characteristic curve, and F1-score. Results: In total, 1362 patients were successfully followed up. With a median follow-up of 25.9 months, the incidence of MACEs was 18.5\% (252/1362). The area under the receiver operating characteristic curve of the ANN, random forest, k-nearest neighbors, support vector machine, and logistic regression models were 80.49\%, 72.67\%, 79.80\%, 77.20\%, and 71.77\%, respectively. The top 5 predictors in the ANN model were left ventricular ejection fraction, the number of implanted stents, age, diabetes, and the number of vessels with coronary artery disease. Conclusions: The ANN model showed good MACE prediction after PCI for patients with AMI. The use of machine learning--based prediction models may improve patient management and outcomes in clinical practice. ", doi="10.2196/48487", url="https://formative.jmir.org/2024/1/e48487", url="http://www.ncbi.nlm.nih.gov/pubmed/38170581" } @Article{info:doi/10.2196/46501, author="Wang, Ying and Jiang, Mengyao and He, Mei and Du, Meijie", title="Design and Implementation of an Inpatient Fall Risk Management Information System", journal="JMIR Med Inform", year="2024", month="Jan", day="2", volume="12", pages="e46501", keywords="fall", keywords="hospital information system", keywords="patient safety", keywords="quality improvement", keywords="management", keywords="implementation", abstract="Background: Falls had been identified as one of the nursing-sensitive indicators for nursing care in hospitals. With technological progress, health information systems make it possible for health care professionals to manage patient care better. However, there is a dearth of research on health information systems used to manage inpatient falls. Objective: This study aimed to design and implement a novel hospital-based fall risk management information system (FRMIS) to prevent inpatient falls and improve nursing quality. Methods: This implementation was conducted at a large academic medical center in central China. We established a nurse-led multidisciplinary fall prevention team in January 2016. The hospital's fall risk management problems were summarized by interviewing fall-related stakeholders, observing fall prevention workflow and post--fall care process, and investigating patients' satisfaction. The FRMIS was developed using an iterative design process, involving collaboration among health care professionals, software developers, and system architects. We used process indicators and outcome indicators to evaluate the implementation effect. Results: The FRMIS includes a fall risk assessment platform, a fall risk warning platform, a fall preventive strategies platform, fall incident reporting, and a tracking improvement platform. Since the implementation of the FRMIS, the inpatient fall rate was significantly lower than that before implementation (P<.05). 
In addition, the percentage of major fall-related injuries was significantly lower than that before implementation. The implementation rate of fall-related process indicators and the reporting rate of high risk of falls were significantly different before and after system implementation (P<.05). Conclusions: The FRMIS provides support to nursing staff in preventing falls among hospitalized patients while facilitating process control for nursing managers. ", doi="10.2196/46501", url="https://medinform.jmir.org/2024/1/e46501", url="http://www.ncbi.nlm.nih.gov/pubmed/38165733" } @Article{info:doi/10.2196/51501, author="Wang, Changyu and Liu, Siru and Li, Aiqing and Liu, Jialin", title="Text Dialogue Analysis for Primary Screening of Mild Cognitive Impairment: Development and Validation Study", journal="J Med Internet Res", year="2023", month="Dec", day="29", volume="25", pages="e51501", keywords="artificial intelligence", keywords="AI", keywords="AI models", keywords="ChatGPT", keywords="primary screening", keywords="mild cognitive impairment", keywords="standardization", keywords="prompt design", keywords="design", keywords="cognitive impairment", keywords="screening", keywords="model", keywords="clinician", keywords="diagnosis", abstract="Background: Artificial intelligence models tailored to diagnose cognitive impairment have shown excellent results. However, it is unclear whether large linguistic models can rival specialized models by text alone. Objective: In this study, we explored the performance of ChatGPT for primary screening of mild cognitive impairment (MCI) and standardized the design steps and components of the prompts. Methods: We gathered a total of 174 participants from the DementiaBank screening and classified 70\% of them into the training set and 30\% of them into the test set. Only text dialogues were kept. Sentences were cleaned using a macro code, followed by a manual check. The prompt consisted of 5 main parts, including character setting, scoring system setting, indicator setting, output setting, and explanatory information setting. Three dimensions of variables from published studies were included: vocabulary (ie, word frequency and word ratio, phrase frequency and phrase ratio, and lexical complexity), syntax and grammar (ie, syntactic complexity and grammatical components), and semantics (ie, semantic density and semantic coherence). We used R 4.3.0. for the analysis of variables and diagnostic indicators. Results: Three additional indicators related to the severity of MCI were incorporated into the final prompt for the model. These indicators were effective in discriminating between MCI and cognitively normal participants: tip-of-the-tongue phenomenon (P<.001), difficulty with complex ideas (P<.001), and memory issues (P<.001). The final GPT-4 model achieved a sensitivity of 0.8636, a specificity of 0.9487, and an area under the curve of 0.9062 on the training set; on the test set, the sensitivity, specificity, and area under the curve reached 0.7727, 0.8333, and 0.8030, respectively. Conclusions: ChatGPT was effective in the primary screening of participants with possible MCI. Improved standardization of prompts by clinicians would also improve the performance of the model. It is important to note that ChatGPT is not a substitute for a clinician making a diagnosis. 
", doi="10.2196/51501", url="https://www.jmir.org/2023/1/e51501", url="http://www.ncbi.nlm.nih.gov/pubmed/38157230" } @Article{info:doi/10.2196/51199, author="Koranteng, Erica and Rao, Arya and Flores, Efren and Lev, Michael and Landman, Adam and Dreyer, Keith and Succi, Marc", title="Empathy and Equity: Key Considerations for Large Language Model Adoption in Health Care", journal="JMIR Med Educ", year="2023", month="Dec", day="28", volume="9", pages="e51199", keywords="ChatGPT", keywords="AI", keywords="artificial intelligence", keywords="large language models", keywords="LLMs", keywords="ethics", keywords="empathy", keywords="equity", keywords="bias", keywords="language model", keywords="health care application", keywords="patient care", keywords="care", keywords="development", keywords="framework", keywords="model", keywords="ethical implication", doi="10.2196/51199", url="https://mededu.jmir.org/2023/1/e51199", url="http://www.ncbi.nlm.nih.gov/pubmed/38153778" } @Article{info:doi/10.2196/52782, author="Hern{\'a}ndez Guillamet, Guillem and Morancho Pallaruelo, Ning Ariadna and Mir{\'o} Mezquita, Laura and Miralles, Ram{\'o}n and Mas, {\`A}ngel Miquel and Ulldemolins Papaseit, Jos{\'e} Mar{\'i}a and Estrada Cuxart, Oriol and L{\'o}pez Segu{\'i}, Francesc", title="Machine Learning Model for Predicting Mortality Risk in Patients With Complex Chronic Conditions: Retrospective Analysis", journal="Online J Public Health Inform", year="2023", month="Dec", day="28", volume="15", pages="e52782", keywords="machine learning", keywords="mortality prediction", keywords="chronicity", keywords="chromic", keywords="complex", keywords="artificial intelligence", keywords="complexity", keywords="health data", keywords="predict", keywords="prediction", keywords="predictive", keywords="mortality", keywords="death", keywords="classification", keywords="algorithm", keywords="algorithms", keywords="mortality risk", keywords="risk prediction", abstract="Background: The health care system is undergoing a shift toward a more patient-centered approach for individuals with chronic and complex conditions, which presents a series of challenges, such as predicting hospital needs and optimizing resources. At the same time, the exponential increase in health data availability has made it possible to apply advanced statistics and artificial intelligence techniques to develop decision-support systems and improve resource planning, diagnosis, and patient screening. These methods are key to automating the analysis of large volumes of medical data and reducing professional workloads. Objective: This article aims to present a machine learning model and a case study in a cohort of patients with highly complex conditions. The object was to predict mortality within the following 4 years and early mortality over 6 months following diagnosis. The method used easily accessible variables and health care resource utilization information. Methods: A classification algorithm was selected among 6 models implemented and evaluated using a stratified cross-validation strategy with k=10 and a 70/30 train-test split. The evaluation metrics used included accuracy, recall, precision, F1-score, and area under the receiver operating characteristic (AUROC) curve. Results: The model predicted patient death with an 87\% accuracy, recall of 87\%, precision of 82\%, F1-score of 84\%, and area under the curve (AUC) of 0.88 using the best model, the Extreme Gradient Boosting (XGBoost) classifier. 
The results were worse when predicting premature deaths (following 6 months) with an 83\% accuracy (recall=55\%, precision=64\% F1-score=57\%, and AUC=0.88) using the Gradient Boosting (GRBoost) classifier. Conclusions: This study showcases encouraging outcomes in forecasting mortality among patients with intricate and persistent health conditions. The employed variables are conveniently accessible, and the incorporation of health care resource utilization information of the patient, which has not been employed by current state-of-the-art approaches, displays promising predictive power. The proposed prediction model is designed to efficiently identify cases that need customized care and proactively anticipate the demand for critical resources by health care providers. ", doi="10.2196/52782", url="https://ojphi.jmir.org/2023/1/e52782", url="http://www.ncbi.nlm.nih.gov/pubmed/38223690" } @Article{info:doi/10.2196/51798, author="{\'C}irkovi{\'c}, Aleksandar and Katz, Toam", title="Exploring the Potential of ChatGPT-4 in Predicting Refractive Surgery Categorizations: Comparative Study", journal="JMIR Form Res", year="2023", month="Dec", day="28", volume="7", pages="e51798", keywords="artificial intelligence", keywords="machine learning", keywords="decision support systems", keywords="clinical", keywords="refractive surgical procedures", keywords="risk assessment", keywords="ophthalmology", keywords="health informatics", keywords="predictive modeling", keywords="data analysis", keywords="medical decision-making", keywords="eHealth", keywords="ChatGPT-4", keywords="ChatGPT", keywords="refractive surgery", keywords="categorization", keywords="AI-powered algorithm", keywords="large language model", keywords="decision-making", abstract="Background: Refractive surgery research aims to optimally precategorize patients by their suitability for various types of surgery. Recent advances have led to the development of artificial intelligence--powered algorithms, including machine learning approaches, to assess risks and enhance workflow. Large language models (LLMs) like ChatGPT-4 (OpenAI LP) have emerged as potential general artificial intelligence tools that can assist across various disciplines, possibly including refractive surgery decision-making. However, their actual capabilities in precategorizing refractive surgery patients based on real-world parameters remain unexplored. Objective: This exploratory study aimed to validate ChatGPT-4's capabilities in precategorizing refractive surgery patients based on commonly used clinical parameters. The goal was to assess whether ChatGPT-4's performance when categorizing batch inputs is comparable to those made by a refractive surgeon. A simple binary set of categories (patient suitable for laser refractive surgery or not) as well as a more detailed set were compared. Methods: Data from 100 consecutive patients from a refractive clinic were anonymized and analyzed. Parameters included age, sex, manifest refraction, visual acuity, and various corneal measurements and indices from Scheimpflug imaging. This study compared ChatGPT-4's performance with a clinician's categorizations using Cohen $\kappa$ coefficient, a chi-square test, a confusion matrix, accuracy, precision, recall, F1-score, and receiver operating characteristic area under the curve. 
Results: A statistically significant noncoincidental accordance was found between ChatGPT-4 and the clinician's categorizations with a Cohen $\kappa$ coefficient of 0.399 for 6 categories (95\% CI 0.256-0.537) and 0.610 for binary categorization (95\% CI 0.372-0.792). The model showed temporal instability and response variability, however. The chi-square test on 6 categories indicated an association between the 2 raters' distributions ($\chi^2_5$=94.7, P<.001). Here, the accuracy was 0.68, precision 0.75, recall 0.68, and F1-score 0.70. For 2 categories, the accuracy was 0.88, precision 0.88, recall 0.88, F1-score 0.88, and area under the curve 0.79. Conclusions: This study revealed that ChatGPT-4 exhibits potential as a precategorization tool in refractive surgery, showing promising agreement with clinician categorizations. However, its main limitations include, among others, dependency on solely one human rater, small sample size, the instability and variability of ChatGPT's (OpenAI LP) output between iterations and nontransparency of the underlying models. The results encourage further exploration into the application of LLMs like ChatGPT-4 in health care, particularly in decision-making processes that require understanding vast clinical data. Future research should focus on defining the model's accuracy with prompt and vignette standardization, detecting confounding factors, and comparing to other versions of ChatGPT-4 and other LLMs to pave the way for larger-scale validation and real-world implementation. ", doi="10.2196/51798", url="https://formative.jmir.org/2023/1/e51798", url="http://www.ncbi.nlm.nih.gov/pubmed/38153777" } @Article{info:doi/10.2196/48544, author="Santana, Os{\'o}rio Giulia and Couto, Macedo Rodrigo de and Loureiro, Maffei Rafael and Furriel, Silva Brunna Carolinne Rocha and Rother, Terezinha Edna and de Paiva, Queiroz Joselisa P{\'e}res and Correia, Reis Lucas", title="Economic Evaluations and Equity in the Use of Artificial Intelligence in Imaging Exams for Medical Diagnosis in People With Skin, Neurological, and Pulmonary Diseases: Protocol for a Systematic Review", journal="JMIR Res Protoc", year="2023", month="Dec", day="28", volume="12", pages="e48544", keywords="artificial intelligence", keywords="economic evaluation", keywords="equity", keywords="medical diagnosis", keywords="health care system", keywords="technology", keywords="systematic review", keywords="cost-effectiveness", keywords="imaging exam", keywords="intervention", abstract="Background: Traditional health care systems face long-standing challenges, including patient diversity, geographical disparities, and financial constraints. The emergence of artificial intelligence (AI) in health care offers solutions to these challenges. AI, a multidisciplinary field, enhances clinical decision-making. However, imbalanced AI models may enhance health disparities. Objective: This systematic review aims to investigate the economic performance and equity impact of AI in diagnostic imaging for skin, neurological, and pulmonary diseases.
The research question is ``To what extent does the use of AI in imaging exams for diagnosing skin, neurological, and pulmonary diseases result in improved economic outcomes, and does it promote equity in health care systems?'' Methods: The study is a systematic review of economic and equity evaluations following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) and CHEERS (Consolidated Health Economic Evaluation Reporting Standards) guidelines. Eligibility criteria include articles reporting on economic evaluations or equity considerations related to AI-based diagnostic imaging for specified diseases. Data will be collected from PubMed, Embase, Scopus, Web of Science, and reference lists. Data quality and transferability will be assessed according to CHEC (Consensus on Health Economic Criteria), EPHPP (Effective Public Health Practice Project), and Welte checklists. Results: This systematic review began in March 2023. The literature search identified 9,526 publications and, after full-text screening, 9 publications were included in the study. We plan to submit a manuscript to a peer-reviewed journal once it is finalized, with an expected completion date in January 2024. Conclusions: AI in diagnostic imaging offers potential benefits but also raises concerns about equity and economic impact. Bias in algorithms and disparities in access may hinder equitable outcomes. Evaluating the economic viability of AI applications is essential for resource allocation and affordability. Policy makers and health care stakeholders can benefit from this review's insights to make informed decisions. Limitations, including study variability and publication bias, will be considered in the analysis. This systematic review will provide valuable insights into the economic and equity implications of AI in diagnostic imaging. It aims to inform evidence-based decision-making and contribute to more efficient and equitable health care systems. International Registered Report Identifier (IRRID): DERR1-10.2196/48544 ", doi="10.2196/48544", url="https://www.researchprotocols.org/2023/1/e48544", url="http://www.ncbi.nlm.nih.gov/pubmed/38153775" } @Article{info:doi/10.2196/51921, author="Zheng, Yifan and Rowell, Brigid and Chen, Qiyuan and Kim, Yong Jin and Kontar, Al Raed and Yang, Jessie X. and Lester, A. Corey", title="Designing Human-Centered AI to Prevent Medication Dispensing Errors: Focus Group Study With Pharmacists", journal="JMIR Form Res", year="2023", month="Dec", day="25", volume="7", pages="e51921", keywords="artificial intelligence", keywords="communication", keywords="design methods", keywords="design", keywords="development", keywords="engineering", keywords="focus groups", keywords="human-computer interaction", keywords="medication errors", keywords="morbidity", keywords="mortality", keywords="patient safety", keywords="safety", keywords="SEIPS", keywords="Systems Engineering Initiative for Patient Safety", keywords="tool", keywords="user-centered design methods", keywords="user-centered", keywords="visualization", abstract="Background: Medication errors, including dispensing errors, represent a substantial worldwide health risk with significant implications in terms of morbidity, mortality, and financial costs. Although pharmacists use methods like barcode scanning and double-checking for dispensing verification, these measures exhibit limitations. 
The application of artificial intelligence (AI) in pharmacy verification emerges as a potential solution, offering precision, rapid data analysis, and the ability to recognize medications through computer vision. For AI to be embraced, it must be designed with the end user in mind, fostering trust, clear communication, and seamless collaboration between AI and pharmacists. Objective: This study aimed to gather pharmacists' feedback in a focus group setting to help inform the initial design of the user interface and iterative designs of the AI prototype. Methods: A multidisciplinary research team engaged pharmacists in a 3-stage process to develop a human-centered AI system for medication dispensing verification. To design the AI model, we used a Bayesian neural network that predicts the dispensed pills' National Drug Code (NDC). Discussion scripts regarding how to design the system and feedback in focus groups were collected through audio recordings and professionally transcribed, followed by a content analysis guided by the Systems Engineering Initiative for Patient Safety and Human-Machine Teaming theoretical frameworks. Results: A total of 8 pharmacists participated in 3 rounds of focus groups to identify current challenges in medication dispensing verification, brainstorm solutions, and provide feedback on our AI prototype. Participants considered several teaming scenarios, generally favoring a hybrid teaming model where the AI assists in the verification process and a pharmacist intervenes based on medication risk level and the AI's confidence level. Pharmacists highlighted the need for improving the interpretability of AI systems, such as adding stepwise checkmarks, probability scores, and details about drugs the AI model frequently confuses with the target drug. Pharmacists emphasized the need for simplicity and accessibility. They favored displaying only essential information to prevent overwhelming users with excessive data. Specific design features, such as juxtaposing pill images with their packaging for quick comparisons, were requested. Pharmacists preferred accept, reject, or unsure options. The final prototype interface included (1) checkmarks to compare pill characteristics between the AI-predicted NDC and the prescription's expected NDC, (2) a histogram showing predicted probabilities for the AI-identified NDC, (3) an image of an AI-provided ``confused'' pill, and (4) an NDC match status (ie, match, unmatched, or unsure). Conclusions: In partnership with pharmacists, we developed a human-centered AI prototype designed to enhance AI interpretability and foster trust. This initiative emphasized human-machine collaboration and positioned AI as an augmentative tool rather than a replacement. This study highlights the process of designing a human-centered AI for dispensing verification, emphasizing its interpretability, confidence visualization, and collaborative human-machine teaming styles. 
", doi="10.2196/51921", url="https://formative.jmir.org/2023/1/e51921", url="http://www.ncbi.nlm.nih.gov/pubmed/38145475" } @Article{info:doi/10.2196/48244, author="Kim, Kwan Yun and Koo, Hyung Ja and Lee, Jung Sun and Song, Seok Hee and Lee, Minji", title="Explainable Artificial Intelligence Warning Model Using an Ensemble Approach for In-Hospital Cardiac Arrest Prediction: Retrospective Cohort Study", journal="J Med Internet Res", year="2023", month="Dec", day="22", volume="25", pages="e48244", keywords="cardiac arrest prediction", keywords="ensemble learning", keywords="temporal pattern changes", keywords="cost-sensitive learning", keywords="electronic medical records", abstract="Background: Cardiac arrest (CA) is the leading cause of death in critically ill patients. Clinical research has shown that early identification of CA reduces mortality. Algorithms capable of predicting CA with high sensitivity have been developed using multivariate time series data. However, these algorithms suffer from a high rate of false alarms, and their results are not clinically interpretable. Objective: We propose an ensemble approach using multiresolution statistical features and cosine similarity--based features for the timely prediction of CA. Furthermore, this approach provides clinically interpretable results that can be adopted by clinicians. Methods: Patients were retrospectively analyzed using data from the Medical Information Mart for Intensive Care-IV database and the eICU Collaborative Research Database. Based on the multivariate vital signs of a 24-hour time window for adults diagnosed with heart failure, we extracted multiresolution statistical and cosine similarity--based features. These features were used to construct and develop gradient boosting decision trees. Therefore, we adopted cost-sensitive learning as a solution. Then, 10-fold cross-validation was performed to check the consistency of the model performance, and the Shapley additive explanation algorithm was used to capture the overall interpretability of the proposed model. Next, external validation using the eICU Collaborative Research Database was performed to check the generalization ability. Results: The proposed method yielded an overall area under the receiver operating characteristic curve (AUROC) of 0.86 and area under the precision-recall curve (AUPRC) of 0.58. In terms of the timely prediction of CA, the proposed model achieved an AUROC above 0.80 for predicting CA events up to 6 hours in advance. The proposed method simultaneously improved precision and sensitivity to increase the AUPRC, which reduced the number of false alarms while maintaining high sensitivity. This result indicates that the predictive performance of the proposed model is superior to the performances of the models reported in previous studies. Next, we demonstrated the effect of feature importance on the clinical interpretability of the proposed method and inferred the effect between the non-CA and CA groups. Finally, external validation was performed using the eICU Collaborative Research Database, and an AUROC of 0.74 and AUPRC of 0.44 were obtained in a general intensive care unit population. Conclusions: The proposed framework can provide clinicians with more accurate CA prediction results and reduce false alarm rates through internal and external validation. In addition, clinically interpretable prediction results can facilitate clinician understanding. 
Furthermore, the similarity of vital sign changes can provide insights into temporal pattern changes in CA prediction in patients with heart failure--related diagnoses. Therefore, our system is sufficiently feasible for routine clinical use. In addition, a clinically mature application of the proposed CA prediction system remains to be developed and verified in the digital health field. ", doi="10.2196/48244", url="https://www.jmir.org/2023/1/e48244", url="http://www.ncbi.nlm.nih.gov/pubmed/38133922" } @Article{info:doi/10.2196/48892, author="Schapranow, Matthieu-P and Bayat, Mozhgan and Rasheed, Aadil and Naik, Marcel and Graf, Verena and Schmidt, Danilo and Budde, Klemens and Cardinal, H{\'e}lo{\"i}se and Sapir-Pichhadze, Ruth and Fenninger, Franz and Sherwood, Karen and Keown, Paul and G{\"u}nther, P. Oliver and Pandl, D. Konstantin and Leiser, Florian and Thiebes, Scott and Sunyaev, Ali and Niemann, Matthias and Schimanski, Andreas and Klein, Thomas", title="NephroCAGE---German-Canadian Consortium on AI for Improved Kidney Transplantation Outcome: Protocol for an Algorithm Development and Validation Study", journal="JMIR Res Protoc", year="2023", month="Dec", day="22", volume="12", pages="e48892", keywords="posttransplant risks", keywords="kidney transplantation", keywords="federated learning infrastructure", keywords="clinical prediction model", keywords="donor-recipient matching", keywords="multinational transplant data set", abstract="Background: Recent advances in hardware and software enabled the use of artificial intelligence (AI) algorithms for analysis of complex data in a wide range of daily-life use cases. We aim to explore the benefits of applying AI to a specific use case in transplant nephrology: risk prediction for severe posttransplant events. For the first time, we combine multinational real-world transplant data, which require specific legal and technical protection measures. Objective: The German-Canadian NephroCAGE consortium aims to develop and evaluate specific processes, software tools, and methods to (1) combine transplant data of more than 8000 cases over the past decades from leading transplant centers in Germany and Canada, (2) implement specific measures to protect sensitive transplant data, and (3) use multinational data as a foundation for developing high-quality prognostic AI models. Methods: To protect sensitive transplant data addressing the first and second objectives, we aim to implement a decentralized NephroCAGE federated learning infrastructure upon a private blockchain. Our NephroCAGE federated learning infrastructure enables a switch of paradigms: instead of pooling sensitive data into a central database for analysis, it enables the transfer of clinical prediction models (CPMs) to clinical sites for local data analyses. Thus, sensitive transplant data reside protected in their original sites while the comparably small algorithms are exchanged instead. For our third objective, we will compare the performance of selected AI algorithms, for example, random forest and extreme gradient boosting, as a foundation for CPMs to predict severe short- and long-term posttransplant risks, for example, graft failure or mortality. The CPMs will be trained on donor and recipient data from retrospective cohorts of kidney transplant patients. Results: We have received initial funding for NephroCAGE in February 2021. All clinical partners have applied for and received ethics approval as of 2022.
Exploration of the clinical transplant databases for variable extraction started at all centers in 2022. In total, 8120 patient records have been retrieved as of August 2023. The development and validation of CPMs is ongoing as of 2023. Conclusions: For the first time, we will (1) combine kidney transplant data from nephrology centers in Germany and Canada, (2) implement federated learning as a foundation to use such real-world transplant data as a basis for the training of CPMs in a privacy-preserving way, and (3) develop a learning software system to investigate population specifics, for example, to understand population heterogeneity, treatment specificities, and individual impact on selected posttransplant outcomes. International Registered Report Identifier (IRRID): DERR1-10.2196/48892 ", doi="10.2196/48892", url="https://www.researchprotocols.org/2023/1/e48892", url="http://www.ncbi.nlm.nih.gov/pubmed/38133915" } @Article{info:doi/10.2196/51471, author="Dolezel, Diane and Beauvais, Brad and Stigler Granados, Paula and Fulton, Lawrence and Kruse, Scott Clemens", title="Effects of Internal and External Factors on Hospital Data Breaches: Quantitative Study", journal="J Med Internet Res", year="2023", month="Dec", day="21", volume="25", pages="e51471", keywords="data breach", keywords="security", keywords="geospatial", keywords="predictive", keywords="mobile phone", abstract="Background: Health care data breaches are the most rapidly increasing type of cybercrime; however, the predictors of health care data breaches are uncertain. Objective: This quantitative study aims to develop a predictive model to explain the number of hospital data breaches at the county level. Methods: This study evaluated data consolidated at the county level from 1032 short-term acute care hospitals. We considered the association between data breach occurrence (a dichotomous variable), predictors based on county demographics and socioeconomics, average hospital workload, facility type, and average performance on several hospital financial metrics using 3 model types: logistic regression, perceptron, and support vector machine. Results: The model coefficient performance metrics indicated convergent validity across the 3 model types for all variables except bad debt and the factor level accounting for counties with >20\% and up to 40\% Hispanic populations, both of which had mixed coefficient directionality. The support vector machine model performed the classification task best based on all metrics (accuracy, precision, recall, F1-score). All 3 models performed the classification task well with directional congruence of weights. From the logistic regression model, the top 5 odds ratios (indicating a higher risk of breach) included inpatient workload, medical center status, pediatric trauma center status, accounts receivable, and the number of outpatient visits, in high to low order. The bottom 5 odds ratios (indicating the lowest odds of experiencing a data breach) occurred for counties with Black populations of >20\% and <40\%, >80\% and <100\%, and >40\% but <60\%, as well as counties with ?20\% Asian or between 80\% and 100\% Hispanic individuals. Our results are in line with those of other studies that determined that patient workload, facility type, and financial outcomes were associated with the likelihood of health care data breach occurrence.
Conclusions: The results of this study provide a predictive model for health care data breaches that may guide health care managers to reduce the risk of data breaches by raising awareness of the risk factors. ", doi="10.2196/51471", url="https://www.jmir.org/2023/1/e51471", url="http://www.ncbi.nlm.nih.gov/pubmed/38127426" } @Article{info:doi/10.2196/50158, author="Welzel, Cindy and Cotte, Fabienne and Wekenborg, Magdalena and Vasey, Baptiste and McCulloch, Peter and Gilbert, Stephen", title="Holistic Human-Serving Digitization of Health Care Needs Integrated Automated System-Level Assessment Tools", journal="J Med Internet Res", year="2023", month="Dec", day="20", volume="25", pages="e50158", keywords="health technology assessment", keywords="human factors", keywords="postmarket surveillance", keywords="software as a medical device", keywords="digital health tools", keywords="quality assessment", keywords="quality improvement", keywords="regulatory framework", keywords="user experience", keywords="health care", doi="10.2196/50158", url="https://www.jmir.org/2023/1/e50158", url="http://www.ncbi.nlm.nih.gov/pubmed/38117545" } @Article{info:doi/10.2196/50903, author="Jacobs, Marie Sarah and Lundy, Nicole Neva and Issenberg, Barry Saul and Chandran, Latha", title="Reimagining Core Entrustable Professional Activities for Undergraduate Medical Education in the Era of Artificial Intelligence", journal="JMIR Med Educ", year="2023", month="Dec", day="19", volume="9", pages="e50903", keywords="artificial intelligence", keywords="entrustable professional activities", keywords="medical education", keywords="competency-based education", keywords="educational technology", keywords="machine learning", doi="10.2196/50903", url="https://mededu.jmir.org/2023/1/e50903", url="http://www.ncbi.nlm.nih.gov/pubmed/38052721" } @Article{info:doi/10.2196/50660, author="Sriraam, Natarajan and Raghu, S. and Gommer, D. Erik and Hilkman, W. Danny M. and Temel, Yasin and Vasudeva Rao, Shyam and Hegde, Satyaranjandas Alangar and L Kubben, Pieter", title="Application of a Low-Cost mHealth Solution for the Remote Monitoring of Patients With Epilepsy: Algorithm Development and Validation", journal="JMIR Neurotech", year="2023", month="Dec", day="19", volume="2", pages="e50660", keywords="Android", keywords="epileptic seizures", keywords="mobile health", keywords="mHealth", keywords="mobile phone--based epilepsy monitoring", keywords="support vector machine", keywords="seizure", keywords="epileptic", keywords="epilepsy", keywords="monitoring", keywords="smartphone", keywords="smartphones", keywords="mobile phone", keywords="neurology", keywords="neuroscience", keywords="electroencephalography", keywords="EEG", keywords="brain", keywords="classification", keywords="detect", keywords="detection", keywords="neurological", keywords="electroencephalogram", keywords="diagnose", keywords="diagnosis", keywords="diagnostic", keywords="imaging", abstract="Background: Implementing automated seizure detection in long-term electroencephalography (EEG) analysis enables the remote monitoring of patients with epilepsy, thereby improving their quality of life. Objective: The objective of this study was to explore an mHealth (mobile health) solution by investigating the feasibility of smartphones for processing large EEG recordings for the remote monitoring of patients with epilepsy. Methods: We developed a mobile app to automatically analyze and classify epileptic seizures using EEG. 
We used the cross-database model developed in our previous study, incorporating successive decomposition index and matrix determinant as features, adaptive median feature baseline correction for overcoming interdatabase feature variation, and postprocessing-based support vector machine for classification using 5 different EEG databases. The Sezect (Seizure Detect) Android app was built using the Chaquopy software development kit, which uses the Python language in Android Studio. Various durations of EEG signals were tested on different smartphones to check the feasibility of the Sezect app. Results: We observed a sensitivity of 93.5\%, a specificity of 97.5\%, and a false detection rate of 1.5 per hour for EEG recordings using the Sezect app. The various mobile phones did not differ substantially in processing time, which indicates a range of phone models can be used for implementation. The computational time required to process real-time EEG data via smartphones and the classification results suggests that our mHealth app could be a valuable asset for monitoring patients with epilepsy. Conclusions: Smartphones have multipurpose use in health care, offering tools that can improve the quality of patients' lives. ", doi="10.2196/50660", url="https://neuro.jmir.org/2023/1/e50660" } @Article{info:doi/10.2196/45515, author="Liu, Jian and Chen, Jia and Dong, Yongquan and Lou, Yan and Tian, Yu and Sun, Huiyao and Jin, Yuqing and Li, Jingsong and Qiu, Yunqing", title="Clinical Timing-Sequence Warning Models for Serious Bacterial Infections in Adults Based on Machine Learning: Retrospective Study", journal="J Med Internet Res", year="2023", month="Dec", day="18", volume="25", pages="e45515", keywords="clinical timing-sequence warning models", keywords="machine learning", keywords="serious bacterial infection", keywords="nomogram", abstract="Background: Serious bacterial infections (SBIs) are linked to unplanned hospital admissions and a high mortality rate. The early identification of SBIs is crucial in clinical practice. Objective: This study aims to establish and validate clinically applicable models designed to identify SBIs in patients with infective fever. Methods: Clinical data from 945 patients with infective fever, encompassing demographic and laboratory indicators, were retrospectively collected from a 2200-bed teaching hospital between January 2013 and December 2020. The data were randomly divided into training and test sets at a ratio of 7:3. Various machine learning (ML) algorithms, including Boruta, Lasso (least absolute shrinkage and selection operator), and recursive feature elimination, were utilized for feature filtering. The selected features were subsequently used to construct models predicting SBIs using logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost) with 5-fold cross-validation. Performance metrics, including the receiver operating characteristic (ROC) curve and area under the ROC curve (AUC), accuracy, sensitivity, and other relevant parameters, were used to assess model performance. Considering both model performance and clinical needs, 2 clinical timing-sequence warning models were ultimately confirmed using LR analysis. The corresponding predictive nomograms were then plotted for clinical use. Moreover, a physician, blinded to the study, collected additional data from the same center involving 164 patients during 2021. 
The nomograms developed in the study were then applied in clinical practice to further validate their clinical utility. Results: In total, 69.9\% (661/945) of the patients developed SBIs. Age, hemoglobin, neutrophil-to-lymphocyte ratio, fibrinogen, and C-reactive protein levels were identified as important features by at least two ML algorithms. Considering the collection sequence of these indicators and clinical demands, 2 timing-sequence models predicting the SBI risk were constructed accordingly: the early admission model (model 1) and the model within 24 hours of admission (model 2). LR demonstrated better stability than RF and XGBoost in both models and performed the best in model 2, with an AUC, accuracy, and sensitivity of 0.780 (95\% CI 0.720-0.841), 0.754 (95\% CI 0.698-0.804), and 0.776 (95\% CI 0.711-0.832), respectively. XGBoost had an advantage over LR in AUC (0.708, 95\% CI 0.641-0.775 vs 0.686, 95\% CI 0.617-0.754), while RF achieved better accuracy (0.729, 95\% CI 0.673-0.780) and sensitivity (0.790, 95\% CI 0.728-0.844) than the other 2 approaches in model 1. Two SBI-risk prediction nomograms were developed for clinical use based on LR, and they exhibited good performance with an accuracy of 0.707 and 0.750 and a sensitivity of 0.729 and 0.927 in clinical application. Conclusions: The clinical timing-sequence warning models demonstrated efficacy in predicting SBIs in patients suspected of having infective fever and in clinical application, suggesting good potential in clinical decision-making. Nevertheless, additional prospective and multicenter studies are necessary to further confirm their clinical utility. ", doi="10.2196/45515", url="https://www.jmir.org/2023/1/e45515", url="http://www.ncbi.nlm.nih.gov/pubmed/38109177" } @Article{info:doi/10.2196/44119, author="Chen, Chaoyue and Teng, Yuen and Tan, Shuo and Wang, Zizhou and Zhang, Lei and Xu, Jianguo", title="Performance Test of a Well-Trained Model for Meningioma Segmentation in Health Care Centers: Secondary Analysis Based on Four Retrospective Multicenter Data Sets", journal="J Med Internet Res", year="2023", month="Dec", day="15", volume="25", pages="e44119", keywords="meningioma segmentation", keywords="magnetic resonance imaging", keywords="MRI", keywords="convolutional neural network", keywords="model test and verification", keywords="CNN", keywords="radiographic image interpretation", abstract="Background: Convolutional neural networks (CNNs) have produced state-of-the-art results in meningioma segmentation on magnetic resonance imaging (MRI). However, images obtained from different institutions, protocols, or scanners may show significant domain shift, leading to performance degradation and challenging model deployment in real clinical scenarios. Objective: This research aims to investigate the realistic performance of a well-trained meningioma segmentation model when deployed across different health care centers and verify the methods to enhance its generalization. Methods: This study was performed in four centers. A total of 606 patients with 606 MRIs were enrolled between January 2015 and December 2021. Manual segmentations, determined through consensus readings by neuroradiologists, were used as the ground truth mask. The model was previously trained using a standard supervised CNN called Deeplab V3+ and was deployed and tested separately in four health care centers.
To determine the appropriate approach to mitigating the observed performance degradation, two methods were used: unsupervised domain adaptation and supervised retraining. Results: The trained model showed a state-of-the-art performance in tumor segmentation in two health care institutions, with a Dice ratio of 0.887 (SD 0.108, 95\% CI 0.903-0.925) in center A and a Dice ratio of 0.874 (SD 0.800, 95\% CI 0.854-0.894) in center B. Whereas in the other health care institutions, the performance declined, with Dice ratios of 0.631 (SD 0.157, 95\% CI 0.556-0.707) in center C and 0.649 (SD 0.187, 95\% CI 0.566-0.732) in center D, as they obtained the MRI using different scanning protocols. The unsupervised domain adaptation showed a significant improvement in performance scores, with Dice ratios of 0.842 (SD 0.073, 95\% CI 0.820-0.864) in center C and 0.855 (SD 0.097, 95\% CI 0.826-0.886) in center D. Nonetheless, it did not overperform the supervised retraining, which achieved Dice ratios of 0.899 (SD 0.026, 95\% CI 0.889-0.906) in center C and 0.886 (SD 0.046, 95\% CI 0.870-0.903) in center D. Conclusions: Deploying the trained CNN model in different health care institutions may show significant performance degradation due to the domain shift of MRIs. Under this circumstance, the use of unsupervised domain adaptation or supervised retraining should be considered, taking into account the balance between clinical requirements, model performance, and the size of the available data. ", doi="10.2196/44119", url="https://www.jmir.org/2023/1/e44119", url="http://www.ncbi.nlm.nih.gov/pubmed/38100181" } @Article{info:doi/10.2196/46929, author="Cummerow, Julia and Wienecke, Christin and Engler, Nicola and Marahrens, Philip and Gruening, Philipp and Steinh{\"a}user, Jost", title="Identifying Existing Evidence to Potentially Develop a Machine Learning Diagnostic Algorithm for Cough in Primary Care Settings: Scoping Review", journal="J Med Internet Res", year="2023", month="Dec", day="14", volume="25", pages="e46929", keywords="cough", keywords="predictor", keywords="differential diagnosis", keywords="primary health care", keywords="artificial intelligence", abstract="Background: Primary care is known to be one of the most complex health care settings because of the high number of theoretically possible diagnoses. Therefore, the process of clinical decision-making in primary care includes complex analytical and nonanalytical factors such as gut feelings and dealing with uncertainties. Artificial intelligence is also mandated to offer support in finding valid diagnoses. Nevertheless, to translate some aspects of what occurs during a consultation into a machine-based diagnostic algorithm, the probabilities for the underlying diagnoses (odds ratios) need to be determined. Objective: Cough is one of the most common reasons for a consultation in general practice, the core discipline in primary care. The aim of this scoping review was to identify the available data on cough as a predictor of various diagnoses encountered in general practice. In the context of an ongoing project, we reflect on this database as a possible basis for a machine-based diagnostic algorithm. Furthermore, we discuss the applicability of such an algorithm against the background of the specifics of general practice. 
Methods: The PubMed, Scopus, Web of Science, and Cochrane Library databases were searched with defined search terms, supplemented by the search for gray literature via the German Journal of Family Medicine until April 20, 2023. The inclusion criterion was the explicit analysis of cough as a predictor of any conceivable disease. Exclusion criteria were articles that did not provide original study results, articles in languages other than English or German, and articles that did not mention cough as a diagnostic predictor. Results: In total, 1458 records were identified for screening, of which 35 articles met our inclusion criteria. Most of the results (11/35, 31\%) were found for chronic obstructive pulmonary disease. The others were distributed among the diagnoses of asthma or unspecified obstructive airway disease, various infectious diseases, bronchogenic carcinoma, dyspepsia or gastroesophageal reflux disease, and adverse effects of angiotensin-converting enzyme inhibitors. Positive odds ratios were found for cough as a predictor of chronic obstructive pulmonary disease, influenza, COVID-19 infections, and bronchial carcinoma, whereas the results for cough as a predictor of asthma and other nonspecified obstructive airway diseases were inconsistent. Conclusions: Reliable data on cough as a predictor of various diagnoses encountered in general practice are scarce. The example of cough does not provide a sufficient database to contribute odds to a machine learning--based diagnostic algorithm in a meaningful way. ", doi="10.2196/46929", url="https://www.jmir.org/2023/1/e46929", url="http://www.ncbi.nlm.nih.gov/pubmed/38096024" } @Article{info:doi/10.2196/48351, author="Teza, Htun and Pattanateepapon, Anuchate and Lertpimonchai, Attawood and Vathesatogkit, Prin and J McKay, Gareth and Attia, John and Thakkinstian, Ammarin", title="Development of Risk Prediction Models for Severe Periodontitis in a Thai Population: Statistical and Machine Learning Approaches", journal="JMIR Form Res", year="2023", month="Dec", day="14", volume="7", pages="e48351", keywords="periodontitis", keywords="prediction", keywords="machine learning", keywords="repeated measures", keywords="panel data", abstract="Background: Severe periodontitis affects 26\% of Thai adults and 11.2\% of adults globally and is characterized by the loss of alveolar bone height. Full-mouth examination by periodontal probing is the gold standard for diagnosis but is time- and resource-intensive. A screening model to identify those at high risk of severe periodontitis would offer a targeted approach and aid in reducing the workload for dentists. While statistical modelling by a logistic regression is commonly applied, optimal performance depends on feature selections and engineering. Machine learning has been recently gaining favor given its potential discriminatory power and ability to deal with multiway interactions without the requirements of linear assumptions. Objective: We aim to compare the performance of screening models developed using statistical and machine learning approaches for the risk prediction of severe periodontitis. Methods: This study used data from the prospective Electricity Generating Authority of Thailand cohort. Dental examinations were performed for the 2008 and 2013 surveys. Oral examinations (ie, number of teeth and oral hygiene index and plaque scores), periodontal pocket depth, and gingival recession were performed by dentists. 
The outcome of interest was severe periodontitis diagnosed by the Centre for Disease Control--American Academy of Periodontology, defined as 2 or more interproximal sites with a clinical attachment level $\geq$6 mm (on different teeth) and 1 or more interproximal sites with a periodontal pocket depth $\geq$5 mm. Risk prediction models were developed using mixed-effects logistic regression (MELR), recurrent neural network, mixed-effects support vector machine, and mixed-effects decision tree models. A total of 21 features were considered as predictive features, including 4 demographic characteristics, 2 physical examinations, 4 underlying diseases, 1 medication, 2 risk behaviors, 2 oral features, and 6 laboratory features. Results: A total of 3883 observations from 2086 participants were split into development (n=3112, 80.1\%) and validation (n=771, 19.9\%) sets with prevalences of periodontitis of 34.4\% (n=1070) and 34.1\% (n=263), respectively. The final MELR model contained 6 features (gender, education, smoking, diabetes mellitus, number of teeth, and plaque score) with an area under the curve (AUC) of 0.983 (95\% CI 0.977-0.989) and positive likelihood ratio (LR+) of 11.9 (95\% CI 8.8-16.3). Machine learning yielded lower performance than the MELR model, with AUC (95\% CI) and LR+ (95\% CI) values of 0.712 (0.669-0.754) and 2.1 (1.8-2.6), respectively, for the recurrent neural network model; 0.698 (0.681-0.734) and 2.1 (1.7-2.6), respectively, for the mixed-effects support vector machine model; and 0.662 (0.621-0.702) and 2.4 (1.9-3.0), respectively, for the mixed-effects decision tree model. Conclusions: The MELR model might be more useful than machine learning for large-scale screening to identify those at high risk of severe periodontitis for periodontal evaluation. External validation using data from other centers is required to evaluate the generalizability of the model. ", doi="10.2196/48351", url="https://formative.jmir.org/2023/1/e48351", url="http://www.ncbi.nlm.nih.gov/pubmed/38096008" } @Article{info:doi/10.2196/45979, author="Persson, Inger and Gr{\"u}nwald, Adam and Morvan, Ludivine and Becedas, David and Arlbrandt, Martin", title="A Machine Learning Algorithm Predicting Acute Kidney Injury in Intensive Care Unit Patients (NAVOY Acute Kidney Injury): Proof-of-Concept Study", journal="JMIR Form Res", year="2023", month="Dec", day="14", volume="7", pages="e45979", keywords="acute kidney injury", keywords="AKI", keywords="algorithm", keywords="early detection", keywords="electronic health records", keywords="ICU", keywords="intensive care unit", keywords="machine learning", keywords="nephrology", keywords="prediction", keywords="software as a medical device", abstract="Background: Acute kidney injury (AKI) represents a significant global health challenge, leading to increased patient distress and financial health care burdens. The development of AKI in intensive care unit (ICU) settings is linked to prolonged ICU stays, a heightened risk of long-term renal dysfunction, and elevated short- and long-term mortality rates. The current diagnostic approach for AKI is based on late indicators, such as elevated serum creatinine and decreased urine output, which can only detect AKI after renal injury has transpired. There are no treatments to reverse or restore renal function once AKI has developed, other than supportive care. Early prediction of AKI enables proactive management and may improve patient outcomes. 
Objective: The primary aim was to develop a machine learning algorithm, NAVOY Acute Kidney Injury, capable of predicting the onset of AKI in ICU patients using data routinely collected in ICU electronic health records. The ultimate goal was to create a clinical decision support tool that empowers ICU clinicians to proactively manage AKI and, consequently, enhance patient outcomes. Methods: We developed the NAVOY Acute Kidney Injury algorithm using a hybrid ensemble model, which combines the strengths of both a Random Forest (Leo Breiman and Adele Cutler) and an XGBoost model (Tianqi Chen). To ensure the accuracy of predictions, the algorithm used 22 clinical variables for hourly predictions of AKI as defined by the Kidney Disease: Improving Global Outcomes guidelines. Data for algorithm development were sourced from the Massachusetts Institute of Technology Lab for Computational Physiology Medical Information Mart for Intensive Care IV clinical database, focusing on ICU patients aged 18 years or older. Results: The developed algorithm, NAVOY Acute Kidney Injury, uses 4 hours of input and can, with high accuracy, predict patients with a high risk of developing AKI 12 hours before onset. The prediction performance compares well with previously published prediction algorithms designed to predict AKI onset in accordance with Kidney Disease: Improving Global Outcomes diagnosis criteria, with an impressive area under the receiver operating characteristic curve (AUROC) of 0.91 and an area under the precision-recall curve (AUPRC) of 0.75. The algorithm's predictive performance was externally validated on an independent hold-out test data set, confirming its ability to predict AKI with exceptional accuracy. Conclusions: NAVOY Acute Kidney Injury is an important development in the field of critical care medicine. It offers the ability to predict the onset of AKI with high accuracy using only 4 hours of data routinely collected in ICU electronic health records. This early detection capability has the potential to strengthen patient monitoring and management, ultimately leading to improved patient outcomes. Furthermore, NAVOY Acute Kidney Injury has been granted Conformite Europeenne (CE) marking, a significant milestone as the first CE-marked AKI prediction algorithm for commercial use in European ICUs. 
", doi="10.2196/45979", url="https://formative.jmir.org/2023/1/e45979", url="http://www.ncbi.nlm.nih.gov/pubmed/38096015" } @Article{info:doi/10.2196/51578, author="Jones, Bree and Michou, Stavroula and Chen, Tong and Moreno-Betancur, Margarita and Kilpatrick, Nicky and Burgner, David and Vannahme, Christoph and Silva, Mihiri", title="Caries Detection in Primary Teeth Using Intraoral Scanners Featuring Fluorescence: Protocol for a Diagnostic Agreement Study", journal="JMIR Res Protoc", year="2023", month="Dec", day="14", volume="12", pages="e51578", keywords="dental caries", keywords="diagnosis", keywords="oral", keywords="technology", keywords="dental", keywords="image interpretation", keywords="computer-assisted", keywords="imaging", keywords="3D", keywords="quantitative light-induced fluorescence", keywords="diagnostic agreement", keywords="intra oral scanners", keywords="oral health", keywords="teeth", keywords="3D model", keywords="color", keywords="fluorescence", keywords="intraoral scanner", keywords="device", keywords="dentistry", abstract="Background: Digital methods that enable early caries identification can streamline data collection in research and optimize dental examinations for young children. Intraoral scanners are devices used for creating 3D models of teeth in dentistry and are being rapidly adopted into clinical workflows. Integrating fluorescence technology into scanner hardware can support early caries detection. However, the performance of caries detection methods using 3D models featuring color and fluorescence in primary teeth is unknown. Objective: This study aims to assess the diagnostic agreement between visual examination (VE), on-screen assessment of 3D models in approximate natural colors with and without fluorescence, and application of an automated caries scoring system to the 3D models with fluorescence for caries detection in primary teeth. Methods: The study sample will be drawn from eligible participants in a randomized controlled trial at the Royal Children's Hospital, Melbourne, Australia, where a dental assessment was conducted, including VE using the International Caries Detection and Assessment System (ICDAS) and intraoral scan using the TRIOS 4 (3Shape TRIOS A/S). Participant clinical records will be collected, and all records meeting eligibility criteria will be subject to an on-screen assessment of 3D models by 4 dental practitioners. First, all primary tooth surfaces will be examined for caries based on 3D geometry and color, using a merged ICDAS index. Second, the on-screen assessment of 3D models will include fluorescence, where caries will be classified using a merged ICDAS index that has been modified to incorporate fluorescence criteria. After 4 weeks, all examiners will repeat the on-screen assessment for all 3D models. Finally, an automated caries scoring system will be used to classify caries on primary occlusal surfaces. The agreement in the total number of caries detected per person between methods will be assessed using a Bland-Altman analysis and intraclass correlation coefficients. At a tooth surface level, agreement between methods will be estimated using multilevel models to account for the clustering of dental data. Results: Automated caries scoring of 3D models was completed as of October 2023, with the publication of results expected by July 2024. On-screen assessment has commenced, with the expected completion of scoring and data analysis by March 2024. Results will be disseminated by the end of 2024. 
Conclusions: The study outcomes may inform new practices that use digital models to facilitate dental assessments. Novel approaches that enable remote dental examination without compromising the accuracy of VE have wide applications in the research environment, clinical practice, and the provision of teledentistry. Trial Registration: Australian New Zealand Clinical Trials Registry ACTRN12622001237774; https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=384632 International Registered Report Identifier (IRRID): DERR1-10.2196/51578 ", doi="10.2196/51578", url="https://www.researchprotocols.org/2023/1/e51578", url="http://www.ncbi.nlm.nih.gov/pubmed/38096003" } @Article{info:doi/10.2196/45364, author="Pereira, Margarida Ana and J{\'a}come, Cristina and Jacinto, Tiago and Amaral, Rita and Pereira, Mariana and S{\'a}-Sousa, Ana and Couto, Mariana and Vieira-Marques, Pedro and Martinho, Diogo and Vieira, Ana and Almeida, Ana and Martins, Constantino and Marreiros, Goreti and Freitas, Alberto and Almeida, Rute and Fonseca, A. Jo{\~a}o", title="Multidisciplinary Development and Initial Validation of a Clinical Knowledge Base on Chronic Respiratory Diseases for mHealth Decision Support Systems", journal="J Med Internet Res", year="2023", month="Dec", day="13", volume="25", pages="e45364", keywords="knowledge base", keywords="recommendations", keywords="personalization", keywords="clinical decision support system", keywords="chronic obstructive respiratory diseases", keywords="mobile phone", doi="10.2196/45364", url="https://www.jmir.org/2023/1/e45364", url="http://www.ncbi.nlm.nih.gov/pubmed/38090790" } @Article{info:doi/10.2196/46481, author="Gregory, E. Megan and Cao, Weidan and Rahurkar, Saurabh and Jonnalagadda, Pallavi and Stock, C. James and Ghazi, M. Sanam and Reid, Endia and Berk, L. Abigail and Hebert, Courtney and Li, Lang and Addison, Daniel", title="Exploring the Incorporation of a Novel Cardiotoxicity Mobile Health App Into Care of Patients With Cancer: Qualitative Study of Patient and Provider Perspectives", journal="JMIR Cancer", year="2023", month="Dec", day="12", volume="9", pages="e46481", keywords="cancer, cardiology, implementation science, mobile app, oncology", keywords="mobile phone", keywords="cancer patient", keywords="patient care", keywords="mobile health application", keywords="application", keywords="implementation", keywords="design", keywords="development", keywords="symptom tracking", keywords="cardiotoxicity", keywords="cancer therapy", keywords="symptom", keywords="primary care", abstract="Background: Cardiotoxicity is a limitation of several cancer therapies and early recognition improves outcomes. Symptom-tracking mobile health (mHealth) apps are feasible and beneficial, but key elements for mHealth symptom-tracking to indicate early signs of cardiotoxicity are unknown. Objective: We explored considerations for the design of, and implementation into a large academic medical center, an mHealth symptom-tracking tool for early recognition of cardiotoxicity in patients with cancer after cancer therapy initiation. Methods: We conducted semistructured interviews of >50\% of the providers (oncologists, cardio-oncologists, and radiation oncologists) who manage cancer treatment-related cardiotoxicity in the participating institution (n=11), and either interviews or co-design or both with 6 patients. Data were coded and analyzed using thematic analysis. 
Results: Providers indicated that there was no existing process to enable early recognition of cardiotoxicity and felt the app could reduce delays in diagnosis and lead to better patient outcomes. Signs and symptoms providers recommended for tracking included chest pain or tightness, shortness of breath, heart racing or palpitations, syncope, lightheadedness, edema, and excessive fatigue. Implementation barriers included determining who would receive symptom reports, ensuring all members of the patient's care team (eg, oncologist, cardiologist, and primary care) were informed of the symptom reports and could collaborate on care plans, and how to best integrate the app data into the electronic health record. Patients (n=6, 100\%) agreed that the app would be useful for enhanced symptom capture and education and indicated willingness to use it. Conclusions: Providers and patients agree that a patient-facing, cancer treatment-related cardiotoxicity symptom-tracking mHealth app would be beneficial. Additional studies evaluating the role of mHealth as a potential strategy for targeted early cardioprotective therapy initiation are needed. ", doi="10.2196/46481", url="https://cancer.jmir.org/2023/1/e46481", url="http://www.ncbi.nlm.nih.gov/pubmed/38085565" } @Article{info:doi/10.2196/51024, author="Huang, Shan and Liang, Yuzhen and Li, Jiarui and Li, Xuejun", title="Applications of Clinical Decision Support Systems in Diabetes Care: Scoping Review", journal="J Med Internet Res", year="2023", month="Dec", day="8", volume="25", pages="e51024", keywords="scoping review", keywords="clinical decision support system", keywords="CDSS", keywords="diabetes care", keywords="health information technology", keywords="clinical decision support", keywords="decision", keywords="decision support", keywords="diabetes", keywords="clinical application", keywords="decision-making", keywords="medical resources", abstract="Background: Providing comprehensive and individualized diabetes care remains a significant challenge in the face of the increasing complexity of diabetes management and a lack of specialized endocrinologists to support diabetes care. Clinical decision support systems (CDSSs) are progressively being used to improve diabetes care, while many health care providers lack awareness and knowledge about CDSSs in diabetes care. A comprehensive analysis of the applications of CDSSs in diabetes care is still lacking. Objective: This review aimed to summarize the research landscape, clinical applications, and impact on both patients and physicians of CDSSs in diabetes care. Methods: We conducted a scoping review following the Arksey and O'Malley framework. A search was conducted in 7 electronic databases to identify the clinical applications of CDSSs in diabetes care up to June 30, 2022. Additional searches were conducted for conference abstracts from the period of 2021-2022. Two researchers independently performed the screening and data charting processes. Results: Of 11,569 retrieved studies, 85 (0.7\%) were included for analysis. Research interest is growing in this field, with 45 (53\%) of the 85 studies published in the past 5 years. Among the 58 (68\%) out of 85 studies disclosing the underlying decision-making mechanism, most CDSSs (44/58, 76\%) were knowledge based, while the number of non-knowledge-based systems has been increasing in recent years. 
Among the 81 (95\%) out of 85 studies disclosing application scenarios, the majority of CDSSs were used for treatment recommendation (63/81, 78\%). Among the 39 (46\%) out of 85 studies disclosing physician user types, primary care physicians (20/39, 51\%) were the most common, followed by endocrinologists (15/39, 39\%) and nonendocrinology specialists (8/39, 21\%). CDSSs significantly improved patients' blood glucose, blood pressure, and lipid profiles in 71\% (45/63), 67\% (12/18), and 38\% (8/21) of the studies, respectively, with no increase in the risk of hypoglycemia. Conclusions: CDSSs are both effective and safe in improving diabetes care, implying that they could be a potentially reliable assistant in diabetes care, especially for physicians with limited experience and patients with limited access to medical resources. International Registered Report Identifier (IRRID): RR2-10.37766/inplasy2022.9.0061 ", doi="10.2196/51024", url="https://www.jmir.org/2023/1/e51024", url="http://www.ncbi.nlm.nih.gov/pubmed/38064249" } @Article{info:doi/10.2196/47873, author="Ruiz-C{\'a}rdenas, D. Juan and Montemurro, Alessio and Mart{\'i}nez-Garc{\'i}a, Mar Mar{\'i}a del and Rodr{\'i}guez-Juan, J. Juan", title="Sit-to-Stand Video Analysis--Based App for Diagnosing Sarcopenia and Its Relationship With Health-Related Risk Factors and Frailty in Community-Dwelling Older Adults: Diagnostic Accuracy Study", journal="J Med Internet Res", year="2023", month="Dec", day="8", volume="25", pages="e47873", keywords="sarcopenia", keywords="power", keywords="calf circumference", keywords="diagnosis", keywords="screening", keywords="affordable", keywords="community dwelling", keywords="older adults", keywords="smartphone", abstract="Background: Probable sarcopenia is determined by a reduction in muscle strength assessed with the handgrip strength test or 5 times sit-to-stand test, and it is confirmed with a reduction in muscle quantity determined by dual-energy X-ray absorptiometry or bioelectrical impedance analysis. However, these parameters are not implemented in clinical practice mainly due to a lack of equipment and time constraints. Nowadays, the technical innovations incorporated in most smartphone devices, such as high-speed video cameras, provide the opportunity to develop specific smartphone apps for measuring kinematic parameters related with sarcopenia during a simple sit-to-stand transition. Objective: We aimed to create and validate a sit-to-stand video analysis--based app for diagnosing sarcopenia in community-dwelling older adults and to analyze its construct validity with health-related risk factors and frailty. Methods: A total of 686 community-dwelling older adults (median age: 72 years; 59.2\% [406/686] female) were recruited from elderly social centers. The index test was a sit-to-stand video analysis--based app using muscle power and calf circumference as proxies of muscle strength and muscle quantity, respectively. The reference standard was obtained by different combinations of muscle strength (handgrip strength or 5 times sit-to-stand test result) and muscle quantity (appendicular skeletal mass or skeletal muscle index) as recommended by the European Working Group on Sarcopenia in Older People-2 (EWGSOP2). Sensitivity, specificity, positive and negative predictive values, and area under the curve (AUC) of the receiver operating characteristic curve were calculated to determine the diagnostic accuracy of the app. 
Construct validity was evaluated using logistic regression to identify the risks associated with health-related outcomes and frailty (Fried phenotype) among those individuals who were classified as having sarcopenia by the index test. Results: Sarcopenia prevalence varied from 2\% to 11\% according to the different combinations proposed by the EWGSOP2 guideline. Sensitivity, specificity, and AUC were 70\%-83.3\%, 77\%-94.9\%, and 80.5\%-87.1\%, respectively, depending on the diagnostic criteria used. Likewise, positive and negative predictive values were 10.6\%-43.6\% and 92.2\%-99.4\%, respectively. These results proved that the app was reliable to rule out the disease. Moreover, those individuals who were diagnosed with sarcopenia according to the index test showed more odds of having health-related adverse outcomes and frailty compared to their respective counterparts, regardless of the definition proposed by the EWGSOP2. Conclusions: The app showed good diagnostic performance for detecting sarcopenia in well-functioning Spanish community-dwelling older adults. Individuals with sarcopenia diagnosed by the app showed more odds of having health-related risk factors and frailty compared to their respective counterparts. These results highlight the potential use of this app in clinical settings. Trial Registration: ClinicalTrials.gov NCT05148351; https://clinicaltrials.gov/study/NCT05148351 International Registered Report Identifier (IRRID): RR2-10.3390/s22166010 ", doi="10.2196/47873", url="https://www.jmir.org/2023/1/e47873", url="http://www.ncbi.nlm.nih.gov/pubmed/38064268" } @Article{info:doi/10.2196/45815, author="Shi, Jin and Bendig, David and Vollmar, Christian Horst and Rasche, Peter", title="Mapping the Bibliometrics Landscape of AI in Medicine: Methodological Study", journal="J Med Internet Res", year="2023", month="Dec", day="8", volume="25", pages="e45815", keywords="artificial intelligence", keywords="AI", keywords="AI in medicine", keywords="medical AI taxonomy", keywords="Python", keywords="latent Dirichlet allocation", keywords="LDA", keywords="topic modeling", keywords="unsupervised machine learning", abstract="Background: Artificial intelligence (AI), conceived in the 1950s, has permeated numerous industries, intensifying in tandem with advancements in computing power. Despite the widespread adoption of AI, its integration into medicine trails other sectors. However, medical AI research has experienced substantial growth, attracting considerable attention from researchers and practitioners. Objective: In the absence of an existing framework, this study aims to outline the current landscape of medical AI research and provide insights into its future developments by examining all AI-related studies within PubMed over the past 2 decades. We also propose potential data acquisition and analysis methods, developed using Python (version 3.11) and to be executed in Spyder IDE (version 5.4.3), for future analogous research. Methods: Our dual-pronged approach involved (1) retrieving publication metadata related to AI from PubMed (spanning 2000-2022) via Python, including titles, abstracts, authors, journals, country, and publishing years, followed by keyword frequency analysis and (2) classifying relevant topics using latent Dirichlet allocation, an unsupervised machine learning approach, and defining the research scope of AI in medicine. 
In the absence of a universal medical AI taxonomy, we used an AI dictionary based on the European Commission Joint Research Centre AI Watch report, which emphasizes 8 domains: reasoning, planning, learning, perception, communication, integration and interaction, service, and AI ethics and philosophy. Results: From 2000 to 2022, a comprehensive analysis of 307,701 AI-related publications from PubMed highlighted a 36-fold increase. The United States emerged as a clear frontrunner, producing 68,502 of these articles. Despite its substantial contribution in terms of volume, China lagged in terms of citation impact. Diving into specific AI domains, as the Joint Research Centre AI Watch report categorized, the learning domain emerged dominant. Our classification analysis meticulously traced the nuanced research trajectories across each domain, revealing the multifaceted and evolving nature of AI's application in the realm of medicine. Conclusions: The research topics have evolved as the volume of AI studies increases annually. Machine learning remains central to medical AI research, with deep learning expected to maintain its fundamental role. Empowered by predictive algorithms, pattern recognition, and imaging analysis capabilities, the future of AI research in medicine is anticipated to concentrate on medical diagnosis, robotic intervention, and disease management. Our topic modeling outcomes provide a clear insight into the focus of AI research in medicine over the past decades and lay the groundwork for predicting future directions. The domains that have attracted considerable research attention, primarily the learning domain, will continue to shape the trajectory of AI in medicine. Given the observed growing interest, the domain of AI ethics and philosophy also stands out as a prospective area of increased focus. ", doi="10.2196/45815", url="https://www.jmir.org/2023/1/e45815", url="http://www.ncbi.nlm.nih.gov/pubmed/38064255" } @Article{info:doi/10.2196/50813, author="Starnecker, Fabian and Reimer, Marie Lara and Nissen, Leon and Jovanovi{\'c}, Marko and Kapsecker, Maximilian and Rospleszcz, Susanne and von Scheidt, Moritz and Krefting, Johannes and Kr{\"u}ger, Nils and Perl, Benedikt and Wiehler, Jens and Sun, Ruoyu and Jonas, Stephan and Schunkert, Heribert", title="Guideline-Based Cardiovascular Risk Assessment Delivered by an mHealth App: Development Study", journal="JMIR Cardio", year="2023", month="Dec", day="8", volume="7", pages="e50813", keywords="cardiovascular disease", keywords="cardiovascular risk assessment", keywords="HerzFit", keywords="mobile health app", keywords="mHealth app", keywords="public information campaigns", keywords="prevention", keywords="risk calculator", keywords="mobile phone", abstract="Background: Identifying high-risk individuals is crucial for preventing cardiovascular diseases (CVDs). Currently, risk assessment is mostly performed by physicians. Mobile health apps could help decouple the determination of risk from medical resources by allowing unrestricted self-assessment. The respective test results need to be interpretable for laypersons. Objective: Together with a patient organization, we aimed to design a digital risk calculator that allows people to individually assess and optimize their CVD risk. The risk calculator was integrated into the mobile health app HerzFit, which provides the respective background information. 
Methods: To cover a broad spectrum of individuals for both primary and secondary prevention, we integrated the respective scores (Framingham 10-year CVD, Systematic Coronary Risk Evaluation 2, Systematic Coronary Risk Evaluation 2 in Older Persons, and Secondary Manifestations Of Arterial Disease) into a single risk calculator that was recalibrated for the German population. In primary prevention, an individual's heart age is estimated, which gives the user an easy-to-understand metric for assessing cardiac health. For secondary prevention, the risk of recurrence was assessed. In addition, a comparison of expected to mean and optimal risk levels was determined. The risk calculator is available free of charge. Data safety is ensured by processing the data locally on the users' smartphones. Results: Offering a risk calculator to the general population requires the use of multiple instruments, as each provides only a limited spectrum in terms of age and risk distribution. The integration of 4 internationally recommended scores allows risk calculation in individuals aged 30 to 90 years with and without CVD. Such integration requires recalibration and harmonization to provide consistent and plausible estimates. In the first 14 months after the launch, the HerzFit calculator was downloaded more than 96,000 times, indicating great demand. Public information campaigns proved effective in publicizing the risk calculator and contributed significantly to download numbers. Conclusions: The HerzFit calculator provides CVD risk assessment for the general population. The public demonstrated great demand for such a risk calculator as it was downloaded up to 10,000 times per month, depending on campaigns creating awareness for the instrument. ", doi="10.2196/50813", url="https://cardio.jmir.org/2023/1/e50813", url="http://www.ncbi.nlm.nih.gov/pubmed/38064248" } @Article{info:doi/10.2196/43821, author="Khalid, Mahnoor and Sutterfield, Bethany and Minley, Kirstien and Ottwell, Ryan and Abercrombie, McKenna and Heath, Christopher and Torgerson, Trevor and Hartwell, Micah and Vassar, Matt", title="The Reporting and Methodological Quality of Systematic Reviews Underpinning Clinical Practice Guidelines Focused on the Management of Cutaneous Melanoma: Cross-Sectional Analysis", journal="JMIR Dermatol", year="2023", month="Dec", day="7", volume="6", pages="e43821", keywords="clinical practice guidelines", keywords="clinical", keywords="cutaneous melanoma", keywords="decision making", keywords="evidence", keywords="management", keywords="melanoma", keywords="practice guideline", keywords="review", keywords="systematic review", abstract="Background: Clinical practice guidelines (CPGs) inform evidence-based decision-making in the clinical setting; however, systematic reviews (SRs) that inform these CPGs may vary in terms of reporting and methodological quality, which affects confidence in summary effect estimates. Objective: Our objective was to appraise the methodological and reporting quality of the SRs used in CPGs for cutaneous melanoma and evaluate differences in these outcomes between Cochrane and non-Cochrane reviews. Methods: We conducted a cross-sectional analysis by searching PubMed for cutaneous melanoma guidelines published between January 1, 2015, and May 21, 2021. 
Next, we extracted SRs composing these guidelines and appraised their reporting and methodological rigor using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) and AMSTAR (A Measurement Tool to Assess Systematic Reviews) checklists. Lastly, we compared these outcomes between Cochrane and non-Cochrane SRs. All screening and data extraction occurred in a masked, duplicate fashion. Results: Of the SRs appraised, the mean completion rate was 66.5\% (SD 12.29\%) for the PRISMA checklist and 44.5\% (SD 21.05\%) for AMSTAR. The majority of SRs (19/50, 53\%) were of critically low methodological quality, with no SRs being appraised as high quality. There was a statistically significant association (P<.001) between AMSTAR and PRISMA checklists. Cochrane SRs had higher PRISMA mean completion rates and higher methodological quality than non-Cochrane SRs. Conclusions: SRs supporting CPGs focused on the management of cutaneous melanoma vary in reporting and methodological quality, with the majority of SRs being of low quality. Increasing adherence to PRISMA and AMSTAR checklists will likely increase the quality of SRs, thereby increasing the level of evidence supporting cutaneous melanoma CPGs. ", doi="10.2196/43821", url="https://derma.jmir.org/2023/1/e43821", url="http://www.ncbi.nlm.nih.gov/pubmed/38060306" } @Article{info:doi/10.2196/50027, author="Gierend, Kerstin and Waltemath, Dagmar and Ganslandt, Thomas and Siegel, Fabian", title="Traceable Research Data Sharing in a German Medical Data Integration Center With FAIR (Findability, Accessibility, Interoperability, and Reusability)-Geared Provenance Implementation: Proof-of-Concept Study", journal="JMIR Form Res", year="2023", month="Dec", day="7", volume="7", pages="e50027", keywords="provenance", keywords="traceability", keywords="data management", keywords="metadata", keywords="data integrity", keywords="data integration center", keywords="medical informatics", abstract="Background: Secondary investigations into digital health records, including electronic patient data from German medical data integration centers (DICs), pave the way for enhanced future patient care. However, only limited information is captured regarding the integrity, traceability, and quality of the (sensitive) data elements. This lack of detail diminishes trust in the validity of the collected data. From a technical standpoint, adhering to the widely accepted FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for data stewardship necessitates enriching data with provenance-related metadata. Provenance offers insights into the readiness for the reuse of a data element and serves as a supplier of data governance. Objective: The primary goal of this study is to augment the reusability of clinical routine data within a medical DIC for secondary utilization in clinical research. Our aim is to establish provenance traces that underpin the status of data integrity, reliability, and consequently, trust in electronic health records, thereby enhancing the accountability of the medical DIC. We present the implementation of a proof-of-concept provenance library integrating international standards as an initial step. Methods: We adhered to a customized road map for a provenance framework, and examined the data integration steps across the ETL (extract, transform, and load) phases. Following a maturity model, we derived requirements for a provenance library. 
Using this research approach, we formulated a provenance model with associated metadata and implemented a proof-of-concept provenance class. Furthermore, we seamlessly incorporated the internationally recognized World Wide Web Consortium (W3C) provenance standard, aligned the resultant provenance records with the interoperable health care standard Fast Healthcare Interoperability Resources, and presented them in various representation formats. Ultimately, we conducted a thorough assessment of provenance trace measurements. Results: This study marks the inaugural implementation of integrated provenance traces at the data element level within a German medical DIC. We devised and executed a practical method that synergizes the robustness of quality- and health standard--guided (meta)data management practices. Our measurements indicate commendable pipeline execution times, attaining notable levels of accuracy and reliability in processing clinical routine data, thereby ensuring accountability in the medical DIC. These findings should inspire the development of additional tools aimed at providing evidence-based and reliable electronic health record services for secondary use. Conclusions: The research method outlined for the proof-of-concept provenance class has been crafted to promote effective and reliable core data management practices. It aims to enhance biomedical data by imbuing it with meaningful provenance, thereby bolstering the benefits for both research and society. Additionally, it facilitates the streamlined reuse of biomedical data. As a result, the system mitigates risks, as data analysis without knowledge of the origin and quality of all data elements is rendered futile. While the approach was initially developed for the medical DIC use case, these principles can be universally applied throughout the scientific domain. ", doi="10.2196/50027", url="https://formative.jmir.org/2023/1/e50027", url="http://www.ncbi.nlm.nih.gov/pubmed/38060305" } @Article{info:doi/10.2196/48145, author="Liu, Jiaxing and Gupta, Shalini and Chen, Aipeng and Wang, Chen-Kai and Mishra, Pratik and Dai, Hong-Jie and Wong, Shui-Yee Zoie and Jonnagaddala, Jitendra", title="OpenDeID Pipeline for Unstructured Electronic Health Record Text Notes Based on Rules and Transformers: Deidentification Algorithm Development and Validation Study", journal="J Med Internet Res", year="2023", month="Dec", day="6", volume="25", pages="e48145", keywords="deidentification", keywords="scrubbing", keywords="anonymization", keywords="surrogate generation", keywords="unstructured EHRs", keywords="electronic health records", keywords="BERT", keywords="Bidirectional Encoder Representations from Transformers", abstract="Background: Electronic health records (EHRs) in unstructured formats are valuable sources of information for research in both the clinical and biomedical domains. However, before such records can be used for research purposes, sensitive health information (SHI) must be removed in several cases to protect patient privacy. Rule-based and machine learning--based methods have been shown to be effective in deidentification. However, very few studies investigated the combination of transformer-based language models and rules. Objective: The objective of this study is to develop a hybrid deidentification pipeline for Australian EHR text notes using rules and transformers. The study also aims to investigate the impact of pretrained word embedding and transformer-based language models. 
Methods: In this study, we present a hybrid deidentification pipeline called OpenDeID, which is developed using an Australian multicenter EHR-based corpus called OpenDeID Corpus. The OpenDeID corpus consists of 2100 pathology reports with 38,414 SHI entities from 1833 patients. The OpenDeID pipeline incorporates a hybrid approach of associative rules, supervised deep learning, and pretrained language models. Results: The OpenDeID achieved a best F1-score of 0.9659 by fine-tuning the Discharge Summary BioBERT model and incorporating various preprocessing and postprocessing rules. The OpenDeID pipeline has been deployed at a large tertiary teaching hospital and has processed over 8000 unstructured EHR text notes in real time. Conclusions: The OpenDeID pipeline is a hybrid deidentification pipeline to deidentify SHI entities in unstructured EHR text notes. The pipeline has been evaluated on a large multicenter corpus. External validation will be undertaken as part of our future work to evaluate the effectiveness of the OpenDeID pipeline. ", doi="10.2196/48145", url="https://www.jmir.org/2023/1/e48145", url="http://www.ncbi.nlm.nih.gov/pubmed/38055317" } @Article{info:doi/10.2196/53058, author="Lee, Ra Ah and Park, Hojoon and Yoo, Aram and Kim, Seok and Sunwoo, Leonard and Yoo, Sooyoung", title="Risk Prediction of Emergency Department Visits in Patients With Lung Cancer Using Machine Learning: Retrospective Observational Study", journal="JMIR Med Inform", year="2023", month="Dec", day="6", volume="11", pages="e53058", keywords="emergency department", keywords="lung cancer", keywords="risk prediction", keywords="machine learning", keywords="common data model", keywords="emergency", keywords="hospitalization", keywords="hospitalizations", keywords="lung", keywords="cancer", keywords="oncology", keywords="lungs", keywords="pulmonary", keywords="respiratory", keywords="predict", keywords="prediction", keywords="predictions", keywords="predictive", keywords="algorithm", keywords="algorithms", keywords="risk", keywords="risks", keywords="model", keywords="models", abstract="Background: Patients with lung cancer are among the most frequent visitors to emergency departments due to cancer-related problems, and the prognosis for those who seek emergency care is dismal. Given that patients with lung cancer frequently visit health care facilities for treatment or follow-up, the ability to predict emergency department visits based on clinical information gleaned from their routine visits would enhance hospital resource utilization and patient outcomes. Objective: This study proposed a machine learning--based prediction model to identify risk factors for emergency department visits by patients with lung cancer. Methods: This was a retrospective observational study of patients with lung cancer diagnosed at Seoul National University Bundang Hospital, a tertiary general hospital in South Korea, between January 2010 and December 2017. The primary outcome was an emergency department visit within 30 days of an outpatient visit. This study developed a machine learning--based prediction model using a common data model. In addition, the importance of features that influenced the decision-making of the model output was analyzed to identify significant clinical factors. Results: The model with the best performance demonstrated an area under the receiver operating characteristic curve of 0.73 in its ability to predict the attendance of patients with lung cancer in emergency departments. 
The frequency of recent visits to the emergency department and several laboratory test results that are typically collected during cancer treatment follow-up visits were revealed as influencing factors for the model output. Conclusions: This study developed a machine learning--based risk prediction model using a common data model and identified influencing factors for emergency department visits by patients with lung cancer. The predictive model contributes to the efficiency of resource utilization and health care service quality by facilitating the identification and early intervention of high-risk patients. This study demonstrated the possibility of collaborative research among different institutions using the common data model for precision medicine in lung cancer. ", doi="10.2196/53058", url="https://medinform.jmir.org/2023/1/e53058", url="http://www.ncbi.nlm.nih.gov/pubmed/38055320" } @Article{info:doi/10.2196/47262, author="Dryden, Lindsay and Song, Jacquelin and Valenzano, J. Teresa and Yang, Zhen and Debnath, Meggie and Lin, Rebecca and Topolovec-Vranic, Jane and Mamdani, Muhammad and Antoniou, Tony", title="Evaluation of Machine Learning Approaches for Predicting Warfarin Discharge Dose in Cardiac Surgery Patients: Retrospective Algorithm Development and Validation Study", journal="JMIR Cardio", year="2023", month="Dec", day="6", volume="7", pages="e47262", keywords="algorithm", keywords="anticlotting", keywords="anticoagulant", keywords="anticoagulation", keywords="blood thinner", keywords="cardiac", keywords="cardiology", keywords="develop", keywords="dosage", keywords="international normalized ratio", keywords="machine learning", keywords="medical informatics", keywords="pharmacology", keywords="postoperative", keywords="predict", keywords="prescribe", keywords="prescription", keywords="surgery", keywords="surgical", keywords="validate", keywords="validation", keywords="warfarin administration and dosage", keywords="warfarin", abstract="Background: Warfarin dosing in cardiac surgery patients is complicated by a heightened sensitivity to the drug, predisposing patients to adverse events. Predictive algorithms are therefore needed to guide warfarin dosing in cardiac surgery patients. Objective: This study aimed to develop and validate an algorithm for predicting the warfarin dose needed to attain a therapeutic international normalized ratio (INR) at the time of discharge in cardiac surgery patients. Methods: We abstracted variables influencing warfarin dosage from the records of 1031 encounters initiating warfarin between April 1, 2011, and November 29, 2019, at St Michael's Hospital in Toronto, Ontario, Canada. We compared the performance of penalized linear regression, k-nearest neighbors, random forest regression, gradient boosting, multivariate adaptive regression splines, and an ensemble model combining the predictions of the 5 regression models. We developed and validated separate models for predicting the warfarin dose required for achieving a discharge INR of 2.0-3.0 in patients undergoing all forms of cardiac surgery except mechanical mitral valve replacement and a discharge INR of 2.5-3.5 in patients receiving a mechanical mitral valve replacement. For the former, we selected 80\% of encounters (n=780) who had initiated warfarin during their hospital admission and had achieved a target INR of 2.0-3.0 at the time of discharge as the training cohort. 
Following 10-fold cross-validation, model accuracy was evaluated in a test cohort composed solely of cardiac surgery patients. For patients requiring a target INR of 2.5-3.5 (n=165), we used leave-p-out cross-validation (p=3 observations) to estimate model performance. For each approach, we determined the mean absolute error (MAE) and the proportion of predictions within 20\% of the true warfarin dose. We retrospectively evaluated the best-performing algorithm in clinical practice by comparing the proportion of cardiovascular surgery patients discharged with a therapeutic INR before (April 2011 and July 2019) and following (September 2021 and May 2, 2022) its implementation in routine care. Results: Random forest regression was the best-performing model for patients with a target INR of 2.0-3.0, with an MAE of 1.13 mg and 39.5\% of predictions falling within 20\% of the actual therapeutic discharge dose. For patients with a target INR of 2.5-3.5, the ensemble model performed best, with an MAE of 1.11 mg and 43.6\% of predictions being within 20\% of the actual therapeutic discharge dose. The proportion of cardiovascular surgery patients discharged with a therapeutic INR before and following implementation of these algorithms in clinical practice was 47.5\% (305/641) and 61.1\% (11/18), respectively. Conclusions: Machine learning algorithms based on routinely available clinical data can help guide initial warfarin dosing in cardiac surgery patients and optimize the postsurgical anticoagulation of these patients. ", doi="10.2196/47262", url="https://cardio.jmir.org/2023/1/e47262", url="http://www.ncbi.nlm.nih.gov/pubmed/38055310" } @Article{info:doi/10.2196/49894, author="Wang, Yi and Yu, Yide and Liu, Yue and Ma, Yan and Pang, Cheong-Iao Patrick", title="Predicting Patients' Satisfaction With Mental Health Drug Treatment Using Their Reviews: Unified Interchangeable Model Fusion Approach", journal="JMIR Ment Health", year="2023", month="Dec", day="5", volume="10", pages="e49894", keywords="artificial intelligence", keywords="AI", keywords="mental disorder", keywords="psychotherapy effectiveness", keywords="deep learning", keywords="machine learning", keywords="natural language processing", keywords="NLP", keywords="data imbalance", keywords="model fusion", abstract="Background: After the COVID-19 pandemic, the conflict between limited mental health care resources and the rapidly growing number of patients has become more pronounced. It is necessary for psychologists to borrow artificial intelligence (AI)--based methods to analyze patients' satisfaction with drug treatment for those undergoing mental illness treatment. Objective: Our goal was to construct highly accurate and transferable models for predicting the satisfaction of patients with mental illness with medication by analyzing their own experiences and comments related to medication intake. Methods: We extracted 41,851 reviews in 20 categories of disorders related to mental illnesses from a large public data set of 161,297 reviews in 16,950 illness categories. To discover a more optimal structure of the natural language processing models, we proposed the Unified Interchangeable Model Fusion to decompose the state-of-the-art Bidirectional Encoder Representations from Transformers (BERT), support vector machine, and random forest (RF) models into 2 modules, the encoder and the classifier, and then reconstruct fused ``encoder+classifier'' models to accurately evaluate patients' satisfaction. 
The fused models were divided into 2 categories in terms of model structures, traditional machine learning--based models and neural network--based models. A new loss function was proposed for those neural network--based models to overcome overfitting and data imbalance. Finally, we fine-tuned the fused models and evaluated their performance comprehensively in terms of F1-score, accuracy, $\kappa$ coefficient, and training time using 10-fold cross-validation. Results: Through extensive experiments, the transformer bidirectional encoder+RF model outperformed the state-of-the-art BERT, MentalBERT, and other fused models. It became the optimal model for predicting the patients' satisfaction with drug treatment. It achieved an average graded F1-score of 0.872, an accuracy of 0.873, and a $\kappa$ coefficient of 0.806. This model is suitable for high-standard users with sufficient computing resources. Alternatively, it turned out that the word-embedding encoder+RF model showed relatively good performance with an average graded F1-score of 0.801, an accuracy of 0.812, and a $\kappa$ coefficient of 0.695 but with much less training time. It can be deployed in environments with limited computing resources. Conclusions: We analyzed the performance of support vector machine, RF, BERT, MentalBERT, and all fused models and identified the optimal models for different clinical scenarios. The findings can serve as evidence to support that the natural language processing methods can effectively assist psychologists in evaluating the satisfaction of patients with drug treatment programs and provide precise and standardized solutions. The Unified Interchangeable Model Fusion provides a different perspective on building AI models in mental health and has the potential to fuse the strengths of different components of the models into a single model, which may contribute to the development of AI in mental health. ", doi="10.2196/49894", url="https://mental.jmir.org/2023/1/e49894", url="http://www.ncbi.nlm.nih.gov/pubmed/38051580" } @Article{info:doi/10.2196/53124, author="Odeh, Yousra and Al-Balas, Mahmoud", title="Implications of Agile Values in Software Engineering for Agility in Breast Cancer Treatment: Protocol for a Comparative Study", journal="JMIR Res Protoc", year="2023", month="Dec", day="5", volume="12", pages="e53124", keywords="agile breast cancer treatment", keywords="breast cancer", keywords="breast cancer treatment", keywords="agile", keywords="software engineering", keywords="agile software engineering", keywords="oncology", keywords="agile values", keywords="multidisciplinary research", keywords="agility in health care", keywords="agile oncology practice", abstract="Background: Breast cancer treatment has been described as a dynamic and patient-centered approach that emphasizes adaptability and flexibility throughout the treatment process. Breast cancer is complex, with varying subtypes and stages, making it important to tailor treatment plans to each patient's unique circumstances. Breast cancer treatment delivery relies on a multidisciplinary team of health care professionals who collaborate to provide personalized care and quick adaptation to changing conditions to optimize outcomes while minimizing side effects and maintaining the patient's quality of life. However, agility in breast cancer treatment has not been defined according to common agile values and described in language comprehensible to breast cancer professionals. 
In the rapidly evolving landscape of breast cancer treatment, the incorporation of agile values from software engineering promises to enhance patient care. Objective: Our objective is to propose agile values for breast cancer treatment adopted and adapted from software engineering. We also aim to validate how these values conform to the concept of agility in the breast cancer context through referencing past work. Methods: We applied a structured research methodology to identify and validate 4 agile values for breast cancer treatment. In the elicitation phase, through 2 interviews, we identified 4 agile values and described them in language that resonates with breast cancer treatment professionals. The values were then validated by a domain expert and discussed in the context of supporting work from the literature. Final validation entailed a domain expert conducting a walkthrough of the 4 identified agile values to adjust them as per the reported literature. Results: Four agile values were identified for breast cancer treatment, and among them, we validated 3 that conformed to the concept of agility. The fourth value, documentation and the quality of documentation, is vital for breast cancer treatment planning and management. This does not conform to agility. However, its nonagility is vital for the agility of the other values. None of the identified agile values were validated as partially conforming to the concept of agility. Conclusions: This work makes a novel contribution to knowledge in identifying the first set of agile values in breast cancer treatment through multidisciplinary research. Three of these values were evaluated as conforming to the concept of agility, and although 1 value did not meet the concept of agility, it enhanced the agility of the other values. It is anticipated that these 4 agile values can drive oncology practice, strategies, policies, protocols, and procedures to enhance delivery of care. Moreover, the identified values contribute to identifying quality assurance and control practices to assess the concept of agility in oncology practice and breast cancer treatment and adjust corresponding actions. We conclude that breast cancer treatment agile values are not limited to 4. International Registered Report Identifier (IRRID): RR1-10.2196/53124 ", doi="10.2196/53124", url="https://www.researchprotocols.org/2023/1/e53124", url="http://www.ncbi.nlm.nih.gov/pubmed/38051558" } @Article{info:doi/10.2196/51480, author="Kirkwood, L. Melissa and Armstrong, J. Ehrin and Ansari, M. Mohammad and Holden, Andrew and Reijnen, J. Michel M. P. and Steinbauer, Markus and Crannell, Zachary and Novoa, Hector and Phillips, Austin and Schneider, B. 
Darren", title="FORWARD Study of GORE VIABAHN Balloon-Expandable Endoprostheses and Bare Metal Stents in the United States, European Union, United Kingdom, Australia, and New Zealand When Placed to Treat Complex Iliac Occlusive Disease: Protocol for a Randomized Superiority Trial", journal="JMIR Res Protoc", year="2023", month="Dec", day="4", volume="12", pages="e51480", keywords="iliac artery occlusive disease", keywords="VIABAHN VBX balloon expandable endoprosthesis", keywords="covered stent", keywords="stent graft", keywords="stent", keywords="randomized control trial", keywords="FORWARD", keywords="endoprosthesis", keywords="atherosclerosis", keywords="endovascular", keywords="stenting", keywords="occlusion", keywords="RCT", keywords="iliac occlusion", abstract="Background: The recommendations for the use of and selection of covered stent grafts in patients with aortoiliac occlusive disease are limited. Objective: The GORE VBX FORWARD clinical study aims to demonstrate the superiority of the GORE VIABAHN VBX Balloon Expandable Endoprosthesis (VBX device) for primary patency when compared to bare metal stenting (BMS) for the treatment of complex iliac artery occlusive disease. Methods: A prospective, multicenter, randomized control study in the United States, European Union, United Kingdom, Australia, and New Zealand will enroll patients with symptomatic, complex iliac artery occlusive disease. In this study, iliac artery occlusive disease is defined as a unilateral or bilateral disease with single or multiple lesions (with >50\% stenosis or chronic total occlusion) each between 4 and 11 cm in length. In an attempt to more closely match real-world practices, patients with minor tissue loss (Rutherford class 5) and patients requiring hemodialysis will be included. Baseline aortoiliac angiography will be performed to assess target lesion characteristics and determine final patient eligibility. Once the patient is confirmed and guidewires are in place across the target lesions, the patient will be randomized in a 1:1 format to treatment with either the VBX device or a BMS. The BMS can be balloon- or self-expanding and must be approved for the iliac artery occlusive disease indication. Patients, the independent core laboratory reviewers, and Clinical Events Committee members will be blinded from the assigned treatment. Dual antithrombotic medical therapy is required through a minimum of 3 months post procedure. The primary end point is 12?month primary patency and will be adjudicated by an independent imaging core laboratory and Clinical Events Committee. Key secondary end points will be tested for superiority and include technical, acute procedural, and clinical success; changes in Ankle-brachial index; patient quality of life; primary patency; freedom from restenosis; primary-assisted patency; secondary patency; freedom from target lesion revascularizations; cumulative reintervention rate; amputation-free survival; survival; and change in Rutherford category. Study follow-up will continue through 5?years. Results: Outcomes will be reported following study completion. Enrollment is anticipated to start in October 2023. Conclusions: The results of this study will provide definitive, level 1 clinical evidence to clinicians on the optimal choice of stent device to use for the treatment of complex iliac artery occlusive disease. The FORWARD study is powered for superiority and includes only complex, unilateral, or bilateral lesions involving the common or external iliac arteries. 
This study is a multidisciplinary endeavor involving vascular surgery, interventional cardiology, and interventional radiology across multiple countries with a blinded core laboratory review of end points in hopes that the outcomes will be widely accepted and incorporated into practice guidelines for optimal treatment of patients with complex iliac artery occlusive disease. Trial Registration: ClinicalTrials.gov NCT05811364; https://clinicaltrials.gov/study/NCT05811364 International Registered Report Identifier (IRRID): PRR1-10.2196/51480 ", doi="10.2196/51480", url="https://www.researchprotocols.org/2023/1/e51480", url="http://www.ncbi.nlm.nih.gov/pubmed/38048145" } @Article{info:doi/10.2196/49147, author="Gu, Dongmei and Lv, Xiaozhen and Shi, Chuan and Zhang, Tianhong and Liu, Sha and Fan, Zili and Tu, Lihui and Zhang, Ming and Zhang, Nan and Chen, Liming and Wang, Zhijiang and Wang, Jing and Zhang, Ying and Li, Huizi and Wang, Luchun and Zhu, Jiahui and Zheng, Yaonan and Wang, Huali and Yu, Xin and ", title="A Stable and Scalable Digital Composite Neurocognitive Test for Early Dementia Screening Based on Machine Learning: Model Development and Validation Study", journal="J Med Internet Res", year="2023", month="Dec", day="1", volume="25", pages="e49147", keywords="mild cognitive impairment", keywords="digital cognitive assessment", keywords="machine learning", keywords="neurocognitive test", keywords="cognitive screening", keywords="dementia", abstract="Background: Dementia has become a major public health concern due to its heavy disease burden. Mild cognitive impairment (MCI) is a transitional stage between healthy aging and dementia. Early identification of MCI is an essential step in dementia prevention. Objective: Based on machine learning (ML) methods, this study aimed to develop and validate a stable and scalable panel of cognitive tests for the early detection of MCI and dementia based on the Chinese Neuropsychological Consensus Battery (CNCB) in the Chinese Neuropsychological Normative Project (CN-NORM) cohort. Methods: CN-NORM was a nationwide, multicenter study conducted in China with 871 participants, including an MCI group (n=327, 37.5\%), a dementia group (n=186, 21.4\%), and a cognitively normal (CN) group (n=358, 41.1\%). We used the following 4 algorithms to select candidate variables: the F-score according to the SelectKBest method, the area under the curve (AUC) from logistic regression (LR), P values from the logit method, and backward stepwise elimination. Different models were constructed after considering the administration duration and complexity of combinations of various tests. Receiver operating characteristic curve and AUC metrics were used to evaluate the discriminative ability of the models via stratified sampling cross-validation and LR and support vector classification (SVC) algorithms. This model was further validated in the Alzheimer's Disease Neuroimaging Initiative phase 3 (ADNI-3) cohort (N=743), which included 416 (56\%) CN subjects, 237 (31.9\%) patients with MCI, and 90 (12.1\%) patients with dementia. Results: Except for social cognition, all other domains in the CNCB differed between the MCI and CN groups (P<.008). In feature selection results regarding discrimination between the MCI and CN groups, the Hopkins Verbal Learning Test-5 minutes Recall had the best performance, with the highest mean AUC of up to 0.80 (SD 0.02) and an F-score of up to 258.70. 
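To illustrate the kind of candidate-variable selection described in the preceding abstract (F-scores via SelectKBest and per-predictor logistic regression AUCs), the following is a minimal Python/scikit-learn sketch on synthetic data; the variable names and data are assumptions for illustration only, and the snippet is not the CN-NORM authors' code.

# Illustrative sketch (not the CN-NORM authors' code): ranking candidate
# cognitive test scores by ANOVA F-score and by single-feature logistic
# regression AUC, two of the selection criteria described in the abstract.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Synthetic stand-in for per-participant cognitive test scores (X) and
# MCI vs cognitively normal labels (y).
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=0)

# Criterion 1: univariate F-score for each candidate test.
f_scores, _ = f_classif(X, y)

# Criterion 2: cross-validated AUC of a logistic regression fit on each
# candidate test alone.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = [cross_val_score(LogisticRegression(max_iter=1000),
                        X[:, [j]], y, cv=cv, scoring="roc_auc").mean()
        for j in range(X.shape[1])]

for j in range(X.shape[1]):
    print(f"feature {j}: F={f_scores[j]:.1f}, AUC={aucs[j]:.3f}")

# SelectKBest keeps the top-k features by F-score for downstream models.
top4 = SelectKBest(f_classif, k=4).fit(X, y).get_support(indices=True)
print("top features by F-score:", top4)

In practice, several such criteria would be compared and the retained predictors passed to the final classifiers; the k=4 cutoff here is arbitrary.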
The scalability of model 5 (Hopkins Verbal Learning Test-5 minutes Recall and Trail Making Test-B) was the lowest. Model 5 achieved a higher level of discrimination than the Hong Kong Brief Cognitive test score in distinguishing between the MCI and CN groups (P<.05). Model 5 also provided the highest sensitivity of up to 0.82 (range 0.72-0.92) and 0.83 (range 0.75-0.91) according to LR and SVC, respectively. This model yielded a similar robust discriminative performance in the ADNI-3 cohort regarding differentiation between the MCI and CN groups, with a mean AUC of up to 0.81 (SD 0) according to both LR and SVC algorithms. Conclusions: We developed a stable and scalable composite neurocognitive test based on ML that could differentiate not only between patients with MCI and controls but also between patients with different stages of cognitive impairment. This composite neurocognitive test is a feasible and practical digital biomarker that can potentially be used in large-scale cognitive screening and intervention studies. ", doi="10.2196/49147", url="https://www.jmir.org/2023/1/e49147", url="http://www.ncbi.nlm.nih.gov/pubmed/38039074" } @Article{info:doi/10.2196/49186, author="Mignanelli, Ga{\"e}tan and Boyer, Richard and Bonifas, Nicolas and Rineau, Emmanuel and Moussali, Yassine and Le Guen, Morgan", title="Survey of the Impact of Decision Support in Preoperative Management of Anemia (i-Anemia): Survey Study", journal="JMIR Perioper Med", year="2023", month="Dec", day="1", volume="6", pages="e49186", keywords="anemia", keywords="transfusion", keywords="patient blood management", keywords="preoperative optimization", keywords="preoperative", keywords="blood", keywords="decision support", keywords="randomized", keywords="case", keywords="survey", keywords="anesthesiologists", keywords="anesthesiologist", keywords="anesthesia", keywords="anesthesiology", keywords="professional development", keywords="digital health", keywords="surgery", keywords="perioperative", abstract="Background: Major surgery on patients with anemia has demonstrated an increased risk of perioperative blood transfusions and postoperative morbidity and mortality. Recent studies have shown that integrating preoperative anemia treatment as a component of perioperative blood management may reduce blood product utilization and improve outcomes in both cardiac and noncardiac surgery. However, outpatient management of anemia falls outside of daily practice for most anesthesiologists and is probably weakly understood. Objective: We conducted a simulated case survey with anesthesiologists to accomplish the following aims: (1) evaluate the baseline knowledge of the preoperative optimization of anemia and (2) determine the impact of real-time clinical decision support on anemia management. Methods: We sent a digital survey (i-Anemia) to members of the French Society of Anaesthesia and Critical Care. The i-Anemia survey contained 7 simulated case vignettes, each describing a patient's brief clinical history and containing up to 3 multiple-choice questions related to preoperative anemia management (12 questions in total). The cases concerned potential situations of preoperative anemia and were created and validated with a committee of patient blood management experts. Correct answers were determined by the current guidelines or by expert consensus. Eligible participants were randomly assigned to control or decision support groups. 
In the decision support group, the primary outcome measured was the correct response rate. Results: Overall, 1123 participants were enrolled and randomly divided into control (n=568) and decision support (n=555) groups. Among them, 763 participants fully responded to the survey. We obtained a complete response rate of 65.6\% (n=364) in the group receiving cognitive aid and 70.2\% (n=399) in the group without assistance. The mean duration of response was 10.2 (SD 6.8) minutes versus 7.8 (SD 5) minutes for the decision support and control groups, respectively (P<.001). The score significantly improved with cognitive aid (mean 10.3 out of 12, SD 2.1) in comparison to standard care (mean 6.2 out of 12, SD 2.1; P<.001). Conclusions: Management strategies to optimize preoperative anemia are not fully known and applied by anesthesiologists in daily practice despite their clinical importance. However, adding a decision support tool can significantly improve patient care by reminding practitioners of current recommendations. ", doi="10.2196/49186", url="https://periop.jmir.org/2023/1/e49186", url="http://www.ncbi.nlm.nih.gov/pubmed/38039068" } @Article{info:doi/10.2196/51387, author="McCormack, Heather and Wand, Handan and Newman, E. Christy and Bourne, Christopher and Kennedy, Catherine and Guy, Rebecca", title="Exploring Whether the Electronic Optimization of Routine Health Assessments Can Increase Testing for Sexually Transmitted Infections and Provider Acceptability at an Aboriginal Community Controlled Health Service: Mixed Methods Evaluation", journal="JMIR Med Inform", year="2023", month="Nov", day="30", volume="11", pages="e51387", keywords="sexual health", keywords="sexually transmitted infection", keywords="STI", keywords="primary care", keywords="Indigenous health", keywords="electronic medical record", keywords="EMR", keywords="medical records", keywords="electronic health record", keywords="EHR", keywords="health record", keywords="health records", keywords="Indigenous", keywords="Native", keywords="Aboriginal", keywords="sexual transmission", keywords="sexually transmitted", keywords="time series", keywords="testing", keywords="uptake", keywords="acceptance", keywords="acceptability", keywords="adoption", keywords="syphilis", keywords="sexually transmitted disease", keywords="STD", keywords="systems change", keywords="health assessment", keywords="health assessments", keywords="prompt", keywords="prompts", keywords="implementation", keywords="youth", keywords="young people", keywords="adolescent", keywords="adolescents", abstract="Background: In the context of a syphilis outbreak in neighboring states, a multifaceted systems change to increase testing for sexually transmitted infections (STIs) among young Aboriginal people aged 15 to 29 years was implemented at an Aboriginal Community Controlled Health Service (ACCHS) in New South Wales, Australia. The components included electronic medical record prompts and automated pathology test sets to increase STI testing in annual routine health assessments, the credentialing of nurses and Aboriginal health practitioners to conduct STI tests independently, pathology request forms presigned by a physician, and improved data reporting. Objective: We aimed to determine whether the systems change increased the integration of STI testing into routine health assessments by clinicians between April 2019 and March 2020, the inclusion of syphilis tests in STI testing, and STI testing uptake overall. 
We also explored staff understandings of the factors contributing to the acceptability and normalization of the systems change. Methods: We used a mixed methods design to evaluate the effectiveness and acceptability of the systems change implemented in 2019. We calculated the annual proportion of health assessments that included tests for chlamydia, gonorrhea, and syphilis, as well as an internal control (blood glucose level). We conducted an interrupted time series analysis of quarterly proportions 24 months before and 12 months after the systems change and in-depth semistructured interviews with ACCHS staff using normalization process theory. Results: Among 2461 patients, the annual proportion of health assessments that included any STI test increased from 16\% (38/237) in the first year of the study period to 42.9\% (94/219) after the implementation of the systems change. There was an immediate and large increase when the systems change occurred (coefficient=0.22; P=.003) with no decline for 12 months thereafter. The increase was greater for male individuals, with no change for the internal control. Qualitative data indicated that nurse- and Aboriginal health practitioner--led testing and presigned pathology forms proved more difficult to normalize than electronic prompts and shortcuts. The interviews identified that staff understood the modifications to have encouraged cultural change around the role of sexual health care in routine practice. Conclusions: This study provides evidence for the first time that optimizing health assessments electronically is an effective and acceptable strategy to increase and sustain clinician integration and the completeness of STI testing among young Aboriginal people attending an ACCHS. Future strategies should focus on increasing the uptake of health assessments and on promoting whole-of-service engagement and accountability. ", doi="10.2196/51387", url="https://medinform.jmir.org/2023/1/e51387", url="http://www.ncbi.nlm.nih.gov/pubmed/38032729" } @Article{info:doi/10.2196/48142, author="Yoon, Jeewoo and Han, Jinyoung and Ko, Junseo and Choi, Seong and Park, In Ji and Hwang, Seo Joon and Han, Mo Jeong and Hwang, Duck-Jin Daniel", title="Developing and Evaluating an AI-Based Computer-Aided Diagnosis System for Retinal Disease: Diagnostic Study for Central Serous Chorioretinopathy", journal="J Med Internet Res", year="2023", month="Nov", day="29", volume="25", pages="e48142", keywords="computer aided diagnosis", keywords="ophthalmology", keywords="deep learning", keywords="artificial intelligence", keywords="computer vision", keywords="imaging informatics", keywords="retinal disease", keywords="central serous chorioretinopathy", keywords="diagnostic study", abstract="Background: Although previous research has made substantial progress in developing high-performance artificial intelligence (AI)--based computer-aided diagnosis (AI-CAD) systems in various medical domains, little attention has been paid to developing and evaluating AI-CAD systems in ophthalmology, particularly for diagnosing retinal diseases using optical coherence tomography (OCT) images. Objective: This diagnostic study aimed to determine the usefulness of a proposed AI-CAD system in assisting ophthalmologists with the diagnosis of central serous chorioretinopathy (CSC), which is known to be difficult to diagnose, using OCT images. Methods: For the training and evaluation of the proposed deep learning model, 1693 OCT images were collected and annotated. 
The data set included 929 and 764 cases of acute and chronic CSC, respectively. In total, 66 ophthalmologists (2 groups: 36 retina and 30 nonretina specialists) participated in the observer performance test. To evaluate the deep learning algorithm used in the proposed AI-CAD system, the training, validation, and test sets were split in an 8:1:1 ratio. Further, 100 randomly sampled OCT images from the test set were used for the observer performance test, and the participants were instructed to select a CSC subtype for each of these images. Each image was provided under different conditions: (1) without AI assistance, (2) with AI assistance with a probability score, and (3) with AI assistance with a probability score and visual evidence heatmap. The sensitivity, specificity, and area under the receiver operating characteristic curve were used to measure the diagnostic performance of the model and ophthalmologists. Results: The proposed system achieved a high detection performance (99\% of the area under the curve) for CSC, outperforming the 66 ophthalmologists who participated in the observer performance test. In both groups, ophthalmologists with the support of AI assistance with a probability score and visual evidence heatmap achieved the highest mean diagnostic performance compared with that of those subjected to other conditions (without AI assistance or with AI assistance with a probability score). Nonretina specialists achieved expert-level diagnostic performance with the support of the proposed AI-CAD system. Conclusions: Our proposed AI-CAD system improved the diagnosis of CSC by ophthalmologists, which may support decision-making regarding retinal disease detection and alleviate the workload of ophthalmologists. ", doi="10.2196/48142", url="https://www.jmir.org/2023/1/e48142", url="http://www.ncbi.nlm.nih.gov/pubmed/38019564" } @Article{info:doi/10.2196/50886, author="Bibi, Igor and Schaffert, Daniel and Blauth, Mara and Lull, Christian and von Ahnen, Alwin Jan and Gross, Georg and Weigandt, Alexander Wanja and Knitza, Johannes and Kuhn, Sebastian and Benecke, Johannes and Leipe, Jan and Schmieder, Astrid and Olsavszky, Victor", title="Automated Machine Learning Analysis of Patients With Chronic Skin Disease Using a Medical Smartphone App: Retrospective Study", journal="J Med Internet Res", year="2023", month="Nov", day="28", volume="25", pages="e50886", keywords="automated machine learning", keywords="psoriasis", keywords="hand and foot eczema", keywords="medical smartphone app", keywords="application", keywords="smartphone", keywords="machine learning", keywords="digitalization", keywords="skin", keywords="skin disease", keywords="use", keywords="hand", keywords="foot", keywords="mobile phone", abstract="Background: Rapid digitalization in health care has led to the adoption of digital technologies; however, limited trust in internet-based health decisions and the need for technical personnel hinder the use of smartphones and machine learning applications. To address this, automated machine learning (AutoML) is a promising tool that can empower health care professionals to enhance the effectiveness of mobile health apps. Objective: We used AutoML to analyze data from clinical studies involving patients with chronic hand and/or foot eczema or psoriasis vulgaris who used a smartphone monitoring app. The analysis focused on itching, pain, Dermatology Life Quality Index (DLQI) development, and app use. 
Methods: After extensive data set preparation, which consisted of combining 3 primary data sets by extracting common features and by computing new features, a new pseudonymized secondary data set with a total of 368 patients was created. Next, multiple machine learning classification models were built during AutoML processing, with the most accurate models ultimately selected for further data set analysis. Results: Itching development for 6 months was accurately modeled using the light gradient boosted trees classifier model (log loss: 0.9302 for validation, 1.0193 for cross-validation, and 0.9167 for holdout). Pain development for 6 months was assessed using the random forest classifier model (log loss: 1.1799 for validation, 1.1561 for cross-validation, and 1.0976 for holdout). Then, the random forest classifier model (log loss: 1.3670 for validation, 1.4354 for cross-validation, and 1.3974 for holdout) was used again to estimate the DLQI development for 6 months. Finally, app use was analyzed using an elastic net blender model (area under the curve: 0.6567 for validation, 0.6207 for cross-validation, and 0.7232 for holdout). Influential feature correlations were identified, including BMI, age, disease activity, DLQI, and Hospital Anxiety and Depression Scale-Anxiety scores at follow-up. App use increased with BMI >35, was less common in patients aged >47 years and those aged 23 to 31 years, and was more common in those with higher disease activity. A Hospital Anxiety and Depression Scale-Anxiety score >8 had a slightly positive effect on app use. Conclusions: This study provides valuable insights into the relationship between data characteristics and targeted outcomes in patients with chronic eczema or psoriasis, highlighting the potential of smartphone and AutoML techniques in improving chronic disease management and patient care. ", doi="10.2196/50886", url="https://www.jmir.org/2023/1/e50886", url="http://www.ncbi.nlm.nih.gov/pubmed/38015608" } @Article{info:doi/10.2196/44773, author="Peine, Arne and Gronholz, Maike and Seidl-Rathkopf, Katharina and Wolfram, Thomas and Hallawa, Ahmed and Reitz, Annika and Celi, Anthony Leo and Marx, Gernot and Martin, Lukas", title="Standardized Comparison of Voice-Based Information and Documentation Systems to Established Systems in Intensive Care: Crossover Study", journal="JMIR Med Inform", year="2023", month="Nov", day="28", volume="11", pages="e44773", keywords="artificial intelligence", keywords="documentation", keywords="ICU", keywords="intensive care medicine", keywords="speech-recognition", keywords="user perception", keywords="workload", abstract="Background: The medical teams in intensive care units (ICUs) spend increasing amounts of time at computer systems for data processing, input, and interpretation purposes. As each patient creates about 1000 data points per hour, the available information is abundant, making the interpretation difficult and time-consuming. This data flood leads to a decrease in time for evidence-based, patient-centered care. Information systems, such as patient data management systems (PDMSs), are increasingly used at ICUs. However, they often create new challenges arising from the increasing documentation burden. Objective: New concepts, such as artificial intelligence (AI)--based assistant systems, are hence introduced to the workflow to cope with these challenges. 
However, there is a lack of standardized, published metrics in order to compare the various data input and management systems in the ICU setting. The objective of this study is to compare established documentation and retrieval processes with newer methods, such as PDMSs and voice information and documentation systems (VIDSs). Methods: In this crossover study, we compare traditional, paper-based documentation systems with PDMSs and newer AI-based VIDSs in terms of performance (required time), accuracy, mental workload, and user experience in an intensive care setting. Performance is assessed on a set of 6 standardized, typical ICU tasks, ranging from documentation to medical interpretation. Results: A total of 60 ICU-experienced medical professionals participated in the study. The VIDS showed a statistically significant advantage compared to the other 2 systems. The tasks were completed significantly faster with the VIDS than with the PDMS (1-tailed t59=12.48; Cohen d=1.61; P<.001) or paper documentation (t59=20.41; Cohen d=2.63; P<.001). Significantly fewer errors were made with VIDS than with the PDMS (t59=3.45; Cohen d=0.45; P=.03) and paper-based documentation (t59=11.2; Cohen d=1.45; P<.001). The analysis of the mental workload of VIDS and PDMS showed no statistically significant difference (P=.06). However, the analysis of subjective user perception showed a statistically significant perceived benefit of the VIDS compared to the PDMS (P<.001) and paper documentation (P<.001). Conclusions: The results of this study show that the VIDS reduced error rate, documentation time, and mental workload regarding the set of 6 standardized typical ICU tasks. In conclusion, this indicates that AI-based systems such as the VIDS tested in this study have the potential to reduce this workload and improve evidence-based and safe patient care. ", doi="10.2196/44773", url="https://medinform.jmir.org/2023/1/e44773", url="http://www.ncbi.nlm.nih.gov/pubmed/38015593" } @Article{info:doi/10.2196/44639, author="Keszthelyi, Daniel and Gaudet-Blavignac, Christophe and Bjelogrlic, Mina and Lovis, Christian", title="Patient Information Summarization in Clinical Settings: Scoping Review", journal="JMIR Med Inform", year="2023", month="Nov", day="28", volume="11", pages="e44639", keywords="summarization", keywords="electronic health records", keywords="EHR", keywords="medical record", keywords="visualization", keywords="dashboard", keywords="natural language processing", abstract="Background: Information overflow, a common problem in the present clinical environment, can be mitigated by summarizing clinical data. Although there are several solutions for clinical summarization, there is a lack of a complete overview of the research relevant to this field. Objective: This study aims to identify state-of-the-art solutions for clinical summarization, to analyze their capabilities, and to identify their properties. Methods: A scoping review of articles published between 2005 and 2022 was conducted. With a clinical focus, PubMed and Web of Science were queried to find an initial set of reports, later extended by articles found through a chain of citations. The included reports were analyzed to answer the questions of where, what, and how medical information is summarized; whether summarization conserves temporality, uncertainty, and medical pertinence; and how the propositions are evaluated and deployed. 
To answer how information is summarized, methods were compared through a new framework ``collect---synthesize---communicate'' referring to information gathering from data, its synthesis, and communication to the end user. Results: Overall, 128 articles were included, representing various medical fields. Exclusively structured data were used as input in 46.1\% (59/128) of papers, text in 41.4\% (53/128) of articles, and both in 10.2\% (13/128) of papers. Using the proposed framework, 42.2\% (54/128) of the records contributed to information collection, 27.3\% (35/128) contributed to information synthesis, and 46.1\% (59/128) presented solutions for summary communication. Numerous summarization approaches have been presented, including extractive (n=13) and abstractive summarization (n=19); topic modeling (n=5); summary specification (n=11); concept and relation extraction (n=30); visual design considerations (n=59); and complete pipelines (n=7) using information extraction, synthesis, and communication. Graphical displays (n=53), short texts (n=41), static reports (n=7), and problem-oriented views (n=7) were the most common types in terms of summary communication. Although temporality and uncertainty information were usually not conserved in most studies (74/128, 57.8\% and 113/128, 88.3\%, respectively), some studies presented solutions to treat this information. Overall, 115 (89.8\%) articles showed results of an evaluation, and methods included evaluations with human participants (median 15, IQR 24 participants): measurements in experiments with human participants (n=31), real situations (n=8), and usability studies (n=28). Methods without human involvement included intrinsic evaluation (n=24), performance on a proxy (n=10), or domain-specific tasks (n=11). Overall, 11 (8.6\%) reports described a system deployed in clinical settings. Conclusions: The scientific literature contains many propositions for summarizing patient information but reports very few comparisons of these proposals. This work proposes to compare these algorithms through how they conserve essential aspects of clinical information and through the ``collect---synthesize---communicate'' framework. We found that current propositions usually address these 3 steps only partially. Moreover, they conserve and use temporality, uncertainty, and pertinent medical aspects to varying extents, and solutions are often preliminary. ", doi="10.2196/44639", url="https://medinform.jmir.org/2023/1/e44639", url="http://www.ncbi.nlm.nih.gov/pubmed/38015588" } @Article{info:doi/10.2196/49886, author="Savage, Thomas and Wang, John and Shieh, Lisa", title="A Large Language Model Screening Tool to Target Patients for Best Practice Alerts: Development and Validation", journal="JMIR Med Inform", year="2023", month="Nov", day="27", volume="11", pages="e49886", keywords="large language models", keywords="language models", keywords="language model", keywords="EHR", keywords="health record", keywords="health records", keywords="quality improvement", keywords="Artificial Intelligence", keywords="Natural Language Processing", abstract="Background: Best Practice Alerts (BPAs) are alert messages to physicians in the electronic health record that are used to encourage appropriate use of health care resources. While these alerts are helpful in both improving care and reducing costs, BPAs are often broadly applied nonselectively across entire patient populations. 
The development of large language models (LLMs) provides an opportunity to selectively identify patients for BPAs. Objective: In this paper, we present an example case where an LLM screening tool is used to select patients appropriate for a BPA encouraging the prescription of deep vein thrombosis (DVT) anticoagulation prophylaxis. The artificial intelligence (AI) screening tool was developed to identify patients experiencing acute bleeding and exclude them from receiving a DVT prophylaxis BPA. Methods: Our AI screening tool used a BioMed-RoBERTa (Robustly Optimized Bidirectional Encoder Representations from Transformers Pretraining Approach; AllenAI) model to perform classification of physician notes, identifying patients without active bleeding and thus appropriate for a thromboembolism prophylaxis BPA. The BioMed-RoBERTa model was fine-tuned using 500 history and physical notes of patients from the MIMIC-III (Medical Information Mart for Intensive Care) database who were not prescribed anticoagulation. A development set of 300 MIMIC patient notes was used to determine the model's hyperparameters, and a separate test set of 300 patient notes was used to evaluate the screening tool. Results: Our MIMIC-III test set population of 300 patients included 72 patients with bleeding (ie, were not appropriate for a DVT prophylaxis BPA) and 228 without bleeding who were appropriate for a DVT prophylaxis BPA. The AI screening tool achieved impressive accuracy with a precision-recall area under the curve of 0.82 (95\% CI 0.75-0.89) and a receiver operator curve area under the curve of 0.89 (95\% CI 0.84-0.94). The screening tool reduced the number of patients who would trigger an alert by 20\% (240 instead of 300 alerts) and increased alert applicability by 14.8\% (218 [90.8\%] positive alerts from 240 total alerts instead of 228 [76\%] positive alerts from 300 total alerts), compared to nonselectively sending alerts for all patients. Conclusions: These results show a proof of concept on how language models can be used as a screening tool for BPAs. We provide an example AI screening tool that uses a HIPAA (Health Insurance Portability and Accountability Act)--compliant BioMed-RoBERTa model deployed with minimal computing power. Larger models (eg, Generative Pre-trained Transformers--3, Generative Pre-trained Transformers--4, and Pathways Language Model) will exhibit superior performance but require data use agreements to be HIPAA compliant. We anticipate LLMs to revolutionize quality improvement in hospital medicine. ", doi="10.2196/49886", url="https://medinform.jmir.org/2023/1/e49886", url="http://www.ncbi.nlm.nih.gov/pubmed/38010803" } @Article{info:doi/10.2196/43658, author="Iorga, Andrea and Velezis, J. Marti and Marinac-Dabic, Danica and Lario, F. Robert and Huff, M. Stanley and Gore, Beth and Mermel, A. Leonard and Bailey, Charles L. and Skapik, Julia and Willis, Debi and Lee, E. Robert and Hurst, P. Frank and Gressler, E. Laura and Reed, L. Terrie and Towbin, Richard and Baskin, M. 
Kevin", title="Venous Access: National Guideline and Registry Development (VANGUARD): Advancing Patient-Centered Venous Access Care Through the Development of a National Coordinated Registry Network", journal="J Med Internet Res", year="2023", month="Nov", day="24", volume="25", pages="e43658", keywords="central venous access devices", keywords="registry", keywords="patient-reported outcomes", keywords="catheter", keywords="CRBSI", keywords="CLABSI", keywords="development", keywords="patient", keywords="therapy", keywords="life-threatening", keywords="clinical", keywords="reliable", keywords="policy", keywords="system", keywords="medical device", doi="10.2196/43658", url="https://www.jmir.org/2023/1/e43658", url="http://www.ncbi.nlm.nih.gov/pubmed/37999957" } @Article{info:doi/10.2196/47859, author="Kang, Jin Ha Ye and Batbaatar, Erdenebileg and Choi, Dong-Woo and Choi, Son Kui and Ko, Minsam and Ryu, Sun Kwang", title="Synthetic Tabular Data Based on Generative Adversarial Networks in Health Care: Generation and Validation Using the Divide-and-Conquer Strategy", journal="JMIR Med Inform", year="2023", month="Nov", day="24", volume="11", pages="e47859", keywords="generative adversarial networks", keywords="GAN", keywords="synthetic data generation", keywords="synthetic tabular data", keywords="lung cancer", keywords="machine learning", keywords="mortality prediction", abstract="Background: Synthetic data generation (SDG) based on generative adversarial networks (GANs) is used in health care, but research on preserving data with logical relationships with synthetic tabular data (STD) remains challenging. Filtering methods for SDG can lead to the loss of important information. Objective: This study proposed a divide-and-conquer (DC) method to generate STD based on the GAN algorithm, while preserving data with logical relationships. Methods: The proposed method was evaluated on data from the Korea Association for Lung Cancer Registry (KALC-R) and 2 benchmark data sets (breast cancer and diabetes). The DC-based SDG strategy comprises 3 steps: (1) We used 2 different partitioning methods (the class-specific criterion distinguished between survival and death groups, while the Cramer V criterion identified the highest correlation between columns in the original data); (2) the entire data set was divided into a number of subsets, which were then used as input for the conditional tabular generative adversarial network and the copula generative adversarial network to generate synthetic data; and (3) the generated synthetic data were consolidated into a single entity. For validation, we compared DC-based SDG and conditional sampling (CS)--based SDG through the performances of machine learning models. In addition, we generated imbalanced and balanced synthetic data for each of the 3 data sets and compared their performance using 4 classifiers: decision tree (DT), random forest (RF), Extreme Gradient Boosting (XGBoost), and light gradient-boosting machine (LGBM) models. Results: The synthetic data of the 3 diseases (non--small cell lung cancer [NSCLC], breast cancer, and diabetes) generated by our proposed model outperformed the 4 classifiers (DT, RF, XGBoost, and LGBM). 
The CS- versus DC-based model performances were compared using the mean area under the curve (SD) values: 74.87 (SD 0.77) versus 63.87 (SD 2.02) for NSCLC, 73.31 (SD 1.11) versus 67.96 (SD 2.15) for breast cancer, and 61.57 (SD 0.09) versus 60.08 (SD 0.17) for diabetes (DT); 85.61 (SD 0.29) versus 79.01 (SD 1.20) for NSCLC, 78.05 (SD 1.59) versus 73.48 (SD 4.73) for breast cancer, and 59.98 (SD 0.24) versus 58.55 (SD 0.17) for diabetes (RF); 85.20 (SD 0.82) versus 76.42 (SD 0.93) for NSCLC, 77.86 (SD 2.27) versus 68.32 (SD 2.37) for breast cancer, and 60.18 (SD 0.20) versus 58.98 (SD 0.29) for diabetes (XGBoost); and 85.14 (SD 0.77) versus 77.62 (SD 1.85) for NSCLC, 78.16 (SD 1.52) versus 70.02 (SD 2.17) for breast cancer, and 61.75 (SD 0.13) versus 61.12 (SD 0.23) for diabetes (LGBM). In addition, we found that balanced synthetic data performed better. Conclusions: This study is the first attempt to generate and validate STD based on a DC approach and shows improved performance using STD. The necessity for balanced SDG was also demonstrated. ", doi="10.2196/47859", url="https://medinform.jmir.org/2023/1/e47859", url="http://www.ncbi.nlm.nih.gov/pubmed/37999942" } @Article{info:doi/10.2196/49314, author="Rose, Christian and Barber, Rachel and Preiksaitis, Carl and Kim, Ireh and Mishra, Nikesh and Kayser, Kristen and Brown, Italo and Gisondi, Michael", title="A Conference (Missingness in Action) to Address Missingness in Data and AI in Health Care: Qualitative Thematic Analysis", journal="J Med Internet Res", year="2023", month="Nov", day="23", volume="25", pages="e49314", keywords="machine learning", keywords="artificial intelligence", keywords="health care data", keywords="data quality", keywords="thematic analysis", keywords="AI", keywords="implementation", keywords="digital conference", keywords="trust", keywords="privacy", keywords="predictive model", keywords="health care community", abstract="Background: Missingness in health care data poses significant challenges in the development and implementation of artificial intelligence (AI) and machine learning solutions. Identifying and addressing these challenges is critical to ensuring the continued growth and accuracy of these models as well as their equitable and effective use in health care settings. Objective: This study aims to explore the challenges, opportunities, and potential solutions related to missingness in health care data for AI applications through the conduct of a digital conference and thematic analysis of conference proceedings. Methods: A digital conference was held in September 2022, attracting 861 registered participants, with 164 (19\%) attending the live event. The conference featured presentations and panel discussions by experts in AI, machine learning, and health care. Transcripts of the event were analyzed using the stepwise framework of Braun and Clark to identify key themes related to missingness in health care data. Results: Three principal themes---data quality and bias, human input in model development, and trust and privacy---emerged from the analysis. Topics included the accuracy of predictive models, lack of inclusion of underrepresented communities, partnership with physicians and other populations, challenges with sensitive health care data, and fostering trust with patients and the health care community. Conclusions: Addressing the challenges of data quality, human input, and trust is vital when devising and using machine learning algorithms in health care. 
Recommendations include expanding data collection efforts to reduce gaps and biases, involving medical professionals in the development and implementation of AI models, and developing clear ethical guidelines to safeguard patient privacy. Further research and ongoing discussions are needed to ensure these conclusions remain relevant as health care and AI continue to evolve. ", doi="10.2196/49314", url="https://www.jmir.org/2023/1/e49314", url="http://www.ncbi.nlm.nih.gov/pubmed/37995113" } @Article{info:doi/10.2196/46607, author="Quintal, Ariane and Carreau, Isabelle and Grenier, Annie-Danielle and H{\'e}bert, Caroline and Yergeau, Christine and Berthiaume, Yves and Racine, Eric", title="An Ethics Action Plan for Rare Disease Care: Participatory Action Research Approach", journal="J Particip Med", year="2023", month="Nov", day="23", volume="15", pages="e46607", keywords="community-based participatory research", keywords="rare diseases", keywords="bioethics", keywords="delivery of health care", keywords="ethics", keywords="clinical", keywords="patient participation", keywords="empowerment", keywords="education", keywords="medical", keywords="attitude of health personnel", keywords="patient education as topic", keywords="patient partnership", abstract="Background: Owing to their low prevalence, rare diseases are poorly addressed in the scientific literature and clinical practice guidelines. Thus, health care workers are inadequately equipped to provide timely diagnoses, appropriate treatment, and support for these poorly understood conditions. These clinical tribulations are experienced as moral challenges by patients, jeopardizing their life trajectories, dreams, and aspirations. Objective: This paper presents an ethical action plan for rare disease care and the process underlying its development. Methods: This action plan was designed through an ethical inquiry conducted by the Ethics and Rare Diseases Working Group, which included 3 patient partners, 2 clinician researchers, and 1 representative from Qu{\'e}bec's rare disease association. Results: The plan is structured into 4 components. Component A presents the key moral challenges encountered by patients, which are the lack of knowledge on rare diseases among health care workers, the problematic attitudes that it sometimes elicits, and the distress and powerlessness experienced by patients. Component B emphasizes a vision for patient partnership in rare disease care characterized by open-mindedness, empathy, respect, and support of patient autonomy from health care workers. Component C outlines 2 courses of action prompted by this vision: raising awareness among health care workers and empowering patients to better navigate their care. Component D compares several interventions that could help integrate these 2 courses of action in rare disease care. Conclusions: Overall, this action plan represents a toolbox that provides a review of multiple possible interventions for policy makers, hospital managers, practitioners, researchers, and patient associations to critically reflect on key moral challenges experienced by patients with rare diseases and ways to mitigate them. This paper also prompts reflection on the values underlying rare disease care, patient experiences, and health care workers' beliefs and behaviors. Health care workers and patients were the primary beneficiaries of this action plan. 
", doi="10.2196/46607", url="https://jopm.jmir.org/2023/1/e46607", url="http://www.ncbi.nlm.nih.gov/pubmed/37995128" } @Article{info:doi/10.2196/46089, author="Wang, Ying and Li, Nian and Chen, Lingmin and Wu, Miaomiao and Meng, Sha and Dai, Zelei and Zhang, Yonggang and Clarke, Mike", title="Guidelines, Consensus Statements, and Standards for the Use of Artificial Intelligence in Medicine: Systematic Review", journal="J Med Internet Res", year="2023", month="Nov", day="22", volume="25", pages="e46089", keywords="artificial intelligence", keywords="clinical practice", keywords="guidelines", keywords="consensus statements", keywords="standards", keywords="systematic review", abstract="Background: The application of artificial intelligence (AI) in the delivery of health care is a promising area, and guidelines, consensus statements, and standards on AI regarding various topics have been developed. Objective: We performed this study to assess the quality of guidelines, consensus statements, and standards in the field of AI for medicine and to provide a foundation for recommendations about the future development of AI guidelines. Methods: We searched 7 electronic databases from database establishment to April 6, 2022, and screened articles involving AI guidelines, consensus statements, and standards for eligibility. The AGREE II (Appraisal of Guidelines for Research \& Evaluation II) and RIGHT (Reporting Items for Practice Guidelines in Healthcare) tools were used to assess the methodological and reporting quality of the included articles. Results: This systematic review included 19 guideline articles, 14 consensus statement articles, and 3 standard articles published between 2019 and 2022. Their content involved disease screening, diagnosis, and treatment; AI intervention trial reporting; AI imaging development and collaboration; AI data application; and AI ethics governance and applications. Our quality assessment revealed that the average overall AGREE II score was 4.0 (range 2.2-5.5; 7-point Likert scale) and the mean overall reporting rate of the RIGHT tool was 49.4\% (range 25.7\%-77.1\%). Conclusions: The results indicated important differences in the quality of different AI guidelines, consensus statements, and standards. We made recommendations for improving their methodological and reporting quality. Trial Registration: PROSPERO International Prospective Register of Systematic Reviews (CRD42022321360); https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=321360 ", doi="10.2196/46089", url="https://www.jmir.org/2023/1/e46089", url="http://www.ncbi.nlm.nih.gov/pubmed/37991819" } @Article{info:doi/10.2196/47833, author="Liu, Kui and Li, Linyi and Ma, Yifei and Jiang, Jun and Liu, Zhenhua and Ye, Zichen and Liu, Shuang and Pu, Chen and Chen, Changsheng and Wan, Yi", title="Machine Learning Models for Blood Glucose Level Prediction in Patients With Diabetes Mellitus: Systematic Review and Network Meta-Analysis", journal="JMIR Med Inform", year="2023", month="Nov", day="20", volume="11", pages="e47833", keywords="machine learning", keywords="diabetes", keywords="hypoglycemia", keywords="blood glucose", keywords="blood glucose management", abstract="Background: Machine learning (ML) models provide more choices to patients with diabetes mellitus (DM) to more properly manage blood glucose (BG) levels. However, because of numerous types of ML algorithms, choosing an appropriate model is vitally important. 
Objective: In a systematic review and network meta-analysis, this study aimed to comprehensively assess the performance of ML models in predicting BG levels. In addition, we assessed ML models used to detect and predict adverse BG (hypoglycemia) events by calculating pooled estimates of sensitivity and specificity. Methods: PubMed, Embase, Web of Science, and Institute of Electrical and Electronics Engineers Explore databases were systematically searched for studies on predicting BG levels and predicting or detecting adverse BG events using ML models, from inception to November 2022. Studies that assessed the performance of different ML models in predicting or detecting BG levels or adverse BG events of patients with DM were included. Studies with no derivation or performance metrics of ML models were excluded. The Quality Assessment of Diagnostic Accuracy Studies tool was applied to assess the quality of included studies. Primary outcomes were the relative ranking of ML models for predicting BG levels in different prediction horizons (PHs) and pooled estimates of the sensitivity and specificity of ML models in detecting or predicting adverse BG events. Results: In total, 46 eligible studies were included for meta-analysis. Regarding ML models for predicting BG levels, the means of the absolute root mean square error (RMSE) in a PH of 15, 30, 45, and 60 minutes were 18.88 (SD 19.71), 21.40 (SD 12.56), 21.27 (SD 5.17), and 30.01 (SD 7.23) mg/dL, respectively. The neural network model (NNM) showed the highest relative performance in different PHs. Furthermore, the pooled estimates of the positive likelihood ratio and the negative likelihood ratio of ML models were 8.3 (95\% CI 5.7-12.0) and 0.31 (95\% CI 0.22-0.44), respectively, for predicting hypoglycemia and 2.4 (95\% CI 1.6-3.7) and 0.37 (95\% CI 0.29-0.46), respectively, for detecting hypoglycemia. Conclusions: Statistically significant high heterogeneity was detected in all subgroups, with different sources of heterogeneity. For predicting precise BG levels, the RMSE increases with a rise in the PH, and the NNM shows the highest relative performance among all the ML models. Meanwhile, current ML models have sufficient ability to predict adverse BG events, while their ability to detect adverse BG events needs to be enhanced. 
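For readers unfamiliar with the pooled metrics reported above, the following small Python sketch shows how positive and negative likelihood ratios are derived from sensitivity and specificity; the numbers are arbitrary examples, not values from the review.

# Small worked example (arbitrary numbers, not the review's data):
# LR+ = sensitivity / (1 - specificity); LR- = (1 - sensitivity) / specificity.
def likelihood_ratios(sensitivity: float, specificity: float):
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg

if __name__ == "__main__":
    # A hypothetical hypoglycemia predictor with 80% sensitivity and
    # 90% specificity.
    lr_pos, lr_neg = likelihood_ratios(0.80, 0.90)
    print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")  # LR+ = 8.0, LR- = 0.22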
Trial Registration: PROSPERO CRD42022375250; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=375250 ", doi="10.2196/47833", url="https://medinform.jmir.org/2023/1/e47833", url="http://www.ncbi.nlm.nih.gov/pubmed/37983072" } @Article{info:doi/10.2196/46474, author="Hj{\"a}rtstr{\"o}m, Malin and Dihge, Looket and Bendahl, P{\"a}r-Ola and Skarping, Ida and Ellbrant, Julia and Ohlsson, Mattias and Ryd{\'e}n, Lisa", title="Noninvasive Staging of Lymph Node Status in Breast Cancer Using Machine Learning: External Validation and Further Model Development", journal="JMIR Cancer", year="2023", month="Nov", day="20", volume="9", pages="e46474", keywords="breast neoplasm", keywords="sentinel lymph node biopsy", keywords="SLNB", keywords="noninvasive lymph node staging", keywords="NILS", keywords="prediction model", keywords="multilayer perceptron", keywords="MLP", keywords="register data", keywords="breast cancer", keywords="cancer", keywords="validation study", keywords="machine learning", keywords="model development", keywords="therapeutic", keywords="feasibility", keywords="diagnostic", keywords="lymph node", keywords="mammography images", abstract="Background: Most patients diagnosed with breast cancer present with a node-negative disease. Sentinel lymph node biopsy (SLNB) is routinely used for axillary staging, leaving patients with healthy axillary lymph nodes without therapeutic effects but at risk of morbidities from the intervention. Numerous studies have developed nodal status prediction models for noninvasive axillary staging using postoperative data or imaging features that are not part of the diagnostic workup. Lymphovascular invasion (LVI) is a top-ranked predictor of nodal metastasis; however, its preoperative assessment is challenging. Objective: This paper aimed to externally validate a multilayer perceptron (MLP) model for noninvasive lymph node staging (NILS) in a large population-based cohort (n=18,633) and develop a new MLP in the same cohort. Data were extracted from the Swedish National Quality Register for Breast Cancer (NKBC, 2014-2017), comprising only routinely and preoperatively available documented clinicopathological variables. A secondary aim was to develop and validate an LVI MLP for imputation of missing LVI status to increase the preoperative feasibility of the original NILS model. Methods: Three nonoverlapping cohorts were used for model development and validation. A total of 4 MLPs for nodal status and 1 LVI MLP were developed using 11 to 12 routinely available predictors. Three nodal status models were used to account for the different availabilities of LVI status in the cohorts and external validation in NKBC. The fourth nodal status model was developed for 80\% (14,906/18,663) of NKBC cases and validated in the remaining 20\% (3727/18,663). Three alternatives for imputation of LVI status were compared. The discriminatory capacity was evaluated using the validation area under the receiver operating characteristics curve (AUC) in 3 of the nodal status models. The clinical feasibility of the models was evaluated using calibration and decision curve analyses. Results: External validation of the original NILS model was performed in NKBC (AUC 0.699, 95\% CI 0.690-0.708) with good calibration and the potential of sparing 16\% of patients with node-negative disease from SLNB. 
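As an illustration of the general modeling pattern described in the NILS abstract (a small multilayer perceptron on routinely available predictors, scored by validation AUC), here is a hedged Python/scikit-learn sketch on synthetic data; it is not the NILS model and uses no register data.

# Hedged sketch of the modeling pattern described above: a small MLP on
# ~12 tabular predictors, evaluated by AUC on a held-out validation split.
# Synthetic data only; not the NILS model or the NKBC register.
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for routinely available clinicopathological predictors and a
# binary node-positive / node-negative label.
X, y = make_classification(n_samples=4000, n_features=12, n_informative=6,
                           weights=[0.7, 0.3], random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                  random_state=42))
mlp.fit(X_train, y_train)

val_probs = mlp.predict_proba(X_val)[:, 1]
print("validation AUC:", round(roc_auc_score(y_val, val_probs), 3))

A full workflow of this kind would also report calibration and decision curve analyses, as the abstract describes; those steps are omitted here for brevity.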
The LVI model was externally validated (AUC 0.747, 95\% CI 0.694-0.799) with good calibration but did not improve the discriminatory performance of the nodal status models. A new nodal status model was developed in NKBC without information on LVI (AUC 0.709, 95\% CI 0.688-0.729), with excellent calibration in the holdout internal validation cohort, resulting in the potential omission of 24\% of patients from unnecessary SLNBs. Conclusions: The NILS model was externally validated in NKBC, where the imputation of LVI status did not improve the model's discriminatory performance. A new nodal status model demonstrated the feasibility of using register data comprising only the variables available in the preoperative setting for NILS using machine learning. Future steps include ongoing preoperative validation of the NILS model and extending the model with, for example, mammography images. ", doi="10.2196/46474", url="https://cancer.jmir.org/2023/1/e46474", url="http://www.ncbi.nlm.nih.gov/pubmed/37983068" } @Article{info:doi/10.2196/49016, author="Zhang, Jinghui and Ma, Guiyuan and Peng, Sha and Hou, Jianmei and Xu, Ran and Luo, Lingxia and Hu, Jiaji and Yao, Nian and Wang, Jiaan and Huang, Xin", title="Risk Factors and Predictive Models for Peripherally Inserted Central Catheter Unplanned Extubation in Patients With Cancer: Prospective, Machine Learning Study", journal="J Med Internet Res", year="2023", month="Nov", day="16", volume="25", pages="e49016", keywords="cancer", keywords="PICC", keywords="unplanned extubation", keywords="predictive model", keywords="logistic", keywords="support vector machine", keywords="random forest", abstract="Background: Cancer represents a significant public health challenge, and unplanned extubation of peripherally inserted central catheters (PICC-UE) is a critical concern in patient safety. Identifying independent risk factors and implementing high-quality assessment tools for early detection in high-risk populations can play a crucial role in reducing the incidence of PICC-UE among patients with cancer. Precise prevention and treatment strategies are essential to improve patient outcomes and safety in clinical settings. Objective: This study aims to identify the independent risk factors associated with PICC-UE in patients with cancer and to construct a predictive model tailored to this group, offering a theoretical framework for anticipating and preventing PICC-UE in these patients. Methods: Prospective data were gathered from January to December 2022, encompassing patients with cancer with PICC at Xiangya Hospital, Central South University. Each patient underwent continuous monitoring until the catheter's removal. The patients were categorized into 2 groups: the UE group (n=284) and the non-UE group (n=3107). Independent risk factors were identified through univariate analysis, the least absolute shrinkage and selection operator (LASSO) algorithm, and multivariate analysis. Subsequently, the 3391 patients were classified into a train set and a test set in a 7:3 ratio. Utilizing the identified predictors, 3 predictive models were constructed using the logistic regression, support vector machine, and random forest algorithms. The ultimate model was selected based on the receiver operating characteristic (ROC) curve and TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) synthesis analysis. 
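To make the TOPSIS step mentioned above concrete, the following Python sketch ranks hypothetical candidate models by a closeness coefficient (Ci) computed from made-up performance metrics; it is a generic TOPSIS illustration, not the study's analysis, and the weights and scores are assumptions.

# Hedged sketch of a generic TOPSIS ranking: score candidate models on
# several benefit criteria and compute the closeness coefficient (Ci).
# The matrix and weights below are made up for illustration.
import numpy as np

def topsis(matrix: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """matrix: rows = alternatives (models), cols = benefit criteria."""
    norm = matrix / np.sqrt((matrix ** 2).sum(axis=0))   # vector normalization
    v = norm * weights                                    # weighted matrix
    ideal, anti = v.max(axis=0), v.min(axis=0)            # best/worst per criterion
    d_pos = np.sqrt(((v - ideal) ** 2).sum(axis=1))       # distance to ideal
    d_neg = np.sqrt(((v - anti) ** 2).sum(axis=1))        # distance to anti-ideal
    return d_neg / (d_pos + d_neg)                        # closeness coefficient Ci

if __name__ == "__main__":
    # Hypothetical AUC / sensitivity / specificity for 3 candidate models.
    scores = np.array([[0.86, 0.78, 0.80],   # logistic regression
                       [0.90, 0.82, 0.84],   # support vector machine
                       [0.89, 0.84, 0.81]])  # random forest
    ci = topsis(scores, weights=np.array([0.4, 0.3, 0.3]))
    for name, c in zip(["logistic", "SVM", "random forest"], ci):
        print(f"{name}: Ci = {c:.3f}")

A model that sits at the anti-ideal point on every criterion receives Ci = 0, and the alternative closest to the ideal point receives the highest Ci, which is how a composite index of this kind is used to pick one model from several.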
To further validate the model, we gathered prospective data from 600 patients with cancer at the Affiliated Hospital of Qinghai University and Hainan Provincial People's Hospital from June to December 2022. We assessed the model's performance using the area under the curve of the ROC to evaluate differentiation, the calibration curve for calibration capability, and decision curve analysis (DCA) to gauge the model's clinical applicability. Results: Independent risk factors for PICC-UE in patients with cancer were identified, including impaired physical mobility (odds ratio [OR] 2.775, 95\% CI 1.951-3.946), diabetes (OR 1.754, 95\% CI 1.134-2.712), surgical history (OR 1.734, 95\% CI 1.313-2.290), elevated D-dimer concentration (OR 2.376, 95\% CI 1.778-3.176), targeted therapy (OR 1.441, 95\% CI 1.104-1.881), surgical treatment (OR 1.543, 95\% CI 1.152-2.066), and more than 1 catheter puncture (OR 1.715, 95\% CI 1.121-2.624). Protective factors were normal BMI (OR 0.449, 95\% CI 0.342-0.590), polyurethane catheter material (OR 0.305, 95\% CI 0.228-0.408), and valved catheter (OR 0.639, 95\% CI 0.480-0.851). The TOPSIS synthesis analysis results showed that in the train set, the composite index (Ci) values were 0.00 for the logistic model, 0.82 for the support vector machine model, and 0.85 for the random forest model. In the test set, the Ci values were 0.00 for the logistic model, 1.00 for the support vector machine model, and 0.81 for the random forest model. The optimal model, constructed based on the support vector machine, was obtained and validated externally. The ROC curve, calibration curve, and DCA curve demonstrated that the model exhibited excellent accuracy, stability, generalizability, and clinical applicability. Conclusions: In summary, this study identified 10 independent risk factors for PICC-UE in patients with cancer. The predictive model developed using the support vector machine algorithm demonstrated excellent clinical applicability and was validated externally, providing valuable support for the early prediction of PICC-UE in patients with cancer. ", doi="10.2196/49016", url="https://www.jmir.org/2023/1/e49016", url="http://www.ncbi.nlm.nih.gov/pubmed/37971792" } @Article{info:doi/10.2196/44250, author="Yahya, Gezan and O'Keefe, B. James and Moore, A. Miranda", title="Comparing a Data Entry Tool to Provider Insights Alone for Assessment of COVID-19 Hospitalization Risk: Pilot Matched Cohort Comparison Study", journal="JMIR Form Res", year="2023", month="Nov", day="16", volume="7", pages="e44250", keywords="COVID-19", keywords="risk assessment", keywords="hospitalization", keywords="outpatient", keywords="telemedicine", keywords="data", keywords="tool", keywords="risk", keywords="assessment", keywords="utilization", keywords="algorithm", keywords="symptoms", keywords="disease", keywords="community", keywords="patient", keywords="decision making tool", keywords="risk algorithm", abstract="Background: In March 2020, the World Health Organization declared COVID-19 a global pandemic, necessitating an understanding of factors influencing severe disease outcomes. High COVID-19 hospitalization rates underscore the need for robust risk prediction tools to determine estimated risk for future hospitalization for outpatients with COVID-19. We introduced the ``COVID-19 Risk Tier Assessment Tool'' (CRTAT), designed to enhance clinical decision-making for outpatients. 
Objective: We investigated whether CRTAT offers more accurate risk tier assignments (RTAs) than medical provider insights alone. Methods: We assessed COVID-19--positive patients enrolled at Emory Healthcare's Virtual Outpatient Management Clinic (VOMC)---a telemedicine monitoring program, from May 27 through August 24, 2020---who were not hospitalized at the time of enrollment. The primary analysis included patients from this program who were later hospitalized due to COVID-19. We retroactively formed an age-, gender-, and risk factor--matched group of nonhospitalized patients for comparison. Data extracted from clinical notes were entered into CRTAT. We used descriptive statistics to compare RTAs reported by algorithm--trained health care providers and those produced by CRTAT. Results: Our patients were primarily younger than 60 years (67\% hospitalized and 71\% nonhospitalized). Moderate risk factors were prevalent (hospitalized group: 1 among 11, 52\% patients; 2 among 2, 10\% patients; and ≥3 among 4, 19\% patients; nonhospitalized group: 1 among 11, 52\% patients; 2 among 5, 24\% patients; and ≥3 among 4, 19\% patients). High risk factors were prevalent in approximately 45\% (n=19) of the sample (hospitalized group: 11, 52\% patients; nonhospitalized: 8, 38\% patients). Approximately 83\% (n=35) of the sample reported nonspecific symptoms, and the symptoms were generally mild (hospitalized: 12, 57\% patients; nonhospitalized: 14, 67\% patients). Most patients were seen within the first 1-6 days of their illness (n=19, 45\%), with symptoms reported as stable over this period (hospitalized: 7, 70\% patients; nonhospitalized: 3, 33\% patients). Of 42 matched patients (hospitalized: n=21; nonhospitalized: n=21), 26 had identical RTAs and 16 had discrepancies between VOMC providers and CRTAT. Elements that led to different RTAs were as follows: (1) the provider ``missed'' comorbidity (n=6), (2) the provider noted comorbidity but undercoded risk (n=10), and (3) the provider miscoded symptom severity and course (n=7). Conclusions: CRTAT, a point-of-care data entry tool, more accurately categorized patients into risk tiers (particularly those hospitalized), underscored by its ability to identify critical factors in patient history and clinical status. Clinical decision-making regarding patient management, resource allocation, and treatment plans could be enhanced by using similar risk assessment data entry tools for other disease states, such as influenza and community-acquired pneumonia. The COVID-19 pandemic has accelerated the adoption of telemedicine, enabling remote patient tools such as CRTAT. Future research should explore the long-term impact of outpatient clinical risk assessment tools and their contribution to better patient care. 
", doi="10.2196/44250", url="https://formative.jmir.org/2023/1/e44250", url="http://www.ncbi.nlm.nih.gov/pubmed/37903299" } @Article{info:doi/10.2196/44763, author="Kim, Jinchul and Kim, Kwan Yun and Kim, Hyeyeon and Jung, Hyojung and Koh, Soonjeong and Kim, Yujeong and Yoon, Dukyong and Yi, Hahn and Kim, Hyung-Jun", title="Machine Learning Algorithms Predict Successful Weaning From Mechanical Ventilation Before Intubation: Retrospective Analysis From the Medical Information Mart for Intensive Care IV Database", journal="JMIR Form Res", year="2023", month="Nov", day="14", volume="7", pages="e44763", keywords="algorithms", keywords="clinical decision-making", keywords="intensive care units", keywords="noninvasive ventilation", keywords="organ dysfunction scores", abstract="Background: The prediction of successful weaning from mechanical ventilation (MV) in advance of intubation can facilitate discussions regarding end-of-life care before unnecessary intubation. Objective: We aimed to develop a machine learning--based model that predicts successful weaning from ventilator support based on routine clinical and laboratory data taken before or immediately after intubation. Methods: We used the Medical Information Mart for Intensive Care IV database, which is an open-access database covering 524,740 admissions of 382,278 patients in Beth Israel Deaconess Medical Center, United States, from 2008 to 2019. We selected adult patients who underwent MV in the intensive care unit (ICU). Clinical and laboratory variables that are considered relevant to the prognosis of the patient in the ICU were selected. Data collected before or within 24 hours of intubation were used to develop machine learning models that predict the probability of successful weaning within 14 days of ventilator support. Developed models were integrated into an ensemble model. Performance metrics were calculated by 5-fold cross-validation for each model, and a permutation feature importance and Shapley additive explanations analysis was conducted to better understand the impacts of individual variables on outcome prediction. Results: Of the 23,242 patients, 19,025 (81.9\%) patients were successfully weaned from MV within 14 days. Using the preselected 46 clinical and laboratory variables, the area under the receiver operating characteristic curve of CatBoost classifier, random forest classifier, and regularized logistic regression classifier models were 0.860 (95\% CI 0.852-0.868), 0.855 (95\% CI 0.848-0.863), and 0.823 (95\% CI 0.813-0.832), respectively. Using the ensemble voting classifier using the 3 models above, the final model revealed the area under the receiver operating characteristic curve of 0.861 (95\% CI 0.853-0.869), which was significantly better than that of Simplified Acute Physiology Score II (0.749, 95\% CI 0.742-0.756) and Sequential Organ Failure Assessment (0.588, 95\% CI 0.566-0.609). The top features included lactate and anion gap. The model's performance achieved a plateau with approximately the top 21 variables. Conclusions: We developed machine learning algorithms that can predict successful weaning from MV in advance to intubation in the ICU. Our models can aid the appropriate management for patients who hesitate to decide on ventilator support or meaningless end-of-life care. ", doi="10.2196/44763", url="https://formative.jmir.org/2023/1/e44763", url="http://www.ncbi.nlm.nih.gov/pubmed/37962939" } @Article{info:doi/10.2196/48521, author="Ser, E. 
Sarah and Shear, Kristen and Snigurska, A. Urszula and Prosperi, Mattia and Wu, Yonghui and Magoc, Tanja and Bjarnadottir, I. Ragnhildur and Lucero, J. Robert", title="Clinical Prediction Models for Hospital-Induced Delirium Using Structured and Unstructured Electronic Health Record Data: Protocol for a Development and Validation Study", journal="JMIR Res Protoc", year="2023", month="Nov", day="9", volume="12", pages="e48521", keywords="big data", keywords="machine learning", keywords="data science", keywords="hospital-acquired condition", keywords="hospital induced", keywords="hospital acquired", keywords="predict", keywords="predictive", keywords="prediction", keywords="model", keywords="models", keywords="natural language processing", keywords="risk factors", keywords="delirium", keywords="risk", keywords="unstructured", keywords="structured", keywords="free text", keywords="clinical text", keywords="text data", abstract="Background: Hospital-induced delirium is one of the most common and costly iatrogenic conditions, and its incidence is predicted to increase as the population of the United States ages. An academic and clinical interdisciplinary systems approach is needed to reduce the frequency and impact of hospital-induced delirium. Objective: The long-term goal of our research is to enhance the safety of hospitalized older adults by reducing iatrogenic conditions through an effective learning health system. In this study, we will develop models for predicting hospital-induced delirium. In order to accomplish this objective, we will create a computable phenotype for our outcome (hospital-induced delirium), design an expert-based traditional logistic regression model, leverage machine learning techniques to generate a model using structured data, and use machine learning and natural language processing to produce an integrated model with components from both structured data and text data. Methods: This study will explore text-based data, such as nursing notes, to improve the predictive capability of prognostic models for hospital-induced delirium. By using supervised and unsupervised text mining in addition to structured data, we will examine multiple types of information in electronic health record data to predict medical-surgical patient risk of developing delirium. Development and validation will be compliant with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. Results: Work on this project will take place through March 2024. For this study, we will use data from approximately 332,230 encounters that occurred between January 2012 and May 2021. Findings from this project will be disseminated at scientific conferences and in peer-reviewed journals. Conclusions: Success in this study will yield a durable, high-performing research-data infrastructure that will process, extract, and analyze clinical text data in near real time. This model has the potential to be integrated into the electronic health record and provide point-of-care decision support to prevent harm and improve quality of care. 
International Registered Report Identifier (IRRID): DERR1-10.2196/48521 ", doi="10.2196/48521", url="https://www.researchprotocols.org/2023/1/e48521", url="http://www.ncbi.nlm.nih.gov/pubmed/37943599" } @Article{info:doi/10.2196/48432, author="Chung, Holly and Hyatt, Amelia and Crone, Elizabeth and Milne, Donna and Aranda, Sanchia and Gough, Karla and Krishnasamy, Meinir", title="Clinical Utility Assessment of a Nursing Checklist Identifying Complex Care Needs Due to Inequities Among Ambulatory Patients With Cancer: Protocol for a Mixed Methods Study", journal="JMIR Res Protoc", year="2023", month="Nov", day="9", volume="12", pages="e48432", keywords="cancer", keywords="oncology", keywords="cancer nursing", keywords="clinical utility", keywords="nursing checklist", keywords="social determinants of health", keywords="equity", keywords="cancer care", keywords="disparity", keywords="barrier", keywords="checklist", keywords="nursing", keywords="nurse", keywords="caregiver", keywords="specialist nurse", keywords="patient outcome", abstract="Background: Disparities in cancer incidence, complex care needs, and poor health outcomes are largely driven by structural inequities stemming from social determinants of health. To date, no evidence-based clinical tool has been developed to identify newly diagnosed patients at risk of poorer outcomes. Specialist cancer nurses are well-positioned to ameliorate inequity of opportunity for optimal care, treatment, and outcomes through timely screening, assessment, and intervention. We designed a nursing complexity checklist (the ``Checklist'') to support these activities, with the ultimate goal of improving equitable experiences and outcomes of care. This study aims to generate evidence regarding the clinical utility of the Checklist. Objective: The primary objectives of this study are to provide qualitative evidence regarding key aspects of the Checklist's clinical utility (appropriateness, acceptability, and practicability), informed by Smart's multidimensional model of clinical utility. Secondary objectives explore the predictive value of the Checklist and concordance between specific checklist items and patient-reported outcome measures. Methods: This prospective mixed methods case series study will recruit up to 60 newly diagnosed patients with cancer and 10 specialist nurses from a specialist cancer center. Nurses will complete the Checklist with patient participants. Within 2 weeks of Checklist completion, patients will complete 5 patient-reported outcome measures with established psychometric properties that correspond to specific checklist items and an individual semistructured interview to explore Checklist clinical utility. Interviews with nurses will occur 12 and 24 weeks after they first complete a checklist, exploring perceptions of the Checklist's clinical utility including barriers and facilitators to implementation. Data describing planned and unplanned patient service use will be collected from patient follow-up interviews at 12 weeks and the electronic medical record at 24 weeks after Checklist completion. Descriptive statistics will summarize operational, checklist, and electronic medical record data. The predictive value of the Checklist and the relationship between specific checklist items and relevant patient-reported outcome measures will be examined using descriptive statistics, contingency tables, measures of association, and plots as appropriate. Qualitative data will be analyzed using a content analysis approach. 
Results: This study was approved by the institution's ethics committee. The enrollment period commenced in May 2022 and ended in November 2022. In total, 37 patients with cancer and 7 specialist cancer nurses were recruited during this period. Data collection is scheduled for completion at the end of May 2023. Conclusions: This study will evaluate key clinical utility dimensions of a nursing complexity checklist. It will also provide preliminary evidence on its predictive value and information to support its seamless implementation into everyday practice, including, but not limited to, possible revisions to the Checklist, instructions, and training for relevant personnel. Future implementation of this Checklist may improve equity of opportunity of access to care for patients with cancer. International Registered Report Identifier (IRRID): DERR1-10.2196/48432 ", doi="10.2196/48432", url="https://www.researchprotocols.org/2023/1/e48432", url="http://www.ncbi.nlm.nih.gov/pubmed/37943601" } @Article{info:doi/10.2196/48809, author="Gierend, Kerstin and Freiesleben, Sherry and Kadioglu, Dennis and Siegel, Fabian and Ganslandt, Thomas and Waltemath, Dagmar", title="The Status of Data Management Practices Across German Medical Data Integration Centers: Mixed Methods Study", journal="J Med Internet Res", year="2023", month="Nov", day="8", volume="25", pages="e48809", keywords="data management", keywords="provenance", keywords="traceability", keywords="metadata", keywords="data integration center", keywords="maturity model", abstract="Background: In the context of the Medical Informatics Initiative, medical data integration centers (DICs) have implemented complex data flows to transfer routine health care data into research data repositories for secondary use. Data management practices are of importance throughout these processes, and special attention should be given to provenance aspects. Insufficient knowledge can lead to validity risks and reduce the confidence and quality of the processed data. The need to implement maintainable data management practices is undisputed, but there is a great lack of clarity about their current status. Objective: Our study examines the current data management practices throughout the data life cycle within the Medical Informatics in Research and Care in University Medicine (MIRACUM) consortium. We present a framework for the maturity status of data management practices and present recommendations to enable a trustful dissemination and reuse of routine health care data. Methods: In this mixed methods study, we conducted semistructured interviews with stakeholders from 10 DICs between July and September 2021. We used a self-designed questionnaire, tailored to the MIRACUM DICs, to collect qualitative and quantitative data. Our study method is compliant with the Good Reporting of a Mixed Methods Study (GRAMMS) checklist. Results: Our study provides insights into the data management practices at the MIRACUM DICs. We identify several traceability issues that can be partially explained by a lack of contextual information within nonharmonized workflow steps, unclear responsibilities, missing or incomplete data elements, and incomplete information about the computational environment. Based on the identified shortcomings, we suggest a data management maturity framework to provide more clarity and to help define enhanced data management strategies. 
Conclusions: The data management maturity framework supports the production and dissemination of accurate and provenance-enriched data for secondary use. Our work serves as a catalyst for the derivation of an overarching data management strategy, with data integrity and provenance characteristics as key factors. We envision that this work will lead to the generation of fairer and well-maintained health research data of high quality. ", doi="10.2196/48809", url="https://www.jmir.org/2023/1/e48809", url="http://www.ncbi.nlm.nih.gov/pubmed/37938878" } @Article{info:doi/10.2196/45636, author="Rindal, Brad D. and Pasumarthi, Prasad Dhavan and Thirumalai, Vijayakumar and Truitt, R. Anjali and Asche, E. Stephen and Worley, C. Donald and Kane, M. Sheryl and Gryczynski, Jan and Mitchell, G. Shannon", title="Clinical Decision Support to Reduce Opioid Prescriptions for Dental Extractions using SMART on FHIR: Implementation Report", journal="JMIR Med Inform", year="2023", month="Nov", day="7", volume="11", pages="e45636", keywords="clinical decision support systems", keywords="dentistry", keywords="analgesics", keywords="electronic health records", keywords="EHR", keywords="algorithm", keywords="design", keywords="implementation", keywords="decision support", keywords="development", keywords="dentists", keywords="pain management", keywords="patient care", keywords="application", keywords="tool", keywords="Fast Healthcare Interoperability Resources", keywords="FHIR", keywords="Substitutable Medical Applications and Reusable Technologies", keywords="SMART", abstract="Background: Clinical decision support (CDS) has the potential to improve clinical decision-making consistent with evidence-based care. CDS can be designed to save health care providers time and help them provide safe and personalized analgesic prescribing. Objective: The aim of this report is to describe the development of a CDS system designed to provide dentists with personalized pain management recommendations to reduce opioid prescribing following extractions. The use of CDS is also examined. Methods: This study was conducted at HealthPartners, which uses an electronic health record (EHR) system that integrates both medical and dental information; the CDS application was developed on this system based on SMART (Substitutable Medical Applications and Reusable Technologies) on FHIR (Fast Healthcare Interoperability Resources). The various tools used to bring relevant medical conditions, medications, patient history, and other relevant data into the CDS interface are described. The CDS application runs a drug interaction algorithm developed by our organization and provides patient-specific recommendations. The CDS included access to the state Prescription Monitoring Program database. Implementation (Results): The pain management CDS was implemented as part of a study examining opioid prescribing among patients undergoing dental extraction procedures from February 17, 2020, to May 14, 2021. Provider-level use of CDS at extraction encounters ranged from 0\% to 87.4\%, with 12.1\% of providers opening the CDS for no encounters, 39.4\% opening the CDS for 1\%-20\% of encounters, 36.4\% opening it for 21\%-50\% of encounters, and 12.1\% opening it for 51\%-87\% of encounters. Conclusions: The pain management CDS is an EHR-embedded, provider-facing tool to help dentists make personalized pain management recommendations following dental extractions. 
The SMART on FHIR--based pain management CDS adapted well to the point-of-care dental setting and led to the design of a scalable CDS tool that is EHR vendor-agnostic. Trial Registration: ClinicalTrials.gov NCT03584789; https://clinicaltrials.gov/study/NCT03584789 ", doi="10.2196/45636", url="https://medinform.jmir.org/2023/1/e45636", url="http://www.ncbi.nlm.nih.gov/pubmed/37934572" } @Article{info:doi/10.2196/46708, author="Yang, Wenyi and Wang, Baohua and Ma, Shaobo and Wang, Jingxin and Ai, Limei and Li, Zhengyu and Wan, Xia", title="Optimal Look-Back Period to Identify True Incident Cases of Diabetes in Medical Insurance Data in the Chinese Population: Retrospective Analysis Study", journal="JMIR Public Health Surveill", year="2023", month="Nov", day="6", volume="9", pages="e46708", keywords="diabetes", keywords="incident cases", keywords="administrative data", keywords="look-back period", keywords="retrograde survival function", abstract="Background: Accurate estimation of incidence and prevalence is vital for preventing and controlling diabetes. Administrative data (including insurance data) could be a good source to estimate the incidence of diabetes. However, how to determine the look-back period (LP) to remove cases with preceding records remains a problem for administrative data. A short LP will cause overestimation of incidence, whereas a long LP will limit the usefulness of a database. Therefore, it is necessary to determine the optimal LP length for identifying incident cases in administrative data. Objective: This study aims to offer different methods to identify the optimal LP for diabetes by using medical insurance data from the Chinese population with reference to other diseases in the administrative data. Methods: Data from the insurance database of the city of Weifang, China, between January 2016 and December 2020 were used. To identify the incident cases in 2020, we removed prevalent patients with preceding records of diabetes between 2016 and 2019 (ie, a 4-year LP). Using this 4-year LP as a reference, consistency examination indexes (CEIs), including positive predictive values, the $\kappa$ coefficient, and overestimation rate, were calculated to determine the level of agreement between different LPs and an LP of 4 years (the longest LP). Moreover, we constructed a retrograde survival function, in which survival (ie, incident cases) means not having a preceding record at the given time and the survival time is the difference between the date of the last record in 2020 and the most recent previous record in the LP. Based on the survival outcome and survival time, we established the survival function and survival hazard function. When the survival probability S(t) remains stable and the survival hazard converges to zero, we obtain the optimal LP. Combining the results of these two methods, we determined the optimal LP for Chinese diabetes patients. Results: The $\kappa$ agreement was excellent (0.950), with a high positive predictive value (92.2\%) and a low overestimation rate (8.4\%) after a 2-year LP. As for the retrograde survival function, S(t) dropped rapidly during the first 1-year LP (from 1.00 to 0.11). At a 417-day LP, the hazard function reached approximately zero (h(t)=0.000459), S(t) remained at 0.10, and at 480 days, the frequency of S(t) did not increase. Combining the two methods, we found that the optimal LP is 2 years for Chinese diabetes patients. Conclusions: The retrograde survival method and CEIs both showed effectiveness. 
A 2-year LP should be considered when identifying incident cases of diabetes using insurance data in the Chinese population. ", doi="10.2196/46708", url="https://publichealth.jmir.org/2023/1/e46708", url="http://www.ncbi.nlm.nih.gov/pubmed/37930785" } @Article{info:doi/10.2196/44732, author="Ho, Vy and Brown Johnson, Cati and Ghanzouri, Ilies and Amal, Saeed and Asch, Steven and Ross, Elsie", title="Physician- and Patient-Elicited Barriers and Facilitators to Implementation of a Machine Learning--Based Screening Tool for Peripheral Arterial Disease: Preimplementation Study With Physician and Patient Stakeholders", journal="JMIR Cardio", year="2023", month="Nov", day="6", volume="7", pages="e44732", keywords="artificial intelligence", keywords="cardiovascular disease", keywords="machine learning", keywords="peripheral arterial disease", keywords="preimplementation study", abstract="Background: Peripheral arterial disease (PAD) is underdiagnosed, partially due to a high prevalence of atypical symptoms and a lack of physician and patient awareness. Implementing clinical decision support tools powered by machine learning algorithms may help physicians identify high-risk patients for diagnostic workup. Objective: This study aims to evaluate barriers and facilitators to the implementation of a novel machine learning--based screening tool for PAD among physician and patient stakeholders using the Consolidated Framework for Implementation Research (CFIR). Methods: We performed semistructured interviews with physicians and patients from the Stanford University Department of Primary Care and Population Health, Division of Cardiology, and Division of Vascular Medicine. Participants answered questions regarding their perceptions toward machine learning and clinical decision support for PAD detection. Rapid thematic analysis was performed using templates incorporating codes from CFIR constructs. Results: A total of 12 physicians (6 primary care physicians and 6 cardiovascular specialists) and 14 patients were interviewed. Barriers to implementation arose from 6 CFIR constructs: complexity, evidence strength and quality, relative priority, external policies and incentives, knowledge and beliefs about intervention, and individual identification with the organization. Facilitators arose from 5 CFIR constructs: intervention source, relative advantage, learning climate, patient needs and resources, and knowledge and beliefs about intervention. Physicians felt that a machine learning--powered diagnostic tool for PAD would improve patient care but cited limited time and authority in asking patients to undergo additional screening procedures. Patients were interested in having their physicians use this tool but raised concerns about such technologies replacing human decision-making. Conclusions: Patient- and physician-reported barriers toward the implementation of a machine learning--powered PAD diagnostic tool followed four interdependent themes: (1) low familiarity or urgency in detecting PAD; (2) concerns regarding the reliability of machine learning; (3) differential perceptions of responsibility for PAD care among primary care versus specialty physicians; and (4) patient preference for physicians to remain primary interpreters of health care data. Facilitators followed two interdependent themes: (1) enthusiasm for clinical use of the predictive model and (2) willingness to incorporate machine learning into clinical care. 
Implementation of machine learning--powered diagnostic tools for PAD should leverage provider support while simultaneously educating stakeholders on the importance of early PAD diagnosis. High predictive validity is necessary for machine learning models but not sufficient for implementation. ", doi="10.2196/44732", url="https://cardio.jmir.org/2023/1/e44732", url="http://www.ncbi.nlm.nih.gov/pubmed/37930755" } @Article{info:doi/10.2196/44139, author="Gopwani, Sumeet and Bahrun, Ehab and Singh, Tanvee and Popovsky, Daniel and Cramer, Joseph and Geng, Xue", title="Efficacy of Electronic Reminders in Increasing the Enhanced Recovery After Surgery Protocol Use During Major Breast Surgery: Prospective Cohort Study", journal="JMIR Perioper Med", year="2023", month="Nov", day="3", volume="6", pages="e44139", keywords="ERAS protocol", keywords="electronic notification system", keywords="clinical decision support system", keywords="postoperative outcomes", keywords="breast surgery", keywords="surgery", keywords="surgical", keywords="postoperative", keywords="decision support", keywords="notification", keywords="recovery", keywords="anesthesia", keywords="cohort study", keywords="patient outcome", keywords="enhanced recovery", keywords="patient education", keywords="surgical stress", abstract="Background: Enhanced recovery after surgery (ERAS) protocols are patient-centered, evidence-based guidelines for peri-, intra-, and postoperative management of surgical candidates that aim to decrease operative complications and facilitate recovery after surgery. Anesthesia providers can use these protocols to guide decision-making and standardize aspects of their anesthetic plan in the operating room. Objective: Research across multiple disciplines has demonstrated that clinical decision support systems have the potential to improve protocol adherence by reminding providers about departmental policies and protocols via notifications. There remains a gap in the literature about whether clinical decision support systems can improve patient outcomes by improving anesthesia providers' adherence to protocols. Our hypothesis is that the implementation of an electronic notification system to anesthesia providers the day prior to scheduled breast surgeries will increase the use of the already existing but underused ERAS protocols. Methods: This was a single-center prospective cohort study conducted between October 2017 and August 2018 at an urban academic medical center. After obtaining approval from the institutional review board, anesthesia providers assigned to major breast surgery cases were identified. Patient data were collected pre- and postimplementation of an electronic notification system that sent the anesthesia providers an email reminder of the ERAS breast protocol the night before scheduled surgeries. Each patient's record was then reviewed to assess the frequency of adherence to the various ERAS protocol elements. Results: Implementation of an electronic notification significantly improved overall protocol adherence and several preoperative markers of ERAS protocol adherence. Protocol adherence increased from 16\% (n=14) to 44\% (n=44; P<.001), preoperative administration of oral gabapentin (600 mg) increased from 13\% (n=11) to 43\% (n=43; P<.001), and oral celebrex (400 mg) use increased from 16\% (n=14) to 35\% (n=35; P=.006). 
There were no statistically significant differences in the use of scopolamine transdermal patch (P=.05), ketamine (P=.35), and oral acetaminophen (P=.31) between the groups. Secondary outcomes such as intraoperative and postoperative morphine equivalent administered, postanesthesia care unit length of stay, postoperative pain scores, and incidence of postoperative nausea and vomiting did not show statistical significance. Conclusions: This study examines whether sending automated notifications to anesthesia providers increases the use of ERAS protocols in a single academic medical center. Our analysis exhibited statistically significant increases in overall protocol adherence but failed to show significant differences in secondary outcome measures. Despite the lack of a statistically significant difference in secondary postoperative outcomes, our analysis contributes to the limited literature on the relationship between using push notifications and clinical decision support in guiding perioperative decision-making. A variety of techniques can be implemented, including technological solutions such as automated notifications to providers, to improve awareness and adherence to ERAS protocols. ", doi="10.2196/44139", url="https://periop.jmir.org/2023/1/e44139", url="http://www.ncbi.nlm.nih.gov/pubmed/37921854" } @Article{info:doi/10.2196/47532, author="Ito, Naoki and Kadomatsu, Sakina and Fujisawa, Mineto and Fukaguchi, Kiyomitsu and Ishizawa, Ryo and Kanda, Naoki and Kasugai, Daisuke and Nakajima, Mikio and Goto, Tadahiro and Tsugawa, Yusuke", title="The Accuracy and Potential Racial and Ethnic Biases of GPT-4 in the Diagnosis and Triage of Health Conditions: Evaluation Study", journal="JMIR Med Educ", year="2023", month="Nov", day="2", volume="9", pages="e47532", keywords="GPT-4", keywords="racial and ethnic bias", keywords="typical clinical vignettes", keywords="diagnosis", keywords="triage", keywords="artificial intelligence", keywords="AI", keywords="race", keywords="clinical vignettes", keywords="physician", keywords="efficiency", keywords="decision-making", keywords="bias", keywords="GPT", abstract="Background: Whether GPT-4, the conversational artificial intelligence, can accurately diagnose and triage health conditions and whether it presents racial and ethnic biases in its decisions remain unclear. Objective: We aim to assess the accuracy of GPT-4 in the diagnosis and triage of health conditions and whether its performance varies by patient race and ethnicity. Methods: We compared the performance of GPT-4 and physicians, using 45 typical clinical vignettes, each with a correct diagnosis and triage level, in February and March 2023. For each of the 45 clinical vignettes, GPT-4 and 3 board-certified physicians provided the most likely primary diagnosis and triage level (emergency, nonemergency, or self-care). Independent reviewers evaluated the diagnoses as ``correct'' or ``incorrect.'' Physician diagnosis was defined as the consensus of the 3 physicians. We evaluated whether the performance of GPT-4 varies by patient race and ethnicity, by adding the information on patient race and ethnicity to the clinical vignettes. Results: The accuracy of diagnosis was comparable between GPT-4 and physicians (the percentage of correct diagnosis was 97.8\% (44/45; 95\% CI 88.2\%-99.9\%) for GPT-4 and 91.1\% (41/45; 95\% CI 78.8\%-97.5\%) for physicians; P=.38). GPT-4 provided appropriate reasoning for 97.8\% (44/45) of the vignettes. 
The appropriateness of triage was comparable between GPT-4 and physicians (GPT-4: 30/45, 66.7\%; 95\% CI 51.0\%-80.0\%; physicians: 30/45, 66.7\%; 95\% CI 51.0\%-80.0\%; P=.99). The performance of GPT-4 in diagnosing health conditions did not vary among different races and ethnicities (Black, White, Asian, and Hispanic), with an accuracy of 100\% (95\% CI 78.2\%-100\%). P values, compared to the GPT-4 output without incorporating race and ethnicity information, were all .99. The accuracy of triage was not significantly different even if patients' race and ethnicity information was added. The accuracy of triage was 62.2\% (95\% CI 46.5\%-76.2\%; P=.50) for Black patients; 66.7\% (95\% CI 51.0\%-80.0\%; P=.99) for White patients; 66.7\% (95\% CI 51.0\%-80.0\%; P=.99) for Asian patients, and 62.2\% (95\% CI 46.5\%-76.2\%; P=.69) for Hispanic patients. P values were calculated by comparing the outputs with and without conditioning on race and ethnicity. Conclusions: GPT-4's ability to diagnose and triage typical clinical vignettes was comparable to that of board-certified physicians. The performance of GPT-4 did not vary by patient race and ethnicity. These findings should be informative for health systems looking to introduce conversational artificial intelligence to improve the efficiency of patient diagnosis and triage. ", doi="10.2196/47532", url="https://mededu.jmir.org/2023/1/e47532", url="http://www.ncbi.nlm.nih.gov/pubmed/37917120" } @Article{info:doi/10.2196/49605, author="Guo, Lin Lin and Guo, Ying Lin and Li, Jiao and Gu, Wen Yao and Wang, Yang Jia and Cui, Ying and Qian, Qing and Chen, Ting and Jiang, Rui and Zheng, Si", title="Characteristics and Admission Preferences of Pediatric Emergency Patients and Their Waiting Time Prediction Using Electronic Medical Record Data: Retrospective Comparative Analysis", journal="J Med Internet Res", year="2023", month="Nov", day="1", volume="25", pages="e49605", keywords="pediatric emergency department", keywords="characteristics", keywords="admission preferences", keywords="waiting time", keywords="machine learning", keywords="electronic medical record", abstract="Background: The growing number of patients visiting pediatric emergency departments could have a detrimental impact on the care provided to children who are triaged as needing urgent attention. Therefore, it has become essential to continuously monitor and analyze the admissions and waiting times of pediatric emergency patients. Despite the significant challenge posed by the shortage of pediatric medical resources in China's health care system, there have been few large-scale studies conducted to analyze visits to the pediatric emergency room. Objective: This study seeks to examine the characteristics and admission patterns of patients in the pediatric emergency department using electronic medical record (EMR) data. Additionally, it aims to develop and assess machine learning models for predicting waiting times for pediatric emergency department visits. Methods: This retrospective analysis involved patients who were admitted to the emergency department of Children's Hospital Capital Institute of Pediatrics from January 1, 2021, to December 31, 2021. Clinical data from these admissions were extracted from the electronic medical records, encompassing various variables of interest such as patient demographics, clinical diagnoses, and time stamps of clinical visits. These indicators were collected and compared. 
Furthermore, we developed and evaluated several computational models for predicting waiting times. Results: In total, 183,024 eligible admissions from 127,368 pediatric patients were included. During the 12-month study period, pediatric emergency department visits were most frequent among children aged less than 5 years, accounting for 71.26\% (130,423/183,024) of the total visits. Additionally, there was a higher proportion of male patients (104,147/183,024, 56.90\%) compared with female patients (78,877/183,024, 43.10\%). Fever (50,715/183,024, 27.71\%), respiratory infection (43,269/183,024, 23.64\%), celialgia (9560/183,024, 5.22\%), and emesis (6898/183,024, 3.77\%) were the leading causes of pediatric emergency room visits. The average daily number of admissions was 501.44, and 18.76\% (34,339/183,024) of pediatric emergency department visits resulted in discharge without a prescription or further tests. The median waiting time from registration to seeing a doctor was 27.53 minutes. Prolonged waiting times were observed from April to July, coinciding with an increased number of arrivals, primarily for respiratory diseases. In terms of waiting time prediction, machine learning models, specifically random forest, LightGBM, and XGBoost, outperformed regression methods. On average, these models reduced the root-mean-square error by approximately 17.73\% (8.951/50.481) and increased the R2 by approximately 29.33\% (0.154/0.525). The SHAP method analysis highlighted that the features ``wait.green'' and ``department'' had the most significant influence on waiting times. Conclusions: This study offers a contemporary exploration of pediatric emergency room visits, revealing significant variations in admission rates across different periods and uncovering certain admission patterns. The machine learning models, particularly ensemble methods, delivered more dependable waiting time predictions. Patient volume awaiting consultation or treatment and the triage status emerged as crucial factors contributing to prolonged waiting times. Therefore, strategies such as patient diversion to alleviate congestion in emergency departments and optimizing triage systems to reduce average waiting times remain effective approaches to enhance the quality of pediatric health care services in China. ", doi="10.2196/49605", url="https://www.jmir.org/2023/1/e49605", url="http://www.ncbi.nlm.nih.gov/pubmed/37910168" } @Article{info:doi/10.2196/51375, author="de Koning, Enrico and van der Haas, Yvette and Saguna, Saguna and Stoop, Esmee and Bosch, Jan and Beeres, Saskia and Schalij, Martin and Boogers, Mark", title="AI Algorithm to Predict Acute Coronary Syndrome in Prehospital Cardiac Care: Retrospective Cohort Study", journal="JMIR Cardio", year="2023", month="Oct", day="31", volume="7", pages="e51375", keywords="cardiology", keywords="acute coronary syndrome", keywords="Hollands Midden Acute Regional Triage--cardiology", keywords="prehospital", keywords="triage", keywords="artificial intelligence", keywords="natural language processing", keywords="angina", keywords="algorithm", keywords="overcrowding", keywords="emergency department", keywords="clinical decision-making", keywords="emergency medical service", keywords="paramedics", abstract="Background: Overcrowding of hospitals and emergency departments (EDs) is a growing problem. However, not all ED consultations are necessary. For example, 80\% of patients in the ED with chest pain do not have an acute coronary syndrome (ACS). 
Artificial intelligence (AI) is useful in analyzing (medical) data and might aid health care workers in prehospital clinical decision-making before patients are presented to the hospital. Objective: The aim of this study was to develop an AI model that would be able to predict ACS before patients visit the ED. The model retrospectively analyzed prehospital data acquired by emergency medical services' nurse paramedics. Methods: Patients presenting to the emergency medical services with symptoms suggestive of ACS between September 2018 and September 2020 were included. An AI model using a supervised text classification algorithm was developed to analyze data. Data were analyzed for all 7458 patients (mean age 68, SD 15 years; 54\% men). Specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for control and intervention groups. First, a machine learning (ML) algorithm (or model) was chosen; the required features were then selected, and the model was tested and improved through iterative evaluation and, in a further step, hyperparameter tuning. Finally, a method was selected to explain the final AI model. Results: The AI model had a specificity of 11\% and a sensitivity of 99.5\%, whereas usual care had a specificity of 1\% and a sensitivity of 99.5\%. The PPV of the AI model was 15\% and the NPV was 99\%. The PPV of usual care was 13\% and the NPV was 94\%. Conclusions: The AI model was able to predict ACS based on retrospective data from the prehospital setting. It led to an increase in specificity (from 1\% to 11\%) and NPV (from 94\% to 99\%) when compared to usual care, with a similar sensitivity. Due to the retrospective nature of this study and the singular focus on ACS, it should be seen as a proof of concept. Other (possibly life-threatening) diagnoses were not analyzed. Future prospective validation is necessary before implementation. ", doi="10.2196/51375", url="https://cardio.jmir.org/2023/1/e51375", url="http://www.ncbi.nlm.nih.gov/pubmed/37906226" } @Article{info:doi/10.2196/50448, author="Gong, Jeong Eun and Bang, Seok Chang and Lee, Jun Jae and Jeong, Min Hae and Baik, Ho Gwang and Jeong, Hoon Jae and Dick, Sigmund and Lee, Hun Gi", title="Clinical Decision Support System for All Stages of Gastric Carcinogenesis in Real-Time Endoscopy: Model Establishment and Validation Study", journal="J Med Internet Res", year="2023", month="Oct", day="30", volume="25", pages="e50448", keywords="atrophy", keywords="intestinal metaplasia", keywords="metaplasia", keywords="deep learning", keywords="endoscopy", keywords="gastric neoplasms", keywords="neoplasm", keywords="neoplasms", keywords="internal medicine", keywords="cancer", keywords="oncology", keywords="decision support", keywords="real time", keywords="gastrointestinal", keywords="gastric", keywords="intestinal", keywords="machine learning", keywords="clinical decision support system", keywords="CDSS", keywords="computer aided", keywords="diagnosis", keywords="diagnostic", keywords="carcinogenesis", abstract="Background: Our research group previously established a deep-learning--based clinical decision support system (CDSS) for real-time endoscopy-based detection and classification of gastric neoplasms. However, preneoplastic conditions, such as atrophy and intestinal metaplasia (IM), were not taken into account, and there is no established model that classifies all stages of gastric carcinogenesis. 
Objective: This study aims to build and validate a CDSS for real-time endoscopy for all stages of gastric carcinogenesis, including atrophy and IM. Methods: A total of 11,868 endoscopic images were used for training and internal testing. The primary outcomes were lesion classification accuracy (6 classes: advanced gastric cancer, early gastric cancer, dysplasia, atrophy, IM, and normal) and atrophy and IM lesion segmentation rates for the segmentation model. The following tests were carried out to validate the performance of lesion classification accuracy: (1) external testing using 1282 images from another institution and (2) evaluation of the classification accuracy of atrophy and IM in real-world procedures in a prospective manner. To estimate the clinical utility, 2 experienced endoscopists were invited to perform a blind test with the same data set. A CDSS was constructed by combining the established 6-class lesion classification model and the preneoplastic lesion segmentation model with the previously established lesion detection model. Results: The overall lesion classification accuracy (95\% CI) was 90.3\% (89\%-91.6\%) in the internal test. For the performance validation, the CDSS achieved 85.3\% (83.4\%-87.2\%) overall accuracy. The per-class external test accuracies for atrophy and IM were 95.3\% (92.6\%-98\%) and 89.3\% (85.4\%-93.2\%), respectively. CDSS-assisted endoscopy showed an accuracy of 92.1\% (88.8\%-95.4\%) for atrophy and 95.5\% (92\%-99\%) for IM in the real-world application of 522 consecutive screening endoscopies. There was no significant difference in the overall accuracy between the invited endoscopists and the established CDSS in the prospective real-clinic evaluation (P=.23). The CDSS demonstrated a segmentation rate of 93.4\% (95\% CI 92.4\%-94.4\%) for atrophy or IM lesion segmentation in the internal testing. Conclusions: The CDSS achieved high performance in terms of computer-aided diagnosis of all stages of gastric carcinogenesis and demonstrated real-world application potential. ", doi="10.2196/50448", url="https://www.jmir.org/2023/1/e50448", url="http://www.ncbi.nlm.nih.gov/pubmed/37902818" } @Article{info:doi/10.2196/49324, author="Wilhelm, Isabelle Theresa and Roos, Jonas and Kaczmarczyk, Robert", title="Large Language Models for Therapy Recommendations Across 3 Clinical Specialties: Comparative Study", journal="J Med Internet Res", year="2023", month="Oct", day="30", volume="25", pages="e49324", keywords="dermatology", keywords="ophthalmology", keywords="orthopedics", keywords="therapy", keywords="large language models", keywords="artificial intelligence", keywords="LLM", keywords="ChatGPT", keywords="chatbot", keywords="chatbots", keywords="orthopedic", keywords="recommendation", keywords="recommendations", keywords="medical information", keywords="health information", keywords="quality", keywords="reliability", keywords="accuracy", keywords="safety", keywords="reliable", keywords="medical advice", abstract="Background: As advancements in artificial intelligence (AI) continue, large language models (LLMs) have emerged as promising tools for generating medical information. Their rapid adaptation and potential benefits in health care require rigorous assessment in terms of the quality, accuracy, and safety of the generated information across diverse medical specialties. 
Objective: This study aimed to evaluate the performance of 4 prominent LLMs, namely, Claude-instant-v1.0, GPT-3.5-Turbo, Command-xlarge-nightly, and Bloomz, in generating medical content spanning the clinical specialties of ophthalmology, orthopedics, and dermatology. Methods: Three domain-specific physicians evaluated the AI-generated therapeutic recommendations for a diverse set of 60 diseases. The evaluation criteria involved the mDISCERN score, correctness, and potential harmfulness of the recommendations. ANOVA and pairwise t tests were used to explore discrepancies in content quality and safety across models and specialties. Additionally, using the capabilities of OpenAI's most advanced model, GPT-4, an automated evaluation of each model's responses to the diseases was performed using the same criteria and compared to the physicians' assessments through Pearson correlation analysis. Results: Claude-instant-v1.0 emerged with the highest mean mDISCERN score (3.35, 95\% CI 3.23-3.46). In contrast, Bloomz lagged with the lowest score (1.07, 95\% CI 1.03-1.10). Our analysis revealed significant differences among the models in terms of quality (P<.001). Evaluating their reliability, the models displayed strong contrasts in their falseness ratings, with variations both across models (P<.001) and specialties (P<.001). Distinct error patterns emerged, such as confusing diagnoses; providing vague, ambiguous advice; or omitting critical treatments, such as antibiotics for infectious diseases. Regarding potential harm, GPT-3.5-Turbo was found to be the safest, with the lowest harmfulness rating. All models lagged in detailing the risks associated with treatment procedures, explaining the effects of therapies on quality of life, and offering additional sources of information. Pearson correlation analysis underscored a substantial alignment between physician assessments and GPT-4's evaluations across all established criteria (P<.01). Conclusions: This study, while comprehensive, was limited by the involvement of a select number of specialties and physician evaluators. The straightforward prompting strategy (``How to treat\ldots'') and the assessment benchmarks, initially conceptualized for human-authored content, might have potential gaps in capturing the nuances of AI-driven information. The LLMs evaluated showed a notable capability in generating valuable medical content; however, evident lapses in content quality and potential harm signal the need for further refinements. Given the dynamic landscape of LLMs, this study's findings emphasize the need for regular and methodical assessments, oversight, and fine-tuning of these AI tools to ensure they produce consistently trustworthy and clinically safe medical advice. Notably, the introduction of an auto-evaluation mechanism using GPT-4, as detailed in this study, provides a scalable, transferable method for domain-agnostic evaluations, extending beyond therapy recommendation assessments. 
", doi="10.2196/49324", url="https://www.jmir.org/2023/1/e49324", url="http://www.ncbi.nlm.nih.gov/pubmed/37902826" } @Article{info:doi/10.2196/46547, author="Liang, Xueping and Zhao, Juan and Chen, Yan and Bandara, Eranga and Shetty, Sachin", title="Architectural Design of a Blockchain-Enabled, Federated Learning Platform for Algorithmic Fairness in Predictive Health Care: Design Science Study", journal="J Med Internet Res", year="2023", month="Oct", day="30", volume="25", pages="e46547", keywords="fairness", keywords="federated learning", keywords="bias", keywords="health care", keywords="blockchain", keywords="software", keywords="proof of concept", keywords="implementation", keywords="privacy", abstract="Background: Developing effective and generalizable predictive models is critical for disease prediction and clinical decision-making, often requiring diverse samples to mitigate population bias and address algorithmic fairness. However, a major challenge is to retrieve learning models across multiple institutions without bringing in local biases and inequity, while preserving individual patients' privacy at each site. Objective: This study aims to understand the issues of bias and fairness in the machine learning process used in the predictive health care domain. We proposed a software architecture that integrates federated learning and blockchain to improve fairness, while maintaining acceptable prediction accuracy and minimizing overhead costs. Methods: We improved existing federated learning platforms by integrating blockchain through an iterative design approach. We used the design science research method, which involves 2 design cycles (federated learning for bias mitigation and decentralized architecture). The design involves a bias-mitigation process within the blockchain-empowered federated learning framework based on a novel architecture. Under this architecture, multiple medical institutions can jointly train predictive models using their privacy-protected data effectively and efficiently and ultimately achieve fairness in decision-making in the health care domain. Results: We designed and implemented our solution using the Aplos smart contract, microservices, Rahasak blockchain, and Apache Cassandra--based distributed storage. By conducting 20,000 local model training iterations and 1000 federated model training iterations across 5 simulated medical centers as peers in the Rahasak blockchain network, we demonstrated how our solution with an improved fairness mechanism can enhance the accuracy of predictive diagnosis. Conclusions: Our study identified the technical challenges of prediction biases faced by existing predictive models in the health care domain. To overcome these challenges, we presented an innovative design solution using federated learning and blockchain, along with the adoption of a unique distributed architecture for a fairness-aware system. We have illustrated how this design can address privacy, security, prediction accuracy, and scalability challenges, ultimately improving fairness and equity in the predictive health care domain. 
", doi="10.2196/46547", url="https://www.jmir.org/2023/1/e46547", url="http://www.ncbi.nlm.nih.gov/pubmed/37902833" } @Article{info:doi/10.2196/48452, author="Kunitsu, Yuki", title="The Potential of GPT-4 as a Support Tool for Pharmacists: Analytical Study Using the Japanese National Examination for Pharmacists", journal="JMIR Med Educ", year="2023", month="Oct", day="30", volume="9", pages="e48452", keywords="natural language processing", keywords="generative pretrained transformer", keywords="GPT-4", keywords="ChatGPT", keywords="artificial intelligence", keywords="AI", keywords="chatbot", keywords="pharmacy", keywords="pharmacist", abstract="Background: The advancement of artificial intelligence (AI), as well as machine learning, has led to its application in various industries, including health care. AI chatbots, such as GPT-4, developed by OpenAI, have demonstrated potential in supporting health care professionals by providing medical information, answering examination questions, and assisting in medical education. However, the applicability of GPT-4 in the field of pharmacy remains unexplored. Objective: This study aimed to evaluate GPT-4's ability to answer questions from the Japanese National Examination for Pharmacists (JNEP) and assess its potential as a support tool for pharmacists in their daily practice. Methods: The question texts and answer choices from the 107th and 108th JNEP, held in February 2022 and February 2023, were input into GPT-4. As GPT-4 cannot process diagrams, questions that included diagram interpretation were not analyzed and were initially given a score of 0. The correct answer rates were calculated and compared with the passing criteria of each examination to evaluate GPT-4's performance. Results: For the 107th and 108th JNEP, GPT-4 achieved an accuracy rate of 64.5\% (222/344) and 62.9\% (217/345), respectively, for all questions. When considering only the questions that GPT-4 could answer, the accuracy rates increased to 78.2\% (222/284) and 75.3\% (217/287), respectively. The accuracy rates tended to be lower for physics, chemistry, and calculation questions. Conclusions: Although GPT-4 demonstrated the potential to answer questions from the JNEP and support pharmacists' capabilities, it also showed limitations in handling highly specialized questions, calculation questions, and questions requiring diagram recognition. Further evaluation is necessary to explore its applicability in real-world clinical settings, considering the complexities of patient scenarios and collaboration with health care professionals. By addressing these limitations, GPT-4 could become a more reliable tool for pharmacists in their daily practice. ", doi="10.2196/48452", url="https://mededu.jmir.org/2023/1/e48452", url="http://www.ncbi.nlm.nih.gov/pubmed/37837968" } @Article{info:doi/10.2196/49892, author="Pelly, Louise Melissa and Fatehi, Farhad and Liew, Danny and Verdejo-Garcia, Antonio", title="Digital Health Secondary Prevention Using Co-Design Procedures: Focus Group Study With Health Care Providers and Patients With Myocardial Infarction", journal="JMIR Cardio", year="2023", month="Oct", day="30", volume="7", pages="e49892", keywords="co-design", keywords="digital health", keywords="myocardial infarction", keywords="qualitative", keywords="participatory", keywords="mobile health", abstract="Background: Myocardial infarction (MI) is a debilitating condition and a leading cause of morbidity and mortality worldwide. 
Digital health is a promising approach for delivering secondary prevention to support patients with a history of MI and for reducing risk factors that can lead to a future event. However, its potential can only be fulfilled when the technology meets the needs of the end users who will be interacting with this secondary prevention. Objective: We aimed to gauge the opinions of patients with a history of MI and health professionals concerning the functions, features, and characteristics of a digital health solution to support post-MI care. Methods: Our approach aligned with the gold standard participatory co-design procedures enabling progressive refinement of feedback via exploratory, confirmatory, and prototype-assisted feedback from participants. Patients with a history of MI and health professionals from Australia attended focus groups over a videoconference system. We engaged with 38 participants across 3 rounds of focus groups using an iterative co-design approach. Round 1 included 8 participants (4 patients and 4 health professionals), round 2 included 24 participants (11 patients and 13 health professionals), and round 3 included 22 participants (14 patients and 8 health professionals). Results: Participants highlighted the potential of digital health in addressing the unmet needs of post-MI care. Both patients with a history of MI and health professionals agreed that mental health is a key concern in post-MI care that requires further support. Participants agreed that family members can be used to support postdischarge care and require support from the health care team. Participants agreed that incorporating simple games with a points system can increase long-term engagement. However, patients with a history of MI emphasized a lack of support from their health care team, family, and community more strongly than health professionals. They also expressed some openness to using artificial intelligence, whereas health professionals expressed that users should not be aware of artificial intelligence use. Conclusions: These results provide valuable insights into the development of digital health secondary preventions aimed at supporting patients with a history of MI. Future research can implement a pilot study in the population with MI to trial these recommendations in a real-world setting. ", doi="10.2196/49892", url="https://cardio.jmir.org/2023/1/e49892", url="http://www.ncbi.nlm.nih.gov/pubmed/37902821" } @Article{info:doi/10.2196/48476, author="Vijayakumar, Smrithi and Lee, Vien V. and Leong, Ying Qiao and Hong, Jung Soo and Blasiak, Agata and Ho, Dean", title="Physicians' Perspectives on AI in Clinical Decision Support Systems: Interview Study of the CURATE.AI Personalized Dose Optimization Platform", journal="JMIR Hum Factors", year="2023", month="Oct", day="30", volume="10", pages="e48476", keywords="artificial intelligence", keywords="AI", keywords="clinical decision support system", keywords="CDSS", keywords="adoption", keywords="perception", keywords="decision support", keywords="acceptance", keywords="perspective", keywords="perspectives", keywords="opinion", keywords="attitude", keywords="qualitative", keywords="focus", keywords="interview", keywords="interviews", abstract="Background: Physicians play a key role in integrating new clinical technology into care practices through user feedback and growth propositions to developers of the technology. 
As physicians are stakeholders involved through the technology iteration process, understanding their roles as users can provide nuanced insights into the workings of these technologies that are being explored. Therefore, understanding physicians' perceptions can be critical toward clinical validation, implementation, and downstream adoption. Given the increasing prevalence of clinical decision support systems (CDSSs), there remains a need to gain an in-depth understanding of physicians' perceptions and expectations toward their downstream implementation. This paper explores physicians' perceptions of integrating CURATE.AI, a novel artificial intelligence (AI)--based and clinical stage personalized dosing CDSSs, into clinical practice. Objective: This study aims to understand physicians' perspectives of integrating CURATE.AI for clinical work and to gather insights on considerations of the implementation of AI-based CDSS tools. Methods: A total of 12 participants completed semistructured interviews examining their knowledge, experience, attitudes, risks, and future course of the personalized combination therapy dosing platform, CURATE.AI. Interviews were audio recorded, transcribed verbatim, and coded manually. The data were thematically analyzed. Results: Overall, 3 broad themes and 9 subthemes were identified through thematic analysis. The themes covered considerations that physicians perceived as significant across various stages of new technology development, including trial, clinical implementation, and mass adoption. Conclusions: The study laid out the various ways physicians interpreted an AI-based personalized dosing CDSS, CURATE.AI, for their clinical practice. The research pointed out that physicians' expectations during the different stages of technology exploration can be nuanced and layered with expectations of implementation that are relevant for technology developers and researchers. ", doi="10.2196/48476", url="https://humanfactors.jmir.org/2023/1/e48476", url="http://www.ncbi.nlm.nih.gov/pubmed/37902825" } @Article{info:doi/10.2196/50309, author="Paul, Sharad and Knight, Allanah", title="The Importance of Basal Cell Carcinoma Risk Stratification and Potential Future Pathways", journal="JMIR Dermatol", year="2023", month="Oct", day="30", volume="6", pages="e50309", keywords="skin cancer", keywords="BCC", keywords="basal cell carcinoma", keywords="dermatology", keywords="histology", keywords="cancer", keywords="tumour markers", keywords="angiogenic agents", keywords="angiogenic", keywords="carcinoma", keywords="skin", keywords="risk assessment", keywords="management", keywords="surgery", keywords="angiogenic marker", keywords="markers", keywords="immunohistochemistry", abstract="Background: Basal cell carcinoma (BCC) is the most common human cancer. Although there are surgical and topical treatments available, surgery remains the mainstay of treatment, leading to higher costs. What is needed is an accurate risk assessment of BCC so that treatments can be planned in a patient-centered manner. Objective: In this study, we will review the literature about guidelines for the management of BCC and analyze the potential indicators of high-risk BCC. Using this risk assessment approach, we will propose pathways that will be able to optimize treatments more efficiently. Methods: This paper presents a perspective from a skin cancer expert and clinic involved in the treatment of both simple and complex cases of BCC. 
It addresses the key challenges associated with accurate risk stratification prior to any treatment or procedure. Different immunohistochemical and angiogenic markers for high-risk BCC were reviewed in this study. Results: The expression of interleukin-6, vascular endothelial growth factor, and mast cells within BCC correlates with its aggressiveness. Other immunohistochemical markers, such as Cyclin D1 and Bcl-2, also play a significant role---Cyclin D1 is higher in the aggressive BCC, while Bcl-2 is lower in the aggressive BCC, compared to the nonaggressive variants. Conclusions: Based on our research, we conclude that using immunohistochemical and angiogenic markers for risk assessment and stratification of BCC can help optimize treatment, ensuring that surgical procedures are used only when necessary. ", doi="10.2196/50309", url="https://derma.jmir.org/2023/1/e50309", url="http://www.ncbi.nlm.nih.gov/pubmed/37902813" } @Article{info:doi/10.2196/46934, author="Soliman, Amira and Agvall, Bj{\"o}rn and Etminani, Kobra and Hamed, Omar and Lingman, Markus", title="The Price of Explainability in Machine Learning Models for 100-Day Readmission Prediction in Heart Failure: Retrospective, Comparative, Machine Learning Study", journal="J Med Internet Res", year="2023", month="Oct", day="27", volume="25", pages="e46934", keywords="readmission prediction", keywords="heart failure", keywords="machine learning", keywords="explainable artificial intelligence", keywords="deep learning", keywords="shallow learning", abstract="Background: Sensitive and interpretable machine learning (ML) models can provide valuable assistance to clinicians in managing patients with heart failure (HF) at discharge by identifying individual factors associated with a high risk of readmission. In this cohort study, we delve into the factors driving the potential utility of classification models as decision support tools for predicting readmissions in patients with HF. Objective: The primary objective of this study is to assess the trade-off between using deep learning (DL) and traditional ML models to identify the risk of 100-day readmissions in patients with HF. Additionally, the study aims to provide explanations for the model predictions by highlighting important features both on a global scale across the patient cohort and on a local level for individual patients. Methods: The retrospective data for this study were obtained from the Regional Health Care Information Platform in Region Halland, Sweden. The study cohort consisted of patients diagnosed with HF who were over 40 years old and had been hospitalized at least once between 2017 and 2019. Data analysis encompassed the period from January 1, 2017, to December 31, 2019. Two ML models were developed and validated to predict 100-day readmissions, with a focus on the explainability of the model's decisions. These models were built based on decision trees and recurrent neural architecture. Model explainability was obtained using an ML explainer. The predictive performance of these models was compared against 2 risk assessment tools using multiple performance metrics. Results: The retrospective data set included a total of 15,612 admissions, and within these admissions, readmission occurred in 5597 cases, representing a readmission rate of 35.85\%. 
It is noteworthy that a traditional and explainable model, informed by clinical knowledge, exhibited performance comparable to the DL model and surpassed conventional scoring methods in predicting readmission among patients with HF. The evaluation of predictive model performance was based on commonly used metrics, with an area under the precision-recall curve of 66\% for the deep model and 68\% for the traditional model on the holdout data set. Importantly, the explanations provided by the traditional model offer actionable insights that have the potential to enhance care planning. Conclusions: This study found that a widely used deep prediction model did not outperform an explainable ML model when predicting readmissions among patients with HF. The results suggest that model transparency does not necessarily compromise performance, which could facilitate the clinical adoption of such models. ", doi="10.2196/46934", url="https://www.jmir.org/2023/1/e46934", url="http://www.ncbi.nlm.nih.gov/pubmed/37889530" } @Article{info:doi/10.2196/44417, author="Yang, Xulin and Qiu, Hang and Wang, Liya and Wang, Xiaodong", title="Predicting Colorectal Cancer Survival Using Time-to-Event Machine Learning: Retrospective Cohort Study", journal="J Med Internet Res", year="2023", month="Oct", day="26", volume="25", pages="e44417", keywords="colorectal cancer", keywords="survival prediction", keywords="machine learning", keywords="time-to-event", keywords="SHAP", keywords="SHapley Additive exPlanations", abstract="Background: Machine learning (ML) methods have shown great potential in predicting colorectal cancer (CRC) survival. However, the ML models introduced thus far have mainly focused on binary outcomes and have not considered the time-to-event nature of this type of modeling. Objective: This study aims to evaluate the performance of ML approaches for modeling time-to-event survival data and develop transparent models for predicting CRC-specific survival. Methods: The data set used in this retrospective cohort study contains information on patients who were newly diagnosed with CRC between December 28, 2012, and December 27, 2019, at West China Hospital, Sichuan University. We assessed the performance of 6 representative ML models, including random survival forest (RSF), gradient boosting machine (GBM), DeepSurv, DeepHit, neural net-extended time-dependent Cox (or Cox-Time), and neural multitask logistic regression (N-MTLR) in predicting CRC-specific survival. Multiple imputation by chained equations method was applied to handle missing values in variables. Multivariable analysis and clinical experience were used to select significant features associated with CRC survival. Model performance was evaluated in stratified 5-fold cross-validation repeated 5 times by using the time-dependent concordance index, integrated Brier score, calibration curves, and decision curves. The SHapley Additive exPlanations method was applied to calculate feature importance. Results: A total of 2157 patients with CRC were included in this study. Among the 6 time-to-event ML models, the DeepHit model exhibited the best discriminative ability (time-dependent concordance index 0.789, 95\% CI 0.779-0.799) and the RSF model produced better-calibrated survival estimates (integrated Brier score 0.096, 95\% CI 0.094-0.099), but these are not statistically significant. 
Additionally, the RSF, GBM, DeepSurv, Cox-Time, and N-MTLR models have comparable predictive accuracy to the Cox Proportional Hazards model in terms of discrimination and calibration. The calibration curves showed that all the ML models exhibited good 5-year survival calibration. The decision curves for CRC-specific survival at 5 years showed that all the ML models, especially RSF, had higher net benefits than default strategies of treating all or no patients at a range of clinically reasonable risk thresholds. The SHapley Additive exPlanations method revealed that R0 resection, tumor-node-metastasis staging, and the number of positive lymph nodes were important factors for 5-year CRC-specific survival. Conclusions: This study showed the potential of applying time-to-event ML predictive algorithms to help predict CRC-specific survival. The RSF, GBM, Cox-Time, and N-MTLR algorithms could provide nonparametric alternatives to the Cox Proportional Hazards model in estimating the survival probability of patients with CRC. The transparent time-to-event ML models help clinicians to more accurately predict the survival rate for these patients and improve patient outcomes by enabling personalized treatment plans that are informed by explainable ML models. ", doi="10.2196/44417", url="https://www.jmir.org/2023/1/e44417", url="http://www.ncbi.nlm.nih.gov/pubmed/37883174" } @Article{info:doi/10.2196/50895, author="Matsumoto, Koutarou and Nohara, Yasunobu and Sakaguchi, Mikako and Takayama, Yohei and Fukushige, Syota and Soejima, Hidehisa and Nakashima, Naoki and Kamouchi, Masahiro", title="Temporal Generalizability of Machine Learning Models for Predicting Postoperative Delirium Using Electronic Health Record Data: Model Development and Validation Study", journal="JMIR Perioper Med", year="2023", month="Oct", day="26", volume="6", pages="e50895", keywords="postoperative delirium", keywords="prediction model", keywords="machine learning", keywords="temporal generalizability", keywords="electronic health record data", abstract="Background: Although machine learning models demonstrate significant potential in predicting postoperative delirium, the advantages of their implementation in real-world settings remain unclear and require a comparison with conventional models in practical applications. Objective: The objective of this study was to validate the temporal generalizability of decision tree ensemble and sparse linear regression models for predicting delirium after surgery compared with that of the traditional logistic regression model. Methods: The health record data of patients hospitalized at an advanced emergency and critical care medical center in Kumamoto, Japan, were collected electronically. We developed a decision tree ensemble model using extreme gradient boosting (XGBoost) and a sparse linear regression model using least absolute shrinkage and selection operator (LASSO) regression. To evaluate the predictive performance of the model, we used the area under the receiver operating characteristic curve (AUROC) and the Matthews correlation coefficient (MCC) to measure discrimination and the slope and intercept of the regression between predicted and observed probabilities to measure calibration. The Brier score was evaluated as an overall performance metric. We included 11,863 consecutive patients who underwent surgery with general anesthesia between December 2017 and February 2022. 
The patients were divided into a derivation cohort before the COVID-19 pandemic and a validation cohort during the COVID-19 pandemic. Postoperative delirium was diagnosed according to the confusion assessment method. Results: A total of 6497 patients (68.5, SD 14.4 years, women n=2627, 40.4\%) were included in the derivation cohort, and 5366 patients (67.8, SD 14.6 years, women n=2105, 39.2\%) were included in the validation cohort. Regarding discrimination, the XGBoost model (AUROC 0.87-0.90 and MCC 0.34-0.44) did not significantly outperform the LASSO model (AUROC 0.86-0.89 and MCC 0.34-0.41). The logistic regression model (AUROC 0.84-0.88, MCC 0.33-0.40, slope 1.01-1.19, intercept --0.16 to 0.06, and Brier score 0.06-0.07), with 8 predictors (age, intensive care unit, neurosurgery, emergency admission, anesthesia time, BMI, blood loss during surgery, and use of an ambulance) achieved good predictive performance. Conclusions: The XGBoost model did not significantly outperform the LASSO model in predicting postoperative delirium. Furthermore, a parsimonious logistic model with a few important predictors achieved comparable performance to machine learning models in predicting postoperative delirium. ", doi="10.2196/50895", url="https://periop.jmir.org/2023/1/e50895", url="http://www.ncbi.nlm.nih.gov/pubmed/37883164" } @Article{info:doi/10.2196/46905, author="Nguyen, Kim-Anh-Nhi and Tandon, Pranai and Ghanavati, Sahar and Cheetirala, Narayana Satya and Timsina, Prem and Freeman, Robert and Reich, David and Levin, A. Matthew and Mazumdar, Madhu and Fayad, A. Zahi and Kia, Arash", title="A Hybrid Decision Tree and Deep Learning Approach Combining Medical Imaging and Electronic Medical Records to Predict Intubation Among Hospitalized Patients With COVID-19: Algorithm Development and Validation", journal="JMIR Form Res", year="2023", month="Oct", day="26", volume="7", pages="e46905", keywords="COVID-19", keywords="medical imaging", keywords="machine learning", keywords="chest radiograph", keywords="mechanical ventilation", keywords="electronic health records", keywords="intubation", keywords="decision trees", keywords="hybrid model", keywords="clinical informatics", abstract="Background: Early prediction of the need for invasive mechanical ventilation (IMV) in patients hospitalized with COVID-19 symptoms can help in the allocation of resources appropriately and improve patient outcomes by appropriately monitoring and treating patients at the greatest risk of respiratory failure. To help with the complexity of deciding whether a patient needs IMV, machine learning algorithms may help bring more prognostic value in a timely and systematic manner. Chest radiographs (CXRs) and electronic medical records (EMRs), typically obtained early in patients admitted with COVID-19, are the keys to deciding whether they need IMV. Objective: We aimed to evaluate the use of a machine learning model to predict the need for intubation within 24 hours by using a combination of CXR and EMR data in an end-to-end automated pipeline. We included historical data from 2481 hospitalizations at The Mount Sinai Hospital in New York City. Methods: CXRs were first resized, rescaled, and normalized. Then lungs were segmented from the CXRs by using a U-Net algorithm. After splitting them into a training and a test set, the training set images were augmented. 
The augmented images were used to train an image classifier to predict the probability of intubation with a prediction window of 24 hours by retraining a pretrained DenseNet model by using transfer learning, 10-fold cross-validation, and grid search. Then, in the final fusion model, we trained a random forest algorithm via 10-fold cross-validation by combining the probability score from the image classifier with 41 longitudinal variables in the EMR. Variables in the EMR included clinical and laboratory data routinely collected in the inpatient setting. The final fusion model gave a prediction likelihood for the need of intubation within 24 hours as well. Results: At a prediction probability threshold of 0.5, the fusion model provided 78.9\% (95\% CI 59\%-96\%) sensitivity, 83\% (95\% CI 76\%-89\%) specificity, 0.509 (95\% CI 0.34-0.67) F1-score, 0.874 (95\% CI 0.80-0.94) area under the receiver operating characteristic curve (AUROC), and 0.497 (95\% CI 0.32-0.65) area under the precision recall curve (AUPRC) on the holdout set. Compared to the image classifier alone, which had an AUROC of 0.577 (95\% CI 0.44-0.73) and an AUPRC of 0.206 (95\% CI 0.08-0.38), the fusion model showed significant improvement (P<.001). The most important predictor variables were respiratory rate, C-reactive protein, oxygen saturation, and lactate dehydrogenase. The imaging probability score ranked 15th in overall feature importance. Conclusions: We show that, when linked with EMR data, an automated deep learning image classifier improved performance in identifying hospitalized patients with severe COVID-19 at risk for intubation. With additional prospective and external validation, such a model may assist risk assessment and optimize clinical decision-making in choosing the best care plan during the critical stages of COVID-19. ", doi="10.2196/46905", url="https://formative.jmir.org/2023/1/e46905", url="http://www.ncbi.nlm.nih.gov/pubmed/37883177" } @Article{info:doi/10.2196/47105, author="Shiferaw, Biruk Kirubel and Roloff, Moritz and Waltemath, Dagmar and Zeleke, Alamirrew Atinkut", title="Guidelines and Standard Frameworks for AI in Medicine: Protocol for a Systematic Literature Review", journal="JMIR Res Protoc", year="2023", month="Oct", day="25", volume="12", pages="e47105", keywords="artificial intelligence", keywords="biomedical", keywords="guidelines", keywords="machine learning", keywords="medicine", abstract="Background: Applications of artificial intelligence (AI) are pervasive in modern biomedical science. In fact, research results suggesting algorithms and AI models for different target diseases and conditions are continuously increasing. While this situation undoubtedly improves the outcome of AI models, health care providers are increasingly unsure which AI model to use due to multiple alternatives for a specific target and the ``black box'' nature of AI. Moreover, the fact that studies rarely use guidelines in developing and reporting AI models poses additional challenges in trusting and adapting models for practical implementation. Objective: This review protocol describes the planned steps and methods for a review of the synthesized evidence regarding the quality of available guidelines and frameworks to facilitate AI applications in medicine. Methods: We will commence a systematic literature search using medical subject headings terms for medicine, guidelines, and machine learning (ML). 
All available guidelines, standard frameworks, best practices, checklists, and recommendations will be included, irrespective of the study design. The search will be conducted on web-based repositories such as PubMed, Web of Science, and the EQUATOR (Enhancing the Quality and Transparency of Health Research) network. After removing duplicate results, a preliminary scan for titles will be done by 2 reviewers. After the first scan, the reviewers will rescan the selected literature for abstract review, and any incongruities about whether to include the article for full-text review or not will be resolved by the third and fourth reviewer based on the predefined criteria. A Google Scholar (Google LLC) search will also be performed to identify gray literature. The quality of identified guidelines will be evaluated using the Appraisal of Guidelines, Research, and Evaluation (AGREE II) tool. A descriptive summary and narrative synthesis will be carried out, and the details of critical appraisal and subgroup synthesis findings will be presented. Results: The results will be reported using the PRISMA (Preferred Reporting Items for Systematic Review and Meta-Analyses) reporting guidelines. Data analysis is currently underway, and we anticipate finalizing the review by November 2023. Conclusions: Guidelines and recommended frameworks for developing, reporting, and implementing AI studies have been developed by different experts to facilitate the reliable assessment of validity and consistent interpretation of ML models for medical applications. We postulate that a guideline supports the assessment of an ML model only if the quality and reliability of the guideline are high. Assessing the quality and aspects of available guidelines, recommendations, checklists, and frameworks---as will be done in the proposed review---will provide comprehensive insights into current gaps and help to formulate future research directions. International Registered Report Identifier (IRRID): DERR1-10.2196/47105 ", doi="10.2196/47105", url="https://www.researchprotocols.org/2023/1/e47105", url="http://www.ncbi.nlm.nih.gov/pubmed/37878365" } @Article{info:doi/10.2196/49025, author="Ranusch, Allison and Lin, Ying-Jen and Dorsch, P. Michael and Allen, L. Arthur and Spoutz, Patrick and Seagull, Jacob F. and Sussman, B. Jeremy and Barnes, D. Geoffrey", title="Role of Individual Clinician Authority in the Implementation of Informatics Tools for Population-Based Medication Management: Qualitative Semistructured Interview Study", journal="JMIR Hum Factors", year="2023", month="Oct", day="24", volume="10", pages="e49025", keywords="direct oral anticoagulant", keywords="population management", keywords="implementation science", keywords="medical informatics", keywords="individual clinician authority", keywords="electronic health record", keywords="health records", keywords="EHR", keywords="EHRs", keywords="implementation", keywords="clotting", keywords="clot", keywords="clots", keywords="anticoagulant", keywords="anticoagulants", keywords="dashboard", keywords="DOAC", keywords="satisfaction", keywords="interview", keywords="interviews", keywords="pharmacist", keywords="pharmacy", keywords="pharmacology", keywords="medication", keywords="prescribe", keywords="prescribing", abstract="Background: Direct oral anticoagulant (DOAC) medications are frequently associated with inappropriate prescribing and adverse events. 
To improve the safe use of DOACs, health systems are implementing population health tools within their electronic health record (EHR). While EHR informatics tools can help increase awareness of inappropriate prescribing of medications, a lack of empowerment of nonphysicians to implement change is a key barrier. Objective: This study examined how the individual authority of clinical pharmacists and anticoagulation nurses is impacted by and changes the implementation success of an EHR DOAC Dashboard for safe DOAC medication prescribing. Methods: We conducted semistructured interviews with pharmacists and nurses following the implementation of the EHR DOAC Dashboard at 3 clinical sites. Interview transcripts were coded according to the key determinants of implementation success. The intersections between individual clinician authority and other determinants were examined to identify themes. Results: A high level of individual clinician authority was associated with high levels of key facilitators for effective use of the DOAC Dashboard (communication, staffing and work schedule, job satisfaction, and EHR integration). Conversely, a lack of individual authority was often associated with key barriers to effective DOAC Dashboard use. Positive individual authority was sometimes present with a negative example of another determinant, but no evidence was found of a lack of individual authority co-occurring with a positive instance of another determinant. Conclusions: Increased individual clinician authority is a necessary antecedent to the effective implementation of an EHR DOAC Population Management Dashboard and positively affects other aspects of implementation. International Registered Report Identifier (IRRID): RR2-10.1186/s13012-020-01044-5 ", doi="10.2196/49025", url="https://humanfactors.jmir.org/2023/1/e49025", url="http://www.ncbi.nlm.nih.gov/pubmed/37874636" } @Article{info:doi/10.2196/49842, author="Joo, Hyeon and Mathis, R. Michael and Tam, Marty and James, Cornelius and Han, Peijin and Mangrulkar, S. Rajesh and Friedman, P. Charles and Vydiswaran, Vinod V. G.", title="Applying AI and Guidelines to Assist Medical Students in Recognizing Patients With Heart Failure: Protocol for a Randomized Trial", journal="JMIR Res Protoc", year="2023", month="Oct", day="24", volume="12", pages="e49842", keywords="medical education", keywords="clinical decision support systems", keywords="artificial intelligence", keywords="machine learning", keywords="heart failure", keywords="evidence-based medicine", keywords="guidelines", keywords="digital health interventions", abstract="Background: The integration of artificial intelligence (AI) into clinical practice is transforming both clinical practice and medical education. AI-based systems aim to improve the efficacy of clinical tasks, enhancing diagnostic accuracy and tailoring treatment delivery. As it becomes increasingly prevalent in health care for high-quality patient care, it is critical for health care providers to use the systems responsibly to mitigate bias, ensure effective outcomes, and provide safe clinical practices. In this study, the clinical task is the identification of heart failure (HF) prior to surgery with the intention of enhancing clinical decision-making skills. HF is a common and severe disease, but detection remains challenging due to its subtle manifestation, often concurrent with other medical conditions, and the absence of a simple and effective diagnostic test. 
While advanced HF algorithms have been developed, the use of these AI-based systems to enhance clinical decision-making in medical education remains understudied. Objective: This research protocol is to demonstrate our study design, systematic procedures for selecting surgical cases from electronic health records, and interventions. The primary objective of this study is to measure the effectiveness of interventions aimed at improving HF recognition before surgery, the second objective is to evaluate the impact of inaccurate AI recommendations, and the third objective is to explore the relationship between the inclination to accept AI recommendations and their accuracy. Methods: Our study used a 3 {\texttimes} 2 factorial design (intervention type {\texttimes} order of prepost sets) for this randomized trial with medical students. The student participants are asked to complete a 30-minute e-learning module that includes key information about the intervention and a 5-question quiz, and a 60-minute review of 20 surgical cases to determine the presence of HF. To mitigate selection bias in the pre- and posttests, we adopted a feature-based systematic sampling procedure. From a pool of 703 expert-reviewed surgical cases, 20 were selected based on features such as case complexity, model performance, and positive and negative labels. This study comprises three interventions: (1) a direct AI-based recommendation with a predicted HF score, (2) an indirect AI-based recommendation gauged through the area under the curve metric, and (3) an HF guideline-based intervention. Results: As of July 2023, 62 of the enrolled medical students have fulfilled this study's participation, including the completion of a short quiz and the review of 20 surgical cases. The subject enrollment commenced in August 2022 and will end in December 2023, with the goal of recruiting 75 medical students in years 3 and 4 with clinical experience. Conclusions: We demonstrated a study protocol for the randomized trial, measuring the effectiveness of interventions using AI and HF guidelines among medical students to enhance HF recognition in preoperative care with electronic health record data. International Registered Report Identifier (IRRID): DERR1-10.2196/49842 ", doi="10.2196/49842", url="https://www.researchprotocols.org/2023/1/e49842", url="http://www.ncbi.nlm.nih.gov/pubmed/37874618" } @Article{info:doi/10.2196/47590, author="Lei, Mingxing and Wu, Bing and Zhang, Zhicheng and Qin, Yong and Cao, Xuyong and Cao, Yuncen and Liu, Baoge and Su, Xiuyun and Liu, Yaosheng", title="A Web-Based Calculator to Predict Early Death Among Patients With Bone Metastasis Using Machine Learning Techniques: Development and Validation Study", journal="J Med Internet Res", year="2023", month="Oct", day="23", volume="25", pages="e47590", keywords="bone metastasis", keywords="early death", keywords="machine learning", keywords="prediction model", keywords="local interpretable model--agnostic explanation", abstract="Background: Patients with bone metastasis often experience a significantly limited survival time, and a life expectancy of <3 months is generally regarded as a contraindication for extensive invasive surgeries. In this context, the accurate prediction of survival becomes very important since it serves as a crucial guide in making clinical decisions. 
Objective: This study aimed to develop a machine learning--based web calculator that can provide an accurate assessment of the likelihood of early death among patients with bone metastasis. Methods: This study analyzed a large cohort of 118,227 patients diagnosed with bone metastasis between 2010 and 2019 using the data obtained from a national cancer database. The entire cohort of patients was randomly split 9:1 into a training group (n=106,492) and a validation group (n=11,735). Six approaches---logistic regression, extreme gradient boosting machine, decision tree, random forest, neural network, and gradient boosting machine---were implemented in this study. The performance of these approaches was evaluated using 11 measures, and each approach was ranked based on its performance in each measure. Patients (n=332) from a teaching hospital were used as the external validation group, and external validation was performed using the optimal model. Results: In the entire cohort, a substantial proportion of patients (43,305/118,227, 36.63\%) experienced early death. Among the different approaches evaluated, the gradient boosting machine exhibited the highest score of prediction performance (54 points), followed by the neural network (52 points) and extreme gradient boosting machine (50 points). The gradient boosting machine demonstrated a favorable discrimination ability, with an area under the curve of 0.858 (95\% CI 0.851-0.865). In addition, the calibration slope was 1.02, and the intercept-in-large value was --0.02, indicating good calibration of the model. Patients were divided into 2 risk groups using a threshold of 37\% based on the gradient boosting machine. Patients in the high-risk group (3105/4315, 71.96\%) were found to be 4.5 times more likely to experience early death compared with those in the low-risk group (1159/7420, 15.62\%). External validation of the model demonstrated a high area under the curve of 0.847 (95\% CI 0.798-0.895), indicating its robust performance. The model developed by the gradient boosting machine has been deployed on the internet as a calculator. Conclusions: This study develops a machine learning--based calculator to assess the probability of early death among patients with bone metastasis. The calculator has the potential to guide clinical decision-making and improve the care of patients with bone metastasis by identifying those at a higher risk of early death. ", doi="10.2196/47590", url="https://www.jmir.org/2023/1/e47590", url="http://www.ncbi.nlm.nih.gov/pubmed/37870889" } @Article{info:doi/10.2196/47346, author="Velazquez-Diaz, Daniel and Arco, E. Juan and Ortiz, Andres and P{\'e}rez-Cabezas, Ver{\'o}nica and Lucena-Anton, David and Moral-Munoz, A. Jose and Gal{\'a}n-Mercant, Alejandro", title="Use of Artificial Intelligence in the Identification and Diagnosis of Frailty Syndrome in Older Adults: Scoping Review", journal="J Med Internet Res", year="2023", month="Oct", day="20", volume="25", pages="e47346", keywords="frail older adult", keywords="identification", keywords="diagnosis", keywords="artificial intelligence", keywords="review", keywords="frailty", keywords="older adults", keywords="aging", keywords="biological variability", keywords="detection", keywords="accuracy", keywords="sensitivity", keywords="screening", keywords="tool", abstract="Background: Frailty syndrome (FS) is one of the most common noncommunicable diseases, which is associated with lower physical and mental capacities in older adults. 
FS diagnosis is mostly focused on biological variables; however, it is likely that this diagnosis could fail owing to the high biological variability in this syndrome. Therefore, artificial intelligence (AI) could be a potential strategy to identify and diagnose this complex and multifactorial geriatric syndrome. Objective: The objective of this scoping review was to analyze the existing scientific evidence on the use of AI for the identification and diagnosis of FS in older adults, as well as to identify which model provides enhanced accuracy, sensitivity, specificity, and area under the curve (AUC). Methods: A search was conducted using PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines on various databases: PubMed, Web of Science, Scopus, and Google Scholar. The search strategy followed Population/Problem, Intervention, Comparison, and Outcome (PICO) criteria with the population being older adults; intervention being AI; comparison being compared or not to other diagnostic methods; and outcome being FS with reported sensitivity, specificity, accuracy, or AUC values. The results were synthesized through information extraction and are presented in tables. Results: We identified 26 studies that met the inclusion criteria, 6 of which had a data set over 2000 and 3 with data sets below 100. Machine learning was the most widely used type of AI, employed in 18 studies. Moreover, of the 26 included studies, 9 used clinical data, with clinical histories being the most frequently used data type in this category. The remaining 17 studies used nonclinical data, most frequently involving activity monitoring using an inertial sensor in clinical and nonclinical contexts. Regarding the performance of each AI model, 10 studies achieved a value of precision, sensitivity, specificity, or AUC ≥90. Conclusions: The findings of this scoping review clarify the overall status of recent studies using AI to identify and diagnose FS. Moreover, the findings show that the combined use of AI using clinical data along with nonclinical information such as the kinematics of inertial sensors that monitor activities in a nonclinical context could be an appropriate tool for the identification and diagnosis of FS. Nevertheless, some possible limitations of the evidence included in the review could be small sample sizes, heterogeneity of study designs, and lack of standardization in the AI models and diagnostic criteria used across studies. Future research is needed to validate AI systems with diverse data sources for diagnosing FS. AI should be used as a decision support tool for identifying FS, with data quality and privacy addressed, and the tool should be regularly monitored for performance after being integrated in clinical practice. ", doi="10.2196/47346", url="https://www.jmir.org/2023/1/e47346", url="http://www.ncbi.nlm.nih.gov/pubmed/37862082" } @Article{info:doi/10.2196/42788, author="Chen, Chih-Chi and Wu, Cheng-Ta and Chen, C. Carl P. and Chung, Chia-Ying and Chen, Shann-Ching and Lee, S. 
Mel and Cheng, Chi-Tung and Liao, Chien-Hung", title="Predicting the Risk of Total Hip Replacement by Using A Deep Learning Algorithm on Plain Pelvic Radiographs: Diagnostic Study", journal="JMIR Form Res", year="2023", month="Oct", day="20", volume="7", pages="e42788", keywords="osteoarthritis", keywords="orthopedic procedure", keywords="artificial intelligence", keywords="AI", keywords="deep learning", keywords="machine learning", keywords="orthopedic", keywords="pelvic", keywords="radiograph", keywords="predict", keywords="hip replacement", keywords="surgery", keywords="convolutional neural network", keywords="CNN", keywords="algorithm", keywords="surgical", keywords="medical image", keywords="medical imaging", abstract="Background: Total hip replacement (THR) is considered the gold standard of treatment for refractory degenerative hip disorders. Identifying patients who should receive THR in the short term is important. Some conservative treatments, such as intra-articular injection administered a few months before THR, may result in higher odds of arthroplasty infection. Delayed THR after functional deterioration may result in poorer outcomes and longer waiting times for those who have been flagged as needing THR. Deep learning (DL) in medical imaging applications has recently obtained significant breakthroughs. However, the use of DL in practical wayfinding, such as short-term THR prediction, is still lacking. Objective: In this study, we propose a DL-based assistant system for patients with pelvic radiographs to identify the need for THR within 3 months. Methods: We developed a convolutional neural network--based DL algorithm to analyze pelvic radiographs, predict the hip region of interest (ROI), and determine whether or not THR is required. The data set was collected from August 2008 to December 2017. The images included 3013 surgical hip ROIs that had undergone THR and 1630 nonsurgical hip ROIs. The images were split, using split-sample validation, into training (n=3903, 80\%), validation (n=476, 10\%), and testing (n=475, 10\%) sets to evaluate the algorithm performance. Results: The algorithm, called SurgHipNet, yielded an area under the receiver operating characteristic curve of 0.994 (95\% CI 0.990-0.998). The accuracy, sensitivity, specificity, and F1-score of the model were 0.977, 0.920, 0.932, and 0.944, respectively. Conclusions: The proposed approach has demonstrated that SurgHipNet shows the ability and potential to provide efficient support in clinical decision-making; it can assist physicians in promptly determining the optimal timing for THR. 
", doi="10.2196/42788", url="https://formative.jmir.org/2023/1/e42788", url="http://www.ncbi.nlm.nih.gov/pubmed/37862084" } @Article{info:doi/10.2196/44065, author="Solomon, Jeffrey and Dauber-Decker, Katherine and Richardson, Safiya and Levy, Sera and Khan, Sundas and Coleman, Benjamin and Persaud, Rupert and Chelico, John and King, D'Arcy and Spyropoulos, Alex and McGinn, Thomas", title="Integrating Clinical Decision Support Into Electronic Health Record Systems Using a Novel Platform (EvidencePoint): Developmental Study", journal="JMIR Form Res", year="2023", month="Oct", day="19", volume="7", pages="e44065", keywords="clinical decision support system", keywords="cloud based", keywords="decision support", keywords="development", keywords="EHR", keywords="electronic health record", keywords="evidence-based medicine", keywords="health information technology", keywords="platform", keywords="user-centered design", abstract="Background: Through our work, we have demonstrated how clinical decision support (CDS) tools integrated into the electronic health record (EHR) assist providers in adopting evidence-based practices. This requires confronting technical challenges that result from relying on the EHR as the foundation for tool development; for example, the individual CDS tools need to be built independently for each different EHR. Objective: The objective of our research was to build and implement an EHR-agnostic platform for integrating CDS tools, which would remove the technical constraints inherent in relying on the EHR as the foundation and enable a single set of CDS tools that can work with any EHR. Methods: We developed EvidencePoint, a novel, cloud-based, EHR-agnostic CDS platform, and we will describe the development of EvidencePoint and the deployment of its initial CDS tools, which include EHR-integrated applications for clinical use cases such as prediction of hospitalization survival for patients with COVID-19, venous thromboembolism prophylaxis, and pulmonary embolism diagnosis. Results: The results below highlight the adoption of the CDS tools, the International Medical Prevention Registry on Venous Thromboembolism-D-Dimer, the Wells' criteria, and the Northwell COVID-19 Survival (NOCOS), following development, usability testing, and implementation. The International Medical Prevention Registry on Venous Thromboembolism-D-Dimer CDS was used in 5249 patients at the 2 clinical intervention sites. The intervention group tool adoption was 77.8\% (4083/5249 possible uses). For the NOCOS tool, which was designed to assist with triaging patients with COVID-19 for hospital admission in the event of constrained hospital resources, the worst-case resourcing scenario never materialized and triaging was never required. As a result, the NOCOS tool was not frequently used, though the EvidencePoint platform's flexibility and customizability enabled the tool to be developed and deployed rapidly under the emergency conditions of the pandemic. Adoption rates for the Wells' criteria tool will be reported in a future publication. Conclusions: The EvidencePoint system successfully demonstrated that a flexible, user-friendly platform for hosting CDS tools outside of a specific EHR is feasible. The forthcoming results of our outcomes analyses will demonstrate the adoption rate of EvidencePoint tools as well as the impact of behavioral economics ``nudges'' on the adoption rate. 
Due to the EHR-agnostic nature of EvidencePoint, the development process for additional forms of CDS will be simpler than traditional and cumbersome IT integration approaches and will benefit from the capabilities provided by the core system of EvidencePoint. ", doi="10.2196/44065", url="https://formative.jmir.org/2023/1/e44065", url="http://www.ncbi.nlm.nih.gov/pubmed/37856193" } @Article{info:doi/10.2196/45163, author="Fernando, Manasha and Abell, Bridget and Tyack, Zephanie and Donovan, Thomasina and McPhail, M. Steven and Naicker, Sundresan", title="Using Theories, Models, and Frameworks to Inform Implementation Cycles of Computerized Clinical Decision Support Systems in Tertiary Health Care Settings: Scoping Review", journal="J Med Internet Res", year="2023", month="Oct", day="18", volume="25", pages="e45163", keywords="computerized clinical decision support systems", keywords="CDSS", keywords="implementation science", keywords="hospital", keywords="theories", keywords="models", keywords="frameworks", keywords="mobile phone", abstract="Background: Computerized clinical decision support systems (CDSSs) are essential components of modern health system service delivery, particularly within acute care settings such as hospitals. Theories, models, and frameworks may assist in facilitating the implementation processes associated with CDSS innovation and its use within these care settings. These processes include context assessments to identify key determinants, implementation plans for adoption, promoting ongoing uptake, adherence, and long-term evaluation. However, there has been no prior review synthesizing the literature regarding the theories, models, and frameworks that have informed the implementation and adoption of CDSSs within hospitals. Objective: This scoping review aims to identify the theory, model, and framework approaches that have been used to facilitate the implementation and adoption of CDSSs in tertiary health care settings, including hospitals. The rationales reported for selecting these approaches, including the limitations and strengths, are described. Methods: A total of 5 electronic databases were searched (CINAHL via EBSCOhost, PubMed, Scopus, PsycINFO, and Embase) to identify studies that implemented or adopted a CDSS in a tertiary health care setting using an implementation theory, model, or framework. No date or language limits were applied. A narrative synthesis was conducted using full-text publications and abstracts. Implementation phases were classified according to the ``Active Implementation Framework stages'': exploration (feasibility and organizational readiness), installation (organizational preparation), initial implementation (initiating implementation, ie, training), full implementation (sustainment), and nontranslational effectiveness studies. Results: A total of 81 records (42 full text and 39 abstracts) were included. Full-text studies and abstracts are reported separately. For full-text studies, models (18/42, 43\%), followed by determinants frameworks (14/42,33\%), were most frequently used to guide adoption and evaluation strategies. Most studies (36/42, 86\%) did not list the limitations associated with applying a specific theory, model, or framework. Conclusions: Models and related quality improvement methods were most frequently used to inform CDSS adoption. Models were not typically combined with each other or with theory to inform full-cycle implementation strategies. 
The findings highlight a gap in the application of implementation methods including theories, models, and frameworks to facilitate full-cycle implementation strategies for hospital CDSSs. ", doi="10.2196/45163", url="https://www.jmir.org/2023/1/e45163", url="http://www.ncbi.nlm.nih.gov/pubmed/37851492" } @Article{info:doi/10.2196/50357, author="Wosny, Marie and Strasser, Maria Livia and Hastings, Janna", title="Experience of Health Care Professionals Using Digital Tools in the Hospital: Qualitative Systematic Review", journal="JMIR Hum Factors", year="2023", month="Oct", day="17", volume="10", pages="e50357", keywords="health information technology", keywords="electronic health record", keywords="electronic medical records", keywords="clinical decision support", keywords="health care professionals", keywords="burnout", keywords="qualitative research", abstract="Background: The digitalization of health care has many potential benefits, but it may also negatively impact health care professionals' well-being. Burnout can, in part, result from inefficient work processes related to the suboptimal implementation and use of health information technologies. Although strategies to reduce stress and mitigate clinician burnout typically involve individual-based interventions, emerging evidence suggests that improving the experience of using health information technologies can have a notable impact. Objective: The aim of this systematic review was to collect evidence of the benefits and challenges associated with the use of digital tools in hospital settings with a particular focus on the experiences of health care professionals using these tools. Methods: We conducted a systematic literature review following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to explore the experience of health care professionals with digital tools in hospital settings. Using a rigorous selection process to ensure the methodological quality and validity of the study results, we included qualitative studies with distinct data that described the experiences of physicians and nurses. A panel of 3 independent researchers performed iterative data analysis and identified thematic constructs. Results: Of the 1175 unique primary studies, we identified 17 (1.45\%) publications that focused on health care professionals' experiences with various digital tools in their day-to-day practice. Of the 17 studies, 10 (59\%) focused on clinical decision support tools, followed by 6 (35\%) studies focusing on electronic health records and 1 (6\%) on a remote patient-monitoring tool. We propose a theoretical framework for understanding the complex interplay between the use of digital tools, experience, and outcomes. We identified 6 constructs that encompass the positive and negative experiences of health care professionals when using digital tools, along with moderators and outcomes. Positive experiences included feeling confident, responsible, and satisfied, whereas negative experiences included frustration, feeling overwhelmed, and feeling frightened. Positive moderators that may reinforce the use of digital tools included sufficient training and adequate workflow integration, whereas negative moderators comprised unfavorable social structures and the lack of training. Positive outcomes included improved patient care and increased workflow efficiency, whereas negative outcomes included increased workload, increased safety risks, and issues with information quality. 
Conclusions: Although positive and negative outcomes and moderators that may affect the use of digital tools were commonly reported, the experiences of health care professionals, such as their thoughts and emotions, were less frequently discussed. On the basis of this finding, this study highlights the need for further research specifically targeting experiences as an important mediator of clinician well-being. It also emphasizes the importance of considering differences in the nature of specific tools as well as the profession and role of individual users. Trial Registration: PROSPERO CRD42023393883; https://tinyurl.com/2htpzzxj ", doi="10.2196/50357", url="https://humanfactors.jmir.org/2023/1/e50357", url="http://www.ncbi.nlm.nih.gov/pubmed/37847535" } @Article{info:doi/10.2196/49949, author="Thirunavukarasu, James Arun and Elangovan, Kabilan and Gutierrez, Laura and Li, Yong and Tan, Iris and Keane, A. Pearse and Korot, Edward and Ting, Wei Daniel Shu", title="Democratizing Artificial Intelligence Imaging Analysis With Automated Machine Learning: Tutorial", journal="J Med Internet Res", year="2023", month="Oct", day="12", volume="25", pages="e49949", keywords="machine learning", keywords="automated machine learning", keywords="autoML", keywords="artificial intelligence", keywords="democratization", keywords="autonomous AI", keywords="imaging", keywords="image analysis", keywords="automation", keywords="AI engineering", doi="10.2196/49949", url="https://www.jmir.org/2023/1/e49949", url="http://www.ncbi.nlm.nih.gov/pubmed/37824185" } @Article{info:doi/10.2196/44895, author="Feng, Jing and Zhang, Qizhi and Wu, Feng and Peng, Jinxiang and Li, Ziwei and Chen, Zhuang", title="The Value of Applying Machine Learning in Predicting the Time of Symptom Onset in Stroke Patients: Systematic Review and Meta-Analysis", journal="J Med Internet Res", year="2023", month="Oct", day="12", volume="25", pages="e44895", keywords="machine learning", keywords="ischemic stroke", keywords="onset time", keywords="stroke", abstract="Background: Machine learning is a potentially effective method for identifying and predicting the time of the onset of stroke. However, the value of applying machine learning in this field remains controversial and debatable. Objective: We aimed to assess the value of applying machine learning in predicting the time of stroke onset. Methods: PubMed, Web of Science, Embase, and Cochrane were comprehensively searched. The C index and sensitivity with 95\% CI were used as effect sizes. The risk of bias was evaluated using PROBAST (Prediction Model Risk of Bias Assessment Tool), and meta-analysis was conducted using R (version 4.2.0; R Core Team). Results: Thirteen eligible studies were included in the meta-analysis involving 55 machine learning models with 41 models in the training set and 14 in the validation set. The overall C index was 0.800 (95\% CI 0.773-0.826) in the training set and 0.781 (95\% CI 0.709-0.852) in the validation set. The sensitivity and specificity were 0.76 (95\% CI 0.73-0.80) and 0.79 (95\% CI 0.74-0.82) in the training set and 0.81 (95\% CI 0.68-0.90) and 0.83 (95\% CI 0.73-0.89) in the validation set, respectively. Subgroup analysis revealed that the accuracy of machine learning in predicting the time of stroke onset within 4.5 hours was optimal (training: 0.80, 95\% CI 0.77-0.83; validation: 0.79, 95\% CI 0.71-0.86). Conclusions: Machine learning has ideal performance in identifying the time of stroke onset. 
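To illustrate the pooling step described in the meta-analysis above, the following is a minimal sketch using hypothetical per-study C indexes and standard errors rather than the review's data (the authors worked in R; Python is used here purely for illustration). It shows fixed-effect inverse-variance pooling with a 95% CI; a random-effects model, as commonly used in such meta-analyses, would add a between-study variance term.

```python
# Minimal sketch (not the authors' R code): fixed-effect inverse-variance
# pooling of per-study C indexes. The example values are hypothetical and
# only illustrate the mechanics of combining estimates with 95% CIs.
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted mean with a 95% CI."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

# Hypothetical C indexes and standard errors for three prediction models
c_indexes = [0.78, 0.82, 0.80]
ses = [0.02, 0.03, 0.025]
pooled, ci = pool_fixed_effect(c_indexes, ses)
print(f"pooled C index {pooled:.3f}, 95% CI {ci[0]:.3f}-{ci[1]:.3f}")
```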
More reasonable image segmentation and texture extraction methods in radiomics should be used to promote the value of applying machine learning in diverse ethnic backgrounds. Trial Registration: PROSPERO CRD42022358898; https://www.crd.york.ac.uk/Prospero/display\_record.php?RecordID=358898 ", doi="10.2196/44895", url="https://www.jmir.org/2023/1/e44895", url="http://www.ncbi.nlm.nih.gov/pubmed/37824198" } @Article{info:doi/10.2196/48808, author="Hirosawa, Takanobu and Kawamura, Ren and Harada, Yukinori and Mizuta, Kazuya and Tokumasu, Kazuki and Kaji, Yuki and Suzuki, Tomoharu and Shimizu, Taro", title="ChatGPT-Generated Differential Diagnosis Lists for Complex Case--Derived Clinical Vignettes: Diagnostic Accuracy Evaluation", journal="JMIR Med Inform", year="2023", month="Oct", day="9", volume="11", pages="e48808", keywords="artificial intelligence", keywords="AI chatbot", keywords="ChatGPT", keywords="large language models", keywords="clinical decision support", keywords="natural language processing", keywords="diagnostic excellence", keywords="language model", keywords="vignette", keywords="case study", keywords="diagnostic", keywords="accuracy", keywords="decision support", keywords="diagnosis", abstract="Background: The diagnostic accuracy of differential diagnoses generated by artificial intelligence chatbots, including ChatGPT models, for complex clinical vignettes derived from general internal medicine (GIM) department case reports is unknown. Objective: This study aims to evaluate the accuracy of the differential diagnosis lists generated by both third-generation ChatGPT (ChatGPT-3.5) and fourth-generation ChatGPT (ChatGPT-4) by using case vignettes from case reports published by the Department of GIM of Dokkyo Medical University Hospital, Japan. Methods: We searched PubMed for case reports. Upon identification, physicians selected diagnostic cases, determined the final diagnosis, and displayed them into clinical vignettes. Physicians typed the determined text with the clinical vignettes in the ChatGPT-3.5 and ChatGPT-4 prompts to generate the top 10 differential diagnoses. The ChatGPT models were not specially trained or further reinforced for this task. Three GIM physicians from other medical institutions created differential diagnosis lists by reading the same clinical vignettes. We measured the rate of correct diagnosis within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and the top diagnosis. Results: In total, 52 case reports were analyzed. The rates of correct diagnosis by ChatGPT-4 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 83\% (43/52), 81\% (42/52), and 60\% (31/52), respectively. The rates of correct diagnosis by ChatGPT-3.5 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 73\% (38/52), 65\% (34/52), and 42\% (22/52), respectively. The rates of correct diagnosis by ChatGPT-4 were comparable to those by physicians within the top 10 (43/52, 83\% vs 39/52, 75\%, respectively; P=.47) and within the top 5 (42/52, 81\% vs 35/52, 67\%, respectively; P=.18) differential diagnosis lists and top diagnosis (31/52, 60\% vs 26/52, 50\%, respectively; P=.43) although the difference was not significant. The ChatGPT models' diagnostic accuracy did not significantly vary based on open access status or the publication date (before 2011 vs 2022). 
Conclusions: This study demonstrates the potential diagnostic accuracy of differential diagnosis lists generated using ChatGPT-3.5 and ChatGPT-4 for complex clinical vignettes from case reports published by the GIM department. The rate of correct diagnoses within the top 10 and top 5 differential diagnosis lists generated by ChatGPT-4 exceeds 80\%. Although derived from a limited data set of case reports from a single department, our findings highlight the potential utility of ChatGPT-4 as a supplementary tool for physicians, particularly for those affiliated with the GIM department. Further investigations should explore the diagnostic accuracy of ChatGPT by using distinct case materials beyond its training data. Such efforts will provide a comprehensive insight into the role of artificial intelligence in enhancing clinical decision-making. ", doi="10.2196/48808", url="https://medinform.jmir.org/2023/1/e48808", url="http://www.ncbi.nlm.nih.gov/pubmed/37812468" } @Article{info:doi/10.2196/46809, author="Sangeorzan, Irina and Antonacci, Grazia and Martin, Anne and Grodzinski, Ben and Zipser, M. Carl and Murphy, J. Rory K. and Andriopoulou, Panoraia and Cook, E. Chad and Anderson, B. David and Guest, James and Furlan, C. Julio and Kotter, N. Mark R. and Boerger, F. Timothy and Sadler, Iwan and Roberts, A. Elizabeth and Wood, Helen and Fraser, Christine and Fehlings, G. Michael and Kumar, Vishal and Jung, Josephine and Milligan, James and Nouri, Aria and Martin, R. Allan and Blizzard, Tammy and Vialle, Roberto Luiz and Tetreault, Lindsay and Kalsi-Ryan, Sukhvinder and MacDowall, Anna and Martin-Moore, Esther and Burwood, Martin and Wood, Lianne and Lalkhen, Abdul and Ito, Manabu and Wilson, Nicky and Treanor, Caroline and Dugan, Sheila and Davies, M. Benjamin", title="Toward Shared Decision-Making in Degenerative Cervical Myelopathy: Protocol for a Mixed Methods Study", journal="JMIR Res Protoc", year="2023", month="Oct", day="9", volume="12", pages="e46809", keywords="degenerative cervical myelopathy", keywords="spine", keywords="spinal cord", keywords="chronic", keywords="aging", keywords="geriatric", keywords="patient engagement", keywords="shared decision-making", keywords="process mapping", keywords="core information set", keywords="decision-making", keywords="patient education", keywords="common data element", keywords="Research Objectives and Common Data Elements for Degenerative Cervical Myelopathy", keywords="RECODE-DCM", abstract="Background: Health care decisions are a critical determinant in the evolution of chronic illness. In shared decision-making (SDM), patients and clinicians work collaboratively to reach evidence-based health decisions that align with individual circumstances, values, and preferences. This personalized approach to clinical care likely has substantial benefits in the oversight of degenerative cervical myelopathy (DCM), a type of nontraumatic spinal cord injury. Its chronicity, heterogeneous clinical presentation, complex management, and variable disease course engenders an imperative for a patient-centric approach that accounts for each patient's unique needs and priorities. Inadequate patient knowledge about the condition and an incomplete understanding of the critical decision points that arise during the course of care currently hinder the fruitful participation of health care providers and patients in SDM. This study protocol presents the rationale for deploying SDM for DCM and delineates the groundwork required to achieve this. 
Objective: The study's primary outcome is the development of a comprehensive checklist to be implemented upon diagnosis that provides patients with essential information necessary to support their informed decision-making. This is known as a core information set (CIS). The secondary outcome is the creation of a detailed process map that provides a diagrammatic representation of the global care workflows and cognitive processes involved in DCM care. Characterizing the critical decision points along a patient's journey will allow for an effective exploration of SDM tools for routine clinical practice to enhance patient-centered care and improve clinical outcomes. Methods: Both CISs and process maps are coproduced iteratively through a collaborative process involving the input and consensus of key stakeholders. This will be facilitated by Myelopathy.org, a global DCM charity, through its Research Objectives and Common Data Elements for Degenerative Cervical Myelopathy community. To develop the CIS, a 3-round, web-based Delphi process will be used, starting with a baseline list of information items derived from a recent scoping review of educational materials in DCM, patient interviews, and a qualitative survey of professionals. A priori criteria for achieving consensus are specified. The process map will be developed iteratively using semistructured interviews with patients and professionals and validated by key stakeholders. Results: Recruitment for the Delphi consensus study began in April 2023. The pilot-testing of process map interview participants started simultaneously, with the formulation of an initial baseline map underway. Conclusions: This protocol marks the first attempt to provide a starting point for investigating SDM in DCM. The primary work centers on developing an educational tool for use in diagnosis to enable enhanced onward decision-making. The wider objective is to aid stakeholders in developing SDM tools by identifying critical decision junctures in DCM care. Through these approaches, we aim to provide an exhaustive launchpad for formulating SDM tools in the wider DCM community. International Registered Report Identifier (IRRID): DERR1-10.2196/46809 ", doi="10.2196/46809", url="https://www.researchprotocols.org/2023/1/e46809", url="http://www.ncbi.nlm.nih.gov/pubmed/37812472" } @Article{info:doi/10.2196/46807, author="Velez, Tom and Wang, Tony and Garibaldi, Brian and Singman, Eric and Koutroulis, Ioannis", title="Identification and Prediction of Clinical Phenotypes in Hospitalized Patients With COVID-19: Machine Learning From Medical Records", journal="JMIR Form Res", year="2023", month="Oct", day="6", volume="7", pages="e46807", keywords="big data", keywords="COVID", keywords="respiratory distress", keywords="critical care", keywords="early warning", keywords="electronic medical record", keywords="machine learning", keywords="clinical phenotypes", keywords="pathogenesis", keywords="infection", keywords="immune response", keywords="treatment", keywords="biomarkers", keywords="training", keywords="sepsis", keywords="mortality", keywords="utility", keywords="phenotype", keywords="support tool", abstract="Background: There is significant heterogeneity in disease progression among hospitalized patients with COVID-19. The pathogenesis of SARS-CoV-2 infection is attributed to a complex interplay between virus and host immune response that in some patients unpredictably and rapidly leads to ``hyperinflammation'' associated with increased risk of mortality. 
The early identification of patients at risk of progression to hyperinflammation may help inform timely therapeutic decisions and lead to improved outcomes. Objective: The primary objective of this study was to use machine learning to reproducibly identify specific risk-stratifying clinical phenotypes across hospitalized patients with COVID-19 and compare treatment response characteristics and outcomes. A secondary objective was to derive a predictive phenotype classification model using routinely available early encounter data that may be useful in informing optimal COVID-19 bedside clinical management. Methods: This was a retrospective analysis of electronic health record data of adult patients (N=4379) who were admitted to a Johns Hopkins Health System hospital for COVID-19 treatment from 2020 to 2021. Phenotypes were identified by clustering 38 routine clinical observations recorded during inpatient care. To examine the reproducibility and validity of the derived phenotypes, patient data were randomly divided into 2 cohorts, and clustering analysis was performed independently for each cohort. A predictive phenotype classifier using the gradient-boosting machine method was derived using routine clinical observations recorded during the first 6 hours following admission. Results: A total of 2 phenotypes (designated as phenotype 1 and phenotype 2) were identified in patients admitted for COVID-19 in both the training and validation cohorts with similar distributions of features, correlations with biomarkers, treatments, comorbidities, and outcomes. In both the training and validation cohorts, phenotype-2 patients were older; had elevated markers of inflammation; and were at an increased risk of requiring intensive care unit--level care, developing sepsis, and mortality compared with phenotype-1 patients. The gradient-boosting machine phenotype prediction model yielded an area under the curve of 0.89 and a positive predictive value of 0.83. Conclusions: Using machine learning clustering, we identified and internally validated 2 clinical COVID-19 phenotypes with distinct treatment or response characteristics consistent with similar 2-phenotype models derived from other hospitalized populations with COVID-19, supporting the reliability and generalizability of these findings. COVID-19 phenotypes can be accurately identified using machine learning models based on readily available early encounter clinical data. A phenotype prediction model based on early encounter data may be clinically useful for timely bedside risk stratification and treatment personalization. ", doi="10.2196/46807", url="https://formative.jmir.org/2023/1/e46807", url="http://www.ncbi.nlm.nih.gov/pubmed/37642512" } @Article{info:doi/10.2196/44892, author="Zhang, Ying and Li, Xiaoying and Liu, Yi and Li, Aihua and Yang, Xuemei and Tang, Xiaoli", title="A Multilabel Text Classifier of Cancer Literature at the Publication Level: Methods Study of Medical Text Classification", journal="JMIR Med Inform", year="2023", month="Oct", day="5", volume="11", pages="e44892", keywords="text classification", keywords="publication-level classifier", keywords="cancer literature", keywords="deep learning", abstract="Background: Given the threat posed by cancer to human health, there is a rapid growth in the volume of data in the cancer field and interdisciplinary and collaborative research is becoming increasingly important for fine-grained classification. 
The low-resolution classifier of reported studies at the journal level fails to satisfy advanced searching demands, and a single label does not adequately characterize the literature originated from interdisciplinary research results. There is thus a need to establish a multilabel classifier with higher resolution to support literature retrieval for cancer research and reduce the burden of screening papers for clinical relevance. Objective: The primary objective of this research was to address the low-resolution issue of cancer literature classification due to the ambiguity of the existing journal-level classifier in order to support gaining high-relevance evidence for clinical consideration and all-sided results for literature retrieval. Methods: We trained a multilabel classifier with scalability for classifying the literature on cancer research directly at the publication level to assign proper content-derived labels based on the ``Bidirectional Encoder Representation from Transformers (BERT) + X'' model and obtain the best option for X. First, a corpus of 70,599 cancer publications retrieved from the Dimensions database was divided into a training and a testing set in a ratio of 7:3. Second, using the classification terminology of International Cancer Research Partnership cancer types, we compared the performance of classifiers developed using BERT and 5 classical deep learning models, such as the text recurrent neural network (TextRNN) and FastText, followed by metrics analysis. Results: After comparing various combined deep learning models, we obtained a classifier based on the optimal combination ``BERT + TextRNN,'' with a precision of 93.09\%, a recall of 87.75\%, and an F1-score of 90.34\%. Moreover, we quantified the distinctive characteristics in the text structure and multilabel distribution in order to generalize the model to other fields with similar characteristics. Conclusions: The ``BERT + TextRNN'' model was trained for high-resolution classification of cancer literature at the publication level to support accurate retrieval and academic statistics. The model automatically assigns 1 or more labels to each cancer paper, as required. Quantitative comparison verified that the ``BERT + TextRNN'' model is the best fit for multilabel classification of cancer literature compared to other models. More data from diverse fields will be collected to testify the scalability and extensibility of the proposed model in the future. 
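As a rough illustration of the ``BERT + X'' combination evaluated above, here is a minimal sketch of a BERT encoder feeding a recurrent layer with a sigmoid multilabel head, assuming PyTorch and Hugging Face Transformers are available. The abstract names the combination but not the exact architecture, so the class name, layer sizes, pooling choice, and parameters (encoder_name, n_labels) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only (not the authors' model): a BERT encoder followed by
# a bidirectional LSTM ("TextRNN"-style layer) and a sigmoid multilabel head.
import torch
import torch.nn as nn
from transformers import AutoModel

class BertTextRNNClassifier(nn.Module):
    def __init__(self, encoder_name: str, n_labels: int, rnn_hidden: int = 256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.rnn = nn.LSTM(self.encoder.config.hidden_size, rnn_hidden,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * rnn_hidden, n_labels)

    def forward(self, input_ids, attention_mask):
        tokens = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        rnn_out, _ = self.rnn(tokens)            # (batch, seq_len, 2 * rnn_hidden)
        pooled = rnn_out.max(dim=1).values       # max-pool over the token sequence
        return torch.sigmoid(self.head(pooled))  # one independent probability per label
```

Training such a head with a binary cross-entropy loss lets each paper receive one or more labels independently, which matches the multilabel setting described in the abstract.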
", doi="10.2196/44892", url="https://medinform.jmir.org/2023/1/e44892", url="http://www.ncbi.nlm.nih.gov/pubmed/37796584" } @Article{info:doi/10.2196/48413, author="Wojtara, Sara Magda and Kang, Jayne and Zaman, Mohammed", title="Congenital Telangiectatic Erythema: Scoping Review", journal="JMIR Dermatol", year="2023", month="Oct", day="5", volume="6", pages="e48413", keywords="rare diseases", keywords="rare disease", keywords="artificial intelligence", keywords="AI", keywords="dermatology", keywords="dermatologist", keywords="DNA repair", keywords="teledermatology", keywords="systematic review", keywords="erythema", keywords="deoxyribonucleic acid", keywords="bloom syndrome", keywords="postnatal growth deficiency", keywords="immune abnormality", keywords="cancer", keywords="oncology", keywords="DNA mutation", keywords="heredity", abstract="Background: Congenital telangiectatic erythema (CTE), also known as Bloom syndrome, is a rare autosomal recessive disorder characterized by below-average height, a narrow face, a red skin rash occurring on sun-exposed areas of the body, and an increased risk of cancer. CTE is one of many genodermatoses and photodermatoses associated with defects in DNA repair. CTE is caused by a mutation occurring in the BLM gene, which causes abnormal breaks in chromosomes. Objective: We aimed to analyze the existing literature on CTE to provide additional insight into its heredity, the spectrum of clinical presentations, and the management of this disorder. In addition, the gaps in current research and the use of artificial intelligence to streamline clinical diagnosis and the management of CTE are outlined. Methods: A literature search was conducted on PubMed, DOAJ, and Scopus using search terms such as ``congenital telangiectatic erythema,'' ``bloom syndrome,'' and ``bloom-torre-machacek.'' Due to limited current literature, studies published from January 2000 to January 2023 were considered for this review. A total of 49 sources from the literature were analyzed. Results: Through this scoping review, the researchers were able to identify several publications focusing on Bloom syndrome. Some common subject areas included the heredity of CTE, clinical presentations of CTE, and management of CTE. In addition, the literature on rare diseases shows the potential advancements in understanding and treatment with artificial intelligence. Future studies should address the causes of heterogeneity in presentation and examine potential therapeutic candidates for CTE and similarly presenting syndromes. Conclusions: This review illuminated current advances in potential molecular targets or causative pathways in the development of CTE as well as clinical features including erythema, increased cancer risk, and growth abnormalities. Future studies should continue to explore innovations in this space, especially in regard to the use of artificial intelligence, including machine learning and deep learning, for the diagnosis and clinical management of rare diseases such as CTE. 
", doi="10.2196/48413", url="https://derma.jmir.org/2023/1/e48413", url="http://www.ncbi.nlm.nih.gov/pubmed/37796556" } @Article{info:doi/10.2196/49944, author="Homburg, Maarten and Meijer, Eline and Berends, Matthijs and Kupers, Thijmen and Olde Hartman, Tim and Muris, Jean and de Schepper, Evelien and Velek, Premysl and Kuiper, Jeroen and Berger, Marjolein and Peters, Lilian", title="A Natural Language Processing Model for COVID-19 Detection Based on Dutch General Practice Electronic Health Records by Using Bidirectional Encoder Representations From Transformers: Development and Validation Study", journal="J Med Internet Res", year="2023", month="Oct", day="4", volume="25", pages="e49944", keywords="natural language processing", keywords="primary care", keywords="COVID-19", keywords="EHR", keywords="electronic health records", keywords="public health", keywords="multidisciplinary", keywords="NLP", keywords="disease identification", keywords="BERT model", keywords="model development", keywords="prediction", abstract="Background: Natural language processing (NLP) models such as bidirectional encoder representations from transformers (BERT) hold promise in revolutionizing disease identification from electronic health records (EHRs) by potentially enhancing efficiency and accuracy. However, their practical application in practice settings demands a comprehensive and multidisciplinary approach to development and validation. The COVID-19 pandemic highlighted challenges in disease identification due to limited testing availability and challenges in handling unstructured data. In the Netherlands, where general practitioners (GPs) serve as the first point of contact for health care, EHRs generated by these primary care providers contain a wealth of potentially valuable information. Nonetheless, the unstructured nature of free-text entries in EHRs poses challenges in identifying trends, detecting disease outbreaks, or accurately pinpointing COVID-19 cases. Objective: This study aims to develop and validate a BERT model for detecting COVID-19 consultations in general practice EHRs in the Netherlands. Methods: The BERT model was initially pretrained on Dutch language data and fine-tuned using a comprehensive EHR data set comprising confirmed COVID-19 GP consultations and non--COVID-19--related consultations. The data set was partitioned into a training and development set, and the model's performance was evaluated on an independent test set that served as the primary measure of its effectiveness in COVID-19 detection. To validate the final model, its performance was assessed through 3 approaches. First, external validation was applied on an EHR data set from a different geographic region in the Netherlands. Second, validation was conducted using results of polymerase chain reaction (PCR) test data obtained from municipal health services. Lastly, correlation between predicted outcomes and COVID-19--related hospitalizations in the Netherlands was assessed, encompassing the period around the outbreak of the pandemic in the Netherlands, that is, the period before widespread testing. Results: The model development used 300,359 GP consultations. We developed a highly accurate model for COVID-19 consultations (accuracy 0.97, F1-score 0.90, precision 0.85, recall 0.85, specificity 0.99). External validations showed comparable high performance. Validation on PCR test data showed high recall but low precision and specificity. 
Validation using hospital data showed significant correlation between COVID-19 predictions of the model and COVID-19--related hospitalizations (F1-score 96.8; P<.001; R2=0.69). Most importantly, the model was able to predict COVID-19 cases weeks before the first confirmed case in the Netherlands. Conclusions: The developed BERT model was able to accurately identify COVID-19 cases among GP consultations even preceding confirmed cases. The validated efficacy of our BERT model highlights the potential of NLP models to identify disease outbreaks early, exemplifying the power of multidisciplinary efforts in harnessing technology for disease identification. Moreover, the implications of this study extend beyond COVID-19 and offer a blueprint for the early recognition of various illnesses, revealing that such models could revolutionize disease surveillance. ", doi="10.2196/49944", url="https://www.jmir.org/2023/1/e49944", url="http://www.ncbi.nlm.nih.gov/pubmed/37792444" } @Article{info:doi/10.2196/44332, author="Pfisterer, J. Kaylen and Lohani, Raima and Janes, Elizabeth and Ng, Denise and Wang, Dan and Bryant-Lukosius, Denise and Rendon, Ricardo and Berlin, Alejandro and Bender, Jacqueline and Brown, Ian and Feifer, Andrew and Gotto, Geoffrey and Saha, Shumit and Cafazzo, A. Joseph and Pham, Quynh", title="An Actionable Expert-System Algorithm to Support Nurse-Led Cancer Survivorship Care: Algorithm Development Study", journal="JMIR Cancer", year="2023", month="Oct", day="4", volume="9", pages="e44332", keywords="prostate cancer", keywords="patient-reported outcomes", keywords="nurse-led model of care", keywords="expert system", keywords="artificial intelligence--powered decision support", keywords="digital health", keywords="nursing", keywords="algorithm development", keywords="cancer treatment", keywords="AI", keywords="survivorship", keywords="cancer", abstract="Background: Comprehensive models of survivorship care are necessary to improve access to and coordination of care. New models of care provide the opportunity to address the complexity of physical and psychosocial problems and long-term health needs experienced by patients following cancer treatment. Objective: This paper presents our expert-informed, rules-based survivorship algorithm to build a nurse-led model of survivorship care to support men living with prostate cancer (PCa). The algorithm is called No Evidence of Disease (Ned) and supports timelier decision-making, enhanced safety, and continuity of care. Methods: An initial rule set was developed and refined through working groups with clinical experts across Canada (eg, nurse experts, physician experts, and scientists; n=20), and patient partners (n=3). Algorithm priorities were defined through a multidisciplinary consensus meeting with clinical nurse specialists, nurse scientists, nurse practitioners, urologic oncologists, urologists, and radiation oncologists (n=17). The system was refined and validated using the nominal group technique. Results: Four levels of alert classification were established, initiated by responses on the Expanded Prostate Cancer Index Composite for Clinical Practice survey, and mediated by changes in minimal clinically important different alert thresholds, alert history, and clinical urgency with patient autonomy influencing clinical acuity. Patient autonomy was supported through tailored education as a first line of response, and alert escalation depending on a patient-initiated request for a nurse consultation. 
Conclusions: The Ned algorithm is positioned to facilitate PCa nurse-led care models with a high nurse-to-patient ratio. This novel expert-informed PCa survivorship care algorithm contains a defined escalation pathway for clinically urgent symptoms while honoring patient preference. Though further validation is required through a pragmatic trial, we anticipate the Ned algorithm will support timelier decision-making and enhance continuity of care through the automation of more frequent automated checkpoints, while empowering patients to self-manage their symptoms more effectively than standard care. International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2020-045806 ", doi="10.2196/44332", url="https://cancer.jmir.org/2023/1/e44332", url="http://www.ncbi.nlm.nih.gov/pubmed/37792435" } @Article{info:doi/10.2196/49995, author="Fraser, Hamish and Crossland, Daven and Bacher, Ian and Ranney, Megan and Madsen, Tracy and Hilliard, Ross", title="Comparison of Diagnostic and Triage Accuracy of Ada Health and WebMD Symptom Checkers, ChatGPT, and Physicians for Patients in an Emergency Department: Clinical Data Analysis Study", journal="JMIR Mhealth Uhealth", year="2023", month="Oct", day="3", volume="11", pages="e49995", keywords="diagnosis", keywords="triage", keywords="symptom checker", keywords="emergency patient", keywords="ChatGPT", keywords="LLM", keywords="diagnose", keywords="self-diagnose", keywords="self-diagnosis", keywords="app", keywords="application", keywords="language model", keywords="accuracy", keywords="ChatGPT-3.5", keywords="ChatGPT-4.0", keywords="emergency", keywords="machine learning", abstract="Background: Diagnosis is a core component of effective health care, but misdiagnosis is common and can put patients at risk. Diagnostic decision support systems can play a role in improving diagnosis by physicians and other health care workers. Symptom checkers (SCs) have been designed to improve diagnosis and triage (ie, which level of care to seek) by patients. Objective: The aim of this study was to evaluate the performance of the new large language model ChatGPT (versions 3.5 and 4.0), the widely used WebMD SC, and an SC developed by Ada Health in the diagnosis and triage of patients with urgent or emergent clinical problems compared with the final emergency department (ED) diagnoses and physician reviews. Methods: We used previously collected, deidentified, self-report data from 40 patients presenting to an ED for care who used the Ada SC to record their symptoms prior to seeing the ED physician. Deidentified data were entered into ChatGPT versions 3.5 and 4.0 and WebMD by a research assistant blinded to diagnoses and triage. Diagnoses from all 4 systems were compared with the previously abstracted final diagnoses in the ED as well as with diagnoses and triage recommendations from three independent board-certified ED physicians who had blindly reviewed the self-report clinical data from Ada. Diagnostic accuracy was calculated as the proportion of the diagnoses from ChatGPT, Ada SC, WebMD SC, and the independent physicians that matched at least one ED diagnosis (stratified as top 1 or top 3). Triage accuracy was calculated as the number of recommendations from ChatGPT, WebMD, or Ada that agreed with at least 2 of the independent physicians or were rated ``unsafe'' or ``too cautious.'' Results: Overall, 30 and 37 cases had sufficient data for diagnostic and triage analysis, respectively. 
The rate of top-1 diagnosis matches for Ada, ChatGPT 3.5, ChatGPT 4.0, and WebMD was 9 (30\%), 12 (40\%), 10 (33\%), and 12 (40\%), respectively, with a mean rate of 47\% for the physicians. The rate of top-3 diagnostic matches for Ada, ChatGPT 3.5, ChatGPT 4.0, and WebMD was 19 (63\%), 19 (63\%), 15 (50\%), and 17 (57\%), respectively, with a mean rate of 69\% for physicians. The distribution of triage results for Ada was 62\% (n=23) agree, 14\% unsafe (n=5), and 24\% (n=9) too cautious; that for ChatGPT 3.5 was 59\% (n=22) agree, 41\% (n=15) unsafe, and 0\% (n=0) too cautious; that for ChatGPT 4.0 was 76\% (n=28) agree, 22\% (n=8) unsafe, and 3\% (n=1) too cautious; and that for WebMD was 70\% (n=26) agree, 19\% (n=7) unsafe, and 11\% (n=4) too cautious. The unsafe triage rate for ChatGPT 3.5 (41\%) was significantly higher (P=.009) than that of Ada (14\%). Conclusions: ChatGPT 3.5 had high diagnostic accuracy but a high unsafe triage rate. ChatGPT 4.0 had the poorest diagnostic accuracy, but a lower unsafe triage rate and the highest triage agreement with the physicians. The Ada and WebMD SCs performed better overall than ChatGPT. Unsupervised patient use of ChatGPT for diagnosis and triage is not recommended without improvements to triage accuracy and extensive clinical evaluation. ", doi="10.2196/49995", url="https://mhealth.jmir.org/2023/1/e49995", url="http://www.ncbi.nlm.nih.gov/pubmed/37788063" } @Article{info:doi/10.2196/44187, author="Hill, Adele and Joyner, H. Christopher and Keith-Jopp, Chloe and Yet, Barbaros and Tuncer Sakar, Ceren and Marsh, William and Morrissey, Dylan", title="Assessing Serious Spinal Pathology Using Bayesian Network Decision Support: Development and Validation Study", journal="JMIR Form Res", year="2023", month="Oct", day="3", volume="7", pages="e44187", keywords="artificial intelligence", keywords="back pain", keywords="Bayesian network", keywords="expert consensus", abstract="Background: Identifying and managing serious spinal pathology (SSP) such as cauda equina syndrome or spinal infection in patients presenting with low back pain is challenging. Traditional red flag questioning is increasingly criticized, and previous studies show that many clinicians lack confidence in managing patients presenting with red flags. Improving decision-making and reducing the variability of care for these patients is a key priority for clinicians and researchers. Objective: We aimed to improve SSP identification by constructing and validating a decision support tool using a Bayesian network (BN), which is an artificial intelligence technique that combines current evidence and expert knowledge. Methods: A modified RAND appropriateness procedure was undertaken with 16 experts over 3 rounds, designed to elicit the variables, structure, and conditional probabilities necessary to build a causal BN. The BN predicts the likelihood of a patient with a particular presentation having an SSP. The second part of this study used an established framework to direct a 4-part validation that included comparison of the BN with consensus statements, practice guidelines, and recent research. Clinical cases were entered into the model and the results were compared with clinical judgment from spinal experts who were not involved in the elicitation. Receiver operating characteristic curves were plotted and area under the curve were calculated for accuracy statistics. 
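For the accuracy statistics mentioned above, a minimal sketch of the receiver operating characteristic and area-under-the-curve computation is shown below, assuming scikit-learn is available; the labels and predicted probabilities are hypothetical placeholders, not data from the Bayesian network study. Plotting the returned false positive rates against the true positive rates reproduces the ROC curve itself.

```python
# Minimal sketch (not the authors' code): ROC curve and AUC for predicted
# probabilities against reference judgments. Labels and scores are hypothetical.
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                            # reference: SSP present?
y_score = [0.92, 0.30, 0.35, 0.55, 0.18, 0.42, 0.88, 0.25]    # model output probability

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC =", round(roc_auc_score(y_true, y_score), 3))
```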
Results: The RAND appropriateness procedure elicited a model including 38 variables in 3 domains: risk factors (10 variables), signs and symptoms (17 variables), and judgment factors (11 variables). Clear consensus was found in the risk factors and signs and symptoms for SSP conditions. The 4-part BN validation demonstrated good performance overall and identified areas for further development. Comparison with available clinical literature showed good overall agreement but suggested certain improvements required to, for example, 2 of the 11 judgment factors. Case analysis showed that cauda equina syndrome, space-occupying lesion/cancer, and inflammatory condition identification performed well across the validation domains. Fracture identification performed less well, but the reasons for the erroneous results are well understood. A review of the content by independent spinal experts backed up the issues with the fracture node, but the BN was otherwise deemed acceptable. Conclusions: The RAND appropriateness procedure and validation framework were successfully implemented to develop the BN for SSP. In comparison with other expert-elicited BN studies, this work goes a step further in validating the output before attempting implementation. Using a framework for model validation, the BN showed encouraging validity and has provided avenues for further developing the outputs that demonstrated poor accuracy. This study provides the vital first step of improving our ability to predict outcomes in low back pain by first considering the problem of SSP. International Registered Report Identifier (IRRID): RR2-10.2196/21804 ", doi="10.2196/44187", url="https://formative.jmir.org/2023/1/e44187", url="http://www.ncbi.nlm.nih.gov/pubmed/37788068" } @Article{info:doi/10.2196/46381, author="Kalmus, Olivier and Smits, Kirsten and Seitz, Max and Haux, Christian and Robra, Bernt-Peter and Listl, Stefan", title="Evaluation of a Digital Decision Support System to Integrate Type 2 Diabetes Mellitus and Periodontitis Care: Case-Vignette Study in Simulated Environments", journal="J Med Internet Res", year="2023", month="Oct", day="2", volume="25", pages="e46381", keywords="digital health", keywords="integrated care", keywords="decision support", keywords="oral health", keywords="diabetes", keywords="periodontitis", keywords="oral care", keywords="type 2 diabetes", keywords="evaluation", keywords="survey", keywords="hemoglobin", keywords="diagnostic device", keywords="telemedicine", abstract="Background: As highlighted by the recent World Health Organization Oral Health Resolution, there is an urgent need to better integrate primary and oral health care. Despite evidence and guidelines substantiating the relevance of integrating type 2 diabetes mellitus (T2DM) and periodontitis care, the fragmentation of primary and oral health care persists. Objective: This paper reports on the evaluation of a prototype digital decision support system (DSS) that was developed to enhance the integration of T2DM and periodontitis care. Methods: The effects of the prototype DSS were assessed in web-based simulated environments, using 2 different sets of case vignettes in combination with evaluation surveys among 202 general dental practitioners (GDPs) and 206 general practitioners (GPs). Each participant evaluated 3 vignettes, one of which, chosen at random, was assisted by the DSS. Logistic regression analyses were conducted at the participant and case levels. 
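The odds ratios reported in the Results that follow come from logistic regression models of this kind. Here is a minimal sketch, assuming statsmodels and NumPy, with a simulated exposure (DSS-assisted vignette or not) and outcome (referral recommended); it illustrates only how an exponentiated coefficient yields an odds ratio with a 95% CI, not the study's participant- and case-level models.

```python
# Minimal sketch (not the authors' code): a logistic regression whose
# exponentiated coefficient gives an odds ratio with a 95% CI.
# All data below are simulated for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
dss = rng.integers(0, 2, size=300)                 # hypothetical exposure: DSS assistance
p = np.where(dss == 1, 0.60, 0.25)                 # hypothetical referral probabilities
referral = rng.binomial(1, p)                      # hypothetical outcome

X = sm.add_constant(dss.astype(float))
fit = sm.Logit(referral, X).fit(disp=False)
or_point = np.exp(fit.params[1])
or_ci = np.exp(fit.conf_int()[1])                  # 95% CI transformed to the OR scale
print(f"OR {or_point:.2f} (95% CI {or_ci[0]:.2f}-{or_ci[1]:.2f})")
```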
Results: Under DSS assistance, GPs had 8.3 (95\% CI 4.32-16.03) times higher odds of recommending a GDP visit. There was no significant impact of DSS assistance on GP advice about common risk factors for T2DM and periodontal disease. GDPs had 4.3 (95\% CI 2.08-9.04) times higher odds of recommending a GP visit, 1.6 (95\% CI 1.03-2.33) times higher odds of giving advice on disease correlations, and 3.2 (95\% CI 1.63-6.35) times higher odds of asking patients about their glycated hemoglobin value. Conclusions: The findings of this study provide a proof of concept for a digital DSS to integrate T2DM and periodontal care. Future updating and testing is warranted to continuously enhance the functionalities of the DSS in terms of interoperability with various types of data sources and diagnostic devices; incorporation of other (oral) health dimensions; application in various settings, including via telemedicine; and further customization of end-user interfaces. ", doi="10.2196/46381", url="https://www.jmir.org/2023/1/e46381", url="http://www.ncbi.nlm.nih.gov/pubmed/37782539" } @Article{info:doi/10.2196/45132, author="Kabukye, K. Johnblack and Namugga, Jane and Mpamani, Jackson Collins and Katumba, Andrew and Nakatumba-Nabende, Joyce and Nabuuma, Hanifa and Musoke, Senkomago Stephen and Nankya, Esther and Soomre, Edna and Nakisige, Carolyn and Orem, Jackson", title="Implementing Smartphone-Based Telemedicine for Cervical Cancer Screening in Uganda: Qualitative Study of Stakeholders' Perceptions", journal="J Med Internet Res", year="2023", month="Oct", day="2", volume="25", pages="e45132", keywords="telemedicine", keywords="cervical cancer", keywords="screening", keywords="visual inspection with acetic acid", keywords="cervicography", keywords="Uganda", keywords="digital health", abstract="Background: In Uganda, cervical cancer (CaCx) is the commonest cancer, accounting for 35.7\% of all cancer cases in women. The rates of human papillomavirus vaccination and CaCx screening remain low. Digital health tools and interventions have the potential to improve different aspects of CaCx screening and control in Uganda. Objective: This study aimed to describe stakeholders' perceptions of the telemedicine system we developed to improve CaCx screening in Uganda. Methods: We developed and implemented a smartphone-based telemedicine system for capturing and sharing cervical images and other clinical data, as well as an artificial intelligence model for automatic analysis of images. We conducted focus group discussions with health workers at the screening clinics (n=27) and women undergoing screening (n=15) to explore their perceptions of the system. The focus group discussions were supplemented with field observations and an evaluation survey of the health workers on system usability and the overall project. Results: In general, both patients and health workers had positive opinions about the system. Highlighted benefits included better cervical visualization, the ability to obtain a second opinion, improved communication between nurses and patients (to explain screening findings), improved clinical data management, performance monitoring and feedback, and modernization of screening service. However, there were also some negative perceptions. For example, some health workers felt the system is time-consuming, especially when it had just been introduced, while some patients were apprehensive about cervical image capture and sharing. 
Finally, commonplace challenges in digital health (eg, lack of interoperability and problems with sustainability) and challenges in cancer screening in general (eg, arduous referrals, inadequate monitoring and quality control) also resurfaced. Conclusions: This study demonstrates the feasibility and value of digital health tools in CaCx screening in Uganda, particularly with regard to improving patient experience and the quality of screening services. It also provides examples of potential limitations that must be addressed for successful implementation. ", doi="10.2196/45132", url="https://www.jmir.org/2023/1/e45132", url="http://www.ncbi.nlm.nih.gov/pubmed/37782541" } @Article{info:doi/10.2196/47486, author="Khanna, Amit and Jones, Graham", title="Toward Personalized Medicine Approaches for Parkinson Disease Using Digital Technologies", journal="JMIR Form Res", year="2023", month="Sep", day="27", volume="7", pages="e47486", keywords="digital health", keywords="monitoring", keywords="personalized medicine", keywords="Parkinson disease", keywords="wearables", keywords="neurodegenerative disorder", keywords="cognitive impairment", keywords="economic burden", keywords="digital technology", keywords="symptom management", keywords="disease control", keywords="debilitating disease", keywords="intervention", doi="10.2196/47486", url="https://formative.jmir.org/2023/1/e47486", url="http://www.ncbi.nlm.nih.gov/pubmed/37756050" } @Article{info:doi/10.2196/46520, author="Ha, Seokmin and Choi, Jung Su and Lee, Sujin and Wijaya, Hansel Reinatt and Kim, Hyun Jee and Joo, Yeon Eun and Kim, Kyoung Jae", title="Predicting the Risk of Sleep Disorders Using a Machine Learning--Based Simple Questionnaire: Development and Validation Study", journal="J Med Internet Res", year="2023", month="Sep", day="21", volume="25", pages="e46520", keywords="obstructive sleep apnea", keywords="insomnia", keywords="comorbid insomnia and sleep apnea", keywords="polysomnography", keywords="questionnaires", keywords="risk prediction", keywords="XGBoost", keywords="machine learning", keywords="risk", keywords="sleep", abstract="Background: Sleep disorders, such as obstructive sleep apnea (OSA), comorbid insomnia and sleep apnea (COMISA), and insomnia are common and can have serious health consequences. However, accurately diagnosing these conditions can be challenging as a result of the underrecognition of these diseases, the time-intensive nature of sleep monitoring necessary for a proper diagnosis, and patients' hesitancy to undergo demanding and costly overnight polysomnography tests. Objective: We aim to develop a machine learning algorithm that can accurately predict the risk of OSA, COMISA, and insomnia with a simple set of questions, without the need for a polysomnography test. Methods: We applied extreme gradient boosting to the data from 2 medical centers (n=4257 from Samsung Medical Center and n=365 from Ewha Womans University Medical Center Seoul Hospital). Features were selected based on feature importance calculated by the Shapley additive explanations (SHAP) method. We applied extreme gradient boosting using selected features to develop a simple questionnaire predicting sleep disorders (SLEEPS). The accuracy of the algorithm was evaluated using the area under the receiver operating characteristics curve. Results: In total, 9 features were selected to construct SLEEPS. 
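To make the feature-selection step described above concrete, the following is a minimal sketch of gradient boosting with SHAP-based ranking, assuming the xgboost and shap packages are available; the synthetic data, hyperparameters, and the top-9 cutoff are illustrative assumptions, not the SLEEPS implementation.

```python
# Minimal sketch (not the SLEEPS code): extreme gradient boosting with
# SHAP-based feature ranking. Data and settings are synthetic/hypothetical.
import numpy as np
import shap
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))        # 500 hypothetical respondents, 20 questionnaire items
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = XGBClassifier(n_estimators=200, max_depth=3)
model.fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)             # (n_samples, n_features)
importance = np.abs(shap_values).mean(axis=0)      # mean |SHAP value| per feature
top9 = np.argsort(importance)[::-1][:9]            # keep the 9 most informative items
print("selected feature indices:", top9)
```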
SLEEPS showed high accuracy, with an area under the receiver operating characteristics curve of greater than 0.897 for all 3 sleep disorders, and consistent performance across both sets of data. We found that the distinction between COMISA and OSA was critical for accurate prediction. A publicly accessible website was created based on the algorithm that provides predictions for the risk of the 3 sleep disorders and shows how the risk changes with changes in weight or age. Conclusions: SLEEPS has the potential to improve the diagnosis and treatment of sleep disorders by providing more accessibility and convenience. The creation of a publicly accessible website based on the algorithm provides a user-friendly tool for assessing the risk of OSA, COMISA, and insomnia. ", doi="10.2196/46520", url="https://www.jmir.org/2023/1/e46520", url="http://www.ncbi.nlm.nih.gov/pubmed/37733411" } @Article{info:doi/10.2196/43963, author="Wang, M. Sabrina and Hogg, Jeffry H. D. and Sangvai, Devdutta and Patel, R. Manesh and Weissler, Hope E. and Kellogg, C. Katherine and Ratliff, William and Balu, Suresh and Sendak, Mark", title="Development and Integration of Machine Learning Algorithm to Identify Peripheral Arterial Disease: Multistakeholder Qualitative Study", journal="JMIR Form Res", year="2023", month="Sep", day="21", volume="7", pages="e43963", keywords="machine learning", keywords="implementation", keywords="integration", keywords="support", keywords="quality", keywords="peripheral arterial disease", keywords="algorithm", keywords="efficacy", keywords="structure", keywords="barrier", keywords="clinical", keywords="engagement", keywords="development", keywords="translation", keywords="detection", abstract="Background: Machine learning (ML)--driven clinical decision support (CDS) continues to draw wide interest and investment as a means of improving care quality and value, despite mixed real-world implementation outcomes. Objective: This study aimed to explore the factors that influence the integration of a peripheral arterial disease (PAD) identification algorithm to implement timely guideline-based care. Methods: A total of 12 semistructured interviews were conducted with individuals from 3 stakeholder groups during the first 4 weeks of integration of an ML-driven CDS. The stakeholder groups included technical, administrative, and clinical members of the team interacting with the ML-driven CDS. The ML-driven CDS identified patients with a high probability of having PAD, and these patients were then reviewed by an interdisciplinary team that developed a recommended action plan and sent recommendations to the patient's primary care provider. Pseudonymized transcripts were coded, and thematic analysis was conducted by a multidisciplinary research team. Results: Three themes were identified: positive factors translating in silico performance to real-world efficacy, organizational factors and data structure factors affecting clinical impact, and potential challenges to advancing equity. Our study found that the factors that led to successful translation of in silico algorithm performance to real-world impact were largely nontechnical, given adequate efficacy in retrospective validation, including strong clinical leadership, trustworthy workflows, early consideration of end-user needs, and ensuring that the CDS addresses an actionable problem. 
Negative factors of integration included failure to incorporate the on-the-ground context, the lack of feedback loops, and data silos limiting the ML-driven CDS. The success criteria for each stakeholder group were also characterized to better understand how teams work together to integrate ML-driven CDS and to understand the varying needs across stakeholder groups. Conclusions: Longitudinal and multidisciplinary stakeholder engagement in the development and integration of ML-driven CDS underpins its effective translation into real-world care. Although previous studies have focused on the technical elements of ML-driven CDS, our study demonstrates the importance of including administrative and operational leaders as well as an early consideration of clinicians' needs. Seeing how different stakeholder groups have this more holistic perspective also permits more effective detection of context-driven health care inequities, which are uncovered or exacerbated via ML-driven CDS integration through structural and organizational challenges. Many of the solutions to these inequities lie outside the scope of ML and require coordinated systematic solutions for mitigation to help reduce disparities in the care of patients with PAD. ", doi="10.2196/43963", url="https://formative.jmir.org/2023/1/e43963", url="http://www.ncbi.nlm.nih.gov/pubmed/37733427" } @Article{info:doi/10.2196/48115, author="Zhang, Zeyu and Fang, Meng and Wu, Rebecca and Zong, Hui and Huang, Honglian and Tong, Yuantao and Xie, Yujia and Cheng, Shiyang and Wei, Ziyi and Crabbe, C. M. James and Zhang, Xiaoyan and Wang, Ying", title="Large-Scale Biomedical Relation Extraction Across Diverse Relation Types: Model Development and Usability Study on COVID-19", journal="J Med Internet Res", year="2023", month="Sep", day="20", volume="25", pages="e48115", keywords="biomedical text mining", keywords="biomedical relation extraction", keywords="pretrained language model", keywords="task-adaptive pretraining", keywords="knowledge graph", keywords="knowledge discovery", keywords="clinical drug path", keywords="COVID-19", abstract="Background: Biomedical relation extraction (RE) is of great importance for researchers to conduct systematic biomedical studies. It not only helps knowledge mining, such as knowledge graphs and novel knowledge discovery, but also promotes translational applications, such as clinical diagnosis, decision-making, and precision medicine. However, the relations between biomedical entities are complex and diverse, and comprehensive biomedical RE is not yet well established. Objective: We aimed to investigate and improve large-scale RE with diverse relation types and conduct usability studies with application scenarios to optimize biomedical text mining. Methods: Data sets containing 125 relation types with different entity semantic levels were constructed to evaluate the impact of entity semantic information on RE, and performance analysis was conducted on different model architectures and domain models. This study also proposed a continued pretraining strategy and integrated models with scripts into a tool. Furthermore, this study applied RE to the COVID-19 corpus with article topics and application scenarios of clinical interest to assess and demonstrate its biological interpretability and usability. Results: The performance analysis revealed that RE achieves the best performance when the detailed semantic type is provided. 
For a single model, PubMedBERT with continued pretraining performed the best, with an F1-score of 0.8998. Usability studies on COVID-19 demonstrated the interpretability and usability of RE, and a relation graph database was constructed, which was used to reveal existing and novel drug paths with edge explanations. The models (including pretrained and fine-tuned models), integrated tool (Docker), and generated data (including the COVID-19 relation graph database and drug paths) have been made publicly available to the biomedical text mining community and clinical researchers. Conclusions: This study provided a comprehensive analysis of RE with diverse relation types. Optimized RE models and tools for diverse relation types were developed, which can be widely used in biomedical text mining. Our usability studies provided a proof-of-concept demonstration of how large-scale RE can be leveraged to facilitate novel research. ", doi="10.2196/48115", url="https://www.jmir.org/2023/1/e48115", url="http://www.ncbi.nlm.nih.gov/pubmed/37632414" } @Article{info:doi/10.2196/48636, author="Butters, Alexandra and Blanch, Bianca and Kemp-Casey, Anna and Do, Judy and Yeates, Laura and Leslie, Felicity and Semsarian, Christopher and Nedkoff, Lee and Briffa, Tom and Ingles, Jodie and Sweeting, Joanna", title="The Australian Genetic Heart Disease Registry: Protocol for a Data Linkage Study", journal="JMIR Res Protoc", year="2023", month="Sep", day="20", volume="12", pages="e48636", keywords="data linkage", keywords="genetic heart diseases", keywords="health care use", keywords="cardiomyopathies", keywords="arrhythmia", keywords="cardiology", keywords="heart", keywords="genetics", keywords="registry", keywords="registries", keywords="risk", keywords="mortality", keywords="national", keywords="big data", keywords="harmonization", keywords="probabilistic matching", abstract="Background: Genetic heart diseases such as hypertrophic cardiomyopathy can cause significant morbidity and mortality, ranging from syncope, chest pain, and palpitations to heart failure and sudden cardiac death. These diseases are inherited in an autosomal dominant fashion, meaning family members of affected individuals have a 1 in 2 chance of also inheriting the disease (``at-risk relatives''). The health care use patterns of individuals with a genetic heart disease, including emergency department presentations and hospital admissions, are poorly understood. By linking genetic heart disease registry data to routinely collected health data, we aim to provide a more comprehensive clinical data set to examine the burden of disease on individuals, families, and health care systems. Objective: The objective of this study is to link the Australian Genetic Heart Disease (AGHD) Registry with routinely collected whole-population health data sets to investigate the health care use of individuals with a genetic heart disease and their at-risk relatives. This linked data set will allow for the investigation of differences in outcomes and health care use due to disease, sex, socioeconomic status, and other factors. Methods: The AGHD Registry is a nationwide data set that began in 2007 and aims to recruit individuals with a genetic heart disease and their family members. In this study, demographic, clinical, and genetic data (available from 2007 to 2019) for AGHD Registry participants and at-risk relatives residing in New South Wales (NSW), Australia, were linked to routinely collected health data. 
These data included NSW-based data sets covering hospitalizations (2001-2019), emergency department presentations (2005-2019), and both state-wide and national mortality registries (2007-2019). The linkage was performed by the Centre for Health Record Linkage. Investigations stratifying by diagnosis, age, sex, socioeconomic status, and gene status will be undertaken and reported using descriptive statistics. Results: NSW AGHD Registry participants were linked to routinely collected health data sets using probabilistic matching (November 2019). Of 1720 AGHD Registry participants, 1384 had linkages with 11,610 hospital records, 7032 emergency department records, and 60 death records. Data assessment and harmonization were performed, and descriptive data analysis is underway. Conclusions: We intend to provide insights into the health care use patterns of individuals with a genetic heart disease and their at-risk relatives, including frequency of hospital admissions and differences due to factors such as disease, sex, and socioeconomic status. Identifying disparities and potential barriers to care may highlight specific health care needs (eg, between sexes) and factors impacting health care access and use. International Registered Report Identifier (IRRID): DERR1-10.2196/48636 ", doi="10.2196/48636", url="https://www.researchprotocols.org/2023/1/e48636", url="http://www.ncbi.nlm.nih.gov/pubmed/37728963" } @Article{info:doi/10.2196/45767, author="Dolatabadi, Elham and Moyano, Diana and Bales, Michael and Spasojevic, Sofija and Bhambhoria, Rohan and Bhatti, Junaid and Debnath, Shyamolima and Hoell, Nicholas and Li, Xin and Leng, Celine and Nanda, Sasha and Saab, Jad and Sahak, Esmat and Sie, Fanny and Uppal, Sara and Vadlamudi, Khatri Nirma and Vladimirova, Antoaneta and Yakimovich, Artur and Yang, Xiaoxue and Kocak, Akinli Sedef and Cheung, M. Angela", title="Using Social Media to Help Understand Patient-Reported Health Outcomes of Post--COVID-19 Condition: Natural Language Processing Approach", journal="J Med Internet Res", year="2023", month="Sep", day="19", volume="25", pages="e45767", keywords="long COVID", keywords="post--COVID-19 condition", keywords="PCC", keywords="social media", keywords="natural language processing", keywords="transformer models", keywords="bidirectional encoder representations from transformers", keywords="machine learning", keywords="Twitter", keywords="Reddit", keywords="PRO", keywords="patient-reported outcome", keywords="patient-reported symptom", keywords="health outcome", keywords="symptom", keywords="entity extraction", keywords="entity normalization", abstract="Background: While scientific knowledge of post--COVID-19 condition (PCC) is growing, there remains significant uncertainty in the definition of the disease, its expected clinical course, and its impact on daily functioning. Social media platforms can generate valuable insights into patient-reported health outcomes as the content is produced at high resolution by patients and caregivers, representing experiences that may be unavailable to most clinicians. Objective: In this study, we aimed to determine the validity and effectiveness of advanced natural language processing approaches built to derive insight into PCC-related patient-reported health outcomes from social media platforms Twitter and Reddit. We extracted PCC-related terms, including symptoms and conditions, and measured their occurrence frequency. 
We compared the outputs with human annotations and clinical outcomes and tracked symptom and condition term occurrences over time and locations to explore the pipeline's potential as a surveillance tool. Methods: We used bidirectional encoder representations from transformers (BERT) models to extract and normalize PCC symptom and condition terms from English posts on Twitter and Reddit. We compared 2 named entity recognition models and implemented a 2-step normalization task to map extracted terms to unique concepts in standardized terminology. The normalization steps were done using a semantic search approach with BERT biencoders. We evaluated the effectiveness of BERT models in extracting the terms using a human-annotated corpus and a proximity-based score. We also compared the validity and reliability of the extracted and normalized terms to a web-based survey with more than 3000 participants from several countries. Results: UmlsBERT-Clinical had the highest accuracy in predicting entities closest to those extracted by human annotators. Based on our findings, the top 3 most commonly occurring groups of PCC symptom and condition terms were systemic (such as fatigue), neuropsychiatric (such as anxiety and brain fog), and respiratory (such as shortness of breath). In addition, we also found novel symptom and condition terms that had not been categorized in previous studies, such as infection and pain. Regarding the co-occurring symptoms, the pair of fatigue and headaches was among the most co-occurring term pairs across both platforms. Based on the temporal analysis, the neuropsychiatric terms were the most prevalent, followed by the systemic category, on both social media platforms. Our spatial analysis concluded that 42\% (10,938/26,247) of the analyzed terms included location information, with the majority coming from the United States, United Kingdom, and Canada. Conclusions: The outcome of our social media--derived pipeline is comparable with the results of peer-reviewed articles relevant to PCC symptoms. Overall, this study provides unique insights into patient-reported health outcomes of PCC and valuable information about the patient's journey that can help health care providers anticipate future needs. International Registered Report Identifier (IRRID): RR2-10.1101/2022.12.14.22283419 ", doi="10.2196/45767", url="https://www.jmir.org/2023/1/e45767", url="http://www.ncbi.nlm.nih.gov/pubmed/37725432" } @Article{info:doi/10.2196/47398, author="Sreepada, Syamala Rama and Chang, Ching Ai and West, C. Nicholas and Sujan, Jonath and Lai, Brendan and Poznikoff, K. Andrew and Munk, Rebecca and Froese, R. Norbert and Chen, C. James and G{\"o}rges, Matthias", title="Dashboard of Short-Term Postoperative Patient Outcomes for Anesthesiologists: Development and Preliminary Evaluation", journal="JMIR Perioper Med", year="2023", month="Sep", day="19", volume="6", pages="e47398", keywords="quality improvement", keywords="feedback", keywords="anesthesiologists", keywords="patient reported outcome measures", keywords="data display", keywords="user-centered design", keywords="surgical outcome", keywords="discharge", keywords="anesthesiology", keywords="postoperative care", keywords="registry", keywords="dashboard", keywords="interactive", keywords="practice", keywords="performance", keywords="patient outcome", keywords="mobile phone", abstract="Background: Anesthesiologists require an understanding of their patients' outcomes to evaluate their performance and improve their practice. 
Traditionally, anesthesiologists had limited information about their surgical outpatients' outcomes due to minimal contact post discharge. Leveraging digital health innovations for analyzing personal and population outcomes may improve perioperative care. BC Children's Hospital's postoperative follow-up registry for outpatient surgeries collects short-term outcomes such as pain, nausea, and vomiting. Yet, these data were previously not available to anesthesiologists. Objective: This quality improvement study aimed to visualize postoperative outcome data to allow anesthesiologists to reflect on their care and compare their performance with their peers. Methods: The postoperative follow-up registry contains nurse-reported postoperative outcomes, including opioid and antiemetic administration in the postanesthetic care unit (PACU), and family-reported outcomes, including pain, nausea, and vomiting, within 24 hours post discharge. Dashboards were iteratively co-designed with 5 anesthesiologists, and a department-wide usability survey gathered anesthesiologists' feedback on the dashboards, allowing further design improvements. A final dashboard version has been deployed, with data updated weekly. Results: The dashboard contains three sections: (1) 24-hour outcomes, (2) PACU outcomes, and (3) a practice profile containing individual anesthesiologist's case mix, grouped by age groups, sex, and surgical service. At the time of evaluation, the dashboard included 24-hour data from 7877 cases collected from September 2020 to February 2023 and PACU data from 8716 cases collected from April 2021 to February 2023. The co-design process and usability evaluation indicated that anesthesiologists preferred simpler designs for data summaries but also required the ability to explore details of specific outcomes and cases if needed. Anesthesiologists considered security and confidentiality to be key features of the design and most deemed the dashboard information useful and potentially beneficial for their practice. Conclusions: We designed and deployed a dynamic, personalized dashboard for anesthesiologists to review their outpatients' short-term postoperative outcomes. This dashboard facilitates personal reflection on individual practice in the context of peer and departmental performance and, hence, the opportunity to evaluate iterative practice changes. Further work is required to establish its effect on improving individual and department performance and patient outcomes. ", doi="10.2196/47398", url="https://periop.jmir.org/2023/1/e47398", url="http://www.ncbi.nlm.nih.gov/pubmed/37725426" } @Article{info:doi/10.2196/41409, author="Noort, C. Bart A. and Buijs, Paul and Roemeling, Oskar", title="Outsourcing the Management of Reusable Medical Devices in a Chain-Wide Care Setting: Mixed Methods Feasibility Study", journal="Interact J Med Res", year="2023", month="Sep", day="19", volume="12", pages="e41409", keywords="health care logistics", keywords="outsourcing", keywords="web ordering portal", keywords="medical devices", keywords="feasibility study", keywords="device management", abstract="Background: Managing reusable medical devices incurs substantial health care costs and complexity, particularly in integrated care settings. This complexity hampers care quality and safety and increases costs. Studying logistical innovations within integrated care can provide insights into the effective use of medical devices by care staff. 
Objective: This study aimed to establish the feasibility of a logistical intervention through outsourcing and a web portal. The goal was to provide insights into users' acceptability of the intervention, on whether the intervention was successfully implemented, and on the intervention's preliminary efficacy, thus benefiting practitioners and researchers. Methods: This paper presents a mixed methods feasibility study at a large chain-wide health care provider in the Netherlands. The intervention entailed outsourcing noncritical reusable medical devices and introducing a web portal for device management. A questionnaire gauged perceived ordering and delivery times, satisfaction with the ordering and delivery process, compliance with safety and hygiene certification, and effects on the care delivery process. Qualitative data in the form of observations, documentation, and interviews were used to identify implementation challenges. Using on-site stocktaking and data from information systems, we analyzed the utilization, costs, and rental time of medical devices before and after the intervention for wheelchairs and anti--pressure ulcer mattresses. Results: Looking at the acceptability of the intervention, a high user satisfaction with the ordering and delivery process was reported (rated on a 5-point Likert scale). With respect to preliminary efficacy, we noted a reduction in the utilization of wheelchairs (on average, 1106, SD 106 fewer utilization d/mo), and a halted increase in the utilization of anti--pressure ulcer mattresses. In addition, nurses who used the web portal reported shorter ordering times for wheelchairs (--2.7 min) and anti--pressure ulcer mattresses (--3.1 min), as well as shorter delivery times for wheelchairs (--0.5 d). Moreover, an increase in device certification was reported (average score of 1.9, SD 1.0), indicating higher levels of safety and hygiene standards. In theory, these improvements should translate into better outcomes in terms of costs and the quality of care. However, we were unable to establish a reduction in total care costs or a reduced rental time per device. Furthermore, respondents did not identify improvements in safety or the quality of care. Although implementation challenges related to the diverse supply base and complexities with different care financers were observed, the overall implementation of the intervention was considered successful. Conclusions: This study confirms the feasibility of our intervention, in terms of acceptability, implementation success, and preliminary efficacy. The integrated management of medical devices should enable a reduction in costs, required devices, and material waste, as well as higher quality care. However, several challenges remain related to the implementation of such interventions. 
", doi="10.2196/41409", url="https://www.i-jmr.org/2023/1/e41409", url="http://www.ncbi.nlm.nih.gov/pubmed/37725420" } @Article{info:doi/10.2196/45760, author="Kwun, Ju-Seung and Lee, Hoon Jang and Park, Eun Bo and Park, Sung Jong and Kim, Jeong Hyeon and Kim, Sun-Hwa and Jeon, Ki-Hyun and Cho, Hyoung-won and Kang, Si-Hyuck and Lee, Wonjae and Youn, Tae-Jin and Chae, In-Ho and Yoon, Chang-Hwan", title="Diagnostic Value of a Wearable Continuous Electrocardiogram Monitoring Device (AT-Patch) for New-Onset Atrial Fibrillation in High-Risk Patients: Prospective Cohort Study", journal="J Med Internet Res", year="2023", month="Sep", day="18", volume="25", pages="e45760", keywords="arrhythmias", keywords="atrial fibrillation", keywords="wearable electronic device", keywords="patch electrocardiogram monitor", keywords="electrocardiogram", keywords="adult", keywords="AT-Patch", keywords="heart failure", keywords="mobile phone", abstract="Background: While conventional electrocardiogram monitoring devices are useful for detecting atrial fibrillation, they have considerable drawbacks, including a short monitoring duration and invasive device implantation. The use of patch-type devices circumvents these drawbacks and has shown comparable diagnostic capability for the early detection of atrial fibrillation. Objective: We aimed to determine whether a patch-type device (AT-Patch) applied to patients with a high risk of new-onset atrial fibrillation defined by the congestive heart failure, hypertension, age ?75 years, diabetes mellitus, stroke, vascular disease, age 65-74 years, sex scale (CHA2DS2-VASc) score had increased detection rates. Methods: In this nonrandomized multicenter prospective cohort study, we enrolled 320 adults aged ?19 years who had never experienced atrial fibrillation and whose CHA2DS2-VASc score was ?2. The AT-Patch was attached to each individual for 11 days, and the data were analyzed for arrhythmic events by 2 independent cardiologists. Results: Atrial fibrillation was detected by the AT-Patch in 3.4\% (11/320) of patients, as diagnosed by both cardiologists. Interestingly, when participants with or without atrial fibrillation were compared, a previous history of heart failure was significantly more common in the atrial fibrillation group (n=4/11, 36.4\% vs n=16/309, 5.2\%, respectively; P=.003). When a CHA2DS2-VASc score ?4 was combined with previous heart failure, the detection rate was significantly increased to 24.4\%. Comparison of the recorded electrocardiogram data revealed that supraventricular and ventricular ectopic rhythms were significantly more frequent in the new-onset atrial fibrillation group compared with nonatrial fibrillation group (3.4\% vs 0.4\%; P=.001 and 5.2\% vs 1.2\%; P<.001), respectively. Conclusions: This study detected a moderate number of new-onset atrial fibrillations in high-risk patients using the AT-Patch device. Further studies will aim to investigate the value of early detection of atrial fibrillation, particularly in patients with heart failure as a means of reducing adverse clinical outcomes of atrial fibrillation. 
Trial Registration: ClinicalTrials.gov NCT04857268; https://classic.clinicaltrials.gov/ct2/show/NCT04857268 ", doi="10.2196/45760", url="https://www.jmir.org/2023/1/e45760", url="http://www.ncbi.nlm.nih.gov/pubmed/37721791" } @Article{info:doi/10.2196/48534, author="Matsuda, Shinichi and Ohtomo, Takumi and Okuyama, Masaru and Miyake, Hiraku and Aoki, Kotonari", title="Estimating Patient Satisfaction Through a Language Processing Model: Model Development and Evaluation", journal="JMIR Form Res", year="2023", month="Sep", day="14", volume="7", pages="e48534", keywords="breast cancer", keywords="internet", keywords="machine learning", keywords="natural language processing", keywords="natural language-processing model", keywords="neural network", keywords="NLP", keywords="patient satisfaction", keywords="textual data", abstract="Background: Measuring patient satisfaction is a crucial aspect of medical care. Advanced natural language processing (NLP) techniques enable the extraction and analysis of high-level insights from textual data; nonetheless, data obtained from patients are often limited. Objective: This study aimed to create a model that quantifies patient satisfaction based on diverse patient-written textual data. Methods: We constructed a neural network--based NLP model for this cross-sectional study using the textual content from disease blogs written in Japanese on the Internet between 1994 and 2020. We extracted approximately 20 million sentences from 56,357 patient-authored disease blogs and constructed a model to predict the patient satisfaction index (PSI) using a regression approach. After evaluating the model's effectiveness, PSI was predicted before and after cancer notification to examine the emotional impact of cancer diagnoses on 48 patients with breast cancer. Results: We assessed the correlation between the predicted and actual PSI values, labeled by humans, using the test set of 169 sentences. The model successfully quantified patient satisfaction by detecting nuances in sentences with excellent effectiveness (Spearman correlation coefficient [$\rho$]=0.832; root-mean-squared error [RMSE]=0.166; P<.001). Furthermore, the PSI was significantly lower in the cancer notification period than in the preceding control period (--0.057 and --0.012, respectively; 2-tailed t47=5.392, P<.001), indicating that the model quantifies the psychological and emotional changes associated with the cancer diagnosis notification. Conclusions: Our model demonstrates the ability to quantify patient dissatisfaction and identify significant emotional changes during the disease course. This approach may also help detect issues in routine medical practice. 
", doi="10.2196/48534", url="https://formative.jmir.org/2023/1/e48534", url="http://www.ncbi.nlm.nih.gov/pubmed/37707946" } @Article{info:doi/10.2196/51776, author="Fear, Kathleen and Gleber, Conrad", title="Shaping the Future of Older Adult Care: ChatGPT, Advanced AI, and the Transformation of Clinical Practice", journal="JMIR Aging", year="2023", month="Sep", day="13", volume="6", pages="e51776", keywords="generative AI", keywords="artificial intelligence", keywords="large language models", keywords="ChatGPT", keywords="Generative Pre-trained Transformer", doi="10.2196/51776", url="https://aging.jmir.org/2023/1/e51776", url="http://www.ncbi.nlm.nih.gov/pubmed/37703085" } @Article{info:doi/10.2196/46891, author="Huang, Guoqing and Jin, Qiankai and Mao, Yushan", title="Predicting the 5-Year Risk of Nonalcoholic Fatty Liver Disease Using Machine Learning Models: Prospective Cohort Study", journal="J Med Internet Res", year="2023", month="Sep", day="12", volume="25", pages="e46891", keywords="nonalcoholic fatty liver disease", keywords="machine learning", keywords="independent risk factors", keywords="prediction model", keywords="model", keywords="fatty liver", keywords="prevention", keywords="liver", keywords="prognostic", keywords="China", keywords="development", keywords="validation", keywords="risk model", keywords="clinical applicability", abstract="Background: Nonalcoholic fatty liver disease (NAFLD) has emerged as a worldwide public health issue. Identifying and targeting populations at a heightened risk of developing NAFLD over a 5-year period can help reduce and delay adverse hepatic prognostic events. Objective: This study aimed to investigate the 5-year incidence of NAFLD in the Chinese population. It also aimed to establish and validate a machine learning model for predicting the 5-year NAFLD risk. Methods: The study population was derived from a 5-year prospective cohort study. A total of 6196 individuals without NAFLD who underwent health checkups in 2010 at Zhenhai Lianhua Hospital in Ningbo, China, were enrolled in this study. Extreme gradient boosting (XGBoost)--recursive feature elimination, combined with the least absolute shrinkage and selection operator (LASSO), was used to screen for characteristic predictors. A total of 6 machine learning models, namely logistic regression, decision tree, support vector machine, random forest, categorical boosting, and XGBoost, were utilized in the construction of a 5-year risk model for NAFLD. Hyperparameter optimization of the predictive model was performed in the training set, and a further evaluation of the model performance was carried out in the internal and external validation sets. Results: The 5-year incidence of NAFLD was 18.64\% (n=1155) in the study population. We screened 11 predictors for risk prediction model construction. After the hyperparameter optimization, CatBoost demonstrated the best prediction performance in the training set, with an area under the receiver operating characteristic (AUROC) curve of 0.810 (95\% CI 0.768-0.852). Logistic regression showed the best prediction performance in the internal and external validation sets, with AUROC curves of 0.778 (95\% CI 0.759-0.794) and 0.806 (95\% CI 0.788-0.821), respectively. The development of web-based calculators has enhanced the clinical feasibility of the risk prediction model. 
Conclusions: Developing and validating machine learning models can aid in predicting which populations are at the highest risk of developing NAFLD over a 5-year period, thereby helping delay and reduce the occurrence of adverse liver prognostic events. ", doi="10.2196/46891", url="https://www.jmir.org/2023/1/e46891", url="http://www.ncbi.nlm.nih.gov/pubmed/37698911" } @Article{info:doi/10.2196/44897, author="Li, Chaixiu and Fu, Jiaqi and Lai, Jie and Sun, Lijun and Zhou, Chunlan and Li, Wenji and Jian, Biao and Deng, Shisi and Zhang, Yujie and Guo, Zihan and Liu, Yusheng and Zhou, Yanni and Xie, Shihui and Hou, Mingyue and Wang, Ru and Chen, Qinjie and Wu, Yanni", title="Construction of an Emotional Lexicon of Patients With Breast Cancer: Development and Sentiment Analysis", journal="J Med Internet Res", year="2023", month="Sep", day="12", volume="25", pages="e44897", keywords="breast cancer", keywords="lexicon construction", keywords="domain emotional lexicon", keywords="sentiment analysis", keywords="natural language processing", abstract="Background: The innovative method of sentiment analysis based on an emotional lexicon shows prominent advantages in capturing emotional information, such as individual attitudes, experiences, and needs, which provides a new perspective and method for emotion recognition and management for patients with breast cancer (BC). However, at present, sentiment analysis in the field of BC is limited, and there is no emotional lexicon for this field. Therefore, it is necessary to construct an emotional lexicon that conforms to the characteristics of patients with BC so as to provide a new tool for accurate identification and analysis of the patients' emotions and a new method for their personalized emotion management. Objective: This study aimed to construct an emotional lexicon of patients with BC. Methods: Emotional words were obtained by merging the words in 2 general sentiment lexicons, the Chinese Linguistic Inquiry and Word Count (C-LIWC) and HowNet, and the words in text corpora acquired from patients with BC via Weibo, semistructured interviews, and expressive writing. The lexicon was constructed using manual annotation and classification under the guidance of Russell's valence-arousal space. Ekman's basic emotional categories, Lazarus' cognitive appraisal theory of emotion, and a qualitative text analysis based on the text corpora of patients with BC were combined to determine the fine-grained emotional categories of the lexicon we constructed. Precision, recall, and the F1-score were used to evaluate the lexicon's performance. Results: The text corpora collected from patients in different stages of BC included 150 written materials, 17 interviews, and 6689 original posts and comments from Weibo, with a total of 1,923,593 Chinese characters. The emotional lexicon of patients with BC contained 9357 words and covered 8 fine-grained emotional categories: joy, anger, sadness, fear, disgust, surprise, somatic symptoms, and BC terminology. Experimental results showed that precision, recall, and the F1-score of positive emotional words were 98.42\%, 99.73\%, and 99.07\%, respectively, and those of negative emotional words were 99.73\%, 98.38\%, and 99.05\%, respectively, which all significantly outperformed the C-LIWC and HowNet. Conclusions: The emotional lexicon with fine-grained emotional categories conforms to the characteristics of patients with BC. 
Its performance related to identifying and classifying domain-specific emotional words in BC is better compared to the C-LIWC and HowNet. This lexicon not only provides a new tool for sentiment analysis in the field of BC but also provides a new perspective for recognizing the specific emotional state and needs of patients with BC and formulating tailored emotional management plans. ", doi="10.2196/44897", url="https://www.jmir.org/2023/1/e44897", url="http://www.ncbi.nlm.nih.gov/pubmed/37698914" } @Article{info:doi/10.2196/47095, author="Deng, Yuhan and Ma, Yuan and Fu, Jingzhu and Wang, Xiaona and Yu, Canqing and Lv, Jun and Man, Sailimai and Wang, Bo and Li, Liming", title="Combinatorial Use of Machine Learning and Logistic Regression for Predicting Carotid Plaque Risk Among 5.4 Million Adults With Fatty Liver Disease Receiving Health Check-Ups: Population-Based Cross-Sectional Study", journal="JMIR Public Health Surveill", year="2023", month="Sep", day="7", volume="9", pages="e47095", keywords="machine learning", keywords="carotid plaque", keywords="health check-up", keywords="prediction", keywords="fatty liver", keywords="risk assessment", keywords="risk stratification", keywords="cardiovascular", keywords="logistic regression", abstract="Background: Carotid plaque can progress into stroke, myocardial infarction, etc, which are major global causes of death. Evidence shows a significant increase in carotid plaque incidence among patients with fatty liver disease. However, unlike the high detection rate of fatty liver disease, screening for carotid plaque in the asymptomatic population is not yet prevalent due to cost-effectiveness reasons, resulting in a large number of patients with undetected carotid plaques, especially among those with fatty liver disease. Objective: This study aimed to combine the advantages of machine learning (ML) and logistic regression to develop a straightforward prediction model among the population with fatty liver disease to identify individuals at risk of carotid plaque. Methods: Our study included 5,420,640 participants with fatty liver from Meinian Health Care Center. We used random forest, elastic net (EN), and extreme gradient boosting ML algorithms to select important features from potential predictors. Features acknowledged by all 3 models were enrolled in logistic regression analysis to develop a carotid plaque prediction model. Model performance was evaluated based on the area under the receiver operating characteristic curve, calibration curve, Brier score, and decision curve analysis both in a randomly split internal validation data set, and an external validation data set comprising 32,682 participants from MJ Health Check-up Center. Risk cutoff points for carotid plaque were determined based on the Youden index, predicted probability distribution, and prevalence rate of the internal validation data set to classify participants into high-, intermediate-, and low-risk groups. This risk classification was further validated in the external validation data set. Results: Among the participants, 26.23\% (1,421,970/5,420,640) were diagnosed with carotid plaque in the development data set, and 21.64\% (7074/32,682) were diagnosed in the external validation data set. A total of 6 features, including age, systolic blood pressure, low-density lipoprotein cholesterol (LDL-C), total cholesterol, fasting blood glucose, and hepatic steatosis index (HSI) were collectively selected by all 3 ML models out of 27 predictors. 
After eliminating the issue of collinearity between features, the logistic regression model established with the 5 independent predictors reached an area under the curve of 0.831 in the internal validation data set and 0.801 in the external validation data set, and showed good calibration capability graphically. Its predictive performance was comprehensively competitive compared with the single use of either logistic regression or ML algorithms. Optimal predicted probability cutoff points of 25\% and 65\% were determined for classifying individuals into low-, intermediate-, and high-risk categories for carotid plaque. Conclusions: The combination of ML and logistic regression yielded a practical carotid plaque prediction model with important public health implications for the early identification and risk assessment of carotid plaque among individuals with fatty liver. ", doi="10.2196/47095", url="https://publichealth.jmir.org/2023/1/e47095", url="http://www.ncbi.nlm.nih.gov/pubmed/37676713" } @Article{info:doi/10.2196/42047, author="Fernandes, J. Glenn and Choi, Arthur and Schauer, Michael Jacob and Pfammatter, F. Angela and Spring, J. Bonnie and Darwiche, Adnan and Alshurafa, I. Nabil", title="An Explainable Artificial Intelligence Software Tool for Weight Management Experts (PRIMO): Mixed Methods Study", journal="J Med Internet Res", year="2023", month="Sep", day="6", volume="25", pages="e42047", keywords="explainable artificial intelligence", keywords="explainable AI", keywords="machine learning", keywords="ML", keywords="interpretable ML", keywords="random forest", keywords="decision-making", keywords="weight loss prediction", keywords="mobile phone", abstract="Background: Predicting the likelihood of success of weight loss interventions using machine learning (ML) models may enhance intervention effectiveness by enabling timely and dynamic modification of intervention components for nonresponders to treatment. However, a lack of understanding and trust in these ML models impacts adoption among weight management experts. Recent advances in the field of explainable artificial intelligence enable the interpretation of ML models, yet it is unknown whether they enhance model understanding, trust, and adoption among weight management experts. Objective: This study aimed to build and evaluate an ML model that can predict 6-month weight loss success (ie, $\geq$7\% weight loss) from 5 engagement and diet-related features collected over the initial 2 weeks of an intervention, to assess whether providing ML-based explanations increases weight management experts' agreement with ML model predictions, and to inform factors that influence the understanding and trust of ML models to advance explainability in early prediction of weight loss among weight management experts. Methods: We trained an ML model using the random forest (RF) algorithm and data from a 6-month weight loss intervention (N=419). We leveraged findings from existing explainability metrics to develop Prime Implicant Maintenance of Outcome (PRIMO), an interactive tool to understand predictions made by the RF model. We asked 14 weight management experts to predict hypothetical participants' weight loss success before and after using PRIMO. We compared PRIMO with 2 other explainability methods, one based on feature ranking and the other based on conditional probability. 
We used generalized linear mixed-effects models to evaluate participants' agreement with ML predictions and conducted likelihood ratio tests to examine the relationship between explainability methods and outcomes for nested models. We conducted guided interviews and thematic analysis to study the impact of our tool on experts' understanding and trust in the model. Results: Our RF model had 81\% accuracy in the early prediction of weight loss success. Weight management experts were significantly more likely to agree with the model when using PRIMO ($\chi$2=7.9; P=.02) compared with the other 2 methods with odds ratios of 2.52 (95\% CI 0.91-7.69) and 3.95 (95\% CI 1.50-11.76). From our study, we inferred that our software not only influenced experts' understanding and trust but also impacted decision-making. Several themes were identified through interviews: preference for multiple explanation types, need to visualize uncertainty in explanations provided by PRIMO, and need for model performance metrics on similar participant test instances. Conclusions: Our results show the potential for weight management experts to agree with the ML-based early prediction of success in weight loss treatment programs, enabling timely and dynamic modification of intervention components to enhance intervention effectiveness. Our findings provide methods for advancing the understandability and trust of ML models among weight management experts. ", doi="10.2196/42047", url="https://www.jmir.org/2023/1/e42047", url="http://www.ncbi.nlm.nih.gov/pubmed/37672333" } @Article{info:doi/10.2196/45547, author="Bae, Kideog and Jeon, Seok Young and Hwangbo, Yul and Yoo, Woo Chong and Han, Nayoung and Feng, Mengling", title="Data-Efficient Computational Pathology Platform for Faster and Cheaper Breast Cancer Subtype Identifications: Development of a Deep Learning Model", journal="JMIR Cancer", year="2023", month="Sep", day="5", volume="9", pages="e45547", keywords="deep learning", keywords="self-supervised learning", keywords="immunohistochemical staining", keywords="machine learning", keywords="histology", keywords="pathology", keywords="computation", keywords="predict", keywords="diagnosis", keywords="diagnose", keywords="carcinoma", keywords="cancer", keywords="oncology", keywords="breast cancer", abstract="Background: Breast cancer subtyping is a crucial step in determining therapeutic options, but the molecular examination based on immunohistochemical staining is expensive and time-consuming. Deep learning opens up the possibility to predict the subtypes based on the morphological information from hematoxylin and eosin staining, a much cheaper and faster alternative. However, training the predictive model conventionally requires a large number of histology images, which is challenging to collect by a single institute. Objective: We aimed to develop a data-efficient computational pathology platform, 3DHistoNet, which is capable of learning from z-stacked histology images to accurately predict breast cancer subtypes with a small sample size. Methods: We retrospectively examined 401 cases of patients with primary breast carcinoma diagnosed between 2018 and 2020 at the Department of Pathology, National Cancer Center, South Korea. Pathology slides of the patients with breast carcinoma were prepared according to the standard protocols. 
Age, gender, histologic grade, hormone receptor (estrogen receptor [ER], progesterone receptor [PR], and androgen receptor [AR]) status, erb-B2 receptor tyrosine kinase 2 (HER2) status, and Ki-67 index were evaluated by reviewing medical charts and pathological records. Results: The area under the receiver operating characteristic curve and decision curve were analyzed to evaluate the performance of our 3DHistoNet platform for predicting the ER, PR, AR, HER2, and Ki67 subtype biomarkers with 5-fold cross-validation. We demonstrated that 3DHistoNet can predict all clinically important biomarkers (ER, PR, AR, HER2, and Ki67) with performance exceeding the conventional multiple instance learning models by a considerable margin (area under the receiver operating characteristic curve: 0.75-0.91 vs 0.67-0.8). We further showed that our z-stack histology scanning method can make up for insufficient training data sets without any additional cost incurred. Finally, 3DHistoNet offered an additional capability to generate attention maps that reveal correlations between Ki67 and histomorphological features, which renders the hematoxylin and eosin image in higher fidelity to the pathologist. Conclusions: Our stand-alone, data-efficient pathology platform that can both generate z-stacked images and predict key biomarkers is an appealing tool for breast cancer diagnosis. Its development would encourage morphology-based diagnosis, which is faster, cheaper, and less error-prone compared to the protein quantification method based on immunohistochemical staining. ", doi="10.2196/45547", url="https://cancer.jmir.org/2023/1/e45547", url="http://www.ncbi.nlm.nih.gov/pubmed/37669090" } @Article{info:doi/10.2196/49774, author="Spooner, Caitlin and Vivat, Bella and White, Nicola and Stone, Patrick", title="Developing a Core Outcome Set for Prognostic Research in Palliative Cancer Care: Protocol for a Mixed Methods Study", journal="JMIR Res Protoc", year="2023", month="Sep", day="1", volume="12", pages="e49774", keywords="core outcome set", keywords="palliative care", keywords="end-of-life", keywords="prognosis", keywords="advanced cancer", keywords="systematic review", keywords="interviews", keywords="Delphi study", abstract="Background: Studies exploring the impact of receiving end-of-life prognoses in patients with advanced cancer use a variety of different measures to evaluate the outcomes, and thus report often conflicting findings. The standardization of outcomes reported in studies of prognostication in palliative cancer care could enable uniform assessment and reporting, as well as intertrial comparisons. A core outcome set promotes consistency in outcome selection and reporting among studies within a particular population. We aim to develop a set of core outcomes to be used to measure the impact of end-of-life prognostication in palliative cancer care. Objective: This protocol outlines the proposed methodology to develop a core outcome set for measuring the impact of end-of-life prognostication in palliative cancer care. Methods: We will adopt a mixed methods approach consisting of 3 phases using methodology recommended by the Core Outcome Measure in Effectiveness Trials (COMET) initiative. In phase I, we will conduct a systematic review to identify existing outcomes that prognostic studies have previously used, so as to inform the development of items and domains for the proposed core outcome set. 
Phase II will consist of semistructured interviews with patients with advanced cancer who are receiving palliative care, informal caregivers, and clinicians, to explore their perceptions and experiences of end-of-life prognostication. Outcomes identified in the interviews will be combined with those found in existing literature and taken forward to phase III, a Delphi survey, in which we will ask patients, informal caregivers, clinicians, and relevant researchers to rate these outcomes until consensus is achieved as to which are considered to be the most important for inclusion in the core outcome set. The resulting, prioritized outcomes will be discussed in a consensus meeting to agree and endorse the final core outcome set. Results: Ethical approval was received for this study in September 2022. As of July 2023, we have completed and published the systematic review (phase I) and have started recruitment for phase II. Data analysis for phase II has not yet started. We expect to complete the study by October 2024. Conclusions: This protocol presents the stepwise approach that will be taken to develop a core outcome set for measuring the impact of end-of-life prognostication in palliative cancer care. The final core outcome set has the potential for translation into clinical practice, allowing for consistent evaluation of emerging prognostic algorithms and improving communication of end-of-life prognostication. This study will also potentially facilitate the design of future clinical trials of the impact of end-of-life prognostication in palliative care that are acceptable to key stakeholders. Trial Registration: Core Outcome Measures in Effectiveness Trials 2136; https://www.comet-initiative.org/Studies/Details/2136 International Registered Report Identifier (IRRID): DERR1-10.2196/49774 ", doi="10.2196/49774", url="https://www.researchprotocols.org/2023/1/e49774", url="http://www.ncbi.nlm.nih.gov/pubmed/37656505" } @Article{info:doi/10.2196/47260, author="Asan, Onur and Choi, Euiji and Wang, Xiaomei", title="Artificial Intelligence--Based Consumer Health Informatics Application: Scoping Review", journal="J Med Internet Res", year="2023", month="Aug", day="30", volume="25", pages="e47260", keywords="consumer informatics", keywords="artificial intelligence", keywords="mobile health", keywords="mHealth", keywords="patient outcomes", keywords="personalized health care", keywords="machine learning", keywords="digital health", keywords="mobile phone", abstract="Background: There is no doubt that the recent surge in artificial intelligence (AI) research will change the trajectory of next-generation health care, making it more approachable and accessible to patients. Therefore, it is critical to research patient perceptions and outcomes because this trend will allow patients to be the primary consumers of health technology and decision makers for their own health. Objective: This study aimed to review and analyze papers on AI-based consumer health informatics (CHI) for successful future patient-centered care. Methods: We searched for all peer-reviewed papers in PubMed published in English before July 2022. Research on an AI-based CHI tool or system that reports patient outcomes or perceptions was identified for the scoping review. Results: We identified 20 papers that met our inclusion criteria. The eligible studies were summarized and discussed with respect to the role of the AI-based CHI system, patient outcomes, and patient perceptions. 
The AI-based CHI systems identified included systems in mobile health (13/20, 65\%), robotics (5/20, 25\%), and telemedicine (2/20, 10\%). All the systems aimed to provide patients with personalized health care. Patient outcomes and perceptions across various clinical disciplines were discussed, demonstrating the potential of an AI-based CHI system to benefit patients. Conclusions: This scoping review showed the trend in AI-based CHI systems and their impact on patient outcomes as well as patients' perceptions of these systems. Future studies should also explore how clinicians and health care professionals perceive these consumer-based systems and integrate them into the overall workflow. ", doi="10.2196/47260", url="https://www.jmir.org/2023/1/e47260", url="http://www.ncbi.nlm.nih.gov/pubmed/37647122" } @Article{info:doi/10.2196/44983, author="Stremmel, Christopher and Breitschwerdt, R{\"u}diger", title="Digital Transformation in the Diagnostics and Therapy of Cardiovascular Diseases: Comprehensive Literature Review", journal="JMIR Cardio", year="2023", month="Aug", day="30", volume="7", pages="e44983", keywords="cardiovascular", keywords="digital medicine", keywords="telehealth", keywords="artificial intelligence", keywords="telemedicine", keywords="mobile phone", keywords="review", abstract="Background: The digital transformation of our health care system has experienced a clear shift in the last few years due to political, medical, and technical innovations and reorganization. In particular, the cardiovascular field has undergone a significant change, with new broad perspectives in terms of optimized treatment strategies for patients nowadays. Objective: After a short historical introduction, this comprehensive literature review aimed to provide a detailed overview of the scientific evidence regarding digitalization in the diagnostics and therapy of cardiovascular diseases (CVDs). Methods: We performed an extensive literature search of the PubMed database and included all related articles that were published as of March 2022. Of the 3021 studies identified, 1639 (54.25\%) studies were selected for a structured analysis and presentation (original articles: n=1273, 77.67\%; reviews or comments: n=366, 22.33\%). In addition to studies on CVDs in general, 829 studies could be assigned to a specific CVD with a diagnostic and therapeutic approach. For data presentation, all 829 publications were grouped into 6 categories of CVDs. Results: Evidence-based innovations in the cardiovascular field cover a wide medical spectrum, starting from the diagnosis of congenital heart diseases or arrhythmias and overoptimized workflows in the emergency care setting of acute myocardial infarction to telemedical care for patients having chronic diseases such as heart failure, coronary artery disease, or hypertension. The use of smartphones and wearables as well as the integration of artificial intelligence provides important tools for location-independent medical care and the prevention of adverse events. Conclusions: Digital transformation has opened up multiple new perspectives in the cardiovascular field, with rapidly expanding scientific evidence. Beyond important improvements in terms of patient care, these innovations are also capable of reducing costs for our health care system. In the next few years, digital transformation will continue to revolutionize the field of cardiovascular medicine and broaden our medical and scientific horizons. 
", doi="10.2196/44983", url="https://cardio.jmir.org/2023/1/e44983", url="http://www.ncbi.nlm.nih.gov/pubmed/37647103" } @Article{info:doi/10.2196/44483, author="van Rossum, C. Mathilde and Bekhuis, M. Robin E. and Wang, Ying and Hegeman, H. Johannes and Folbert, C. Ellis and Vollenbroek-Hutten, R. Miriam M. and Kalkman, J. Cornelis and Kouwenhoven, A. Ewout and Hermens, J. Hermie", title="Early Warning Scores to Support Continuous Wireless Vital Sign Monitoring for Complication Prediction in Patients on Surgical Wards: Retrospective Observational Study", journal="JMIR Perioper Med", year="2023", month="Aug", day="30", volume="6", pages="e44483", keywords="early warning scores", keywords="vital signs", keywords="telemedicine", keywords="physiological monitoring", keywords="clinical alarms", keywords="postoperative complications", keywords="perioperative nursing", abstract="Background: Wireless vital sign sensors are increasingly being used to monitor patients on surgical wards. Although early warning scores (EWSs) are the current standard for the identification of patient deterioration in a ward setting, their usefulness for continuous monitoring is unknown. Objective: This study aimed to explore the usability and predictive value of high-rate EWSs obtained from continuous vital sign recordings for early identification of postoperative complications and compares the performance of a sensor-based EWS alarm system with manual intermittent EWS measurements and threshold alarms applied to individual vital sign recordings (single-parameter alarms). Methods: Continuous vital sign measurements (heart rate, respiratory rate, blood oxygen saturation, and axillary temperature) collected with wireless sensors in patients on surgical wards were used for retrospective simulation of EWSs (sensor EWSs) for different time windows (1-240 min), adopting criteria similar to EWSs based on manual vital signs measurements (nurse EWSs). Hourly sensor EWS measurements were compared between patients with (event group: 14/46, 30\%) and without (control group: 32/46, 70\%) postoperative complications. In addition, alarms were simulated for the sensor EWSs using a range of alarm thresholds (1-9) and compared with alarms based on nurse EWSs and single-parameter alarms. Alarm performance was evaluated using the sensitivity to predict complications within 24 hours, daily alarm rate, and false discovery rate (FDR). Results: The hourly sensor EWSs of the event group (median 3.4, IQR 3.1-4.1) was significantly higher (P<.004) compared with the control group (median 2.8, IQR 2.4-3.2). The alarm sensitivity of the hourly sensor EWSs was the highest (80\%-67\%) for thresholds of 3 to 5, which was associated with alarm rates of 2 (FDR=85\%) to 1.2 (FDR=83\%) alarms per patient per day respectively. The sensitivity of sensor EWS--based alarms was higher than that of nurse EWS--based alarms (maximum=40\%) but lower than that of single-parameter alarms (87\%) for all thresholds. In contrast, the (false) alarm rates of sensor EWS--based alarms were higher than that of nurse EWS--based alarms (maximum=0.6 alarm/patient/d; FDR=80\%) but lower than that of single-parameter alarms (2 alarms/patient/d; FDR=84\%) for most thresholds. Alarm rates for sensor EWSs increased for shorter time windows, reaching 70 alarms per patient per day when calculated every minute. 
Conclusions: EWSs obtained using wireless vital sign sensors may contribute to the early recognition of postoperative complications in a ward setting, with higher alarm sensitivity compared with manual EWS measurements. Although hourly sensor EWSs provide fewer alarms compared with single-parameter alarms, high false alarm rates can be expected when calculated over shorter time spans. Further studies are recommended to optimize care escalation criteria for continuous monitoring of vital signs in a ward setting and to evaluate the effects on patient outcomes. ", doi="10.2196/44483", url="https://periop.jmir.org/2023/1/e44483", url="http://www.ncbi.nlm.nih.gov/pubmed/37647104" } @Article{info:doi/10.2196/49283, author="Lee, Seungseok and Kang, Seong Wu and Kim, Wan Do and Seo, Hyun Sang and Kim, Joongsuck and Jeong, Tak Soon and Yon, Keon Dong and Lee, Jinseok", title="An Artificial Intelligence Model for Predicting Trauma Mortality Among Emergency Department Patients in South Korea: Retrospective Cohort Study", journal="J Med Internet Res", year="2023", month="Aug", day="29", volume="25", pages="e49283", keywords="artificial intelligence", keywords="trauma", keywords="mortality prediction", keywords="international classification of disease", keywords="emergency department", keywords="ICD", keywords="model", keywords="models", keywords="mortality", keywords="predict", keywords="prediction", keywords="predictive", keywords="emergency", keywords="death", keywords="traumatic", keywords="nationwide", keywords="national", keywords="cohort", keywords="retrospective", abstract="Background: Within the trauma system, the emergency department (ED) is the hospital's first contact and is vital for allocating medical resources. However, there is generally limited information about patients that die in the ED. Objective: The aim of this study was to develop an artificial intelligence (AI) model to predict trauma mortality and analyze pertinent mortality factors for all patients visiting the ED. Methods: We used the Korean National Emergency Department Information System (NEDIS) data set (N=6,536,306), incorporating over 400 hospitals between 2016 and 2019. We included the International Classification of Disease 10th Revision (ICD-10) codes and chose the following input features to predict ED patient mortality: age, sex, intentionality, injury, emergent symptom, Alert/Verbal/Painful/Unresponsive (AVPU) scale, Korean Triage and Acuity Scale (KTAS), and vital signs. We compared three different feature set performances for AI input: all features (n=921), ICD-10 features (n=878), and features excluding ICD-10 codes (n=43). We devised various machine learning models with an ensemble approach via 5-fold cross-validation and compared the performance of each model with that of traditional prediction models. Lastly, we investigated explainable AI feature effects and deployed our final AI model on a public website, providing access to our mortality prediction results among patients visiting the ED. 
Results: Our proposed AI model with the all-feature set achieved the highest area under the receiver operating characteristic curve (AUROC) of 0.9974 (adaptive boosting [AdaBoost], AdaBoost + light gradient boosting machine [LightGBM]: Ensemble), outperforming other state-of-the-art machine learning and traditional prediction models, including extreme gradient boosting (AUROC=0.9972), LightGBM (AUROC=0.9973), ICD-based injury severity scores (AUC=0.9328 for the inclusive model and AUROC=0.9567 for the exclusive model), and KTAS (AUROC=0.9405). In addition, our proposed AI model outperformed a cutting-edge AI model designed for in-hospital mortality prediction (AUROC=0.7675) for all ED visitors. From the AI model, we also discovered that age and unresponsiveness (coma) were the top two mortality predictors among patients visiting the ED, followed by oxygen saturation, multiple rib fractures (ICD-10 code S224), painful response (stupor, semicoma), and lumbar vertebra fracture (ICD-10 code S320). Conclusions: Our proposed AI model exhibits remarkable accuracy in predicting ED mortality. Including the necessity for external validation, a large nationwide data set would provide a more accurate model and minimize overfitting. We anticipate that our AI-based risk calculator tool will substantially aid health care providers, particularly regarding triage and early diagnosis for trauma patients. ", doi="10.2196/49283", url="https://www.jmir.org/2023/1/e49283", url="http://www.ncbi.nlm.nih.gov/pubmed/37642984" } @Article{info:doi/10.2196/42129, author="Gassner, Mathias and Barranco Garcia, Javier and Tanadini-Lang, Stephanie and Bertoldo, Fabio and Fr{\"o}hlich, Fabienne and Guckenberger, Matthias and Haueis, Silvia and Pelzer, Christin and Reyes, Mauricio and Schmithausen, Patrick and Simic, Dario and Staeger, Ramon and Verardi, Fabio and Andratschke, Nicolaus and Adelmann, Andreas and Braun, P. Ralph", title="Saliency-Enhanced Content-Based Image Retrieval for Diagnosis Support in Dermatology Consultation: Reader Study", journal="JMIR Dermatol", year="2023", month="Aug", day="24", volume="6", pages="e42129", keywords="dermatology", keywords="deep learning", keywords="melanoma", keywords="saliency maps", keywords="image retrieval", keywords="dermoscopy", keywords="skin cancer", keywords="diagnosis", keywords="algorithms", keywords="convolutional neural network", keywords="dermoscopic images", abstract="Background: Previous research studies have demonstrated that medical content image retrieval can play an important role by assisting dermatologists in skin lesion diagnosis. However, current state-of-the-art approaches have not been adopted in routine consultation, partly due to the lack of interpretability limiting trust by clinical users. Objective: This study developed a new image retrieval architecture for polarized or dermoscopic imaging guided by interpretable saliency maps. This approach provides better feature extraction, leading to better quantitative retrieval performance as well as providing interpretability for an eventual real-world implementation. Methods: Content-based image retrieval (CBIR) algorithms rely on the comparison of image features embedded by convolutional neural network (CNN) against a labeled data set. Saliency maps are computer vision--interpretable methods that highlight the most relevant regions for the prediction made by a neural network. 
By introducing a fine-tuning stage that includes saliency maps to guide feature extraction, the accuracy of image retrieval is optimized. We refer to this approach as saliency-enhanced CBIR (SE-CBIR). A reader study was designed at the University Hospital Zurich Dermatology Clinic to evaluate SE-CBIR's retrieval accuracy as well as the impact of the participant's confidence on the diagnosis. Results: SE-CBIR improved the retrieval accuracy by 7\% (77\% vs 84\%) when doing single-lesion retrieval against traditional CBIR. The reader study showed an overall increase in classification accuracy of 22\% (62\% vs 84\%) when the participant is provided with SE-CBIR retrieved images. In addition, the overall confidence in the lesion's diagnosis increased by 24\%. Finally, the use of SE-CBIR as a support tool helped the participants reduce the number of nonmelanoma lesions previously diagnosed as melanoma (overdiagnosis) by 53\%. Conclusions: SE-CBIR presents better retrieval accuracy compared to traditional CBIR CNN-based approaches. Furthermore, we have shown how these support tools can help dermatologists and residents improve diagnosis accuracy and confidence. Additionally, by introducing interpretable methods, we should expect increased acceptance and use of these tools in routine consultation. ", doi="10.2196/42129", url="https://derma.jmir.org/2023/1/e42129", url="http://www.ncbi.nlm.nih.gov/pubmed/37616039" } @Article{info:doi/10.2196/47335, author="Nair, Monika and Andersson, Jonas and Nygren, M. Jens and Lundgren, E. Lina", title="Barriers and Enablers for Implementation of an Artificial Intelligence--Based Decision Support Tool to Reduce the Risk of Readmission of Patients With Heart Failure: Stakeholder Interviews", journal="JMIR Form Res", year="2023", month="Aug", day="23", volume="7", pages="e47335", keywords="implementation", keywords="AI systems", keywords="health care", keywords="interviews", keywords="artificial Intelligence", keywords="AI", keywords="decision support tool", keywords="readmission", keywords="prediction", keywords="heart failure", keywords="digital tool", abstract="Background: Artificial intelligence (AI) applications in health care are expected to provide value for health care organizations, professionals, and patients. However, the implementation of such systems should be carefully planned and organized in order to ensure quality, safety, and acceptance. The gathered view of different stakeholders is a great source of information to understand the barriers and enablers for implementation in a specific context. Objective: This study aimed to understand the context and stakeholder perspectives related to the future implementation of a clinical decision support system for predicting readmissions of patients with heart failure. The study was part of a larger project involving model development, interface design, and implementation planning of the system. Methods: Interviews were held with 12 stakeholders from the regional and municipal health care organizations to gather their views on the potential effects implementation of such a decision support system could have as well as barriers and enablers for implementation. Data were analyzed based on the categories defined in the nonadoption, abandonment, scale-up, spread, sustainability (NASSS) framework. 
Results: Stakeholders had, in general, a positive attitude and curiosity toward AI-based decision support systems, and mentioned several barriers and enablers based on the experiences of previous implementations of information technology systems. Central aspects to consider for the proposed clinical decision support system were design aspects, access to information throughout the care process, and integration into the clinical workflow. The implementation of such a system could lead to a number of effects related to both clinical outcomes and resource allocation, which are all important to address in the planning of implementation. Stakeholders saw, however, value in several aspects of implementing such a system, emphasizing the increased quality of life for those patients who can avoid being hospitalized. Conclusions: Several ideas were put forward on how the proposed AI system would potentially affect and provide value for patients, professionals, and the organization, and implementation aspects were important parts of that. A successful system can help clinicians to prioritize the need for different types of treatments but also be used for planning purposes within the hospital. However, the system needs not only technological and clinical precision but also a carefully planned implementation process. Such a process should take into consideration the aspects related to all the categories in the NASSS framework. This study further highlighted the importance of studying stakeholder needs early in the process of development, design, and implementation of decision support systems, as the data revealed new information on the potential use of the system and the placement of the application in the care process. ", doi="10.2196/47335", url="https://formative.jmir.org/2023/1/e47335", url="http://www.ncbi.nlm.nih.gov/pubmed/37610799" } @Article{info:doi/10.2196/48659, author="Rao, Arya and Pang, Michael and Kim, John and Kamineni, Meghana and Lie, Winston and Prasad, K. Anoop and Landman, Adam and Dreyer, Keith and Succi, D. Marc", title="Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study", journal="J Med Internet Res", year="2023", month="Aug", day="22", volume="25", pages="e48659", keywords="large language models", keywords="LLMs", keywords="artificial intelligence", keywords="AI", keywords="clinical decision support", keywords="clinical vignettes", keywords="ChatGPT", keywords="Generative Pre-trained Transformer", keywords="GPT", keywords="utility", keywords="development", keywords="usability", keywords="chatbot", keywords="accuracy", keywords="decision-making", abstract="Background: Large language model (LLM)--based artificial intelligence chatbots direct the power of large training data sets toward successive, related tasks as opposed to single-ask tasks, for which artificial intelligence already achieves impressive performance. The capacity of LLMs to assist in the full scope of iterative clinical reasoning via successive prompting, in effect acting as artificial physicians, has not yet been evaluated. Objective: This study aimed to evaluate ChatGPT's capacity for ongoing clinical decision support via its performance on standardized clinical vignettes.
Methods: We inputted all 36 published clinical vignettes from the Merck Sharpe \& Dohme (MSD) Clinical Manual into ChatGPT and compared its accuracy on differential diagnoses, diagnostic testing, final diagnosis, and management based on patient age, gender, and case acuity. Accuracy was measured by the proportion of correct responses to the questions posed within the clinical vignettes tested, as calculated by human scorers. We further conducted linear regression to assess the contributing factors toward ChatGPT's performance on clinical tasks. Results: ChatGPT achieved an overall accuracy of 71.7\% (95\% CI 69.3\%-74.1\%) across all 36 clinical vignettes. The LLM demonstrated the highest performance in making a final diagnosis with an accuracy of 76.9\% (95\% CI 67.8\%-86.1\%) and the lowest performance in generating an initial differential diagnosis with an accuracy of 60.3\% (95\% CI 54.2\%-66.6\%). Compared to answering questions about general medical knowledge, ChatGPT demonstrated inferior performance on differential diagnosis ($\beta$=--15.8\%; P<.001) and clinical management ($\beta$=--7.4\%; P=.02) question types. Conclusions: ChatGPT achieves impressive accuracy in clinical decision-making, with increasing strength as it gains more clinical information at its disposal. In particular, ChatGPT demonstrates the greatest accuracy in tasks of final diagnosis as compared to initial diagnosis. Limitations include possible model hallucinations and the unclear composition of ChatGPT's training data set. ", doi="10.2196/48659", url="https://www.jmir.org/2023/1/e48659", url="http://www.ncbi.nlm.nih.gov/pubmed/37606976" } @Article{info:doi/10.2196/41552, author="Nakikj, Drashko and Kreda, David and Gehlenborg, Nils", title="Alerts and Collections for Automating Patients' Sensemaking and Organizing of Their Electronic Health Record Data for Reflection, Planning, and Clinical Visits: Qualitative Research-Through-Design Study", journal="JMIR Hum Factors", year="2023", month="Aug", day="21", volume="10", pages="e41552", keywords="patients", keywords="electronic health records", keywords="sensemaking", keywords="pattern detection", keywords="data organization", keywords="alerts", keywords="reports", keywords="collections", abstract="Background: Electronic health record (EHR) data from multiple providers often exhibit important but convoluted and complex patterns that patients find hard and time-consuming to identify and interpret. However, existing patient-facing applications lack the capability to incorporate automatic pattern detection robustly and toward supporting making sense of the patient's EHR data. In addition, there is no means to organize EHR data in an efficient way that suits the patient's needs and makes them more actionable in real-life settings. These shortcomings often result in a skewed and incomplete picture of the patient's health status, which may lead to suboptimal decision-making and actions that put the patient at risk. Objective: Our main goal was to investigate patients' attitudes, needs, and use scenarios with respect to automatic support for surfacing important patterns in their EHR data and providing means for organizing them that best suit patients' needs. Methods: We conducted an inquisitive research-through-design study with 14 participants. 
Presented in the context of a cutting-edge application with strong emphasis on independent EHR data sensemaking, called Discovery, we used high-level mock-ups for the new features that were supposed to support automatic identification of important data patterns and offer recommendations---Alerts---and means for organizing the medical records based on patients' needs, much like photos in albums---Collections. The combined audio recording transcripts and in-study notes were analyzed using the reflexive thematic analysis approach. Results: The Alerts and Collections can be used for raising awareness, reflection, planning, and especially evidence-based patient-provider communication. Moreover, patients desired carefully designed automatic pattern detection with safe and actionable recommendations, which produced a well-tailored and scoped landscape of alerts for both potential threats and positive progress. Furthermore, patients wanted to contribute their own data (eg, progress notes) and log feelings, daily observations, and measurements to enrich the meaning and enable easier sensemaking of the alerts and collections. On the basis of the findings, we renamed Alerts to Reports for a more neutral tone and offered design implications for contextualizing the reports more deeply for increased actionability; automatically generating the collections for more expedited and exhaustive organization of the EHR data; enabling patient-generated data input in various formats to support coarser organization, richer pattern detection, and learning from experience; and using the reports and collections for efficient, reliable, and common-ground patient-provider communication. Conclusions: Patients need to have a flexible and rich way to organize and annotate their EHR data; be introduced to insights from these data---both positive and negative; and share these artifacts with their physicians in clinical visits or via messaging for establishing shared mental models for clear goals, agreed-upon priorities, and feasible actions. ", doi="10.2196/41552", url="https://humanfactors.jmir.org/2023/1/e41552", url="http://www.ncbi.nlm.nih.gov/pubmed/37603400" } @Article{info:doi/10.2196/47366, author="Liu, Jen-Hsuan and Shih, Chih-Yuan and Huang, Hsien-Liang and Peng, Jen-Kuei and Cheng, Shao-Yi and Tsai, Jaw-Shiun and Lai, Feipei", title="Evaluating the Potential of Machine Learning and Wearable Devices in End-of-Life Care in Predicting 7-Day Death Events Among Patients With Terminal Cancer: Cohort Study", journal="J Med Internet Res", year="2023", month="Aug", day="18", volume="25", pages="e47366", keywords="artificial intelligence", keywords="end-of-life care", keywords="machine learning", keywords="palliative care", keywords="survival prediction", keywords="terminal cancer", keywords="wearable device", abstract="Background: An accurate prediction of mortality in end-of-life care is crucial but presents challenges. Existing prognostic tools demonstrate moderate performance in predicting survival across various time frames, primarily in in-hospital settings and single-time evaluations. However, these tools may fail to capture the individualized and diverse trajectories of patients. Limited evidence exists regarding the use of artificial intelligence (AI) and wearable devices, specifically among patients with cancer at the end of life. Objective: This study aimed to investigate the potential of using wearable devices and AI to predict death events among patients with cancer at the end of life. 
Our hypothesis was that continuous monitoring through smartwatches can offer valuable insights into the progression of patients at the end of life and enable the prediction of changes in their condition, which could ultimately enhance personalized care, particularly in outpatient or home care settings. Methods: This prospective study was conducted at the National Taiwan University Hospital. Patients diagnosed with cancer and receiving end-of-life care were invited to enroll in wards, outpatient clinics, and home-based care settings. Each participant was given a smartwatch to collect physiological data, including steps taken, heart rate, sleep time, and blood oxygen saturation. Clinical assessments were conducted weekly. The participants were followed until the end of life or up to 52 weeks. With these input features, we evaluated the prediction performance of several machine learning--based classifiers and a deep neural network in 7-day death events. We used area under the receiver operating characteristic curve (AUROC), F1-score, accuracy, and specificity as evaluation metrics. A Shapley additive explanations value analysis was performed to further explore the models with good performance. Results: From September 2021 to August 2022, overall, 1657 data points were collected from 40 patients with a median survival time of 34 days, with the detection of 28 death events. Among the proposed models, extreme gradient boost (XGBoost) yielded the best result, with an AUROC of 96\%, F1-score of 78.5\%, accuracy of 93\%, and specificity of 97\% on the testing set. The Shapley additive explanations value analysis identified the average heart rate as the most important feature. Other important features included steps taken, appetite, urination status, and clinical care phase. Conclusions: We demonstrated the successful prediction of patient deaths within the next 7 days using a combination of wearable devices and AI. Our findings highlight the potential of integrating AI and wearable technology into clinical end-of-life care, offering valuable insights and supporting clinical decision-making for personalized patient care. It is important to acknowledge that our study was conducted in a relatively small cohort; thus, further research is needed to validate our approach and assess its impact on clinical care. 
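As an illustrative aside (not the authors' code), the following scikit-learn sketch mimics the kind of gradient-boosted 7-day mortality classifier and evaluation described above, on synthetic data, with permutation importance standing in for the SHAP analysis; all feature names and values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for weekly smartwatch and clinical observations.
rng = np.random.default_rng(42)
n = 1657
X = pd.DataFrame({
    "mean_heart_rate": rng.normal(85, 12, n),
    "steps": rng.poisson(1500, n),
    "spo2": rng.normal(95, 2, n),
    "appetite_score": rng.integers(0, 4, n),
    "care_phase": rng.integers(0, 3, n),
})
# Hypothetical label: death event within the next 7 days.
risk = 1 / (1 + np.exp(-(0.05 * (X["mean_heart_rate"] - 95) - 0.001 * X["steps"])))
y = (rng.random(n) < risk).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

prob = model.predict_proba(X_te)[:, 1]
pred = (prob >= 0.5).astype(int)
print("AUROC:", roc_auc_score(y_te, prob))
print("F1-score:", f1_score(y_te, pred))
print("Specificity:", recall_score(y_te, pred, pos_label=0))

# Permutation importance as a simple stand-in for a SHAP-style feature ranking.
imp = permutation_importance(model, X_te, y_te, scoring="roc_auc", n_repeats=10, random_state=0)
for name, score in sorted(zip(X.columns, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {score:.4f}")
```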
Trial Registration: ClinicalTrials.gov NCT05054907; https://classic.clinicaltrials.gov/ct2/show/NCT05054907 ", doi="10.2196/47366", url="https://www.jmir.org/2023/1/e47366", url="http://www.ncbi.nlm.nih.gov/pubmed/37594793" } @Article{info:doi/10.2196/47142, author="Costa, Lemos Wilbert Dener and de Oliveira, Maicon Alan and Aguilar, Jos{\'e} Guilherme and dos Santos, Costa Luana Michelly Aparecida and dos Santos, Albano Luiz Ricardo and Donato, Barros Dantony de Castro and Foresto, Felipe and Frade, Cipriani Marco Andrey", title="A Review of Software and Mobile Apps to Support the Clinical Diagnosis of Hansen Disease", journal="JMIR Dermatol", year="2023", month="Aug", day="18", volume="6", pages="e47142", keywords="software", keywords="mobile apps", keywords="leprosy", keywords="medical informatics", keywords="Mycobacterium leprae", keywords="clinical diagnosis", keywords="Hansen disease", keywords="mHealth", keywords="mobile health", keywords="mobile app", keywords="Hansen", keywords="dermatology", keywords="scoping review", keywords="skin", keywords="diagnosis", keywords="diagnostic", doi="10.2196/47142", url="https://derma.jmir.org/2023/1/e47142", url="http://www.ncbi.nlm.nih.gov/pubmed/37594779" } @Article{info:doi/10.2196/45820, author="Gompels, Ben and Rusby, Tobin and Limb, Richard and Ralte, Peter", title="Diagnostic Accuracy and Confidence in Management of Forearm and Hand Fractures Among Foundation Doctors in the Accident and Emergency Department: Survey Study", journal="JMIR Form Res", year="2023", month="Aug", day="18", volume="7", pages="e45820", keywords="education", keywords="diagnostic accuracy", keywords="doctor", keywords="fracture", keywords="x-ray", keywords="radiograph", keywords="diagnostic error", keywords="patient safety", abstract="Background: Accurate interpretation of radiographs is crucial for junior doctors in the accident and emergency (A\&E) department (the emergency medicine department). However, it remains a significant challenge and a leading cause of diagnostic errors. Objective: This study aimed to evaluate the accuracy and confidence of foundation doctors (doctors within their first 2 years of qualifying) in correctly interpreting and managing forearm and hand fractures on plain radiographs. Methods: A total of 42 foundation doctors with less than 2 years of experience and no prior emergency medicine training who worked in a large district general hospital participated in a web-based questionnaire. The questionnaire consisted of 3 case studies: distal radius fracture, scaphoid fracture, and a normal radiograph. Respondents were required to identify the presence or absence of a fracture, determine the fracture location, suggest appropriate management, and rate their confidence on a Likert scale. Results: Overall, 48\% (61/126) of respondents accurately identified the presence and location of fractures. The correct management option was chosen by 64\% (81/126) of respondents. The median diagnostic confidence score was 4 of 10, with a mean diagnostic certainty of 4.4 of 10. Notably, respondents exhibited a significantly lower confidence score for the normal radiograph compared to the distal radius fracture radiograph (P=.01). Conclusions: This study reveals diagnostic uncertainty among foundation doctors in interpreting plain radiographs, with a notable inclination toward overdiagnosing fractures. The findings emphasize the need for close supervision and senior support to mitigate diagnostic errors. 
Further training and educational interventions are warranted to improve the accuracy and confidence of junior doctors in radiographic interpretation. This study has several limitations, including a small sample size and reliance on self-reported data. The findings may not be generalizable to other health care settings or specialties. Future research should aim for larger, more diverse samples and explore the impact of specific educational interventions on diagnostic accuracy and confidence. ", doi="10.2196/45820", url="https://formative.jmir.org/2023/1/e45820", url="http://www.ncbi.nlm.nih.gov/pubmed/37594796" } @Article{info:doi/10.2196/46854, author="Yi, Min and Cao, Yuebin and Wang, Lin and Gu, Yaowen and Zheng, Xueqian and Wang, Jiangjun and Chen, Wei and Wei, Liangyu and Zhou, Yujin and Shi, Chenyi and Cao, Yanlin", title="Prediction of Medical Disputes Between Health Care Workers and Patients in Terms of Hospital Legal Construction Using Machine Learning Techniques: Externally Validated Cross-Sectional Study", journal="J Med Internet Res", year="2023", month="Aug", day="17", volume="25", pages="e46854", keywords="medical workers", keywords="medical disputes", keywords="hospital legal construction", keywords="machine learning", keywords="multicenter analysis", abstract="Background: Medical disputes are a global public health issue that is receiving increasing attention. However, studies investigating the relationship between hospital legal construction and medical disputes are scarce. The development of a multicenter model incorporating machine learning (ML) techniques for the individualized prediction of medical disputes would be beneficial for medical workers. Objective: This study aimed to identify predictors related to medical disputes from the perspective of hospital legal construction and the use of ML techniques to build models for predicting the risk of medical disputes. Methods: This study enrolled 38,053 medical workers from 130 tertiary hospitals in Hunan province, China. The participants were randomly divided into a training cohort (34,286/38,053, 90.1\%) and an internal validation cohort (3767/38,053, 9.9\%). Medical workers from 87 tertiary hospitals in Beijing were included in an external validation cohort (26,285/26,285, 100\%). This study used logistic regression and 5 ML techniques: decision tree, random forest, support vector machine, gradient boosting decision tree (GBDT), and deep neural network. In total, 12 metrics, including discrimination and calibration, were used for performance evaluation. A scoring system was developed to select the optimal model. Shapley additive explanations was used to generate the importance coefficients for characteristics. To promote the clinical practice of our proposed optimal model, reclassification of patients was performed, and a web-based app for medical dispute prediction was created, which can be easily accessed by the public. Results: Medical disputes occurred among 46.06\% (17,527/38,053) of the medical workers in Hunan province, China. Among the 26 clinical characteristics, multivariate analysis demonstrated that 18 characteristics were significantly associated with medical disputes, and these characteristics were used for ML model development. 
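As a generic illustration (not taken from the cited study), the snippet below shows how a few of the discrimination and calibration metrics mentioned above, such as the AUROC, Brier score, discrimination slope, and Youden index, can be computed from predicted probabilities with scikit-learn; the Youden index is computed here in its conventional sensitivity + specificity - 1 form, and all data are synthetic.

```python
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score, roc_curve

# Hypothetical predicted probabilities of a medical dispute and the observed outcomes.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 500)
y_prob = np.clip(0.4 * y_true + rng.normal(0.3, 0.2, 500), 0, 1)

auroc = roc_auc_score(y_true, y_prob)        # discrimination
brier = brier_score_loss(y_true, y_prob)     # calibration (lower is better)

# Youden index: maximum of sensitivity + specificity - 1 over all thresholds.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
j = tpr - fpr
best = int(np.argmax(j))

# Discrimination slope: mean predicted risk in cases minus mean predicted risk in noncases.
slope = y_prob[y_true == 1].mean() - y_prob[y_true == 0].mean()

print(f"AUROC={auroc:.3f}  Brier={brier:.3f}  slope={slope:.3f}")
print(f"Youden J={j[best]:.3f} at threshold {thresholds[best]:.2f}")
```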
Among the ML techniques, GBDT was identified as the optimal model, demonstrating the lowest Brier score (0.205), highest area under the receiver operating characteristic curve (0.738, 95\% CI 0.722-0.754), and the largest discrimination slope (0.172) and Youden index (1.355). In addition, it achieved the highest metrics score (63 points), followed by deep neural network (46 points) and random forest (45 points), in the internal validation set. In the external validation set, GBDT still performed comparably, achieving the second highest metrics score (52 points). The high-risk group had more than twice the odds of experiencing medical disputes compared with the low-risk group. Conclusions: We established a prediction model to stratify medical workers into different risk groups for encountering medical disputes. Among the 5 ML models, GBDT demonstrated the optimal comprehensive performance and was used to construct the web-based app. Our proposed model can serve as a useful tool for identifying medical workers at high risk of medical disputes. We believe that preventive strategies should be implemented for the high-risk group. ", doi="10.2196/46854", url="https://www.jmir.org/2023/1/e46854", url="http://www.ncbi.nlm.nih.gov/pubmed/37590041" } @Article{info:doi/10.2196/45043, author="Yao, Yingwei and Dunn Lopez, Karen and Bjarnadottir, I. Ragnhildur and Macieira, R. Tamara G. and Dos Santos, Cristina Fabiana and Madandola, O. Olatunde and Cho, Hwayoung and Priola, B. Karen J. and Wolf, Jessica and Wilkie, J. Diana and Keenan, Gail", title="Examining Care Planning Efficiency and Clinical Decision Support Adoption in a System Tailoring to Nurses' Graph Literacy: National, Web-Based Randomized Controlled Trial", journal="J Med Internet Res", year="2023", month="Aug", day="11", volume="25", pages="e45043", keywords="clinical decision support", keywords="nurse decision-making", keywords="nurse care planning", keywords="simulation", keywords="remote testing", keywords="tailored interfaces", keywords="graph literacy", keywords="cognitive workload", abstract="Background: The proliferation of health care data in electronic health records (EHRs) is fueling the need for clinical decision support (CDS) that ensures accuracy and reduces cognitive processing and documentation burden. The CDS format can play a key role in achieving the desired outcomes. Building on our laboratory-based pilot study with 60 registered nurses (RNs) from 1 Midwest US metropolitan area indicating the importance of graph literacy (GL), we conducted a fully powered, innovative, national, and web-based randomized controlled trial with 203 RNs. Objective: This study aimed to compare care planning time (CPT) and the adoption of evidence-based CDS recommendations by RNs randomly assigned to 1 of 4 CDS format groups: text only (TO), text+table (TT), text+graph (TG), and tailored (based on the RN's GL score). We hypothesized that the tailored CDS group will have faster CPT (primary) and higher adoption rates (secondary) than the 3 nontailored CDS groups. Methods: Eligible RNs employed in an adult hospital unit within the past 2 years were recruited randomly from 10 State Board of Nursing lists representing the 5 regions of the United States (Northeast, Southeast, Midwest, Southwest, and West) to participate in a randomized controlled trial. RNs were randomly assigned to 1 of 4 CDS format groups---TO, TT, TG, and tailored (based on the RN's GL score)---and interacted with the intervention on their PCs. 
Regression analysis was performed to estimate the effect of tailoring and the association between CPT and RN characteristics. Results: The differences between the tailored (n=46) and nontailored (TO, n=55; TT, n=54; and TG, n=48) CDS groups were not significant for either the CPT or the CDS adoption rate. RNs with low GL had longer CPT interacting with the TG CDS format than the TO CDS format (P=.01). The CPT in the TG CDS format was associated with age (P=.02), GL (P=.02), and comfort with EHRs (P=.047). Comfort with EHRs was also associated with CPT in the TT CDS format (P<.001). Conclusions: Although tailoring based on GL did not improve CPT or adoption, the study reinforced previous pilot findings that low GL is associated with longer CPT when graphs were included in care planning CDS. Higher GL, younger age, and comfort with EHRs were associated with shorter CPT. These findings are robust based on our new innovative testing strategy in which a diverse national sample of RN participants (randomly derived from 10 State Board of Nursing lists) interacted on the web with the intervention on their PCs. Future studies applying our innovative methodology are recommended to cost-effectively enhance the understanding of how the RN's GL, combined with additional factors, can inform the development of efficient CDS for care planning and other EHR components before use in practice. ", doi="10.2196/45043", url="https://www.jmir.org/2023/1/e45043", url="http://www.ncbi.nlm.nih.gov/pubmed/37566456" } @Article{info:doi/10.2196/46252, author="Javorszky, Maria Susanne and Reiter, Raphael and Iglseder, Bernhard", title="Validation of a Geriatric Bedside Swallowing Screen (GEBS): Protocol of a Prospective Cohort Study", journal="JMIR Res Protoc", year="2023", month="Aug", day="11", volume="12", pages="e46252", keywords="dysphagia", keywords="geriatrics", keywords="swallowing disorder", keywords="assessment", keywords="screening", keywords="cohort study", keywords="multimorbidity", keywords="hospital setting", abstract="Background: Demographic changes will raise the need for specialized care of older patients. Oropharyngeal dysphagia has recently been declared a geriatric syndrome reflecting its multifactorial background. Alongside multimorbidity, sarcopenia, frailty, and disability, swallowing disorders increase with advancing age, with prevalence rates reported to be as high as 44\% in acute geriatric hospital settings and 80\% in long-term care facilities. Hence, systematic screening of older patients to diagnose dysphagia and initiate treatment is of paramount importance to prevent bolus death, aspiration pneumonia, and malnutrition and improve quality of life. Several screening tools have been evaluated in emergency and stroke units. However, no published dysphagia screening tool has been validated in the hospitalized, older adult population using a gold standard in dysphagia diagnostics as a reference test. The validation of the proposed test is a first step. Objective: The Geriatric Bedside Swallowing Screen (GEBS) study aims to validate a new screening tool developed specifically for older inpatients against an instrumental swallowing evaluation, the flexible endoscopic evaluation of swallowing (FEES), which is considered a gold standard. Primary outcomes to be evaluated are sensitivity and specificity for the GEBS in the detection of dysphagia in a mixed older adult population. 
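For readers less familiar with these outcome measures, the following generic example (not study data) computes sensitivity, specificity, likelihood ratios, and the diagnostic odds ratio from a 2x2 table of screen results against a reference test; the counts are hypothetical.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Standard screening metrics from a 2x2 table of index test results vs the reference test."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)     # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity     # negative likelihood ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "LR+": lr_pos,
        "LR-": lr_neg,
        "diagnostic odds ratio": lr_pos / lr_neg,
    }

# Hypothetical counts: 40 true positives, 8 false positives, 5 false negatives, 47 true negatives.
print(diagnostic_accuracy(tp=40, fp=8, fn=5, tn=47))
```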
The presence of dysphagia will be defined by an instrumental swallowing evaluation (FEES), analyzed by the standardized penetration-aspiration scale. Methods: To validate the GEBS, a prospective cohort study will be carried out. Two institutions, an acute geriatric department and a long-term care facility, will aim to recruit a total of 100 patients aged $\geq$75 years. After giving their informed consent, patients will undergo the full screening protocol described in the GEBS as well as an evaluation of swallowing function using the FEES. Investigators will be blinded to the results of the respective other testing. The analysis of pseudonymized data sets will be done by a third investigator. Outcomes to be considered are sensitivity, specificity, diagnostic odds ratio, positive and negative likelihood ratios, and the reliability of the proposed dysphagia screening tool using the $\kappa$ coefficient. Results: Recruitment started in October 2022 and will end in April 2024. Data publication is planned for early 2025. Conclusions: If proven to be a valid screening tool for the early detection of dysphagia, further studies including different older adult populations as well as studies to determine the impact of systematic dysphagia screening on parameters, such as rates of aspiration pneumonia or nutritional status, should be planned. Effective screening of dysphagia will lead to earlier detection of patients with impaired swallowing. Those who fail the screening will be referred to speech language pathology for further diagnosis, thus optimizing care while streamlining personnel resources. Trial Registration: ISRCTN Registry ISRCTN11581931; https://www.isrctn.com/ISRCTN11581931 International Registered Report Identifier (IRRID): DERR1-10.2196/46252 ", doi="10.2196/46252", url="https://www.researchprotocols.org/2023/1/e46252", url="http://www.ncbi.nlm.nih.gov/pubmed/37566452" } @Article{info:doi/10.2196/42153, author="Haniuda, Yu and Tsubaki, Michihiro and Ito, Yoshiyasu", title="Evaluating the Usability of Electronic Patient-Reported Outcome Apps: Comment on a Symptom Management Platform for Outpatients With Advanced Cancer", journal="JMIR Form Res", year="2023", month="Aug", day="7", volume="7", pages="e42153", keywords="electronic patient-reported outcome", keywords="symptom management", keywords="advanced cancer", keywords="outpatient", keywords="follow-up", keywords="cancer", doi="10.2196/42153", url="https://formative.jmir.org/2023/1/e42153", url="http://www.ncbi.nlm.nih.gov/pubmed/37548992" } @Article{info:doi/10.2196/45116, author="Ahmadi, Najia and Zoch, Michele and Kelbert, Patricia and Noll, Richard and Schaaf, Jannik and Wolfien, Markus and Sedlmayr, Martin", title="Methods Used in the Development of Common Data Models for Health Data: Scoping Review", journal="JMIR Med Inform", year="2023", month="Aug", day="3", volume="11", pages="e45116", keywords="common data model", keywords="common data elements", keywords="health data", keywords="electronic health record", keywords="Observational Medical Outcomes Partnership", keywords="stakeholder involvement", keywords="Data harmonisation", keywords="Interoperability", keywords="Standardized Data Repositories", keywords="Suggestive Development Process", keywords="Healthcare", keywords="Medical Informatics", abstract="Background: Common data models (CDMs) are essential tools for data harmonization, which can lead to significant improvements in the health domain.
CDMs unite data from disparate sources and ease collaborations across institutions, resulting in the generation of large standardized data repositories across different entities. An overview of existing CDMs and methods used to develop these data sets may assist in the development process of future models for the health domain, such as for decision support systems. Objective: This scoping review investigates methods used in the development of CDMs for health data. We aim to provide a broad overview of approaches and guidelines that are used in the development of CDMs (ie, common data elements or common data sets) for different health domains on an international level. Methods: This scoping review followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist. We conducted the literature search in prominent databases, namely, PubMed, Web of Science, Science Direct, and Scopus, starting from January 2000 until March 2022. We identified and screened 1309 articles. The included articles were evaluated based on the type of adopted method, which was used in the conception, users' needs collection, implementation, and evaluation phases of CDMs, and whether stakeholders (such as medical experts, patients' representatives, and IT staff) were involved during the process. Moreover, the models were grouped into iterative or linear types based on the imperativeness of the stages during development. Results: We finally identified 59 articles that fit our eligibility criteria. Of these articles, 45 specifically focused on common medical conditions, 10 focused on rare medical conditions, and the remaining 4 focused on both conditions. The development process usually involved stakeholders but in different ways (eg, working group meetings, Delphi approaches, interviews, and questionnaires). Twenty-two models followed an iterative process. Conclusions: The included articles showed the diversity of methods used to develop a CDM in different domains of health. We highlight the need for more specialized CDM development methods in the health domain and propose a suggestive development process that might ease the development of CDMs in the health domain in the future. ", doi="10.2196/45116", url="https://medinform.jmir.org/2023/1/e45116", url="http://www.ncbi.nlm.nih.gov/pubmed/37535410" } @Article{info:doi/10.2196/46434, author="Amiri, Maryam and Li, Juan and Hasan, Wordh", title="Personalized Flexible Meal Planning for Individuals With Diet-Related Health Concerns: System Design and Feasibility Validation Study", journal="JMIR Form Res", year="2023", month="Aug", day="3", volume="7", pages="e46434", keywords="diabetes", keywords="fuzzy logic", keywords="meal planning", keywords="multicriteria decision-making", keywords="optimization", abstract="Background: Chronic diseases such as heart disease, stroke, diabetes, and hypertension are major global health challenges. Healthy eating can help people with chronic diseases manage their condition and prevent complications. However, making healthy meal plans is not easy, as it requires the consideration of various factors such as health concerns, nutritional requirements, tastes, economic status, and time limits. Therefore, there is a need for effective, affordable, and personalized meal planning that can assist people in choosing food that suits their individual needs and preferences. 
Objective: This study aimed to design an artificial intelligence (AI)--powered meal planner that can generate personalized healthy meal plans based on the user's specific health conditions, personal preferences, and status. Methods: We proposed a system that integrates semantic reasoning, fuzzy logic, heuristic search, and multicriteria analysis to produce flexible, optimized meal plans based on the user's health concerns, nutrition needs, as well as food restrictions or constraints, along with other personal preferences. Specifically, we constructed an ontology-based knowledge base to model knowledge about food and nutrition. We defined semantic rules to represent dietary guidelines for different health concerns and built a fuzzy membership of food nutrition based on the experience of experts to handle vague and uncertain nutritional data. We applied a semantic rule-based filtering mechanism to filter out food that violate mandatory health guidelines and constraints, such as allergies and religion. We designed a novel, heuristic search method that identifies the best meals among several candidates and evaluates them based on their fuzzy nutritional score. To select nutritious meals that also satisfy the user's other preferences, we proposed a multicriteria decision-making approach. Results: We implemented a mobile app prototype system and evaluated its effectiveness through a use case study and user study. The results showed that the system generated healthy and personalized meal plans that considered the user's health concerns, optimized nutrition values, respected dietary restrictions and constraints, and met the user's preferences. The users were generally satisfied with the system and its features. Conclusions: We designed an AI-powered meal planner that helps people create healthy and personalized meal plans based on their health conditions, preferences, and status. Our system uses multiple techniques to create optimized meal plans that consider multiple factors that affect food choice. Our evaluation tests confirmed the usability and feasibility of the proposed system. However, some limitations such as the lack of dynamic and real-time updates should be addressed in future studies. This study contributes to the development of AI-powered personalized meal planning systems that can support people's health and nutrition goals. ", doi="10.2196/46434", url="https://formative.jmir.org/2023/1/e46434", url="http://www.ncbi.nlm.nih.gov/pubmed/37535413" } @Article{info:doi/10.2196/48128, author="Hekman, J. Daniel and Cochran, L. Amy and Maru, P. Apoorva and Barton, J. Hanna and Shah, N. Manish and Wiegmann, Douglas and Smith, A. Maureen and Liao, Frank and Patterson, W. Brian", title="Effectiveness of an Emergency Department--Based Machine Learning Clinical Decision Support Tool to Prevent Outpatient Falls Among Older Adults: Protocol for a Quasi-Experimental Study", journal="JMIR Res Protoc", year="2023", month="Aug", day="3", volume="12", pages="e48128", keywords="falls", keywords="emergency medicine", keywords="machine learning", keywords="clinical decision support", keywords="automated screening", keywords="geriatrics", abstract="Background: Emergency department (ED) providers are important collaborators in preventing falls for older adults because they are often the first health care providers to see a patient after a fall and because at-home falls are often preceded by previous ED visits. 
Previous work has shown that ED referrals to falls interventions can reduce the risk of an at-home fall by 38\%. Screening patients at risk for a fall can be time-consuming and difficult to implement in the ED setting. Machine learning (ML) and clinical decision support (CDS) offer the potential of automating the screening process. However, it remains unclear whether automation of screening and referrals can reduce the risk of future falls among older patients. Objective: The goal of this paper is to describe a research protocol for evaluating the effectiveness of an automated screening and referral intervention. These findings will inform ongoing discussions about the use of ML and artificial intelligence to augment medical decision-making. Methods: To assess the effectiveness of our program for patients receiving the falls risk intervention, our primary analysis will be to obtain referral completion rates at 3 different EDs. We will use a quasi-experimental design known as a sharp regression discontinuity with regard to intent-to-treat, since the intervention is administered to patients whose risk score falls above a threshold. A conditional logistic regression model will be built to describe 6-month fall risk at each site as a function of the intervention, patient demographics, and risk score. The odds ratio of a return visit for a fall and the 95\% CI will be estimated by comparing those identified as high risk by the ML-based CDS (ML-CDS) and those who were not but had a similar risk profile. Results: The ML-CDS tool under study has been implemented at 2 of the 3 EDs in our study. As of April 2023, a total of 1326 patient encounters have been flagged for providers, and 339 unique patients have been referred to the mobility and falls clinic. To date, 15\% (45/339) of patients have scheduled an appointment with the clinic. Conclusions: This study seeks to quantify the impact of an ML-CDS intervention on patient behavior and outcomes. Our end-to-end data set allows for a more meaningful analysis of patient outcomes than other studies focused on interim outcomes, and our multisite implementation plan will demonstrate applicability to a broad population and the possibility to adapt the intervention to other EDs and achieve similar results. Our statistical methodology, regression discontinuity design, allows for causal inference from observational data and a staggered implementation strategy allows for the identification of secular trends that could affect causal associations and allow mitigation as necessary. 
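As an illustrative aside (not the protocol's analysis code), the sketch below fits the kind of sharp regression discontinuity logistic model described above with statsmodels on synthetic data; the risk-score cutoff, variable names, and effect sizes are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 2000
risk_score = rng.uniform(0, 1, n)
threshold = 0.6                                  # hypothetical ML risk-score cutoff for the referral
treated = (risk_score >= threshold).astype(int)
age = rng.normal(78, 7, n)

# Synthetic outcome: return ED visit for a fall within 6 months.
logit = -2 + 2.5 * risk_score - 0.5 * treated + 0.02 * (age - 78)
fall = rng.binomial(1, 1 / (1 + np.exp(-logit)))

df = pd.DataFrame({
    "fall": fall,
    "treated": treated,
    "risk_centered": risk_score - threshold,     # running variable centered at the cutoff
    "age": age,
})

# Sharp RD: model the outcome on treatment, the centered running variable, and covariates.
model = smf.logit("fall ~ treated + risk_centered + treated:risk_centered + age", data=df).fit()
print(model.summary())
print("Odds ratio for the intervention:", float(np.exp(model.params["treated"])))
print("95% CI:", np.exp(model.conf_int().loc["treated"]).tolist())
```

Centering the running variable at the threshold lets the coefficient on the treatment indicator be read as the discontinuity at the cutoff.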
Trial Registration: ClinicalTrials.gov NCT05810064; https://www.clinicaltrials.gov/study/NCT05810064 International Registered Report Identifier (IRRID): DERR1-10.2196/48128 ", doi="10.2196/48128", url="https://www.researchprotocols.org/2023/1/e48128", url="http://www.ncbi.nlm.nih.gov/pubmed/37535416" } @Article{info:doi/10.2196/47547, author="El-Khatib, Ziad and Richter, Lukas and Reich, Andreas and Benka, Bernhard and Assadian, Ojan", title="Implementation of a Surveillance System for Severe Acute Respiratory Infections at a Tertiary Care Hospital in Austria: Protocol for a Retrospective Longitudinal Feasibility Study", journal="JMIR Res Protoc", year="2023", month="Aug", day="3", volume="12", pages="e47547", keywords="severe acute respiratory infection", keywords="SARI", keywords="Austria", keywords="influenza", keywords="European Union", keywords="COVID-19", keywords="respiratory", keywords="data retrieval", keywords="information retrieval", keywords="electronic health record", keywords="EHR", keywords="health records", keywords="health record", keywords="surveillance", keywords="risk", keywords="database structure", keywords="incidence", keywords="data collection", abstract="Background: The risk of a large number of severe acute respiratory infection (SARI) cases emerging is a global concern. SARI can overwhelm the health care capacity and cause several deaths. Therefore, the Austrian Agency for Health and Food Safety will explore the feasibility of implementing an automatic electronically based SARI surveillance system at a tertiary care hospital in Austria as part of the hospital network, initiated by the European Centre for Disease Prevention and Control. Objective: We aim to investigate the availability of routinely collected health record data pertaining to respiratory infections and the optimal approach to use such available data for systematic surveillance of SARI in a real-world setting, describe the characteristics of patients with SARI before and after the beginning of the COVID-19 pandemic, and investigate the feasibility of identifying the risk factors for a severe outcome (intensive care unit admission or death) in patients with SARI. Methods: We will test the feasibility of a surveillance system, as part of a large European network, at a tertiary care hospital in the province of Lower Austria (called Regional Hospital Wiener Neustadt). It will be a cross-sectional study for the inventory of the electronic data records and implementation of automatic data retrieval for the period of January 2019 through the end of December 2022. The analysis will include an exploration of the database structure, descriptive analysis of the general characteristics of the patients with SARI, estimation of the SARI incidence rate, and assessment of the risk factors and different levels of severity of patients with SARI using logistic regression analysis. Results: This will be the first study to assess the feasibility of SARI surveillance at a large 800-bed tertiary care hospital in Austria. It will provide a general overview of the potential for establishing a hospital-based surveillance system for SARI. In addition, if successful, the electronic surveillance will be able to improve the response to early warning signs of new SARI, which will better inform policy makers in strengthening the surveillance system. Conclusions: The findings will support the expansion of the SARI hospital-based surveillance system to other hospitals in Austria. 
This network will be of use to Austria in preparing for future pandemics. International Registered Report Identifier (IRRID): PRR1-10.2196/47547 ", doi="10.2196/47547", url="https://www.researchprotocols.org/2023/1/e47547", url="http://www.ncbi.nlm.nih.gov/pubmed/37535414" } @Article{info:doi/10.2196/49034, author="Harada, Yukinori and Tomiyama, Shusaku and Sakamoto, Tetsu and Sugimoto, Shu and Kawamura, Ren and Yokose, Masashi and Hayashi, Arisa and Shimizu, Taro", title="Effects of Combinational Use of Additional Differential Diagnostic Generators on the Diagnostic Accuracy of the Differential Diagnosis List Developed by an Artificial Intelligence--Driven Automated History--Taking System: Pilot Cross-Sectional Study", journal="JMIR Form Res", year="2023", month="Aug", day="2", volume="7", pages="e49034", keywords="collective intelligence", keywords="differential diagnosis generator", keywords="diagnostic accuracy", keywords="automated medical history taking system", keywords="artificial intelligence", keywords="AI", abstract="Background: Low diagnostic accuracy is a major concern in automated medical history--taking systems with differential diagnosis (DDx) generators. Extending the concept of collective intelligence to the field of DDx generators such that the accuracy of judgment becomes higher when accepting an integrated diagnosis list from multiple people than when accepting a diagnosis list from a single person may be a possible solution. Objective: The purpose of this study is to assess whether the combined use of several DDx generators improves the diagnostic accuracy of DDx lists. Methods: We used medical history data and the top 10 DDx lists (index DDx lists) generated by an artificial intelligence (AI)--driven automated medical history--taking system from 103 patients with confirmed diagnoses. Two research physicians independently created the other top 10 DDx lists (second and third DDx lists) per case by imputing key information into the other 2 DDx generators based on the medical history generated by the automated medical history--taking system without reading the index lists generated by the automated medical history--taking system. We used the McNemar test to assess the improvement in diagnostic accuracy from the index DDx lists to the three types of combined DDx lists: (1) simply combining DDx lists from the index, second, and third lists; (2) creating a new top 10 DDx list using a 1/n weighting rule; and (3) creating new lists with only shared diagnoses among DDx lists from the index, second, and third lists. We treated the data generated by 2 research physicians from the same patient as independent cases. Therefore, the number of cases included in analyses in the case using 2 additional lists was 206 (103 cases {\texttimes} 2 physicians' input). Results: The diagnostic accuracy of the index lists was 46\% (47/103). Diagnostic accuracy was improved by simply combining the other 2 DDx lists (133/206, 65\%, P<.001), whereas the other 2 combined DDx lists did not improve the diagnostic accuracy of the DDx lists (106/206, 52\%, P=.05 in the collective list with the 1/n weighting rule and 29/206, 14\%, P<.001 in the only shared diagnoses among the 3 DDx lists). Conclusions: Simply adding each of the top 10 DDx lists from additional DDx generators increased the diagnostic accuracy of the DDx list by approximately 20\%, suggesting that the combinational use of DDx generators early in the diagnostic process is beneficial. 
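As a compact illustration (not the authors' implementation), the snippet below shows simple pooling of several ranked differential diagnosis lists alongside one plausible reading of a 1/n weighting rule, in which each diagnosis accumulates the reciprocal of its rank across lists; the example lists are hypothetical.

```python
from collections import defaultdict

def combine_simple(*ranked_lists):
    """Simple pooling: the union of all candidate diagnoses, in order of first appearance."""
    seen, pooled = set(), []
    for lst in ranked_lists:
        for dx in lst:
            if dx not in seen:
                seen.add(dx)
                pooled.append(dx)
    return pooled

def combine_weighted(*ranked_lists, top_k=10):
    """1/n weighting: each diagnosis scores the sum of 1/rank over every list that contains it."""
    scores = defaultdict(float)
    for lst in ranked_lists:
        for rank, dx in enumerate(lst, start=1):
            scores[dx] += 1.0 / rank
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

index_list = ["pneumonia", "heart failure", "pulmonary embolism", "COPD exacerbation"]
second_list = ["heart failure", "pneumonia", "acute coronary syndrome", "pulmonary embolism"]
third_list = ["pulmonary embolism", "pneumonia", "pericarditis", "heart failure"]

print(combine_simple(index_list, second_list, third_list))
print(combine_weighted(index_list, second_list, third_list))
```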
", doi="10.2196/49034", url="https://formative.jmir.org/2023/1/e49034", url="http://www.ncbi.nlm.nih.gov/pubmed/37531164" } @Article{info:doi/10.2196/46477, author="Rossander, Anna and Karlsson, Daniel", title="Structure of Health Information With Different Information Models: Evaluation Study With Competency Questions", journal="JMIR Med Inform", year="2023", month="Jul", day="31", volume="11", pages="e46477", keywords="informatics", keywords="health care", keywords="information model", keywords="terminology", keywords="terminologies", keywords="interoperability", keywords="competency question", keywords="interoperable", keywords="competency", keywords="EHR", keywords="electronic health record", keywords="guideline", keywords="standard", keywords="recommendation", keywords="information system", abstract="Background: There is a flora of health care information models but no consensus on which to use. This leads to poor information sharing and duplicate modelling work. The amount and type of differences between models has, to our knowledge, not been evaluated. Objective: This work aims to explore how information structured with various information models differ in practice. Our hypothesis is that differences between information models are overestimated. This work will also assess the usability of competency questions as a method for evaluation of information models within health care. Methods: In this study, 4 information standards, 2 standards for secondary use, and 2 electronic health record systems were included as material. Competency questions were developed for a random selection of recommendations from a clinical guideline. The information needed to answer the competency questions was modelled according to each included information model, and the results were analyzed. Differences in structure and terminology were quantified for each combination of standards. Results: In this study, 36 competency questions were developed and answered. In general, similarities between the included information models were larger than the differences. The demarcation between information model and terminology was overall similar; on average, 45\% of the included structures were identical between models. Choices of terminology differed within and between models; on average, 11\% was usable in interaction with each other. The information models included in this study were able to represent most information required for answering the competency questions. Conclusions: Different but same same; in practice, different information models structure much information in a similar fashion. To increase interoperability within and between systems, it is more important to move toward structuring information with any information model rather than finding or developing a perfect information model. Competency questions are a feasible way of evaluating how information models perform in practice. 
", doi="10.2196/46477", url="https://medinform.jmir.org/2023/1/e46477", url="http://www.ncbi.nlm.nih.gov/pubmed/37523221" } @Article{info:doi/10.2196/47735, author="Duarte, Miguel and Pereira-Rodrigues, Pedro and Ferreira-Santos, Daniela", title="The Role of Novel Digital Clinical Tools in the Screening or Diagnosis of Obstructive Sleep Apnea: Systematic Review", journal="J Med Internet Res", year="2023", month="Jul", day="26", volume="25", pages="e47735", keywords="obstructive sleep apnea", keywords="diagnosis", keywords="digital tools", keywords="smartphone", keywords="wearables", keywords="sensor", keywords="polysomnography", keywords="systematic review", keywords="mobile phone", abstract="Background: Digital clinical tools are a new technology that can be used in the screening or diagnosis of obstructive sleep apnea (OSA), notwithstanding the crucial role of polysomnography, the gold standard. Objective: This study aimed to identify, gather, and analyze the most accurate digital tools and smartphone-based health platforms used for OSA screening or diagnosis in the adult population. Methods: We performed a comprehensive literature search of PubMed, Scopus, and Web of Science databases for studies evaluating the validity of digital tools in OSA screening or diagnosis until November 2022. The risk of bias was assessed using the Joanna Briggs Institute critical appraisal tool for diagnostic test accuracy studies. The sensitivity, specificity, and area under the curve (AUC) were used as discrimination measures. Results: We retrieved 1714 articles, 41 (2.39\%) of which were included in the study. From these 41 articles, we found 7 (17\%) smartphone-based tools, 10 (24\%) wearables, 11 (27\%) bed or mattress sensors, 5 (12\%) nasal airflow devices, and 8 (20\%) other sensors that did not fit the previous categories. Only 8 (20\%) of the 41 studies performed external validation of the developed tool. Of these, the highest reported values for AUC, sensitivity, and specificity were 0.99, 96\%, and 92\%, respectively, for a clinical cutoff of apnea-hypopnea index (AHI) $\geq$30. These values correspond to a noncontact audio recorder that records sleep sounds, which are then analyzed by a deep learning technique that automatically detects sleep apnea events, calculates the AHI, and identifies OSA. Looking at the studies that only internally validated their models, the work that reported the highest accuracy measures showed AUC, sensitivity, and specificity values of 1.00, 100\%, and 96\%, respectively, for a clinical cutoff AHI $\geq$30. It uses the Sonomat---a foam mattress that, aside from recording breath sounds, has pressure sensors that generate voltage when deformed, thus detecting respiratory movements---to classify OSA events. Conclusions: These clinical tools presented promising results with high discrimination measures (best results reached AUC>0.99). However, there is still a need for quality studies comparing the developed tools with the gold standard and validating them in external populations and other environments before they can be used in clinical settings. 
Trial Registration: PROSPERO CRD42023387748; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=387748 ", doi="10.2196/47735", url="https://www.jmir.org/2023/1/e47735", url="http://www.ncbi.nlm.nih.gov/pubmed/37494079" } @Article{info:doi/10.2196/41858, author="Huang, Shih-Tsung and Hsiao, Fei-Yuan and Tsai, Tsung-Hsien and Chen, Pei-Jung and Peng, Li-Ning and Chen, Liang-Kung", title="Using Hypothesis-Led Machine Learning and Hierarchical Cluster Analysis to Identify Disease Pathways Prior to Dementia: Longitudinal Cohort Study", journal="J Med Internet Res", year="2023", month="Jul", day="26", volume="25", pages="e41858", keywords="dementia", keywords="machine learning", keywords="cluster analysis", keywords="disease", keywords="condition", keywords="symptoms", keywords="data", keywords="data set", keywords="cardiovascular", keywords="neuropsychiatric", keywords="infection", keywords="mobility", keywords="mental conditions", keywords="development", abstract="Background: Dementia development is a complex process in which the occurrence and sequential relationships of different diseases or conditions may construct specific patterns leading to incident dementia. Objective: This study aimed to identify patterns of disease or symptom clusters and their sequences prior to incident dementia using a novel approach incorporating machine learning methods. Methods: Using Taiwan's National Health Insurance Research Database, data from 15,700 older people with dementia and 15,700 nondementia controls matched on age, sex, and index year (n=10,466, 67\% for the training data set and n=5234, 33\% for the testing data set) were retrieved for analysis. Using machine learning methods to capture specific hierarchical disease triplet clusters prior to dementia, we designed a study algorithm with four steps: (1) data preprocessing, (2) disease or symptom pathway selection, (3) model construction and optimization, and (4) data visualization. Results: Among 15,700 identified older people with dementia, 10,466 and 5234 subjects were randomly assigned to the training and testing data sets, and 6215 hierarchical disease triplet clusters with positive correlations with dementia onset were identified. We subsequently generated 19,438 features to construct prediction models, and the model with the best performance was support vector machine (SVM) with the by-group LASSO (least absolute shrinkage and selection operator) regression method (total corresponding features=2513; accuracy=0.615; sensitivity=0.607; specificity=0.622; positive predictive value=0.612; negative predictive value=0.619; area under the curve=0.639). In total, this study captured 49 hierarchical disease triplet clusters related to dementia development, and the most characteristic patterns leading to incident dementia started with cardiovascular conditions (mainly hypertension), cerebrovascular disease, mobility disorders, or infections, followed by neuropsychiatric conditions. Conclusions: Dementia development in the real world is an intricate process involving various diseases or conditions, their co-occurrence, and sequential relationships. Using a machine learning approach, we identified 49 hierarchical disease triplet clusters with leading roles (cardio- or cerebrovascular disease) and supporting roles (mental conditions, locomotion difficulties, infections, and nonspecific neurological conditions) in dementia development. 
Further studies using data from other countries are needed to validate the prediction algorithms for dementia development, allowing the development of comprehensive strategies to prevent or care for dementia in the real world. ", doi="10.2196/41858", url="https://www.jmir.org/2023/1/e41858", url="http://www.ncbi.nlm.nih.gov/pubmed/37494081" } @Article{info:doi/10.2196/47736, author="Lolak, Sermkiat and Attia, John and McKay, J. Gareth and Thakkinstian, Ammarin", title="Comparing Explainable Machine Learning Approaches With Traditional Statistical Methods for Evaluating Stroke Risk Models: Retrospective Cohort Study", journal="JMIR Cardio", year="2023", month="Jul", day="26", volume="7", pages="e47736", keywords="stroke", keywords="machine learning", keywords="risk prediction model", keywords="explainable artificial Intelligence", keywords="risk factor", keywords="cohort study", keywords="high-risk patient", keywords="hypertension", abstract="Background: Stroke has multiple modifiable and nonmodifiable risk factors and represents a leading cause of death globally. Understanding the complex interplay of stroke risk factors is thus not only a scientific necessity but a critical step toward improving global health outcomes. Objective: We aim to assess the performance of explainable machine learning models in predicting stroke risk factors using real-world cohort data by comparing explainable machine learning models with conventional statistical methods. Methods: This retrospective cohort included high-risk patients from Ramathibodi Hospital in Thailand between January 2010 and December 2020. We compared the performance and explainability of logistic regression (LR), Cox proportional hazard, Bayesian network (BN), tree-augmented Na{\"i}ve Bayes (TAN), extreme gradient boosting (XGBoost), and explainable boosting machine (EBM) models. We used multiple imputation by chained equations for missing data and discretized continuous variables as needed. Models were evaluated using C-statistics and F1-scores. Results: Out of 275,247 high-risk patients, 9659 (3.5\%) experienced a stroke. XGBoost demonstrated the highest performance with a C-statistic of 0.89 and an F1-score of 0.80 followed by EBM and TAN with C-statistics of 0.87 and 0.83, respectively; LR and BN had similar C-statistics of 0.80. Significant factors associated with stroke included atrial fibrillation (AF), hypertension (HT), antiplatelets, HDL, and age. AF, HT, and antihypertensive medication were common significant factors across most models, with AF being the strongest factor in LR, XGBoost, BN, and TAN models. Conclusions: Our study developed stroke prediction models to identify crucial predictive factors such as AF, HT, or systolic blood pressure or antihypertensive medication, anticoagulant medication, HDL, age, and statin use in high-risk patients. The explainable XGBoost was the best model in predicting stroke risk, followed by EBM. 
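As a generic sketch (not the study's code), the example below strings together the preprocessing and evaluation steps named above: chained-equations imputation of missing values, discretization of continuous variables, and comparison by C-statistic and F1-score, all on synthetic data with hypothetical variables.

```python
import numpy as np
# enable_iterative_imputer must be imported before IterativeImputer becomes available.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(3)
n = 5000
X = np.column_stack([
    rng.normal(65, 10, n),      # age
    rng.normal(135, 20, n),     # systolic blood pressure
    rng.normal(50, 12, n),      # HDL
])
y = rng.binomial(1, 1 / (1 + np.exp(-(0.03 * (X[:, 0] - 65) + 0.02 * (X[:, 1] - 135)))))

# Introduce missingness, then impute with chained equations.
mask = rng.random(X.shape) < 0.1
X_missing = np.where(mask, np.nan, X)
X_imputed = IterativeImputer(random_state=0).fit_transform(X_missing)

# Discretize continuous variables into ordinal bins where needed.
X_binned = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="quantile").fit_transform(X_imputed)

X_tr, X_te, y_tr, y_te = train_test_split(X_binned, y, test_size=0.3, random_state=0, stratify=y)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
prob = clf.predict_proba(X_te)[:, 1]
print("C-statistic:", roc_auc_score(y_te, prob))
print("F1-score:", f1_score(y_te, (prob >= 0.5).astype(int)))
```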
", doi="10.2196/47736", url="https://cardio.jmir.org/2023/1/e47736", url="http://www.ncbi.nlm.nih.gov/pubmed/37494080" } @Article{info:doi/10.2196/36121, author="Mohd Hisham, Faiz Muhammad and Lodz, Aliza Noor and Muhammad, Nurhadzira Eida and Asari, Noor Filza and Mahmood, Ihsani Mohd and Abu Bakar, Zamzurina", title="Evaluation of 2 Artificial Intelligence Software for Chest X-Ray Screening and Pulmonary Tuberculosis Diagnosis: Protocol for a Retrospective Case-Control Study", journal="JMIR Res Protoc", year="2023", month="Jul", day="25", volume="12", pages="e36121", keywords="artificial intelligence", keywords="AI", keywords="evaluation", keywords="pulmonary tuberculosis", keywords="PTB", keywords="chest x-ray", keywords="CXR", keywords="screening", abstract="Background: According to the World Bank, Malaysia reported an estimated 97 tuberculosis cases per 100,000 people in 2021. Chest x-ray (CXR) remains the best conventional method for the early detection of pulmonary tuberculosis (PTB) infection. The intervention of artificial intelligence (AI) in PTB diagnosis could efficiently aid human interpreters and reduce health professionals' work burden. To date, no AI studies have been evaluated in Malaysia. Objective: This study aims to evaluate the performance of Putralytica and Qure.ai software for CXR screening and PTB diagnosis among the Malaysian population. Methods: We will conduct a retrospective case-control study at the Respiratory Medicine Institute, National Cancer Institute, and Sungai Buloh Health Clinic. A total of 1500 CXR images of patients who completed treatments or check-ups will be selected and categorized into three groups: (1) abnormal PTB cases, (2) abnormal non-PTB cases, and (3) normal cases. These CXR images, along with their clinical findings, will be the reference standard in this study. All patient data, including sociodemographic characteristics and clinical history, will be collected prior to screening via Putralytica and Qure.ai software and readers' interpretation, which are the index tests for this study. Interpretation from all 3 index tests will be compared with the reference standard, and significant statistical analysis will be computed. Results: Data collection is expected to commence in August 2023. It is anticipated that 1 year will be needed to conduct the study. Conclusions: This study will measure the accuracy of Putralytica and Qure.ai software and whether their findings will concur with readers' interpretation and the reference standard, thus providing evidence toward the effectiveness of implementing AI in the medical setting. International Registered Report Identifier (IRRID): PRR1-10.2196/36121 ", doi="10.2196/36121", url="https://www.researchprotocols.org/2023/1/e36121", url="http://www.ncbi.nlm.nih.gov/pubmed/37490330" } @Article{info:doi/10.2196/43384, author="Chamarthi, Gajapathiraju and Orozco, Tatiana and Shell, Popy and Fu, Devin and Hale-Gallardo, Jennifer and Jia, Huanguang and Shukla, M. 
Ashutosh", title="Electronic Phenotype for Advanced Chronic Kidney Disease in a Veteran Health Care System Clinical Database: Systems-Based Strategy for Model Development and Evaluation", journal="Interact J Med Res", year="2023", month="Jul", day="24", volume="12", pages="e43384", keywords="advanced chronic kidney disease", keywords="EHR phenotype", keywords="Veteran Health System", keywords="CKD cohort", keywords="kidney disease", keywords="chronic", keywords="clinical", keywords="database", keywords="data", keywords="diagnosis", keywords="risk", keywords="disease", abstract="Background: Identifying advanced (stages 4 and 5) chronic kidney disease (CKD) cohorts in clinical databases is complicated and often unreliable. Accurately identifying these patients can allow targeting this population for their specialized clinical and research needs. Objective: This study was conducted as a system-based strategy to identify all prevalent Veterans with advanced CKD for subsequent enrollment in a clinical trial. We aimed to examine the prevalence and accuracy of conventionally used diagnosis codes and estimated glomerular filtration rate (eGFR)-based phenotypes for advanced CKD in an electronic health record (EHR) database. We sought to develop a pragmatic EHR phenotype capable of improving the real-time identification of advanced CKD cohorts in a regional Veterans health care system. Methods: Using the Veterans Affairs Informatics and Computing Infrastructure services, we extracted the source cohort of Veterans with advanced CKD based on a combination of the latest eGFR value ?30 ml{\textperiodcentered}min--1{\textperiodcentered}1.73 m--2 or existing International Classification of Diseases (ICD)-10 diagnosis codes for advanced CKD (N18.4 and N18.5) in the last 12 months. We estimated the prevalence of advanced CKD using various prior published EHR phenotypes (ie, advanced CKD diagnosis codes, using the latest single eGFR <30 ml{\textperiodcentered}min--1{\textperiodcentered}1.73 m--2, utilizing two eGFR values) and our operational EHR phenotypes of a high-, intermediate-, and low-risk advanced CKD cohort. We evaluated the accuracy of these phenotypes by examining the likelihood of a sustained reduction of eGFR <30 ml{\textperiodcentered}min--1{\textperiodcentered}1.73 m--2 over a 6-month follow-up period. Results: Of the 133,756 active Veteran enrollees at North Florida/South Georgia Veterans Health System (NF/SG VHS), we identified a source cohort of 1759 Veterans with advanced nondialysis CKD. Among these, 1102 (62.9\%) Veterans had diagnosis codes for advanced CKD; 1391(79.1\%) had the index eGFR <30 ml{\textperiodcentered}min--1{\textperiodcentered}1.73 m--2; and 928 (52.7\%), 480 (27.2\%), and 315 (17.9\%) Veterans had high-, intermediate-, and low-risk advanced CKD, respectively. The prevalence of advanced CKD among Veterans at NF/SG VHS varied between 1\% and 1.5\% depending on the EHR phenotype. At the 6-month follow-up, the probability of Veterans remaining in the advanced CKD stage was 65.3\% in the group defined by the ICD-10 codes and 90\% in the groups defined by eGFR values. Based on our phenotype, 94.2\% of high-risk, 71\% of intermediate-risk, and 16.1\% of low-risk groups remained in the advanced CKD category. Conclusions: While the prevalence of advanced CKD has limited variation between different EHR phenotypes, the accuracy can be improved by utilizing two eGFR values in a stratified manner. 
We report the development of a pragmatic EHR-based model to identify advanced CKD within a regional Veterans health care system in real time with a tiered approach that allows targeting the needs of the groups at risk of progression to end-stage kidney disease. ", doi="10.2196/43384", url="https://www.i-jmr.org/2023/1/e43384", url="http://www.ncbi.nlm.nih.gov/pubmed/37486757" } @Article{info:doi/10.2196/46340, author="Wang, Changyu and Liu, Siru and Tang, Yu and Yang, Hao and Liu, Jialin", title="Diagnostic Test Accuracy of Deep Learning Prediction Models on COVID-19 Severity: Systematic Review and Meta-Analysis", journal="J Med Internet Res", year="2023", month="Jul", day="21", volume="25", pages="e46340", keywords="COVID-19", keywords="deep learning", keywords="prognostics and health management", keywords="Severity of Illness Index", keywords="accuracy", keywords="AI", keywords="prediction model", keywords="systematic review", keywords="meta-analysis", keywords="disease severity", keywords="prognosis", keywords="digital health intervention", abstract="Background: Deep learning (DL) prediction models hold great promise in the triage of COVID-19. Objective: We aimed to evaluate the diagnostic test accuracy of DL prediction models for assessing and predicting the severity of COVID-19. Methods: We searched PubMed, Scopus, LitCovid, Embase, Ovid, and the Cochrane Library for studies published from December 1, 2019, to April 30, 2022. Studies that used DL prediction models to assess or predict COVID-19 severity were included, while those without diagnostic test accuracy analysis or severity dichotomies were excluded. QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2), PROBAST (Prediction Model Risk of Bias Assessment Tool), and funnel plots were used to estimate the bias and applicability. Results: A total of 12 retrospective studies involving 2006 patients reported the cross-sectionally assessed value of DL on COVID-19 severity. The pooled sensitivity and area under the curve were 0.92 (95\% CI 0.89-0.94; I2=0.00\%) and 0.95 (95\% CI 0.92-0.96), respectively. A total of 13 retrospective studies involving 3951 patients reported the longitudinal predictive value of DL for disease severity. The pooled sensitivity and area under the curve were 0.76 (95\% CI 0.74-0.79; I2=0.00\%) and 0.80 (95\% CI 0.76-0.83), respectively. Conclusions: DL prediction models can help clinicians identify potentially severe cases for early triage. However, high-quality research is lacking. Trial Registration: PROSPERO CRD42022329252; https://www.crd.york.ac.uk/prospero/display\_record.php?ID=CRD 42022329252 ", doi="10.2196/46340", url="https://www.jmir.org/2023/1/e46340", url="http://www.ncbi.nlm.nih.gov/pubmed/37477951" } @Article{info:doi/10.2196/40639, author="Potter, H. Thomas B. and Pratap, Sharmila and Nicolas, Carlos Juan and Khan, S. Osman and Pan, P. Alan and Bako, T. Abdulaziz and Hsu, Enshuo and Johnson, Carnayla and Jefferson, N. Imory and Adegbindin, K. Sofiat and Baig, Eman and Kelly, R. Hannah and Jones, L. Stephen and Britz, W. Gavin and Tannous, Jonika and Vahidy, S. 
Farhaan", title="A Neuro-Informatics Pipeline for Cerebrovascular Disease: Research Registry Development", journal="JMIR Form Res", year="2023", month="Jul", day="21", volume="7", pages="e40639", keywords="clinical outcome", keywords="intracerebral hemorrhage", keywords="acute ischemic stroke", keywords="transient ischemic attack", keywords="subarachnoid hemorrhage", keywords="cerebral amyloid angiopathy", keywords="learning health system", keywords="electronic health record", keywords="data curation", keywords="database", abstract="Background: Although stroke is well recognized as a critical disease, treatment options are often limited. Inpatient stroke encounters carry critical information regarding the mechanisms of stroke and patient outcomes; however, these data are typically formatted to support administrative functions instead of research. To support improvements in the care of patients with stroke, a substantive research data platform is needed. Objective: To advance a stroke-oriented learning health care system, we sought to establish a comprehensive research repository of stroke data using the Houston Methodist electronic health record (EHR) system. Methods: Dedicated processes were developed to import EHR data of patients with primary acute ischemic stroke, intracerebral hemorrhage (ICH), transient ischemic attack, and subarachnoid hemorrhage under a review board--approved protocol. Relevant patients were identified from discharge diagnosis codes and assigned registry patient identification numbers. For identified patients, extract, transform, and load processes imported EHR data of primary cerebrovascular disease admissions and available data from any previous or subsequent admissions. Data were loaded into patient-focused SQL objects to enable cross-sectional and longitudinal analyses. Primary data domains (admission details, comorbidities, laboratory data, medications, imaging data, and discharge characteristics) were loaded into separate relational tables unified by patient and encounter identification numbers. Computed tomography, magnetic resonance, and angiography images were retrieved. Imaging data from patients with ICH were assessed for hemorrhage characteristics and cerebral small vessel disease markers. Patient information needed to interface with other local and national databases was retained. Prospective patient outreach was established, with patients contacted via telephone to assess functional outcomes 30, 90, 180, and 365 days after discharge. Dashboards were constructed to provide investigators with data summaries to support access. Results: The Registry of Neurological Endpoint Assessments among Patients with Ischemic and Hemorrhagic Stroke (REINAH) database was constructed as a series of relational category-specific SQL objects. Encounter summaries and dashboards were constructed to draw from these objects, providing visual data summaries for investigators seeking to build studies based on REINAH data. As of June 2022, the database contains 18,061 total patients, including 1809 (10.02\%) with ICH, 13,444 (74.43\%) with acute ischemic stroke, 1221 (6.76\%) with subarachnoid hemorrhage, and 3165 (17.52\%) with transient ischemic attack. Depending on the cohort, imaging data from computed tomography are available for 85.83\% (1048/1221) to 98.4\% (1780/1809) of patients, with magnetic resonance imaging available for 27.85\% (340/1221) to 85.54\% (11,500/13,444) of patients. 
Outcome assessment has successfully contacted 56.1\% (240/428) of patients after ICH, with 71.3\% (171/240) of responders providing consent for assessment. Responders reported a median modified Rankin Scale score of 3 at 90 days after discharge. Conclusions: A highly curated and clinically focused research platform for stroke data will establish a foundation for future research that may fundamentally improve poststroke patient care and outcomes. ", doi="10.2196/40639", url="https://formative.jmir.org/2023/1/e40639", url="http://www.ncbi.nlm.nih.gov/pubmed/37477961" } @Article{info:doi/10.2196/46165, author="Lee, Yun Dong and Choi, Byungjin and Kim, Chungsoo and Fridgeirsson, Egill and Reps, Jenna and Kim, Myoungsuk and Kim, Jihyeong and Jang, Jae-Won and Rhee, Youl Sang and Seo, Won-Woo and Lee, Seunghoon and Son, Joon Sang and Park, Woong Rae", title="Privacy-Preserving Federated Model Predicting Bipolar Transition in Patients With Depression: Prediction Model Development Study", journal="J Med Internet Res", year="2023", month="Jul", day="20", volume="25", pages="e46165", keywords="federated learning", keywords="depression", keywords="bipolar disorder", keywords="data standardization", keywords="differential privacy", abstract="Background: Mood disorder has emerged as a serious concern for public health; in particular, bipolar disorder has a less favorable prognosis than depression. Although prompt recognition of depression conversion to bipolar disorder is needed, early prediction is challenging due to overlapping symptoms. Recently, there have been attempts to develop a prediction model by using federated learning. Federated learning in medical fields is a method for training multi-institutional machine learning models without patient-level data sharing. Objective: This study aims to develop and validate a federated, differentially private multi-institutional bipolar transition prediction model. Methods: This retrospective study enrolled patients diagnosed with the first depressive episode at 5 tertiary hospitals in South Korea. We developed models for predicting bipolar transition by using data from 17,631 patients in 4 institutions. Further, we used data from 4541 patients for external validation from 1 institution. We created standardized pipelines to extract large-scale clinical features from the 4 institutions without any code modification. Moreover, we performed feature selection in a federated environment for computational efficiency and applied differential privacy to gradient updates. Finally, we compared the federated and the 4 local models developed with each hospital's data on internal and external validation data sets. Results: In the internal data set, 279 out of 17,631 patients showed bipolar disorder transition. In the external data set, 39 out of 4541 patients showed bipolar disorder transition. The average performance of the federated model in the internal test (area under the curve [AUC] 0.726) and external validation (AUC 0.719) data sets was higher than that of the other locally developed models (AUC 0.642-0.707 and AUC 0.642-0.699, respectively). In the federated model, classifications were driven by several predictors such as the Charlson index (low scores were associated with bipolar transition, which may be due to younger age), severe depression, anxiolytics, young age, and visiting months (the bipolar transition was associated with seasonality, especially during the spring and summer months). 
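The privacy mechanism outlined in the abstract above (local training with clipped, noised gradient updates that are aggregated centrally) can be conveyed with plain NumPy. The following is a toy federated-averaging loop for logistic regression with Gaussian noise on the local updates; it is only meant to illustrate the idea, not to reproduce the authors' multi-institutional pipeline or their privacy accounting.

```python
# Toy sketch of federated averaging with differentially private updates:
# each "hospital" computes one local logistic-regression step, clips and
# noises its gradient, and only the noised updates are averaged centrally.
import numpy as np

rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.1, clip=1.0, noise_scale=0.5):
    p = 1.0 / (1.0 + np.exp(-X @ w))                 # logistic prediction
    grad = X.T @ (p - y) / len(y)                    # gradient of the log loss
    grad *= min(1.0, clip / (np.linalg.norm(grad) + 1e-12))       # clip
    grad += rng.normal(0.0, noise_scale * clip, size=grad.shape)  # DP noise
    return w - lr * grad

# Three synthetic "sites" sharing the same underlying signal.
d, w_true = 5, np.array([1.0, -2.0, 0.5, 0.0, 3.0])
sites = []
for _ in range(3):
    X = rng.normal(size=(500, d))
    y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
    sites.append((X, y))

w = np.zeros(d)
for _ in range(200):                                 # federated rounds
    w = np.mean([local_update(w, X, y) for X, y in sites], axis=0)
print(np.round(w, 2))                                # aggregated model weights
```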
Conclusions: We developed and validated a differentially private federated model by using distributed multi-institutional psychiatric data with standardized pipelines in a real-world environment. The federated model performed better than models using local data only. ", doi="10.2196/46165", url="https://www.jmir.org/2023/1/e46165", url="http://www.ncbi.nlm.nih.gov/pubmed/37471130" } @Article{info:doi/10.2196/48795, author="Bani Hani, Salam and Ahmad, Muayyad", title="Effective Prediction of Mortality by Heart Disease Among Women in Jordan Using the Chi-Squared Automatic Interaction Detection Model: Retrospective Validation Study", journal="JMIR Cardio", year="2023", month="Jul", day="20", volume="7", pages="e48795", keywords="coronary heart disease", keywords="mortality", keywords="artificial intelligence", keywords="machine learning", keywords="algorithms", keywords="algorithm", keywords="women", keywords="death", keywords="predict", keywords="prediction", keywords="predictive", keywords="heart", keywords="cardiology", keywords="coronary", keywords="CHD", keywords="cardiovascular disease", keywords="CVD", keywords="cardiovascular", abstract="Background: Many current studies have claimed that the actual risk of heart disease among women is equal to that in men. Using a large machine learning algorithm (MLA) data set to predict mortality in women, data mining techniques have been used to identify significant aspects of variables that help in identifying the primary causes of mortality within this target category of the population. Objective: This study aims to predict mortality caused by heart disease among women, using an artificial intelligence technique--based MLA. Methods: A retrospective design was used to retrieve big data from the electronic health records of 2028 women with heart disease. Data were collected for Jordanian women who were admitted to public health hospitals from 2015 to the end of 2021. We checked the extracted data for noise, consistency issues, and missing values. After categorizing, organizing, and cleaning the extracted data, the redundant data were eliminated. Results: Out of 9 artificial intelligence models, the Chi-squared Automatic Interaction Detection model had the highest accuracy (93.25\%) and area under the curve (0.825) among the build models. The participants were 62.6 (SD 15.4) years old on average. Angina pectoris was the most frequent diagnosis in the women's extracted files (n=1,264,000, 62.3\%), followed by congestive heart failure (n=764,000, 37.7\%). Age, systolic blood pressure readings with a cutoff value of >187 mm Hg, medical diagnosis (women diagnosed with congestive heart failure were at a higher risk of death [n=31, 16.58\%]), pulse pressure with a cutoff value of 98 mm Hg, and oxygen saturation (measured using pulse oximetry) with a cutoff value of 93\% were the main predictors for death among women. Conclusions: To predict the outcomes in this study, we used big data that were extracted from the clinical variables from the electronic health records. The Chi-squared Automatic Interaction Detection model---an MLA---confirmed the precise identification of the key predictors of cardiovascular mortality among women and can be used as a practical tool for clinical prediction. 
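CHAID grows a tree by repeatedly choosing the categorical predictor whose cross-tabulation with the outcome yields the most significant chi-squared statistic. The core of that selection step can be sketched with scipy on made-up variables; this is only the splitting criterion, not the full CHAID algorithm (which also merges categories and applies Bonferroni-adjusted P values) and not the study's model.

```python
# Sketch of the CHAID splitting criterion: pick the predictor whose
# contingency table with the outcome gives the smallest chi-squared P value.
# Toy data; category merging and Bonferroni adjustment are omitted.
import pandas as pd
from scipy.stats import chi2_contingency

data = pd.DataFrame({
    "age_group": ["<60", "<60", "60+", "60+", "60+", "<60", "60+", "60+"],
    "diagnosis": ["angina", "CHF", "CHF", "angina", "CHF", "angina", "CHF", "angina"],
    "died":      [0, 1, 1, 0, 1, 0, 1, 0],
})

def best_split(df: pd.DataFrame, outcome: str):
    p_values = {}
    for col in df.columns.drop(outcome):
        table = pd.crosstab(df[col], df[outcome])
        _, p, _, _ = chi2_contingency(table)
        p_values[col] = p
    best = min(p_values, key=p_values.get)
    return best, p_values[best]

print(best_split(data, "died"))  # predictor with the most significant split
```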
", doi="10.2196/48795", url="https://cardio.jmir.org/2023/1/e48795", url="http://www.ncbi.nlm.nih.gov/pubmed/37471126" } @Article{info:doi/10.2196/44362, author="Nagraj, Shobhana and Kennedy, Stephen and Jha, Vivekananda and Norton, Robyn and Hinton, Lisa and Billot, Laurent and Rajan, Eldho and Mohammed Abdul, Ameer and Phalswal, Anita and Arora, Varun and Praveen, Devarsetty and Hirst, Jane", title="A Mobile Clinical Decision Support System for High-Risk Pregnant Women in Rural India (SMARThealth Pregnancy): Pilot Cluster Randomized Controlled Trial", journal="JMIR Form Res", year="2023", month="Jul", day="20", volume="7", pages="e44362", keywords="decision support systems", keywords="clinical", keywords="telemedicine", keywords="community health workers", keywords="pregnancy", keywords="high risk", keywords="diabetes", keywords="gestational", keywords="cardiovascular diseases", abstract="Background: Cardiovascular disease (CVD) is the leading cause of death in women in India. Early identification is crucial to reducing deaths. Hypertensive disorders of pregnancy (HDP) and gestational diabetes mellitus (GDM) carry independent risks for future CVD, and antenatal care is a window to screen and counsel high-risk women. In rural India, community health workers (CHWs) deliver antenatal and postnatal care. We developed a complex intervention (SMARThealth Pregnancy) involving mobile clinical decision support for CHWs and evaluated it in a pilot cluster randomized controlled trial (cRCT). Objective: The aim of the study is to co-design a theory-informed intervention for CHWs to screen, refer, and counsel pregnant women at high risk of future CVD in rural India and evaluate its feasibility and acceptability. Methods: In phase 1, we used qualitative methods to explore community priorities for high-risk pregnant women in rural areas of 2 diverse states in India. In phase 2, informed by behavior change theory and human-centered design, we used these qualitative data to develop the intervention components and implementation strategies for SMARThealth Pregnancy in an iterative process with end users. In phase 3, using mixed methods, we evaluated the intervention in a cRCT with an embedded qualitative substudy across 4 primary health centres: 2 in Jhajjar district, Haryana, and 2 in Guntur district, Andhra Pradesh. Results: SMARThealth Pregnancy embedded a total of 15 behavior change techniques and included (1) community awareness programs; (2) targeted training, including point-of-care blood pressure and hemoglobin measurement; and (3) mobile clinical decision support for CHWs to screen women in their homes. The intervention focused on 3 priority conditions: anemia, HDP, and GDM. The evaluation involved a total of 200 pregnant women, equally randomized to intervention or enhanced standard care (control). Recruitment was completed within 5 months, with minimal loss to follow-up (4/200, 2\%) at 6 weeks postpartum. A total of 4 primary care doctors and 54 CHWs in the intervention clusters took part in the study. Fidelity to intervention practices was 100\% prepandemic. Over half the study population was affected by moderate to severe anemia at baseline. The prevalence of HDP (2.5\%) and GDM (2\%) was low in our study population. Results suggest a possible improvement in mean hemoglobin (anemia) in the intervention group, although an adequately powered trial is needed. 
The model of home-based care was feasible and acceptable for pregnant or postpartum women and CHWs, who perceived improvements in quality of care, self-efficacy, and professional recognition. Conclusions: SMARThealth Pregnancy is an innovative model of home-based care for high-risk pregnant women during the transitions between antenatal and postnatal care and adult health services. The use of theory and co-design during intervention development facilitated acceptability of the intervention and implementation strategies. Our experience has informed the decision to initiate a larger-scale cRCT. Trial Registration: ClinicalTrials.gov NCT03968952; https://clinicaltrials.gov/ct2/show/NCT03968952 International Registered Report Identifier (IRRID): RR2-10.3389/fgwh.2021.620759 ", doi="10.2196/44362", url="https://formative.jmir.org/2023/1/e44362", url="http://www.ncbi.nlm.nih.gov/pubmed/37471135" } @Article{info:doi/10.2196/44700, author="Woods, Andrew and Kramer, T. Skyler and Xu, Dong and Jiang, Wei", title="Secure Comparisons of Single Nucleotide Polymorphisms Using Secure Multiparty Computation: Method Development", journal="JMIR Bioinform Biotech", year="2023", month="Jul", day="18", volume="4", pages="e44700", keywords="secure multiparty computation", keywords="single nucleotide polymorphism", keywords="Variant Call Format", keywords="Jaccard similarity", abstract="Background: While genomic variations can provide valuable information for health care and ancestry, the privacy of individual genomic data must be protected. Thus, a secure environment is desirable for a human DNA database such that the total data are queryable but not directly accessible to involved parties (eg, data hosts and hospitals) and that the query results are learned only by the user or authorized party. Objective: In this study, we provide efficient and secure computations on panels of single nucleotide polymorphisms (SNPs) from genomic sequences as computed under the following set operations: union, intersection, set difference, and symmetric difference. Methods: Using these operations, we can compute similarity metrics, such as the Jaccard similarity, which could allow querying a DNA database to find the same person and genetic relatives securely. We analyzed various security paradigms and show metrics for the protocols under several security assumptions, such as semihonest, malicious with honest majority, and malicious with a malicious majority. Results: We show that our methods can be used practically on realistically sized data. Specifically, we can compute the Jaccard similarity of two genomes when considering sets of SNPs, each with 400,000 SNPs, in 2.16 seconds with the assumption of a malicious adversary in an honest majority and 0.36 seconds under a semihonest model. Conclusions: Our methods may help adopt trusted environments for hosting individual genomic data with end-to-end data security. ", doi="10.2196/44700", url="https://bioinform.jmir.org/2023/1/e44700" } @Article{info:doi/10.2196/47592, author="Williams, D. David and Ferro, Diana and Mullaney, Colin and Skrabonja, Lydia and Barnes, S. Mitchell and Patton, R. Susana and Lockee, Brent and Tallon, M. Erin and Vandervelden, A. Craig and Schweisberger, Cintya and Mehta, Sanjeev and McDonough, Ryan and Lind, Marcus and D'Avolio, Leonard and Clements, A. 
Mark", title="An ``All-Data-on-Hand'' Deep Learning Model to Predict Hospitalization for Diabetic Ketoacidosis in Youth With Type 1 Diabetes: Development and Validation Study", journal="JMIR Diabetes", year="2023", month="Jul", day="18", volume="8", pages="e47592", keywords="type 1 diabetes", keywords="T1D", keywords="diabetic ketoacidosis", keywords="DKA", keywords="machine learning", keywords="deep learning", keywords="artificial intelligence", keywords="AI", keywords="recurrent neural network", keywords="RNN", keywords="long short-term memory", keywords="LSTM", keywords="natural language processing", keywords="NLP", abstract="Background: Although prior research has identified multiple risk factors for diabetic ketoacidosis (DKA), clinicians continue to lack clinic-ready models to predict dangerous and costly episodes of DKA. We asked whether we could apply deep learning, specifically the use of a long short-term memory (LSTM) model, to accurately predict the 180-day risk of DKA-related hospitalization for youth with type 1 diabetes (T1D). Objective: We aimed to describe the development of an LSTM model to predict the 180-day risk of DKA-related hospitalization for youth with T1D. Methods: We used 17 consecutive calendar quarters of clinical data (January 10, 2016, to March 18, 2020) for 1745 youths aged 8 to 18 years with T1D from a pediatric diabetes clinic network in the Midwestern United States. The input data included demographics, discrete clinical observations (laboratory results, vital signs, anthropometric measures, diagnosis, and procedure codes), medications, visit counts by type of encounter, number of historic DKA episodes, number of days since last DKA admission, patient-reported outcomes (answers to clinic intake questions), and data features derived from diabetes- and nondiabetes-related clinical notes via natural language processing. We trained the model using input data from quarters 1 to 7 (n=1377), validated it using input from quarters 3 to 9 in a partial out-of-sample (OOS-P; n=1505) cohort, and further validated it in a full out-of-sample (OOS-F; n=354) cohort with input from quarters 10 to 15. Results: DKA admissions occurred at a rate of 5\% per 180-days in both out-of-sample cohorts. In the OOS-P and OOS-F cohorts, the median age was 13.7 (IQR 11.3-15.8) years and 13.1 (IQR 10.7-15.5) years; median glycated hemoglobin levels at enrollment were 8.6\% (IQR 7.6\%-9.8\%) and 8.1\% (IQR 6.9\%-9.5\%); recall was 33\% (26/80) and 50\% (9/18) for the top-ranked 5\% of youth with T1D; and 14.15\% (213/1505) and 12.7\% (45/354) had prior DKA admissions (after the T1D diagnosis), respectively. For lists rank ordered by the probability of hospitalization, precision increased from 33\% to 56\% to 100\% for positions 1 to 80, 1 to 25, and 1 to 10 in the OOS-P cohort and from 50\% to 60\% to 80\% for positions 1 to 18, 1 to 10, and 1 to 5 in the OOS-F cohort, respectively. Conclusions: The proposed LSTM model for predicting 180-day DKA-related hospitalization was valid in this sample. Future research should evaluate model validity in multiple populations and settings to account for health inequities that may be present in different segments of the population (eg, racially or socioeconomically diverse cohorts). Rank ordering youth by probability of DKA-related hospitalization will allow clinics to identify the most at-risk youth. The clinical implication of this is that clinics may then create and evaluate novel preventive interventions based on available resources. 
", doi="10.2196/47592", url="https://diabetes.jmir.org/2023/1/e47592", url="http://www.ncbi.nlm.nih.gov/pubmed/37224506" } @Article{info:doi/10.2196/42970, author="Rosen, Barthlow Claire and Roberts, Eugene Sanford and Syvyk, Solomiya and Finn, Caitlin and Tong, Jason and Wirtalla, Christopher and Spinks, Hunter and Kelz, Rapaport Rachel", title="A Novel Mobile App to Identify Patients With Multimorbidity in the Emergency Setting: Development of an App and Feasibility Trial", journal="JMIR Form Res", year="2023", month="Jul", day="13", volume="7", pages="e42970", keywords="clinical operationalization", keywords="delphi", keywords="development", keywords="emergency", keywords="general surgery", keywords="mHealth", keywords="mobile app", keywords="mobile health", keywords="morbidity", keywords="multimorbidity", keywords="qualifying comorbidity set", keywords="surgery", keywords="usability", abstract="Background: Multimorbidity is associated with an increased risk of poor surgical outcomes among older adults; however, identifying multimorbidity in the clinical setting can be a challenge. Objective: We created the Multimorbid Patient Identifier App (MMApp) to easily identify patients with multimorbidity identified by the presence of a Qualifying Comorbidity Set and tested its feasibility for use in future clinical research, validation, and eventually to guide clinical decision-making. Methods: We adapted the Qualifying Comorbidity Sets' claims-based definition of multimorbidity for clinical use through a modified Delphi approach and developed MMApp. A total of 10 residents input 5 hypothetical emergency general surgery patient scenarios, common among older adults, into the MMApp and examined MMApp test characteristics for a total of 50 trials. For MMApp, comorbidities selected for each scenario were recorded, along with the number of comorbidities correctly chosen, incorrectly chosen, and missed for each scenario. The sensitivity and specificity of identifying a patient as multimorbid using MMApp were calculated using composite data from all scenarios. To assess model feasibility, we compared the mean task completion by scenario to that of the American College of Surgeons National Surgical Quality Improvement Program Surgical Risk Calculator (ACS-NSQIP-SRC) using paired t tests. Usability and satisfaction with MMApp were assessed using an 18-item questionnaire administered immediately after completing all 5 scenarios. Results: There was no significant difference in the task completion time between the MMApp and the ACS-NSQIP-SRC for scenarios A (86.3 seconds vs 74.3 seconds, P=.85) or C (58.4 seconds vs 68.9 seconds,P=.064), MMapp took less time for scenarios B (76.1 seconds vs 87.4 seconds, P=.03) and E (20.7 seconds vs 73 seconds, P<.001), and more time for scenario D (78.8 seconds vs 58.5 seconds, P=.02). The MMApp identified multimorbidity with 96.7\% (29/30) sensitivity and 95\% (19/20) specificity. User feedback was positive regarding MMApp's usability, efficiency, and usefulness. Conclusions: The MMApp identified multimorbidity with high sensitivity and specificity and did not require significantly more time to complete than a commonly used web-based risk-stratification tool for most scenarios. Mean user times were well under 2 minutes. Feedback was overall positive from residents regarding the usability and usefulness of this app, even in the emergency general surgery setting. 
It would be feasible to use MMApp to identify patients with multimorbidity in the emergency general surgery setting for validation, research, and eventual clinical use. This type of mobile app could serve as a template for other research teams to create a tool to easily screen participants for potential enrollment. ", doi="10.2196/42970", url="https://formative.jmir.org/2023/1/e42970", url="http://www.ncbi.nlm.nih.gov/pubmed/37440310" } @Article{info:doi/10.2196/42262, author="Besculides, Melanie and Mazumdar, Madhu and Phlegar, Sydney and Freeman, Robert and Wilson, Sara and Joshi, Himanshu and Kia, Arash and Gorbenko, Ksenia", title="Implementing a Machine Learning Screening Tool for Malnutrition: Insights From Qualitative Research Applicable to Other Machine Learning--Based Clinical Decision Support Systems", journal="JMIR Form Res", year="2023", month="Jul", day="13", volume="7", pages="e42262", keywords="machine learning", keywords="AI", keywords="CDSS", keywords="evaluation", keywords="nutrition", keywords="screening", keywords="clinical", keywords="usability", keywords="effectiveness", keywords="treatment", keywords="malnutrition", keywords="decision-making", keywords="tool", keywords="data", keywords="acceptability", abstract="Background: Machine learning (ML)--based clinical decision support systems (CDSS) are popular in clinical practice settings but are often criticized for being limited in usability, interpretability, and effectiveness. Evaluating the implementation of ML-based CDSS is critical to ensure CDSS is acceptable and useful to clinicians and helps them deliver high-quality health care. Malnutrition is a common and underdiagnosed condition among hospital patients, which can have serious adverse impacts. Early identification and treatment of malnutrition are important. Objective: This study aims to evaluate the implementation of an ML tool, Malnutrition Universal Screening Tool (MUST)--Plus, that predicts hospital patients at high risk for malnutrition and identify best implementation practices applicable to this and other ML-based CDSS. Methods: We conducted a qualitative postimplementation evaluation using in-depth interviews with registered dietitians (RDs) who use MUST-Plus output in their everyday work. After coding the data, we mapped emergent themes onto select domains of the nonadoption, abandonment, scale-up, spread, and sustainability (NASSS) framework. Results: We interviewed 17 of the 24 RDs approached (71\%), representing 37\% of those who use MUST-Plus output. Several themes emerged: (1) enhancements to the tool were made to improve accuracy and usability; (2) MUST-Plus helped identify patients that would not otherwise be seen; perceived usefulness was highest in the original site; (3) perceived accuracy varied by respondent and site; (4) RDs valued autonomy in prioritizing patients; (5) depth of tool understanding varied by hospital and level; (6) MUST-Plus was integrated into workflows and electronic health records; and (7) RDs expressed a desire to eventually have 1 automated screener. Conclusions: Our findings suggest that continuous involvement of stakeholders at new sites given staff turnover is vital to ensure buy-in. Qualitative research can help identify the potential bias of ML tools and should be widely used to ensure health equity. Ongoing collaboration among CDSS developers, data scientists, and clinical providers may help refine CDSS for optimal use and improve the acceptability of CDSS in the clinical context. 
", doi="10.2196/42262", url="https://formative.jmir.org/2023/1/e42262", url="http://www.ncbi.nlm.nih.gov/pubmed/37440303" } @Article{info:doi/10.2196/44327, author="Almashmoum, Maryam and Cunningham, James and Alkhaldi, Ohoud and Anisworth, John", title="Factors That Affect Knowledge-Sharing Behaviors in Medical Imaging Departments in Cancer Centers: Systematic Review", journal="JMIR Hum Factors", year="2023", month="Jul", day="12", volume="10", pages="e44327", keywords="knowledge management", keywords="knowledge sharing", keywords="medical imaging department", keywords="radiology department", keywords="nuclear medicine department", keywords="facilitators", keywords="barriers", keywords="systematic review", abstract="Background: Knowledge management plays a significant role in health care institutions. It consists of 4 processes: knowledge creation, knowledge capture, knowledge sharing, and knowledge application. The success of health care institutions relies on effective knowledge sharing among health care professionals, so the facilitators and barriers to knowledge sharing must be identified and understood. Medical imaging departments play a key role in cancer centers. Therefore, an understanding of the factors that affect knowledge sharing in medical imaging departments should be sought to increase patient outcomes and reduce medical errors. Objective: The purpose of this systematic review was to identify the facilitators and barriers that affect knowledge-sharing behaviors in medical imaging departments and identify the differences between medical imaging departments in general hospitals and cancer centers. Methods: We performed a systematic search in PubMed Central, EBSCOhost (CINAHL), Ovid MEDLINE, Ovid Embase, Elsevier (Scopus), ProQuest, and Clarivate (Web of Science) in December 2021. Relevant articles were identified by examining the titles and abstracts. In total, 2 reviewers independently screened the full texts of relevant papers according to the inclusion and exclusion criteria. We included qualitative, quantitative, and mixed methods studies that investigated the facilitators and barriers that affect knowledge sharing. We used the Mixed Methods Appraisal Tool to assess the quality of the included articles and narrative synthesis to report the results. Results: A total of 49 articles were selected for the full in-depth analysis, and 38 (78\%) studies were included in the final review, with 1 article added from other selected databases. There were 31 facilitators and 10 barriers identified that affected knowledge-sharing practices in medical imaging departments. These facilitators were divided according to their characteristics into 3 categories: individual, departmental, and technological facilitators. The barriers that hindered knowledge sharing were divided into 4 categories: financial, administrative, technological, and geographical barriers. Conclusions: This review highlighted the factors that influenced knowledge-sharing practices in medical imaging departments in cancer centers and general hospitals. In terms of the facilitators and barriers to knowledge sharing, this study shows that these are the same in medical imaging departments, whether in general hospitals or cancer centers. Our findings can be used as guidelines for medical imaging departments to support knowledge-sharing frameworks and enhance knowledge sharing by understanding the facilitators and barriers. 
", doi="10.2196/44327", url="https://humanfactors.jmir.org/2023/1/e44327", url="http://www.ncbi.nlm.nih.gov/pubmed/37436810" } @Article{info:doi/10.2196/45850, author="Calvo-Cidoncha, Elena and Verdinelli, Juli{\'a}n and Gonz{\'a}lez-Bueno, Javier and L{\'o}pez-Soto, Alfonso and Camacho Hernando, Concepci{\'o}n and Pastor-Duran, Xavier and Codina-Jan{\'e}, Carles and Lozano-Rub{\'i}, Raimundo", title="An Ontology-Based Approach to Improving Medication Appropriateness in Older Patients: Algorithm Development and Validation Study", journal="JMIR Med Inform", year="2023", month="Jul", day="10", volume="11", pages="e45850", keywords="biological ontologies", keywords="decision support systems", keywords="inappropriate prescribing", keywords="elderly", keywords="medication regimen complexity", keywords="anticholinergic drug burden", keywords="trigger tool", keywords="clinical", keywords="ontologies", keywords="pharmacy", keywords="medication", keywords="decision support", keywords="pharmaceutic", keywords="pharmacology", keywords="chronic condition", keywords="chronic disease", keywords="domain", keywords="adverse event", keywords="ontology-based", keywords="alert", abstract="Background: Inappropriate medication in older patients with multimorbidity results in a greater risk of adverse drug events. Clinical decision support systems (CDSSs) are intended to improve medication appropriateness. One approach to improving CDSSs is to use ontologies instead of relational databases. Previously, we developed OntoPharma---an ontology-based CDSS for reducing medication prescribing errors. Objective: The primary aim was to model a domain for improving medication appropriateness in older patients (chronic patient domain). The secondary aim was to implement the version of OntoPharma containing the chronic patient domain in a hospital setting. Methods: A 4-step process was proposed. The first step was defining the domain scope. The chronic patient domain focused on improving medication appropriateness in older patients. A group of experts selected the following three use cases: medication regimen complexity, anticholinergic and sedative drug burden, and the presence of triggers for identifying possible adverse events. The second step was domain model representation. The implementation was conducted by medical informatics specialists and clinical pharmacists using Prot{\'e}g{\'e}-OWL (Stanford Center for Biomedical Informatics Research). The third step was OntoPharma-driven alert module adaptation. We reused the existing framework based on SPARQL to query ontologies. The fourth step was implementing the version of OntoPharma containing the chronic patient domain in a hospital setting. Alerts generated from July to September 2022 were analyzed. Results: We proposed 6 new classes and 5 new properties, introducing the necessary changes in the ontologies previously created. An alert is shown if the Medication Regimen Complexity Index is ?40, if the Drug Burden Index is ?1, or if there is a trigger based on an abnormal laboratory value. A total of 364 alerts were generated for 107 patients; 154 (42.3\%) alerts were accepted. Conclusions: We proposed an ontology-based approach to provide support for improving medication appropriateness in older patients with multimorbidity in a scalable, sustainable, and reusable way. The chronic patient domain was built based on our previous research, reusing the existing framework. 
OntoPharma has been implemented in clinical practice and generates alerts, considering the following use cases: medication regimen complexity, anticholinergic and sedative drug burden, and the presence of triggers for identifying possible adverse events. ", doi="10.2196/45850", url="https://medinform.jmir.org/2023/1/e45850" } @Article{info:doi/10.2196/42898, author="Liu, Ruyue and Li, Qiuxia and Li, Yifan and Wei, Wenjian and Ma, Siqi and Wang, Jialin and Zhang, Nan", title="Public Preference Heterogeneity and Predicted Uptake Rate of Upper Gastrointestinal Cancer Screening Programs in Rural China: Discrete Choice Experiments and Latent Class Analysis", journal="JMIR Public Health Surveill", year="2023", month="Jul", day="10", volume="9", pages="e42898", keywords="upper gastrointestinal cancer", keywords="screening programs", keywords="discrete choice experiment", keywords="latent class logit model", keywords="public preference heterogeneity", keywords="uptake rate", abstract="Background: Rapid increases in the morbidity and mortality of patients with upper gastrointestinal cancer (UGC) in high-incidence countries in Asia have raised public health concerns. Screening can effectively reduce the incidence and mortality of patients with UGC, but the low population uptake rate seriously affects the screening effect. Objective: We aimed to determine the characteristics that influence residents' preference heterogeneity for a UGC-screening program and the extent to which these characteristics predict residents' uptake rates. Methods: A discrete choice experiment was conducted in 1000 residents aged 40-69 years who were randomly selected from 3 counties (Feicheng, Linqu, and Dongchangfu) in Shandong Province, China. Each respondent was repeatedly asked to choose from 9 discrete choice questions of 2 hypothetical screening programs comprising 5 attributes: screening interval, screening technique, regular follow-up for precancerous lesions, mortality reduction, and out-of-pocket costs. The latent class logit model was used to estimate residents' preference heterogeneity for each attribute level, their willingness to pay, and the expected uptake rates. Results: Of the 1000 residents invited, 926 (92.6\%) were included in the final analyses. The mean age was 57.32 (SD 7.22) years. The best model contained 4 classes of respondents (Akaike information criterion=7140.989, Bayesian information criterion=7485.373) defined by different preferences for the 5 attributes. In the 4-class model, out of 926 residents, 88 (9.5\%) were assigned to class 1, named as the negative latent type; 216 (3.3\%) were assigned to class 2, named as the positive integrated type; 434 (46.9\%) were assigned to class 3, named as the positive comfortable type; and 188 (20.3\%) were assigned to class 4, named as the neutral quality type. For these 4 latent classes, ``out-of-pocket cost'' is the most preferred attribute in negative latent type and positive integrated type residents (45.04\% vs 66.04\% importance weights), whereas ``screening technique'' is the most preferred factor in positive comfortable type residents (62.56\% importance weight) and ``screening interval'' is the most valued attribute in neutral quality type residents (47.05\% importance weight). 
Besides, residents in different classes had common preference for painless endoscopy, and their willingness to pay were CNY {\textyen}385.369 (US \$59.747), CNY {\textyen}93.44 (US \$14.486), CNY {\textyen}1946.48 (US \$301.810), and CNY {\textyen}3566.60 (US \$552.961), respectively. Residents' participation rate could increase by more than 89\% (except for the 60.98\% in class 2) if the optimal UGC screening option with free, follow-up for precancerous lesions, 45\% mortality reduction, screening every year, and painless endoscopy was implemented. Conclusions: Public preference heterogeneity for UGC screening does exist. Most residents have a positive attitude toward UGC screening, but their preferences vary in selected attributes and levels, except for painless endoscopy. Policy makers should consider these heterogeneities to formulate UGC-screening programs that incorporate the public's needs and preferences to improve participation rates. ", doi="10.2196/42898", url="https://publichealth.jmir.org/2023/1/e42898", url="http://www.ncbi.nlm.nih.gov/pubmed/37428530" } @Article{info:doi/10.2196/47612, author="Jang, Woocheol and Choi, Sung Yong and Kim, Yoo Ji and Yon, Keon Dong and Lee, Joo Young and Chung, Sung-Hoon and Kim, Young Chae and Yeo, Geun Seung and Lee, Jinseok", title="Artificial Intelligence--Driven Respiratory Distress Syndrome Prediction for Very Low Birth Weight Infants: Korean Multicenter Prospective Cohort Study", journal="J Med Internet Res", year="2023", month="Jul", day="10", volume="25", pages="e47612", keywords="artificial intelligence", keywords="deep neural network", keywords="premature infants", keywords="respiratory distress syndrome", keywords="AI", keywords="AI model", keywords="pediatrics", keywords="neonatal", keywords="maternal health", keywords="machine learning", abstract="Background: Respiratory distress syndrome (RDS) is a disease that commonly affects premature infants whose lungs are not fully developed. RDS results from a lack of surfactant in the lungs. The more premature the infant is, the greater is the likelihood of having RDS. However, even though not all premature infants have RDS, preemptive treatment with artificial pulmonary surfactant is administered in most cases. Objective: We aimed to develop an artificial intelligence model to predict RDS in premature infants to avoid unnecessary treatment. Methods: In this study, 13,087 very low birth weight infants who were newborns weighing less than 1500 grams were assessed in 76 hospitals of the Korean Neonatal Network. To predict RDS in very low birth weight infants, we used basic infant information, maternity history, pregnancy/birth process, family history, resuscitation procedure, and test results at birth such as blood gas analysis and Apgar score. The prediction performances of 7 different machine learning models were compared, and a 5-layer deep neural network was proposed in order to enhance the prediction performance from the selected features. An ensemble approach combining multiple models from the 5-fold cross-validation was subsequently developed. Results: Our proposed ensemble 5-layer deep neural network consisting of the top 20 features provided high sensitivity (83.03\%), specificity (87.50\%), accuracy (84.07\%), balanced accuracy (85.26\%), and area under the curve (0.9187). Based on the model that we developed, a public web application that enables easy access for the prediction of RDS in premature infants was deployed. 
Conclusions: Our artificial intelligence model may be useful for preparations for neonatal resuscitation, particularly in cases involving the delivery of very low birth weight infants, as it can aid in predicting the likelihood of RDS and inform decisions regarding the administration of surfactant. ", doi="10.2196/47612", url="https://www.jmir.org/2023/1/e47612", url="http://www.ncbi.nlm.nih.gov/pubmed/37428525" } @Article{info:doi/10.2196/42016, author="Monahan, Corneille Ann and Feldman, S. Sue", title="The Utility of Predictive Modeling and a Systems Process Approach to Reduce Emergency Department Crowding: A Position Paper", journal="Interact J Med Res", year="2023", month="Jul", day="10", volume="12", pages="e42016", keywords="emergency care, prehospital", keywords="information systems", keywords="crowding", keywords="healthcare service", keywords="healthcare system", keywords="emergency department", keywords="boarding", keywords="exit block", keywords="medical informatics, application", keywords="health services research", keywords="personalized medicine", keywords="predictive medicine", keywords="model, probabilistic", keywords="polynomial model", keywords="decision support technique", keywords="systems approach", keywords="predict", keywords="evidence based health care", keywords="hospital bed management", keywords="management information systems", keywords="position paper", doi="10.2196/42016", url="https://www.i-jmr.org/2023/1/e42016", url="http://www.ncbi.nlm.nih.gov/pubmed/37428536" } @Article{info:doi/10.2196/47930, author="Blecker, Saul and Schoenthaler, Antoinette and Martinez, Rose Tiffany and Belli, M. Hayley and Zhao, Yunan and Wong, Christina and Fitchett, Cassidy and Bearnot, R. Harris and Mann, Devin", title="Leveraging Electronic Health Record Technology and Team Care to Address Medication Adherence: Protocol for a Cluster Randomized Controlled Trial", journal="JMIR Res Protoc", year="2023", month="Jul", day="7", volume="12", pages="e47930", keywords="medication adherence", keywords="hypertension", keywords="clinical decision support", keywords="proportion of days covered", keywords="EHR", keywords="electronic health record", keywords="technology", keywords="adherence", keywords="primary care", abstract="Background: Low medication adherence is a common cause of high blood pressure but is often unrecognized in clinical practice. Electronic data linkages between electronic health records (EHRs) and pharmacies offer the opportunity to identify low medication adherence, which can be used for interventions at the point of care. We developed a multicomponent intervention that uses linked EHR and pharmacy data to automatically identify patients with elevated blood pressure and low medication adherence. The intervention then combines team-based care with EHR-based workflows to address medication nonadherence. Objective: This study aims to describe the design of the Leveraging EHR Technology and Team Care to Address Medication Adherence (TEAMLET) trial, which tests the effectiveness of a multicomponent intervention that leverages EHR-based data and team-based care on medication adherence among patients with hypertension. Methods: TEAMLET is a pragmatic, cluster randomized controlled trial in which 10 primary care practices will be randomized 1:1 to the multicomponent intervention or usual care. We will include all patients with hypertension and low medication adherence who are seen at enrolled practices. 
The primary outcome is medication adherence, as measured by the proportion of days covered, and the secondary outcome is clinic systolic blood pressure. We will also assess intervention implementation, including adoption, acceptability, fidelity, cost, and sustainability. Results: As of May 2023, we have randomized 10 primary care practices into the study, with 5 practices assigned to each arm of the trial. The enrollment for the study commenced on October 5, 2022, and the trial is currently ongoing. We anticipate patient recruitment to go through the fall of 2023 and the primary outcomes to be assessed in the fall of 2024. Conclusions: The TEAMLET trial will evaluate the effectiveness of a multicomponent intervention that leverages EHR-based data and team-based care on medication adherence. If successful, the intervention could offer a scalable approach to address inadequate blood pressure control among millions of patients with hypertension. Trial Registration: ClinicalTrials.gov NCT05349422; https://clinicaltrials.gov/ct2/show/NCT05349422 International Registered Report Identifier (IRRID): DERR1-10.2196/47930 ", doi="10.2196/47930", url="https://www.researchprotocols.org/2023/1/e47930", url="http://www.ncbi.nlm.nih.gov/pubmed/37418304" } @Article{info:doi/10.2196/46427, author="Jing, Yu and Qin, Peinuan and Fan, Xiangmin and Qiang, Wei and Wencheng, Zhu and Sun, Wei and Tian, Feng and Wang, Dakuo", title="Deep Learning--Assisted Gait Parameter Assessment for Neurodegenerative Diseases: Model Development and Validation", journal="J Med Internet Res", year="2023", month="Jul", day="5", volume="25", pages="e46427", keywords="deep learning", keywords="neurodegenerative disease", keywords="auxiliary medical care", keywords="gait parameter assessment", abstract="Background: Neurodegenerative diseases (NDDs) are prevalent among older adults worldwide. Early diagnosis of NDD is challenging yet crucial. Gait status has been identified as an indicator of early-stage NDD changes and can play a significant role in diagnosis, treatment, and rehabilitation. Historically, gait assessment has relied on intricate but imprecise scales by trained professionals or required patients to wear additional equipment, causing discomfort. Advancements in artificial intelligence may completely transform this and offer a novel approach to gait evaluation. Objective: This study aimed to use cutting-edge machine learning techniques to offer patients a noninvasive, entirely contactless gait assessment and provide health care professionals with precise gait assessment results covering all common gait-related parameters to assist in diagnosis and rehabilitation planning. Methods: Data collection involved motion data from 41 different participants aged 25 to 85 (mean 57.51, SD 12.93) years captured in motion sequences using the Azure Kinect (Microsoft Corp; a 3D camera with a 30-Hz sampling frequency). Support vector machine (SVM) and bidirectional long short-term memory (Bi-LSTM) classifiers trained using spatiotemporal features extracted from raw data were used to identify gait types in each walking frame. Gait semantics could then be obtained from the frame labels, and all the gait parameters could be calculated accordingly. For optimal generalization performance of the model, the classifiers were trained using a 10-fold cross-validation strategy. The proposed algorithm was also compared with the previous best heuristic method. 
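Once each walking frame carries a phase label (the per-frame classification step the gait abstract describes), common temporal gait parameters follow from simple counting over the label sequence. The sketch below derives cadence and mean stride time from a per-frame label array at the 30 Hz sampling rate mentioned above; the two-class labelling and the parameter definitions are simplified stand-ins, not the authors' scheme.

```python
# Sketch: derive cadence and stride time from per-frame gait labels.
# Assumes a 30 Hz camera and a hypothetical two-class labelling
# ("left" / "right" for the leading foot); each label change counts as a step.
import numpy as np

FPS = 30  # sampling frequency of the Azure Kinect reported in the study

def temporal_gait_parameters(frame_labels):
    labels = np.asarray(frame_labels)
    step_frames = np.flatnonzero(labels[1:] != labels[:-1]) + 1  # label changes
    if len(step_frames) < 2:
        return {"cadence_steps_per_min": 0.0, "mean_stride_time_s": float("nan")}
    step_times = np.diff(step_frames) / FPS          # seconds between steps
    cadence = 60.0 / step_times.mean()               # steps per minute
    stride_time = 2.0 * step_times.mean()            # two steps per stride
    return {"cadence_steps_per_min": round(cadence, 1),
            "mean_stride_time_s": round(stride_time, 2)}

# 4 seconds of synthetic labels: one step roughly every 0.5 s.
labels = (["left"] * 15 + ["right"] * 15) * 4
print(temporal_gait_parameters(labels))
```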
Qualitative and quantitative feedback from medical staff and patients in actual medical scenarios was extensively collected for usability analysis. Results: The evaluations comprised 3 aspects. Regarding the classification results from the 2 classifiers, Bi-LSTM achieved an average precision, recall, and F1-score of 90.54\%, 90.41\%, and 90.38\%, respectively, whereas these metrics were 86.99\%, 86.62\%, and 86.67\%, respectively, for SVM. Moreover, the Bi-LSTM--based method attained 93.2\% accuracy in gait segmentation evaluation (tolerance set to 2), whereas that of the SVM-based method achieved only 77.5\% accuracy. For the final gait parameter calculation result, the average error rate of the heuristic method, SVM, and Bi-LSTM was 20.91\% (SD 24.69\%), 5.85\% (SD 5.45\%), and 3.17\% (SD 2.75\%), respectively. Conclusions: This study demonstrated that the Bi-LSTM--based approach can effectively support accurate gait parameter assessment, assisting medical professionals in making early diagnoses and reasonable rehabilitation plans for patients with NDD. ", doi="10.2196/46427", url="https://www.jmir.org/2023/1/e46427", url="http://www.ncbi.nlm.nih.gov/pubmed/37405831" } @Article{info:doi/10.2196/47479, author="Walker, Louise Harriet and Ghani, Shahi and Kuemmerli, Christoph and Nebiker, Andreas Christian and M{\"u}ller, Peter Beat and Raptis, Aristotle Dimitri and Staubli, Manuel Sebastian", title="Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument", journal="J Med Internet Res", year="2023", month="Jun", day="30", volume="25", pages="e47479", keywords="artificial intelligence", keywords="internet information", keywords="patient information", keywords="ChatGPT", keywords="EQIP tool", keywords="chatbot", keywords="chatbots", keywords="conversational agent", keywords="conversational agents", keywords="internal medicine", keywords="pancreas", keywords="liver", keywords="hepatic", keywords="biliary", keywords="gall", keywords="bile", keywords="gallstone", keywords="pancreatitis", keywords="pancreatic", keywords="medical information", abstract="Background: ChatGPT-4 is the latest release of a novel artificial intelligence (AI) chatbot able to answer freely formulated and complex questions. In the near future, ChatGPT could become the new standard for health care professionals and patients to access medical information. However, little is known about the quality of medical information provided by the AI. Objective: We aimed to assess the reliability of medical information provided by ChatGPT. Methods: Medical information provided by ChatGPT-4 on the 5 hepato-pancreatico-biliary (HPB) conditions with the highest global disease burden was measured with the Ensuring Quality Information for Patients (EQIP) tool. The EQIP tool is used to measure the quality of internet-available information and consists of 36 items that are divided into 3 subsections. In addition, 5 guideline recommendations per analyzed condition were rephrased as questions and input to ChatGPT, and agreement between the guidelines and the AI answer was measured by 2 authors independently. All queries were repeated 3 times to measure the internal consistency of ChatGPT. Results: Five conditions were identified (gallstone disease, pancreatitis, liver cirrhosis, pancreatic cancer, and hepatocellular carcinoma). The median EQIP score across all conditions was 16 (IQR 14.5-18) for the total of 36 items. 
Divided by subsection, median scores for content, identification, and structure data were 10 (IQR 9.5-12.5), 1 (IQR 1-1), and 4 (IQR 4-5), respectively. Agreement between guideline recommendations and answers provided by ChatGPT was 60\% (15/25). Interrater agreement as measured by the Fleiss $\kappa$ was 0.78 (P<.001), indicating substantial agreement. Internal consistency of the answers provided by ChatGPT was 100\%. Conclusions: ChatGPT provides medical information of comparable quality to available static internet information. Although currently of limited quality, large language models could become the future standard for patients and health care professionals to gather medical information. ", doi="10.2196/47479", url="https://www.jmir.org/2023/1/e47479", url="http://www.ncbi.nlm.nih.gov/pubmed/37389908" } @Article{info:doi/10.2196/45550, author="O'Neill, Braden and Ferguson, Jacob and Dalueg, Lauren and Yusuf, Abban and Kirubarajan, Abirami and Lloyd, Taryn and Mollanji, Eisi and Persaud, Navindra", title="Evaluating the Supporting Evidence of Medical Cannabis Claims Made on Clinic Websites: Cross-Sectional Study", journal="J Med Internet Res", year="2023", month="Jun", day="29", volume="25", pages="e45550", keywords="cannabis", keywords="evidence-based medicine", keywords="adverse effects", keywords="consumer health information", abstract="Background: Since the legalization of medical cannabis in Canada in 2013, prescription of cannabis for medical purposes has become commonplace and a multibillion dollar industry has formed. Much of the media coverage surrounding medical cannabis has been positive in nature, leading to Canadians potentially underestimating the adverse effects of medical cannabis use. In recent years, there has been a large increase in clinic websites advertising the use of medical cannabis for health indications. However, little is known about the quality of the evidence used by these clinic websites to describe the effectiveness of cannabis used for medical purposes. Objective: We aimed to identify the indications for medical cannabis reported by cannabis clinics in Ontario, Canada, and the evidence these clinics cited to support cannabis prescription. Methods: We conducted a cross-sectional web search to identify all cannabis clinic websites within Ontario, Canada, that had physician involvement and identified their primary purpose as cannabis prescription. Two reviewers independently searched these websites to identify all medical indications for which cannabis was promoted and reviewed and critically appraised all studies cited using the Oxford Centre for Evidence-Based Medicine Levels of Evidence rubric. Results: A total of 29 clinics were identified, promoting cannabis for 20 different medical indications including migraines, insomnia, and fibromyalgia. There were 235 unique studies cited on these websites to support the effectiveness of cannabis for these indications. A high proportion (36/235, 15.3\%) of the studies were identified to be at the lowest level of evidence (level 5). Only 4 clinic websites included any mention of harms associated with cannabis. Conclusions: Cannabis clinic websites generally promote cannabis use as medically effective but cite low-quality evidence to support these claims and rarely discuss harms. The recommendation of cannabis as a general therapeutic for many indications unsupported by high-quality evidence is potentially misleading for medical practitioners and patients. 
This disparity should be carefully evaluated in context of the specific medical indication and an individualized patient risk assessment. Our work illustrates the need to increase the quality of research performed on the medical effects of cannabis. ", doi="10.2196/45550", url="https://www.jmir.org/2023/1/e45550", url="http://www.ncbi.nlm.nih.gov/pubmed/37384372" } @Article{info:doi/10.2196/44331, author="Nelson, Walter and Khanna, Nityan and Ibrahim, Mohamed and Fyfe, Justin and Geiger, Maxwell and Edwards, Keith and Petch, Jeremy", title="Optimizing Patient Record Linkage in a Master Patient Index Using Machine Learning: Algorithm Development and Validation", journal="JMIR Form Res", year="2023", month="Jun", day="29", volume="7", pages="e44331", keywords="medical record linkage", keywords="electronic health records", keywords="medical record systems", keywords="computerized", keywords="machine learning", keywords="quality of care", keywords="health care system", keywords="open-source software", keywords="Bayesian optimization", keywords="pilot", keywords="data linkage", keywords="master patient index", keywords="master index", keywords="record link", keywords="matching algorithm", keywords="FEBRL", abstract="Background: To provide quality care, modern health care systems must match and link data about the same patient from multiple sources, a function often served by master patient index (MPI) software. Record linkage in the MPI is typically performed manually by health care providers, guided by automated matching algorithms. These matching algorithms must be configured in advance, such as by setting the weights of patient attributes, usually by someone with knowledge of both the matching algorithm and the patient population being served. Objective: We aimed to develop and evaluate a machine learning--based software tool, which automatically configures a patient matching algorithm by learning from pairs of patient records previously linked by humans already present in the database. Methods: We built a free and open-source software tool to optimize record linkage algorithm parameters based on historical record linkages. The tool uses Bayesian optimization to identify the set of configuration parameters that lead to optimal matching performance in a given patient population, by learning from prior record linkages by humans. The tool is written assuming only the existence of a minimal HTTP application programming interface (API), and so is agnostic to the choice of MPI software, record linkage algorithm, and patient population. As a proof of concept, we integrated our tool with Sant{\'e}MPI, an open-source MPI. We validated the tool using several synthetic patient populations in Sant{\'e}MPI by comparing the performance of the optimized configuration in held-out data to Sant{\'e}MPI's default matching configuration using sensitivity and specificity. Results: The machine learning--optimized configurations correctly detect over 90\% of true record linkages as definite matches in all data sets, with 100\% specificity and positive predictive value in all data sets, whereas the baseline detects none. In the largest data set examined, the baseline matching configuration detects possible record linkages with a sensitivity of 90.2\% (95\% CI 88.4\%-92.0\%) and specificity of 100\%. By comparison, the machine learning--optimized matching configuration attains a sensitivity of 100\%, with a decreased specificity of 95.9\% (95\% CI 95.9\%-96.0\%). 
We report significant gains in sensitivity in all data sets examined, at the cost of only marginally decreased specificity. The configuration optimization tool, data, and data set generator have been made freely available. Conclusions: Our machine learning software tool can be used to significantly improve the performance of existing record linkage algorithms, without knowledge of the algorithm being used or specific details of the patient population being served. ", doi="10.2196/44331", url="https://formative.jmir.org/2023/1/e44331", url="http://www.ncbi.nlm.nih.gov/pubmed/37384382" } @Article{info:doi/10.2196/48568, author="Liu, Jialin and Wang, Changyu and Liu, Siru", title="Utility of ChatGPT in Clinical Practice", journal="J Med Internet Res", year="2023", month="Jun", day="28", volume="25", pages="e48568", keywords="ChatGPT", keywords="artificial intelligence", keywords="large language models", keywords="clinical practice", keywords="large language model", keywords="natural language processing", keywords="NLP", keywords="doctor-patient", keywords="patient-physician", keywords="communication", keywords="challenges", keywords="barriers", keywords="recommendations", keywords="guidance", keywords="guidelines", keywords="best practices", keywords="risks", doi="10.2196/48568", url="https://www.jmir.org/2023/1/e48568", url="http://www.ncbi.nlm.nih.gov/pubmed/37379067" } @Article{info:doi/10.2196/42816, author="Singh, Ashima and Sontag, K. Marci and Zhou, Mei and Dasgupta, Mahua and Crume, Tessa and McLemore, Morgan and Galadanci, Najibah and Randall, Eldrida and Steiner, Nicole and Brandow, M. Amanda and Koch, Kathryn and Field, J. Joshua and Hassell, Kathryn and Snyder, B. Angela and Kanter, Julie", title="Evaluating the Discriminatory Ability of the Sickle Cell Data Collection Program's Administrative Claims Case Definition in Identifying Adults With Sickle Cell Disease: Validation Study", journal="JMIR Public Health Surveill", year="2023", month="Jun", day="28", volume="9", pages="e42816", keywords="surveillance using administrative data", keywords="rare conditions", keywords="sickle cell disease", keywords="disease", keywords="surveillance", keywords="genetic", keywords="prevention", keywords="data", keywords="adults", keywords="epidemiology", keywords="utilization", abstract="Background: Sickle cell disease (SCD) was first recognized in 1910 and identified as a genetic condition in 1949. However, there is not a universal clinical registry that can be used currently to estimate its prevalence. The Sickle Cell Data Collection (SCDC) program, funded by the Centers for Disease Control and Prevention, funds state-level grantees to compile data within their states from various sources including administrative claims to identify individuals with SCD. The performance of the SCDC administrative claims case definition has been validated in a pediatric population with SCD, but it has not been tested in adults. Objective: The objective of our study is to evaluate the discriminatory ability of the SCDC administrative claims case definition to accurately identify adults with SCD using Medicaid insurance claims data. Methods: Our study used Medicaid claims data in combination with hospital-based medical record data from the Alabama, Georgia, and Wisconsin SCDC programs to identify individuals aged 18 years or older meeting the SCDC administrative claims case definition. 
In order to validate this definition, our study included only those individuals who were identified in both Medicaid's and the partnering clinical institution's records. We used clinical laboratory tests and diagnostic algorithms to determine the true SCD status of this subset of patients. Positive predictive values (PPV) are reported overall and by state under several scenarios. Results: There were 1219 individuals (354 from Alabama and 865 from Georgia) who were identified through a 5-year time period. The 5-year time period yielded a PPV of 88.4\% (91\% for data from Alabama and 87\% for data from Georgia), when only using data with laboratory-confirmed (gold standard) cases as true positives. With a narrower time period (3-year period) and data from 3 states (Alabama, Georgia, and Wisconsin), a total of 1432 individuals from these states were included in our study. The overall 3-year PPV was 89.4\% (92\%, 93\%, and 81\% for data from Alabama, Georgia, and Wisconsin, respectively) when only considering laboratory-confirmed cases as true cases. Conclusions: Adults identified as having SCD from administrative claims data based on the SCDC case definition have a high probability of truly having the disease, especially if those hospitals have active SCD programs. Administrative claims are thus a valuable data source to identify adults with SCD in a state and understand their epidemiology and health care service usage. ", doi="10.2196/42816", url="https://publichealth.jmir.org/2023/1/e42816", url="http://www.ncbi.nlm.nih.gov/pubmed/37379070" } @Article{info:doi/10.2196/44549, author="Alpers, Rieke and K{\"u}hne, Lisa and Truong, Hong-Phuc and Zeeb, Hajo and Westphal, Max and J{\"a}ckle, Sonja", title="Evaluation of the EsteR Toolkit for COVID-19 Decision Support: Sensitivity Analysis and Usability Study", journal="JMIR Form Res", year="2023", month="Jun", day="27", volume="7", pages="e44549", keywords="COVID-19", keywords="public health", keywords="decision support tool", keywords="sensitivity analysis", keywords="web application", keywords="usability study", abstract="Background: During the COVID-19 pandemic, local health authorities were responsible for managing and reporting current cases in Germany. Since March 2020, employees had to contain the spread of COVID-19 by monitoring and contacting infected persons as well as tracing their contacts. In the EsteR project, we implemented existing and newly developed statistical models as decision support tools to assist in the work of the local health authorities. Objective: The main goal of this study was to validate the EsteR toolkit in two complementary ways: first, investigating the stability of the answers provided by our statistical tools regarding model parameters in the back end and, second, evaluating the usability and applicability of our web application in the front end by test users. Methods: For model stability assessment, a sensitivity analysis was carried out for all 5 developed statistical models. The default parameters of our models as well as the test ranges of the model parameters were based on a previous literature review on COVID-19 properties. The obtained answers resulting from different parameters were compared using dissimilarity metrics and visualized using contour plots. In addition, the parameter ranges of general model stability were identified. 
For the usability evaluation of the web application, cognitive walk-throughs and focus group interviews were conducted with 6 containment scouts located at 2 different local health authorities. They were first asked to complete small tasks with the tools and then express their general impressions of the web application. Results: The simulation results showed that some statistical models were more sensitive to changes in their parameters than others. For each of the single-person use cases, we determined an area where the respective model could be rated as stable. In contrast, the results of the group use cases highly depended on the user inputs, and thus, no area of parameters with general model stability could be identified. We have also provided a detailed simulation report of the sensitivity analysis. In the user evaluation, the cognitive walk-throughs and focus group interviews revealed that the user interface needed to be simplified and more information was necessary as guidance. In general, the testers rated the web application as helpful, especially for new employees. Conclusions: This evaluation study allowed us to refine the EsteR toolkit. Using the sensitivity analysis, we identified suitable model parameters and analyzed how stable the statistical models were in terms of changes in their parameters. Furthermore, the front end of the web application was improved with the results of the conducted cognitive walk-throughs and focus group interviews regarding its user-friendliness. ", doi="10.2196/44549", url="https://formative.jmir.org/2023/1/e44549", url="http://www.ncbi.nlm.nih.gov/pubmed/37368487" } @Article{info:doi/10.2196/45614, author="Boussina, Aaron and Wardi, Gabriel and Shashikumar, Prajwal Supreeth and Malhotra, Atul and Zheng, Kai and Nemati, Shamim", title="Representation Learning and Spectral Clustering for the Development and External Validation of Dynamic Sepsis Phenotypes: Observational Cohort Study", journal="J Med Internet Res", year="2023", month="Jun", day="23", volume="25", pages="e45614", keywords="sepsis", keywords="phenotype", keywords="emergency service, hospital", keywords="disease progression", keywords="artificial intelligence", keywords="machine learning", keywords="emergency", keywords="infection", keywords="clinical phenotype", keywords="clinical phenotyping", keywords="transition model", keywords="transition modeling", abstract="Background: Recent attempts at clinical phenotyping for sepsis have shown promise in identifying groups of patients with distinct treatment responses. Nonetheless, the replicability and actionability of these phenotypes remain an issue because the patient trajectory is a function of both the patient's physiological state and the interventions they receive. Objective: We aimed to develop a novel approach for deriving clinical phenotypes using unsupervised learning and transition modeling. Methods: Forty commonly used clinical variables from the electronic health record were used as inputs to a feed-forward neural network trained to predict the onset of sepsis. Using spectral clustering on the representations from this network, we derived and validated consistent phenotypes across a diverse cohort of patients with sepsis. We modeled phenotype dynamics as a Markov decision process with transitions as a function of the patient's current state and the interventions they received. 
Results: Four consistent and distinct phenotypes were derived from over 11,500 adult patients who were admitted from the University of California, San Diego emergency department (ED) with sepsis between January 1, 2016, and January 31, 2020. Over 2000 adult patients admitted from the University of California, Irvine ED with sepsis between November 4, 2017, and August 4, 2022, were involved in the external validation. We demonstrate that sepsis phenotypes are not static and evolve in response to physiological factors and based on interventions. We show that roughly 45\% of patients change phenotype membership within the first 6 hours of ED arrival. We observed consistent trends in patient dynamics as a function of interventions including early administration of antibiotics. Conclusions: We derived and describe 4 sepsis phenotypes present within 6 hours of triage in the ED. We observe that the administration of a 30 mL/kg fluid bolus may be associated with worse outcomes in certain phenotypes, whereas prompt antimicrobial therapy is associated with improved outcomes. ", doi="10.2196/45614", url="https://www.jmir.org/2023/1/e45614", url="http://www.ncbi.nlm.nih.gov/pubmed/37351927" } @Article{info:doi/10.2196/45334, author="Kraska, Jake and Bell, Karen and Costello, Shane", title="Graded Response Model Analysis and Computer Adaptive Test Simulation of the Depression Anxiety Stress Scale 21: Evaluation and Validation Study", journal="J Med Internet Res", year="2023", month="Jun", day="22", volume="25", pages="e45334", keywords="graded response model", keywords="DASS-21", keywords="CAT", keywords="computer adaptive testing", keywords="simulation", keywords="psychological distress", keywords="depression", keywords="anxiety", keywords="stress", keywords="mental health", keywords="screening tool", keywords="tool", keywords="reliability", keywords="development", keywords="model", abstract="Background: The Depression Anxiety Stress Scale 21 (DASS-21) is a mental health screening tool with conflicting studies regarding its factor structure. No studies have yet attempted to develop a computer adaptive test (CAT) version of it. Objective: This study calibrated items for, and simulated, a DASS-21 CAT using a nonclinical sample. Methods: An evaluation sample (n=580) was used to evaluate the DASS-21 scales via confirmatory factor analysis, Mokken analysis, and graded response modeling. A CAT was simulated with a validation sample (n=248) and a simulated sample (n=10,000) to confirm the generalizability of the model developed. Results: A bifactor model, also known as the ``quadripartite'' model (1 general factor with 3 specific factors) in the context of the DASS-21, displayed good fit. All scales displayed acceptable fit with the graded response model. Simulation of 3 unidimensional (depression, anxiety, and stress) CATs resulted in an average 17\% to 48\% reduction in items administered when a reliability of 0.80 was acceptable. Conclusions: This study clarifies previous conflicting findings regarding the DASS-21 factor structure and suggests that the quadripartite model for the DASS-21 items fits best. Item response theory modeling suggests that the items measure their respective constructs best between 0$\theta$ and 3$\theta$ (mild to moderate severity). 
", doi="10.2196/45334", url="https://www.jmir.org/2023/1/e45334", url="http://www.ncbi.nlm.nih.gov/pubmed/37347530" } @Article{info:doi/10.2196/43333, author="Malerbi, Korn Fernando and Nakayama, Filipe Luis and Gayle Dychiao, Robyn and Zago Ribeiro, Lucas and Villanueva, Cleva and Celi, Anthony Leo and Regatieri, Vinicius Caio", title="Digital Education for the Deployment of Artificial Intelligence in Health Care", journal="J Med Internet Res", year="2023", month="Jun", day="22", volume="25", pages="e43333", keywords="artificial intelligence", keywords="digital health", keywords="health education", keywords="machine learning", keywords="digital education", keywords="digital", keywords="education", keywords="transformation", keywords="neural", keywords="network", keywords="evaluation", keywords="dataset", keywords="data", keywords="set", keywords="clinical", doi="10.2196/43333", url="https://www.jmir.org/2023/1/e43333", url="http://www.ncbi.nlm.nih.gov/pubmed/37347537" } @Article{info:doi/10.2196/44876, author="Gendrin, Aline and Souliotis, Leonidas and Loudon-Griffiths, James and Aggarwal, Ravisha and Amoako, Daniel and Desouza, Gregory and Dimitrievska, Sashka and Metcalfe, Paul and Louvet, Emilie and Sahni, Harpreet", title="Identifying Patient Populations in Texts Describing Drug Approvals Through Deep Learning--Based Information Extraction: Development of a Natural Language Processing Algorithm", journal="JMIR Form Res", year="2023", month="Jun", day="22", volume="7", pages="e44876", keywords="algorithm", keywords="artificial intelligence", keywords="BERT", keywords="cancer", keywords="classification", keywords="data extraction", keywords="data mining", keywords="deep-learning", keywords="development", keywords="drug approval", keywords="free text", keywords="information retrieval", keywords="line of therapy", keywords="machine learning", keywords="natural language processing", keywords="NLP", keywords="oncology", keywords="pharmaceutic", keywords="pharmacology", keywords="pharmacy", keywords="stage of cancer", keywords="text extraction", keywords="text mining", keywords="unstructured data", abstract="Background: New drug treatments are regularly approved, and it is challenging to remain up-to-date in this rapidly changing environment. Fast and accurate visualization is important to allow a global understanding of the drug market. Automation of this information extraction provides a helpful starting point for the subject matter expert, helps to mitigate human errors, and saves time. Objective: We aimed to semiautomate disease population extraction from the free text of oncology drug approval descriptions from the BioMedTracker database for 6 selected drug targets. More specifically, we intended to extract (1) line of therapy, (2) stage of cancer of the patient population described in the approval, and (3) the clinical trials that provide evidence for the approval. We aimed to use these results in downstream applications, aiding the searchability of relevant content against related drug project sources. Methods: We fine-tuned a state-of-the-art deep learning model, Bidirectional Encoder Representations from Transformers, for each of the 3 desired outputs. We independently applied rule-based text mining approaches. We compared the performances of deep learning and rule-based approaches and selected the best method, which was then applied to new entries. The results were manually curated by a subject matter expert and then used to train new models. 
Results: The training data set is currently small (433 entries) and will enlarge over time when new approval descriptions become available or if a choice is made to take another drug target into account. The deep learning models achieved 61\% and 56\% 5-fold cross-validated accuracies for line of therapy and stage of cancer, respectively, which were treated as classification tasks. Trial identification is treated as a named entity recognition task, and the 5-fold cross-validated F1-score is currently 87\%. Although the scores of the classification tasks could seem low, the models comprise 5 classes each, and such scores are a marked improvement when compared to random classification. Moreover, we expect improved performance as the input data set grows, since deep learning models need to be trained on a large enough amount of data to be able to learn the task they are taught. The rule-based approach achieved 60\% and 74\% 5-fold cross-validated accuracies for line of therapy and stage of cancer, respectively. No attempt was made to define a rule-based approach for trial identification. Conclusions: We developed a natural language processing algorithm that is currently assisting subject matter experts in disease population extraction, which supports health authority approvals. This algorithm achieves semiautomation, enabling subject matter experts to leverage the results for deeper analysis and to accelerate information retrieval in a crowded clinical environment such as oncology. ", doi="10.2196/44876", url="https://formative.jmir.org/2023/1/e44876", url="http://www.ncbi.nlm.nih.gov/pubmed/37347514" } @Article{info:doi/10.2196/45352, author="Killian, O. Michael and Tian, Shubo and Xing, Aiwen and Hughes, Dana and Gupta, Dipankar and Wang, Xiaoyu and He, Zhe", title="Prediction of Outcomes After Heart Transplantation in Pediatric Patients Using National Registry Data: Evaluation of Machine Learning Approaches", journal="JMIR Cardio", year="2023", month="Jun", day="20", volume="7", pages="e45352", keywords="explainable artificial intelligence", keywords="machine learning", keywords="mortality", keywords="outcome prediction", keywords="organ rejection", keywords="organ transplantation", keywords="pediatrics", keywords="United Network for Organ Sharing", abstract="Background: The prediction of posttransplant health outcomes for pediatric heart transplantation is critical for risk stratification and high-quality posttransplant care. Objective: The purpose of this study was to examine the use of machine learning (ML) models to predict rejection and mortality for pediatric heart transplant recipients. Methods: Various ML models were used to predict rejection and mortality at 1, 3, and 5 years after transplantation in pediatric heart transplant recipients using United Network for Organ Sharing data from 1987 to 2019. The variables used for predicting posttransplant outcomes included donor and recipient as well as medical and social factors. We evaluated 7 ML models---extreme gradient boosting (XGBoost), logistic regression, support vector machine, random forest (RF), stochastic gradient descent, multilayer perceptron, and adaptive boosting (AdaBoost)---as well as a deep learning model with 2 hidden layers with 100 neurons and a rectified linear unit (ReLU) activation function followed by batch normalization for each and a classification head with a softmax activation function. We used 10-fold cross-validation to evaluate model performance. 
Shapley additive explanations (SHAP) values were calculated to estimate the importance of each variable for prediction. Results: RF and AdaBoost models were the best-performing algorithms for different prediction windows across outcomes. RF outperformed other ML algorithms in predicting 5 of the 6 outcomes (area under the receiver operating characteristic curve [AUROC] 0.664 and 0.706 for 1-year and 3-year rejection, respectively, and AUROC 0.697, 0.758, and 0.763 for 1-year, 3-year, and 5-year mortality, respectively). AdaBoost achieved the best performance for prediction of 5-year rejection (AUROC 0.705). Conclusions: This study demonstrates the comparative utility of ML approaches for modeling posttransplant health outcomes using registry data. ML approaches can identify unique risk factors and their complex relationship with outcomes, thereby identifying patients considered to be at risk and informing the transplant community about the potential of these innovative approaches to improve pediatric care after heart transplantation. Future studies are required to translate the information derived from prediction models to optimize counseling, clinical care, and decision-making within pediatric organ transplant centers. ", doi="10.2196/45352", url="https://cardio.jmir.org/2023/1/e45352", url="http://www.ncbi.nlm.nih.gov/pubmed/37338974" } @Article{info:doi/10.2196/43311, author="Stamer, Tjorven and Steinh{\"a}user, Jost and Fl{\"a}gel, Kristina", title="Artificial Intelligence Supporting the Training of Communication Skills in the Education of Health Care Professions: Scoping Review", journal="J Med Internet Res", year="2023", month="Jun", day="19", volume="25", pages="e43311", keywords="communication", keywords="education", keywords="artificial intelligence", keywords="machine learning", keywords="health care", keywords="skill", keywords="use", keywords="academic", keywords="students", keywords="training", keywords="cost", keywords="cost-effective", keywords="health care professional", abstract="Background: Communication is a crucial element of every health care profession, rendering communication skills training in all health care professions as being of great importance. Technological advances such as artificial intelligence (AI) and particularly machine learning (ML) may support this cause: it may provide students with an opportunity for easily accessible and readily available communication training. Objective: This scoping review aimed to summarize the status quo regarding the use of AI or ML in the acquisition of communication skills in academic health care professions. Methods: We conducted a comprehensive literature search across the PubMed, Scopus, Cochrane Library, Web of Science Core Collection, and CINAHL databases to identify articles that covered the use of AI or ML in communication skills training of undergraduate students pursuing health care profession education. Using an inductive approach, the included studies were organized into distinct categories. The specific characteristics of the studies, methods and techniques used by AI or ML applications, and main outcomes of the studies were evaluated. Furthermore, supporting and hindering factors in the use of AI and ML for communication skills training of health care professionals were outlined. Results: The titles and abstracts of 385 studies were identified, of which 29 (7.5\%) underwent full-text review. Of the 29 studies, based on the inclusion and exclusion criteria, 12 (3.1\%) were included. 
The studies were organized into 3 distinct categories: studies using AI and ML for text analysis and information extraction, studies using AI and ML and virtual reality, and studies using AI and ML and the simulation of virtual patients, each within the academic training of the communication skills of health care professionals. Within these thematic domains, AI was also used for the provision of feedback. The motivation of the involved agents played a major role in the implementation process. Reported barriers to the use of AI and ML in communication skills training revolved around the lack of authenticity and limited natural flow of language exhibited by the AI- and ML-based virtual patient systems. Furthermore, the use of educational AI- and ML-based systems in communication skills training for health care professionals is currently limited to only a few cases, topics, and clinical domains. Conclusions: The use of AI and ML in communication skills training for health care professionals is clearly a growing and promising field with a potential to render training more cost-effective and less time-consuming. Furthermore, it may serve learners as an individualized and readily available exercise method. However, in most cases, the outlined applications and technical solutions are limited in terms of access, possible scenarios, the natural flow of a conversation, and authenticity. These issues still stand in the way of any widespread implementation ambitions. ", doi="10.2196/43311", url="https://www.jmir.org/2023/1/e43311", url="http://www.ncbi.nlm.nih.gov/pubmed/37335593" } @Article{info:doi/10.2196/46938, author="Hoang, Uy and Williams, Alice and Smylie, Jessica and Aspden, Carole and Button, Elizabeth and Macartney, Jack and Okusi, Cecilia and Byford, Rachel and Ferreira, Filipa and Leston, Meredith and Xie, Xuan Charis and Joy, Mark and Marsden, Gemma and Clark, Tristan and de Lusignan, Simon", title="The Impact of Point-of-Care Testing for Influenza on Antimicrobial Stewardship (PIAMS) in UK Primary Care: Protocol for a Mixed Methods Study", journal="JMIR Res Protoc", year="2023", month="Jun", day="16", volume="12", pages="e46938", keywords="medical records systems, computerized", keywords="influenza point-of-care systems", keywords="general practice", keywords="RSV", keywords="implementation", keywords="outcome assessment", keywords="health care", keywords="antimicrobial stewardship", keywords="acute respiratory infection", keywords="antimicrobial", keywords="influenza", keywords="primary care", keywords="respiratory symptom", abstract="Background: Molecular point-of-care testing (POCT) used in primary care can inform whether a patient presenting with an acute respiratory infection has influenza. A confirmed clinical diagnosis, particularly early in the disease, could inform better antimicrobial stewardship. Social distancing and lockdowns during the COVID-19 pandemic have disturbed previous patterns of influenza infections in 2021. However, data from samples taken in the last quarter of 2022 suggest that influenza represents 36\% of sentinel network positive virology, compared with 24\% for respiratory syncytial virus. Problems with integration into the clinical workflow are a known barrier to incorporating technology into routine care. Objective: This study aims to report the impact of POCT for influenza on antimicrobial prescribing in primary care.
We will additionally describe severe outcomes of infection (hospitalization and mortality) and how POCT is integrated into primary care workflows. Methods: The impact of POCT for influenza on antimicrobial stewardship (PIAMS) in UK primary care is an observational study being conducted between December 2022 and May 2023 and involving 10 practices that contribute data to the English sentinel network. Up to 1000 people who present to participating practices with respiratory symptoms will be swabbed and tested with a rapid molecular POCT analyzer in the practice. Antimicrobial prescribing and other study outcomes will be collected by linking information from the POCT analyzer with data from the patient's computerized medical record. We will collect data on how POCT is incorporated into practice using data flow diagrams, unified modeling language use case diagrams, and Business Process Modeling Notation. Results: We will present the crude and adjusted odds of antimicrobial prescribing (all antibiotics and antivirals) given a POCT diagnosis of influenza, stratifying by whether individuals have a respiratory or other relevant diagnosis (eg, bronchiectasis). We will also present the rates of hospital referrals and deaths related to influenza infection in PIAMS study practices compared with a set of matched practices in the sentinel network and the rest of the network. We will describe any difference in implementation models in terms of staff involved and workflow. Conclusions: This study will generate data on the impact of POCT testing for influenza in primary care as well as help to inform about the feasibility of incorporating POCT into primary care workflows. It will inform the design of future larger studies about the effectiveness and cost-effectiveness of POCT to improve antimicrobial stewardship and any impact on severe outcomes. International Registered Report Identifier (IRRID): DERR1-10.2196/46938 ", doi="10.2196/46938", url="https://www.researchprotocols.org/2023/1/e46938", url="http://www.ncbi.nlm.nih.gov/pubmed/37327029" } @Article{info:doi/10.2196/44042, author="Sigle, Manuel and Berliner, Leon and Richter, Erich and van Iersel, Mart and Gorgati, Eleonora and Hubloue, Ives and Bamberg, Maximilian and Grasshoff, Christian and Rosenberger, Peter and Wunderlich, Robert", title="Development of an Anticipatory Triage-Ranking Algorithm Using Dynamic Simulation of the Expected Time Course of Patients With Trauma: Modeling and Simulation Study", journal="J Med Internet Res", year="2023", month="Jun", day="15", volume="25", pages="e44042", keywords="novel triage algorithm", keywords="patient with trauma", keywords="dynamic patient simulation", keywords="mathematic model", keywords="artificial patient database", keywords="semisupervised generation of patients with artificial trauma", keywords="high-dimensional analysis of patient database", keywords="Germany", keywords="algorithm", keywords="trauma", keywords="proof-of-concept", keywords="model", keywords="emergency", keywords="triage", keywords="simulation", keywords="urgency", keywords="urgent", keywords="severity", keywords="rank", keywords="vital sign", abstract="Background: In cases of terrorism, disasters, or mass casualty incidents, far-reaching life-and-death decisions about prioritizing patients are currently made using triage algorithms that focus solely on the patient's current health status rather than their prognosis, thus leaving a fatal gap of patients who are under- or overtriaged. 
Objective: The aim of this proof-of-concept study is to demonstrate a novel approach for triage that no longer classifies patients into triage categories but ranks their urgency according to the anticipated survival time without intervention. Using this approach, we aim to improve the prioritization of casualties by respecting individual injury patterns and vital signs, survival likelihoods, and the availability of rescue resources. Methods: We designed a mathematical model that allows dynamic simulation of the time course of a patient's vital parameters, depending on individual baseline vital signs and injury severity. The 2 variables were integrated using the well-established Revised Trauma Score (RTS) and the New Injury Severity Score (NISS). An artificial patient database of unique patients with trauma (N=82,277) was then generated and used for analysis of the time course modeling and triage classification. Comparative performance analysis of different triage algorithms was performed. In addition, we applied a sophisticated, state-of-the-art clustering method using the Gower distance to visualize patient cohorts at risk for mistriage. Results: The proposed triage algorithm realistically modeled the time course of a patient's life, depending on injury severity and current vital parameters. Different casualties were ranked by their anticipated time course, reflecting their priority for treatment. Regarding the identification of patients at risk for mistriage, the model outperformed not only the Simple Triage And Rapid Treatment triage algorithm but also exclusive stratification by the RTS or the NISS. Multidimensional analysis separated patients with similar patterns of injuries and vital parameters into clusters with different triage classifications. In this large-scale analysis, our algorithm confirmed the previously mentioned conclusions during simulation and descriptive analysis and underlined the significance of this novel approach to triage. Conclusions: The findings of this study suggest the feasibility and relevance of our model, which is unique in terms of its ranking system, prognosis outline, and time course anticipation. The proposed triage-ranking algorithm could offer an innovative triage method with a wide range of applications in prehospital, disaster, and emergency medicine, as well as simulation and research. ", doi="10.2196/44042", url="https://www.jmir.org/2023/1/e44042", url="http://www.ncbi.nlm.nih.gov/pubmed/37318826" } @Article{info:doi/10.2196/47672, author="Herranz, Carmen and Mart{\'i}n-Moreno Banegas, Laura and Dana Muzzio, Fernando and Siso-Almirall, Antoni and Roca, Josep and Cano, Isaac", title="A Practice-Proven Adaptive Case Management Approach for Innovative Health Care Services (Health Circuit): Cluster Randomized Clinical Pilot and Descriptive Observational Study", journal="J Med Internet Res", year="2023", month="Jun", day="14", volume="25", pages="e47672", keywords="continuum of care management", keywords="innovative healthcare services", keywords="collaborative tools", keywords="digital health transformation", keywords="usability", keywords="acceptability", keywords="health care service", keywords="Health Circuit", keywords="health management", keywords="management", keywords="support", keywords="digital aid", keywords="aid", keywords="care", keywords="prototype", keywords="surgery", keywords="testing", abstract="Background: Digital health tools may facilitate the continuity of care.
Enhancement of digital aid is imperative to prevent information gaps or redundancies, as well as to facilitate support of flexible care plans. Objective: This study aims to present Health Circuit, an adaptive case management approach that empowers health care professionals and patients to implement personalized evidence-based interventions, thanks to dynamic communication channels and patient-centered service workflows; analyze its health care impact; and determine its usability and acceptability among health care professionals and patients. Methods: From September 2019 to March 2020, the health impact, usability (measured with the system usability scale; SUS), and acceptability (measured with the net promoter score; NPS) of an initial prototype of Health Circuit were tested in a cluster randomized clinical pilot (n=100) in patients with high risk for hospitalization (study 1). From July 2020 to July 2021, a premarket pilot study of usability (with the SUS) and acceptability (with the NPS) was conducted among 104 high-risk patients undergoing prehabilitation before major surgery (study 2). Results: In study 1, Health Circuit resulted in a reduction of emergency room visits (4/7, 13\% vs 7/16, 44\%), enhanced patients' empowerment (P<.001), and showed good acceptability and usability scores (NPS: 31; SUS: 54/100). In study 2, the NPS was 40 and the SUS was 85/100. The acceptance rate was also high (mean score of 8.4/10). Conclusions: Health Circuit showed potential for health care value generation and good acceptability and usability despite being a prototype system, prompting the need for testing a completed system in real-world scenarios. Trial Registration: ClinicalTrials.gov NCT04056663; https://clinicaltrials.gov/ct2/show/NCT04056663 ", doi="10.2196/47672", url="https://www.jmir.org/2023/1/e47672", url="http://www.ncbi.nlm.nih.gov/pubmed/37314850" } @Article{info:doi/10.2196/43838, author="C Manikis, Georgios and Simos, J. Nicholas and Kourou, Konstantina and Kondylakis, Haridimos and Poikonen-Saksela, Paula and Mazzocco, Ketti and Pat-Horenczyk, Ruth and Sousa, Berta and Oliveira-Maia, J. Albino and Mattson, Johanna and Roziner, Ilan and Marzorati, Chiara and Marias, Kostas and Nuutinen, Mikko and Karademas, Evangelos and Fotiadis, Dimitrios", title="Personalized Risk Analysis to Improve the Psychological Resilience of Women Undergoing Treatment for Breast Cancer: Development of a Machine Learning--Driven Clinical Decision Support Tool", journal="J Med Internet Res", year="2023", month="Jun", day="12", volume="25", pages="e43838", keywords="breast cancer", keywords="classification", keywords="machine learning", keywords="mental health", keywords="well-being", keywords="explainability", keywords="interventions", keywords="risk assessment", abstract="Background: Health professionals are often faced with the need to identify women at risk of manifesting poor psychological resilience following the diagnosis and treatment of breast cancer. Machine learning algorithms are increasingly used to support clinical decision support (CDS) tools in helping health professionals identify women who are at risk of adverse well-being outcomes and plan customized psychological interventions for women at risk. Clinical flexibility, cross-validated performance accuracy, and model explainability permitting person-specific identification of risk factors are highly desirable features of such tools.
Objective: This study aimed to develop and cross-validate machine learning models designed to identify breast cancer survivors at risk of poor overall mental health and global quality of life and identify potential targets of personalized psychological interventions according to an extensive set of clinical recommendations. Methods: A set of 12 alternative models was developed to improve the clinical flexibility of the CDS tool. All models were validated using longitudinal data from a prospective, multicenter clinical pilot at 5 major oncology centers in 4 countries (Italy, Finland, Israel, and Portugal; the Predicting Effective Adaptation to Breast Cancer to Help Women to BOUNCE Back [BOUNCE] project). A total of 706 patients with highly treatable breast cancer were enrolled shortly after diagnosis and before the onset of oncological treatments and were followed up for 18 months. An extensive set of demographic, lifestyle, clinical, psychological, and biological variables measured within 3 months after enrollment served as predictors. Rigorous feature selection isolated key psychological resilience outcomes that could be incorporated into future clinical practice. Results: Balanced random forest classifiers were successful at predicting well-being outcomes, with accuracies ranging between 78\% and 82\% (for 12-month end points after diagnosis) and between 74\% and 83\% (for 18-month end points after diagnosis). Explainability and interpretability analyses built on the best-performing models were used to identify potentially modifiable psychological and lifestyle characteristics that, if addressed systematically in the context of personalized psychological interventions, would be most likely to promote resilience for a given patient. Conclusions: Our results highlight the clinical utility of the BOUNCE modeling approach by focusing on resilience predictors that can be readily available to practicing clinicians at major oncology centers. The BOUNCE CDS tool paves the way for personalized risk assessment methods to identify patients at high risk of adverse well-being outcomes and direct valuable resources toward those most in need of specialized psychological interventions. ", doi="10.2196/43838", url="https://www.jmir.org/2023/1/e43838", url="http://www.ncbi.nlm.nih.gov/pubmed/37307043" } @Article{info:doi/10.2196/43896, author="Kloka, Andreas Jan and Holtmann, C. Sophie and N{\"u}renberg-Goloub, Elina and Piekarski, Florian and Zacharowski, Kai and Friedrichson, Benjamin", title="Expectations of Anesthesiology and Intensive Care Professionals Toward Artificial Intelligence: Observational Study", journal="JMIR Form Res", year="2023", month="Jun", day="12", volume="7", pages="e43896", keywords="anesthesiology", keywords="artificial intelligence", keywords="health care", keywords="intensive care", keywords="medical informatics", keywords="technology acceptance", keywords="Europe-wide survey", abstract="Background: Artificial intelligence (AI) applications offer numerous opportunities to improve health care. To be used in the intensive care unit, AI must meet the needs of staff, and potential barriers must be addressed through joint action by all stakeholders. It is thus critical to assess the needs and concerns of anesthesiologists and intensive care physicians related to AI in health care throughout Europe. 
Objective: This Europe-wide, cross-sectional observational study investigates how potential users of AI systems in anesthesiology and intensive care assess the opportunities and risks of the new technology. The web-based questionnaire was based on the established analytic model of acceptance of innovations by Rogers to record 5 stages of innovation acceptance. Methods: The questionnaire was sent twice in 2 months (March 11, 2021, and November 5, 2021) through the European Society of Anaesthesiology and Intensive Care (ESAIC) member email distribution list. A total of 9294 ESAIC members were reached, of whom 728 filled out the questionnaire (response rate 728/9294, 8\%). Due to missing data, 27 questionnaires were excluded. The analyses were conducted with 701 participants. Results: A total of 701 questionnaires (female: n=299, 42\%) were analyzed. Overall, 265 (37.8\%) of the participants have been in contact with AI and evaluated the benefits of this technology higher (mean 3.22, SD 0.39) than participants who stated no previous contact (mean 3.01, SD 0.48). Physicians see the most benefits of AI application in early warning systems (335/701, 48\% strongly agreed, and 358/701, 51\% agreed). Major potential disadvantages were technical problems (236/701, 34\% strongly agreed, and 410/701, 58\% agreed) and handling difficulties (126/701, 18\% strongly agreed, and 462/701, 66\% agreed), both of which could be addressed by Europe-wide digitalization and education. In addition, the lack of a secure legal basis for the research and use of medical AI in the European Union leads doctors to expect problems with legal liability (186/701, 27\% strongly agreed, and 374/701, 53\% agreed) and data protection (148/701, 21\% strongly agreed, and 343/701, 49\% agreed). Conclusions: Anesthesiologists and intensive care personnel are open to AI applications in their professional field and expect numerous benefits for staff and patients. Regional differences in the digitalization of the private sector are not reflected in the acceptance of AI among health care professionals. Physicians anticipate technical difficulties and lack a stable legal basis for the use of AI. Training for medical staff could increase the benefits of AI in professional medicine. Therefore, we suggest that the development and implementation of AI in health care require a solid technical, legal, and ethical basis, as well as adequate education and training of users. ", doi="10.2196/43896", url="https://formative.jmir.org/2023/1/e43896", url="http://www.ncbi.nlm.nih.gov/pubmed/37307038" } @Article{info:doi/10.2196/43665, author="Guven, Emine", title="Decision of the Optimal Rank of a Nonnegative Matrix Factorization Model for Gene Expression Data Sets Utilizing the Unit Invariant Knee Method: Development and Evaluation of the Elbow Method for Rank Selection", journal="JMIR Bioinform Biotech", year="2023", month="Jun", day="6", volume="4", pages="e43665", keywords="gene expression data", keywords="nonnegative matrix factorization", keywords="rank factorization", keywords="optimal rank", keywords="unit invariant knee method", keywords="elbow method", keywords="consensus matrix", abstract="Background: There is a great need to develop a computational approach to analyze and exploit the information contained in gene expression data. 
The recent utilization of nonnegative matrix factorization (NMF) in computational biology has demonstrated the capability to derive essential details from large amounts of data, in particular gene expression microarrays. A common problem in NMF is finding the proper rank (r), that is, the number of factors in the low-rank approximation, but no agreement exists on which technique is most appropriate to utilize for this purpose. Thus, various techniques have been suggested to select the optimal value of the factorization rank (r). Objective: In this work, a new metric for rank selection is proposed based on the elbow method, which was methodically compared against the cophenetic metric. Methods: To decide the optimum number rank (r), this study focused on the unit invariant knee (UIK) method of the NMF on gene expression data sets. Since the UIK method requires an extremum distance estimator that is eventually employed for inflection and identification of a knee point, the proposed method finds the first inflection point of the curvature of the residual sum of squares of the proposed algorithms using the UIK method on gene expression data sets as a target matrix. Results: Computation was conducted for the UIK task using gene expression data of acute lymphoblastic leukemia and acute myeloid leukemia samples. The distinct results of NMF were then compared across different algorithms. The proposed UIK method is easy to perform, fast, free of a priori rank value input, and does not require initial parameters that significantly influence the model's functionality. Conclusions: This study demonstrates that the elbow method provides a credible prediction for both gene expression data and for precisely estimating simulated mutational processes data with known dimensions. The proposed UIK method is faster than conventional methods, including metrics utilizing the consensus matrix as a criterion for rank selection, while achieving significantly better computational efficiency without visual inspection of the curvatures. Finally, the suggested rank tuning method based on the elbow method for gene expression data is arguably theoretically superior to the cophenetic measure. ", doi="10.2196/43665", url="https://bioinform.jmir.org/2023/1/e43665" } @Article{info:doi/10.2196/43551, author="Stringer, Eleah and Lum, J. Julian and Livergant, Jonathan and Kushniruk, W. Andre", title="Decision Aids for Patients With Head and Neck Cancer: Qualitative Elicitation of Design Recommendations From Patient End Users", journal="JMIR Hum Factors", year="2023", month="Jun", day="5", volume="10", pages="e43551", keywords="decision support", keywords="decision aid", keywords="app design", keywords="oncology", keywords="head and neck cancer", keywords="patient information needs", keywords="qualitative", abstract="Background: Patients with head and neck cancer (HNC) carry a clinically significant symptom burden, have alterations in function (eg, impaired ability to chew, swallow, and talk), and a decreased quality of life. Furthermore, treatment impacts social activities and interactions as patients report reduced sexuality and shoulder the highest rates of depression across cancer types. Patients suffer undue anxiety because they find the treatment incomprehensible, which is partially a function of limited understandable information. Patients' perceptions of having obtained adequate information prior to and during treatment are predictive of positive outcomes.
Providing patient-centered decision support and utilizing visual images may increase understanding of treatment options and associated risks to improve satisfaction with their decision and consultation, while reducing decisional conflict. Objective: This study aims to gather requirements from survivors of HNC on the utility of key visual components to be used in the design of an electronic decision aid (eDA) to assist with decision-making on treatment options. Methods: Informed by a scoping review on eDAs for patients with HNC, screens and visualizations for an eDA were created and then presented to 12 survivors of HNC for feedback on their utility, features, and further requirements. The semistructured interviews were video-recorded and thematically analyzed to inform co-design recommendations. Results: A total of 9 themes were organized into 2 categories. The first category, eDAs and decision support, included 3 themes: familiarity with DAs, support of concept, and versatility of the prototype. The second category, evaluation of mock-up, contained 6 themes: reaction to the screens and visualizations, favorite features, complexity, preference for customizability, presentation device, and suggestions for improvement. Conclusions: All participants felt an eDA, used in the presence of their oncologist, would support a more thorough and transparent explanation of treatment or augment the quality of education received. Participants liked the simple design of the mock-ups they were shown but, ultimately, desired customizability to adapt the eDA to their individual information needs. This research highlights the value of user-centered design, rooted in acceptability and utility, in medical health informatics, recognizing cancer survivors as the ultimate knowledge holders. This research highlights the value of incorporating visuals into technology-based innovations to engage all patients in treatment decisions. ", doi="10.2196/43551", url="https://humanfactors.jmir.org/2023/1/e43551", url="http://www.ncbi.nlm.nih.gov/pubmed/37276012" } @Article{info:doi/10.2196/44081, author="Ren, Yang and Wu, Dezhi and Tong, Yan and L{\'o}pez-DeFede, Ana and Gareau, Sarah", title="Issue of Data Imbalance on Low Birthweight Baby Outcomes Prediction and Associated Risk Factors Identification: Establishment of Benchmarking Key Machine Learning Models With Data Rebalancing Strategies", journal="J Med Internet Res", year="2023", month="May", day="31", volume="25", pages="e44081", keywords="low birthweight", keywords="machine learning", keywords="risk factor", keywords="benchmark", keywords="data rebalance", abstract="Background: Low birthweight (LBW) is a leading cause of neonatal mortality in the United States and a major causative factor of adverse health effects in newborns. Identifying high-risk patients early in prenatal care is crucial to preventing adverse outcomes. Previous studies have proposed various machine learning (ML) models for LBW prediction task, but they were limited by small and imbalanced data sets. Some authors attempted to address this through different data rebalancing methods. However, most of their reported performances did not reflect the models' actual performance in real-life scenarios. To date, few studies have successfully benchmarked the performance of ML models in maternal health; thus, it is critical to establish benchmarks to advance ML use to subsequently improve birth outcomes. 
Objective: This study aimed to establish several key benchmarking ML models to predict LBW and systematically apply different rebalancing optimization methods to a large-scale and extremely imbalanced all-payer hospital record data set that connects mother and baby data at a state level in the United States. We also performed feature importance analysis to identify the most contributing features in the LBW classification task, which can aid in targeted intervention. Methods: Our large data set consisted of 266,687 birth records across 6 years, and 8.63\% (n=23,019) of records were labeled as LBW. To set up benchmarking ML models to predict LBW, we applied 7 classic ML models (ie, logistic regression, naive Bayes, random forest, extreme gradient boosting, adaptive boosting, multilayer perceptron, and sequential artificial neural network) while using 4 different data rebalancing methods: random undersampling, random oversampling, synthetic minority oversampling technique, and weight rebalancing. Owing to ethical considerations, in addition to ML evaluation metrics, we primarily used recall to evaluate model performance, indicating the number of correctly predicted LBW cases out of all actual LBW cases, as false negative health care outcomes could be fatal. We further analyzed feature importance to explore the degree to which each feature contributed to ML model prediction among our best-performing models. Results: We found that extreme gradient boosting achieved the highest recall score---0.70---using the weight rebalancing method. Our results showed that various data rebalancing methods improved the prediction performance of the LBW group substantially. From the feature importance analysis, maternal race, age, payment source, sum of predelivery emergency department and inpatient hospitalizations, predelivery disease profile, and different social vulnerability index components were important risk factors associated with LBW. Conclusions: Our findings establish useful ML benchmarks to improve birth outcomes in the maternal health domain. They are informative to identify the minority class (ie, LBW) based on an extremely imbalanced data set, which may guide the development of personalized LBW early prevention, clinical interventions, and statewide maternal and infant health policy changes. ", doi="10.2196/44081", url="https://www.jmir.org/2023/1/e44081", url="http://www.ncbi.nlm.nih.gov/pubmed/37256674" } @Article{info:doi/10.2196/40402, author="Waugh, Lim Mihyun and Boltin, Nicholas and Wolf, Lauren and Goodwin, Jane and Parker, Patti and Horner, Ronnie and Hermes, Matthew and Wheeler, Thomas and Goodwin, Richard and Moss, Melissa", title="Prediction of Pelvic Organ Prolapse Postsurgical Outcome Using Biomaterial-Induced Blood Cytokine Levels: Machine Learning Approach", journal="JMIR Perioper Med", year="2023", month="May", day="31", volume="6", pages="e40402", keywords="pelvic organ prolapse", keywords="polypropylene mesh", keywords="inflammatory response", keywords="cytokines", keywords="principal component analysis", keywords="supervised machine learning models", keywords="surgical outcome prediction", keywords="biomaterial", keywords="repair surgery", abstract="Background: Pelvic organ prolapse (POP) refers to symptomatic descent of the vaginal wall. To reduce surgical failure rates, surgical correction can be augmented with the insertion of polypropylene mesh. This benefit is offset by the risk of mesh complication, predominantly mesh exposure through the vaginal wall. 
If mesh placement is under consideration as part of prolapse repair, patient selection and counseling would benefit from the prediction of mesh exposure; yet, no such reliable preoperative method currently exists. Past studies indicate that inflammation and associated cytokine release is correlated with mesh complication. While some degree of mesh-induced cytokine response accompanies implantation, excessive or persistent cytokine responses may elicit inflammation and implant rejection. Objective: Here, we explore the levels of biomaterial-induced blood cytokines from patients who have undergone POP repair surgery to (1) identify correlations among cytokine expression and (2) predict postsurgical mesh exposure through the vaginal wall. Methods: Blood samples from 20 female patients who previously underwent surgical intervention with transvaginal placement of polypropylene mesh to correct POP were collected for the study. These included 10 who experienced postsurgical mesh exposure through the vaginal wall and 10 who did not. Blood samples incubated with inflammatory agent lipopolysaccharide, with sterile polypropylene mesh, or alone were analyzed for plasma levels of 13 proinflammatory and anti-inflammatory cytokines using multiplex assay. Data were analyzed by principal component analysis (PCA) to uncover associations among cytokines and identify cytokine patterns that correlate with postsurgical mesh exposure through the vaginal wall. Supervised machine learning models were created to predict the presence or absence of mesh exposure and probe the number of cytokine measurements required for effective predictions. Results: PCA revealed that proinflammatory cytokines interferon gamma, interleukin 12p70, and interleukin 2 are the largest contributors to the variance explained in PC 1, while anti-inflammatory cytokines interleukins 10, 4, and 6 are the largest contributors to the variance explained in PC 2. Additionally, PCA distinguished cytokine correlations that implicate prospective therapies to improve postsurgical outcomes. Among machine learning models trained with all 13 cytokines, the artificial neural network, the highest performing model, predicted POP surgical outcomes with 83\% (15/18) accuracy; the same model predicted POP surgical outcomes with 78\% (14/18) accuracy when trained with just 7 cytokines, demonstrating retention of predictive capability using a smaller cytokine group. Conclusions: This preliminary study, incorporating a sample size of just 20 participants, identified correlations among cytokines and demonstrated the potential of this novel approach to predict mesh exposure through the vaginal wall following transvaginal POP repair surgery. Further study with a larger sample size will be pursued to confirm these results. If corroborated, this method could provide a personalized medicine approach to assist surgeons in their recommendation of POP repair surgeries with minimal potential for adverse outcomes. 
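A minimal Python sketch of the analysis pattern described above (standardization, principal component analysis, and a small supervised classifier over a 13-cytokine panel) is given below; it uses synthetic placeholder data, and the component count, network size, and cross-validation settings are assumptions rather than the authors' implementation.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 13))        # 20 patients x 13 cytokine levels (synthetic placeholder data)
y = np.array([0] * 10 + [1] * 10)    # 10 without and 10 with mesh exposure, mirroring the study design

# Standardize, project onto a few principal components, then fit a small neural network
clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
)
print('Mean cross-validated accuracy:', cross_val_score(clf, X, y, cv=5).mean())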
", doi="10.2196/40402", url="https://periop.jmir.org/2023/1/e40402", url="http://www.ncbi.nlm.nih.gov/pubmed/37256676" } @Article{info:doi/10.2196/34453, author="Ye, Jiancheng", title="Patient Safety of Perioperative Medication Through the Lens of Digital Health and Artificial Intelligence", journal="JMIR Perioper Med", year="2023", month="May", day="31", volume="6", pages="e34453", keywords="perioperative medicine", keywords="patient safety", keywords="anesthesiology", keywords="human factors", keywords="medication errors", keywords="digital health", keywords="health information technology", doi="10.2196/34453", url="https://periop.jmir.org/2023/1/e34453", url="http://www.ncbi.nlm.nih.gov/pubmed/37256663" } @Article{info:doi/10.2196/41725, author="Rajkumar, Ethan and Nguyen, Kevin and Radic, Sandra and Paa, Jubelle and Geng, Qiyang", title="Machine Learning and Causal Approaches to Predict Readmissions and Its Economic Consequences Among Canadian Patients With Heart Disease: Retrospective Study", journal="JMIR Form Res", year="2023", month="May", day="26", volume="7", pages="e41725", keywords="patient readmission", keywords="health care economics", keywords="ensemble", keywords="prediction model", keywords="classification", keywords="linear regression resource intensity value", keywords="hospital", keywords="health care", keywords="principal component analysis", keywords="PCA", abstract="Background: Unplanned patient readmissions within 30 days of discharge pose a substantial challenge in Canadian health care economics. To address this issue, risk stratification, machine learning, and linear regression paradigms have been proposed as potential predictive solutions. Ensemble machine learning methods, such as stacked ensemble models with boosted tree algorithms, have shown promise for early risk identification in specific patient groups. Objective: This study aims to implement an ensemble model with submodels for structured data, compare metrics, evaluate the impact of optimized data manipulation with principal component analysis on shorter readmissions, and quantitatively verify the causal relationship between expected length of stay (ELOS) and resource intensity weight (RIW) value for a comprehensive economic perspective. Methods: This retrospective study used Python 3.9 and streamlined libraries to analyze data obtained from the Discharge Abstract Database covering 2016 to 2021. The study used 2 sub--data sets, clinical and geographical data sets, to predict patient readmission and analyze its economic implications, respectively. A stacking classifier ensemble model was used after principal component analysis to predict patient readmission. Linear regression was performed to determine the relationship between RIW and ELOS. Results: The ensemble model achieved precision and slightly higher recall (0.49 and 0.68), indicating a higher instance of false positives. The model was able to predict cases better than other models in the literature. Per the ensemble model, readmitted women and men aged 40 to 44 and 35 to 39 years, respectively, were more likely to use resources. The regression tables verified the causality of the model and confirmed the trend that patient readmission is much more costly than continued hospital stay without discharge for both the patient and health care system. 
Conclusions: This study validates the use of hybrid ensemble models for predicting economic cost models in health care with the goal of reducing the bureaucratic and utility costs associated with hospital readmissions. The availability of robust and efficient predictive models, as demonstrated in this study, can help hospitals focus more on patient care while maintaining low economic costs. This study predicts the relationship between ELOS and RIW, which can indirectly impact patient outcomes by reducing administrative tasks and physicians' burden, thereby reducing the cost burdens placed on patients. It is recommended that changes to the general ensemble model and linear regressions be made to analyze new numerical data for predicting hospital costs. Ultimately, the proposed work hopes to emphasize the advantages of implementing hybrid ensemble models in forecasting health care economic cost models, empowering hospitals to prioritize patient care while simultaneously decreasing administrative and bureaucratic expenses. ", doi="10.2196/41725", url="https://formative.jmir.org/2023/1/e41725", url="http://www.ncbi.nlm.nih.gov/pubmed/37234042" } @Article{info:doi/10.2196/45662, author="Hou, Jue and Zhao, Rachel and Gronsbell, Jessica and Lin, Yucong and Bonzel, Clara-Lea and Zeng, Qingyi and Zhang, Sinian and Beaulieu-Jones, K. Brett and Weber, M. Griffin and Jemielita, Thomas and Wan, Sabrina Shuyan and Hong, Chuan and Cai, Tianrun and Wen, Jun and Ayakulangara Panickan, Vidul and Liaw, Kai-Li and Liao, Katherine and Cai, Tianxi", title="Generate Analysis-Ready Data for Real-world Evidence: Tutorial for Harnessing Electronic Health Records With Advanced Informatic Technologies", journal="J Med Internet Res", year="2023", month="May", day="25", volume="25", pages="e45662", keywords="electronic health records", keywords="real-world evidence", keywords="data curation", keywords="medical informatics", keywords="randomized controlled trials", keywords="reproducibility", doi="10.2196/45662", url="https://www.jmir.org/2023/1/e45662", url="http://www.ncbi.nlm.nih.gov/pubmed/37227772" } @Article{info:doi/10.2196/40031, author="Chenais, Gabrielle and Lagarde, Emmanuel and Gil-Jardin{\'e}, C{\'e}dric", title="Artificial Intelligence in Emergency Medicine: Viewpoint of Current Applications and Foreseeable Opportunities and Challenges", journal="J Med Internet Res", year="2023", month="May", day="23", volume="25", pages="e40031", keywords="viewpoint", keywords="ethics", keywords="artificial intelligence", keywords="emergency medicine", keywords="perspectives", keywords="mobile phone", doi="10.2196/40031", url="https://www.jmir.org/2023/1/e40031", url="http://www.ncbi.nlm.nih.gov/pubmed/36972306" } @Article{info:doi/10.2196/48109, author="May, P. Heather and Griffin, M. Joan and Herges, R. Joseph and Kashani, B. Kianoush and Kattah, G. Andrea and Mara, C. Kristin and McCoy, G. Rozalina and Rule, D. Andrew and Tinaglia, G. Angeliki and Barreto, F. 
Erin", title="Comprehensive Acute Kidney Injury Survivor Care: Protocol for the Randomized Acute Kidney Injury in Care Transitions Pilot Trial", journal="JMIR Res Protoc", year="2023", month="May", day="22", volume="12", pages="e48109", keywords="acute kidney injury", keywords="acute renal failure", keywords="care transitions", keywords="chronic kidney disease", keywords="nephrologists", keywords="randomized controlled trials", abstract="Background: Innovative care models are needed to address gaps in kidney care follow-up among acute kidney injury (AKI) survivors. We developed the multidisciplinary AKI in Care Transitions (ACT) program, which embeds post-AKI care in patients' primary care clinic. Objective: The objective of this randomized pilot trial is to test the feasibility and acceptability of the ACT program and study protocol, including recruitment and retention, procedures, and outcome measures. Methods: The study will be conducted at Mayo Clinic in Rochester, Minnesota, a tertiary care center with a local primary care practice. Individuals who are included have stage 3 AKI during their hospitalization, do not require dialysis at discharge, have a local primary care provider, and are discharged to their home. Patients unable or unwilling to provide informed consent and recipients of any transplant within 100 days of enrollment are excluded. Consented patients are randomized to receive the intervention (ie, ACT program) or usual care. The ACT program intervention includes predischarge kidney health education from nurses and coordinated postdischarge laboratory monitoring (serum creatinine and urine protein assessment) and follow-up with a primary care provider and pharmacist within 14 days. The usual care group receives no specific study-related intervention, and any aspects of AKI care are at the direction of the treating team. This study will examine the feasibility of the ACT program, including recruitment, randomization and retention in a trial setting, and intervention fidelity. The feasibility and acceptability of participating in the ACT program will also be examined in qualitative interviews with patients and staff and through surveys. Qualitative interviews will be deductively and inductively coded and themes compared across data types. Observations of clinical encounters will be examined for discussion and care plans related to kidney health. Descriptive analyses will summarize quantitative measures of the feasibility and acceptability of ACT. Participants' knowledge about kidney health, quality of life, and process outcomes (eg, type and timing of laboratory assessments) will be described for both groups. Clinical outcomes (eg, unplanned rehospitalization) up to 12 months will be compared with Cox proportional hazards models. Results: This study received funding from the Agency for Health Care Research and Quality on April 21, 2021, and was approved by the Institutional Review Board on December 14, 2021. As of March 14, 2023, seventeen participants each have been enrolled in the intervention and usual care groups. Conclusions: Feasible and generalizable AKI survivor care delivery models are needed to improve care processes and health outcomes. This pilot trial will test the ACT program, which uses a multidisciplinary model focused on primary care to address this gap. 
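As a rough illustration of the planned comparison of clinical outcomes with Cox proportional hazards models, the short Python sketch below fits such a model with the lifelines package; the data frame, column names, and follow-up values are invented for demonstration and are not study data.

import pandas as pd
from lifelines import CoxPHFitter

# Synthetic example: follow-up time in days, rehospitalization indicator,
# and study arm (1 = ACT program, 0 = usual care); all values are invented.
df = pd.DataFrame({
    'days_to_event': [120, 365, 45, 210, 300, 365, 90, 365],
    'rehospitalized': [1, 0, 1, 1, 0, 0, 1, 0],
    'act_program': [0, 1, 0, 1, 0, 1, 0, 1],
})

cph = CoxPHFitter()
cph.fit(df, duration_col='days_to_event', event_col='rehospitalized')
cph.print_summary()  # reports the hazard ratio for the act_program covariate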
Trial Registration: ClinicalTrials.gov NCT05184894; https://www.clinicaltrials.gov/ct2/show/NCT05184894 International Registered Report Identifier (IRRID): DERR1-10.2196/48109 ", doi="10.2196/48109", url="https://www.researchprotocols.org/2023/1/e48109", url="http://www.ncbi.nlm.nih.gov/pubmed/37213187" } @Article{info:doi/10.2196/39072, author="Hodroj, Khalil and Pellegrin, David and Menard, Cindy and Bachelot, Thomas and Durand, Thierry and Toussaint, Philippe and Dufresne, Armelle and Mery, Benoite and Tredan, Olivier and Goulvent, Thibaut and Heudel, Pierre", title="A Digital Solution for an Advanced Breast Tumor Board: Pilot Application Cocreation and Implementation Study", journal="JMIR Cancer", year="2023", month="May", day="18", volume="9", pages="e39072", keywords="digital health", keywords="multidisciplinary meeting", keywords="advanced breast cancer", keywords="cancer", keywords="breast cancer", keywords="tumor", keywords="clinician", keywords="confidence", keywords="treatment", keywords="pathology", keywords="genomic", keywords="care", keywords="patient", keywords="software", keywords="data", keywords="neoplastic", keywords="pain", keywords="follow-up", keywords="electronic medical records", keywords="records", abstract="Background: Cancer treatment is constantly evolving toward a more personalized approach based on clinical features, imaging, and genomic pathology information. To ensure the best care for patients, multidisciplinary teams (MDTs) meet regularly to review cases. Notwithstanding, the conduction of MDT meetings is challenged by medical time restrictions, the unavailability of critical MDT members, and the additional administrative work required. These issues may result in members missing information during MDT meetings and postponed treatment. To explore and facilitate improved approaches for MDT meetings in France, using advanced breast cancers (ABCs) as a model, Centre L{\'e}on B{\'e}rard (CLB) and ROCHE Diagnostics cocreated an MDT application prototype based on structured data. Objective: In this paper, we want to describe how an application prototype was implemented for ABC MDT meetings at CLB to support clinical decisions. Methods: Prior to the initiation of cocreation activities, an organizational audit of ABC MDT meetings identified the following four key phases for the MDT: the instigation, preparation, execution, and follow-up phases. For each phase, challenges and opportunities were identified that informed the new cocreation activities. The MDT application prototype became software that integrated structured data from medical files for the visualization of the neoplastic history of a patient. The digital solution was assessed via a before-and-after audit and a survey questionnaire that was administered to health care professionals involved in the MDT. Results: The ABC MDT meeting audit was carried out during 3 MDT meetings, including 70 discussions of clinical cases before and 58 such discussions after the implementation of the MDT application prototype. We identified 33 pain points related to the preparation, execution, and follow-up phases. No issues were identified related to the instigation phase. Difficulties were grouped as follows: process challenges (n=18), technological limitations (n=9), and the lack of available resources (n=6). The preparation of MDT meetings was the phase in which the most issues (n=16) were seen. 
A repeat audit, which was undertaken after the implementation of the MDT application, demonstrated that (1) the discussion times per case remained comparable (2 min and 22 s vs 2 min and 14 s), (2) the capture of MDT decisions improved (all cases included a therapeutic proposal), (3) there was no postponement of treatment decisions, and (4) the mean confidence of medical oncologists in decision-making increased. Conclusions: The introduction of the MDT application prototype at CLB to support the ABC MDT seemed to improve the quality of and confidence in clinical decisions. The integration of an MDT application with the local electronic medical record and the utilization of structured data conforming to international terminologies could enable a national network of MDTs to support sustained improvements to patient care. ", doi="10.2196/39072", url="https://cancer.jmir.org/2023/1/e39072", url="http://www.ncbi.nlm.nih.gov/pubmed/37200077" } @Article{info:doi/10.2196/43518, author="Strickland, Caroline and Chi, Nancy and Ditz, Laura and Gomez, Luisa and Wagner, Brittin and Wang, Stanley and Lizotte, J. Daniel", title="Factors Influencing Admission Decisions in Skilled Nursing Facilities: Retrospective Quantitative Study", journal="J Med Internet Res", year="2023", month="May", day="17", volume="25", pages="e43518", keywords="decision-making", keywords="skilled nursing facility", keywords="patient admission", keywords="decision", keywords="nursing", keywords="clinical", keywords="database", keywords="health informatics", keywords="diagnosis", keywords="modeling", keywords="connection", keywords="patient", abstract="Background: Occupancy rates within skilled nursing facilities (SNFs) in the United States have reached a record low. Understanding drivers of occupancy, including admission decisions, is critical for assessing the recovery of the long-term care sector as a whole. We provide the first comprehensive analysis of financial, clinical, and operational factors that impact whether a patient referral to an SNF is accepted or denied, using a large health informatics database. Objective: Our key objectives were to describe the distribution of referrals sent to SNFs in terms of key referral- and facility-level features; analyze key financial, clinical, and operational variables and their relationship to admission decisions; and identify the key potential reasons behind referral decisions in the context of learning health systems. Methods: We extracted and cleaned referral data from 627 SNFs from January 2020 to March 2022, including information on SNF daily operations (occupancy and nursing hours), referral-level factors (insurance type and primary diagnosis), and facility-level factors (overall 5-star rating and urban versus rural status). We computed descriptive statistics and applied regression modeling to identify and describe the relationships between these factors and referral decisions, considering them individually and controlling for other factors to understand their impact on the decision-making process. Results: When analyzing daily operation values, no significant relationship between SNF occupancy or nursing hours and referral acceptance was observed (P>.05). By analyzing referral-level factors, we found that the primary diagnosis category and insurance type of the patient were significantly related to referral acceptance (P<.05). 
Referrals with primary diagnoses within the category ``Diseases of the Musculoskeletal System'' are least often denied whereas those with diagnoses within the ``Mental Illness'' category are most often denied (compared with other diagnosis categories). Furthermore, private insurance holders are least often denied whereas ``medicaid'' holders are most often denied (compared with other insurance types). When analyzing facility-level factors, we found that the overall 5-star rating and urban versus rural status of an SNF are significantly related to referral acceptance (P<.05). We found a positive but nonmonotonic relationship between the 5-star rating and referral acceptance rates, with the highest acceptance rates found among 5-star facilities. In addition, we found that SNFs in urban areas have lower acceptance rates than their rural counterparts. Conclusions: While many factors may influence a referral acceptance, care challenges associated with individual diagnoses and financial challenges associated with different remuneration types were found to be the strongest drivers. Understanding these drivers is essential in being more intentional in the process of accepting or denying referrals. We have interpreted our results using an adaptive leadership framework and suggested how SNFs can be more purposeful with their decisions while striving to achieve appropriate occupancy levels in ways that meet their goals and patients' needs. ", doi="10.2196/43518", url="https://www.jmir.org/2023/1/e43518", url="http://www.ncbi.nlm.nih.gov/pubmed/37195755" } @Article{info:doi/10.2196/43017, author="Mason, A. Joseph and Friedman, E. Eleanor and Devlin, A. Samantha and Schneider, A. John and Ridgway, P. Jessica", title="Predictive Modeling of Lapses in Care for People Living with HIV in Chicago: Algorithm Development and Interpretation", journal="JMIR Public Health Surveill", year="2023", month="May", day="17", volume="9", pages="e43017", keywords="HIV", keywords="predictive model", keywords="lapse in care", keywords="retention in care", keywords="people living with HIV", keywords="Chicago", keywords="HIV care continuum", keywords="electronic health record", keywords="EHR", abstract="Background: Reducing care lapses for people living with HIV is critical to ending the HIV epidemic and beneficial for their health. Predictive modeling can identify clinical factors associated with HIV care lapses. Previous studies have identified these factors within a single clinic or using a national network of clinics, but public health strategies to improve retention in care in the United States often occur within a regional jurisdiction (eg, a city or county). Objective: We sought to build predictive models of HIV care lapses using a large, multisite, noncurated database of electronic health records (EHRs) in Chicago, Illinois. Methods: We used 2011-2019 data from the Chicago Area Patient-Centered Outcomes Research Network (CAPriCORN), a database including multiple health systems, covering the majority of 23,580 people with an HIV diagnosis living in Chicago. CAPriCORN uses a hash-based data deduplication method to follow people across multiple Chicago health care systems with different EHRs, providing a unique citywide view of retention in HIV care. From the database, we used diagnosis codes, medications, laboratory tests, demographics, and encounter information to build predictive models. 
Our primary outcome was lapses in HIV care, defined as having more than 12 months between subsequent HIV care encounters. We built logistic regression, random forest, elastic net logistic regression, and XGBoost models using all variables and compared their performance to a baseline logistic regression model containing only demographics and retention history. Results: We included people living with HIV with at least 2 HIV care encounters in the database, yielding 16,930 people living with HIV with 191,492 encounters. All models outperformed the baseline logistic regression model, with the most improvement from the XGBoost model (area under the receiver operating characteristic curve 0.776, 95\% CI 0.768-0.784 vs 0.674, 95\% CI 0.664-0.683; P<.001). Top predictors included the history of care lapses, being seen by an infectious disease provider (vs a primary care provider), site of care, Hispanic ethnicity, and previous HIV laboratory testing. The random forest model (area under the receiver operating characteristic curve 0.751, 95\% CI 0.742-0.759) revealed age, insurance type, and chronic comorbidities (eg, hypertension), as important variables in predicting a care lapse. Conclusions: We used a real-world approach to leverage the full scope of data available in modern EHRs to predict HIV care lapses. Our findings reinforce previously known factors, such as the history of prior care lapses, while also showing the importance of laboratory testing, chronic comorbidities, sociodemographic characteristics, and clinic-specific factors for predicting care lapses for people living with HIV in Chicago. We provide a framework for others to use data from multiple different health care systems within a single city to examine lapses in care using EHR data, which will aid in jurisdictional efforts to improve retention in HIV care. ", doi="10.2196/43017", url="https://publichealth.jmir.org/2023/1/e43017", url="http://www.ncbi.nlm.nih.gov/pubmed/37195750" } @Article{info:doi/10.2196/45190, author="Zoodsma, S. Ruben and Bosch, Rian and Alderliesten, Thomas and Bollen, W. Casper and Kappen, H. Teus and Koomen, Erik and Siebes, Arno and Nijman, Joppe", title="Continuous Data-Driven Monitoring in Critical Congenital Heart Disease: Clinical Deterioration Model Development", journal="JMIR Cardio", year="2023", month="May", day="16", volume="7", pages="e45190", keywords="artificial intelligence", keywords="aberration detection", keywords="clinical deterioration", keywords="classification model", keywords="paediatric intensive care", keywords="pediatric intensive care", keywords="congenital heart disease", keywords="cardiac monitoring", keywords="machine learning", keywords="peri-operative", keywords="perioperative", keywords="surgery", abstract="Background: Critical congenital heart disease (cCHD)---requiring cardiac intervention in the first year of life for survival---occurs globally in 2-3 of every 1000 live births. In the critical perioperative period, intensive multimodal monitoring at a pediatric intensive care unit (PICU) is warranted, as their organs---especially the brain---may be severely injured due to hemodynamic and respiratory events. These 24/7 clinical data streams yield large quantities of high-frequency data, which are challenging in terms of interpretation due to the varying and dynamic physiology innate to cCHD. 
Through advanced data science algorithms, these dynamic data can be condensed into comprehensible information, reducing the cognitive load on the medical team and providing data-driven monitoring support through automated detection of clinical deterioration, which may facilitate timely intervention. Objective: This study aimed to develop a clinical deterioration detection algorithm for PICU patients with cCHD. Methods: Retrospectively, synchronous per-second data of cerebral regional oxygen saturation (rSO2) and 4 vital parameters (respiratory rate, heart rate, oxygen saturation, and invasive mean blood pressure) in neonates with cCHD admitted to the University Medical Center Utrecht, the Netherlands, between 2002 and 2018 were extracted. Patients were stratified based on mean oxygen saturation during admission to account for physiological differences between acyanotic and cyanotic cCHD. Each subset was used to train our algorithm in classifying data as either stable, unstable, or sensor dysfunction. The algorithm was designed to detect combinations of parameters abnormal to the stratified subpopulation and significant deviations from the patient's unique baseline, which were further analyzed to distinguish clinical improvement from deterioration. Novel data were used for testing, visualized in detail, and internally validated by pediatric intensivists. Results: A retrospective query yielded 4600 hours and 209 hours of per-second data in 78 and 10 neonates for, respectively, training and testing purposes. During testing, stable episodes occurred 153 times, of which 134 (88\%) were correctly detected. Unstable episodes were correctly noted in 46 of 57 (81\%) observed episodes. Twelve expert-confirmed unstable episodes were missed in testing. Time-percentual accuracy was 93\% and 77\% for, respectively, stable and unstable episodes. A total of 138 sensorial dysfunctions were detected, of which 130 (94\%) were correct. Conclusions: In this proof-of-concept study, a clinical deterioration detection algorithm was developed and retrospectively evaluated to classify clinical stability and instability, achieving reasonable performance considering the heterogeneous population of neonates with cCHD. Combined analysis of baseline (ie, patient-specific) deviations and simultaneous parameter-shifting (ie, population-specific) proofs would be promising with respect to enhancing applicability to heterogeneous critically ill pediatric populations. After prospective validation, the current---and comparable---models may, in the future, be used in the automated detection of clinical deterioration and eventually provide data-driven monitoring support to the medical team, allowing for timely intervention. ", doi="10.2196/45190", url="https://cardio.jmir.org/2023/1/e45190", url="http://www.ncbi.nlm.nih.gov/pubmed/37191988" } @Article{info:doi/10.2196/41884, author="Ma, E. Jessica and Lowe, Jared and Berkowitz, Callie and Kim, Azalea and Togo, Ira and Musser, Clayton R. and Fischer, Jonathan and Shah, Kevin and Ibrahim, Salam and Bosworth, B. Hayden and Totten, M. 
Annette and Dolor, Rowena", title="Provider Interaction With an Electronic Health Record Notification to Identify Eligible Patients for a Cluster Randomized Trial of Advance Care Planning in Primary Care: Secondary Analysis", journal="J Med Internet Res", year="2023", month="May", day="12", volume="25", pages="e41884", keywords="advance care planning", keywords="electronic health record", keywords="notification", keywords="EHR", keywords="provider interaction", keywords="primary care", keywords="clinical study", keywords="referral", keywords="notifications", keywords="alerts", abstract="Background: Advance care planning (ACP) improves patient-provider communication and aligns care to patient values, preferences, and goals. Within a multisite Meta-network Learning and Research Center ACP study, one health system deployed an electronic health record (EHR) notification and algorithm to alert providers about patients potentially appropriate for ACP and the clinical study. Objective: The aim of the study is to describe the implementation and usage of an EHR notification for referring patients to an ACP study, evaluate the association of notifications with study referrals and engagement in ACP, and assess provider interactions with and perspectives on the notifications. Methods: A secondary analysis assessed provider usage and their response to the notification (eg, acknowledge, dismiss, or engage patient in ACP conversation and refer patient to the clinical study). We evaluated all patients identified by the EHR algorithm during the Meta-network Learning and Research Center ACP study. Descriptive statistics compared patients referred to the study to those who were not referred to the study. Health care utilization, hospice referrals, and mortality as well as documentation and billing for ACP and related legal documents are reported. We evaluated associations between notifications with provider actions (ie, referral to study, ACP not documentation, and ACP billing). Provider free-text comments in the notifications were summarized qualitatively. Providers were surveyed on their satisfaction with the notification. Results: Among the 2877 patients identified by the EHR algorithm over 20 months, 17,047 unique notifications were presented to 45 providers in 6 clinics, who then referred 290 (10\%) patients. Providers had a median of 269 (IQR 65-552) total notifications, and patients had a median of 4 (IQR 2-8). Patients with more (over 5) notifications were less likely to be referred to the study than those with fewer notifications (57/1092, 5.2\% vs 233/1785, 13.1\%; P<.001). The most common free-text comment on the notification was lack of time. Providers who referred patients to the study were more likely to document ACP and submit ACP billing codes (P<.001). In the survey, 11 providers would recommend the notification (n=7, 64\%); however, the notification impacted clinical workflow (n=9, 82\%) and was difficult to navigate (n=6, 55\%). Conclusions: An EHR notification can be implemented to remind providers to both perform ACP conversations and refer patients to a clinical study. There were diminishing returns after the fifth EHR notification where additional notifications did not lead to more trial referrals, ACP documentation, or ACP billing. Creation and optimization of EHR notifications for study referrals and ACP should consider the provider user, their workflow, and alert fatigue to improve implementation and adoption. 
Trial Registration: ClinicalTrials.gov NCT03577002; https://clinicaltrials.gov/ct2/show/NCT03577002 ", doi="10.2196/41884", url="https://www.jmir.org/2023/1/e41884", url="http://www.ncbi.nlm.nih.gov/pubmed/37171856" } @Article{info:doi/10.2196/44455, author="Montgomery, Anna and Tarasovsky, Gary and Izadi, Zara and Shiboski, Stephen and Whooley, A. Mary and Dana, Jo and Ehiorobo, Iziegbe and Barton, Jennifer and Bennett, Lori and Chung, Lorinda and Reiter, Kimberly and Wahl, Elizabeth and Subash, Meera and Schmajuk, Gabriela", title="An Electronic Dashboard to Improve Dosing of Hydroxychloroquine Within the Veterans Health Care System: Time Series Analysis", journal="JMIR Med Inform", year="2023", month="May", day="12", volume="11", pages="e44455", keywords="medical informatics", keywords="patient safety", keywords="health IT", keywords="hydroxychloroquine", keywords="dashboard", keywords="Veterans Health Administration", keywords="audit and feedback", keywords="electronic health record", abstract="Background: Hydroxychloroquine (HCQ) is commonly used for patients with autoimmune conditions. Long-term use of HCQ can cause retinal toxicity, but this risk can be reduced if high doses are avoided. Objective: We developed and piloted an electronic health record--based dashboard to improve the safe prescribing of HCQ within the Veterans Health Administration (VHA). We observed pilot facilities over a 1-year period to determine whether they were able to improve the proportion of patients receiving inappropriate doses of HCQ. Methods: Patients receiving HCQ were identified from the VHA corporate data warehouse. Using PowerBI (Microsoft Corp), we constructed a dashboard to display patient identifiers and the most recent HCQ dose and weight (flagged if ?5.2 mg/kg/day). Six VHA pilot facilities were enlisted to test the dashboard and invited to participate in monthly webinars. We performed an interrupted time series analysis using synthetic controls to assess changes in the proportion of patients receiving HCQ ?5.2 mg/kg/day between October 2020 and November 2021. Results: At the start of the study period, we identified 18,525 total users of HCQ nationwide at 128 facilities in the VHA, including 1365 patients at the 6 pilot facilities. Nationwide, at baseline, 19.8\% (3671/18,525) of patients were receiving high doses of HCQ. We observed significant improvements in the proportion of HCQ prescribed at doses ?5.2 mg/kg/day among pilot facilities after the dashboard was deployed (--0.06; 95\% CI --0.08 to --0.04). The difference in the postintervention linear trend for pilot versus synthetic controls was also significant (--0.06; 95\% CI --0.08 to --0.05). Conclusions: The use of an electronic health record--based dashboard reduced the proportion of patients receiving higher than recommended doses of HCQ and significantly improved performance at 6 VHA facilities. National roll-out of the dashboard will enable further improvements in the safe prescribing of HCQ. ", doi="10.2196/44455", url="https://medinform.jmir.org/2023/1/e44455", url="http://www.ncbi.nlm.nih.gov/pubmed/37171858" } @Article{info:doi/10.2196/44804, author="Han, Jing and Montagna, Marco and Grammenos, Andreas and Xia, Tong and Bondareva, Erika and Siegele-Brown, Chlo{\"e} and Chauhan, Jagmohan and Dang, Ting and Spathis, Dimitris and Floto, Andres R. 
and Cicuta, Pietro and Mascolo, Cecilia", title="Evaluating Listening Performance for COVID-19 Detection by Clinicians and Machine Learning: Comparative Study", journal="J Med Internet Res", year="2023", month="May", day="9", volume="25", pages="e44804", keywords="audio analysis", keywords="COVID-19 detection", keywords="deep learning", keywords="respiratory disease diagnosis", keywords="mobile health", keywords="detection", keywords="clinicians", keywords="machine learning", keywords="respiratory diagnosis", keywords="clinical decisions", keywords="respiratory", abstract="Background: To date, performance comparisons between men and machines have been carried out in many health domains. Yet machine learning (ML) models and human performance comparisons in audio-based respiratory diagnosis remain largely unexplored. Objective: The primary objective of this study was to compare human clinicians and an ML model in predicting COVID-19 from respiratory sound recordings. Methods: In this study, we compared human clinicians and an ML model in predicting COVID-19 from respiratory sound recordings. Prediction performance on 24 audio samples (12 tested positive) made by 36 clinicians with experience in treating COVID-19 or other respiratory illnesses was compared with predictions made by an ML model trained on 1162 samples. Each sample consisted of voice, cough, and breathing sound recordings from 1 subject, and the length of each sample was around 20 seconds. We also investigated whether combining the predictions of the model and human experts could further enhance the performance in terms of both accuracy and confidence. Results: The ML model outperformed the clinicians, yielding a sensitivity of 0.75 and a specificity of 0.83, whereas the best performance achieved by the clinicians was 0.67 in terms of sensitivity and 0.75 in terms of specificity. Integrating the clinicians' and the model's predictions, however, could enhance performance further, achieving a sensitivity of 0.83 and a specificity of 0.92. Conclusions: Our findings suggest that the clinicians and the ML model could make better clinical decisions via a cooperative approach and achieve higher confidence in audio-based respiratory diagnosis. ", doi="10.2196/44804", url="https://www.jmir.org/2023/1/e44804", url="http://www.ncbi.nlm.nih.gov/pubmed/37126593" } @Article{info:doi/10.2196/41177, author="Lichtner, Gregor and Spies, Claudia and Jurth, Carlo and Bienert, Thomas and Mueller, Anika and Kumpf, Oliver and Piechotta, Vanessa and Skoetz, Nicole and Nothacker, Monika and Boeker, Martin and Meerpohl, J. Joerg and von Dincklage, Falk", title="Automated Monitoring of Adherence to Evidenced-Based Clinical Guideline Recommendations: Design and Implementation Study", journal="J Med Internet Res", year="2023", month="May", day="4", volume="25", pages="e41177", keywords="clinical decision support", keywords="evidence-based medicine", keywords="computer-interpretable guidelines", keywords="COVID-19", keywords="clinical guideline recommendations", keywords="monitoring", keywords="clinical", keywords="patient", keywords="prototype", keywords="utility", keywords="data", keywords="system", abstract="Background: Clinical practice guidelines are systematically developed statements intended to optimize patient care. 
However, a gapless implementation of guideline recommendations requires health care personnel not only to be aware of the recommendations and to support their content but also to recognize every situation in which they are applicable. To not miss situations in which recommendations should be applied, computerized clinical decision support can be provided through a system that allows an automated monitoring of adherence to clinical guideline recommendations in individual patients. Objective: This study aims to collect and analyze the requirements for a system that allows the monitoring of adherence to evidence-based clinical guideline recommendations in individual patients and, based on these requirements, to design and implement a software prototype that integrates guideline recommendations with individual patient data, and to demonstrate the prototype's utility in treatment recommendations. Methods: We performed a work process analysis with experienced intensive care clinicians to develop a conceptual model of how to support guideline adherence monitoring in clinical routine and identified which steps in the model could be supported electronically. We then identified the core requirements of a software system to support recommendation adherence monitoring in a consensus-based requirements analysis within the loosely structured focus group work of key stakeholders (clinicians, guideline developers, health data engineers, and software developers). On the basis of these requirements, we designed and implemented a modular system architecture. To demonstrate its utility, we applied the prototype to monitor adherence to a COVID-19 treatment recommendation using clinical data from a large European university hospital. Results: We designed a system that integrates guideline recommendations with real-time clinical data to evaluate individual guideline recommendation adherence and developed a functional prototype. The needs analysis with clinical staff resulted in a flowchart describing the work process of how adherence to recommendations should be monitored. Four core requirements were identified: the ability to decide whether a recommendation is applicable and implemented for a specific patient, the ability to integrate clinical data from different data formats and data structures, the ability to display raw patient data, and the use of a Fast Healthcare Interoperability Resources--based format for the representation of clinical practice guidelines to provide an interoperable, standards-based guideline recommendation exchange format. Conclusions: Our system has advantages in terms of individual patient treatment and quality management in hospitals. However, further studies are needed to measure its impact on patient outcomes and evaluate its resource effectiveness in different clinical settings. We specified a modular software architecture that allows experts from different fields to work independently and focus on their area of expertise. We have released the source code of our system under an open-source license and invite for collaborative further development of the system. 
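To make the adherence-monitoring idea concrete, the following Python sketch queries a FHIR server for one hard-coded applicability rule and one implementation rule; the endpoint URL, the SNOMED CT and LOINC codes, and the rule itself are placeholders chosen for illustration and are not the recommendations or interfaces of the system described here.

import requests
from datetime import datetime, timedelta, timezone

FHIR_BASE = 'https://fhir.example.org/'  # placeholder endpoint, not a real server
VENTILATION_CODE = '40617009'          # assumed SNOMED CT code for invasive ventilation (illustrative)
MONITORED_LOINC = '50984-4'            # assumed LOINC code for the monitored observation (illustrative)

def recommendation_applicable(patient_id):
    # Applicability rule (assumed): the patient has an in-progress ventilation procedure
    resp = requests.get(f'{FHIR_BASE}/Procedure', timeout=10, params={
        'patient': patient_id, 'code': VENTILATION_CODE, 'status': 'in-progress'})
    resp.raise_for_status()
    return resp.json().get('total', 0) > 0   # assumes the server returns Bundle.total

def recommendation_implemented(patient_id):
    # Implementation rule (assumed): the monitored observation was recorded in the last 24 hours
    since = (datetime.now(timezone.utc) - timedelta(hours=24)).isoformat()
    resp = requests.get(f'{FHIR_BASE}/Observation', timeout=10, params={
        'patient': patient_id, 'code': MONITORED_LOINC, 'date': f'ge{since}'})
    resp.raise_for_status()
    return resp.json().get('total', 0) > 0

def adherence_status(patient_id):
    if not recommendation_applicable(patient_id):
        return 'not applicable'
    return 'adherent' if recommendation_implemented(patient_id) else 'not adherent'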
", doi="10.2196/41177", url="https://www.jmir.org/2023/1/e41177", url="http://www.ncbi.nlm.nih.gov/pubmed/36996044" } @Article{info:doi/10.2196/44870, author="Nishiyama, Tomohiro and Yada, Shuntaro and Wakamiya, Shoko and Hori, Satoko and Aramaki, Eiji", title="Transferability Based on Drug Structure Similarity in the Automatic Classification of Noncompliant Drug Use on Social Media: Natural Language Processing Approach", journal="J Med Internet Res", year="2023", month="May", day="3", volume="25", pages="e44870", keywords="data mining", keywords="machine learning", keywords="medication noncompliance", keywords="natural language processing", keywords="pharmacovigilance", keywords="transfer learning", keywords="text classification", abstract="Background: Medication noncompliance is a critical issue because of the increased number of drugs sold on the web. Web-based drug distribution is difficult to control, causing problems such as drug noncompliance and abuse. The existing medication compliance surveys lack completeness because it is impossible to cover patients who do not go to the hospital or provide accurate information to their doctors, so a social media--based approach is being explored to collect information about drug use. Social media data, which includes information on drug usage by users, can be used to detect drug abuse and medication compliance in patients. Objective: This study aimed to assess how the structural similarity of drugs affects the efficiency of machine learning models for text classification of drug noncompliance. Methods: This study analyzed 22,022 tweets about 20 different drugs. The tweets were labeled as either noncompliant use or mention, noncompliant sales, general use, or general mention. The study compares 2 methods for training machine learning models for text classification: single-sub-corpus transfer learning, in which a model is trained on tweets about a single drug and then tested on tweets about other drugs, and multi-sub-corpus incremental learning, in which models are trained on tweets about drugs in order of their structural similarity. The performance of a machine learning model trained on a single subcorpus (a data set of tweets about a specific category of drugs) was compared to the performance of a model trained on multiple subcorpora (data sets of tweets about multiple categories of drugs). Results: The results showed that the performance of the model trained on a single subcorpus varied depending on the specific drug used for training. The Tanimoto similarity (a measure of the structural similarity between compounds) was weakly correlated with the classification results. The model trained by transfer learning a corpus of drugs with close structural similarity performed better than the model trained by randomly adding a subcorpus when the number of subcorpora was small. Conclusions: The results suggest that structural similarity improves the classification performance of messages about unknown drugs if the drugs in the training corpus are few. On the other hand, this indicates that there is little need to consider the influence of the Tanimoto structural similarity if a sufficient variety of drugs are ensured. 
", doi="10.2196/44870", url="https://www.jmir.org/2023/1/e44870", url="http://www.ncbi.nlm.nih.gov/pubmed/37133915" } @Article{info:doi/10.2196/44373, author="Yang, Ju-Yeh and Shu, Kai-Hsiang and Peng, Yu-Sen and Hsu, Shih-Ping and Chiu, Yen-Ling and Pai, Mei-Fen and Wu, Hon-Yen and Tsai, Wan-Chuan and Tung, Kuei-Ting and Kuo, N. Raymond", title="Physician Compliance With a Computerized Clinical Decision Support System for Anemia Management of Patients With End-stage Kidney Disease on Hemodialysis: Retrospective Electronic Health Record Observational Study", journal="JMIR Form Res", year="2023", month="May", day="3", volume="7", pages="e44373", keywords="clinical decision support system", keywords="erythropoietin-stimulating agent", keywords="end-stage kidney disease", keywords="hemodialysis", keywords="physician compliance", keywords="kidney disease", keywords="clinical decision support", keywords="electronic health records", keywords="decision support", keywords="anemia management", keywords="patient outcome", abstract="Background: Previous studies on clinical decision support systems (CDSSs) for the management of renal anemia in patients with end-stage kidney disease undergoing hemodialysis have previously focused solely on the effects of the CDSS. However, the role of physician compliance in the efficacy of the CDSS remains ill-defined. Objective: We aimed to investigate whether physician compliance was an intermediate variable between the CDSS and the management outcomes of renal anemia. Methods: We extracted the electronic health records of patients with end-stage kidney disease on hemodialysis at the Far Eastern Memorial Hospital Hemodialysis Center (FEMHHC) from 2016 to 2020. FEMHHC implemented a rule-based CDSS for the management of renal anemia in 2019. We compared the clinical outcomes of renal anemia between the pre- and post-CDSS periods using random intercept models. Hemoglobin levels of 10 to 12 g/dL were defined as the on-target range. Physician compliance was defined as the concordance of adjustments of the erythropoietin-stimulating agent (ESA) between the CDSS recommendations and the actual physician prescriptions. Results: We included 717 eligible patients on hemodialysis (mean age 62.9, SD 11.6 years; male n=430, 59.9\%) with a total of 36,091 hemoglobin measurements (average hemoglobin and on-target rate were 11.1, SD 1.4, g/dL and 59.9\%, respectively). The on-target rate decreased from 61.3\% (pre-CDSS) to 56.2\% (post-CDSS) owing to a high hemoglobin percentage of >12 g/dL (pre: 21.5\%; post: 29\%). The failure rate (hemoglobin <10 g/dL) decreased from 17.2\% (pre-CDSS) to 14.8\% (post-CDSS). The average weekly ESA use of 5848 (SD 4211) units per week did not differ between phases. The overall concordance between CDSS recommendations and physician prescriptions was 62.3\%. The CDSS concordance increased from 56.2\% to 78.6\%. In the adjusted random intercept model, the post-CDSS phase showed increased hemoglobin by 0.17 (95\% CI 0.14-0.21) g/dL, weekly ESA by 264 (95\% CI 158-371) units per week, and 3.4-fold (95\% CI 3.1-3.6) increased concordance rate. However, the on-target rate (29\%; odds ratio 0.71, 95\% CI 0.66-0.75) and failure rate (16\%; odds ratio 0.84, 95\% CI 0.76-0.92) were reduced. After additional adjustments for concordance in the full models, increased hemoglobin and decreased on-target rate tended toward attenuation (from 0.17 to 0.13 g/dL and 0.71 to 0.73 g/dL, respectively). 
Increased ESA and decreased failure rate were completely mediated by physician compliance (from 264 to 50 units and 0.84 to 0.97, respectively). Conclusions: Our results confirmed that physician compliance was a complete intermediate factor accounting for the efficacy of the CDSS. The CDSS reduced failure rates of anemia management through physician compliance. Our study highlights the importance of optimizing physician compliance in the design and implementation of CDSSs to improve patient outcomes. ", doi="10.2196/44373", url="https://formative.jmir.org/2023/1/e44373", url="http://www.ncbi.nlm.nih.gov/pubmed/37133912" } @Article{info:doi/10.2196/43695, author="Ehrler, Frederic and Tuor, Carlotta and Rey, Robin and Trompier, R{\'e}my and Berger, Antoine and Ramusi, Michael and Courvoisier, S. Delphine and Siebert, N. Johan", title="Effectiveness of a Mobile App (PIMPmyHospital) in Reducing Therapeutic Turnaround Times in an Emergency Department: Protocol for a Pre- and Posttest Study", journal="JMIR Res Protoc", year="2023", month="May", day="3", volume="12", pages="e43695", keywords="clinical laboratory information systems", keywords="laboratory results", keywords="digital technology", keywords="emergency department", keywords="emergency service", keywords="hospital", keywords="length of stay", keywords="mobile app", keywords="mobile health", keywords="mHealth", keywords="pediatrics", keywords="therapeutic turnaround time", abstract="Background: Delays in reviewing issued laboratory results in emergency departments (EDs) can adversely affect efficiency and quality of care. One opportunity to improve therapeutic turnaround time could be to provide real-time access to laboratory results on mobile devices available to every caregiver. We developed a mobile app named ``Patients In My Pocket in my Hospital'' (PIMPmyHospital) to help ED caregivers automatically obtain and share relevant information about the patients they care for including laboratory results. Objective: This pre- and posttest study aims to explore whether the implementation of the PIMPmyHospital app impacts the timeliness with which ED physicians and nurses remotely access laboratory results while actively working in their real-world environment, including ED length of stay, technology acceptance and usability among users, and how specific in-app alerts impact on its effectiveness. Methods: This single-center study of nonequivalent pre- and posttest comparison group design will be conducted before and after the implementation of the app in a tertiary pediatric ED in Switzerland. The retrospective period will cover the previous 12 months, and the prospective period will cover the following 6 months. Participants will be postgraduate residents pursuing a ?6-year residency in pediatrics, pediatric emergency medicine fellows, and registered nurses from the pediatric ED. The primary outcome will be the mean elapsed time in minutes from delivery of laboratory results to caregivers' consideration by accessing them either through the hospital's electronic medical records or through the app before and after the implementation of the app, respectively. As secondary outcomes, participants will be queried about the acceptance and usability of the app using the Unified Theory of Acceptance and Use of Technology model and the System Usability Scale. ED length of stay will be compared before and after the implementation of the app for patients with laboratory results. 
The impact of specific alerts on the app, such as a flashing icon or sound for reported pathological values, will be reported. Results: Retrospective data collection gathered from the institutional data set will span a 12-month period from October 2021 to October 2022, while the 6-month prospective collection will begin with the implementation of the app in November 2022 and is expected to cease at the end of April 2023. We expect the results of the study to be published in a peer-reviewed journal in late 2023. Conclusions: This study will show the potential reach, effectiveness, acceptance, and use of the PIMPmyHospital app among ED caregivers. The findings of this study will serve as the basis for future research on the app and any further development to improve its effectiveness. Trial Registration: ClinicalTrials.gov NCT05557331; https://clinicaltrials.gov/ct2/show/NCT05557331 International Registered Report Identifier (IRRID): PRR1-10.2196/43695 ", doi="10.2196/43695", url="https://www.researchprotocols.org/2023/1/e43695", url="http://www.ncbi.nlm.nih.gov/pubmed/37133909" } @Article{info:doi/10.2196/42978, author="Yoon, Ho Chang and Nolan, Imogen and Humphrey, Gayl and Duffy, J. Eamon and Thomas, G. Mark and Ritchie, R. Stephen", title="Long-Term Impact of a Smartphone App on Prescriber Adherence to Antibiotic Guidelines for Adult Patients With Community-Acquired Pneumonia: Interrupted Time-Series Study", journal="J Med Internet Res", year="2023", month="May", day="2", volume="25", pages="e42978", keywords="app", keywords="antimicrobial stewardship", keywords="antibiotic adherence", keywords="community", keywords="pneumonia", keywords="smartphone", keywords="mobile health", keywords="mHealth", keywords="antibiotic", keywords="behavior", keywords="adults", keywords="diagnosis", keywords="pulmonary", keywords="patient", abstract="Background: Mobile health platforms like smartphone apps that provide clinical guidelines are ubiquitous, yet their long-term impact on guideline adherence remains unclear. In 2016, an antibiotic guidelines app, called SCRIPT, was introduced in Auckland City Hospital, New Zealand, to provide local antibiotic guidelines to clinicians on their smartphones. Objective: We aimed to assess whether the provision of antibiotic guidelines in a smartphone app resulted in sustained changes in antibiotic guideline adherence by prescribers. Methods: We analyzed antibiotic guideline adherence rates during the first 24 hours of hospital admission in adults diagnosed with community-acquired pneumonia using an interrupted time-series study with 3 distinct periods post app implementation (ie, 3, 12, and 24 months). Results: Adherence increased from 23\% (46/200) at baseline to 31\% (73/237) at 3 months and 34\% (69/200) at 12 months, reducing to 31\% (62/200) at 24 months post app implementation (P=.07 vs baseline). However, increased adherence was sustained in patients with pulmonary consolidation on x-ray (9/63, 14\% at baseline; 23/77, 30\% after 3 months; 32/92, 35\% after 12 months; and 32/102, 31\% after 24 months; P=.04 vs baseline). Conclusions: An antibiotic guidelines app increased overall adherence, but this was not sustained. In patients with pulmonary consolidation, the increased adherence was sustained.
", doi="10.2196/42978", url="https://www.jmir.org/2023/1/e42978", url="http://www.ncbi.nlm.nih.gov/pubmed/37129941" } @Article{info:doi/10.2196/44791, author="Park, Jiesuck and Yoon, Yeonyee and Cho, Youngjin and Kim, Joonghee", title="Feasibility of Artificial Intelligence--Based Electrocardiography Analysis for the Prediction of Obstructive Coronary Artery Disease in Patients With Stable Angina: Validation Study", journal="JMIR Cardio", year="2023", month="May", day="2", volume="7", pages="e44791", keywords="artificial intelligence", keywords="AI", keywords="coronary artery disease", keywords="coronary stenosis", keywords="electrocardiography", keywords="stable angina", abstract="Background: Despite accumulating research on artificial intelligence--based electrocardiography (ECG) algorithms for predicting acute coronary syndrome (ACS), their application in stable angina is not well evaluated. Objective: We evaluated the utility of an existing artificial intelligence--based quantitative electrocardiography (QCG) analyzer in stable angina and developed a new ECG biomarker more suitable for stable angina. Methods: This single-center study comprised consecutive patients with stable angina. The independent and incremental value of QCG scores for coronary artery disease (CAD)--related conditions (ACS, myocardial injury, critical status, ST-elevation myocardial infarction, and left ventricular dysfunction) for predicting obstructive CAD confirmed by invasive angiography was examined. Additionally, ECG signals extracted by the QCG analyzer were used as input to develop a new QCG score. Results: Among 723 patients with stable angina (median age 68 years; male: 470/723, 65\%), 497 (69\%) had obstructive CAD. QCG scores for ACS and myocardial injury were independently associated with obstructive CAD (odds ratio [OR] 1.09, 95\% CI 1.03-1.17 and OR 1.08, 95\% CI 1.02-1.16 per 10-point increase, respectively) but did not significantly improve prediction performance compared to clinical features. However, our new QCG score demonstrated better prediction performance for obstructive CAD (area under the receiver operating characteristic curve 0.802) than the original QCG scores, with incremental predictive value in combination with clinical features (area under the receiver operating characteristic curve 0.827 vs 0.730; P<.001). Conclusions: QCG scores developed for acute conditions show limited performance in identifying obstructive CAD in stable angina. However, improvement in the QCG analyzer, through training on comprehensive ECG signals in patients with stable angina, is feasible. ", doi="10.2196/44791", url="https://cardio.jmir.org/2023/1/e44791", url="http://www.ncbi.nlm.nih.gov/pubmed/37129937" } @Article{info:doi/10.2196/43006, author="Wang, Tongnian and Du, Yan and Gong, Yanmin and Choo, Raymond Kim-Kwang and Guo, Yuanxiong", title="Applications of Federated Learning in Mobile Health: Scoping Review", journal="J Med Internet Res", year="2023", month="May", day="1", volume="25", pages="e43006", keywords="decision support", keywords="distributed systems", keywords="federated learning", keywords="health monitoring", keywords="mHealth", keywords="privacy", abstract="Background: The proliferation of mobile health (mHealth) applications is partly driven by the advancements in sensing and communication technologies, as well as the integration of artificial intelligence techniques. 
Data collected from mHealth applications, for example, on sensor devices carried by patients, can be mined and analyzed using artificial intelligence--based solutions to facilitate remote and (near) real-time decision-making in health care settings. However, such data often sit in data silos, and patients are often concerned about the privacy implications of sharing their raw data. Federated learning (FL) is a potential solution, as it allows multiple data owners to collaboratively train a machine learning model without requiring access to each other's raw data. Objective: The goal of this scoping review is to gain an understanding of FL and its potential in dealing with sensitive and heterogeneous data in mHealth applications. Through this review, various stakeholders, such as health care providers, practitioners, and policy makers, can gain insight into the limitations and challenges associated with using FL in mHealth and make informed decisions when considering implementing FL-based solutions. Methods: We conducted a scoping review following the guidelines of PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews). We searched 7 commonly used databases. The included studies were analyzed and summarized to identify the possible real-world applications and associated challenges of using FL in mHealth settings. Results: A total of 1095 articles were retrieved during the database search, and 26 articles that met the inclusion criteria were included in the review. The analysis of these articles revealed 2 main application areas for FL in mHealth, that is, remote monitoring and diagnostic and treatment support. More specifically, FL was found to be commonly used for monitoring self-care ability, health status, and disease progression, as well as in diagnosis and treatment support of diseases. The review also identified several challenges (eg, expensive communication, statistical heterogeneity, and system heterogeneity) and potential solutions (eg, compression schemes, model personalization, and active sampling). Conclusions: This scoping review has highlighted the potential of FL as a privacy-preserving approach in mHealth applications and identified the technical limitations associated with its use. The challenges and opportunities outlined in this review can inform the research agenda for future studies in this field, to overcome these limitations and further advance the use of FL in mHealth. ", doi="10.2196/43006", url="https://www.jmir.org/2023/1/e43006", url="http://www.ncbi.nlm.nih.gov/pubmed/37126398" } @Article{info:doi/10.2196/41748, author="He, Ying and Zamani, Efpraxia and Yevseyeva, Iryna and Luo, Cunjin", title="Artificial Intelligence--Based Ethical Hacking for Health Information Systems: Simulation Study", journal="J Med Internet Res", year="2023", month="Apr", day="25", volume="25", pages="e41748", keywords="health information system", keywords="HIS", keywords="ethical hacking", keywords="open-source electronic medical record", keywords="OpenEMR", keywords="artificial intelligence", keywords="AI-based hacking", keywords="cyber defense solutions", abstract="Background: Health information systems (HISs) are continuously targeted by hackers, who aim to bring down critical health infrastructure. This study was motivated by recent attacks on health care organizations that have resulted in the compromise of sensitive data held in HISs. 
Existing research on cybersecurity in the health care domain places an imbalanced focus on protecting medical devices and data. There is a lack of a systematic way to investigate how attackers may breach an HIS and access health care records. Objective: This study aimed to provide new insights into HIS cybersecurity protection. We propose a systematic, novel, and optimized (artificial intelligence--based) ethical hacking method tailored specifically for HISs, and we compared it with the traditional unoptimized ethical hacking method. This allows researchers and practitioners to identify the points and attack pathways of possible penetration attacks on the HIS more efficiently. Methods: In this study, we propose a novel methodological approach to ethical hacking in HISs. We implemented ethical hacking using both optimized and unoptimized methods in an experimental setting. Specifically, we set up an HIS simulation environment by implementing the open-source electronic medical record (OpenEMR) system and followed the National Institute of Standards and Technology's ethical hacking framework to launch the attacks. In the experiment, we launched 50 rounds of attacks using both unoptimized and optimized ethical hacking methods. Results: Ethical hacking was successfully conducted using both optimized and unoptimized methods. The results show that the optimized ethical hacking method outperforms the unoptimized method in terms of average time used, the average success rate of exploit, the number of exploits launched, and the number of successful exploits. We were able to identify the successful attack paths and exploits that are related to remote code execution, cross-site request forgery, improper authentication, vulnerability in the Oracle Business Intelligence Publisher, an elevation of privilege vulnerability (in MediaTek), and remote access backdoor (in the web graphical user interface for the Linux Virtual Server). Conclusions: This research demonstrates systematic ethical hacking against an HIS using optimized and unoptimized methods, together with a set of penetration testing tools to identify exploits and combining them to perform ethical hacking. The findings contribute to the HIS literature, ethical hacking methodology, and mainstream artificial intelligence--based ethical hacking methods because they address some key weaknesses of these research fields. These findings also have great significance for the health care sector, as OpenEMR is widely adopted by health care organizations. Our findings offer novel insights for the protection of HISs and allow researchers to conduct further research in the HIS cybersecurity domain. 
", doi="10.2196/41748", url="https://www.jmir.org/2023/1/e41748", url="http://www.ncbi.nlm.nih.gov/pubmed/37097723" } @Article{info:doi/10.2196/46348, author="Weng, Kung-Hsun and Liu, Chung-Feng and Chen, Chia-Jung", title="Deep Learning Approach for Negation and Speculation Detection for Automated Important Finding Flagging and Extraction in Radiology Report: Internal Validation and Technique Comparison Study", journal="JMIR Med Inform", year="2023", month="Apr", day="25", volume="11", pages="e46348", keywords="radiology report", keywords="natural language processing", keywords="negation", keywords="deep learning", keywords="transfer learning", keywords="supervised learning", keywords="validation study", keywords="Bidirectional Encoder Representations from Transformers", keywords="BERT", keywords="clinical application", keywords="radiology", abstract="Background: Negation and speculation unrelated to abnormal findings can lead to false-positive alarms for automatic radiology report highlighting or flagging by laboratory information systems. Objective: This internal validation study evaluated the performance of natural language processing methods (NegEx, NegBio, NegBERT, and transformers). Methods: We annotated all negative and speculative statements unrelated to abnormal findings in reports. In experiment 1, we fine-tuned several transformer models (ALBERT [A Lite Bidirectional Encoder Representations from Transformers], BERT [Bidirectional Encoder Representations from Transformers], DeBERTa [Decoding-Enhanced BERT With Disentangled Attention], DistilBERT [Distilled version of BERT], ELECTRA [Efficiently Learning an Encoder That Classifies Token Replacements Accurately], ERNIE [Enhanced Representation through Knowledge Integration], RoBERTa [Robustly Optimized BERT Pretraining Approach], SpanBERT, and XLNet) and compared their performance using precision, recall, accuracy, and F1-scores. In experiment 2, we compared the best model from experiment 1 with 3 established negation and speculation-detection algorithms (NegEx, NegBio, and NegBERT). Results: Our study collected 6000 radiology reports from 3 branches of the Chi Mei Hospital, covering multiple imaging modalities and body parts. A total of 15.01\% (105,755/704,512) of words and 39.45\% (4529/11,480) of important diagnostic keywords occurred in negative or speculative statements unrelated to abnormal findings. In experiment 1, all models achieved an accuracy of >0.98 and F1-score of >0.90 on the test data set. ALBERT exhibited the best performance (accuracy=0.991; F1-score=0.958). In experiment 2, ALBERT outperformed the optimized NegEx, NegBio, and NegBERT methods in terms of overall performance (accuracy=0.996; F1-score=0.991), in the prediction of whether diagnostic keywords occur in speculative statements unrelated to abnormal findings, and in the improvement of the performance of keyword extraction (accuracy=0.996; F1-score=0.997). Conclusions: The ALBERT deep learning method showed the best performance. Our results represent a significant advancement in the clinical applications of computer-aided notification systems. 
", doi="10.2196/46348", url="https://medinform.jmir.org/2023/1/e46348", url="http://www.ncbi.nlm.nih.gov/pubmed/37097731" } @Article{info:doi/10.2196/43153, author="Jin, Zhi-Geng and Zhang, Hui and Tai, Mei-Hui and Yang, Ying and Yao, Yuan and Guo, Yu-Tao", title="Natural Language Processing in a Clinical Decision Support System for the Identification of Venous Thromboembolism: Algorithm Development and Validation", journal="J Med Internet Res", year="2023", month="Apr", day="24", volume="25", pages="e43153", keywords="venous thromboembolism", keywords="deep vein thrombosis", keywords="pulmonary embolism", keywords="natural language processing", keywords="electronic health record", abstract="Background: It remains unknown whether capturing data from electronic health records (EHRs) using natural language processing (NLP) can improve venous thromboembolism (VTE) detection in different clinical settings. Objective: The aim of this study was to validate the NLP algorithm in a clinical decision support system for VTE risk assessment and integrated care (DeVTEcare) to identify VTEs from EHRs. Methods: All inpatients aged ?18 years in the Sixth Medical Center of the Chinese People's Liberation Army General Hospital from January 1 to December 31, 2021, were included as the validation cohort. The sensitivity, specificity, positive and negative likelihood ratios (LR+ and LR--, respectively), area under the receiver operating characteristic curve (AUC), and F1-scores along with their 95\% CIs were used to analyze the performance of the NLP tool, with manual review of medical records as the reference standard for detecting deep vein thrombosis (DVT) and pulmonary embolism (PE). The primary end point was the performance of the NLP approach embedded into the EHR for VTE identification. The secondary end points were the performances to identify VTE among different hospital departments with different VTE risks. Subgroup analyses were performed among age, sex, and the study season. Results: Among 30,152 patients (median age 56 [IQR 41-67] years; 14,247/30,152, 47.3\% females), the prevalence of VTE, PE, and DVT was 2.1\% (626/30,152), 0.6\% (177/30,152), and 1.8\% (532/30,152), respectively. The sensitivity, specificity, LR+, LR--, AUC, and F1-score of NLP-facilitated VTE detection were 89.9\% (95\% CI 87.3\%-92.2\%), 99.8\% (95\% CI 99.8\%-99.9\%), 483 (95\% CI 370-629), 0.10 (95\% CI 0.08-0.13), 0.95 (95\% CI 0.94-0.96), and 0.90 (95\% CI 0.90-0.91), respectively. Among departments of surgery, internal medicine, and intensive care units, the highest specificity (100\% vs 99.7\% vs 98.8\%, respectively), LR+ (3202 vs 321 vs 77, respectively), and F1-score (0.95 vs 0.89 vs 0.92, respectively) were in the surgery department (all P<.001). Among low, intermediate, and high VTE risks in hospital departments, the low-risk department had the highest AUC (1.00 vs 0.94 vs 0.96, respectively) and F1-score (0.97 vs 0.90 vs 0.90, respectively) as well as the lowest LR-- (0.00 vs 0.13 vs 0.08, respectively) (DeLong test for AUC; all P<.001). Subgroup analysis of the age, sex, and season demonstrated consistently good performance of VTE detection with >87\% sensitivity and specificity and >89\% AUC and F1-score. The NLP algorithm performed better among patients aged ?65 years than among those aged >65 years (F1-score 0.93 vs 0.89, respectively; P<.001). 
Conclusions: The NLP algorithm in our DeVTEcare identified VTE well across different clinical settings, especially in patients in surgery units, departments with low-risk VTE, and patients aged ≤65 years. This algorithm can help to inform accurate in-hospital VTE rates and enhance risk-classified VTE integrated care in future research. ", doi="10.2196/43153", url="https://www.jmir.org/2023/1/e43153", url="http://www.ncbi.nlm.nih.gov/pubmed/37093636" } @Article{info:doi/10.2196/38039, author="King, Henry and Wright, Judy and Treanor, Darren and Williams, Bethany and Randell, Rebecca", title="What Works Where and How for Uptake and Impact of Artificial Intelligence in Pathology: Review of Theories for a Realist Evaluation", journal="J Med Internet Res", year="2023", month="Apr", day="24", volume="25", pages="e38039", keywords="artificial intelligence", keywords="AI", keywords="machine learning", keywords="histopathology", keywords="pathology", keywords="implementation", keywords="review", abstract="Background: There is increasing interest in the use of artificial intelligence (AI) in pathology to increase accuracy and efficiency. To date, studies of clinicians' perceptions of AI have found only moderate acceptability, suggesting the need for further research regarding how to integrate it into clinical practice. Objective: The aim of the study was to determine contextual factors that may support or constrain the uptake of AI in pathology. Methods: To go beyond a simple listing of barriers and facilitators, we drew on the approach of realist evaluation and undertook a review of the literature to elicit stakeholders' theories of how, for whom, and in what circumstances AI can provide benefit in pathology. Searches were designed by an information specialist and peer-reviewed by a second information specialist. Searches were run on the arXiv.org repository, MEDLINE, and the Health Management Information Consortium, with additional searches undertaken on a range of websites to identify gray literature. In line with a realist approach, we also made use of relevant theory. Included documents were indexed in NVivo 12, using codes to capture different contexts, mechanisms, and outcomes that could affect the introduction of AI in pathology. Coded data were used to produce narrative summaries of each of the identified contexts, mechanisms, and outcomes, which were then translated into theories in the form of context-mechanism-outcome configurations. Results: A total of 101 relevant documents were identified. Our analysis indicates that the benefits that can be achieved will vary according to the size and nature of the pathology department's workload and the extent to which pathologists work collaboratively; the major perceived benefit for specialist centers is in reducing workload. For uptake of AI, pathologists' trust is essential. Existing theories suggest that if pathologists are able to ``make sense'' of AI, engage in the adoption process, receive support in adapting their work processes, and can identify potential benefits to its introduction, it is more likely to be accepted. Conclusions: For uptake of AI in pathology, for all but the most simple quantitative tasks, measures will be required that either increase confidence in the system or provide users with an understanding of the performance of the system. For specialist centers, efforts should focus on reducing workload rather than increasing accuracy.
Designers also need to give careful thought to usability and how AI is integrated into pathologists' workflow. ", doi="10.2196/38039", url="https://www.jmir.org/2023/1/e38039", url="http://www.ncbi.nlm.nih.gov/pubmed/37093631" } @Article{info:doi/10.2196/44977, author="Afshar, Majid and Adelaine, Sabrina and Resnik, Felice and Mundt, P. Marlon and Long, John and Leaf, Margaret and Ampian, Theodore and Wills, J. Graham and Schnapp, Benjamin and Chao, Michael and Brown, Randy and Joyce, Cara and Sharma, Brihat and Dligach, Dmitriy and Burnside, S. Elizabeth and Mahoney, Jane and Churpek, M. Matthew and Patterson, W. Brian and Liao, Frank", title="Deployment of Real-time Natural Language Processing and Deep Learning Clinical Decision Support in the Electronic Health Record: Pipeline Implementation for an Opioid Misuse Screener in Hospitalized Adults", journal="JMIR Med Inform", year="2023", month="Apr", day="20", volume="11", pages="e44977", keywords="clinical decision support", keywords="natural language processing", keywords="medical informatics", keywords="opioid related disorder", keywords="opioid use", keywords="electronic health record", keywords="clinical note", keywords="cloud service", keywords="artificial intelligence", keywords="AI", abstract="Background: The clinical narrative in electronic health records (EHRs) carries valuable information for predictive analytics; however, its free-text form is difficult to mine and analyze for clinical decision support (CDS). Large-scale clinical natural language processing (NLP) pipelines have focused on data warehouse applications for retrospective research efforts. There remains a paucity of evidence for implementing NLP pipelines at the bedside for health care delivery. Objective: We aimed to detail a hospital-wide, operational pipeline to implement a real-time NLP-driven CDS tool and describe a protocol for an implementation framework with a user-centered design of the CDS tool. Methods: The pipeline integrated a previously trained open-source convolutional neural network model for screening opioid misuse that leveraged EHR notes mapped to standardized medical vocabularies in the Unified Medical Language System. A sample of 100 adult encounters were reviewed by a physician informaticist for silent testing of the deep learning algorithm before deployment. An end user interview survey was developed to examine the user acceptability of a best practice alert (BPA) to provide the screening results with recommendations. The planned implementation also included a human-centered design with user feedback on the BPA, an implementation framework with cost-effectiveness, and a noninferiority patient outcome analysis plan. Results: The pipeline was a reproducible workflow with a shared pseudocode for a cloud service to ingest, process, and store clinical notes as Health Level 7 messages from a major EHR vendor in an elastic cloud computing environment. Feature engineering of the notes used an open-source NLP engine, and the features were fed into the deep learning algorithm, with the results returned as a BPA in the EHR. On-site silent testing of the deep learning algorithm demonstrated a sensitivity of 93\% (95\% CI 66\%-99\%) and specificity of 92\% (95\% CI 84\%-96\%), similar to published validation studies. Before deployment, approvals were received across hospital committees for inpatient operations. 
Five interviews were conducted; they informed the development of an educational flyer and further modified the BPA to exclude certain patients and allow the refusal of recommendations. The longest delay in pipeline development was because of cybersecurity approvals, especially because of the exchange of protected health information between the Microsoft (Microsoft Corp) and Epic (Epic Systems Corp) cloud vendors. In silent testing, the resultant pipeline provided a BPA to the bedside within minutes of a provider entering a note in the EHR. Conclusions: The components of the real-time NLP pipeline were detailed with open-source tools and pseudocode for other health systems to benchmark. The deployment of medical artificial intelligence systems in routine clinical care presents an important yet unfulfilled opportunity, and our protocol aimed to close the gap in the implementation of artificial intelligence--driven CDS. Trial Registration: ClinicalTrials.gov NCT05745480; https://www.clinicaltrials.gov/ct2/show/NCT05745480 ", doi="10.2196/44977", url="https://medinform.jmir.org/2023/1/e44977", url="http://www.ncbi.nlm.nih.gov/pubmed/37079367" } @Article{info:doi/10.2196/44237, author="Hopcroft, EM Lisa and Massey, Jon and Curtis, J. Helen and Mackenna, Brian and Croker, Richard and Brown, D. Andrew and O'Dwyer, Thomas and Macdonald, Orla and Evans, David and Inglesby, Peter and Bacon, CJ Sebastian and Goldacre, Ben and Walker, J. Alex", title="Data-Driven Identification of Unusual Prescribing Behavior: Analysis and Use of an Interactive Data Tool Using 6 Months of Primary Care Data From 6500 Practices in England", journal="JMIR Med Inform", year="2023", month="Apr", day="19", volume="11", pages="e44237", keywords="dashboard", keywords="data science", keywords="EHR", keywords="electronic health records", keywords="general practice", keywords="outliers", keywords="prescribing", keywords="primary care", abstract="Background: Approaches to addressing unwarranted variation in health care service delivery have traditionally relied on the prospective identification of activities and outcomes, based on a hypothesis, with subsequent reporting against defined measures. Practice-level prescribing data in England are made publicly available by the National Health Service (NHS) Business Services Authority for all general practices. There is an opportunity to adopt a more data-driven approach to capture variability and identify outliers by applying hypothesis-free, data-driven algorithms to national data sets. Objective: This study aimed to develop and apply a hypothesis-free algorithm to identify unusual prescribing behavior in primary care data at multiple administrative levels in the NHS in England and to visualize these results using organization-specific interactive dashboards, thereby demonstrating proof of concept for prioritization approaches. Methods: Here we report a new data-driven approach to quantify how ``unusual'' the prescribing rates of a particular chemical within an organization are as compared to peer organizations, over a period of 6 months (June-December 2021). This is followed by a ranking to identify which chemicals are the most notable outliers in each organization. These outlying chemicals are calculated for all practices, primary care networks, clinical commissioning groups, and sustainability and transformation partnerships in England. 
Our results are presented via organization-specific interactive dashboards, the iterative development of which has been informed by user feedback. Results: We developed interactive dashboards for every practice (n=6476) in England, highlighting the unusual prescribing of 2369 chemicals (dashboards are also provided for 42 sustainability and transformation partnerships, 106 clinical commissioning groups, and 1257 primary care networks). User feedback and internal review of case studies demonstrate that our methodology identifies prescribing behavior that sometimes warrants further investigation or is a known issue. Conclusions: Data-driven approaches have the potential to overcome existing biases with regard to the planning and execution of audits, interventions, and policy making within NHS organizations, potentially revealing new targets for improved health care service delivery. We present our dashboards as a proof of concept for generating candidate lists to aid expert users in their interpretation of prescribing data and prioritize further investigations and qualitative research in terms of potential targets for improved performance. ", doi="10.2196/44237", url="https://medinform.jmir.org/2023/1/e44237", url="http://www.ncbi.nlm.nih.gov/pubmed/37074763" } @Article{info:doi/10.2196/46127, author="Sung, Sumi and Park, Hyeoun-Ae and Jung, Hyesil and Kang, Hannah", title="A SNOMED CT Mapping Guideline for the Local Terms Used to Document Clinical Findings and Procedures in Electronic Medical Records in South Korea: Methodological Study", journal="JMIR Med Inform", year="2023", month="Apr", day="18", volume="11", pages="e46127", keywords="semantic interoperability", keywords="Systematized Nomenclature of Medicine--Clinical Terms", keywords="mapping guideline", keywords="local terms", keywords="mapping", keywords="guideline", keywords="SNOMED", keywords="nomenclature", keywords="interoperable", keywords="interoperability", keywords="terminology", keywords="medical term", keywords="health term", keywords="terminologies", keywords="ontologies", abstract="Background: South Korea joined SNOMED International as the 39th member country. To ensure semantic interoperability, South Korea introduced SNOMED CT (Systemized Nomenclature of Medicine--Clinical Terms) in 2020. However, there is no methodology to map local Korean terms to SNOMED CT. Instead, this is performed sporadically and independently at each local medical institution. The quality of the mapping, therefore, cannot be guaranteed. Objective: This study aimed to develop and introduce a guideline to map local Korean terms to the SNOMED CT used to document clinical findings and procedures in electronic health records at health care institutions in South Korea. Methods: The guidelines were developed from December 2020 to December 2022. An extensive literature review was conducted. The overall structures and contents of the guidelines with diverse use cases were developed by referencing the existing SNOMED CT mapping guidelines, previous studies related to SNOMED CT mapping, and the experiences of the committee members. The developed guidelines were validated by a guideline review panel. 
Results: The SNOMED CT mapping guidelines developed in this study recommended the following 9 steps: define the purpose and scope of the map, extract terms, preprocess source terms, preprocess source terms using clinical context, select a search term, use search strategies to find SNOMED CT concepts using a browser, classify mapping correlations, validate the map, and build the final map format. Conclusions: The guidelines developed in this study can support the standardized mapping of local Korean terms into SNOMED CT. Mapping specialists can use this guideline to improve the mapping quality performed at individual local medical institutions. ", doi="10.2196/46127", url="https://medinform.jmir.org/2023/1/e46127", url="http://www.ncbi.nlm.nih.gov/pubmed/37071456" } @Article{info:doi/10.2196/43958, author="Weinert, Lina and Klass, Maximilian and Schneider, Gerd and Heinze, Oliver", title="Exploring Stakeholder Requirements to Enable Research and Development of Artificial Intelligence Algorithms in a Hospital-Based Generic Infrastructure: Results of a Multistep Mixed Methods Study", journal="JMIR Form Res", year="2023", month="Apr", day="18", volume="7", pages="e43958", keywords="artificial intelligence", keywords="requirements analysis", keywords="mixed-methods", keywords="data availability", keywords="qualitative research", abstract="Background: Legal, controlled, and regulated access to high-quality data from academic hospitals currently poses a barrier to the development and testing of new artificial intelligence (AI) algorithms. To overcome this barrier, the German Federal Ministry of Health supports the ``pAItient'' (Protected Artificial Intelligence Innovation Environment for Patient Oriented Digital Health Solutions for developing, testing and evidence-based evaluation of clinical value) project, with the goal to establish an AI Innovation Environment at the Heidelberg University Hospital, Germany. It is designed as a proof-of-concept extension to the preexisting Medical Data Integration Center. Objective: The first part of the pAItient project aims to explore stakeholders' requirements for developing AI in partnership with an academic hospital and granting AI experts access to anonymized personal health data. Methods: We designed a multistep mixed methods approach. First, researchers and employees from stakeholder organizations were invited to participate in semistructured interviews. In the following step, questionnaires were developed based on the participants' answers and distributed among the stakeholders' organizations. In addition, patients and physicians were interviewed. Results: The identified requirements covered a wide range and were conflicting sometimes. Relevant patient requirements included adequate provision of necessary information for data use, clear medical objective of the research and development activities, trustworthiness of the organization collecting the patient data, and data should not be reidentifiable. Requirements of AI researchers and developers encompassed contact with clinical users, an acceptable user interface (UI) for shared data platforms, stable connection to the planned infrastructure, relevant use cases, and assistance in dealing with data privacy regulations. In a next step, a requirements model was developed, which depicts the identified requirements in different layers. This developed model will be used to communicate stakeholder requirements within the pAItient project consortium. 
Conclusions: The study led to the identification of necessary requirements for the development, testing, and validation of AI applications within a hospital-based generic infrastructure. A requirements model was developed, which will inform the next steps in the development of an AI innovation environment at our institution. Results from our study replicate previous findings from other contexts and will add to the emerging discussion on the use of routine medical data for the development of AI applications. International Registered Report Identifier (IRRID): RR2-10.2196/42208 ", doi="10.2196/43958", url="https://formative.jmir.org/2023/1/e43958", url="http://www.ncbi.nlm.nih.gov/pubmed/37071450" } @Article{info:doi/10.2196/45268, author="Costello, Jeremy and Kaur, Manpreet and Reformat, Z. Marek and Bolduc, V. Francois", title="Leveraging Knowledge Graphs and Natural Language Processing for Automated Web Resource Labeling and Knowledge Mobilization in Neurodevelopmental Disorders: Development and Usability Study", journal="J Med Internet Res", year="2023", month="Apr", day="17", volume="25", pages="e45268", keywords="knowledge graph", keywords="natural language processing", keywords="neurodevelopmental disorders", keywords="autism spectrum disorder", keywords="intellectual disability", keywords="attention deficit hyperactivity disorder", keywords="named entity recognition", keywords="topic modeling", keywords="aggregation operator", abstract="Background: Patients and families need to be provided with trusted information more than ever with the abundance of online information. Several organizations aim to build databases that can be searched based on the needs of target groups. One such group is individuals with neurodevelopmental disorders (NDDs) and their families. NDDs affect up to 18\% of the population and have major social and economic impacts. The current limitations in communicating information for individuals with NDDs include the absence of shared terminology and the lack of efficient labeling processes for web resources. Because of these limitations, health professionals, support groups, and families are unable to share, combine, and access resources. Objective: We aimed to develop a natural language--based pipeline to label resources by leveraging standard and free-text vocabularies obtained through text analysis, and then represent those resources as a weighted knowledge graph. Methods: Using a combination of experts and service/organization databases, we created a data set of web resources for NDDs. Text from these websites was scraped and collected into a corpus of textual data on NDDs. This corpus was used to construct a knowledge graph suitable for use by both experts and nonexperts. Named entity recognition, topic modeling, document classification, and location detection were used to extract knowledge from the corpus. Results: We developed a resource annotation pipeline using diverse natural language processing algorithms to annotate web resources and stored them in a structured knowledge graph. The graph contained 78,181 annotations obtained from the combination of standard terminologies and a free-text vocabulary obtained using topic modeling. An application of the constructed knowledge graph is a resource search interface using the ordered weighted averaging operator to rank resources based on a user query. Conclusions: We developed an automated labeling pipeline for web resources on NDDs. 
This work showcases how artificial intelligence--based methods, such as natural language processing and knowledge graphs for information representation, can enhance knowledge extraction and mobilization, and could be used in other fields of medicine. ", doi="10.2196/45268", url="https://www.jmir.org/2023/1/e45268", url="http://www.ncbi.nlm.nih.gov/pubmed/37067865" } @Article{info:doi/10.2196/43960, author="Rui, Angela and Garabedian, M. Pamela and Marceau, Marlika and Syrowatka, Ania and Volk, A. Lynn and Edrees, H. Heba and Seger, L. Diane and Amato, G. Mary and Cambre, Jacob and Dulgarian, Sevan and Newmark, P. Lisa and Nanji, C. Karen and Schultz, Petra and Jackson, Purcell Gretchen and Rozenblum, Ronen and Bates, W. David", title="Performance of a Web-Based Reference Database With Natural Language Searching Capabilities: Usability Evaluation of DynaMed and Micromedex With Watson", journal="JMIR Hum Factors", year="2023", month="Apr", day="17", volume="10", pages="e43960", keywords="medication safety", keywords="patient safety", keywords="usability", keywords="searching behavior", keywords="efficiency", keywords="quality of care", keywords="web-based databases", keywords="point-of-care information", keywords="POCI", keywords="point-of-care tools", keywords="artificial intelligence", keywords="machine learning", keywords="clinical decision support", keywords="natural language processing", abstract="Background: Evidence-based point-of-care information (POCI) tools can facilitate patient safety and care by helping clinicians to answer disease state and drug information questions in less time and with less effort. However, these tools may also be visually challenging to navigate or lack the comprehensiveness needed to sufficiently address a medical issue. Objective: This study aimed to collect clinicians' feedback and directly observe their use of the combined POCI tool DynaMed and Micromedex with Watson, now known as DynaMedex. EBSCO partnered with IBM Watson Health, now known as Merative, to develop the combined tool as a resource for clinicians. We aimed to identify areas for refinement based on participant feedback and examine participant perceptions to inform further development. Methods: Participants (N=43) within varying clinical roles and specialties were recruited from Brigham and Women's Hospital and Massachusetts General Hospital in Boston, Massachusetts, United States, between August 10, 2021, and December 16, 2021, to take part in usability sessions aimed at evaluating the efficiency and effectiveness of, as well as satisfaction with, the DynaMed and Micromedex with Watson tool. Usability testing methods, including think aloud and observations of user behavior, were used to identify challenges regarding the combined tool. Data collection included measurements of time on task; task ease; satisfaction with the answer; posttest feedback on likes, dislikes, and perceived reliability of the tool; and interest in recommending the tool to a colleague. Results: On a 7-point Likert scale, pharmacists rated ease (mean 5.98, SD 1.38) and satisfaction (mean 6.31, SD 1.34) with the combined POCI tool higher than the physicians, nurse practitioner, and physician's assistants (ease: mean 5.57, SD 1.64, and satisfaction: mean 5.82, SD 1.60). 
Pharmacists spent longer (mean 2 minutes, 26 seconds, SD 1 minute, 41 seconds) on average finding an answer to their question than the physicians, nurse practitioner, and physician's assistants (mean 1 minute, 40 seconds, SD 1 minute, 23 seconds). Conclusions: Overall, the tool performed well, but this usability evaluation identified multiple opportunities for improvement that would help inexperienced users. ", doi="10.2196/43960", url="https://humanfactors.jmir.org/2023/1/e43960", url="http://www.ncbi.nlm.nih.gov/pubmed/37067858" } @Article{info:doi/10.2196/43682, author="Gilbert, Stephen and Anderson, Stuart and Daumer, Martin and Li, Phoebe and Melvin, Tom and Williams, Robin", title="Learning From Experience and Finding the Right Balance in the Governance of Artificial Intelligence and Digital Health Technologies", journal="J Med Internet Res", year="2023", month="Apr", day="14", volume="25", pages="e43682", keywords="artificial intelligence", keywords="machine learning", keywords="regulation", keywords="algorithm change protocol", keywords="health care", keywords="regulatory framework", keywords="medical tool", keywords="tool", keywords="patient", keywords="intervention", keywords="safety", keywords="performance", keywords="technology", keywords="implementation", doi="10.2196/43682", url="https://www.jmir.org/2023/1/e43682", url="http://www.ncbi.nlm.nih.gov/pubmed/37058329" } @Article{info:doi/10.2196/38159, author="Gamble, Eoin and Linehan, Conor and Heavin, Ciara", title="Establishing Requirements for Technology to Support Clinical Trial Retention: Systematic Scoping Review and Analysis Using Self-determination Theory", journal="J Med Internet Res", year="2023", month="Apr", day="13", volume="25", pages="e38159", keywords="clinical trial", keywords="clinical research", keywords="retention strategies", keywords="participant retention", keywords="technology strategy", keywords="decentralized clinical trial", keywords="participant motivation", keywords="patient centric", keywords="engagement strategies", keywords="self-determination theory", abstract="Background: Retaining participants in clinical trials is an established challenge. Currently, the industry is moving to a technology-mediated, decentralized model for running trials. The shift presents an opportunity for technology design to aid the participant experience and promote retention; however, there are many open questions regarding how this can be best supported. We advocate the adoption of a stronger theoretical position to improve the quality of design decisions for clinical trial technology to promote participant engagement. Objective: This study aimed to identify and analyze the types of retention strategies used in published clinical trials that successfully retain participants. Methods: A systematic scoping review was carried out on 6 electronic databases for articles published from 1990 to September 2020, namely CINAHL, The Cochrane Library, EBSCO, Embase, PsycINFO, and PubMed, using the concepts ``retention,'' ``strategy,'' ``clinal trial,'' and ``clinical research.'' This was followed by an analysis of the included articles through the lens of self-determination theory, an evidence-based theory of human motivation. Results: A total of 26 articles were included in this review. 
The motivational strategies identified in the clinical trials in our sample were categorized into 8 themes: autonomy; competence; relatedness; controlled motivation; branding, communication material, and marketing literature; contact, tracking, and scheduling methods and data collection; convenience to contribute to data collection; and organizational competence. The trials used a wide range of motivational strategies. Notably, the trials often relied on controlled motivation interventions and underused strategies to support intrinsic motivation. Moreover, traditional clinical trials relied heavily on human interaction and ``relatedness'' to support motivation and retention, which may cause problems in the move to technology-led decentralized trials. We found inconsistency in the data-reporting methods and that motivational theory--based approaches were not evident in strategy design. Conclusions: This study offers direction and a framework to guide digital technology design decisions for future decentralized clinical trials to enhance participant retention during clinical trials. This research defines previous clinical trial retention strategies in terms of participant motivation, identifies motivational strategies, and offers a rationale for selecting strategies that will improve retention. It emphasizes the benefits of using theoretical frameworks to analyze strategic approaches and aid decision-making to improve the quality of technology design decisions. ", doi="10.2196/38159", url="https://www.jmir.org/2023/1/e38159", url="http://www.ncbi.nlm.nih.gov/pubmed/37052985" } @Article{info:doi/10.2196/38941, author="Cheng, Lucille and Senathirajah, Yalini", title="Using Clinical Data Visualizations in Electronic Health Record User Interfaces to Enhance Medical Student Diagnostic Reasoning: Randomized Experiment", journal="JMIR Hum Factors", year="2023", month="Apr", day="13", volume="10", pages="e38941", keywords="electronic health record", keywords="EHR", keywords="System-1--type diagnostic reasoning", keywords="type-1 reasoning", keywords="diagnostic", keywords="diagnosis", keywords="user interface", keywords="user design", keywords="heuristics", keywords="medical education", keywords="clinical reasoning", keywords="reasoning process", keywords="data visualization", keywords="hGraph", keywords="cognitive burden", keywords="cognitive load", keywords="medical student", keywords="medical school", abstract="Background: In medicine, the clinical decision-making process can be described using the dual-process theory consisting of the fast, intuitive ``System 1,'' commonly seen in seasoned physicians, and the slow, deliberative ``System 2,'' associated with medical students. System-1---type diagnostic reasoning is thought to be less cognitively burdensome, thereby reducing physician error. To date, limited literature exists on inducing System-1--type diagnosis in medical students through cognitive heuristics, particularly while using modern electronic health record (EHR) interfaces. Objective: In this experimental pilot study, we aimed to (1) attempt to induce System-1---type diagnostic reasoning in inexperienced medical students through the acquisition of cognitive user interface heuristics and (2) understand the impact of clinical patient data visualizations on students' cognitive load and medical education. 
Methods: The participants were third- and fourth-year medical students recruited from the University of Pittsburgh School of Medicine who had completed 1+ clinical rotations. The students were presented 8 patient cases on a novel EHR, featuring a prominent data visualization designed to foster at-a-glance rapid case assessment, and asked to diagnose the patient. Half of the participants were shown 4 of the 8 cases repeatedly, up to 4 times with 30 seconds per case (Group A), and the other half of the participants were shown cases twice with 2 minutes per case (Group B). All participants were then asked to provide full diagnoses of all 8 cases. Finally, the participants were asked to evaluate and elaborate on their experience with the system; content analysis was subsequently performed on these user experience interviews. Results: A total of 15 students participated. The participants in Group A scored slightly higher on average than those in Group B, with a mean percentage correct of 76\% (95\% CI 0.68-0.84) versus 69\% (95\% CI 0.58-0.80), and spent on average 50\% less time per question than Group B diagnosing patients (13.98 seconds vs 19.13 seconds, P=.03, respectively). When comparing the novel EHR design to previously used EHRs, 73\% (n=11) of participants rated the new version on par or higher (3+/5). Ease of use and intuitiveness of this new system rated similarly high (mean score 3.73/5 and 4.2/5, respectively). In qualitative thematic analysis of poststudy interviews, most participants (n=11, 73\%) spoke to ``pattern-recognition'' cognitive heuristic strategies consistent with System 1 decision-making. Conclusions: These results support the possibility of inducing type-1 diagnostics in learners and the potential for data visualization and user design heuristics to reduce cognitive burden in clinical settings. Clinical data presentation in the diagnostic reasoning process is ripe for innovation, and further research is needed to explore the benefit of using such visualizations in medical education. ", doi="10.2196/38941", url="https://humanfactors.jmir.org/2023/1/e38941", url="http://www.ncbi.nlm.nih.gov/pubmed/37053000" } @Article{info:doi/10.2196/41223, author="Feldman, Jonah and Goodman, Adam and Hochman, Katherine and Chakravartty, Eesha and Austrian, Jonathan and Iturrate, Eduardo and Bosworth, Brian and Saxena, Archana and Moussa, Marwa and Chenouda, Dina and Volpicelli, Frank and Adler, Nicole and Weisstuch, Joseph and Testa, Paul", title="Novel Note Templates to Enhance Signal and Reduce Noise in Medical Documentation: Prospective Improvement Study", journal="JMIR Form Res", year="2023", month="Apr", day="12", volume="7", pages="e41223", keywords="medical informatics", keywords="decision support", keywords="hospital data", keywords="clinical documentation", keywords="clinical informatics", abstract="Background: The introduction of electronic workflows has allowed for the flow of raw uncontextualized clinical data into medical documentation. As a result, many electronic notes have become replete of ``noise'' and deplete clinically significant ``signals.'' There is an urgent need to develop and implement innovative approaches in electronic clinical documentation that improve note quality and reduce unnecessary bloating. Objective: This study aims to describe the development and impact of a novel set of templates designed to change the flow of information in medical documentation. 
Methods: This is a multihospital nonrandomized prospective improvement study conducted on the inpatient general internal medicine service across 3 hospital campuses at the New York University Langone Health System. A group of physician leaders representing each campus met biweekly for 6 months. The output of these meetings included (1) a conceptualization of the note bloat problem as a dysfunction in information flow, (2) a set of guiding principles for organizational documentation improvement, (3) the design and build of novel electronic templates that reduced the flow of extraneous information into provider notes by providing link outs to best practice data visualizations, and (4) a documentation improvement curriculum for inpatient medicine providers. Prior to go-live, pragmatic usability testing was performed with the new progress note template, and the overall user experience was measured using the System Usability Scale (SUS). Primary outcome measures after go-live include template utilization rate and note length in characters. Results: In usability testing among 22 medicine providers, the new progress note template averaged a usability score of 90.6 out of 100 on the SUS. A total of 77\% (17/22) of providers strongly agreed that the new template was easy to use, and 64\% (14/22) strongly agreed that they would like to use the template frequently. In the 3 months after template implementation, general internal medicine providers wrote 67\% (51,431/76,647) of all inpatient notes with the new templates. During this period, the organization saw a 46\% (2768/6191), 47\% (3505/7819), and 32\% (3427/11,226) reduction in note length for general medicine progress notes, consults, and history and physical notes, respectively, when compared to a baseline measurement period prior to interventions. Conclusions: A bundled intervention that included the deployment of novel templates for inpatient general medicine providers significantly reduced average note length on the clinical service. Templates designed to reduce the flow of extraneous information into provider notes performed well during usability testing, and these templates were rapidly adopted across all hospital campuses. Further research is needed to assess the impact of novel templates on note quality, provider efficiency, and patient outcomes. ", doi="10.2196/41223", url="https://formative.jmir.org/2023/1/e41223", url="http://www.ncbi.nlm.nih.gov/pubmed/36821760" } @Article{info:doi/10.2196/43815, author="Wang, Jun and Chen, Hongmei and Wang, Houwei and Liu, Weichu and Peng, Daomei and Zhao, Qinghua and Xiao, Mingzhao", title="A Risk Prediction Model for Physical Restraints Among Older Chinese Adults in Long-term Care Facilities: Machine Learning Study", journal="J Med Internet Res", year="2023", month="Apr", day="6", volume="25", pages="e43815", keywords="physical restraint", keywords="prediction model", keywords="machine learning", keywords="stacking ensemble model", keywords="model", keywords="older adults", keywords="elderly", keywords="risk factor", keywords="learning model", keywords="development", keywords="support", keywords="accuracy", keywords="precision", keywords="cognitive impairment", keywords="utility", keywords="management", abstract="Background: Numerous studies have identified risk factors for physical restraint (PR) use in older adults in long-term care facilities. Nevertheless, there is a lack of predictive tools to identify high-risk individuals. 
Objective: We aimed to develop machine learning (ML)--based models to predict the risk of PR in older adults. Methods: This study conducted a cross-sectional secondary data analysis based on 1026 older adults from 6 long-term care facilities in Chongqing, China, from July 2019 to November 2019. The primary outcome was the use of PR (yes or no), identified by 2 collectors' direct observation. A total of 15 candidate predictors (older adults' demographic and clinical factors) that could be commonly and easily collected from clinical practice were used to build 9 independent ML models: Gaussian Na{\"i}ve Bayesian (GNB), k-nearest neighbor (KNN), decision tree (DT), logistic regression (LR), support vector machine (SVM), random forest (RF), multilayer perceptron (MLP), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM), as well as a stacking ensemble ML model. Performance was evaluated using accuracy, precision, recall, an F score, a comprehensive evaluation indicator (CEI) weighted by the above indicators, and the area under the receiver operating characteristic curve (AUC). A net benefit approach using decision curve analysis (DCA) was performed to evaluate the clinical utility of the best model. Models were tested via 10-fold cross-validation. Feature importance was interpreted using Shapley Additive Explanations (SHAP). Results: A total of 1026 older adults (mean age 83.5, SD 7.6 years; n=586, 57.1\% male), of whom 265 were restrained, were included in the study. All ML models performed well, with an AUC above 0.905 and an F score above 0.900. The 2 best independent models were RF (AUC 0.938, 95\% CI 0.914-0.947) and SVM (AUC 0.949, 95\% CI 0.911-0.953). The DCA demonstrated that the RF model displayed better clinical utility than the other models. The stacking model combining SVM, RF, and MLP performed best, with the highest AUC (0.950) and CEI (0.943) values, and its DCA curve indicated the best clinical utility. The SHAP plots demonstrated that the significant contributors to model performance were related to cognitive impairment, care dependency, mobility decline, physical agitation, and an indwelling tube. Conclusions: The RF and stacking models had high performance and clinical utility. ML prediction models for predicting the probability of PR in older adults could offer clinical screening and decision support, which could help medical staff in the early identification and PR management of older adults. 
", doi="10.2196/43815", url="https://www.jmir.org/2023/1/e43815", url="http://www.ncbi.nlm.nih.gov/pubmed/37023416" } @Article{info:doi/10.2196/43386, author="Khan, Ullah Waqas and Seto, Emily", title="A ``Do No Harm'' Novel Safety Checklist and Research Approach to Determine Whether to Launch an Artificial Intelligence--Based Medical Technology: Introducing the Biological-Psychological, Economic, and Social (BPES) Framework", journal="J Med Internet Res", year="2023", month="Apr", day="5", volume="25", pages="e43386", keywords="artificial intelligence", keywords="AI", keywords="safety checklist", keywords="Do No Harm", keywords="biological-psychological factors", keywords="economic factors", keywords="social factors", keywords="AI medical hardware devices", keywords="AI medical mobile apps", keywords="AI medical software programs", doi="10.2196/43386", url="https://www.jmir.org/2023/1/e43386", url="http://www.ncbi.nlm.nih.gov/pubmed/37018019" } @Article{info:doi/10.2196/44248, author="Jan, Zainab and El Assadi, Farah and Abd-alrazaq, Alaa and Jithesh, Veettil Puthen", title="Artificial Intelligence for the Prediction and Early Diagnosis of Pancreatic Cancer: Scoping Review", journal="J Med Internet Res", year="2023", month="Mar", day="31", volume="25", pages="e44248", keywords="artificial Intelligence", keywords="pancreatic cancer", keywords="diagnosis", keywords="diagnostic", keywords="prediction", keywords="machine learning", keywords="deep learning", keywords="scoping", keywords="review method", keywords="predict", keywords="cancer", keywords="oncology", keywords="pancreatic", keywords="algorithm", abstract="Background: Pancreatic cancer is the 12th most common cancer worldwide, with an overall survival rate of 4.9\%. Early diagnosis of pancreatic cancer is essential for timely treatment and survival. Artificial intelligence (AI) provides advanced models and algorithms for better diagnosis of pancreatic cancer. Objective: This study aims to explore AI models used for the prediction and early diagnosis of pancreatic cancers as reported in the literature. Methods: A scoping review was conducted and reported in line with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. PubMed, Google Scholar, Science Direct, BioRXiv, and MedRxiv were explored to identify relevant articles. Study selection and data extraction were independently conducted by 2 reviewers. Data extracted from the included studies were synthesized narratively. Results: Of the 1185 publications, 30 studies were included in the scoping review. The included articles reported the use of AI for 6 different purposes. Of these included articles, AI techniques were mostly used for the diagnosis of pancreatic cancer (14/30, 47\%). Radiological images (14/30, 47\%) were the most frequently used data in the included articles. Most of the included articles used data sets with a size of <1000 samples (11/30, 37\%). Deep learning models were the most prominent branch of AI used for pancreatic cancer diagnosis in the studies, and the convolutional neural network was the most used algorithm (18/30, 60\%). Six validation approaches were used in the included studies, of which the most frequently used approaches were k-fold cross-validation (10/30, 33\%) and external validation (10/30, 33\%). A higher level of accuracy (99\%) was found in studies that used support vector machine, decision trees, and k-means clustering algorithms. 
Conclusions: This review presents an overview of studies based on AI models and algorithms used to predict and diagnose pancreatic cancer. AI is expected to play a vital role in advancing pancreatic cancer prediction and diagnosis. Further research is required to provide data that support clinical decisions in health care. ", doi="10.2196/44248", url="https://www.jmir.org/2023/1/e44248", url="http://www.ncbi.nlm.nih.gov/pubmed/37000507" } @Article{info:doi/10.2196/44765, author="Lee, Mauricette and Bin Mahmood, Shakran Abu Bakar and Lee, Sing Eng and Smith, Elizabeth Helen and Tudor Car, Lorainne", title="Smartphone and Mobile App Use Among Physicians in Clinical Practice: Scoping Review", journal="JMIR Mhealth Uhealth", year="2023", month="Mar", day="31", volume="11", pages="e44765", keywords="evidence-based medicine", keywords="specialist", keywords="general practitioners", keywords="GP", keywords="primary care physicians", keywords="mobile apps", keywords="consultants", keywords="surgeons", keywords="pediatricians", keywords="clinical care", keywords="mobile phone", abstract="Background: Health care professionals are increasingly using smartphones in clinical care. Smartphone use can affect the quality of patient care and clinical outcomes. Objective: This scoping review aimed to describe how physicians use smartphones and mobile apps in clinical settings. Methods: We conducted a scoping review using the Joanna Briggs Institute methodology and reported the results according to PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. We searched MEDLINE, Embase, Cochrane Library, Web of Science, and Google Scholar, as well as gray literature, for studies published since 2010. An additional search was also performed by scanning the reference lists of included studies. A narrative synthesis approach was used. Results: A total of 10 studies, published between 2016 and 2021, were included in this review. Of these studies, 8 used surveys and 2 used surveys with focus group study designs to explore smartphone use, its adoption, experience of using it, and views on the use of smartphones among physicians. There were studies with only general practitioners (n=3), studies with only specialists (n=3), and studies with both general practitioners and specialists (n=4). Physicians use smartphones and mobile apps for communication (n=9), clinical decision-making (n=7), drug compendia (n=7), medical education and training (n=7), maintaining health records (n=4), managing time (n=4), and monitoring patients (n=2) in clinical practice. The Medscape medical app was frequently used for information gathering. WhatsApp, a nonmedical app, was commonly used for physician-patient communication. The commonly reported barriers were lack of regulatory oversight, privacy concerns, and limited Wi-Fi or internet access. The commonly reported facilitators were convenience and access to evidence-based medicine, clinical decision-making support, and a wide array of apps. Conclusions: Smartphones and mobile apps were used for communication, medical education and training, clinical decision-making, and drug compendia in most studies. Although the benefits of smartphones and mobile apps for physicians at work were promising, there were concerns about patient privacy and confidentiality. Legislation is urgently needed to address the liability of health care professionals using smartphones. 
", doi="10.2196/44765", url="https://mhealth.jmir.org/2023/1/e44765", url="http://www.ncbi.nlm.nih.gov/pubmed/37000498" } @Article{info:doi/10.2196/42452, author="Li, Jiang and Xi, Fengchan and Yu, Wenkui and Sun, Chuanrui and Wang, Xiling", title="Real-Time Prediction of Sepsis in Critical Trauma Patients: Machine Learning--Based Modeling Study", journal="JMIR Form Res", year="2023", month="Mar", day="31", volume="7", pages="e42452", keywords="sepsis", keywords="trauma", keywords="intensive care unit", keywords="machine learning", keywords="real-time prediction", abstract="Background: Sepsis is a leading cause of death in patients with trauma, and the risk of mortality increases significantly for each hour of delay in treatment. A hypermetabolic baseline and explosive inflammatory immune response mask clinical signs and symptoms of sepsis in trauma patients, making early diagnosis of sepsis more challenging. Machine learning--based predictive modeling has shown great promise in evaluating and predicting sepsis risk in the general intensive care unit (ICU) setting, but there has been no sepsis prediction model specifically developed for trauma patients so far. Objective: To develop a machine learning model to predict the risk of sepsis at an hourly scale among ICU-admitted trauma patients. Methods: We extracted data from adult trauma patients admitted to the ICU at Beth Israel Deaconess Medical Center between 2008 and 2019. A total of 42 raw variables were collected, including demographics, vital signs, arterial blood gas, and laboratory tests. We further derived a total of 485 features, including measurement pattern features, scoring features, and time-series variables, from the raw variables by feature engineering. The data set was randomly split into 70\% for model development with stratified 5-fold cross-validation, 15\% for calibration, and 15\% for testing. An Extreme Gradient Boosting (XGBoost) model was developed to predict the hourly risk of sepsis at prediction windows of 4, 6, 8, 12, and 24 hours. We evaluated model performance for discrimination and calibration both at time-step and outcome levels. Clinical applicability of the model was evaluated with varying levels of precision, and the potential clinical net benefit was assessed with decision curve analysis (DCA). A Shapley additive explanation algorithm was applied to show the effect of features on the prediction model. In addition, we trained an L2-regularized logistic regression model to compare its performance with XGBoost. Results: We included 4603 trauma patients in the study, 1196 (26\%) of whom developed sepsis. The XGBoost model achieved an area under the receiver operating characteristics curve (AUROC) ranging from 0.83 to 0.88 at the 4-to-24-hour prediction window in the test set. With a ratio of 9 false alerts for every true alert, it predicted 73\% (386/529) of sepsis-positive timesteps and 91\% (163/179) of sepsis events in the subsequent 6 hours. The DCA showed our model had a positive net benefit in the threshold probability range of 0 to 0.6. In comparison, the logistic regression model achieved lower performance, with AUROC ranging from 0.76 to 0.84 at the 4-to-24-hour prediction window. Conclusions: The machine learning--based model had good discrimination and calibration performance for sepsis prediction in critical trauma patients. 
Using the model in clinical practice might help to identify patients at risk of sepsis in a time window that enables personalized intervention and early treatment. ", doi="10.2196/42452", url="https://formative.jmir.org/2023/1/e42452", url="http://www.ncbi.nlm.nih.gov/pubmed/37000488" } @Article{info:doi/10.2196/43269, author="Nyberg, Andr{\'e} and Sondell, Anna and Lundell, Sara and Marklund, Sarah and Tistad, Malin and Wadell, Karin", title="Experiences of Using an Electronic Health Tool Among Health Care Professionals Involved in Chronic Obstructive Pulmonary Disease Management: Qualitative Analysis", journal="JMIR Hum Factors", year="2023", month="Mar", day="30", volume="10", pages="e43269", keywords="COPD", keywords="eHealth", keywords="internet", keywords="web based", keywords="health care professionals", keywords="primary care", keywords="pulmonary", keywords="management", keywords="tools", keywords="chronic", keywords="clinical", keywords="support", keywords="care", keywords="electronic", keywords="implementation", abstract="Background: Chronic obstructive pulmonary disease (COPD) is one of the most common and deadliest chronic diseases of the 21st century. eHealth tools are seen as a promising way of supporting health care professionals in providing evidence-based COPD care, for example, by reinforcing information and interventions provided to the patients and providing easier access and support to the health care professional themselves. Still, knowledge is scarce on the experience of using eHealth tools from the perspective of the health care professional involved in COPD management. Objective: The study explored the experiences of using an eHealth tool among health care professionals that worked with patients with COPD in their daily clinical practice. Methods: This exploratory qualitative study is part of a process evaluation in a parallel group, controlled, pragmatic pilot trial. Semistructured interviews were performed with 10 health care professionals 3 and 12 months after getting access to an eHealth tool, the COPD Web. The COPD Web, developed using cocreation, is an interactive web-based platform that aims to help health care professionals provide health-promoting strategies. Data from the interviews were analyzed using qualitative content analysis with an inductive approach. Results: The main results reflected health care professionals' experiences in 3 categories: receiving competence support and adjusting practice, improving quality of care, and efforts required for implementation. These categories highlighted that using an eHealth tool such as the COPD Web was experienced to provide knowledge support for health care professionals that led to adaptation and facilitation of working procedures and person-centered care. Taken together, these changes were perceived to improve the quality of care through enhanced patient contact and encouragement of interprofessional collaboration. In addition, health care professionals expressed that patients using the COPD Web were better equipped to tackle their disease and adhered better to provided treatment, increasing their self-management ability. However, structural and external barriers bar the successful implementation of an eHealth tool in daily praxis. Conclusions: This study is among the first to explore experiences of using an eHealth tool among health care professionals involved in COPD management. 
Our novel findings highlight that using an eHealth tool such as the COPD Web may improve the quality of care for patients with COPD (eg, by providing knowledge support for health care professionals and adapting and facilitating working procedures). Our results also indicate that an eHealth tool fosters collaborative interactions between patients and health care professionals, which explains why eHealth is a valuable means of encouraging well-informed and autonomous patients. However, structural and external barriers requiring time, support, and education must be addressed to ensure that an eHealth tool can be successfully implemented in daily praxis. Trial Registration: ClinicalTrials.gov NCT02696187; https://clinicaltrials.gov/ct2/show/NCT02696187 ", doi="10.2196/43269", url="https://humanfactors.jmir.org/2023/1/e43269", url="http://www.ncbi.nlm.nih.gov/pubmed/36995743" } @Article{info:doi/10.2196/43277, author="Mason, A. Joseph and Friedman, E. Eleanor and Rojas, C. Juan and Ridgway, P. Jessica", title="No-show Prediction Model Performance Among People With HIV: External Validation Study", journal="J Med Internet Res", year="2023", month="Mar", day="29", volume="25", pages="e43277", keywords="no-show", keywords="prediction model", keywords="Epic systems", keywords="people with HIV", keywords="human immunodeficiency virus", keywords="electronic medical record", keywords="external validation", keywords="technology", keywords="model", keywords="care", keywords="patient", keywords="HIV", abstract="Background: Regular medical care is important for people living with HIV. A no-show predictive model among people with HIV could improve clinical care by allowing providers to proactively engage patients at high risk of missing appointments. Epic, a major provider of electronic medical record systems, created a model that predicts a patient's probability of being a no-show for an outpatient health care appointment; however, this model has not been externally validated in people with HIV. Objective: We examined the performance of Epic's no-show model among people with HIV at an academic medical center and assessed whether the performance was impacted by the addition of demographic and HIV clinical information. Methods: We obtained encounter data from all in-person appointments among people with HIV from January 21 to March 30, 2022, at the University of Chicago Medicine. We compared the predicted no-show probability at the time of the encounter to the actual outcome of these appointments. We also examined the performance of the Epic model among people with HIV for only HIV care appointments in the infectious diseases department. We further compared the no-show model among people with HIV for HIV care appointments to an alternate random forest model we created using a subset of seven readily accessible features used in the Epic model and four additional features related to HIV clinical care or demographics. Results: We identified 674 people with HIV who contributed 1406 total scheduled in-person appointments during the study period. Of those, we identified 331 people with HIV who contributed 440 HIV care appointments. The performance of the Epic model among people with HIV for all appointments in any outpatient clinic had an area under the receiver operating characteristic curve (AUC) of 0.65 (95\% CI 0.63-0.66) and for only HIV care appointments had an AUC of 0.63 (95\% CI 0.59-0.67). 
The alternate model we created for people with HIV attending HIV care appointments had an AUC of 0.78 (95\% CI 0.75-0.82), a significant improvement over the Epic model restricted to HIV care appointments (P<.001). Features identified as important in the alternate model included lead time, appointment length, HIV viral load >200 copies per mL, lower CD4 T cell counts (both 50 to <200 cells/mm3 and 200 to <350 cells/mm3), and female sex. Conclusions: For both models among people with HIV, performance was significantly lower than reported by Epic. The improvement in the performance of the alternate model over the proprietary Epic model demonstrates that, among people with HIV, the inclusion of demographic information may enhance the prediction of appointment attendance. The alternate model further reveals that the prediction of appointment attendance in people with HIV can be improved by using HIV clinical information such as CD4 count and HIV viral load test results as features in the model. ", doi="10.2196/43277", url="https://www.jmir.org/2023/1/e43277", url="http://www.ncbi.nlm.nih.gov/pubmed/36989038" } @Article{info:doi/10.2196/39972, author="Lee, Tsung-Ju Leon and Yang, Hsuan-Chia and Nguyen, Anh Phung and Muhtar, Solihuddin Muhammad and Li, Jack Yu-Chuan", title="Machine Learning Approaches for Predicting Psoriatic Arthritis Risk Using Electronic Medical Records: Population-Based Study", journal="J Med Internet Res", year="2023", month="Mar", day="28", volume="25", pages="e39972", keywords="convolutional neural network", keywords="deep learning, machine learning", keywords="prediction model", keywords="psoriasis", keywords="psoriatic arthritis", keywords="temporal phenomic map", keywords="electronic medical records", abstract="Background: Psoriasis (PsO) is a chronic, systemic, immune-mediated disease with multiorgan involvement. Psoriatic arthritis (PsA) is an inflammatory arthritis that is present in 6\%-42\% of patients with PsO. Approximately 15\% of patients with PsO have undiagnosed PsA. Predicting patients with a risk of PsA is crucial for providing them with early examination and treatment that can prevent irreversible disease progression and function loss. Objective: The aim of this study was to develop and validate a prediction model for PsA based on chronological large-scale and multidimensional electronic medical records using a machine learning algorithm. Methods: This case-control study used Taiwan's National Health Insurance Research Database from January 1, 1999, to December 31, 2013. The original data set was split into training and holdout data sets in an 80:20 ratio. A convolutional neural network was used to develop a prediction model. This model used 2.5-year diagnostic and medical records (inpatient and outpatient) with temporal-sequential information to predict the risk of PsA for a given patient within the next 6 months. The model was developed and cross-validated using the training data and was tested using the holdout data. An occlusion sensitivity analysis was performed to identify the important features of the model. Results: The prediction model included a total of 443 patients with PsA with earlier diagnosis of PsO and 1772 patients with PsO without PsA for the control group. 
The 6-month PsA risk prediction model that uses sequential diagnostic and drug prescription information as a temporal phenomic map yielded an area under the receiver operating characteristic curve of 0.70 (95\% CI 0.559-0.833), a mean sensitivity of 0.80 (SD 0.11), a mean specificity of 0.60 (SD 0.04), and a mean negative predictive value of 0.93 (SD 0.04). Conclusions: The findings of this study suggest that the risk prediction model can identify patients with PsO at a high risk of PsA. This model may help health care professionals to prioritize treatment for target high-risk populations and prevent irreversible disease progression and functional loss. ", doi="10.2196/39972", url="https://www.jmir.org/2023/1/e39972", url="http://www.ncbi.nlm.nih.gov/pubmed/36976633" } @Article{info:doi/10.2196/42683, author="Iacobelli, Francisco and Yang, Anna and Tom, Laura and Leung, S. Ivy and Crissman, John and Salgado, Rufino and Simon, Melissa", title="Predicting Social Determinants of Health in Patient Navigation: Case Study", journal="JMIR Form Res", year="2023", month="Mar", day="28", volume="7", pages="e42683", keywords="patient navigation", keywords="machine learning", keywords="social determinants of health", keywords="health care disparities", keywords="health equity", keywords="case study", abstract="Background: Patient navigation (PN) programs have demonstrated efficacy in improving health outcomes for marginalized populations across a range of clinical contexts by addressing barriers to health care, including social determinants of health (SDoHs). However, it can be challenging for navigators to identify SDoHs by asking patients directly because of many factors, including patients' reluctance to disclose information, communication barriers, and the variable resources and experience levels of patient navigators. Navigators could benefit from strategies that augment their ability to gather SDoH data. Machine learning can be leveraged as one of these strategies to identify SDoH-related barriers. This could further improve health outcomes, particularly in underserved populations. Objective: In this formative study, we explored novel machine learning--based approaches to predict SDoHs in 2 Chicago area PN studies. In the first approach, we applied machine learning to data that include comments and interaction details between patients and navigators, whereas the second approach augmented patients' demographic information. This paper presents the results of these experiments and provides recommendations for data collection and the application of machine learning techniques more generally to the problem of predicting SDoHs. Methods: We conducted 2 experiments to explore the feasibility of using machine learning to predict patients' SDoHs using data collected from PN research. The machine learning algorithms were trained on data collected from 2 Chicago area PN studies. In the first experiment, we compared several machine learning algorithms (logistic regression, random forest, support vector machine, artificial neural network, and Gaussian naive Bayes) to predict SDoHs from both patient demographics and navigator's encounter data over time. In the second experiment, we used multiclass classification with augmented information, such as transportation time to a hospital, to predict multiple SDoHs for each patient. Results: In the first experiment, the random forest classifier achieved the highest accuracy among the classifiers tested. 
The overall accuracy to predict SDoHs was 71.3\%. In the second experiment, multiclass classification effectively predicted a few patients' SDoHs based purely on demographic and augmented data. The best accuracy of these predictions overall was 73\%. However, both experiments yielded high variability in individual SDoH predictions and correlations that become salient among SDoHs. Conclusions: To our knowledge, this study is the first approach to applying PN encounter data and multiclass learning algorithms to predict SDoHs. The experiments discussed yielded valuable lessons, including the awareness of model limitations and bias, planning for standardization of data sources and measurement, and the need to identify and anticipate the intersectionality and clustering of SDoHs. Although our focus was on predicting patients' SDoHs, machine learning can have a broad range of applications in the field of PN, from tailoring intervention delivery (eg, supporting PN decision-making) to informing resource allocation for measurement, and PN supervision. ", doi="10.2196/42683", url="https://formative.jmir.org/2023/1/e42683", url="http://www.ncbi.nlm.nih.gov/pubmed/36976634" } @Article{info:doi/10.2196/43251, author="Chen, You and Clayton, Wright Ellen and Novak, Lovett Laurie and Anders, Shilo and Malin, Bradley", title="Human-Centered Design to Address Biases in Artificial Intelligence", journal="J Med Internet Res", year="2023", month="Mar", day="24", volume="25", pages="e43251", keywords="artificial intelligence", keywords="human-centered AI", keywords="biases", keywords="AI", keywords="care", keywords="biomedical", keywords="research", keywords="application", keywords="human-centered", keywords="development", keywords="design", keywords="patient", keywords="health", keywords="benefits", doi="10.2196/43251", url="https://www.jmir.org/2023/1/e43251", url="http://www.ncbi.nlm.nih.gov/pubmed/36961506" } @Article{info:doi/10.2196/38125, author="Puts, Sander and Nobel, Martijn and Zegers, Catharina and Bermejo, I{\~n}igo and Robben, Simon and Dekker, Andre", title="How Natural Language Processing Can Aid With Pulmonary Oncology Tumor Node Metastasis Staging From Free-Text Radiology Reports: Algorithm Development and Validation", journal="JMIR Form Res", year="2023", month="Mar", day="22", volume="7", pages="e38125", keywords="radiology", keywords="reporting", keywords="natural language processing", keywords="free text", keywords="classification system", keywords="oncology", keywords="pulmonary", keywords="clinical decision", keywords="clinical", abstract="Background: Natural language processing (NLP) is thought to be a promising solution to extract and store concepts from free text in a structured manner for data mining purposes. This is also true for radiology reports, which still consist mostly of free text. Accurate and complete reports are very important for clinical decision support, for instance, in oncological staging. As such, NLP can be a tool to structure the content of the radiology report, thereby increasing the report's value. Objective: This study describes the implementation and validation of an N-stage classifier for pulmonary oncology. It is based on free-text radiological chest computed tomography reports according to the tumor, node, and metastasis (TNM) classification, which has been added to the already existing T-stage classifier to create a combined TN-stage classifier. 
Methods: SpaCy, PyContextNLP, and regular expressions were used for proper information extraction, after additional rules were set to accurately extract N-stage. Results: The overall TN-stage classifier accuracy scores were 0.84 and 0.85, respectively, for the training (N=95) and validation (N=97) sets. This is comparable to the outcomes of the T-stage classifier (0.87-0.92). Conclusions: This study shows that NLP has potential in classifying pulmonary oncology from free-text radiological reports according to the TNM classification system as both the T- and N-stages can be extracted with high accuracy. ", doi="10.2196/38125", url="https://formative.jmir.org/2023/1/e38125", url="http://www.ncbi.nlm.nih.gov/pubmed/36947118" } @Article{info:doi/10.2196/39767, author="Sharma, Yashoda and Cheung, Lovisa and Patterson, K. Kara and Iaboni, Andrea", title="Factors Influencing the Clinical Adoption of Quantitative Gait Analysis Technologies for Adult Patient Populations With a Focus on Clinical Efficacy and Clinician Perspectives: Protocol for a Scoping Review", journal="JMIR Res Protoc", year="2023", month="Mar", day="22", volume="12", pages="e39767", keywords="quantitative gait analysis", keywords="clinical adoption", keywords="clinical efficacy", keywords="clinician perspectives", keywords="barriers", keywords="facilitators", keywords="adults", abstract="Background: Quantitative gait analysis can support clinical decision-making. These analyses can be performed using wearable sensors, nonwearable sensors, or a combination of both. However, to date, they have not been widely adopted in clinical practice. Technology adoption literature has highlighted the clinical efficacy of technology and the users' perspective on the technology (eg, ease of use and usefulness) as some factors that influence their widespread adoption. Objective: To assist with the clinical adoption of quantitative gait technologies, this scoping review will synthesize the literature on their clinical efficacy and clinician perspectives on their use in the clinical care of adult patient populations. Methods: This scoping review protocol follows the Joanna Briggs Institute methodology for scoping reviews. The review will include both peer-reviewed and gray literature (ie, conference abstracts) regarding the clinical efficacy of quantitative gait technologies and clinician perspectives on their use in the clinical care of adult patient populations. A comprehensive search strategy was created in MEDLINE (Ovid), which was then translated to 4 other databases: CENTRAL (Ovid), Embase (Ovid), CINAHL (EBSCO), and SPORTDiscus (EBSCO). The title and abstract screening, full-text review, and data extraction of relevant articles will be performed independently by 2 reviewers, with a third reviewer involved to support the resolution of conflicts. Data will be analyzed using content analysis and summarized in tabular and diagram formats. Results: A search of relevant articles will be conducted in all 5 databases, and through hand-searching in Google Scholar and PEDro, including articles published up until December 2022. The research team plans to submit the final scoping review for publication in a peer-reviewed journal in 2023. Conclusions: The findings of this review will be presented at clinical science conferences and published in a peer-reviewed journal. This review will inform future studies designed to develop, evaluate, or implement quantitative gait analysis technologies in clinical practice. 
International Registered Report Identifier (IRRID): DERR1-10.2196/39767 ", doi="10.2196/39767", url="https://www.researchprotocols.org/2023/1/e39767", url="http://www.ncbi.nlm.nih.gov/pubmed/36947120" } @Article{info:doi/10.2196/44666, author="Liu, Yuxuan and Lyu, Xiaoguang and Yang, Bo and Fang, Zhixiang and Hu, Dejun and Shi, Lei and Wu, Bisheng and Tian, Yong and Zhang, Enli and Yang, YuanChao", title="Early Triage of Critically Ill Adult Patients With Mushroom Poisoning: Machine Learning Approach", journal="JMIR Form Res", year="2023", month="Mar", day="21", volume="7", pages="e44666", keywords="mushroom poisoning", keywords="triage", keywords="model", keywords="machine learning", keywords="XGBoost", keywords="extreme gradient boosting", abstract="Background: Early triage of patients with mushroom poisoning is essential for administering precise treatment and reducing mortality. To our knowledge, there has been no established method to triage patients with mushroom poisoning based on clinical data. Objective: The purpose of this work was to construct a triage system to identify patients with mushroom poisoning based on clinical indicators using several machine learning approaches and to assess the prediction accuracy of these strategies. Methods: In all, 567 patients were collected from 5 primary care hospitals and facilities in Enshi, Hubei Province, China, and divided into 2 groups; 322 patients from 2 hospitals were used as the training cohort, and 245 patients from 3 hospitals were used as the test cohort. Four machine learning algorithms were used to construct the triage model for patients with mushroom poisoning. Performance was assessed using the area under the receiver operating characteristic curve (AUC), decision curve, sensitivity, specificity, and other representative statistics. Feature contributions were evaluated using Shapley additive explanations. Results: Among several machine learning algorithms, extreme gradient boosting (XGBoost) showed the best discriminative ability in 5-fold cross-validation (AUC=0.83, 95\% CI 0.77-0.90) and the test set (AUC=0.90, 95\% CI 0.83-0.96). In the test set, the XGBoost model had a sensitivity of 0.93 (95\% CI 0.81-0.99) and a specificity of 0.79 (95\% CI 0.73-0.85), whereas the physicians' assessment had a sensitivity of 0.86 (95\% CI 0.72-0.95) and a specificity of 0.66 (95\% CI 0.59-0.73). Conclusions: The 14-factor XGBoost model for the early triage of mushroom poisoning can rapidly and accurately identify critically ill patients and will possibly serve as an important basis for the selection of treatment options and referral of patients, potentially reducing patient mortality and improving clinical outcomes. ", doi="10.2196/44666", url="https://formative.jmir.org/2023/1/e44666", url="http://www.ncbi.nlm.nih.gov/pubmed/36943366" } @Article{info:doi/10.2196/41787, author="Coats, Heather and Shive, Nadia and Adrian, Bonnie and Boyd, D. Andrew and Doorenbos, Z. Ardith and Schmiege, J. 
Sarah", title="An Electronically Delivered Person-Centered Narrative Intervention for Persons Receiving Palliative Care: Protocol for a Mixed Methods Study", journal="JMIR Res Protoc", year="2023", month="Mar", day="21", volume="12", pages="e41787", keywords="electronic health record", keywords="mixed methods", keywords="narrative", keywords="person-centered care", keywords="palliative care", abstract="Background: In the health care setting, electronic health records (EHRs) are one of the primary modes of communication about patients, but most of this information is clinician centered. There is a need to consider the patient as a person and integrate their perspectives into their health record. Incorporating a patient's narrative into the EHR provides an opportunity to communicate patients' cultural values and beliefs to the health care team and has the potential to improve patient-clinician communication. This paper describes the protocol to evaluate the integration of an adapted person-centered narrative intervention (PCNI). This adaptation builds on our previous research centered on the implementation of PCNIs. The adaptation for this study includes an all-electronic delivery of a PCNI in an outpatient clinical setting. Objective: This research protocol aims to evaluate the feasibility, usability, and effects of the all-electronic delivery of a PCNI in an outpatient setting on patient-reported outcomes. The first objective of this study is to identify the barriers and facilitators of an internet-based--delivered PCNI from the perspectives of persons living with serious illness and their clinicians. The second objective is to conduct acceptability, usability, and intervention fidelity testing to determine the essential requirements for the EHR integration of an internet-based--delivered PCNI. The third objective is to test the feasibility of the PCNI in an outpatient clinic setting. Methods: Using a mixed method design, this single-arm intervention feasibility study was delivered over approximately 3 to 4 weeks. Patient participant recruitment was conducted via screening outpatient palliative care clinic schedules weekly for upcoming new palliative care patient visits and then emailing potential patient participants to notify them about the study. The PCNI was delivered via email and Zoom app. Patient-reported outcome measures were completed by patient participants at baseline, 24 to 48 hours after PCNI, and after the initial palliative care clinic visit, approximately 1 month after baseline. Inclusion criteria included having the capacity to give consent and having an upcoming initial outpatient palliative care clinic visit. Results: The recruitment of participants began in April 2021. A total of 189 potential patient participants were approached via email, and 20 patient participants were enrolled, with data having been collected from May 2021 to September 2022. A total of 7 clinician participants were enrolled, with a total of 3 clinician exit interviews and 1 focus group (n=5), which was conducted in October 2022. Data analysis is expected to be completed by the end of June 2023. Conclusions: The findings from this study, combined with those from other PCNI studies conducted in acute care settings, have the potential to influence clinical practices and policies and provide innovative avenues to integrate more person-centered care delivery. 
International Registered Report Identifier (IRRID): DERR1-10.2196/41787 ", doi="10.2196/41787", url="https://www.researchprotocols.org/2023/1/e41787", url="http://www.ncbi.nlm.nih.gov/pubmed/36943346" } @Article{info:doi/10.2196/42639, author="Blease, Charlotte and Kharko, Anna and Bernstein, Michael and Bradley, Colin and Houston, Muiris and Walsh, Ian and D Mandl, Kenneth", title="Computerization of the Work of General Practitioners: Mixed Methods Survey of Final-Year Medical Students in Ireland", journal="JMIR Med Educ", year="2023", month="Mar", day="20", volume="9", pages="e42639", keywords="medical students", keywords="medical education", keywords="general practitioners", keywords="artificial intelligence", keywords="machine learning", keywords="digital health", keywords="technology", keywords="tool", keywords="medical professional", keywords="biomedical", keywords="design", keywords="survey", keywords="COVID-19", abstract="Background: The potential for digital health technologies, including machine learning (ML)--enabled tools, to disrupt the medical profession is the subject of ongoing debate within biomedical informatics. Objective: We aimed to describe the opinions of final-year medical students in Ireland regarding the potential of future technology to replace or work alongside general practitioners (GPs) in performing key tasks. Methods: Between March 2019 and April 2020, using a convenience sample, we conducted a mixed methods paper-based survey of final-year medical students. The survey was administered at 4 out of 7 medical schools in Ireland across each of the 4 provinces in the country. Quantitative data were analyzed using descriptive statistics and nonparametric tests. We used thematic content analysis to investigate free-text responses. Results: In total, 43.1\% (252/585) of the final-year students at 3 medical schools responded, and data collection at 1 medical school was terminated due to disruptions associated with the COVID-19 pandemic. With regard to forecasting the potential impact of artificial intelligence (AI)/ML on primary care 25 years from now, around half (127/246, 51.6\%) of all surveyed students believed the work of GPs will change minimally or not at all. Notably, students who did not intend to enter primary care predicted that AI/ML will have a great impact on the work of GPs. Conclusions: We caution that without a firm curricular foundation on advances in AI/ML, students may rely on extreme perspectives involving self-preserving optimism biases that demote the impact of advances in technology on primary care on the one hand and technohype on the other. Ultimately, these biases may lead to negative consequences in health care. Improvements in medical education could help prepare tomorrow's doctors to optimize and lead the ethical and evidence-based implementation of AI/ML-enabled tools in medicine for enhancing the care of tomorrow's patients. ", doi="10.2196/42639", url="https://mededu.jmir.org/2023/1/e42639", url="http://www.ncbi.nlm.nih.gov/pubmed/36939809" } @Article{info:doi/10.2196/41516, author="Ferrell, Brian and Raskin, E. Sarah and Zimmerman, B. 
Emily", title="Calibrating a Transformer-Based Model's Confidence on Community-Engaged Research Studies: Decision Support Evaluation Study", journal="JMIR Form Res", year="2023", month="Mar", day="20", volume="7", pages="e41516", keywords="explainable artificial intelligence", keywords="XAI", keywords="Bidirectional Encoder Representations From Transformers", keywords="BERT", keywords="transformer-based models", keywords="text classification", keywords="community engagement", keywords="community-engaged research", keywords="deep learning", keywords="decision support", keywords="trust", keywords="confidence", abstract="Background: Deep learning offers great benefits in classification tasks such as medical imaging diagnostics or stock trading, especially when compared with human-level performances, and can be a viable option for classifying distinct levels within community-engaged research (CEnR). CEnR is a collaborative approach between academics and community partners with the aim of conducting research that is relevant to community needs while incorporating diverse forms of expertise. In the field of deep learning and artificial intelligence (AI), training multiple models to obtain the highest validation accuracy is common practice; however, it can overfit toward that specific data set and not generalize well to a real-world population, which creates issues of bias and potentially dangerous algorithmic decisions. Consequently, if we plan on automating human decision-making, there is a need for creating techniques and exhaustive evaluative processes for these powerful unexplainable models to ensure that we do not incorporate and blindly trust poor AI models to make real-world decisions. Objective: We aimed to conduct an evaluation study to see whether our most accurate transformer-based models derived from previous studies could emulate our own classification spectrum for tracking CEnR studies as well as whether the use of calibrated confidence scores was meaningful. Methods: We compared the results from 3 domain experts, who classified a sample of 45 studies derived from our university's institutional review board database, with those from 3 previously trained transformer-based models, as well as investigated whether calibrated confidence scores can be a viable technique for using AI in a support role for complex decision-making systems. Results: Our findings reveal that certain models exhibit an overestimation of their performance through high confidence scores, despite not achieving the highest validation accuracy. Conclusions: Future studies should be conducted with larger sample sizes to generalize the results more effectively. Although our study addresses the concerns of bias and overfitting in deep learning models, there is a need to further explore methods that allow domain experts to trust our models more. The use of a calibrated confidence score can be a misleading metric when determining our AI model's level of competency. 
", doi="10.2196/41516", url="https://formative.jmir.org/2023/1/e41516", url="http://www.ncbi.nlm.nih.gov/pubmed/36939830" } @Article{info:doi/10.2196/43725, author="Han, Yuting and Zhu, Xia and Hu, Yizhen and Yu, Canqing and Guo, Yu and Hang, Dong and Pang, Yuanjie and Pei, Pei and Ma, Hongxia and Sun, Dianjianyi and Yang, Ling and Chen, Yiping and Du, Huaidong and Yu, Min and Chen, Junshi and Chen, Zhengming and Huo, Dezheng and Jin, Guangfu and Lv, Jun and Hu, Zhibin and Shen, Hongbing and Li, Liming", title="Electronic Health Record--Based Absolute Risk Prediction Model for Esophageal Cancer in the Chinese Population: Model Development and External Validation", journal="JMIR Public Health Surveill", year="2023", month="Mar", day="15", volume="9", pages="e43725", keywords="esophageal cancer", keywords="prediction model", keywords="absolute risk", keywords="China", keywords="prospective cohort", keywords="screening", keywords="primary prevention", keywords="development", keywords="external validation", keywords="electronic health record", abstract="Background: China has the largest burden of esophageal cancer (EC). Prediction models can be used to identify high-risk individuals for intensive lifestyle interventions and endoscopy screening. However, the current prediction models are limited by small sample size and a lack of external validation, and none of them can be embedded into the booming electronic health records (EHRs) in China. Objective: This study aims to develop and validate absolute risk prediction models for EC in the Chinese population. In particular, we assessed whether models that contain only EHR-available predictors performed well. Methods: A prospective cohort recruiting 510,145 participants free of cancer from both high EC-risk and low EC-risk areas in China was used to develop EC models. Another prospective cohort of 18,441 participants was used for validation. A flexible parametric model was used to develop a 10-year absolute risk model by considering the competing risks (full model). The full model was then abbreviated by keeping only EHR-available predictors. We internally and externally validated the models by using the area under the receiver operating characteristic curve (AUC) and calibration plots and compared them based on classification measures. Results: During a median of 11.1 years of follow-up, we observed 2550 EC incident cases. The models consisted of age, sex, regional EC-risk level (high-risk areas: 2 study regions; low-risk areas: 8 regions), education, family history of cancer (simple model), smoking, alcohol use, BMI (intermediate model), physical activity, hot tea consumption, and fresh fruit consumption (full model). The performance was only slightly compromised after the abbreviation. The simple and intermediate models showed good calibration and excellent discriminating ability with AUCs (95\% CIs) of 0.822 (0.783-0.861) and 0.830 (0.792-0.867) in the external validation and 0.871 (0.858-0.884) and 0.879 (0.867-0.892) in the internal validation, respectively. Conclusions: Three nested 10-year EC absolute risk prediction models for Chinese adults aged 30-79 years were developed and validated, which may be particularly useful for populations in low EC-risk areas. Even the simple model with only 5 predictors available from EHRs had excellent discrimination and good calibration, indicating its potential for broader use in tailored EC prevention. 
The simple and intermediate models have the potential to be widely used for both primary and secondary prevention of EC. ", doi="10.2196/43725", url="https://publichealth.jmir.org/2023/1/e43725", url="http://www.ncbi.nlm.nih.gov/pubmed/36781293" } @Article{info:doi/10.2196/42890, author="Molokwu, Jennifer and Mendez, Melissa and Bracamontes, Christina", title="The Effect of Clinical Decision Prompts in Improving Human Papillomavirus Vaccination Rates in a Multispecialty Practice in a Predominantly Hispanic Population: Quasi-Experimental Study", journal="JMIR Cancer", year="2023", month="Mar", day="15", volume="9", pages="e42890", keywords="HPV", keywords="HPV vaccination", keywords="electronic clinical decision support", keywords="EMR prompt", keywords="clinical", keywords="decision", keywords="vaccine", keywords="pediatrics", keywords="age", keywords="ethnicity", keywords="race", keywords="language", keywords="immunization", abstract="Background: The human papillomavirus (HPV) is implicated in the causal pathway of cancers of the vulva, vagina, penis, cervix, anus, and oropharyngeal region. It is the most common sexually transmitted infection in the United States. Despite the documented safety and effectiveness of the HPV vaccine, vaccination rates lag behind those of other vaccines given at the same age. Objective: Provider recommendation is identified as a robust predictor of HPV vaccine uptake, and physician-prompting is shown to increase the provision of preventive care services in general. Theoretically, providing reminders to providers should increase opportunities for HPV vaccine recommendations and therefore affect vaccination rates. The objective of our study was to assess the effectiveness of an electronic medical record (EMR) prompt in improving HPV vaccination rates in an academic clinic setting caring for a predominantly Hispanic border population. Methods: We used a quasi-experimental design with a retrospective chart audit to evaluate the effect of a clinical decision prompt (CDP) on improving HPV immunization rates in different specialty settings. We introduced an EMR prompt to remind providers to recommend the HPV vaccine when seeing appropriate patients in an obstetrics and gynecology (OBGYN), pediatrics (PD), and family medicine (FM) clinic in a large multispecialty academic group located along the Texas-Mexico border. We assessed HPV vaccination rates in all the departments involved before and after introducing the prompts. Participants included male and female patients between the ages of 9 and 26 years, presenting at the clinics between January 2014 and December 2015. Results: We reviewed over 2800 charts in all 3 clinics. After adjusting for age, ethnicity, race, type of insurance, preferred language, and clinic, the odds of immunization were 92\% higher (P<.001) in patients after the implementation of the EMR prompt. In addition, there was a statistically significant increase in overall HPV vaccination completion rates after implementing the CDP (31.96\% vs 21.22\%; P<.001). OBGYN saw the largest improvement in vaccination completion rates, with rates at follow-up 66.02\% higher than baseline rates (P=.04). PD and FM had somewhat similar but no less impressive improvements (57.7\% and 58.36\%; P<.001). Conclusions: Implementing an EMR CDP improved our overall odds of HPV vaccination completion by 92\%. 
We theorize that the decision prompts remind health care providers to discuss or recommend the HPV vaccination during clinical service delivery. CDPs in the EMR help increase HPV vaccination rates in multiple specialties and are a low-cost intervention for improving vaccination rates. ", doi="10.2196/42890", url="https://cancer.jmir.org/2023/1/e42890", url="http://www.ncbi.nlm.nih.gov/pubmed/36920453" } @Article{info:doi/10.2196/42435, author="Wang, Xue and Yang, Fengchun and Zhu, Mingwei and Cui, Hongyuan and Wei, Junmin and Li, Jiao and Chen, Wei", title="Development and Assessment of Assisted Diagnosis Models Using Machine Learning for Identifying Elderly Patients With Malnutrition: Cohort Study", journal="J Med Internet Res", year="2023", month="Mar", day="14", volume="25", pages="e42435", keywords="disease-related malnutrition", keywords="global leadership initiative on malnutrition", keywords="GLIM", keywords="older inpatients", keywords="machine learning", keywords="Shapley additive explanation", keywords="SHAP", keywords="malnutrition", keywords="nutrition", keywords="older adult", keywords="elder", keywords="XGBoost", keywords="model", keywords="diagnose", keywords="diagnosis", keywords="diagnostic", keywords="visualization", keywords="risk", keywords="algorithm", abstract="Background: Older patients are at an increased risk of malnutrition due to many factors related to poor clinical outcomes. Objective: This study aims to develop an assisted diagnosis model using machine learning (ML) for identifying older patients with malnutrition and providing a focus for individualized treatment. Methods: We reanalyzed a multicenter, observational cohort study including 2660 older patients. Baseline malnutrition was defined using the Global Leadership Initiative on Malnutrition (GLIM) criteria, and the study population was randomly divided into a derivation group (2128/2660, 80\%) and a validation group (532/2660, 20\%). We applied 5 ML algorithms and further explored the relationship between features and the risk of malnutrition by using the Shapley additive explanations visualization method. Results: The proposed ML models were capable of identifying older patients with malnutrition. In the external validation cohort, the top 3 models by the area under the receiver operating characteristic curve were the light gradient boosting machine (92.1\%), extreme gradient boosting (91.9\%), and the random forest model (91.5\%). Additionally, the analysis of feature importance revealed that BMI, weight loss, and calf circumference were the strongest predictors of GLIM-defined malnutrition. A BMI below 21 kg/m2 was associated with a higher risk of GLIM-defined malnutrition in older people. Conclusions: We developed ML models for assisting the diagnosis of malnutrition based on the GLIM criteria. The cutoff values of laboratory tests generated by Shapley additive explanations could provide references for the identification of malnutrition. 
Trial Registration: Chinese Clinical Trial Registry ChiCTR-EPC-14005253; https://www.chictr.org.cn/showproj.aspx?proj=9542 ", doi="10.2196/42435", url="https://www.jmir.org/2023/1/e42435", url="http://www.ncbi.nlm.nih.gov/pubmed/36917167" } @Article{info:doi/10.2196/41100, author="Karapetian, Karina and Jeon, Min Soo and Kwon, Jin-Won and Suh, Young-Kyoon", title="Supervised Relation Extraction Between Suicide-Related Entities and Drugs: Development and Usability Study of an Annotated PubMed Corpus", journal="J Med Internet Res", year="2023", month="Mar", day="8", volume="25", pages="e41100", keywords="suicide", keywords="adverse drug events", keywords="information extraction", keywords="relation classification", keywords="bidirectional encoder representations from transformers", keywords="pharmacovigilance", keywords="natural language processing", keywords="PubMed", keywords="corpus", keywords="language model", abstract="Background: Drug-induced suicide has been debated as a crucial issue in both clinical and public health research. Published research articles contain valuable data on the drugs associated with suicidal adverse events. An automated process that extracts such information and rapidly detects drugs related to suicide risk is essential but has not been well established. Moreover, few data sets are available for training and validating classification models on drug-induced suicide. Objective: This study aimed to build a corpus of drug-suicide relations containing annotated entities for drugs, suicidal adverse events, and their relations. To confirm the effectiveness of the drug-suicide relation corpus, we evaluated the performance of a relation classification model using the corpus in conjunction with various embeddings. Methods: We collected the abstracts and titles of research articles associated with drugs and suicide from PubMed and manually annotated them along with their relations at the sentence level (adverse drug events, treatment, suicide means, or miscellaneous). To reduce the manual annotation effort, we preliminarily selected sentences with a pretrained zero-shot classifier or sentences containing only drug and suicide keywords. We trained a relation classification model using various Bidirectional Encoder Representations from Transformer embeddings with the proposed corpus. We then compared the performances of the model with different Bidirectional Encoder Representations from Transformer--based embeddings and selected the most suitable embedding for our corpus. Results: Our corpus comprised 11,894 sentences extracted from the titles and abstracts of the PubMed research articles. Each sentence was annotated with drug and suicide entities and the relationship between these 2 entities (adverse drug events, treatment, means, and miscellaneous). All of the tested relation classification models that were fine-tuned on the corpus accurately detected sentences of suicidal adverse events regardless of their pretrained type and data set properties. Conclusions: To our knowledge, this is the first and most extensive corpus of drug-suicide relations. 
", doi="10.2196/41100", url="https://www.jmir.org/2023/1/e41100", url="http://www.ncbi.nlm.nih.gov/pubmed/36884281" } @Article{info:doi/10.2196/44547, author="Frid, Santiago and Pastor Duran, Xavier and Bracons Cuc{\'o}, Guillem and Pedrera-Jim{\'e}nez, Miguel and Serrano-Balazote, Pablo and Mu{\~n}oz Carrero, Adolfo and Lozano-Rub{\'i}, Raimundo", title="An Ontology-Based Approach for Consolidating Patient Data Standardized With European Norm/International Organization for Standardization 13606 (EN/ISO 13606) Into Joint Observational Medical Outcomes Partnership (OMOP) Repositories: Description of a Methodology", journal="JMIR Med Inform", year="2023", month="Mar", day="8", volume="11", pages="e44547", keywords="health information interoperability", keywords="health research", keywords="health information standards", keywords="dual model", keywords="secondary use of health data", keywords="Observational Medical Outcomes Partnership Common Data Model", keywords="European Norm/International Organization for Standardization 13606", keywords="health records", keywords="ontologies", keywords="clinical data", abstract="Background: To discover new knowledge from data, they must be correct and in a consistent format. OntoCR, a clinical repository developed at Hospital Cl{\'i}nic de Barcelona, uses ontologies to represent clinical knowledge and map locally defined variables to health information standards and common data models. Objective: The aim of the study is to design and implement a scalable methodology based on the dual-model paradigm and the use of ontologies to consolidate clinical data from different organizations in a standardized repository for research purposes without loss of meaning. Methods: First, the relevant clinical variables are defined, and the corresponding European Norm/International Organization for Standardization (EN/ISO) 13606 archetypes are created. Data sources are then identified, and an extract, transform, and load process is carried out. Once the final data set is obtained, the data are transformed to create EN/ISO 13606--normalized electronic health record (EHR) extracts. Afterward, ontologies that represent archetyped concepts and map them to EN/ISO 13606 and Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) standards are created and uploaded to OntoCR. Data stored in the extracts are inserted into its corresponding place in the ontology, thus obtaining instantiated patient data in the ontology-based repository. Finally, data can be extracted via SPARQL queries as OMOP CDM--compliant tables. Results: Using this methodology, EN/ISO 13606--standardized archetypes that allow for the reuse of clinical information were created, and the knowledge representation of our clinical repository by modeling and mapping ontologies was extended. Furthermore, EN/ISO 13606--compliant EHR extracts of patients (6803), episodes (13,938), diagnosis (190,878), administered medication (222,225), cumulative drug dose (222,225), prescribed medication (351,247), movements between units (47,817), clinical observations (6,736,745), laboratory observations (3,392,873), limitation of life-sustaining treatment (1,298), and procedures (19,861) were created. Since the creation of the application that inserts data from extracts into the ontologies is not yet finished, the queries were tested and the methodology was validated by importing data from a random subset of patients into the ontologies using a locally developed Prot{\'e}g{\'e} plugin (``OntoLoad''). 
In total, 10 OMOP CDM--compliant tables (``Condition\_occurrence,'' 864 records; ``Death,'' 110; ``Device\_exposure,'' 56; ``Drug\_exposure,'' 5609; ``Measurement,'' 2091; ``Observation,'' 195; ``Observation\_period,'' 897; ``Person,'' 922; ``Visit\_detail,'' 772; and ``Visit\_occurrence,'' 971) were successfully created and populated. Conclusions: This study proposes a methodology for standardizing clinical data, thus allowing its reuse without any changes in the meaning of the modeled concepts. Although this paper focuses on health research, our methodology suggests that the data be initially standardized per EN/ISO 13606 to obtain EHR extracts with a high level of granularity that can be used for any purpose. Ontologies constitute a valuable approach for knowledge representation and standardization of health information in a standard-agnostic manner. With the proposed methodology, institutions can go from local raw data to standardized, semantically interoperable EN/ISO 13606 and OMOP repositories. ", doi="10.2196/44547", url="https://medinform.jmir.org/2023/1/e44547", url="http://www.ncbi.nlm.nih.gov/pubmed/36884279" } @Article{info:doi/10.2196/43097, author="Chen, Jinying and Cutrona, L. Sarah and Dharod, Ajay and Bunch, C. Stephanie and Foley, L. Kristie and Ostasiewski, Brian and Hale, R. Erica and Bridges, Aaron and Moses, Adam and Donny, C. Eric and Sutfin, L. Erin and Houston, K. Thomas and ", title="Monitoring the Implementation of Tobacco Cessation Support Tools: Using Novel Electronic Health Record Activity Metrics", journal="JMIR Med Inform", year="2023", month="Mar", day="2", volume="11", pages="e43097", keywords="medical informatics", keywords="electronic health records", keywords="EHR metrics", keywords="alerts", keywords="alert burden", keywords="tobacco cessation", keywords="monitoring", keywords="clinical decision support", keywords="implementation science", keywords="smoking cessation", keywords="decision tool", abstract="Background: Clinical decision support (CDS) tools in electronic health records (EHRs) are often used as core strategies to support quality improvement programs in the clinical setting. Monitoring the impact (intended and unintended) of these tools is crucial for program evaluation and adaptation. Existing approaches for monitoring typically rely on health care providers' self-reports or direct observation of clinical workflows, which require substantial data collection efforts and are prone to reporting bias. Objective: This study aims to develop a novel monitoring method leveraging EHR activity data and demonstrate its use in monitoring the CDS tools implemented by a tobacco cessation program sponsored by the National Cancer Institute's Cancer Center Cessation Initiative (C3I). Methods: We developed EHR-based metrics to monitor the implementation of two CDS tools: (1) a screening alert reminding clinic staff to complete the smoking assessment and (2) a support alert prompting health care providers to discuss support and treatment options, including referral to a cessation clinic. Using EHR activity data, we measured the completion (encounter-level alert completion rate) and burden (the number of times an alert was fired before completion and time spent handling the alert) of the CDS tools. We report metrics tracked for 12 months post implementation, comparing 7 cancer clinics (2 clinics implemented the screening alert and 5 implemented both alerts) within a C3I center, and identify areas to improve alert design and adoption. 
Results: The screening alert fired in 5121 encounters during the 12 months post implementation. The encounter-level alert completion rate (clinic staff acknowledged completion of screening in EHR: 0.55; clinic staff completed EHR documentation of screening results: 0.32) remained stable over time but varied considerably across clinics. The support alert fired in 1074 encounters during the 12 months. Providers acted upon (ie, did not postpone) the support alert in 87.3\% (n=938) of encounters, identified a patient ready to quit in 12\% (n=129) of encounters, and ordered a referral to the cessation clinic in 2\% (n=22) of encounters. With respect to alert burden, on average, both alerts fired over 2 times (screening alert: 2.7; support alert: 2.1) before completion; per encounter, the time spent postponing the screening alert was similar to the time spent completing it (52 vs 53 seconds), whereas the time spent postponing the support alert exceeded the time spent completing it (67 vs 50 seconds). These findings inform four areas where the alert design and use can be improved: (1) improving alert adoption and completion through local adaptation, (2) improving support alert efficacy through additional strategies, including training in provider-patient communication, (3) improving the accuracy of tracking for alert completion, and (4) balancing alert efficacy with alert burden. Conclusions: EHR activity metrics were able to monitor the success and burden of tobacco cessation alerts, allowing for a more nuanced understanding of potential trade-offs associated with alert implementation. These metrics can be used to guide implementation adaptation and are scalable across diverse settings. ", doi="10.2196/43097", url="https://medinform.jmir.org/2023/1/e43097", url="http://www.ncbi.nlm.nih.gov/pubmed/36862466" } @Article{info:doi/10.2196/43005, author="Slezak, Jeff and Sacks, David and Chiu, Vicki and Avila, Chantal and Khadka, Nehaa and Chen, Jiu-Chiuan and Wu, Jun and Getahun, Darios", title="Identification of Postpartum Depression in Electronic Health Records: Validation in a Large Integrated Health Care System", journal="JMIR Med Inform", year="2023", month="Mar", day="1", volume="11", pages="e43005", keywords="validation", keywords="postpartum depression", keywords="electronic health records", keywords="pregnancy", keywords="health care system", keywords="diagnosis codes", keywords="pharmacy records", keywords="health data", keywords="data collection", keywords="implementation", keywords="eHealth record", keywords="depression", keywords="mental well-being", keywords="women's health", abstract="Background: The accuracy of electronic health records (EHRs) for identifying postpartum depression (PPD) is not well studied. Objective: This study aims to evaluate the accuracy of PPD reporting in EHRs and compare the quality of PPD data collected before and after the implementation of the International Classification of Diseases, Tenth Revision (ICD-10) coding in the health care system. Methods: Information on PPD was extracted from a random sample of 400 eligible Kaiser Permanente Southern California patients' EHRs. Clinical diagnosis codes and pharmacy records were abstracted for two time periods: January 1, 2012, through December 31, 2014 (International Classification of Diseases, Ninth Revision [ICD-9] period), and January 1, 2017, through December 31, 2019 (ICD-10 period). 
Manual chart reviews of clinical records for PPD were considered the gold standard and were compared with corresponding electronically coded diagnosis and pharmacy records using sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The kappa statistic was calculated to measure agreement. Results: Overall agreement between the identification of depression using combined diagnosis codes and pharmacy records and that of medical record review was strong ($\kappa$=0.85, sensitivity 98.3\%, specificity 83.3\%, PPV 93.7\%, NPV 95.0\%). Using only diagnosis codes resulted in much lower sensitivity (65.4\%) and NPV (50.5\%) but good specificity (88.6\%) and PPV (93.5\%). When agreement between chart review and electronic coding was examined separately for diagnosis codes and pharmacy records, sensitivity, specificity, and NPV were higher with prescription use records than with clinical diagnosis coding for PPD (96.5\% versus 72.0\%, 96.5\% versus 65.0\%, and 96.5\% versus 65.0\%, respectively). There was no notable difference in agreement between ICD-9 (overall $\kappa$=0.86) and ICD-10 (overall $\kappa$=0.83) coding periods. Conclusions: PPD is not reliably captured in the clinical diagnosis coding of EHRs. The accuracy of PPD identification can be improved by supplementing clinical diagnosis with pharmacy use records. The completeness of PPD data remained unchanged after the implementation of the ICD-10 diagnosis coding. ", doi="10.2196/43005", url="https://medinform.jmir.org/2023/1/e43005", url="http://www.ncbi.nlm.nih.gov/pubmed/36857123" } @Article{info:doi/10.2196/45529, author="Marri, Shankar Shiva and Inamadar, C. Arun and Janagond, B. Ajit and Albadri, Warood", title="Analyzing the Predictability of an Artificial Intelligence App (Tibot) in the Diagnosis of Dermatological Conditions: A Cross-sectional Study", journal="JMIR Dermatol", year="2023", month="Mar", day="1", volume="6", pages="e45529", keywords="artificial intelligence", keywords="AI-assisted diagnosis", keywords="machine learning", keywords="neural network", keywords="deep learning", keywords="dermatology", keywords="mobile", keywords="application", keywords="app", abstract="Background: Artificial intelligence (AI) aims to create programs that reproduce human cognition and processes involved in interpreting complex data. Dermatology relies on morphological features and is ideal for applying AI image recognition for assisted diagnosis. Tibot is an AI app that analyzes skin conditions and works on the principle of a convolutional neural network. Appropriate research analyzing the accuracy of such apps is necessary. Objective: This study aims to analyze the predictability of the Tibot AI app in the identification of dermatological diseases as compared to a dermatologist. Methods: This is a cross-sectional study. After taking informed consent, photographs of lesions of patients with different skin conditions were uploaded to the app. For each condition, the AI predicted three diagnoses based on probability, and these were compared with the diagnosis made by a dermatologist. The ability of the AI app to predict the actual diagnosis in the top one and top three anticipated diagnoses (prediction accuracy) was used to evaluate the app's effectiveness. Sensitivity, specificity, and positive predictive value were also used to assess the app's performance. The chi-square test was used to compare categorical variables. P<.05 was considered statistically significant. 
Results: A total of 600 patients were included. Clinical conditions included alopecia, acne, eczema, immunological disorders, pigmentary disorders, psoriasis, infestation, tumors, and infections. In the anticipated top three diagnoses, the app's mean prediction accuracy was 96.1\% (95\% CI 94.3\%-97.5\%), while for the exact diagnosis, it was 80.6\% (95\% CI 77.2\%-83.7\%). The prediction accuracy (top one) for alopecia, acne, pigmentary disorders, and fungal infections was 97.7\%, 91.7\%, 88.5\%, and 82.9\%, respectively. Prediction accuracy (top three) for alopecia, eczema, and tumors was 100\%. The sensitivity and specificity of the app were 97\% (95\% CI 95\%-98\%) and 98\% (95\% CI 98\%-99\%), respectively. There is a statistically significant association between clinical and AI-predicted diagnoses in all conditions (P<.001). Conclusions: The AI app has shown promising results in diagnosing various dermatological conditions, and there is great potential for practical applicability. ", doi="10.2196/45529", url="https://derma.jmir.org/2023/1/e45529", url="http://www.ncbi.nlm.nih.gov/pubmed/37632978" } @Article{info:doi/10.2196/40463, author="Aune, Anders and Vartdal, Gunnar and Jimenez Diaz, Gabriela and Gierman, Marijn Lobke and Bergseng, H{\aa}kon and Darj, Elisabeth", title="Iterative Development, Validation, and Certification of a Smartphone System to Assess Neonatal Jaundice: Development and Usability Study", journal="JMIR Pediatr Parent", year="2023", month="Feb", day="28", volume="6", pages="e40463", keywords="neonatal jaundice", keywords="neonatal hyperbilirubinemia", keywords="newborns", keywords="mobile app", keywords="design", keywords="validation", keywords="regulatory processes", keywords="mobile health", keywords="mHealth", keywords="mobile phone", abstract="Background: Medical device development is an area facing multiple challenges, resulting in a high number of products not reaching the clinical setting. Neonatal hyperbilirubinemia, manifesting as neonatal jaundice (NNJ), is an important cause of newborn morbidity and mortality. It is important to identify infants with neonatal hyperbilirubinemia at an early stage, but currently there is a lack of tools that are both accurate and affordable. Objective: This study aimed to develop a novel system to assess the presence of NNJ. The device should provide accurate results, be approved as a medical device, be easy to use, and be produced at a price that is affordable even in low-resource settings. Methods: We used an iterative approach to develop a smartphone-based system to detect the presence of NNJ. We performed technical development, followed by clinical and usability testing in parallel, after which we initiated the regulatory processes for certification. We updated the system in each iteration, and the final version underwent a clinical validation study on healthy term newborns aged 1 to 15 days before all documentation was submitted for conformity assessment to obtain Conformit{\'e} Europ{\'e}enne (CE) certification. We developed a system that incorporates a smartphone app, a color calibration card, and a server. Results: Three iterations of the smartphone-based system were developed; the final version was approved as a medical device after complying with Medical Device Regulation guidelines. A total of 201 infants were included in the validation study. Bilirubin values using the system highly correlated with total serum or plasma bilirubin levels (r=0.84). 
The system had a high sensitivity (94\%) to detect severe jaundice, defined as total serum or plasma bilirubin >250 {\textmu}mol/L, and maintained a high specificity (71\%). Conclusions: Our smartphone-based system has a high potential as a tool for identifying NNJ. An iterative approach to product development, conducted by working on different tasks in parallel, resulted in a functional and successful product. By adhering to the requirements for regulatory approval from the beginning of the project, we were able to develop a market-ready mobile health solution. ", doi="10.2196/40463", url="https://pediatrics.jmir.org/2023/1/e40463", url="http://www.ncbi.nlm.nih.gov/pubmed/36853753" } @Article{info:doi/10.2196/42324, author="Phumkuea, Thanakorn and Wongsirichot, Thakerng and Damkliang, Kasikrit and Navasakulpong, Asma", title="Classifying COVID-19 Patients From Chest X-ray Images Using Hybrid Machine Learning Techniques: Development and Evaluation", journal="JMIR Form Res", year="2023", month="Feb", day="28", volume="7", pages="e42324", keywords="COVID-19", keywords="machine learning", keywords="medical informatics", keywords="coronavirus", keywords="diagnosis", keywords="model", keywords="detection", keywords="healthy", keywords="unhealthy", keywords="public", keywords="usage", keywords="data", keywords="database", keywords="accuracy", keywords="development", keywords="x-ray", keywords="imaging", abstract="Background: The COVID-19 pandemic has raised global concern, with moderate to severe cases displaying lung inflammation and respiratory failure. Chest x-ray (CXR) imaging is crucial for diagnosis and is usually interpreted by experienced medical specialists. Machine learning has been applied with acceptable accuracy, but computational efficiency has received less attention. Objective: We introduced a novel hybrid machine learning model to accurately classify COVID-19, non-COVID-19, and healthy patients from CXR images with reduced computational time and promising results. Our proposed model was thoroughly evaluated and compared with existing models. Methods: A retrospective study was conducted to analyze 5 public data sets containing 4200 CXR images using machine learning techniques including decision trees, support vector machines, and neural networks. The images were preprocessed to undergo image segmentation, enhancement, and feature extraction. The best performing machine learning technique was selected and combined into a multilayer hybrid classification model for COVID-19 (MLHC-COVID-19). The model consisted of 2 layers. The first layer was designed to differentiate healthy individuals from infected patients, while the second layer aimed to classify COVID-19 and non-COVID-19 patients. Results: The MLHC-COVID-19 model was trained and evaluated on unseen COVID-19 CXR images, achieving reasonably high accuracy and F measures of 0.962 and 0.962, respectively. These results show the effectiveness of the MLHC-COVID-19 in classifying COVID-19 CXR images, with improved accuracy and a reduction in interpretation time. The model was also embedded into a web-based MLHC-COVID-19 computer-aided diagnosis system, which was made publicly available. Conclusions: The study found that the MLHC-COVID-19 model effectively differentiated CXR images of COVID-19 patients from those of healthy and non-COVID-19 individuals. It outperformed other state-of-the-art deep learning techniques and showed promising results. 
These results suggest that the MLHC-COVID-19 model could have been instrumental in early detection and diagnosis of COVID-19 patients, thus playing a significant role in controlling and managing the pandemic. Although the pandemic has slowed down, this model can be adapted and utilized for future similar situations. The model was also integrated into a publicly accessible web-based computer-aided diagnosis system. ", doi="10.2196/42324", url="https://formative.jmir.org/2023/1/e42324", url="http://www.ncbi.nlm.nih.gov/pubmed/36780315" } @Article{info:doi/10.2196/39077, author="Frei, Johann and Kramer, Frank", title="German Medical Named Entity Recognition Model and Data Set Creation Using Machine Translation and Word Alignment: Algorithm Development and Validation", journal="JMIR Form Res", year="2023", month="Feb", day="28", volume="7", pages="e39077", keywords="natural language processing", keywords="named entity recognition", keywords="information extraction", abstract="Background: Data mining in the field of medical data analysis often needs to rely solely on the processing of unstructured data to retrieve relevant data. For German natural language processing, few open medical neural named entity recognition (NER) models have been published before this work. A major issue can be attributed to the lack of German training data. Objective: We developed a synthetic data set and a novel German medical NER model for public access to demonstrate the feasibility of our approach. In order to bypass legal restrictions due to potential data leaks through model analysis, we did not make use of internal, proprietary data sets, which is a frequent veto factor for data set publication. Methods: The underlying German data set was retrieved by translation and word alignment of a public English data set. The data set served as a foundation for model training and evaluation. For demonstration purposes, our NER model follows a simple network architecture that is designed for low computational requirements. Results: The obtained data set consisted of 8599 sentences including 30,233 annotations. The model achieved a class frequency--averaged F1 score of 0.82 on the test set after training across 7 different NER types. Artifacts in the synthesized data set with regard to translation and alignment induced by the proposed method were exposed. The annotation performance was evaluated on an external data set and measured in comparison with an existing baseline model that has been trained on a dedicated German data set in a traditional fashion. We discussed the drop in annotation performance on an external data set for our simple NER model. Our model is publicly available. Conclusions: We demonstrated the feasibility of obtaining a data set and training a German medical NER model by the exclusive use of public training data through our suggested method. The discussion on the limitations of our approach includes ways to further mitigate remaining problems in future work. 
", doi="10.2196/39077", url="https://formative.jmir.org/2023/1/e39077", url="http://www.ncbi.nlm.nih.gov/pubmed/36853741" } @Article{info:doi/10.2196/42181, author="Schallmoser, Simon and Zueger, Thomas and Kraus, Mathias and Saar-Tsechansky, Maytal and Stettler, Christoph and Feuerriegel, Stefan", title="Machine Learning for Predicting Micro- and Macrovascular Complications in Individuals With Prediabetes or Diabetes: Retrospective Cohort Study", journal="J Med Internet Res", year="2023", month="Feb", day="27", volume="25", pages="e42181", keywords="diabetes", keywords="prediabetes", keywords="machine learning", keywords="microvascular complications", keywords="macrovascular complications", abstract="Background: Micro- and macrovascular complications are a major burden for individuals with diabetes and can already arise in a prediabetic state. To allocate effective treatments and to possibly prevent these complications, identification of those at risk is essential. Objective: This study aimed to build machine learning (ML) models that predict the risk of developing a micro- or macrovascular complication in individuals with prediabetes or diabetes. Methods: In this study, we used electronic health records from Israel that contain information about demographics, biomarkers, medications, and disease codes; span from 2003 to 2013; and were queried to identify individuals with prediabetes or diabetes in 2008. Subsequently, we aimed to predict which of these individuals developed a micro- or macrovascular complication within the next 5 years. We included 3 microvascular complications: retinopathy, nephropathy, and neuropathy. In addition, we considered 3 macrovascular complications: peripheral vascular disease (PVD), cerebrovascular disease (CeVD), and cardiovascular disease (CVD). Complications were identified via disease codes, and, for nephropathy, the estimated glomerular filtration rate and albuminuria were considered additionally. Inclusion criteria were complete information on age and sex and on disease codes (or measurements of estimated glomerular filtration rate and albuminuria for nephropathy) until 2013 to account for patient dropout. Exclusion criteria for predicting a complication were diagnosis of this specific complication before or in 2008. In total, 105 predictors from demographics, biomarkers, medications, and disease codes were used to build the ML models. We compared 2 ML models: logistic regression and gradient-boosted decision trees (GBDTs). To explain the predictions of the GBDTs, we calculated Shapley additive explanations values. Results: Overall, 13,904 and 4259 individuals with prediabetes and diabetes, respectively, were identified in our underlying data set. For individuals with prediabetes, the areas under the receiver operating characteristic curve for logistic regression and GBDTs were, respectively, 0.657 and 0.681 (retinopathy), 0.807 and 0.815 (nephropathy), 0.727 and 0.706 (neuropathy), 0.730 and 0.727 (PVD), 0.687 and 0.693 (CeVD), and 0.707 and 0.705 (CVD); for individuals with diabetes, the areas under the receiver operating characteristic curve were, respectively, 0.673 and 0.726 (retinopathy), 0.763 and 0.775 (nephropathy), 0.745 and 0.771 (neuropathy), 0.698 and 0.715 (PVD), 0.651 and 0.646 (CeVD), and 0.686 and 0.680 (CVD). Overall, the prediction performance is comparable for logistic regression and GBDTs. 
The Shapley additive explanations values showed that increased levels of blood glucose, glycated hemoglobin, and serum creatinine are risk factors for microvascular complications. Age and hypertension were associated with an elevated risk for macrovascular complications. Conclusions: Our ML models allow for an identification of individuals with prediabetes or diabetes who are at increased risk of developing micro- or macrovascular complications. The prediction performance varied across complications and target populations but was in an acceptable range for most prediction tasks. ", doi="10.2196/42181", url="https://www.jmir.org/2023/1/e42181", url="http://www.ncbi.nlm.nih.gov/pubmed/36848190" } @Article{info:doi/10.2196/42881, author="Parpia, Camilla and Moore, Clara and Beatty, Madison and Miranda, Susan and Adams, Sherri and Stinson, Jennifer and Desai, Arti and Bartlett, Leah and Culbert, Erin and Cohen, Eyal and Orkin, Julia", title="Evaluation of a Secure Messaging System in the Care of Children With Medical Complexity: Mixed Methods Study", journal="JMIR Form Res", year="2023", month="Feb", day="23", volume="7", pages="e42881", keywords="secure messaging", keywords="children with medical complexity", keywords="patient-physician relationship", keywords="care coordination", keywords="partnership", keywords="email communication", keywords="online information", keywords="children", keywords="pediatrics", keywords="caregiver", keywords="information sharing", keywords="electronic medical record", abstract="Background: The Connecting2gether (C2) platform is a web and mobile--based information-sharing tool that aims to improve care for children with medical complexity and their families. A key feature of C2 is secure messaging, which enables parental caregivers (PCs) to communicate with their child's care team members (CTMs) in a timely manner. Objective: The objectives of this study were to (1) evaluate the use of a secure messaging system, (2) examine and compare the content of messages to email and phone calls, and (3) explore PCs' and CTMs' perceptions and experiences using secure messaging as a method of communication. Methods: This is a substudy of a larger feasibility evaluation of the C2 platform. PCs of children with medical complexity were recruited from a tertiary-level complex care program to use the C2 platform for 6 months. PCs could invite CTMs involved in their child's care to register on the platform. Messages were extracted from C2, and phone and email data were extracted from electronic medical records. Quantitative data from the use of C2 were analyzed using descriptive statistics. Messaging content codes were iteratively developed through a review of the C2 messages and phone and email communication. Semistructured interviews were completed with PCs and CTMs. Communication and interview data were analyzed using thematic analysis. Results: A total of 36 PCs and 66 CTMs registered on the C2 platform. A total of 1861 messages were sent on C2, with PCs and nurse practitioners sending a median of 30 and 74 messages, respectively. Of all the C2 messages, 85.45\% (1257/1471) were responded to within 24 hours. Email and phone calls focused primarily on clinical concerns and medications, whereas C2 messaging focused more on parent education, proactive check-ins, and nonmedical aspects of the child's life. 
Four themes emerged from the platform user interviews related to C2 messaging: (1) connection to the care team, (2) efficient communication, (3) clinical uses of secure messaging, and (4) barriers to use. Conclusions: Overall, our study provides valuable insight into the benefits of secure messaging in the care of children with medical complexity. Secure messaging provided the opportunity for continued family teaching, proactive check-ins from health care providers, and casual conversations about family and child life, which contributed to PCs feeling an improved sense of connection with their child's health care team. Secure messaging can be a beneficial additional communication method to improve communication between PCs and their care team, reducing the associated burden of care coordination and ultimately enhancing the experience of care delivery. Future directions include the evaluation of secure messaging when integrated into electronic medical records, as this has the potential to work well with CTM workflow, reduce redundancy, and allow for new features of secure messaging. ", doi="10.2196/42881", url="https://formative.jmir.org/2023/1/e42881", url="http://www.ncbi.nlm.nih.gov/pubmed/36821356" } @Article{info:doi/10.2196/44818, author="Le, Linh Vu and Kim, Daewoo and Cho, Eunsung and Jang, Hyeryung and Reyes, Delos Roben and Kim, Hyunggug and Lee, Dongheon and Yoon, In-Young and Hong, Joonki and Kim, Jeong-Whun", title="Real-Time Detection of Sleep Apnea Based on Breathing Sounds and Prediction Reinforcement Using Home Noises: Algorithm Development and Validation", journal="J Med Internet Res", year="2023", month="Feb", day="22", volume="25", pages="e44818", keywords="sleep apnea", keywords="OSA detection", keywords="home care", keywords="artificial intelligence", keywords="deep learning", keywords="prediction model", keywords="audio", keywords="diagnostic", keywords="home technology", keywords="sound", abstract="Background: Multinight monitoring can be helpful for the diagnosis and management of obstructive sleep apnea (OSA). For this purpose, it is necessary to be able to detect OSA in real time in a noisy home environment. Sound-based OSA assessment holds great potential since it can be integrated with smartphones to provide full noncontact monitoring of OSA at home. Objective: The purpose of this study is to develop a predictive model that can detect OSA in real time, even in a home environment where various noises exist. Methods: This study included 1018 polysomnography (PSG) audio data sets, 297 smartphone audio data sets synced with PSG, and a home noise data set containing 22,500 noises to train the model to predict breathing events, such as apneas and hypopneas, based on breathing sounds that occur during sleep. The whole breathing sound of each night was divided into 30-second epochs and labeled as ``apnea,'' ``hypopnea,'' or ``no-event,'' and the home noises were used to make the model robust to a noisy home environment. The performance of the prediction model was assessed using epoch-by-epoch prediction accuracy and OSA severity classification based on the apnea-hypopnea index (AHI). Results: Epoch-by-epoch OSA event detection showed an accuracy of 86\% and a macro F1-score of 0.75 for the 3-class OSA event detection task. 
The model had an accuracy of 92\% for ``no-event,'' 84\% for ``apnea,'' and 51\% for ``hypopnea.'' Most misclassifications were made for ``hypopnea,'' with 15\% and 34\% of ``hypopnea'' being wrongly predicted as ``apnea'' and ``no-event,'' respectively. The sensitivity and specificity of the OSA severity classification (AHI$\geq$15) were 0.85 and 0.84, respectively. Conclusions: Our study presents a real-time epoch-by-epoch OSA detector that works in a variety of noisy home environments. Based on this, additional research is needed to verify the usefulness of various multinight monitoring and real-time diagnostic technologies in the home environment. ", doi="10.2196/44818", url="https://www.jmir.org/2023/1/e44818", url="http://www.ncbi.nlm.nih.gov/pubmed/36811943" } @Article{info:doi/10.2196/42364, author="Rieger, Y. Erin and Anderson, J. Irsk and Press, G. Valerie and Cui, X. Michael and Arora, M. Vineet and Williams, C. Brent and Tang, W. Joyce", title="Implementation of a Biopsychosocial History and Physical Exam Template in the Electronic Health Record: Mixed Methods Study", journal="JMIR Med Educ", year="2023", month="Feb", day="21", volume="9", pages="e42364", keywords="medical education", keywords="electronic health record", keywords="hospital medicine", keywords="psychosocial factors", keywords="chronic condition", keywords="chronic", keywords="disease", keywords="management", keywords="prevention", keywords="clinical", keywords="engagement", abstract="Background: Patients' perspectives and social contexts are critical for prevention of hospital readmissions; however, neither is routinely assessed using the traditional history and physical (H\&P) examination nor commonly documented in the electronic health record (EHR). The H\&P 360 is a revised H\&P template that integrates routine assessment of patient perspectives and goals, mental health, and an expanded social history (behavioral health, social support, living environment and resources, function). Although the H\&P 360 has shown promise in increasing psychosocial documentation in focused teaching contexts, its uptake and impact in routine clinical settings are unknown. Objective: The aim of this study was to assess the feasibility, acceptability, and impact on care planning of implementing an inpatient H\&P 360 template in the EHR for use by fourth-year medical students. Methods: A mixed methods study design was used. Fourth-year medical students on internal medicine subinternship (subI) services were given a brief training on the H\&P 360 and access to EHR-based H\&P 360 templates. Students not working in the intensive care unit (ICU) were asked to use the templates at least once per call cycle, whereas use by ICU students was elective. An EHR query was used to identify all H\&P 360 and traditional H\&P admission notes authored by non-ICU students at University of Chicago (UC) Medicine. Of these notes, all H\&P 360 notes and a sample of traditional H\&P notes were reviewed by two researchers for the presence of H\&P 360 domains and impact on patient care. A postcourse survey was administered to query all students for their perspectives on the H\&P 360. Results: Of the 13 non-ICU subIs at UC Medicine, 6 (46\%) used the H\&P 360 templates at least once, which accounted for 14\%-92\% of their authored admission notes (median 56\%). Content analysis was performed with 45 H\&P 360 notes and 54 traditional H\&P notes. 
Psychosocial documentation across all H\&P 360 domains (patient perspectives and goals, mental health, expanded social history elements) was more common in H\&P 360 compared with traditional notes. Related to impact on patient care, H\&P 360 notes more commonly identified needs (20\% H\&P 360; 9\% H\&P) and described interdisciplinary coordination (78\% H\&P 360; 41\% H\&P). Of the 11 subIs completing surveys, the vast majority (n=10, 91\%) felt the H\&P 360 helped them understand patient goals and improved the patient-provider relationship. Most students (n=8, 73\%) felt the H\&P 360 took an appropriate amount of time. Conclusions: Students who applied the H\&P 360 using templated notes in the EHR found it feasible and helpful. These students wrote notes reflecting enhanced assessment of goals and perspectives for patient-engaged care and contextual factors important to preventing rehospitalization. Reasons some students did not use the templated H\&P 360 should be examined in future studies. Uptake may be enhanced through earlier and repeated exposure and greater engagement by residents and attendings. Larger-scale implementation studies can help further elucidate the complexities of implementing nonbiomedical information within EHRs. ", doi="10.2196/42364", url="https://mededu.jmir.org/2023/1/e42364", url="http://www.ncbi.nlm.nih.gov/pubmed/36802337" } @Article{info:doi/10.2196/42767, author="Moloney, Max and MacKinnon, Madison and Bullock, Emma and Morra, Alison and Barber, David and Gupta, Samir and Queenan, A. John and Digby, C. Genevi{\`e}ve and To, Teresa and Lougheed, Diane M.", title="Integrating User Preferences for Asthma Tools and Clinical Guidelines Into Primary Care Electronic Medical Records: Mixed Methods Study", journal="JMIR Form Res", year="2023", month="Feb", day="21", volume="7", pages="e42767", keywords="asthma", keywords="electronic medical records", keywords="quality improvement", keywords="knowledge translation", keywords="qualitative analysis", abstract="Background: Asthma is a chronic respiratory disease that poses a substantial burden on individuals and the health care system. Despite published national guidelines for the diagnosis and management of asthma, considerable care gaps exist. Suboptimal adherence to asthma diagnosis and management guidelines contributes to poor patient outcomes. The integration of electronic tools (eTools) into electronic medical records (EMRs) represents a knowledge translation opportunity to support best practices. Objective: The purpose of this study was to determine how best to integrate evidence-based asthma eTools into primary care EMRs across Ontario and Canada to improve adherence to guidelines as well as measure and monitor performance. Methods: In total, 2 focus groups comprising physicians and allied health professionals who were considered experts in primary care, asthma, and EMRs were convened. One focus group also included a patient participant. Focus groups used a semistructured discussion-based format to consider the optimal methods for integrating asthma eTools into EMRs. Discussions were held on the web via Microsoft Teams (Microsoft Corp). The first focus group discussed integrating asthma indicators into EMRs using eTools, and participants completed a questionnaire evaluating the clarity, relevance, and feasibility of collecting asthma performance indicator data at the point of care. 
The second focus group addressed how to incorporate eTools for asthma into a primary care setting and included a questionnaire evaluating the perceived utility of various eTools. Focus group discussions were recorded and analyzed using thematic qualitative analysis. The responses to focus group questionnaires were assessed using descriptive quantitative analysis. Results: Qualitative analysis of the 2 focus group discussions revealed 7 key themes: designing outcome-oriented tools, gaining stakeholder trust, facilitating open lines of communication, prioritizing the end user, striving for efficiency, ensuring adaptability, and developing within existing workflows. In addition, 24 asthma indicators were rated according to clarity, relevance, feasibility, and overall usefulness. In total, 5 asthma performance indicators were identified as the most relevant. These included smoking cessation support, monitoring using objective measures, the number of emergency department visits and hospitalizations, assessment of asthma control, and presence of an asthma action plan. The eTool questionnaire responses revealed that the Asthma Action Plan Wizard and Electronic Asthma Quality of Life Questionnaire were perceived to be the most useful in primary care. Conclusions: Primary care physicians, allied health professionals, and patients consider that eTools for asthma care present a unique opportunity to improve adherence to best-practice guidelines in primary care and collect performance indicators. The strategies and themes identified in this study can be leveraged to overcome barriers associated with asthma eTool integration into primary care EMRs. The most beneficial indicators and eTools, along with the key themes identified, will guide future asthma eTool implementation. ", doi="10.2196/42767", url="https://formative.jmir.org/2023/1/e42767", url="http://www.ncbi.nlm.nih.gov/pubmed/36809175" } @Article{info:doi/10.2196/42717, author="Lee, Woo Hyun and Yang, Jun Hyun and Kim, Hyungjin and Kim, Ue-Hwan and Kim, Hyun Dong and Yoon, Ho Soon and Ham, Soo-Youn and Nam, Da Bo and Chae, Ju Kum and Lee, Dabee and Yoo, Young Jin and Bak, Hyeon So and Kim, Young Jin and Kim, Hwan Jin and Kim, Beom Ki and Jung, Im Jung and Lim, Jae-Kwang and Lee, Eun Jong and Chung, Jin Myung and Lee, Kyung Young and Kim, Seon Young and Lee, Min Sang and Kwon, Woocheol and Park, Min Chang and Kim, Yun-Hyeon and Jeong, Joo Yeon and Jin, Nam Kwang and Goo, Mo Jin", title="Deep Learning With Chest Radiographs for Making Prognoses in Patients With COVID-19: Retrospective Cohort Study", journal="J Med Internet Res", year="2023", month="Feb", day="16", volume="25", pages="e42717", keywords="COVID-19", keywords="deep learning", keywords="artificial intelligence", keywords="radiography, thoracic", keywords="prognosis", keywords="AI model", keywords="prediction model", keywords="clinical outcome", keywords="medical imaging", keywords="machine learning", abstract="Background: An artificial intelligence (AI) model using chest radiography (CXR) may provide good performance in making prognoses for COVID-19. Objective: We aimed to develop and validate a prediction model using CXR based on an AI model and clinical variables to predict clinical outcomes in patients with COVID-19. Methods: This retrospective longitudinal study included patients hospitalized for COVID-19 at multiple COVID-19 medical centers between February 2020 and October 2020. 
Patients at Boramae Medical Center were randomly classified into training, validation, and internal testing sets (at a ratio of 8:1:1, respectively). An AI model using initial CXR images as input, a logistic regression model using clinical information, and a combined model using the output of the AI model (as CXR score) and clinical information were developed and trained to predict hospital length of stay (LOS) $\geq$2 weeks, need for oxygen supplementation, and acute respiratory distress syndrome (ARDS). The models were externally validated in the Korean Imaging Cohort of COVID-19 data set for discrimination and calibration. Results: The AI model using CXR and the logistic regression model using clinical variables were suboptimal for predicting hospital LOS $\geq$2 weeks or the need for oxygen supplementation but performed acceptably in the prediction of ARDS (AI model area under the curve [AUC] 0.782, 95\% CI 0.720-0.845; logistic regression model AUC 0.878, 95\% CI 0.838-0.919). The combined model performed better in predicting the need for oxygen supplementation (AUC 0.704, 95\% CI 0.646-0.762) and ARDS (AUC 0.890, 95\% CI 0.853-0.928) compared to the CXR score alone. Both the AI and combined models showed good calibration for predicting ARDS (P=.079 and P=.859). Conclusions: The combined prediction model, comprising the CXR score and clinical information, was externally validated as having acceptable performance in predicting severe illness and excellent performance in predicting ARDS in patients with COVID-19. ", doi="10.2196/42717", url="https://www.jmir.org/2023/1/e42717", url="http://www.ncbi.nlm.nih.gov/pubmed/36795468" } @Article{info:doi/10.2196/40846, author="Gonz{\'a}lez-Colom, Rub{\`e}n and Herranz, Carmen and Vela, Emili and Monterde, David and Contel, Carles Joan and Sis{\'o}-Almirall, Antoni and Piera-Jim{\'e}nez, Jordi and Roca, Josep and Cano, Isaac", title="Prevention of Unplanned Hospital Admissions in Multimorbid Patients Using Computational Modeling: Observational Retrospective Cohort Study", journal="J Med Internet Res", year="2023", month="Feb", day="16", volume="25", pages="e40846", keywords="health risk assessment", keywords="health risk profiles", keywords="transitional care", keywords="hospital readmissions", keywords="mortality", abstract="Background: Enhanced management of multimorbidity constitutes a major clinical challenge. Multimorbidity shows well-established causal relationships with the high use of health care resources and, specifically, with unplanned hospital admissions. Enhanced patient stratification is vital for achieving effectiveness through personalized postdischarge service selection. Objective: The study has a 2-fold aim: (1) generation and assessment of predictive models of mortality and readmission at 90 days after discharge; and (2) characterization of patients' profiles for personalized service selection purposes. Methods: Gradient boosting techniques were used to generate predictive models based on multisource data (registries, clinical/functional and social support) from 761 nonsurgical patients admitted to a tertiary hospital over 12 months (October 2017 to November 2018). K-means clustering was used to characterize patient profiles. Results: Performance (area under the receiver operating characteristic curve, sensitivity, and specificity) of the predictive models was 0.82, 0.78, and 0.70 for mortality and 0.72, 0.70, and 0.63 for readmissions, respectively. A total of 4 patients' profiles were identified. 
In brief, the reference patients (cluster 1; 281/761, 36.9\%), 53.7\% (151/281) men and mean age of 71 (SD 16) years, showed 3.6\% (10/281) mortality and 15.7\% (44/281) readmissions at 90 days following discharge. The unhealthy lifestyle habit profile (cluster 2; 179/761, 23.5\%) predominantly comprised males (137/179, 76.5\%) of similar age (mean 70, SD 13 years) but showed slightly higher mortality (10/179, 5.6\%) and a markedly higher readmission rate (49/179, 27.4\%). Patients in the frailty profile (cluster 3; 152/761, 19.9\%) were older (mean 81 years, SD 13 years) and predominantly female (males: 63/152, 41.4\%). They showed medical complexity with a high level of social vulnerability and the highest mortality rate (23/152, 15.1\%), but with a similar hospitalization rate (39/152, 25.7\%) compared with cluster 2. Finally, the medical complexity profile (cluster 4; 149/761, 19.6\%), mean age 83 (SD 9) years, 55.7\% (83/149) males, showed the highest clinical complexity, resulting in 12.8\% (19/149) mortality and the highest readmission rate (56/149, 37.6\%). Conclusions: The results indicated the potential to predict mortality and morbidity-related adverse events leading to unplanned hospital readmissions. The resulting patient profiles fostered recommendations for personalized service selection with the capacity for value generation. ", doi="10.2196/40846", url="https://www.jmir.org/2023/1/e40846", url="http://www.ncbi.nlm.nih.gov/pubmed/36795471" } @Article{info:doi/10.2196/37685, author="Otokiti, Umar Ahmed and Ozoude, Maryann Makuochukwu and Williams, S. Karmen and Sadiq-onilenla, A. Rasheedat and Ojo, Akin Soji and Wasarme, B. Leyla and Walsh, Samantha and Edomwande, Maxwell", title="The Need to Prioritize Model-Updating Processes in Clinical Artificial Intelligence (AI) Models: Protocol for a Scoping Review", journal="JMIR Res Protoc", year="2023", month="Feb", day="16", volume="12", pages="e37685", keywords="model updating", keywords="model calibration", keywords="artificial intelligence", keywords="machine learning", keywords="direct clinical care", abstract="Background: With an increase in the number of artificial intelligence (AI) and machine learning (ML) algorithms available for clinical settings, appropriate model updating and implementation of updates are imperative to ensure applicability, reproducibility, and patient safety. Objective: The objective of this scoping review was to evaluate and assess the model-updating practices of AI and ML clinical models that are used in direct patient-provider clinical decision-making. Methods: We used the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist and the PRISMA-P protocol guidance in addition to a modified CHARMS (Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies) checklist to conduct this scoping review. A comprehensive medical literature search of databases, including Embase, MEDLINE, PsycINFO, Cochrane, Scopus, and Web of Science, was conducted to identify AI and ML algorithms that would impact clinical decision-making at the level of direct patient care. Our primary end point is the rate at which model updating is recommended by published algorithms; we will also conduct an assessment of study quality and risk of bias in all publications reviewed. 
In addition, we will evaluate the rate at which published algorithms include ethnic and gender demographic distribution information in their training data as a secondary end point. Results: Our initial literature search yielded approximately 13,693 articles, with approximately 7810 articles to consider for full reviews among our team of 7 reviewers. We plan to complete the review process and disseminate the results by spring of 2023. Conclusions: Although AI and ML applications in health care have the potential to improve patient care by reducing errors between measurement and model output, currently there exists more hype than hope because of the lack of proper external validation of these models. We expect to find that the AI and ML model-updating methods are proxies for model applicability and generalizability on implementation. Our findings will add to the field by determining the degree to which published models meet the criteria for clinical validity, real-life implementation, and best practices to optimize model development, and in so doing, reduce the overpromise and underachievement of the contemporary model development process. International Registered Report Identifier (IRRID): PRR1-10.2196/37685 ", doi="10.2196/37685", url="https://www.researchprotocols.org/2023/1/e37685", url="http://www.ncbi.nlm.nih.gov/pubmed/36795464" } @Article{info:doi/10.2196/39876, author="Zhao, Yang and Tsubota, Tadashi", title="The Current Status of Secondary Use of Claims, Electronic Medical Records, and Electronic Health Records in Epidemiology in Japan: Narrative Literature Review", journal="JMIR Med Inform", year="2023", month="Feb", day="14", volume="11", pages="e39876", keywords="claims", keywords="electronic medical records", keywords="EMRs", keywords="electronic health records", keywords="EHRs", keywords="epidemiology", keywords="narrative literature review", abstract="Background: Real-world data, such as claims, electronic medical records (EMRs), and electronic health records (EHRs), are increasingly being used in clinical epidemiology. Understanding the current status of existing approaches can help in designing high-quality epidemiological studies. Objective: We conducted a comprehensive narrative literature review to clarify the secondary use of claims, EMRs, and EHRs in clinical epidemiology in Japan. Methods: We searched peer-reviewed publications in PubMed from January 1, 2006, to June 30, 2021 (the date of search), which met the following 3 inclusion criteria: involvement of claims, EMRs, EHRs, or medical receipt data; mention of Japan; and published from January 1, 2006, to June 30, 2021. Eligible articles that met any of the following 6 exclusion criteria were filtered: review articles; non--disease-related articles; articles in which the Japanese population is not the sample; articles without claims, EMRs, or EHRs; full text not available; and articles without statistical analysis. Investigations of the titles, abstracts, and full texts of eligible articles were conducted automatically or manually, from which 7 categories of key information were collected. The information included organization, study design, real-world data type, database, disease, outcome, and statistical method. Results: A total of 620 eligible articles were identified for this narrative literature review. 
The results of the 7 categories suggested that most of the studies were conducted by academic institutes (n=429); the cohort study was the primary design that longitudinally measured outcomes of proper patients (n=533); 594 studies used claims data; the use of databases was concentrated in well-known commercial and public databases; infections (n=105), cardiovascular diseases (n=100), neoplasms (n=78), and nutritional and metabolic diseases (n=75) were the most studied diseases; most studies have focused on measuring treatment patterns (n=218), physiological or clinical characteristics (n=184), and mortality (n=137); and multivariate models were commonly used (n=414). Most (375/414, 90.6\%) of these multivariate modeling studies were performed for confounder adjustment. Logistic regression was the first choice for assessing many of the outcomes, with the exception of hospitalization or hospital stay and resource use or costs, for both of which linear regression was commonly used. Conclusions: This literature review provides a good understanding of the current status and trends in the use of claims, EMRs, and EHRs data in clinical epidemiology in Japan. The results demonstrated appropriate statistical methods regarding different outcomes, Japan-specific trends of disease areas, and the lack of use of artificial intelligence techniques in existing studies. In the future, a more precise comparison of relevant domestic research with worldwide research will be conducted to clarify the Japan-specific status and challenges. ", doi="10.2196/39876", url="https://medinform.jmir.org/2023/1/e39876", url="http://www.ncbi.nlm.nih.gov/pubmed/36787161" } @Article{info:doi/10.2196/44238, author="Yang, Liuyang and Li, Gang and Yang, Jin and Zhang, Ting and Du, Jing and Liu, Tian and Zhang, Xingxing and Han, Xuan and Li, Wei and Ma, Libing and Feng, Luzhao and Yang, Weizhong", title="Deep-Learning Model for Influenza Prediction From Multisource Heterogeneous Data in a Megacity: Model Development and Evaluation", journal="J Med Internet Res", year="2023", month="Feb", day="13", volume="25", pages="e44238", keywords="influenza", keywords="ILI", keywords="multisource heterogeneous data", keywords="deep learning", keywords="MAL model", keywords="megacity", abstract="Background: In megacities, there is an urgent need to establish more sensitive forecasting and early warning methods for acute respiratory infectious diseases. Existing prediction and early warning models for influenza and other acute respiratory infectious diseases have limitations and therefore there is room for improvement. Objective: The aim of this study was to explore a new and better-performing deep-learning model to predict influenza trends from multisource heterogeneous data in a megacity. Methods: We collected multisource heterogeneous data from the 26th week of 2012 to the 25th week of 2019, including influenza-like illness (ILI) cases and virological surveillance, data of climate and demography, and search engines data. To avoid collinearity, we selected the best predictor according to the weight and correlation of each factor. We established a new multiattention-long short-term memory (LSTM) deep-learning model (MAL model), which was used to predict the percentage of ILI (ILI\%) cases and the product of ILI\% and the influenza-positive rate (ILI\%{\texttimes}positive\%), respectively. 
We also combined the data in different forms and added several machine-learning and deep-learning models commonly used in the past to predict influenza trends for comparison. The R2 value, explained variance scores, mean absolute error, and mean square error were used to evaluate the quality of the models. Results: The highest correlation coefficients were found for the Baidu search data for ILI\% and for air quality for ILI\%{\texttimes}positive\%. We first used the MAL model to calculate the ILI\%, and then combined ILI\% with climate, demographic, and Baidu data in different forms. The ILI\%+climate+demography+Baidu model had the best prediction effect, with the explained variance score reaching 0.78, R2 reaching 0.76, mean absolute error of 0.08, and mean squared error of 0.01. Similarly, we used the MAL model to calculate the ILI\%{\texttimes}positive\% and combined this prediction with different data forms. The ILI\%{\texttimes}positive\%+climate+demography+Baidu model had the best prediction effect, with an explained variance score reaching 0.74, R2 reaching 0.70, mean absolute error of 0.02, and mean squared error of 0.02. Comparisons with random forest, extreme gradient boosting, LSTM, and gated recurrent unit models showed that the MAL model had the best prediction effect. Conclusions: The newly established MAL model outperformed existing models. Natural factors and search engine query data were more helpful in forecasting ILI patterns in megacities. With more timely and effective prediction of influenza and other respiratory infectious diseases and the epidemic intensity, early and better preparedness can be achieved to reduce the health damage to the population. ", doi="10.2196/44238", url="https://www.jmir.org/2023/1/e44238", url="http://www.ncbi.nlm.nih.gov/pubmed/36780207" } @Article{info:doi/10.2196/43486, author="Rogers, Parker and Boussina, E. Aaron and Shashikumar, P. Supreeth and Wardi, Gabriel and Longhurst, A. Christopher and Nemati, Shamim", title="Optimizing the Implementation of Clinical Predictive Models to Minimize National Costs: Sepsis Case Study", journal="J Med Internet Res", year="2023", month="Feb", day="13", volume="25", pages="e43486", keywords="sepsis", keywords="machine learning", keywords="evaluation", keywords="utility assessment", keywords="workflow simulation", keywords="simulation", keywords="model", keywords="implementation", keywords="data", keywords="acute kidney injury", keywords="injury", keywords="technology", keywords="care", keywords="diagnosis", keywords="clinical", keywords="cost", abstract="Background: Sepsis costs and incidence vary dramatically across diagnostic categories, warranting a customized approach for implementing predictive models. Objective: The aim of this study was to optimize the parameters of a sepsis prediction model within distinct patient groups to minimize the excess cost of sepsis care and analyze the potential effect of factors contributing to end-user response to sepsis alerts on overall model utility. Methods: We calculated the excess costs of sepsis to the Centers for Medicare and Medicaid Services (CMS) by comparing patients with and without a secondary sepsis diagnosis but with the same primary diagnosis and baseline comorbidities. We optimized the parameters of a sepsis prediction algorithm across different diagnostic categories to minimize these excess costs.
At the optima, we evaluated diagnostic odds ratios and analyzed the impact of compliance factors such as noncompliance, treatment efficacy, and tolerance for false alarms on the net benefit of triggering sepsis alerts. Results: Compliance factors significantly contributed to the net benefit of triggering a sepsis alert. However, a customized deployment policy can achieve a significantly higher diagnostic odds ratio and reduced costs of sepsis care. Implementing our optimization routine with powerful predictive models could result in US \$4.6 billion in excess cost savings for CMS. Conclusions: We designed a framework for customizing sepsis alert protocols within different diagnostic categories to minimize excess costs and analyzed model performance as a function of false alarm tolerance and compliance with model recommendations. We provide a framework that CMS policymakers could use to recommend minimum adherence rates to the early recognition and appropriate care of sepsis that is sensitive to hospital department-level incidence rates and national excess costs. Customizing the implementation of clinical predictive models by accounting for various behavioral and economic factors may improve the practical benefit of predictive models. ", doi="10.2196/43486", url="https://www.jmir.org/2023/1/e43486", url="http://www.ncbi.nlm.nih.gov/pubmed/36780203" } @Article{info:doi/10.2196/39987, author="Glanville, David and Hutchinson, Anastasia and Khaw, Damien", title="Handheld Computer Devices to Support Clinical Decision-making in Acute Nursing Practice: Systematic Scoping Review", journal="J Med Internet Res", year="2023", month="Feb", day="13", volume="25", pages="e39987", keywords="handheld computer devices", keywords="smartphones", keywords="mobile computing", keywords="mobile health", keywords="nursing", keywords="acute care", keywords="decision-making", keywords="clinical decision-making", keywords="scoping review", keywords="mobile phone", abstract="Background: Nursing care is increasingly supported by computerized information systems and decision support aids. Since the advent of handheld computer devices (HCDs), there has been limited exploration of their use in nursing practice. Objective: The study aimed to understand the professional and clinical impacts of the use of mobile health apps in nursing to assist clinical decision-making in acute care settings. The study also aimed to explore the scope of published research and identify key nomenclature with respect to research in this emerging field within nursing practice. Methods: This scoping review involved a tripartite search of electronic databases (CINAHL, Embase, MEDLINE, and Google Scholar) using preliminary, broad, and comprehensive search terms. The included studies were hand searched for additional citations. Two researchers independently screened the studies for inclusion and appraised quality using structured critical appraisal tools. Results: Of the 2309 unique studies screened, 28 (1.21\%) were included in the final analyses: randomized controlled trials (n=3, 11\%) and quasi-experimental (n=9, 32\%), observational (n=10, 36\%), mixed methods (n=2, 7\%), qualitative descriptive (n=2, 7\%), and diagnostic accuracy (n=2, 7\%) studies. Studies investigated the impact of HCDs on nursing decisions (n=12, 43\%); the effectiveness, safety, and quality of care (n=9, 32\%); and HCD usability, uptake, and acceptance (n=14, 50\%) and were judged to contain moderate-to-high risk of bias. 
The terminology used to describe HCDs was heterogeneous across studies, comprising 24 unique descriptors and 17 individual concepts that reflected 3 discrete technology platforms (``PDA technology,'' ``Smartphone/tablet technology,'' and ``Health care--specific technology''). Study findings varied, as did the range of decision-making modalities targeted by HCD interventions. Interventions varied according to the level of clinician versus algorithmic judgment: unstructured clinical judgment, structured clinical judgment, and computerized algorithmic judgment. Conclusions: The extant literature is varied but suggests that HCDs can be used effectively to support aspects of acute nursing care. However, there is a dearth of high-level evidence regarding this phenomenon and studies exploring the degree to which HCD implementation may affect acute nursing care delivery workflow. Additional targeted research using rigorous experimental designs is needed in this emerging field to determine the true potential of HCDs in optimizing acute nursing care. ", doi="10.2196/39987", url="https://www.jmir.org/2023/1/e39987", url="http://www.ncbi.nlm.nih.gov/pubmed/36780222" } @Article{info:doi/10.2196/41450, author="Bauer, Cici and Zhang, Kehe and Li, Wenjun and Bernson, Dana and Dammann, Olaf and LaRochelle, R. Marc and Stopka, J. Thomas", title="Small Area Forecasting of Opioid-Related Mortality: Bayesian Spatiotemporal Dynamic Modeling Approach", journal="JMIR Public Health Surveill", year="2023", month="Feb", day="10", volume="9", pages="e41450", keywords="opioid-related mortality", keywords="small area estimation", keywords="spatiotemporal models", keywords="Bayesian", keywords="forecasting", abstract="Background: Opioid-related overdose mortality has remained at crisis levels across the United States, increasing 5-fold and worsening during the COVID-19 pandemic. The ability to provide forecasts of opioid-related mortality at granular geographical and temporal scales may help guide preemptive public health responses. Current forecasting models focus on prediction on a large geographical scale, such as states or counties, lacking the spatial granularity that local public health officials desire to guide policy decisions and resource allocation. Objective: The overarching objective of our study was to develop Bayesian spatiotemporal dynamic models to predict opioid-related mortality counts and rates at temporally and geographically granular scales (ie, ZIP Code Tabulation Areas [ZCTAs]) for Massachusetts. Methods: We obtained decedent data from the Massachusetts Registry of Vital Records and Statistics for 2005 through 2019. We developed Bayesian spatiotemporal dynamic models to predict opioid-related mortality across Massachusetts' 537 ZCTAs. We evaluated the prediction performance of our models using the one-year ahead approach. We investigated the potential improvement of prediction accuracy by incorporating ZCTA-level demographic and socioeconomic determinants. We identified ZCTAs with the highest predicted opioid-related mortality in terms of rates and counts and stratified them by rural and urban areas. Results: Bayesian dynamic models with the full spatial and temporal dependency performed best. Inclusion of the ZCTA-level demographic and socioeconomic variables as predictors improved the prediction accuracy, but only in the model that did not account for the neighborhood-level spatial dependency of the ZCTAs.
Predictions were better for urban areas than for rural areas, which were more sparsely populated. Using the best performing model and the Massachusetts opioid-related mortality data from 2005 through 2019, our models suggested a stabilizing pattern in opioid-related overdose mortality in 2020 and 2021 if there were no disruptive changes to the trends observed for 2005-2019. Conclusions: Our Bayesian spatiotemporal models focused on opioid-related overdose mortality data facilitated prediction approaches that can inform preemptive public health decision-making and resource allocation. While sparse data from rural and less populated locales typically pose special challenges in small area predictions, our dynamic Bayesian models, which maximized information borrowing across geographic areas and time points, were used to provide more accurate predictions for small areas. Such approaches can be replicated in other jurisdictions and at varying temporal and geographical levels. We encourage the formation of a modeling consortium for fatal opioid-related overdose predictions, where different modeling techniques could be ensembled to inform public health policy. ", doi="10.2196/41450", url="https://publichealth.jmir.org/2023/1/e41450", url="http://www.ncbi.nlm.nih.gov/pubmed/36763450" } @Article{info:doi/10.2196/41344, author="Kinast, Benjamin and Ulrich, Hannes and Bergh, Bj{\"o}rn and Schreiweis, Bj{\"o}rn", title="Functional Requirements for Medical Data Integration into Knowledge Management Environments: Requirements Elicitation Approach Based on Systematic Literature Analysis", journal="J Med Internet Res", year="2023", month="Feb", day="9", volume="25", pages="e41344", keywords="data integration", keywords="requirements engineering", keywords="requirements", keywords="knowledge management", keywords="software engineering", abstract="Background: In patient care, data are historically generated and stored in heterogeneous databases that are domain specific and often noninteroperable or isolated. As the amount of health data increases, the number of isolated data silos is also expected to grow, limiting the accessibility of the collected data. Medical informatics is developing ways to move from siloed data to a more harmonized arrangement in information architectures. This paradigm shift will allow future research to integrate medical data at various levels and from various sources. Currently, comprehensive requirements engineering is working on data integration projects in both patient care-- and research-oriented contexts, and it is significantly contributing to the success of such projects. In addition to various stakeholder-based methods, document-based requirement elicitation is a valid method for improving the scope and quality of requirements. Objective: Our main objective was to provide a general catalog of functional requirements for integrating medical data into knowledge management environments. We aimed to identify where integration projects intersect to derive consistent and representative functional requirements from the literature. On the basis of these findings, we identified which functional requirements for data integration exist in the literature and thus provide a general catalog of requirements. Methods: This work began by conducting a literature-based requirement elicitation based on a broad requirement engineering approach. 
Thus, in the first step, we performed a web-based systematic literature review to identify published articles that dealt with the requirements for medical data integration. We identified and analyzed the available literature by applying the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. In the second step, we screened the results for functional requirements using the requirements engineering method of document analysis and derived the requirements into a uniform requirement syntax. Finally, we classified the elicited requirements into a category scheme that represents the data life cycle. Results: Our 2-step requirements elicitation approach yielded 821 articles, of which 61 (7.4\%) were included in the requirement elicitation process. There, we identified 220 requirements, which were covered by 314 references. We assigned the requirements to different data life cycle categories as follows: 25\% (55/220) to data acquisition, 35.9\% (79/220) to data processing, 12.7\% (28/220) to data storage, 9.1\% (20/220) to data analysis, 6.4\% (14/220) to metadata management, 2.3\% (5/220) to data lineage, 3.2\% (7/220) to data traceability, and 5.5\% (12/220) to data security. Conclusions: The aim of this study was to present a cross-section of functional data integration--related requirements defined in the literature by other researchers. The aim was achieved with 220 distinct requirements from 61 publications. We concluded that scientific publications are, in principle, a reliable source of information for functional requirements with respect to medical data integration. Finally, we provide a broad catalog to support other scientists in the requirement elicitation phase. ", doi="10.2196/41344", url="https://www.jmir.org/2023/1/e41344", url="http://www.ncbi.nlm.nih.gov/pubmed/36757764" } @Article{info:doi/10.2196/43130, author="Yap, L. Tracey and Horn, D. Susan and Sharkey, D. Phoebe and Brooks, R. Katie and Kennerly, Susan", title="The Nursing Home Severity Index and Application to Pressure Injury Risk: Measure Development and Validation Study", journal="JMIR Aging", year="2023", month="Feb", day="9", volume="6", pages="e43130", keywords="geriatrics", keywords="nursing homes", keywords="pressure ulcer", keywords="propensity scores", keywords="severity of illness index", keywords="development", keywords="validation", keywords="clinical", keywords="treatment", keywords="pressure injury", keywords="injury", keywords="risk", keywords="prevention", abstract="Background: An assessment tool is needed to measure the clinical severity of nursing home residents to improve the prediction of outcomes and provide guidance in treatment planning. Objective: This study aims to describe the development of the Nursing Home Severity Index, a clinical severity measure targeted for nursing home residents with the potential to be individually tailored to different outcomes, such as pressure injury. Methods: A retrospective nonexperimental design was used to develop and validate the Nursing Home Severity Index using secondary data from 9 nursing homes participating in the 12-month preintervention period of the Turn Everyone and Move for Ulcer Prevention (TEAM-UP) pragmatic clinical trial. Expert opinion and clinical literature were used to identify indicators, which were grouped into severity dimensions. 
Index performance and validation to predict risk of pressure injury were accomplished using secondary data from nursing home electronic health records, Minimum Data Sets, and Risk Management Systems. Logistic regression models including a resident's Worst-Braden score with/without severity dimensions generated propensity scores. Goodness of fit for overall models was assessed using C statistic; the significance of improvement of fit after adding severity components to the model was determined using the likelihood ratio chi-square test. The significance of each component was assessed with odds ratios. Validation based on randomly selected 65\% training and 35\% validation data sets was used to confirm the reliability of the severity measure. Finally, the discriminating ability of models was evaluated using propensity stratification to evaluate which model best discriminated between residents with/without pressure injury. Results: Data from 1015 residents without pressure injuries on admission were used for the Nursing Home Severity Index--Pressure Injury and included laboratory, weights/vitals/pain, underweight, and locomotion severity dimensions. Logistic regression C statistic measuring predictive accuracy increased by 19.3\% (from 0.627 to 0.748; P<.001) when adding four severity dimensions to Worst-Braden scores. Significantly higher odds of developing pressure injuries were associated with increasing dimension scores. The use of the three highest propensity deciles predicting the greatest risk of pressure injury improved predictive accuracy by detecting 21 more residents who developed pressure injury (n=58, 65.2\% vs n=37, 42.0\%) when both severity dimensions and Worst-Braden score were included in prediction modeling. Conclusions: The clinical Nursing Home Severity Index--Pressure Injury was successfully developed and tested using the outcome of pressure injury. Overall predictive capacity was enhanced when using severity dimensions in combination with Worst-Braden scores. This index has the potential to significantly impact the quality of care decisions aimed at improving individual pressure injury prevention plans. Trial Registration: ClinicalTrials.gov NCT02996331; http://clinicaltrials.gov/ct2/show/NCT02996331 ", doi="10.2196/43130", url="https://aging.jmir.org/2023/1/e43130", url="http://www.ncbi.nlm.nih.gov/pubmed/36757779" } @Article{info:doi/10.2196/40455, author="Gabriel, Allanigue Rodney and Simpson, Sierra and Zhong, William and Burton, Nicole Brittany and Mehdipour, Soraya and Said, Tadros Engy", title="A Neural Network Model Using Pain Score Patterns to Predict the Need for Outpatient Opioid Refills Following Ambulatory Surgery: Algorithm Development and Validation", journal="JMIR Perioper Med", year="2023", month="Feb", day="8", volume="6", pages="e40455", keywords="opioids", keywords="ambulatory surgery", keywords="machine learning", keywords="surgery", keywords="outpatient", keywords="pain medication", keywords="pain", keywords="pain management", keywords="patient needs", keywords="predict", keywords="algorithms", keywords="clinical decision support", keywords="pain care", abstract="Background: Expansion of clinical guidance tools is crucial to identify patients at risk of requiring an opioid refill after outpatient surgery. Objective: The objective of this study was to develop machine learning algorithms incorporating pain and opioid features to predict the need for outpatient opioid refills following ambulatory surgery. 
Methods: Neural networks, regression, random forest, and a support vector machine were used to evaluate the data set. For each model, oversampling and undersampling techniques were implemented to balance the data set. Hyperparameter tuning based on k-fold cross-validation was performed, and feature importance was ranked based on a Shapley Additive Explanations (SHAP) explainer model. To assess performance, we calculated the average area under the receiver operating characteristics curve (AUC), F1-score, sensitivity, and specificity for each model. Results: There were 1333 patients, of whom 144 (10.8\%) refilled their opioid prescription within 2 weeks after outpatient surgery. The average AUC calculated from k-fold cross-validation was 0.71 for the neural network model. When the model was validated on the test set, the AUC was 0.75. The features with the highest impact on model output were performance of a regional nerve block, postanesthesia care unit maximum pain score, postanesthesia care unit median pain score, active smoking history, and total perioperative opioid consumption. Conclusions: Applying machine learning algorithms allows providers to better predict outcomes that require specialized health care resources such as transitional pain clinics. This model can aid as a clinical decision support for early identification of at-risk patients who may benefit from transitional pain clinic care perioperatively in ambulatory surgery. ", doi="10.2196/40455", url="https://periop.jmir.org/2023/1/e40455", url="http://www.ncbi.nlm.nih.gov/pubmed/36753316" } @Article{info:doi/10.2196/43734, author="Lu, Ya-Ting and Chao, Horng-Jiun and Chiang, Yi-Chun and Chen, Hsiang-Yin", title="Explainable Machine Learning Techniques To Predict Amiodarone-Induced Thyroid Dysfunction Risk: Multicenter, Retrospective Study With External Validation", journal="J Med Internet Res", year="2023", month="Feb", day="7", volume="25", pages="e43734", keywords="amiodarone", keywords="thyroid dysfunction", keywords="machine learning", keywords="oversampling", keywords="extreme gradient boosting", keywords="adverse effect", keywords="resampling", keywords="thyroid", keywords="predict", keywords="risk", abstract="Background: Machine learning offers new solutions for predicting life-threatening, unpredictable amiodarone-induced thyroid dysfunction. Traditional regression approaches for adverse-effect prediction without time-series consideration of features have yielded suboptimal predictions. Machine learning algorithms with multiple data sets at different time points may generate better performance in predicting adverse effects. Objective: We aimed to develop and validate machine learning models for forecasting individualized amiodarone-induced thyroid dysfunction risk and to optimize a machine learning--based risk stratification scheme with a resampling method and readjustment of the clinically derived decision thresholds. Methods: This study developed machine learning models using multicenter, delinked electronic health records. It included patients receiving amiodarone from January 2013 to December 2017. The training set was composed of data from Taipei Medical University Hospital and Wan Fang Hospital, while data from Taipei Medical University Shuang Ho Hospital were used as the external test set. The study collected stationary features at baseline and dynamic features at the first, second, third, sixth, ninth, 12th, 15th, 18th, and 21st months after amiodarone initiation. 
We used 16 machine learning models, including extreme gradient boosting, adaptive boosting, k-nearest neighbor, and logistic regression models, along with an original resampling method and 3 other resampling methods, including oversampling with the borderline-synthesized minority oversampling technique, undersampling--edited nearest neighbor, and over- and undersampling hybrid methods. The model performance was compared based on accuracy, precision, recall, F1-score, geometric mean, area under the receiver operating characteristic curve (AUROC), and the area under the precision-recall curve (AUPRC). Feature importance was determined by the best model. The decision threshold was readjusted to identify the best cutoff value and a Kaplan-Meier survival analysis was performed. Results: The training set contained 4075 patients from Taipei Medical University Hospital and Wan Fang Hospital, of whom 583 (14.3\%) developed amiodarone-induced thyroid dysfunction, while the external test set included 2422 patients from Taipei Medical University Shuang Ho Hospital, of whom 275 (11.4\%) developed amiodarone-induced thyroid dysfunction. The extreme gradient boosting oversampling machine learning model demonstrated the best predictive outcomes among all 16 models. The accuracy, precision, recall, F1-score, G-mean, AUPRC, and AUROC were 0.923, 0.632, 0.756, 0.688, 0.845, 0.751, and 0.934, respectively. After readjusting the cutoff, the best value was 0.627, and the F1-score reached 0.699. The best threshold was able to classify 286 of 2422 patients (11.8\%) as high-risk subjects, among which 275 were true-positive patients in the testing set. A shorter treatment duration; higher levels of thyroid-stimulating hormone and high-density lipoprotein cholesterol; and lower levels of free thyroxin, alkaline phosphatase, and low-density lipoprotein were the most important features. Conclusions: Machine learning models combined with resampling methods can predict amiodarone-induced thyroid dysfunction and serve as a support tool for individualized risk prediction and clinical decision support. ", doi="10.2196/43734", url="https://www.jmir.org/2023/1/e43734", url="http://www.ncbi.nlm.nih.gov/pubmed/36749620" } @Article{info:doi/10.2196/41577, author="Zale, D. Andrew and Abusamaan, S. Mohammed and McGready, John and Mathioudakis, Nestoras", title="Prediction of Next Glucose Measurement in Hospitalized Patients by Comparing Various Regression Methods: Retrospective Cohort Study", journal="JMIR Form Res", year="2023", month="Jan", day="31", volume="7", pages="e41577", keywords="hospital", keywords="glucose", keywords="inpatient", keywords="prediction", keywords="regression", keywords="machine learning", abstract="Background: Continuous glucose monitors have shown great promise in improving outpatient blood glucose (BG) control; however, continuous glucose monitors are not routinely used in hospitals, and glucose management is driven by point-of-care (finger stick) and serum glucose measurements in most patients. Objective: This study aimed to evaluate time series approaches for prediction of inpatient BG using only point-of-care and serum glucose observations. Methods: Our data set included electronic health record data from 184,320 admissions, from patients who received at least one unit of subcutaneous insulin, had at least 4 BG measurements, and were discharged between January 1, 2015, and May 31, 2019, from 5 Johns Hopkins Health System hospitals.
A total of 2,436,228 BG observations were included after excluding measurements obtained in quick succession, from patients who received intravenous insulin, or from critically ill patients. After exclusion criteria, 2.85\% (3253/113,976), 32.5\% (37,045/113,976), and 1.06\% (1207/113,976) of admissions had a coded diagnosis of type 1, type 2, and other diabetes, respectively. The outcome of interest was the predicted value of the next BG measurement (mg/dL). Multiple time series predictors were created and analyzed by comparing those predictors and the index BG measurement (sample-and-hold technique) with next BG measurement. The population was classified by glycemic variability based on the coefficient of variation. To compare the performance of different time series predictors among one another, R2, root mean squared error, and Clarke Error Grid were calculated and compared with the next BG measurement. All these time series predictors were then used together in Cubist, linear, random forest, partial least squares, and k-nearest neighbor methods. Results: The median number of BG measurements from 113,976 admissions was 12 (IQR 5-24). The R2 values for the sample-and-hold, 2-hour, 4-hour, 16-hour, and 24-hour moving average were 0.529, 0.504, 0.481, 0.467, and 0.459, respectively. The R2 values for 4-hour moving average based on glycemic variability were 0.680, 0.480, 0.290, and 0.205 for low, medium, high, and very high glucose variability, respectively. The proportion of BG predictions in zone A of the Clarke Error Grid analysis was 61\%, 59\%, 27\%, and 53\% for 4-hour moving average, 24-hour moving average, 3 observation rolling regression, and recursive regression predictors, respectively. In a fully adjusted Cubist, linear, random forest, partial least squares, and k-nearest neighbor model, the R2 values were 0.563, 0.526, 0.538, and 0.472, respectively. Conclusions: When analyzing time series predictors independently, increasing variability in a patient's BG decreased predictive accuracy. Similarly, inclusion of older BG measurements decreased predictive accuracy. These relationships become weaker as glucose variability increases. Machine learning techniques marginally augmented the performance of time series predictors for predicting a patient's next BG measurement. Further studies should determine the potential of using time series analyses for prediction of inpatient dysglycemia. ", doi="10.2196/41577", url="https://formative.jmir.org/2023/1/e41577", url="http://www.ncbi.nlm.nih.gov/pubmed/36719713" } @Article{info:doi/10.2196/40194, author="Landau, Y. Aviv and Blanchard, Ashley and Atkins, Nia and Salazar, Stephanie and Cato, Kenrick and Patton, U. 
Desmond and Topaz, Maxim", title="Black and Latinx Primary Caregiver Considerations for Developing and Implementing a Machine Learning--Based Model for Detecting Child Abuse and Neglect With Implications for Racial Bias Reduction: Qualitative Interview Study With Primary Caregivers", journal="JMIR Form Res", year="2023", month="Jan", day="31", volume="7", pages="e40194", keywords="child abuse and neglect", keywords="pediatric emergency departments", keywords="machine learning--based risk models", keywords="electronic health records", keywords="primary caregivers", keywords="child", keywords="abuse", keywords="neglect", keywords="model", keywords="epidemic", keywords="development", keywords="implementation", keywords="community", keywords="machine learning", abstract="Background: Child abuse and neglect, once viewed as a social problem, is now an epidemic. Moreover, health providers agree that existing stereotypes may link racial and social class issues to child abuse. The broad adoption of electronic health records (EHRs) in clinical settings offers a new avenue for addressing this epidemic. To reduce racial bias and improve the development, implementation, and outcomes of machine learning (ML)--based models that use EHR data, it is crucial to involve marginalized members of the community in the process. Objective: This study elicited Black and Latinx primary caregivers' viewpoints regarding child abuse and neglect while living in underserved communities to highlight considerations for designing an ML-based model for detecting child abuse and neglect in emergency departments (EDs) with implications for racial bias reduction and future interventions. Methods: We conducted a qualitative study using in-depth interviews with 20 Black and Latinx primary caregivers whose children were cared for at a single pediatric tertiary-care ED to gain insights about child abuse and neglect and their experiences with health providers. Results: Three central themes were developed in the coding process: (1) primary caregivers' perspectives on the definition of child abuse and neglect, (2) primary caregivers' experiences with health providers and medical documentation, and (3) primary caregivers' perceptions of child protective services. Conclusions: Our findings highlight essential considerations from primary caregivers for developing an ML-based model for detecting child abuse and neglect in ED settings. This includes how to define child abuse and neglect from a primary caregiver lens. Miscommunication between patients and health providers can potentially lead to a misdiagnosis, and therefore, have a negative impact on medical documentation. Additionally, the outcome and application of the ML-based models for detecting abuse and neglect may cause more harm than expected to the community. Further research is needed to validate these findings and integrate them into creating an ML-based model. 
", doi="10.2196/40194", url="https://formative.jmir.org/2023/1/e40194", url="http://www.ncbi.nlm.nih.gov/pubmed/36719717" } @Article{info:doi/10.2196/39103, author="Ferreira-Santos, Daniela and Pereira Rodrigues, Pedro", title="The Association Between Comorbidities and Prescribed Drugs in Patients With Suspected Obstructive Sleep Apnea: Inductive Rule Learning Approach", journal="J Med Internet Res", year="2023", month="Jan", day="30", volume="25", pages="e39103", keywords="association rule mining", keywords="drug", keywords="electronic health records", keywords="obstructive sleep apnea", keywords="problem list", keywords="comorbidities", keywords="prescribed drugs", keywords="sleep apnea", keywords="disease-drug associations", keywords="diagnoses", keywords="clinical data", keywords="EHR", doi="10.2196/39103", url="https://www.jmir.org/2023/1/e39103", url="http://www.ncbi.nlm.nih.gov/pubmed/36716086" } @Article{info:doi/10.2196/36477, author="Chen, Min and Tan, Xuan and Padman, Rema", title="A Machine Learning Approach to Support Urgent Stroke Triage Using Administrative Data and Social Determinants of Health at Hospital Presentation: Retrospective Study", journal="J Med Internet Res", year="2023", month="Jan", day="30", volume="25", pages="e36477", keywords="stroke", keywords="diagnosis", keywords="triage", keywords="decision support", keywords="social determinants of health", keywords="prediction", keywords="machine learning", keywords="interpretability", keywords="medical decision-making", keywords="retrospective study", keywords="claims data", abstract="Background: The key to effective stroke management is timely diagnosis and triage. Machine learning (ML) methods developed to assist in detecting stroke have focused on interpreting detailed clinical data such as clinical notes and diagnostic imaging results. However, such information may not be readily available when patients are initially triaged, particularly in rural and underserved communities. Objective: This study aimed to develop an ML stroke prediction algorithm based on data widely available at the time of patients' hospital presentations and assess the added value of social determinants of health (SDoH) in stroke prediction. Methods: We conducted a retrospective study of the emergency department and hospitalization records from 2012 to 2014 from all the acute care hospitals in the state of Florida, merged with the SDoH data from the American Community Survey. A case-control design was adopted to construct stroke and stroke mimic cohorts. We compared the algorithm performance and feature importance measures of the ML models (ie, gradient boosting machine and random forest) with those of the logistic regression model based on 3 sets of predictors. To provide insights into the prediction and ultimately assist care providers in decision-making, we used TreeSHAP for tree-based ML models to explain the stroke prediction. Results: Our analysis included 143,203 hospital visits of unique patients, and it was confirmed based on the principal diagnosis at discharge that 73\% (n=104,662) of these patients had a stroke. The approach proposed in this study has high sensitivity and is particularly effective at reducing the misdiagnosis of dangerous stroke chameleons (false-negative rate <4\%). ML classifiers consistently outperformed the benchmark logistic regression in all 3 input combinations. We found significant consistency across the models in the features that explain their performance. 
The most important features are age, the number of chronic conditions on admission, and primary payer (eg, Medicare or private insurance). Although both the individual- and community-level SDoH features helped improve the predictive performance of the models, the inclusion of the individual-level SDoH features led to a much larger improvement (area under the receiver operating characteristic curve increased from 0.694 to 0.823) than the inclusion of the community-level SDoH features (area under the receiver operating characteristic curve increased from 0.823 to 0.829). Conclusions: Using data widely available at the time of patients' hospital presentations, we developed a stroke prediction model with high sensitivity and reasonable specificity. The prediction algorithm uses variables that are routinely collected by providers and payers and might be useful in underresourced hospitals with limited availability of sensitive diagnostic tools or incomplete data-gathering capabilities. ", doi="10.2196/36477", url="https://www.jmir.org/2023/1/e36477", url="http://www.ncbi.nlm.nih.gov/pubmed/36716097" } @Article{info:doi/10.2196/42540, author="Chrimes, Dillon", title="Using Decision Trees as an Expert System for Clinical Decision Support for COVID-19", journal="Interact J Med Res", year="2023", month="Jan", day="30", volume="12", pages="e42540", keywords="assessment tool", keywords="chatbot", keywords="clinical decision support", keywords="COVID-19", keywords="decision tree", keywords="digital health tool", keywords="framework", keywords="health informatics", keywords="health intervention", keywords="prototype", doi="10.2196/42540", url="https://www.i-jmr.org/2023/1/e42540", url="http://www.ncbi.nlm.nih.gov/pubmed/36645840" } @Article{info:doi/10.2196/41614, author="Kozak, Karol and Seidel, Andr{\'e} and Matvieieva, Nataliia and Neupetsch, Constanze and Teicher, Uwe and Lemme, Gordon and Ben Achour, Anas and Barth, Martin and Ihlenfeldt, Steffen and Drossel, Welf-Guntram", title="Unique Device Identification--Based Linkage of Hierarchically Accessible Data Domains in Prospective Surgical Hospital Data Ecosystems: User-Centered Design Approach", journal="JMIR Med Inform", year="2023", month="Jan", day="27", volume="11", pages="e41614", keywords="electronic health record", keywords="unique device identification", keywords="cyber-physical production systems", keywords="mHealth", keywords="data integration ecosystem", keywords="hierarchical data access", keywords="shell embedded role model", abstract="Background: The electronic health record (EHR) targets systematized collection of patient-specific, electronically stored health data. The EHR is an evolving concept driven by ongoing developments and open or unclear legal issues concerning medical technologies, cross-domain data integration, and unclear access roles. Consequently, an interdisciplinary discourse based on representative pilot scenarios is required to connect previously unconnected domains. Objective: We address cross-domain data integration including access control using the specific example of a unique device identification (UDI)--expanded hip implant. In fact, the integration of technical focus data into the hospital information system (HIS) is considered based on surgically relevant information. Moreover, the acquisition of social focus data based on mobile health (mHealth) is addressed, covering data integration and networking with therapeutic intervention and acute diagnostics data. 
Methods: In addition to the additive manufacturing of a hip implant with the integration of a UDI, we built a database that combines database technology and a wrapper layer known from extract, transform, load systems and brings it into a SQL database, WEB application programming interface (API) layer (back end), interface layer (rest API), and front end. It also provides semantic integration through connection mechanisms between data elements. Results: A hip implant is approached by design, production, and verification while linking operation-relevant specifics like implant-bone fit by merging patient-specific image material (computed tomography, magnetic resonance imaging, or a biomodel) and the digital implant twin for well-founded selection pairing. This decision-facilitating linkage, which improves surgical planning, relates to patient-specific postoperative influencing factors during the healing phase. A unique product identification approach is presented, allowing a postoperative read-out with state-of-the-art hospital technology while enabling future access scenarios for patient and implant data. The latter was considered from the manufacturing perspective using the process manufacturing chain for a (patient-specific) implant to identify quality-relevant data for later access. In addition, sensor concepts were identified to use to monitor the patient-implant interaction during the healing phase using wearables, for example. A data aggregation and integration concept for heterogeneous data sources from the considered focus domains is also presented. Finally, a hierarchical data access concept is shown, protecting sensitive patient data from misuse using existing scenarios. Conclusions: Personalized medicine requires cross-domain linkage of data, which, in turn, require an appropriate data infrastructure and adequate hierarchical data access solutions in a shared and federated data space. The hip implant is used as an example for the usefulness of cross-domain data linkage since it bundles social, medical, and technical aspects of the implantation. It is necessary to open existing databases using interfaces for secure integration of data from end devices and to assure availability through suitable access models while guaranteeing long-term, independent data persistence. A suitable strategy requires the combination of technical solutions from the areas of identity and trust, federated data storage, cryptographic procedures, and software engineering as well as organizational changes. ", doi="10.2196/41614", url="https://medinform.jmir.org/2023/1/e41614", url="http://www.ncbi.nlm.nih.gov/pubmed/36705946" } @Article{info:doi/10.2196/43028, author="Lackenbauer, Wolfgang and Gasselich, Simon and Lickel, Edda Martina and Beikircher, Reinhard and Keip, Christian and Rausch, Florian and Wieser, Manfred and Selfe, James and Janssen, Jessie", title="The Ability of Austrian Qualified Physiotherapists to Make Accurate Keep-Refer Decisions and to Detect Serious Pathologies Based on Clinical Vignettes: Protocol for a Cross-sectional Web-Based Survey", journal="JMIR Res Protoc", year="2023", month="Jan", day="24", volume="12", pages="e43028", keywords="red flags", keywords="clinical reasoning", keywords="physiotherapy", keywords="screening", keywords="referral", keywords="musculoskeletal", abstract="Background: The recognition of serious pathologies affecting the musculoskeletal (MSK) system, especially in the early stage of a disease, is an important but challenging task. 
The prevalence of such serious pathologies is currently low. However, in our progressing aging population, it is anticipated that serious pathologies affecting the MSK system will be on the rise. Physiotherapists, as part of a wider health care team, can play a valuable role in the recognition of serious pathologies. It is at present unknown how accurately Austrian qualified physiotherapists can detect the presence of serious pathologies affecting the MSK system and therefore determine whether physiotherapy management is indicated (keep patients) or not (refer patients to a medical doctor). Objective: We will explore the current ability of Austrian qualified physiotherapists to recognize serious pathologies by using validated clinical vignettes. Methods: As part of an electronic web-based survey, these vignettes will be distributed among a convenience sample of qualified Austrian physiotherapists working in a hospital or private setting. The survey will consist of four sections: (1) demographics and general information, (2) the clinical vignettes, (3) questions concerning the clinical vignettes, and (4) self-perceived knowledge gaps and learning preferences from the perspective of study participants. Results will further be used for (1) international comparison with similar studies from the existing literature and (2) gaining insight into the participants' self-perceived knowledge gaps and learning preferences for increasing their knowledge level about keep-refer decision-making and detecting serious pathologies. Results: Data collection took place between May 2022 and June 2022. As of June 2022, a total of 479 Austrian physiotherapists completed the survey. Data analysis has started, and we aim to publish the results in 2023. Conclusions: The results of this survey will provide insights into the ability of Austrian physiotherapists to make accurate keep-refer decisions and to recognize the presence of serious pathologies using clinical vignettes. The results of this survey are expected to serve as a basis for future training in this area. International Registered Report Identifier (IRRID): DERR1-10.2196/43028 ", doi="10.2196/43028", url="https://www.researchprotocols.org/2023/1/e43028", url="http://www.ncbi.nlm.nih.gov/pubmed/36692940" } @Article{info:doi/10.2196/39051, author="Nie, Xin Jason and Heidebrecht, Christine and Zettler, Andrea and Pearce, Jacklyn and Cunha, Rafael and Quan, Sherman and Mansfield, Elizabeth and Tang, Terence", title="The Perceived Ease of Use and Perceived Usefulness of a Web-Based Interprofessional Communication and Collaboration Platform in the Hospital Setting: Interview Study With Health Care Providers", journal="JMIR Hum Factors", year="2023", month="Jan", day="23", volume="10", pages="e39051", keywords="health information technology", keywords="communication and collaboration", keywords="teamwork", keywords="design", keywords="technology acceptance model", keywords="qualitative method", keywords="communication", keywords="collaboration", keywords="hospital", keywords="care", keywords="team", keywords="professional", keywords="support", keywords="health information", keywords="technology", keywords="clinician", keywords="members", keywords="complex", keywords="lesson", keywords="education", abstract="Background: Hospitalized patients with complex care needs require an interprofessional team of health professionals working together to support their care in hospitals and during discharge planning. 
However, interprofessional communication and collaboration in inpatient settings are often fragmented and inefficient, leading to poor patient outcomes and provider frustration. Health information technology can potentially help improve team communication and collaboration; however, to date, evidence of its effectiveness is lacking. There are also concerns that current implementations might further fragment communication and increase the clinician burden without proven benefits. Objective: In this study, we aimed to generate transferrable lessons for future designers of health information technology tools that facilitate team communication and collaboration. Methods: A secondary analysis of the qualitative component of the mixed methods evaluation was performed. The electronic communication and collaboration platform was implemented in 2 general internal medicine wards in a large community teaching hospital in Mississauga, Ontario, Canada. Fifteen inpatient clinicians in those wards, including nurses, physicians, and allied health care providers, were recruited to participate in semistructured interviews about their experience with a co-designed electronic communication and collaboration tool. Data were analyzed using the Technology Acceptance Model, and themes related to the constructs of perceived ease of use (PEOU) and perceived usefulness (PU) were identified. Results: A secondary analysis guided by the Technology Acceptance Model highlighted important points. Intuitive design precluded training as a barrier to use, but lack of training may hinder participants' PEOU if features designed for efficiency are not discovered by users. Organized information was found to be useful for creating a comprehensive clinical picture of each patient and facilitating improved handovers. However, information needs to be both comprehensive and succinct, and information overload may negatively impact PEOU. The mixed paper and electronic practice environment also negatively impacted PEOU owing to unavoidable double documentation and the need for printing. Participants perceived the tool to be useful as it improved efficiency in information retrieval and documentation, improved the handover process, afforded another mode of communication when face-to-face communication was impractical, and improved shared awareness. The PU of this tool depends on its optimal use by all team members. Conclusions: Electronic tools can support communication and collaboration among interprofessional teams caring for patients with complex needs. There are transferable lessons learned that can improve the PU and PEOU of future systems. ", doi="10.2196/39051", url="https://humanfactors.jmir.org/2023/1/e39051", url="http://www.ncbi.nlm.nih.gov/pubmed/36689261" } @Article{info:doi/10.2196/39490, author="Dorr, David and D'Autremont, Chris and Richardson, E. Joshua and Bobo, Michelle and Terndrup, Christopher and Dunne, J. M. 
and Cheng, Anthony and Rope, Robert", title="Patient-Facing Clinical Decision Support for High Blood Pressure Control: Patient Survey", journal="JMIR Cardio", year="2023", month="Jan", day="23", volume="7", pages="e39490", keywords="high blood pressure", keywords="hypertension", keywords="clinical decision support", keywords="shared decision-making", keywords="blood pressure control", keywords="decision-making support", keywords="patient engagement", keywords="patient support tool", abstract="Background: High blood pressure (HBP) affects nearly half of adults in the United States and is a major factor in heart attacks, strokes, kidney disease, and other morbidities. To reduce risk, guidelines for HBP contain more than 70 recommendations, including many related to patient behaviors, such as home monitoring and lifestyle changes. Thus, the patient's role in controlling HBP is crucial. Patient-facing clinical decision support (CDS) tools may help patients adhere to evidence-based care, but customization is required. Objective: Our objective was to understand how to adapt CDS to best engage patients in controlling HBP. Methods: We conducted a mixed methods study with two phases: (1) survey-guided interviews with a limited cohort and (2) a nationwide web-based survey. Participation in each phase was limited to adults aged between 18 and 85 years who had been diagnosed with hypertension. The survey included general questions that assessed goal setting, treatment priorities, medication load, comorbid conditions, satisfaction with blood pressure (BP) management, and attitudes toward CDS, and also a series of questions regarding A/B preferences using paired information displays to assess perceived trustworthiness of potential CDS user interface options. Results: We conducted 17 survey-guided interviews to gather patient needs from CDS, then analyzed results and created a second survey of 519 adults with clinically diagnosed HBP. A large majority of participants reported that BP control was a high priority (83\%), had monitored BP at home (82\%), and felt comfortable using technology (88\%). Survey respondents found displays with more detailed recommendations more trustworthy (56\%-77\% of them preferred simpler displays), especially when incorporating social trust and priorities from providers and patients like them, but had no differences in action taken. Conclusions: Respondents to the survey felt that CDS capabilities could help them with HBP control. The more detailed design options for BP display and recommendations messaging were considered the most trustworthy yet did not differentiate perceived actions. ", doi="10.2196/39490", url="https://cardio.jmir.org/2023/1/e39490", url="http://www.ncbi.nlm.nih.gov/pubmed/36689260" } @Article{info:doi/10.2196/41820, author="Shu, Chang and Chen, Yueyue and Yang, Huiyuan and Tao, Ran and Chen, Xiaoping and Yu, Jingjing", title="Investigation and Countermeasures Research of Hospital Information Construction of Tertiary Class-A Public Hospitals in China: Questionnaire Study", journal="JMIR Form Res", year="2023", month="Jan", day="20", volume="7", pages="e41820", keywords="public hospital", keywords="hospital information construction", keywords="current situation", keywords="development", keywords="countermeasures", abstract="Background: Medical informatization has initially demonstrated its advantages in improving the medical service industry. 
Over the past decade, the Chinese government has made considerable effort to complete infrastructural information construction in the medical and health domain, and smart hospitals will be the next priority according to policies released by the Chinese government in recent years. Objective: To provide strategic support for further development of medical information construction in China, this study aimed to investigate the current situation of medical information construction in tertiary class-A public hospitals and analyze the existing problems and countermeasures. Methods: This study surveyed 23 tertiary class-A public hospitals in China that voluntarily responded to a self-designed questionnaire distributed in April 2020 to investigate the current medical information construction status. Descriptive statistics were used to summarize the current configurations of hospital information department, hospital information systems, hospital internet service and its application, and the satisfaction of hospital information construction. Interviews were also conducted with the respondents in this study for requirement analysis. Results: The results show that hospital information construction has become one of the priorities of the hospitals' daily work, and the medical information infrastructural construction and internet service application of the hospitals are good; however, a remarkable gap among the different levels of hospitals can be observed. Although most hospitals had built their own IT team to undertake information construction work, the actual utilization rate of big data collected and stored in the hospital information system was not satisfactory. Conclusions: Support for the construction of information technology in primary care institutions should be increased to balance the level of development of medical informatization in medical institutions at all levels. The training of complex talents with both IT and medical backgrounds should be emphasized, and specialized disease information standards should be developed to lay a solid data foundation for data utilization and improve the utilization of medical big data. ", doi="10.2196/41820", url="https://formative.jmir.org/2023/1/e41820", url="http://www.ncbi.nlm.nih.gov/pubmed/36662565" } @Article{info:doi/10.2196/43053, author="Jing, Xia and Min, Hua and Gong, Yang and Biondich, Paul and Robinson, David and Law, Timothy and Nohr, Christian and Faxvaag, Arild and Rennert, Lior and Hubig, Nina and Gimbel, Ronald", title="Ontologies Applied in Clinical Decision Support System Rules: Systematic Review", journal="JMIR Med Inform", year="2023", month="Jan", day="19", volume="11", pages="e43053", keywords="clinical decision support system rules", keywords="clinical decision support systems", keywords="interoperability", keywords="ontology", keywords="Semantic Web technology", abstract="Background: Clinical decision support systems (CDSSs) are important for the quality and safety of health care delivery. Although CDSS rules guide CDSS behavior, they are not routinely shared and reused. Objective: Ontologies have the potential to promote the reuse of CDSS rules. Therefore, we systematically screened the literature to elaborate on the current status of ontologies applied in CDSS rules, such as rule management, which uses captured CDSS rule usage data and user feedback data to tailor CDSS services to be more accurate, and maintenance, which updates CDSS rules. 
Through this systematic literature review, we aim to identify the frontiers of ontologies used in CDSS rules. Methods: The literature search was focused on the intersection of ontologies; clinical decision support; and rules in PubMed, the Association for Computing Machinery (ACM) Digital Library, and the Nursing \& Allied Health Database. Grounded theory and PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 guidelines were followed. One author initiated the screening and literature review, while 2 authors validated the processes and results independently. The inclusion and exclusion criteria were developed and refined iteratively. Results: CDSSs were primarily used to manage chronic conditions, alerts for medication prescriptions, reminders for immunizations and preventive services, diagnoses, and treatment recommendations among 81 included publications. The CDSS rules were presented in Semantic Web Rule Language, Jess, or Jena formats. Despite the fact that ontologies have been used to provide medical knowledge, CDSS rules, and terminologies, they have not been used in CDSS rule management or to facilitate the reuse of CDSS rules. Conclusions: Ontologies have been used to organize and represent medical knowledge, controlled vocabularies, and the content of CDSS rules. So far, there has been little reuse of CDSS rules. More work is needed to improve the reusability and interoperability of CDSS rules. This review identified and described the ontologies that, despite their limitations, enable Semantic Web technologies and their applications in CDSS rules. ", doi="10.2196/43053", url="https://medinform.jmir.org/2023/1/e43053", url="http://www.ncbi.nlm.nih.gov/pubmed/36534739" } @Article{info:doi/10.2196/38150, author="Xu, Ayu He and Maccari, Bernard and Guillain, Herv{\'e} and Herzen, Julien and Agri, Fabio and Raisaro, Louis Jean", title="An End-to-End Natural Language Processing Application for Prediction of Medical Case Coding Complexity: Algorithm Development and Validation", journal="JMIR Med Inform", year="2023", month="Jan", day="19", volume="11", pages="e38150", keywords="medical coding", keywords="natural language processing", keywords="NLP", keywords="complexity prediction", keywords="prediction", keywords="decision support", keywords="machine learning", keywords="model", keywords="clinical decision support application", keywords="multimodal modeling", keywords="coding", keywords="algorithm", keywords="documentation", keywords="health record", keywords="electronic health record", keywords="EHR", keywords="development", abstract="Background: Medical coding is the process that converts clinical documentation into standard medical codes. Codes are used for several key purposes in a hospital (eg, insurance reimbursement and performance analysis); therefore, their optimization is crucial. With the rapid growth of natural language processing technologies, several solutions based on artificial intelligence have been proposed to aid in medical coding by automatically suggesting relevant codes for clinical documents. However, their effectiveness is still limited to simple cases, and it is not yet clear how much value they can bring in improving coding efficiency and accuracy. Objective: This study aimed to bring more efficiency to the coding process to improve the selection of codes by medical coders. 
To achieve this, we developed an innovative multimodal machine learning--based solution that, instead of predicting codes, detects the degree of coding complexity before coding is performed. The notion of coding complexity was used to better dispatch work among medical coders to eventually minimize errors and improve throughput. Methods: To train and evaluate our approach, we collected 2060 cases rated by coders in terms of coding complexity from 1 (simplest) to 4 (most complex). We asked 2 expert coders to rate 3.01\% (62/2060) of the cases as the gold standard. The agreements between experts were used as benchmarks for model evaluation. A case contains both clinical text and patient metadata from the hospital electronic health record. We extracted both text features and metadata features, then concatenated and fed them into several machine learning models. Finally, we selected 2 models. The first used cross-validated training on 1751 cases and testing on 309 cases aiming to assess the predictive power of the proposed approach and its generalizability. The second model was trained on 1998 cases and tested on the gold standard to validate the best model performance against human benchmarks. Results: Our first model achieved a macro--F1-score of 0.51 and an accuracy of 0.59 on classifying the 4-scale complexity. The model distinguished well between the simple (combined complexity 1-2) and complex (combined complexity 3-4) cases with a macro--F1-score of 0.65 and an accuracy of 0.71. Our second model achieved 61\% agreement with experts' ratings and a macro--F1-score of 0.62 on the gold standard, whereas the 2 experts had a 66\% (41/62) agreement ratio with a macro--F1-score of 0.67. Conclusions: We propose a multimodal machine learning approach that leverages information from both clinical text and patient metadata to predict the complexity of coding a case in the precoding phase. By integrating this model into the hospital coding system, distribution of cases among coders can be done automatically with performance comparable with that of human expert coders, thus improving coding efficiency and accuracy at scale. ", doi="10.2196/38150", url="https://medinform.jmir.org/2023/1/e38150", url="http://www.ncbi.nlm.nih.gov/pubmed/36656627" } @Article{info:doi/10.2196/38399, author="Baroutsou, Vasiliki and Cerqueira Gonzalez Pena, Rodrigo and Schweighoffer, Reka and Caiata-Zufferey, Maria and Kim, Sue and Hesse-Biber, Sharlene and Ciorba, M. Florina and Lauer, Gerhard and Katapodi, Maria and ", title="Predicting Openness of Communication in Families With Hereditary Breast and Ovarian Cancer Syndrome: Natural Language Processing Analysis", journal="JMIR Form Res", year="2023", month="Jan", day="19", volume="7", pages="e38399", keywords="cascade testing", keywords="dictionary-based approach", keywords="family communication", keywords="hereditary breast and ovarian cancer", keywords="HBOC", keywords="sentiment analysis", keywords="text mining", keywords="natural language processing", keywords="cancer", keywords="hereditary", abstract="Background: In health care research, patient-reported opinions are a critical element of personalized medicine and contribute to optimal health care delivery. The importance of integrating natural language processing (NLP) methods to extract patient-reported opinions has been gradually acknowledged over the past years. 
One form of NLP is sentiment analysis, which extracts and analyses information by detecting feelings (thoughts, emotions, attitudes, etc) behind words. Sentiment analysis has become particularly popular following the rise of digital interactions. However, NLP and sentiment analysis in the context of intrafamilial communication for genetic cancer risk is still unexplored. Due to privacy laws, intrafamilial communication is the main avenue to inform at-risk relatives about the pathogenic variant and the possibility of increased cancer risk. Objective: The study examined the role of sentiment in predicting openness of intrafamilial communication about genetic cancer risk associated with hereditary breast and ovarian cancer (HBOC) syndrome. Methods: We used narratives derived from 53 in-depth interviews with individuals from families that harbor pathogenic variants associated with HBOC: first, to quantify openness of communication about cancer risk, and second, to examine the role of sentiment in predicting openness of communication. The interviews were conducted between 2019 and 2021 in Switzerland and South Korea using the same interview guide. We used NLP to extract and quantify textual features to construct a handcrafted lexicon about interpersonal communication of genetic testing results and cancer risk associated with HBOC. Moreover, we examined the role of sentiment in predicting openness of communication using a stepwise linear regression model. To test model accuracy, we used a split-validation set. We measured the performance of the training and testing model using area under the curve, sensitivity, specificity, and root mean square error. Results: Higher ``openness of communication'' scores were associated with higher overall net sentiment score of the narrative, higher fear, being single, having nonacademic education, and higher informational support within the family. Our results demonstrate that NLP was highly effective in analyzing unstructured texts from individuals of different cultural and linguistic backgrounds and could also reliably predict a measure of ``openness of communication'' (area under the curve=0.72) in the context of genetic cancer risk associated with HBOC. Conclusions: Our study showed that NLP can facilitate assessment of openness of communication in individuals carrying a pathogenic variant associated with HBOC. Findings provided promising evidence that various features from narratives such as sentiment and fear are important predictors of interpersonal communication and self-disclosure in this context. Our approach is promising and can be expanded in the field of personalized medicine and technology-mediated communication. ", doi="10.2196/38399", url="https://formative.jmir.org/2023/1/e38399", url="http://www.ncbi.nlm.nih.gov/pubmed/36656633" } @Article{info:doi/10.2196/40725, author="Petrovic, Bojana and Julian, A. Jim and Liddy, Clare and Afkham, Amir and McGee, F. Sharon and Morgan, C. Scott and Segal, Roanne and Sussman, Jonathan and Pond, R. Gregory and O'Brien, Ann Mary and Bender, L. 
Jacqueline and Grunfeld, Eva", title="Web-Based Asynchronous Tool to Facilitate Communication Between Primary Care Providers and Cancer Specialists: Pragmatic Randomized Controlled Trial", journal="J Med Internet Res", year="2023", month="Jan", day="18", volume="25", pages="e40725", keywords="electronic communication", keywords="coordination of care", keywords="cancer", keywords="primary care", abstract="Background: Cancer poses a significant global health burden. With advances in screening and treatment, there are now a growing number of cancer survivors with complex needs, requiring the involvement of multiple health care providers. Previous studies have identified problems related to communication and care coordination between primary care providers (PCPs) and cancer specialists. Objective: This study aimed to examine whether a web- and text-based asynchronous system (eOncoNote) could facilitate communication between PCPs and cancer specialists (oncologists and oncology nurses) to improve patient-reported continuity of care among patients receiving treatment or posttreatment survivorship care. Methods: In this pragmatic randomized controlled trial, a total of 173 patients were randomly assigned to either the intervention group (eOncoNote plus usual methods of communication between PCPs and cancer specialists) or a control group (usual communication only), including 104 (60.1\%) patients in the survivorship phase (breast and colorectal cancer) and 69 (39.9\%) patients in the treatment phase (breast and prostate cancer). The primary outcome was patient-reported team and cross-boundary continuity (Nijmegen Continuity Questionnaire). Secondary outcome measures included the Generalized Anxiety Disorder Screener (GAD-7), Patient Health Questionnaire on Major Depression, and Picker Patient Experience Questionnaire. Patients completed the questionnaires at baseline and at 2 points following randomization. Patients in the treatment phase completed follow-up questionnaires at 1 month and at either 4 months (patients with prostate cancer) or 6 months following randomization (patients with breast cancer). Patients in the survivorship phase completed follow-up questionnaires at 6 months and at 12 months following randomization. Results: The results did not show an intervention effect on the primary outcome of team and cross-boundary continuity of care or on the secondary outcomes of depression and patient experience with their health care. However, there was an intervention effect on anxiety. In the treatment phase, there was a statistically significant difference in the change score from baseline to the 1-month follow-up for GAD-7 (mean difference -2.3; P=.03). In the survivorship phase, there was a statistically significant difference in the change score for GAD-7 between baseline and the 6-month follow-up (mean difference -1.7; P=.03) and between baseline and the 12-month follow-up (mean difference -2.4; P=.004). Conclusions: PCPs' and cancer specialists' access to eOncoNote is not significantly associated with patient-reported continuity of care. However, PCPs' and cancer specialists' access to the eOncoNote intervention may be a factor in reducing patient anxiety.
Trial Registration: ClinicalTrials.gov NCT03333785; https://clinicaltrials.gov/ct2/show/NCT03333785 ", doi="10.2196/40725", url="https://www.jmir.org/2023/1/e40725", url="http://www.ncbi.nlm.nih.gov/pubmed/36652284" } @Article{info:doi/10.2196/42653, author="Richardson, Safiya and Dauber-Decker, Katherine and Solomon, Jeffrey and Khan, Sundas and Barnaby, Douglas and Chelico, John and Qiu, Michael and Liu, Yan and Mann, Devin and Pekmezaris, Renee and McGinn, Thomas and Diefenbach, Michael", title="Nudging Health Care Providers' Adoption of Clinical Decision Support: Protocol for the User-Centered Development of a Behavioral Economics--Inspired Electronic Health Record Tool", journal="JMIR Res Protoc", year="2023", month="Jan", day="18", volume="12", pages="e42653", keywords="health informatics", keywords="clinical decision support", keywords="electronic health record", keywords="implementation science", keywords="behavioral economics", keywords="user-centered design", keywords="pulmonary embolism", abstract="Background: The improvements in care resulting from clinical decision support (CDS) have been significantly limited by consistently low health care provider adoption. Health care provider attitudes toward CDS, specifically psychological and behavioral barriers, are not typically addressed during any stage of CDS development, although they represent an important barrier to adoption. Emerging evidence has shown the surprising power of using insights from the field of behavioral economics to address psychological and behavioral barriers. Nudges are formal applications of behavioral economics, defined as positive reinforcement and indirect suggestions that have a nonforced effect on decision-making. Objective: Our goal is to employ a user-centered design process to develop a CDS tool---the pulmonary embolism (PE) risk calculator---for PE risk stratification in the emergency department that incorporates a behavior theory--informed nudge to address identified behavioral barriers to use. Methods: All study activities took place at a large academic health system in the New York City metropolitan area. Our study used a user-centered and behavior theory--based approach to achieve the following two aims: (1) use mixed methods to identify health care provider barriers to the use of an active CDS tool for PE risk stratification and (2) develop a new CDS tool---the PE risk calculator---that addresses behavioral barriers to health care providers' adoption of CDS by incorporating nudges into the user interface. These aims were guided by the revised Observational Research Behavioral Information Technology model. A total of 50 clinicians who used the original version of the tool were surveyed with a quantitative instrument that we developed based on a behavior theory framework---the Capability-Opportunity-Motivation-Behavior framework. A semistructured interview guide was developed based on the survey responses. Inductive methods were used to analyze interview session notes and audio recordings from 12 interviews. Revised versions of the tool were developed that incorporated nudges. Results: Functional prototypes were developed by using Axure PRO (Axure Software Solutions) software and usability tested with end users in an iterative agile process (n=10). The tool was redesigned to address 4 identified major barriers to tool use; we included 2 nudges and a default. The 6-month pilot trial for the tool was launched on October 1, 2021. 
Conclusions: Clinicians highlighted several important psychological and behavioral barriers to CDS use. Addressing these barriers, along with conducting traditional usability testing, facilitated the development of a tool with greater potential to transform clinical care. The tool will be tested in a prospective pilot trial. International Registered Report Identifier (IRRID): DERR1-10.2196/42653 ", doi="10.2196/42653", url="https://www.researchprotocols.org/2023/1/e42653", url="http://www.ncbi.nlm.nih.gov/pubmed/36652293" } @Article{info:doi/10.2196/39231, author="Rao, Suchitra and Bozio, Catherine and Butterfield, Kristen and Reynolds, Sue and Reese, E. Sarah and Ball, Sarah and Steffens, Andrea and Demarco, Maria and McEvoy, Charlene and Thompson, Mark and Rowley, Elizabeth and Porter, M. Rachael and Fink, V. Rebecca and Irving, A. Stephanie and Naleway, Allison", title="Accuracy of COVID-19--Like Illness Diagnoses in Electronic Health Record Data: Retrospective Cohort Study", journal="JMIR Form Res", year="2023", month="Jan", day="17", volume="7", pages="e39231", keywords="COVID-19", keywords="COVID-like illness", keywords="COVID-19 case definition", keywords="sensitivity", keywords="specificity", keywords="positive predictive value", keywords="negative predictive value", abstract="Background: Electronic health record (EHR) data provide a unique opportunity to study the epidemiology of COVID-19, clinical outcomes of the infection, comparative effectiveness of therapies, and vaccine effectiveness but require a well-defined computable phenotype of COVID-19--like illness (CLI). Objective: The objective of this study was to evaluate the performance of pathogen-specific and other acute respiratory illness (ARI) International Statistical Classification of Diseases-9 and -10 codes in identifying COVID-19 cases in emergency department (ED) or urgent care (UC) and inpatient settings. Methods: We conducted a retrospective observational cohort study using EHR, claims, and laboratory information system data of ED or UC and inpatient encounters from 4 health systems in the United States. Patients who were aged ≥18 years, had an ED or UC or inpatient encounter for an ARI, and underwent a SARS-CoV-2 polymerase chain reaction test between March 1, 2020, and March 31, 2021, were included. We evaluated various CLI definitions using combinations of International Statistical Classification of Diseases-10 codes as follows: COVID-19--specific codes; CLI definition used in VISION network studies; ARI signs, symptoms, and diagnosis codes only; signs and symptoms of ARI only; and random forest model definitions. We evaluated the sensitivity, specificity, positive predictive value, and negative predictive value of each CLI definition using a positive SARS-CoV-2 polymerase chain reaction test as the reference standard. We evaluated the performance of each CLI definition for distinct hospitalization and ED or UC cohorts. Results: Among 90,952 hospitalizations and 137,067 ED or UC visits, 5627 (6.19\%) and 9866 (7.20\%) were positive for SARS-CoV-2, respectively. COVID-19--specific codes had high sensitivity (91.6\%) and specificity (99.6\%) in identifying patients with SARS-CoV-2 positivity among hospitalized patients. The VISION CLI definition maintained high sensitivity (95.8\%) but lowered specificity (45.5\%).
By contrast, signs and symptoms of ARI had low sensitivity and positive predictive value (28.9\% and 11.8\%, respectively) but higher specificity and negative predictive value (85.3\% and 94.7\%, respectively). ARI diagnoses, signs, and symptoms alone had low predictive performance. All CLI definitions had lower sensitivity for ED or UC encounters. Random forest approaches identified distinct CLI definitions with high performance for hospital encounters and moderate performance for ED or UC encounters. Conclusions: COVID-19--specific codes have high sensitivity and specificity in identifying adults with positive SARS-CoV-2 test results. Separate combinations of COVID-19-specific codes and ARI codes enhance the utility of CLI definitions in studies using EHR data in hospital and ED or UC settings. ", doi="10.2196/39231", url="https://formative.jmir.org/2023/1/e39231", url="http://www.ncbi.nlm.nih.gov/pubmed/36383633" } @Article{info:doi/10.2196/39044, author="Ferr{\'e}, Fabrice and Laurent, Rodolphe and Furelau, Philippine and Doumard, Emmanuel and Ferrier, Anne and Bosch, Laetitia and Ba, Cyndie and Menut, R{\'e}mi and Kurrek, Matt and Geeraerts, Thomas and Piau, Antoine and Minville, Vincent", title="Perioperative Risk Assessment of Patients Using the MyRISK Digital Score Completed Before the Preanesthetic Consultation: Prospective Observational Study", journal="JMIR Perioper Med", year="2023", month="Jan", day="16", volume="6", pages="e39044", keywords="chatbot", keywords="digital health", keywords="preanesthetic consultation", keywords="perioperative risk", keywords="machine learning", keywords="mobile phone", abstract="Background: The ongoing COVID-19 pandemic has highlighted the potential of digital health solutions to adapt the organization of care in a crisis context. Objective: Our aim was to describe the relationship between the MyRISK score, derived from self-reported data collected by a chatbot before the preanesthetic consultation, and the occurrence of postoperative complications. Methods: This was a single-center prospective observational study that included 401 patients. The 16 items composing the MyRISK score were selected using the Delphi method. An algorithm was used to stratify patients with low (green), intermediate (orange), and high (red) risk. The primary end point concerned postoperative complications occurring in the first 6 months after surgery (composite criterion), collected by telephone and by consulting the electronic medical database. A logistic regression analysis was carried out to identify the explanatory variables associated with the complications. A machine learning model was trained to predict the MyRISK score using a larger data set of 1823 patients classified as green or red to reclassify individuals classified as orange as either modified green or modified red. User satisfaction and usability were assessed. Results: Of the 389 patients analyzed for the primary end point, 16 (4.1\%) experienced a postoperative complication. A red score was independently associated with postoperative complications (odds ratio 5.9, 95\% CI 1.5-22.3; P=.009). A modified red score was strongly correlated with postoperative complications (odds ratio 21.8, 95\% CI 2.8-171.5; P=.003) and predicted postoperative complications with high sensitivity (94\%) and high negative predictive value (99\%) but with low specificity (49\%) and very low positive predictive value (7\%; area under the receiver operating characteristic curve=0.71). 
Patient satisfaction numeric rating scale and system usability scale median scores were 8.0 (IQR 7.0-9.0) out of 10 and 90.0 (IQR 82.5-95.0) out of 100, respectively. Conclusions: The MyRISK digital perioperative risk score established before the preanesthetic consultation was independently associated with the occurrence of postoperative complications. Its negative predictive strength was increased using a machine learning model to reclassify patients identified as being at intermediate risk. This reliable numerical categorization could be used to objectively refer patients with low risk to teleconsultation. ", doi="10.2196/39044", url="https://periop.jmir.org/2023/1/e39044", url="http://www.ncbi.nlm.nih.gov/pubmed/36645704" } @Article{info:doi/10.2196/41043, author="Heo, Junyeong and Kang, Youjin and Lee, SangKeun and Jeong, Dong-Hwa and Kim, Kang-Min", title="An Accurate Deep Learning--Based System for Automatic Pill Identification: Model Development and Validation", journal="J Med Internet Res", year="2023", month="Jan", day="13", volume="25", pages="e41043", keywords="pill identification", keywords="pill retrieval", keywords="pill recognition", keywords="automatic pill search", keywords="deep learning", keywords="machine learning", keywords="character-level language model", abstract="Background: Medication errors account for a large proportion of all medical errors. In most homes, patients take a variety of medications for a long period. However, medication errors frequently occur because patients often throw away the containers of their medications. Objective: We proposed a deep learning--based system for reducing medication errors by accurately identifying prescription pills. Given the pill images, our system located the pills in the respective pill databases in South Korea and the United States. Methods: We organized the system into a pill recognition step and pill retrieval step, and we applied deep learning models to train not only images of the pill but also imprinted characters. In the pill recognition step, there are 3 modules that recognize the 3 features of pills and their imprints separately and correct the recognized imprint to fit the actual data. We adopted image classification and text detection models for the feature and imprint recognition modules, respectively. In the imprint correction module, we introduced a language model for the first time in the pill identification system and proposed a novel coordinate encoding technique for effective correction in the language model. We identified pills using similarity scores of pill characteristics with those in the database. Results: We collected the open pill database from South Korea and the United States in May 2022. We used a total of 24,404 pill images in our experiments. The experimental results show that the predicted top-1 candidates achieve accuracy levels of 85.6\% (South Korea) and 74.5\% (United States) for the types of pills not trained on 2 different databases (South Korea and the United States). Furthermore, the predicted top-1 candidate accuracy of our system was 78\% with consumer-granted images, which was achieved by training only 1 image per pill. The results demonstrate that our system could identify and retrieve new pills without additional model updates. Finally, we confirmed through an ablation study that the language model that we emphasized significantly improves the pill identification ability of the system. 
Conclusions: Our study proposes the possibility of reducing medical errors by showing that the introduction of artificial intelligence can identify numerous pills with high precision in real time. Our study suggests that the proposed system can reduce patients' misuse of medications and help medical staff focus on higher-level tasks by simplifying time-consuming lower-level tasks such as pill identification. ", doi="10.2196/41043", url="https://www.jmir.org/2023/1/e41043", url="http://www.ncbi.nlm.nih.gov/pubmed/36637893" } @Article{info:doi/10.2196/40179, author="Suh, Bogyeong and Yu, Heejin and Kim, Hyeyeon and Lee, Sanghwa and Kong, Sunghye and Kim, Jin-Woo and Choi, Jongeun", title="Interpretable Deep-Learning Approaches for Osteoporosis Risk Screening and Individualized Feature Analysis Using Large Population-Based Data: Model Development and Performance Evaluation", journal="J Med Internet Res", year="2023", month="Jan", day="13", volume="25", pages="e40179", keywords="osteoporosis", keywords="artificial intelligence", keywords="deep learning", keywords="machine learning", keywords="risk factors", keywords="screening", abstract="Background: Osteoporosis is one of the diseases that requires early screening and detection for its management. Common clinical tools and machine-learning (ML) models for screening osteoporosis have been developed, but they show limitations such as low accuracy. Moreover, these methods are confined to limited risk factors and lack individualized explanation. Objective: The aim of this study was to develop an interpretable deep-learning (DL) model for osteoporosis risk screening with clinical features. Clinical interpretation with individual explanations of feature contributions is provided using an explainable artificial intelligence (XAI) technique. Methods: We used two separate data sets: the National Health and Nutrition Examination Survey data sets from the United States (NHANES) and South Korea (KNHANES) with 8274 and 8680 respondents, respectively. The study population was classified according to the T-score of bone mineral density at the femoral neck or total femur. A DL model for osteoporosis diagnosis was trained on the data sets and significant risk factors were investigated with local interpretable model-agnostic explanations (LIME). The performance of the DL model was compared with that of ML models and conventional clinical tools. Additionally, contribution ranking of risk factors and individualized explanation of feature contribution were examined. Results: Our DL model showed area under the curve (AUC) values of 0.851 (95\% CI 0.844-0.858) and 0.922 (95\% CI 0.916-0.928) for the femoral neck and total femur bone mineral density, respectively, using the NHANES data set. The corresponding AUC values for the KNHANES data set were 0.827 (95\% CI 0.821-0.833) and 0.912 (95\% CI 0.898-0.927), respectively. Through the LIME method, significant features were induced, and each feature's integrated contribution and interpretation for individual risk were determined. Conclusions: The developed DL model significantly outperforms conventional ML models and clinical tools. Our XAI model produces high-ranked features along with the integrated contributions of each feature, which facilitates the interpretation of individual risk. In summary, our interpretable model for osteoporosis risk screening outperformed state-of-the-art methods. 
", doi="10.2196/40179", url="https://www.jmir.org/2023/1/e40179", url="http://www.ncbi.nlm.nih.gov/pubmed/36482780" } @Article{info:doi/10.2196/39742, author="Hogg, Jeffry Henry David and Al-Zubaidy, Mohaimen and and Talks, James and Denniston, K. Alastair and Kelly, J. Christopher and Malawana, Johann and Papoutsi, Chrysanthi and Teare, Dawn Marion and Keane, A. Pearse and Beyer, R. Fiona and Maniatopoulos, Gregory", title="Stakeholder Perspectives of Clinical Artificial Intelligence Implementation: Systematic Review of Qualitative Evidence", journal="J Med Internet Res", year="2023", month="Jan", day="10", volume="25", pages="e39742", keywords="artificial intelligence", keywords="systematic review", keywords="qualitative research", keywords="computerized decision support", keywords="qualitative evidence synthesis", keywords="implementation", abstract="Background: The rhetoric surrounding clinical artificial intelligence (AI) often exaggerates its effect on real-world care. Limited understanding of the factors that influence its implementation can perpetuate this. Objective: In this qualitative systematic review, we aimed to identify key stakeholders, consolidate their perspectives on clinical AI implementation, and characterize the evidence gaps that future qualitative research should target. Methods: Ovid-MEDLINE, EBSCO-CINAHL, ACM Digital Library, Science Citation Index-Web of Science, and Scopus were searched for primary qualitative studies on individuals' perspectives on any application of clinical AI worldwide (January 2014-April 2021). The definition of clinical AI includes both rule-based and machine learning--enabled or non--rule-based decision support tools. The language of the reports was not an exclusion criterion. Two independent reviewers performed title, abstract, and full-text screening with a third arbiter of disagreement. Two reviewers assigned the Joanna Briggs Institute 10-point checklist for qualitative research scores for each study. A single reviewer extracted free-text data relevant to clinical AI implementation, noting the stakeholders contributing to each excerpt. The best-fit framework synthesis used the Nonadoption, Abandonment, Scale-up, Spread, and Sustainability (NASSS) framework. To validate the data and improve accessibility, coauthors representing each emergent stakeholder group codeveloped summaries of the factors most relevant to their respective groups. Results: The initial search yielded 4437 deduplicated articles, with 111 (2.5\%) eligible for inclusion (median Joanna Briggs Institute 10-point checklist for qualitative research score, 8/10). Five distinct stakeholder groups emerged from the data: health care professionals (HCPs), patients, carers and other members of the public, developers, health care managers and leaders, and regulators or policy makers, contributing 1204 (70\%), 196 (11.4\%), 133 (7.7\%), 129 (7.5\%), and 59 (3.4\%) of 1721 eligible excerpts, respectively. All stakeholder groups independently identified a breadth of implementation factors, with each producing data that were mapped between 17 and 24 of the 27 adapted Nonadoption, Abandonment, Scale-up, Spread, and Sustainability subdomains. Most of the factors that stakeholders found influential in the implementation of rule-based clinical AI also applied to non--rule-based clinical AI, with the exception of intellectual property, regulation, and sociocultural attitudes. 
Conclusions: Clinical AI implementation is influenced by many interdependent factors, which are in turn influenced by at least 5 distinct stakeholder groups. This implies that effective research and practice of clinical AI implementation should consider multiple stakeholder perspectives. The current underrepresentation of perspectives from stakeholders other than HCPs in the literature may limit the anticipation and management of the factors that influence successful clinical AI implementation. Future research should not only widen the representation of tools and contexts in qualitative research but also specifically investigate the perspectives of all stakeholder HCPs and emerging aspects of non--rule-based clinical AI implementation. Trial Registration: PROSPERO (International Prospective Register of Systematic Reviews) CRD42021256005; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=256005 International Registered Report Identifier (IRRID): RR2-10.2196/33145 ", doi="10.2196/39742", url="https://www.jmir.org/2023/1/e39742", url="http://www.ncbi.nlm.nih.gov/pubmed/36626192" } @Article{info:doi/10.2196/39965, author="Yu, Tianzhi and Jin, Chunjie and Wu, Xiaodan and Yue, Dianmin", title="Implementation of Shared Decision-Making Within Internet Hospitals in China Based on Patients' Needs: Feasibility Study and Content Analysis", journal="JMIR Form Res", year="2023", month="Jan", day="6", volume="7", pages="e39965", keywords="internet hospital", keywords="shared decision-making between doctors and patients", keywords="patient needs", keywords="feasibility", abstract="Background: Internet hospitals are developing rapidly in China, and their convenient and efficient medical services are being increasingly recognized by patients. Many hospitals have set up their own internet hospitals to provide web-based medical services. Tianjin Medical University General Hospital has established a multidisciplinary and comprehensive internet hospital to provide diversified medical services according to the needs of patients. A way to further improve web-based medical services is by examining how shared decision-making (SDM) can be carried out in internet hospital diagnosis and treatment services, thereby improving patients' medical experience. Objective: The aim of this study was to analyze the feasibility of implementing doctor-patient SDM in internet hospital diagnosis and treatment services based on patients' needs in China. Methods: In this study, the medical data of 10 representative departments in the internet hospital of Tianjin Medical University General Hospital from January 1 to January 31, 2022, were extracted as a whole; 25,266 cases were selected. After excluding 2056 cases with incomplete information, 23,210 cases were finally included in this study. A chi-square test was performed to analyze the characteristics and medical service needs of internet hospital patients in order to identify the strengths of SDM in internet hospitals. Results: The internet hospital patients from 10 clinical departments were significantly different in terms of gender ($\chi^2_9$=3425.6; P<.001), age ($\chi^2_{36}$=27,375.8; P<.001), mode of payment ($\chi^2_9$=3501.1; P<.001), geographic distribution ($\chi^2_9$=347.2; P<.001), and duration of illness ($\chi^2_{36}$=2863.3; P<.001).
Patient medical needs included drug prescriptions, examination prescriptions, medical record explanations, drug use instructions, prehospitalization preparations, further consultations with doctors (unspecified purpose), treatment plan consultations, initial diagnoses based on symptoms, and follow-up consultations after discharge. The medical needs of the patients in different clinical departments were significantly different ($\chi^2_{72}$=8465.5; P<.001). Conclusions: Our study provides a practical and theoretical basis for the feasibility of doctor-patient SDM in internet hospitals and offers some implementation strategies. We focus on the application of SDM in web-based diagnosis and treatment in internet hospitals rather than on a specific disease or disease management software. The medical service needs of different patient groups can be effectively obtained from an internet hospital, which provides the practical conditions for the promotion of doctor-patient SDM. Our findings show that the internet hospital platform expands the scope of SDM and is a new way for the large-scale application of doctor-patient SDM. ", doi="10.2196/39965", url="https://formative.jmir.org/2023/1/e39965", url="http://www.ncbi.nlm.nih.gov/pubmed/36607710" } @Article{info:doi/10.2196/41640, author="Guo, Lanwei and Meng, Qingcheng and Zheng, Liyang and Chen, Qiong and Liu, Yin and Xu, Huifang and Kang, Ruihua and Zhang, Luyao and Liu, Shuzheng and Sun, Xibin and Zhang, Shaokai", title="Lung Cancer Risk Prediction Nomogram in Nonsmoking Chinese Women: Retrospective Cross-sectional Cohort Study", journal="JMIR Public Health Surveill", year="2023", month="Jan", day="6", volume="9", pages="e41640", keywords="lung cancer", keywords="risk model", keywords="forecasting", keywords="validation", keywords="female", keywords="nonsmokers", abstract="Background: Smoking is believed not to be the cause of approximately 53\% of lung cancers diagnosed in women globally. Objective: The study aimed to develop and validate a simple and noninvasive model that could assess and stratify lung cancer risk in nonsmoking Chinese women. Methods: Based on the population-based Cancer Screening Program in Urban China, this retrospective, cross-sectional cohort study was carried out with a large population base and a large number of participants. The training set and the validation set were both constructed using a random distribution of the data. Following the identification of associated risk factors by multivariable Cox regression analysis, a predictive nomogram was developed. Discrimination (area under the curve) and calibration were further performed to assess the validity of the risk prediction nomogram in the training set, which was then validated in the validation set. Results: In total, 151,834 individuals signed up to take part in the survey. Both the training set (n=75,917) and the validation set (n=75,917) consisted of randomly selected participants. Potential predictors for lung cancer included age, history of chronic respiratory disease, first-degree family history of lung cancer, menopause, and history of benign breast disease. We displayed 1-year, 3-year, and 5-year lung cancer risk--predicting nomograms using these 5 factors. In the training set, the 1-year, 3-year, and 5-year lung cancer risk areas under the curve were 0.762, 0.718, and 0.703, respectively. In the validation set, the model showed moderate predictive discrimination.
Conclusions: We designed and validated a simple and noninvasive lung cancer risk model for nonsmoking women. This model can be applied to identify and triage people at high risk for developing lung cancers among nonsmoking women. ", doi="10.2196/41640", url="https://publichealth.jmir.org/2023/1/e41640", url="http://www.ncbi.nlm.nih.gov/pubmed/36607729" } @Article{info:doi/10.2196/41142, author="Luo, Xiao-Qin and Kang, Yi-Xin and Duan, Shao-Bin and Yan, Ping and Song, Guo-Bao and Zhang, Ning-Ya and Yang, Shi-Kun and Li, Jing-Xin and Zhang, Hui", title="Machine Learning--Based Prediction of Acute Kidney Injury Following Pediatric Cardiac Surgery: Model Development and Validation Study", journal="J Med Internet Res", year="2023", month="Jan", day="5", volume="25", pages="e41142", keywords="cardiac surgery", keywords="acute kidney injury", keywords="pediatric", keywords="machine learning", abstract="Background: Cardiac surgery--associated acute kidney injury (CSA-AKI) is a major complication following pediatric cardiac surgery, which is associated with increased morbidity and mortality. The early prediction of CSA-AKI before and immediately after surgery could significantly improve the implementation of preventive and therapeutic strategies during the perioperative periods. However, there is limited clinical information on how to identify pediatric patients at high risk of CSA-AKI. Objective: The study aims to develop and validate machine learning models to predict the development of CSA-AKI in the pediatric population. Methods: This retrospective cohort study enrolled patients aged 1 month to 18 years who underwent cardiac surgery with cardiopulmonary bypass at 3 medical centers of Central South University in China. CSA-AKI was defined according to the 2012 Kidney Disease: Improving Global Outcomes criteria. Feature selection was applied separately to 2 data sets: the preoperative data set and the combined preoperative and intraoperative data set. Multiple machine learning algorithms were tested, including K-nearest neighbor, naive Bayes, support vector machines, random forest, extreme gradient boosting (XGBoost), and neural networks. The best performing model was identified in cross-validation by using the area under the receiver operating characteristic curve (AUROC). Model interpretations were generated using the Shapley additive explanations (SHAP) method. Results: A total of 3278 patients from one of the centers were used for model derivation, while 585 patients from another 2 centers served as the external validation cohort. CSA-AKI occurred in 564 (17.2\%) patients in the derivation cohort and 51 (8.7\%) patients in the external validation cohort. Among the considered machine learning models, the XGBoost models achieved the best predictive performance in cross-validation. The AUROC of the XGBoost model using only the preoperative variables was 0.890 (95\% CI 0.876-0.906) in the derivation cohort and 0.857 (95\% CI 0.800-0.903) in the external validation cohort. When the intraoperative variables were included, the AUROC increased to 0.912 (95\% CI 0.899-0.924) and 0.889 (95\% CI 0.844-0.920) in the 2 cohorts, respectively. The SHAP method revealed that baseline serum creatinine level, perfusion time, body length, operation time, and intraoperative blood loss were the top 5 predictors of CSA-AKI. 
Conclusions: The interpretable XGBoost models provide practical tools for the early prediction of CSA-AKI, which are valuable for risk stratification and perioperative management of pediatric patients undergoing cardiac surgery. ", doi="10.2196/41142", url="https://www.jmir.org/2023/1/e41142", url="http://www.ncbi.nlm.nih.gov/pubmed/36603200" } @Article{info:doi/10.2196/39114, author="van der Meijden, L. Siri and de Hond, H. Anne A. and Thoral, J. Patrick and Steyerberg, W. Ewout and Kant, J. Ilse M. and Cin{\`a}, Giovanni and Arbous, Sesmu M.", title="Intensive Care Unit Physicians' Perspectives on Artificial Intelligence--Based Clinical Decision Support Tools: Preimplementation Survey Study", journal="JMIR Hum Factors", year="2023", month="Jan", day="5", volume="10", pages="e39114", keywords="intensive care unit", keywords="hospital", keywords="discharge", keywords="artificial intelligence", keywords="AI", keywords="clinical decision support", keywords="clinical support", keywords="acceptance", keywords="decision support", keywords="decision-making", keywords="digital health", keywords="eHealth", keywords="survey", keywords="perspective", keywords="attitude", keywords="opinion", keywords="adoption", keywords="prediction", keywords="risk", abstract="Background: Artificial intelligence--based clinical decision support (AI-CDS) tools have great potential to benefit intensive care unit (ICU) patients and physicians. There is a gap between the development and implementation of these tools. Objective: We aimed to investigate physicians' perspectives and their current decision-making behavior before implementing a discharge AI-CDS tool for predicting readmission and mortality risk after ICU discharge. Methods: We conducted a survey of physicians involved in decision-making on discharge of patients at two Dutch academic ICUs between July and November 2021. Questions were divided into four domains: (1) physicians' current decision-making behavior with respect to discharging ICU patients, (2) perspectives on the use of AI-CDS tools in general, (3) willingness to incorporate a discharge AI-CDS tool into daily clinical practice, and (4) preferences for using a discharge AI-CDS tool in daily workflows. Results: Most of the 64 respondents (of 93 contacted, 69\%) were familiar with AI (62/64, 97\%) and had positive expectations of AI, with 55 of 64 (86\%) believing that AI could support them in their work as a physician. The respondents disagreed on whether the decision to discharge a patient was complex (23/64, 36\% agreed and 22/64, 34\% disagreed); nonetheless, most (59/64, 92\%) agreed that a discharge AI-CDS tool could be of value. Significant differences were observed between physicians from the 2 academic sites, which may be related to different levels of involvement in the development of the discharge AI-CDS tool. Conclusions: ICU physicians showed a favorable attitude toward the integration of AI-CDS tools into the ICU setting in general, and in particular toward a tool to predict a patient's risk of readmission and mortality within 7 days after discharge. The findings of this questionnaire will be used to improve the implementation process and training of end users. 
", doi="10.2196/39114", url="https://humanfactors.jmir.org/2023/1/e39114", url="http://www.ncbi.nlm.nih.gov/pubmed/36602843" } @Article{info:doi/10.2196/40976, author="Baltaxe, Erik and Hsieh, Wen Hsin and Roca, Josep and Cano, Isaac", title="The Assessment of Medical Device Software Supporting Health Care Services for Chronic Patients in a Tertiary Hospital: Overarching Study", journal="J Med Internet Res", year="2023", month="Jan", day="4", volume="25", pages="e40976", keywords="chronic patients", keywords="digital health", keywords="health technology assessment", keywords="implementation research", keywords="integrated care", abstract="Background: Innovative digital health tools are increasingly being evaluated and, in some instances, integrated at scale into health systems. However, the applicability of assessment methodologies in real-life scenarios to demonstrate value generation and consequently foster sustainable adoption of digitally enabled health interventions has some bottlenecks. Objective: We aimed to build on the process of premarket assessment of 4 digital health interventions piloted at the Hospital Clinic de Barcelona (HCB), as well as on the analysis of current medical device software regulations and postmarket surveillance in the European Union and United States in order to generate recommendations and lessons learnt for the sustainable adoption of digitally enabled health interventions. Methods: Four digital health interventions involving prototypes were piloted at the HCB (studies 1-4). Cocreation and quality improvement methodologies were used to consolidate a pragmatic evaluation method to assess the perceived usability and satisfaction of end users (both patients and health care professionals) by means of the System Usability Scale and the Net Promoter Score, including general questions about satisfaction. Analyses of both medical software device regulations and postmarket surveillance in the European Union and United States (2017-2021) were performed. Finally, an overarching analysis on lessons learnt was conducted considering 4 domains (technical, clinical, usability, and cost), as well as differentiating among 3 different eHealth strategies (telehealth, integrated care, and digital therapeutics). Results: Among the participant stakeholders, the System Usability Scale score was consistently higher in patients (studies 1, 2, 3, and 4: 78, 67, 56, and 76, respectively) than in health professionals (studies 2, 3, and 4: 52, 43, and 54, respectively). In general, use of the supporting digital health tools was recommended more by patients (studies 1, 2, 3, and 4: Net Promoter Scores of ?3\%, 31\%, ?21\%, and 31\%, respectively) than by professionals (studies 2, 3, and 4: Net Promoter Scores of ?67\%, 1\%, and ?80\%, respectively). The overarching analysis resulted in pragmatic recommendations for the digital health evaluation domains and the eHealth strategies considered. Conclusions: Lessons learnt on the digitalization of health resulted in practical recommendations that could contribute to future deployment experiences. 
", doi="10.2196/40976", url="https://www.jmir.org/2023/1/e40976", url="http://www.ncbi.nlm.nih.gov/pubmed/36598817" } @Article{info:doi/10.2196/37805, author="Tamang, Suzanne and Humbert-Droz, Marie and Gianfrancesco, Milena and Izadi, Zara and Schmajuk, Gabriela and Yazdany, Jinoos", title="Practical Considerations for Developing Clinical Natural Language Processing Systems for Population Health Management and Measurement", journal="JMIR Med Inform", year="2023", month="Jan", day="3", volume="11", pages="e37805", keywords="clinical natural language processing", keywords="electronic health records", keywords="population health science", keywords="clinical decision support", keywords="information extraction", doi="10.2196/37805", url="https://medinform.jmir.org/2023/1/e37805", url="http://www.ncbi.nlm.nih.gov/pubmed/36595345" } @Article{info:doi/10.2196/37144, author="Maksimenko, Jelena and Rodrigues, Pereira Pedro and Nakazawa-Mikla{\vs}evi{\v c}a, Miki and Pinto, David and Mikla{\vs}evi{\v c}s, Edvins and Trofimovi{\v c}s, Genadijs and Gardovskis, J?nis and Cardoso, Fatima and Cardoso, Jo{\~a}o Maria", title="Effectiveness of Secondary Risk--Reducing Strategies in Patients With Unilateral Breast Cancer With Pathogenic Variants of BRCA1 and BRCA2 Subjected to Breast-Conserving Surgery: Evidence-Based Simulation Study", journal="JMIR Form Res", year="2022", month="Dec", day="29", volume="6", number="12", pages="e37144", keywords="BRCA1 and BRCA2", keywords="secondary prophylactic strategies", keywords="breast-conserving therapy", keywords="breast cancer", abstract="Background: Approximately 62\% of patients with breast cancer with a pathogenic variant (BRCA1 or BRCA2) undergo primary breast-conserving therapy. Objective: The study aims to develop a personalized risk management decision support tool for carriers of a pathogenic variant (BRCA1 or BRCA2) who underwent breast-conserving therapy for unilateral early-stage breast cancer. Methods: We developed a Bayesian network model of a hypothetical cohort of carriers of BRCA1 or BRCA2 diagnosed with stage I/II unilateral breast cancer and treated with breast-conserving treatment who underwent subsequent second primary cancer risk--reducing strategies. Using event dependencies structured according to expert knowledge and conditional probabilities obtained from published evidence, we predicted the 40-year overall survival rate of different risk-reducing strategies for 144 cohorts of women defined by the type of pathogenic variants (BRCA1 or BRCA2), age at primary breast cancer diagnosis, breast cancer subtype, stage of primary breast cancer, and presence or absence of adjuvant chemotherapy. Results: Absence of adjuvant chemotherapy was the most powerful factor that was linked to a dramatic decline in survival. There was a negligible decline in the mortality in patients with triple-negative breast cancer, who received no chemotherapy and underwent any secondary risk--reducing strategy, compared with surveillance. The potential survival benefit from any risk-reducing strategy was more modest in patients with triple-negative breast cancer who received chemotherapy compared with patients with luminal breast cancer. However, most patients with triple-negative breast cancer in stage I benefited from bilateral risk-reducing mastectomy and risk-reducing salpingo-oophorectomy or just risk-reducing salpingo-oophorectomy. 
Most patients with luminal stage I/II unilateral breast cancer benefited from bilateral risk-reducing mastectomy and risk-reducing salpingo-oophorectomy. The impact of risk-reducing salpingo-oophorectomy in patients with luminal breast cancer in stage I/II increased with age. Most older patients with the BRCA1 and BRCA2 pathogenic variants in exons 12-24/25 with luminal breast cancer may gain a similar survival benefit from other risk-reducing strategies or surveillance. Conclusions: Our study showed that it is mandatory to consider the complex interplay between the types of BRCA1 and BRCA2 pathogenic variants, age at primary breast cancer diagnosis, breast cancer subtype and stage, and received systemic treatment. As no prospective study results are available at the moment, our simulation model, which will integrate a decision support system in the near future, could facilitate the conversation between the health care provider and patient and help to weigh all the options for risk-reducing strategies leading to a more balanced decision. ", doi="10.2196/37144", url="https://formative.jmir.org/2022/12/e37144", url="http://www.ncbi.nlm.nih.gov/pubmed/36580360" } @Article{info:doi/10.2196/38859, author="Wang, Siyang and {\vS}uster, Simon and Baldwin, Timothy and Verspoor, Karin", title="Predicting Publication of Clinical Trials Using Structured and Unstructured Data: Model Development and Validation Study", journal="J Med Internet Res", year="2022", month="Dec", day="23", volume="24", number="12", pages="e38859", keywords="clinical trials", keywords="study characteristics", keywords="machine learning", keywords="natural language processing", keywords="pretrained language models", keywords="publication success", abstract="Background: Publication of registered clinical trials is a critical step in the timely dissemination of trial findings. However, a significant proportion of completed clinical trials are never published, motivating the need to analyze the factors behind success or failure to publish. This could inform study design, help regulatory decision-making, and improve resource allocation. It could also enhance our understanding of bias in the publication of trials and publication trends based on the research direction or strength of the findings. Although the publication of clinical trials has been addressed in several descriptive studies at an aggregate level, there is a lack of research on the predictive analysis of a trial's publishability given an individual (planned) clinical trial description. Objective: We aimed to conduct a study that combined structured and unstructured features relevant to publication status in a single predictive approach. Established natural language processing techniques as well as recent pretrained language models enabled us to incorporate information from the textual descriptions of clinical trials into a machine learning approach. We were particularly interested in whether and which textual features could improve the classification accuracy for publication outcomes. Methods: In this study, we used metadata from ClinicalTrials.gov (a registry of clinical trials) and MEDLINE (a database of academic journal articles) to build a data set of clinical trials (N=76,950) that contained the description of a registered trial and its publication outcome (27,702/76,950, 36\% published and 49,248/76,950, 64\% unpublished). This is the largest data set of its kind, which we released as part of this work. 
The publication outcome in the data set was identified from MEDLINE based on clinical trial identifiers. We carried out a descriptive analysis and predicted the publication outcome using 2 approaches: a neural network with a large domain-specific language model and a random forest classifier using a weighted bag-of-words representation of text. Results: First, our analysis of the newly created data set corroborates several findings from the existing literature regarding attributes associated with a higher publication rate. Second, a crucial observation from our predictive modeling was that the addition of textual features (eg, eligibility criteria) offers consistent improvements over using only structured data (F1-score=0.62-0.64 vs F1-score=0.61 without textual features). Both pretrained language models and more basic word-based representations provide high-utility text representations, with no significant empirical difference between the two. Conclusions: Different factors affect the publication of a registered clinical trial. Our approach to predictive modeling combines heterogeneous features, both structured and unstructured. We show that methods from natural language processing can provide effective textual features to enable more accurate prediction of publication success, which has not been explored for this task previously. ", doi="10.2196/38859", url="https://www.jmir.org/2022/12/e38859", url="http://www.ncbi.nlm.nih.gov/pubmed/36563029" } @Article{info:doi/10.2196/38751, author="Okiyama, Sho and Fukuda, Memori and Sode, Masashi and Takahashi, Wataru and Ikeda, Masahiro and Kato, Hiroaki and Tsugawa, Yusuke and Iwagami, Masao", title="Examining the Use of an Artificial Intelligence Model to Diagnose Influenza: Development and Validation Study", journal="J Med Internet Res", year="2022", month="Dec", day="23", volume="24", number="12", pages="e38751", keywords="influenza", keywords="physical examination", keywords="pharynx", keywords="deep learning", keywords="diagnostic prediction", abstract="Background: The global burden of influenza is substantial. It is a major disease that causes annual epidemics and occasionally, pandemics. Given that influenza primarily infects the upper respiratory system, it may be possible to diagnose influenza infection by applying deep learning to pharyngeal images. Objective: We aimed to develop a deep learning model to diagnose influenza infection using pharyngeal images and clinical information. Methods: We recruited patients who visited clinics and hospitals because of influenza-like symptoms. In the training stage, we developed a diagnostic prediction artificial intelligence (AI) model based on deep learning to predict polymerase chain reaction (PCR)--confirmed influenza from pharyngeal images and clinical information. In the validation stage, we assessed the diagnostic performance of the AI model. In additional analysis, we compared the diagnostic performance of the AI model with that of 3 physicians and interpreted the AI model using importance heat maps. Results: We enrolled a total of 7831 patients at 64 hospitals between November 1, 2019, and January 21, 2020, in the training stage and 659 patients (including 196 patients with PCR-confirmed influenza) at 11 hospitals between January 25, 2020, and March 13, 2020, in the validation stage. 
The area under the receiver operating characteristic curve for the AI model was 0.90 (95\% CI 0.87-0.93), and its sensitivity and specificity were 76\% (70\%-82\%) and 88\% (85\%-91\%), respectively, outperforming 3 physicians. In the importance heat maps, the AI model often focused on follicles on the posterior pharyngeal wall. Conclusions: We developed the first AI model that can accurately diagnose influenza from pharyngeal images, which has the potential to help physicians to make a timely diagnosis. ", doi="10.2196/38751", url="https://www.jmir.org/2022/12/e38751", url="http://www.ncbi.nlm.nih.gov/pubmed/36374004" } @Article{info:doi/10.2196/40473, author="Cardozo, Glauco and Tirloni, Francisco Salvador and Pereira Moro, Renato Ant{\^o}nio and Marques, Brum Jefferson Luiz", title="Use of Artificial Intelligence in the Search for New Information Through Routine Laboratory Tests: Systematic Review", journal="JMIR Bioinform Biotech", year="2022", month="Dec", day="23", volume="3", number="1", pages="e40473", keywords="review", keywords="laboratory tests", keywords="machine learning", keywords="prediction", keywords="diagnosis", keywords="COVID-19", abstract="Background: In recent decades, the use of artificial intelligence has been widely explored in health care. Similarly, the amount of data generated in the most varied medical processes has practically doubled every year, requiring new methods of analysis and treatment of these data. Mainly aimed at aiding in the diagnosis and prevention of diseases, this precision medicine has shown great potential in different medical disciplines. Laboratory tests, for example, almost always present their results separately as individual values. However, physicians need to analyze a set of results to propose a supposed diagnosis, which leads us to think that sets of laboratory tests may contain more information than those presented separately for each result. In this way, the processes of medical laboratories can be strongly affected by these techniques. Objective: In this sense, we sought to identify scientific research that used laboratory tests and machine learning techniques to predict hidden information and diagnose diseases. Methods: The methodology adopted used the population, intervention, comparison, and outcomes principle, searching the main engineering and health sciences databases. The search terms were defined based on the list of terms used in the Medical Subject Heading database. Data from this study were presented descriptively and followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses; 2020) statement flow diagram and the National Institutes of Health tool for quality assessment of articles. During the analysis, the inclusion and exclusion criteria were independently applied by 2 authors, with a third author being consulted in cases of disagreement. Results: Following the defined requirements, 40 studies presenting good quality in the analysis process were selected and evaluated. We found that, in recent years, there has been a significant increase in the number of works that have used this methodology, mainly because of COVID-19. In general, the studies used machine learning classification models to predict new information, and the most used parameters were data from routine laboratory tests such as the complete blood count. 
Conclusions: Finally, we conclude that laboratory tests, together with machine learning techniques, can predict new tests, thus helping the search for new diagnoses. This process has proved to be advantageous and innovative for medical laboratories. It is making it possible to discover hidden information and propose additional tests, reducing the number of false negatives and helping in the early discovery of unknown diseases. ", doi="10.2196/40473", url="https://bioinform.jmir.org/2022/1/e40473", url="http://www.ncbi.nlm.nih.gov/pubmed/36644762" } @Article{info:doi/10.2196/31433, author="Khanna, K. Raj and Cecchetti, A. Alfred and Bhardwaj, Niharika and Muto, Steele Bobbi and Murughiyan, Usha", title="Understanding Emergency Room Visits for Nontraumatic Oral Health Conditions in a Hospital Serving Rural Appalachia: Dental Informatics Study", journal="JMIR Form Res", year="2022", month="Dec", day="23", volume="6", number="12", pages="e31433", keywords="dental informatics", keywords="visualization", keywords="nontraumatic dental care", keywords="emergency room", keywords="cost", keywords="utilization", keywords="economic impact", abstract="Background: In the Appalachian region, a variety of factors will impact the ability of patients to maintain good oral health, which is essential for overall health and well-being. Oral health issues have led to high costs within the Appalachian hospital system. Dental informatics examines preventable dental conditions to understand the problem and suggest cost containment. Objective: We aimed to demonstrate the value of dental informatics in dental health care in rural Appalachia by presenting a research study that measured emergency room (ER) use for nontraumatic dental conditions (NTDCs) and the associated economic impact in a hospital system that primarily serves rural Appalachia. Methods: The Appalachian Clinical and Translational Science Institute's oral health data mart with relevant data on patients (n=8372) with ER encounters for NTDC between 2010 and 2018 was created using Appalachian Clinical and Translational Science Institute's research data warehouse. Exploratory analysis was then performed by developing an interactive Tableau dashboard. Dental Informatics provided the platform whereby the overall burden of these encounters, along with disparities in burden by age groups, gender, and primary payer, was assessed. Results: Dental informatics was essential in understanding the overall problem and provided an interactive and easily comprehensible visualization of the situation. We found that ER visits for NTDCs declined by 40\% from 2010 to 2018, but a higher percentage of visits required inpatient care and surgical intervention. Conclusions: Dental informatics can provide the necessary tools and support to health care systems and state health departments across Appalachia to address serious dental problems. In this case, informatics helped identify that although inappropriate ER use for NTDCs diminished due to ER diversion efforts, they remain a significant burden. Through its visualization and data extraction techniques, dental informatics can help produce policy changes by promoting models that improve access to preventive care. 
", doi="10.2196/31433", url="https://formative.jmir.org/2022/12/e31433", url="http://www.ncbi.nlm.nih.gov/pubmed/36563041" } @Article{info:doi/10.2196/40534, author="Binsfeld Gon{\c{c}}alves, Laurent and Nesic, Ivan and Obradovic, Marko and Stieltjes, Bram and Weikert, Thomas and Bremerich, Jens", title="Natural Language Processing and Graph Theory: Making Sense of Imaging Records in a Novel Representation Frame", journal="JMIR Med Inform", year="2022", month="Dec", day="21", volume="10", number="12", pages="e40534", keywords="radiology", keywords="deep learning", keywords="NLP", keywords="radiology reports", keywords="imaging record", keywords="temporal referrals", keywords="date extraction", keywords="graph theory", keywords="health care information system", keywords="resource planning.", abstract="Background: A concise visualization framework of related reports would increase readability and improve patient management. To this end, temporal referrals to prior comparative exams are an essential connection to previous exams in written reports. Due to unstructured narrative texts' variable structure and content, their extraction is hampered by poor computer readability. Natural language processing (NLP) permits the extraction of structured information from unstructured texts automatically and can serve as an essential input for such a novel visualization framework. Objective: This study proposes and evaluates an NLP-based algorithm capable of extracting the temporal referrals in written radiology reports, applies it to all the radiology reports generated for 10 years, introduces a graphical representation of imaging reports, and investigates its benefits for clinical and research purposes. Methods: In this single-center, university hospital, retrospective study, we developed a convolutional neural network capable of extracting the date of referrals from imaging reports. The model's performance was assessed by calculating precision, recall, and F1-score using an independent test set of 149 reports. Next, the algorithm was applied to our department's radiology reports generated from 2011 to 2021. Finally, the reports and their metadata were represented in a modulable graph. Results: For extracting the date of referrals, the named-entity recognition (NER) model had a high precision of 0.93, a recall of 0.95, and an F1-score of 0.94. A total of 1,684,635 reports were included in the analysis. Temporal reference was mentioned in 53.3\% (656,852/1,684,635), explicitly stated as not available in 21.0\% (258,386/1,684,635), and omitted in 25.7\% (317,059/1,684,635) of the reports. Imaging records can be visualized in a directed and modulable graph, in which the referring links represent the connecting arrows. Conclusions: Automatically extracting the date of referrals from unstructured radiology reports using deep learning NLP algorithms is feasible. Graphs refined the selection of distinct pathology pathways, facilitated the revelation of missing comparisons, and enabled the query of specific referring exam sequences. Further work is needed to evaluate its benefits in clinics, research, and resource planning. ", doi="10.2196/40534", url="https://medinform.jmir.org/2022/12/e40534", url="http://www.ncbi.nlm.nih.gov/pubmed/36542426" } @Article{info:doi/10.2196/42971, author="Joyce, Cara and Markossian, W. Talar and Nikolaides, Jenna and Ramsey, Elisabeth and Thompson, M. Hale and Rojas, C. Juan and Sharma, Brihat and Dligach, Dmitriy and Oguss, K. 
Madeline and Cooper, S. Richard and Afshar, Majid", title="The Evaluation of a Clinical Decision Support Tool Using Natural Language Processing to Screen Hospitalized Adults for Unhealthy Substance Use: Protocol for a Quasi-Experimental Design", journal="JMIR Res Protoc", year="2022", month="Dec", day="19", volume="11", number="12", pages="e42971", keywords="substance misuse", keywords="artificial intelligence", keywords="natural language processing, clinical decision support", keywords="study protocol", abstract="Background: Automated and data-driven methods for screening using natural language processing (NLP) and machine learning may replace resource-intensive manual approaches in the usual care of patients hospitalized with conditions related to unhealthy substance use. The rigorous evaluation of tools that use artificial intelligence (AI) is necessary to demonstrate effectiveness before system-wide implementation. An NLP tool to use routinely collected data in the electronic health record was previously validated for diagnostic accuracy in a retrospective study for screening unhealthy substance use. Our next step is a noninferiority design incorporated into a research protocol for clinical implementation with prospective evaluation of clinical effectiveness in a large health system. Objective: This study aims to provide a study protocol to evaluate health outcomes and the costs and benefits of an AI-driven automated screener compared to manual human screening for unhealthy substance use. Methods: A pre-post design is proposed to evaluate 12 months of manual screening followed by 12 months of automated screening across surgical and medical wards at a single medical center. The preintervention period consists of usual care with manual screening by nurses and social workers and referrals to a multidisciplinary Substance Use Intervention Team (SUIT). Facilitated by a NLP pipeline in the postintervention period, clinical notes from the first 24 hours of hospitalization will be processed and scored by a machine learning model, and the SUIT will be similarly alerted to patients who flagged positive for substance misuse. Flowsheets within the electronic health record have been updated to capture rates of interventions for the primary outcome (brief intervention/motivational interviewing, medication-assisted treatment, naloxone dispensing, and referral to outpatient care). Effectiveness in terms of patient outcomes will be determined by noninferior rates of interventions (primary outcome), as well as rates of readmission within 6 months, average time to consult, and discharge rates against medical advice (secondary outcomes) in the postintervention period by a SUIT compared to the preintervention period. A separate analysis will be performed to assess the costs and benefits to the health system by using automated screening. Changes from the pre- to postintervention period will be assessed in covariate-adjusted generalized linear mixed-effects models. Results: The study will begin in September 2022. Monthly data monitoring and Data Safety Monitoring Board reporting are scheduled every 6 months throughout the study period. We anticipate reporting final results by June 2025. Conclusions: The use of augmented intelligence for clinical decision support is growing with an increasing number of AI tools. 
We provide a research protocol for prospective evaluation of an automated NLP system for screening unhealthy substance use using a noninferiority design to demonstrate comprehensive screening that may be as effective as manual screening but less costly via automated solutions. Trial Registration: ClinicalTrials.gov NCT03833804; https://clinicaltrials.gov/ct2/show/NCT03833804 International Registered Report Identifier (IRRID): DERR1-10.2196/42971 ", doi="10.2196/42971", url="https://www.researchprotocols.org/2022/12/e42971", url="http://www.ncbi.nlm.nih.gov/pubmed/36534461" } @Article{info:doi/10.2196/37833, author="Kanbar, J. Lara and Wissel, Benjamin and Ni, Yizhao and Pajor, Nathan and Glauser, Tracy and Pestian, John and Dexheimer, W. Judith", title="Implementation of Machine Learning Pipelines for Clinical Practice: Development and Validation Study", journal="JMIR Med Inform", year="2022", month="Dec", day="16", volume="10", number="12", pages="e37833", keywords="electronic health record", keywords="natural language processing", keywords="epilepsy", keywords="clinical decision support", keywords="machine learning", keywords="emergency medicine", keywords="artificial intelligence", abstract="Background: Artificial intelligence (AI) technologies, such as machine learning and natural language processing, have the potential to provide new insights into complex health data. Although powerful, these algorithms rarely move from experimental studies to direct clinical care implementation. Objective: We aimed to describe the key components for successful development and integration of two AI technology--based research pipelines for clinical practice. Methods: We summarized the approach, results, and key learnings from the implementation of the following two systems implemented at a large, tertiary care children's hospital: (1) epilepsy surgical candidate identification (or epilepsy ID) in an ambulatory neurology clinic; and (2) an automated clinical trial eligibility screener (ACTES) for the real-time identification of patients for research studies in a pediatric emergency department. Results: The epilepsy ID system performed as well as board-certified neurologists in identifying surgical candidates (with a sensitivity of 71\% and positive predictive value of 77\%). The ACTES system decreased coordinator screening time by 12.9\%. The success of each project was largely dependent upon the collaboration between machine learning experts, research and operational information technology professionals, longitudinal support from clinical providers, and institutional leadership. Conclusions: These projects showcase novel interactions between machine learning recommendations and providers during clinical care. Our deployment provides seamless, real-time integration of AI technology to provide decision support and improve patient care. 
", doi="10.2196/37833", url="https://medinform.jmir.org/2022/12/e37833", url="http://www.ncbi.nlm.nih.gov/pubmed/36525289" } @Article{info:doi/10.2196/43229, author="Wang, Bin and Lai, Junkai and Liu, Mimi and Jin, Feifei and Peng, Yifei and Yao, Chen", title="Electronic Source Data Transcription for Electronic Case Report Forms in China: Validation of the Electronic Source Record Tool in a Real-world Ophthalmology Study", journal="JMIR Form Res", year="2022", month="Dec", day="16", volume="6", number="12", pages="e43229", keywords="electronic medical record", keywords="electronic health record", keywords="electronic source", keywords="eSource", keywords="eSource record tool", keywords="real-world data", keywords="data transcription", keywords="data quality", keywords="System Usability Scale", keywords="ophthalmology", abstract="Background: As researchers are increasingly interested in real-world studies (RWSs), improving data collection efficiency and data quality has become an important challenge. An electronic source (eSource) generally includes direct capture, collection, and storage of electronic data to simplify clinical research. It can improve data quality and patient safety and reduce clinical trial costs. Although there are already large projects on eSource technology, there is a lack of experience in using eSource technology to implement RWSs. Our team designed and developed an eSource record (ESR) system in China. In a preliminary prospective study, we selected a cosmetic medical device project to evaluate ESR software's effect on data collection and transcription. As the previous case verification was simple, we plan to choose more complicated ophthalmology projects to further evaluate the ESR. Objective: We aimed to evaluate the data transcription efficiency and quality of ESR software in retrospective studies to verify the feasibility of using eSource as an alternative to traditional manual transcription of data in RWS projects. Methods: The approved ophthalmic femtosecond laser project was used for ESR case validation. This study compared the efficiency and quality of data transcription between the eSource method using ESR software and the traditional clinical research model of manually transcribing the data. Usability refers to the quality of a user's experience when interacting with products or systems including websites, software, devices, or applications. To evaluate the system availability of ESR, we used the System Usability Scale (SUS). The questionnaire consisted of the following 2 parts: participant information and SUS evaluation of the electronic medical record (EMR), electronic data capture (EDC), and ESR systems. By accessing log data from the EDC system previously used by the research project, all the time spent from the beginning to the end of the study could be counted. Results: In terms of transcription time cost per field, the eSource method can reduce the time cost by 81.8\% (11.2/13.7). Compared with traditional manual data transcription, the eSource method has higher data transcription quality (correct entry rate of 2356/2400, 98.17\% vs 47,991/51,424, 93.32\%). A total of 15 questionnaires were received with a response rate of 100\%. In terms of usability, the average overall SUS scores of the EMR, EDC, and ESR systems were 50.3 (SD 21.9), 51.5 (SD 14.2), and 63.0 (SD 11.3; contract research organization experts: 69.5, SD 11.5; clinicians: 59.8, SD 10.2), respectively. 
The Cronbach $\alpha$ values for the SUS items of the EMR, EDC, and ESR systems were 0.591 (95\% CI -0.012 to 0.903), 0.588 (95\% CI -0.288 to 0.951), and 0.785 (95\% CI 0.576-0.916), respectively. Conclusions: In real-world ophthalmology studies, the eSource approach based on the ESR system can replace the traditional clinical research model that relies on the manual transcription of data. ", doi="10.2196/43229", url="https://formative.jmir.org/2022/12/e43229", url="http://www.ncbi.nlm.nih.gov/pubmed/36525285" } @Article{info:doi/10.2196/42208, author="Weinert, Lina and Klass, Maximilian and Schneider, Gerd and Heinze, Oliver", title="Exploring Stakeholder Requirements to Enable the Research and Development of Artificial Intelligence Algorithms in a Hospital-Based Generic Infrastructure: Protocol for a Multistep Mixed Methods Study", journal="JMIR Res Protoc", year="2022", month="Dec", day="16", volume="11", number="12", pages="e42208", keywords="artificial intelligence", keywords="requirements analysis", keywords="mixed methods", keywords="innovation", keywords="qualitative research", keywords="health care", keywords="artificial intelligence technology", keywords="diagnostic", keywords="health data", keywords="artificial intelligence infrastructure", keywords="technology development", abstract="Background: In recent years, research and developments in advancing artificial intelligence (AI) in health care and medicine have increased. High expectations surround the use of AI technologies, such as improvements for diagnosis and increases in the quality of care with reductions in health care costs. The successful development and testing of new AI algorithms require large amounts of high-quality data. Academic hospitals could provide the data needed for AI development, but granting legal, controlled, and regulated access to these data for developers and researchers is difficult. Therefore, the German Federal Ministry of Health supports the Protected Artificial Intelligence Innovation Environment for Patient-Oriented Digital Health Solutions for Developing, Testing, and Evidence-Based Evaluation of Clinical Value (pAItient) project, aiming to install the AI Innovation Environment at the Heidelberg University Hospital in Germany. The AI Innovation Environment was designed as a proof-of-concept extension of the already existing Medical Data Integration Center. It will establish a process to support every step of developing and testing AI-based technologies. Objective: The first part of the pAItient project, as presented in this research protocol, aims to explore stakeholders' requirements for developing AI in partnership with an academic hospital and granting AI experts access to anonymized personal health data. Methods: We planned a multistep mixed methods approach. In the first step, researchers and employees from stakeholder organizations were invited to participate in semistructured interviews. In the following step, questionnaires were developed based on the participants' answers and distributed among the stakeholders' organizations to quantify qualitative findings and discover important aspects that were not mentioned by the interviewees. The questionnaires will be analyzed descriptively. In addition, patients and physicians were interviewed as well. No survey questionnaires were developed for this second group of participants. The study was approved by the Ethics Committee of the Heidelberg University Hospital (approval number: S-241/2021). 
Results: Data collection concluded in summer 2022. Data analysis is planned to start in fall 2022. We plan to publish the results in winter 2022 to 2023. Conclusions: The results of our study will help in shaping the AI Innovation Environment at our academic hospital according to stakeholder requirements. With this approach, in turn, we aim to shape an AI environment that is effective and is deemed acceptable by all parties. International Registered Report Identifier (IRRID): DERR1-10.2196/42208 ", doi="10.2196/42208", url="https://www.researchprotocols.org/2022/12/e42208", url="http://www.ncbi.nlm.nih.gov/pubmed/36525300" } @Article{info:doi/10.2196/40743, author="Jin, Qiao and Tan, Chuanqi and Chen, Mosha and Yan, Ming and Zhang, Ningyu and Huang, Songfang and Liu, Xiaozhong", title="State-of-the-Art Evidence Retriever for Precision Medicine: Algorithm Development and Validation", journal="JMIR Med Inform", year="2022", month="Dec", day="15", volume="10", number="12", pages="e40743", keywords="precision medicine", keywords="evidence-based medicine", keywords="information retrieval", keywords="active learning", keywords="pretrained language models", keywords="digital health intervention", keywords="data retrieval", keywords="big data", keywords="algorithm development", abstract="Background: Under the paradigm of precision medicine (PM), patients with the same disease can receive different personalized therapies according to their clinical and genetic features. These therapies are determined by the totality of all available clinical evidence, including results from case reports, clinical trials, and systematic reviews. However, it is increasingly difficult for physicians to find such evidence from scientific publications, whose size is growing at an unprecedented pace. Objective: In this work, we propose the PM-Search system to facilitate the retrieval of clinical literature that contains critical evidence for or against giving specific therapies to certain cancer patients. Methods: The PM-Search system combines a baseline retriever that selects document candidates at a large scale and an evidence reranker that finely reorders the candidates based on their evidence quality. The baseline retriever uses query expansion and keyword matching with the ElasticSearch retrieval engine, and the evidence reranker fits pretrained language models to expert annotations that are derived from an active learning strategy. Results: The PM-Search system achieved the best performance in the retrieval of high-quality clinical evidence at the Text Retrieval Conference PM Track 2020, outperforming the second-ranking systems by large margins (0.4780 vs 0.4238 for standard normalized discounted cumulative gain at rank 30 and 0.4519 vs 0.4193 for exponential normalized discounted cumulative gain at rank 30). Conclusions: We present PM-Search, a state-of-the-art search engine to assist the practicing of evidence-based PM. PM-Search uses a novel Bidirectional Encoder Representations from Transformers for Biomedical Text Mining--based active learning strategy that models evidence quality and improves the model performance. Our analyses show that evidence quality is a distinct aspect from general relevance, and specific modeling of evidence quality beyond general relevance is required for a PM search engine. 
", doi="10.2196/40743", url="https://medinform.jmir.org/2022/12/e40743", url="http://www.ncbi.nlm.nih.gov/pubmed/36409468" } @Article{info:doi/10.2196/43757, author="Lee, Seungseok and Kang, Seong Wu and Seo, Sanghyun and Kim, Wan Do and Ko, Hoon and Kim, Joongsuck and Lee, Seonghwa and Lee, Jinseok", title="Model for Predicting In-Hospital Mortality of Physical Trauma Patients Using Artificial Intelligence Techniques: Nationwide Population-Based Study in Korea", journal="J Med Internet Res", year="2022", month="Dec", day="13", volume="24", number="12", pages="e43757", keywords="artificial intelligence", keywords="trauma", keywords="mortality prediction", keywords="international classification of disease", keywords="injury", keywords="prediction model", keywords="severity score", keywords="emergency department", keywords="Information system", keywords="deep neural network", abstract="Background: Physical trauma--related mortality places a heavy burden on society. Estimating the mortality risk in physical trauma patients is crucial to enhance treatment efficiency and reduce this burden. The most popular and accurate model is the Injury Severity Score (ISS), which is based on the Abbreviated Injury Scale (AIS), an anatomical injury severity scoring system. However, the AIS requires specialists to code the injury scale by reviewing a patient's medical record; therefore, applying the model to every hospital is impossible. Objective: We aimed to develop an artificial intelligence (AI) model to predict in-hospital mortality in physical trauma patients using the International Classification of Disease 10th Revision (ICD-10), triage scale, procedure codes, and other clinical features. Methods: We used the Korean National Emergency Department Information System (NEDIS) data set (N=778,111) compiled from over 400 hospitals between 2016 and 2019. To predict in-hospital mortality, we used the following as input features: ICD-10, patient age, gender, intentionality, injury mechanism, and emergent symptom, Alert/Verbal/Painful/Unresponsive (AVPU) scale, Korean Triage and Acuity Scale (KTAS), and procedure codes. We proposed the ensemble of deep neural networks (EDNN) via 5-fold cross-validation and compared them with other state-of-the-art machine learning models, including traditional prediction models. We further investigated the effect of the features. Results: Our proposed EDNN with all features provided the highest area under the receiver operating characteristic (AUROC) curve of 0.9507, outperforming other state-of-the-art models, including the following traditional prediction models: Adaptive Boosting (AdaBoost; AUROC of 0.9433), Extreme Gradient Boosting (XGBoost; AUROC of 0.9331), ICD-based ISS (AUROC of 0.8699 for an inclusive model and AUROC of 0.8224 for an exclusive model), and KTAS (AUROC of 0.1841). In addition, using all features yielded a higher AUROC than any other partial features, namely, EDNN with the features of ICD-10 only (AUROC of 0.8964) and EDNN with the features excluding ICD-10 (AUROC of 0.9383). Conclusions: Our proposed EDNN with all features outperforms other state-of-the-art models, including the traditional diagnostic code-based prediction model and triage scale. 
", doi="10.2196/43757", url="https://www.jmir.org/2022/12/e43757", url="http://www.ncbi.nlm.nih.gov/pubmed/36512392" } @Article{info:doi/10.2196/41312, author="Kosowan, Leanne and Singer, Alexander and Zulkernine, Farhana and Zafari, Hasan and Nesca, Marcello and Muthumuni, Dhasni", title="Pan-Canadian Electronic Medical Record Diagnostic and Unstructured Text Data for Capturing PTSD: Retrospective Observational Study", journal="JMIR Med Inform", year="2022", month="Dec", day="13", volume="10", number="12", pages="e41312", keywords="electronic health records", keywords="EHR", keywords="natural language processing", keywords="NLP", keywords="medical informatics", keywords="primary health care", keywords="stress disorders, posttraumatic", keywords="posttraumatic stress disorder", keywords="PTSD", abstract="Background: The availability of electronic medical record (EMR) free-text data for research varies. However, access to short diagnostic text fields is more widely available. Objective: This study assesses agreement between free-text and short diagnostic text data from primary care EMR for identification of posttraumatic stress disorder (PTSD). Methods: This retrospective cross-sectional study used EMR data from a pan-Canadian repository representing 1574 primary care providers at 265 clinics using 11 EMR vendors. Medical record review using free text and short diagnostic text fields of the EMR produced reference standards for PTSD. Agreement was assessed with sensitivity, specificity, positive predictive value, negative predictive value, and accuracy. Results: Our reference set contained 327 patients with free text and short diagnostic text. Among these patients, agreement between free text and short diagnostic text had an accuracy of 93.6\% (CI 90.4\%-96.0\%). In a single Canadian province, case definitions 1 and 4 had a sensitivity of 82.6\% (CI 74.4\%-89.0\%) and specificity of 99.5\% (CI 97.4\%-100\%). However, when the reference set was expanded to a pan-Canada reference (n=12,104 patients), case definition 4 had the strongest agreement (sensitivity: 91.1\%, CI 90.1\%-91.9\%; specificity: 99.1\%, CI 98.9\%-99.3\%). Conclusions: Inclusion of free-text encounter notes during medical record review did not lead to improved capture of PTSD cases, nor did it lead to significant changes in case definition agreement. Within this pan-Canadian database, jurisdictional differences in diagnostic codes and EMR structure suggested the need to supplement diagnostic codes with natural language processing to capture PTSD. When unavailable, short diagnostic text can supplement free-text data for reference set creation and case validation. Application of the PTSD case definition can inform PTSD prevalence and characteristics. ", doi="10.2196/41312", url="https://medinform.jmir.org/2022/12/e41312", url="http://www.ncbi.nlm.nih.gov/pubmed/36512389" } @Article{info:doi/10.2196/40370, author="Huang, Junjie and Pang, Sze Wing and Wong, Yan Yuet and Mak, Yu Fung and Chan, W. Florence S. and Cheung, K. Clement S. and Wong, Nam Wing and Cheung, Tseung Ngai and Wong, S. 
Martin C.", title="Factors Associated With the Acceptance of an eHealth App for Electronic Health Record Sharing System: Population-Based Study", journal="J Med Internet Res", year="2022", month="Dec", day="12", volume="24", number="12", pages="e40370", keywords="digital health", keywords="eHealth", keywords="electronic health record", keywords="system", keywords="mobile app", keywords="app", keywords="public", keywords="private", keywords="community", keywords="caregiver", keywords="awareness", keywords="perception", keywords="improvement", keywords="utility", keywords="technology", keywords="model", keywords="health information", abstract="Background: In the second stage of the Electronic Health Record Sharing System (eHRSS) development, a mobile app (eHealth app) was launched to further enhance collaborative care among the public sector, the private sector, the community, and the caregivers. Objective: This study aims to investigate the factors associated with the downloading and utilization of the app, as well as the awareness, perception, and future improvement of the app. Methods: We collected 2110 surveys; respondents were stratified into 3 groups according to their status of enrollment in the eHRSS. The primary outcome measure was the downloading and acceptance of the eHealth app. We collected the data on socioeconomic factors and variables of the Technology Acceptance Model and Theory of Planned Behavior. Any factors identified as significant in the univariate analysis (P<.20) will be included in a subsequent multivariable regression analysis model. All P values $\leq$.05 will be considered statistically significant in multiple logistic regression analysis. The structural equation modeling was performed to identify interactions among the variables. Results: The respondents had an overall high satisfaction rate and a positive attitude toward continuing to adopt and recommend the app. However, the satisfaction rate among respondents who have downloaded but not adopted the app was relatively lower, and few of them perceived that the downloading and acceptance processes are difficult. A high proportion of current users expressed a positive attitude about continuing to adopt and recommend the app to friends, colleagues, and family members. The behavioral intention strongly predicted the acceptance of the eHealth app ($\beta$=.89; P<.001). Attitude ($\beta$=.30; P<.001) and perceived norm ($\beta$=.37; P<.001) played important roles in determining behavioral intention, which could predict the downloading and acceptance of the eHealth app ($\beta$=.14; P<.001). Conclusions: Despite the high satisfaction rate among the respondents, privacy concerns and perceived difficulties in adopting the app were the major challenges of promoting eHealth. Further promotion could be made through doctors and publicity. For future improvement, comprehensive health records and tailored health information should be included. 
", doi="10.2196/40370", url="https://www.jmir.org/2022/12/e40370", url="http://www.ncbi.nlm.nih.gov/pubmed/36382349" } @Article{info:doi/10.2196/39113, author="Zhang, Xinyuan and Xie, Ziqian and Xiang, Yang and Baig, Imran and Kozman, Mena and Stender, Carly and Giancardo, Luca and Tao, Cui", title="Issues in Melanoma Detection: Semisupervised Deep Learning Algorithm Development via a Combination of Human and Artificial Intelligence", journal="JMIR Dermatol", year="2022", month="Dec", day="12", volume="5", number="4", pages="e39113", keywords="deep learning", keywords="dermoscopic images", keywords="semisupervised learning", keywords="3-point checklist", keywords="skin lesion", keywords="dermatology", keywords="algorithm", keywords="melanoma classification", keywords="melanoma", keywords="automatic diagnosis", keywords="skin disease", abstract="Background: Automatic skin lesion recognition has shown to be effective in increasing access to reliable dermatology evaluation; however, most existing algorithms rely solely on images. Many diagnostic rules, including the 3-point checklist, are not considered by artificial intelligence algorithms, which comprise human knowledge and reflect the diagnosis process of human experts. Objective: In this paper, we aimed to develop a semisupervised model that can not only integrate the dermoscopic features and scoring rule from the 3-point checklist but also automate the feature-annotation process. Methods: We first trained the semisupervised model on a small, annotated data set with disease and dermoscopic feature labels and tried to improve the classification accuracy by integrating the 3-point checklist using ranking loss function. We then used a large, unlabeled data set with only disease label to learn from the trained algorithm to automatically classify skin lesions and features. Results: After adding the 3-point checklist to our model, its performance for melanoma classification improved from a mean of 0.8867 (SD 0.0191) to 0.8943 (SD 0.0115) under 5-fold cross-validation. The trained semisupervised model can automatically detect 3 dermoscopic features from the 3-point checklist, with best performances of 0.80 (area under the curve [AUC] 0.8380), 0.89 (AUC 0.9036), and 0.76 (AUC 0.8444), in some cases outperforming human annotators. Conclusions: Our proposed semisupervised learning framework can help with the automatic diagnosis of skin disease based on its ability to detect dermoscopic features and automate the label-annotation process. The framework can also help combine semantic knowledge with a computer algorithm to arrive at a more accurate and more interpretable diagnostic result, which can be applied to broader use cases. ", doi="10.2196/39113", url="https://derma.jmir.org/2022/4/e39113", url="http://www.ncbi.nlm.nih.gov/pubmed/37632881" } @Article{info:doi/10.2196/41520, author="Heyl, Johannes and Hardy, Flavien and Tucker, Katie and Hopper, Adrian and March{\~a}, M. Maria J. and Navaratnam, V. Annakan and Briggs, R. Tim W. and Yates, Jeremy and Day, Jamie and Wheeler, Andrew and Eve-Jones, Sue and Gray, K. 
William", title="Frailty, Comorbidity, and Associations With In-Hospital Mortality in Older COVID-19 Patients: Exploratory Study of Administrative Data", journal="Interact J Med Res", year="2022", month="Dec", day="12", volume="11", number="2", pages="e41520", keywords="COVID-19", keywords="coronavirus", keywords="SARS-CoV-2", keywords="frailty", keywords="comorbidity", keywords="mortality", keywords="death", keywords="hospitalization", keywords="hospital admission", keywords="hospitalisation", keywords="patient", keywords="age", keywords="sex", keywords="ethnicity", keywords="disease", keywords="hospital", keywords="cancer", keywords="heart", keywords="heart failure", keywords="weight loss", keywords="weight", keywords="renal disease", keywords="support", keywords="geriatric", keywords="older adult", keywords="elder", keywords="descriptive statistics", keywords="machine learning", keywords="model", abstract="Background: Older adults have worse outcomes following hospitalization with COVID-19, but within this group there is substantial variation. Although frailty and comorbidity are key determinants of mortality, it is less clear which specific manifestations of frailty and comorbidity are associated with the worst outcomes. Objective: We aimed to identify the key comorbidities and domains of frailty that were associated with in-hospital mortality in older patients with COVID-19 using models developed for machine learning algorithms. Methods: This was a retrospective study that used the Hospital Episode Statistics administrative data set from March 1, 2020, to February 28, 2021, for hospitalized patients in England aged 65 years or older. The data set was split into separate training (70\%), test (15\%), and validation (15\%) data sets during model development. Global frailty was assessed using the Hospital Frailty Risk Score (HFRS) and specific domains of frailty were identified using the Global Frailty Scale (GFS). Comorbidity was assessed using the Charlson Comorbidity Index (CCI). Additional features employed in the random forest algorithms included age, sex, deprivation, ethnicity, discharge month and year, geographical region, hospital trust, disease severity, and International Statistical Classification of Disease, 10th Edition codes recorded during the admission. Features were selected, preprocessed, and input into a series of random forest classification algorithms developed to identify factors strongly associated with in-hospital mortality. Two models were developed; the first model included the demographic, hospital-related, and disease-related items described above, as well as individual GFS domains and CCI items. The second model was similar to the first but replaced the GFS domains and CCI items with the HFRS as a global measure of frailty. Model performance was assessed using the area under the receiver operating characteristic (AUROC) curve and measures of model accuracy. Results: In total, 215,831 patients were included. The model using the individual GFS domains and CCI items had an AUROC curve for in-hospital mortality of 90\% and a predictive accuracy of 83\%. The model using the HFRS had similar performance (AUROC curve 90\%, predictive accuracy 82\%). The most important frailty items in the GFS were dementia/delirium, falls/fractures, and pressure ulcers/weight loss. The most important comorbidity items in the CCI were cancer, heart failure, and renal disease. 
Conclusions: The physical manifestations of frailty and comorbidity, particularly a history of cognitive impairment and falls, may be useful in identification of patients who need additional support during hospitalization with COVID-19. ", doi="10.2196/41520", url="https://www.i-jmr.org/2022/2/e41520", url="http://www.ncbi.nlm.nih.gov/pubmed/36423306" } @Article{info:doi/10.2196/41819, author="Fang, Cheng and Pan, Yifeng and Zhao, Luotong and Niu, Zhaoyi and Guo, Qingguo and Zhao, Bing", title="A Machine Learning-Based Approach to Predict Prognosis and Length of Hospital Stay in Adults and Children With Traumatic Brain Injury: Retrospective Cohort Study", journal="J Med Internet Res", year="2022", month="Dec", day="9", volume="24", number="12", pages="e41819", keywords="convolutional neural network", keywords="machine learning", keywords="neurosurgery", keywords="support vector machine", keywords="support vector regression", keywords="traumatic brain injury", abstract="Background: The treatment and care of adults and children with traumatic brain injury (TBI) constitute an intractable global health problem. Predicting the prognosis and length of hospital stay of patients with TBI may improve therapeutic effects and significantly reduce societal health care burden. Applying novel machine learning methods to the field of TBI may be valuable for determining the prognosis and cost-effectiveness of clinical treatment. Objective: We aimed to combine multiple machine learning approaches to build hybrid models for predicting the prognosis and length of hospital stay for adults and children with TBI. Methods: We collected relevant clinical information from patients treated at the Neurosurgery Center of the Second Affiliated Hospital of Anhui Medical University between May 2017 and May 2022, of which 80\% was used for training the model and 20\% for testing via screening and data splitting. We trained and tested the machine learning models using 5 cross-validations to avoid overfitting. In the machine learning models, 11 types of independent variables were used as input variables and Glasgow Outcome Scale score, used to evaluate patients' prognosis, and patient length of stay were used as output variables. Once the models were trained, we obtained and compared the errors of each machine learning model from 5 rounds of cross-validation to select the best predictive model. The model was then externally tested using clinical data of patients treated at the First Affiliated Hospital of Anhui Medical University from June 2021 to February 2022. Results: The final convolutional neural network--support vector machine (CNN-SVM) model predicted Glasgow Outcome Scale score with an accuracy of 93\% and 93.69\% in the test and external validation sets, respectively, and an area under the curve of 94.68\% and 94.32\% in the test and external validation sets, respectively. The mean absolute percentage error of the final built convolutional neural network--support vector regression (CNN-SVR) model predicting inpatient time in the test set and external validation set was 10.72\% and 10.44\%, respectively. The coefficient of determination (R2) was 0.93 and 0.92 in the test set and external validation set, respectively. Compared with back-propagation neural network, CNN, and SVM models built separately, our hybrid model was identified to be optimal and had high confidence. 
Conclusions: This study demonstrates the clinical utility of 2 hybrid models built by combining multiple machine learning approaches to accurately predict the prognosis and length of stay in hospital for adults and children with TBI. Application of these models may reduce the burden on physicians when assessing TBI and assist clinicians in the medical decision-making process. ", doi="10.2196/41819", url="https://www.jmir.org/2022/12/e41819", url="http://www.ncbi.nlm.nih.gov/pubmed/36485032" } @Article{info:doi/10.2196/40791, author="Figueroa Gray, Marlaine and Banegas, P. Matthew and Henrikson, B. Nora", title="Conceptions of Legacy Among People Making Treatment Choices for Serious Illness: Protocol for a Scoping Review", journal="JMIR Res Protoc", year="2022", month="Dec", day="9", volume="11", number="12", pages="e40791", keywords="communication", keywords="clinical decision", keywords="end-of-life care-ethics", keywords="end of life", keywords="palliative", keywords="ethic", keywords="ethical", keywords="psychological care", keywords="quality of life", keywords="spiritual care", keywords="spiritual", keywords="legacy", keywords="serious illness", keywords="critical illness", keywords="dying", keywords="choice", keywords="decision-making", keywords="conceptual framework", keywords="review methodology", keywords="librarian", keywords="library science", keywords="search strategy", keywords="scoping review", abstract="Background: Legacy---what one leaves behind and how one hopes to be remembered after death---is an unexplored and important dimension of decision-making for people facing serious illnesses. A preliminary literature review suggests that patients facing serious illness consider legacy when making medical decisions, for example, forgoing expensive treatment with limited or unknown clinical benefit to preserve one's inheritance for their children. To date, very little is known about the conceptual foundations of legacy. No conceptual frameworks exist that provide a comprehensive understanding of how legacy considerations relate to patient choices about their medical care. Objective: The objective of this scoping review is to understand the extent and type of research addressing the concept of legacy by people facing serious illness to inform a conceptual framework of legacy and patient treatment choices. Methods: This protocol follows the guidelines put forth by Levac et al, which expands the framework introduced by Arksey and O'Malley, as well as the Joanna Briggs Institute Reviewer's manual. This scoping review will explore several electronic databases including PubMed, Medline, CINAHL, Cochrane Library, PsycINFO, and others and will include legacy-specific gray literature, including dissertation research available via ProQuest. An initial search will be conducted in English-language literature from 1990 to the present with selected keywords to identify relevant articles and refine the search strategy. After the search strategy has been finalized, 2 independent reviewers will undertake a 2-part study selection process. In the first step, reviewers will screen article titles and abstracts to identify the eligibility of each article based on predetermined exclusion or inclusion criteria. A third senior reviewer will arbitrate discrepancies regarding inclusions or exclusions. During the second step, the full texts will be screened by 2 reviewers, and only relevant articles will be kept. 
Relevant study data will be extracted, collated, and charted to summarize the key findings related to the construct of legacy. Results: This study will identify how people facing serious illness define legacy, and how their thinking about legacy impacts the choices they make about their medical treatments. We will note gaps in the literature base. The findings of this study will inform a conceptual model that outlines how ideas about legacy impact the patient's treatment choices. The results of this study will be submitted to an indexed journal. Conclusions: Very little is known about the role of legacy in the treatment decisions of patients across the continuum of serious illness. In particular, no comprehensive conceptual model exists that would provide an understanding of how legacy is considered by people making decisions about their care during serious illness. This study will be among the first to construct a conceptual model detailing how considerations of legacy impact medical decision-making for people facing or living with serious illnesses. International Registered Report Identifier (IRRID): DERR1-10.2196/40791 ", doi="10.2196/40791", url="https://www.researchprotocols.org/2022/12/e40791", url="http://www.ncbi.nlm.nih.gov/pubmed/36485023" } @Article{info:doi/10.2196/40831, author="Ramaswamy, Priya and Shah, Aalap and Kothari, Rishi and Schloemerkemper, Nina and Methangkool, Emily and Aleck, Amalia and Shapiro, Anne and Dayal, Rakhi and Young, Charlotte and Spinner, Jon and Deibler, Carly and Wang, Kaiyi and Robinowitz, David and Gandhi, Seema", title="An Accessible Clinical Decision Support System to Curtail Anesthetic Greenhouse Gases in a Large Health Network: Implementation Study", journal="JMIR Perioper Med", year="2022", month="Dec", day="8", volume="5", number="1", pages="e40831", keywords="clinical decision support", keywords="sustainability", keywords="intraoperative", keywords="perioperative", keywords="anesthetic gas", keywords="waste reduction", keywords="fresh gas flow", abstract="Background: Inhaled anesthetics in the operating room are potent greenhouse gases and are a key contributor to carbon emissions from health care facilities. Real-time clinical decision support (CDS) systems lower anesthetic gas waste by prompting anesthesia professionals to reduce fresh gas flow (FGF) when a set threshold is exceeded. However, previous CDS systems have relied on proprietary or highly customized anesthesia information management systems, significantly reducing other institutions' accessibility to the technology and thus limiting overall environmental benefit. Objective: In 2018, a CDS system that lowers anesthetic gas waste using methods that can be easily adopted by other institutions was developed at the University of California San Francisco (UCSF). This study aims to facilitate wider uptake of our CDS system and further reduce gas waste by describing the implementation of the FGF CDS toolkit at UCSF and the subsequent implementation at other medical campuses within the University of California Health network. Methods: We developed a noninterruptive active CDS system to alert anesthesia professionals when FGF rates exceeded 0.7 L per minute for common volatile anesthetics. The implementation process at UCSF was documented and assembled into an informational toolkit to aid in the integration of the CDS system at other health care institutions. 
Before implementation, presentation-based education initiatives were used to disseminate information regarding the safety of low FGF use and its relationship to environmental sustainability. Our FGF CDS toolkit consisted of 4 main components for implementation: sustainability-focused education of anesthesia professionals, hardware integration of the CDS technology, software build of the CDS system, and data reporting of measured outcomes. Results: The FGF CDS system was successfully deployed at 5 University of California Health network campuses. Four of the institutions are independent from the institution that created the CDS system. The CDS system was deployed at each facility using the FGF CDS toolkit, which describes the main components of the technology and implementation. Each campus made modifications to the CDS tool to best suit their institution, emphasizing the versatility and adoptability of the technology and implementation framework. Conclusions: It has previously been shown that the FGF CDS system reduces anesthetic gas waste, leading to environmental and fiscal benefits. Here, we demonstrate that the CDS system can be transferred to other medical facilities using our toolkit for implementation, making the technology and associated benefits globally accessible to advance mitigation of health care--related emissions. ", doi="10.2196/40831", url="https://periop.jmir.org/2022/1/e40831", url="http://www.ncbi.nlm.nih.gov/pubmed/36480254" } @Article{info:doi/10.2196/37239, author="Bhavnani, K. Suresh and Zhang, Weibin and Visweswaran, Shyam and Raji, Mukaila and Kuo, Yong-Fang", title="A Framework for Modeling and Interpreting Patient Subgroups Applied to Hospital Readmission: Visual Analytical Approach", journal="JMIR Med Inform", year="2022", month="Dec", day="7", volume="10", number="12", pages="e37239", keywords="visual analytics", keywords="Bipartite Network analysis", keywords="hospital readmission", keywords="precision medicine", keywords="modeling", keywords="Medicare", abstract="Background: A primary goal of precision medicine is to identify patient subgroups and infer their underlying disease processes with the aim of designing targeted interventions. Although several studies have identified patient subgroups, there is a considerable gap between the identification of patient subgroups and their modeling and interpretation for clinical applications. Objective: This study aimed to develop and evaluate a novel analytical framework for modeling and interpreting patient subgroups (MIPS) using a 3-step modeling approach: visual analytical modeling to automatically identify patient subgroups and their co-occurring comorbidities and determine their statistical significance and clinical interpretability; classification modeling to classify patients into subgroups and measure its accuracy; and prediction modeling to predict a patient's risk of an adverse outcome and compare its accuracy with and without patient subgroup information. Methods: The MIPS framework was developed using bipartite networks to identify patient subgroups based on frequently co-occurring high-risk comorbidities, multinomial logistic regression to classify patients into subgroups, and hierarchical logistic regression to predict the risk of an adverse outcome using subgroup membership compared with standard logistic regression without subgroup membership. 
The MIPS framework was evaluated for 3 hospital readmission conditions: chronic obstructive pulmonary disease (COPD), congestive heart failure (CHF), and total hip arthroplasty/total knee arthroplasty (THA/TKA) (COPD: n=29,016; CHF: n=51,550; THA/TKA: n=16,498). For each condition, we extracted cases defined as patients readmitted within 30 days of hospital discharge. Controls were defined as patients not readmitted within 90 days of discharge, matched by age, sex, race, and Medicaid eligibility. Results: In each condition, the visual analytical model identified patient subgroups that were statistically significant (Q=0.17, 0.17, 0.31; P<.001, <.001, <.05), significantly replicated (Rand Index=0.92, 0.94, 0.89; P<.001, <.001, <.01), and clinically meaningful to clinicians. In each condition, the classification model had high accuracy in classifying patients into subgroups (mean accuracy=99.6\%, 99.34\%, 99.86\%). In 2 conditions (COPD and THA/TKA), the hierarchical prediction model had a small but statistically significant improvement in discriminating between readmitted and not readmitted patients as measured by net reclassification improvement (0.059, 0.11) but not as measured by the C-statistic or integrated discrimination improvement. Conclusions: Although the visual analytical models identified statistically and clinically significant patient subgroups, the results pinpoint the need to analyze subgroups at different levels of granularity for improving the interpretability of intra- and intercluster associations. The high accuracy of the classification models reflects the strong separation of patient subgroups, despite the size and density of the data sets. Finally, the small improvement in predictive accuracy suggests that comorbidities alone were not strong predictors of hospital readmission, and the need for more sophisticated subgroup modeling methods. Such advances could improve the interpretability and predictive accuracy of patient subgroup models for reducing the risk of hospital readmission, and beyond. ", doi="10.2196/37239", url="https://medinform.jmir.org/2022/12/e37239", url="http://www.ncbi.nlm.nih.gov/pubmed/35537203" } @Article{info:doi/10.2196/40589, author="An, Ruopeng and Shen, Jing and Xiao, Yunyu", title="Applications of Artificial Intelligence to Obesity Research: Scoping Review of Methodologies", journal="J Med Internet Res", year="2022", month="Dec", day="7", volume="24", number="12", pages="e40589", keywords="artificial intelligence", keywords="deep learning", keywords="machine learning", keywords="obesity", keywords="scoping review", abstract="Background: Obesity is a leading cause of preventable death worldwide. Artificial intelligence (AI), characterized by machine learning (ML) and deep learning (DL), has become an indispensable tool in obesity research. Objective: This scoping review aimed to provide researchers and practitioners with an overview of the AI applications to obesity research, familiarize them with popular ML and DL models, and facilitate the adoption of AI applications. Methods: We conducted a scoping review in PubMed and Web of Science on the applications of AI to measure, predict, and treat obesity. We summarized and categorized the AI methodologies used in the hope of identifying synergies, patterns, and trends to inform future investigations. We also provided a high-level, beginner-friendly introduction to the core methodologies to facilitate the dissemination and adoption of various AI techniques. 
Results: We identified 46 studies that used diverse ML and DL models to assess obesity-related outcomes. The studies found AI models helpful in detecting clinically meaningful patterns of obesity or relationships between specific covariates and weight outcomes. The majority (18/22, 82\%) of the studies comparing AI models with conventional statistical approaches found that the AI models achieved higher prediction accuracy on test data. Some (5/46, 11\%) of the studies comparing the performances of different AI models revealed mixed results, indicating the high contingency of model performance on the data set and task it was applied to. An accelerating trend of adopting state-of-the-art DL models over standard ML models was observed to address challenging computer vision and natural language processing tasks. We concisely introduced the popular ML and DL models and summarized their specific applications in the studies included in the review. Conclusions: This study reviewed AI-related methodologies adopted in the obesity literature, particularly ML and DL models applied to tabular, image, and text data. The review also discussed emerging trends such as multimodal or multitask AI models, synthetic data generation, and human-in-the-loop that may witness increasing applications in obesity research. ", doi="10.2196/40589", url="https://www.jmir.org/2022/12/e40589", url="http://www.ncbi.nlm.nih.gov/pubmed/36476515" } @Article{info:doi/10.2196/38655, author="Ghuwalewala, Suraj and Kulkarni, Viraj and Pant, Richa and Kharat, Amit", title="Levels of Autonomous Radiology", journal="Interact J Med Res", year="2022", month="Dec", day="7", volume="11", number="2", pages="e38655", keywords="artificial intelligence", keywords="automation", keywords="machine learning", keywords="radiology", keywords="explainability", keywords="model decay", keywords="generalizability", keywords="fairness and bias", keywords="distributed learning", keywords="autonomous radiology", keywords="AI assistance", doi="10.2196/38655", url="https://www.i-jmr.org/2022/2/e38655", url="http://www.ncbi.nlm.nih.gov/pubmed/36476422" } @Article{info:doi/10.2196/41889, author="Fadahunsi, Philip Kayode and Wark, A. Petra and Mastellos, Nikolaos and Neves, Luisa Ana and Gallagher, Joseph and Majeed, Azeem and Webster, Andrew and Smith, Anthony and Choo-Kang, Brian and Leon, Catherine and Edwards, Christopher and O'Shea, Conor and Heitz, Elizabeth and Kayode, Valentine Olamide and Nash, Makeba and Kowalski, Martin and Jiwani, Mateen and O'Callaghan, Edmund Michael and Zary, Nabil and Henderson, Nicola and Chavannes, H. Niels and {\v{C}}ivljak, Rok and Olubiyi, Abiola Olubunmi and Mahapatra, Piyush and Panday, Nannan Rishi and Oriji, O.
Sunday and Fox, Erlikh Tatiana and Faint, Victoria and Car, Josip", title="Assessment of Clinical Information Quality in Digital Health Technologies: International eDelphi Study", journal="J Med Internet Res", year="2022", month="Dec", day="6", volume="24", number="12", pages="e41889", keywords="information quality", keywords="digital health technology", keywords="patient safety", keywords="perspective", keywords="digital health technologies", keywords="DHT", keywords="thematic analysis", keywords="clarity", keywords="understandable", keywords="understandability", keywords="readability", keywords="searchability", keywords="security", keywords="decision support system", keywords="framework development", keywords="framework", abstract="Background: Digital health technologies (DHTs), such as electronic health records and prescribing systems, are transforming health care delivery around the world. The quality of information in DHTs is key to the quality and safety of care. We developed a novel clinical information quality (CLIQ) framework to assess the quality of clinical information in DHTs. Objective: This study explored clinicians' perspectives on the relevance, definition, and assessment of information quality dimensions in the CLIQ framework. Methods: We used a systematic and iterative eDelphi approach to engage clinicians who had information governance roles or personal interest in information governance; the clinicians were recruited through purposive and snowball sampling techniques. Data were collected using semistructured online questionnaires until consensus was reached on the information quality dimensions in the CLIQ framework. Responses on the relevance of the dimensions were summarized to inform decisions on retention of the dimensions according to prespecified rules. Thematic analysis of the free-text responses was used to revise definitions and the assessment of dimensions. Results: Thirty-five clinicians from 10 countries participated in the study, which was concluded after the second round. Consensus was reached on all dimensions and categories in the CLIQ framework: informativeness (accuracy, completeness, interpretability, plausibility, provenance, and relevance), availability (accessibility, portability, security, and timeliness), and usability (conformance, consistency, and maintainability). A new dimension, searchability, was introduced in the availability category to account for the ease of finding needed information in the DHTs. Certain dimensions were renamed, and some definitions were rephrased to improve clarity. Conclusions: The CLIQ framework reached a high expert consensus and clarity of language relating to the information quality dimensions. The framework can be used by health care managers and institutions as a pragmatic tool for identifying and forestalling information quality problems that could compromise patient safety and quality of care. 
International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2021-057430 ", doi="10.2196/41889", url="https://www.jmir.org/2022/12/e41889", url="http://www.ncbi.nlm.nih.gov/pubmed/36472901" } @Article{info:doi/10.2196/41163, author="Chiu, I-Min and Cheng, Jhu-Yin and Chen, Tien-Yu and Wang, Yi-Min and Cheng, Chi-Yung and Kung, Chia-Te and Cheng, Fu-Jen and Yau, Flora Fei-Fei and Lin, Richard Chun-Hung", title="Using Deep Transfer Learning to Detect Hyperkalemia From Ambulatory Electrocardiogram Monitors in Intensive Care Units: Personalized Medicine Approach", journal="J Med Internet Res", year="2022", month="Dec", day="5", volume="24", number="12", pages="e41163", keywords="deep learning", keywords="transfer learning", keywords="hyperkalemia", keywords="electrocardiogram", keywords="ECG monitor", keywords="ICU", keywords="personalized medicine", abstract="Background: Hyperkalemia is a critical condition, especially in intensive care units. So far, there have been no accurate and noninvasive methods for recognizing hyperkalemia events on ambulatory electrocardiogram monitors. Objective: This study aimed to improve the accuracy of hyperkalemia predictions from ambulatory electrocardiogram (ECG) monitors using a personalized transfer learning method; this would be done by training a generic model and refining it with personal data. Methods: This retrospective cohort study used open source data from the Waveform Database Matched Subset of the Medical Information Mart From Intensive Care III (MIMIC-III). We included patients with multiple serum potassium test results and matched ECG data from the MIMIC-III database. A 1D convolutional neural network--based deep learning model was first developed to predict hyperkalemia in a generic population. Once the model achieved a state-of-the-art performance, it was used in an active transfer learning process to perform patient-adaptive heartbeat classification tasks. Results: The results show that by acquiring data from each new patient, the personalized model can improve the accuracy of hyperkalemia detection significantly, from an average of 0.604 (SD 0.211) to 0.980 (SD 0.078), when compared with the generic model. Moreover, the area under the receiver operating characteristic curve level improved from 0.729 (SD 0.240) to 0.945 (SD 0.094). Conclusions: By using the deep transfer learning method, we were able to build a clinical standard model for hyperkalemia detection using ambulatory ECG monitors. These findings could potentially be extended to applications that continuously monitor one's ECGs for early alerts of hyperkalemia and help avoid unnecessary blood tests. ", doi="10.2196/41163", url="https://www.jmir.org/2022/12/e41163", url="http://www.ncbi.nlm.nih.gov/pubmed/36469396" } @Article{info:doi/10.2196/42163, author="Simon, T. Steven and Trinkley, E. Katy and Malone, C. 
Daniel and Rosenberg, Aaron Michael", title="Interpretable Machine Learning Prediction of Drug-Induced QT Prolongation: Electronic Health Record Analysis", journal="J Med Internet Res", year="2022", month="Dec", day="1", volume="24", number="12", pages="e42163", keywords="drug-induced QT prolongation", keywords="predictive modeling", keywords="interpretable machine learning", keywords="ML", keywords="artificial intelligence", keywords="AI", keywords="electronic health records", keywords="EHR", keywords="prediction", keywords="risk", keywords="monitoring", keywords="deep learning", abstract="Background: Drug-induced long-QT syndrome (diLQTS) is a major concern among patients who are hospitalized, for whom prediction models capable of identifying individualized risk could be useful to guide monitoring. We have previously demonstrated the feasibility of machine learning to predict the risk of diLQTS, in which deep learning models provided superior accuracy for risk prediction, although these models were limited by a lack of interpretability. Objective: In this investigation, we sought to examine the potential trade-off between interpretability and predictive accuracy with the use of more complex models to identify patients at risk for diLQTS. We planned to compare a deep learning algorithm to predict diLQTS with a more interpretable algorithm based on cluster analysis that would allow medication- and subpopulation-specific evaluation of risk. Methods: We examined the risk of diLQTS among 35,639 inpatients treated between 2003 and 2018 with at least 1 of 39 medications associated with risk of diLQTS and who had an electrocardiogram in the system performed within 24 hours of medication administration. Predictors included over 22,000 diagnoses and medications at the time of medication administration, with cases of diLQTS defined as a corrected QT interval over 500 milliseconds after treatment with a culprit medication. The interpretable model was developed using cluster analysis (K=4 clusters), and risk was assessed for specific medications and classes of medications. The deep learning model was created using all predictors within a 6-layer neural network, based on previously identified hyperparameters. Results: Among the medications, we found that class III antiarrhythmic medications were associated with increased risk across all clusters, and that in patients who are noncritically ill without cardiovascular disease, propofol was associated with increased risk, whereas ondansetron was associated with decreased risk. Compared with deep learning, the interpretable approach was less accurate (area under the receiver operating characteristic curve: 0.65 vs 0.78), with comparable calibration. Conclusions: In summary, we found that an interpretable modeling approach was less accurate, but more clinically applicable, than deep learning for the prediction of diLQTS. Future investigations should consider this trade-off in the development of methods for clinical prediction. 
", doi="10.2196/42163", url="https://www.jmir.org/2022/12/e42163", url="http://www.ncbi.nlm.nih.gov/pubmed/36454608" } @Article{info:doi/10.2196/40485, author="Wenderott, Katharina and Gambashidze, Nikoloz and Weigl, Matthias", title="Integration of Artificial Intelligence Into Sociotechnical Work Systems---Effects of Artificial Intelligence Solutions in Medical Imaging on Clinical Efficiency: Protocol for a Systematic Literature Review", journal="JMIR Res Protoc", year="2022", month="Dec", day="1", volume="11", number="12", pages="e40485", keywords="artificial intelligence", keywords="clinical care", keywords="clinical efficiency", keywords="sociotechnical work system", keywords="sociotechnical", keywords="review methodology", keywords="systematic review", keywords="facilitator", keywords="barrier", keywords="diagnostic", keywords="diagnosis", keywords="diagnoses", keywords="digital health", keywords="adoption", keywords="implementation", keywords="literature review", keywords="literature search", keywords="search strategy", keywords="library science", keywords="medical librarian", keywords="narrative review", keywords="narrative synthesis", abstract="Background: When introducing artificial intelligence (AI) into clinical care, one of the main objectives is to improve workflow efficiency because AI-based solutions are expected to take over or support routine tasks. Objective: This study sought to synthesize the current knowledge base on how the use of AI technologies for medical imaging affects efficiency and what facilitators or barriers moderating the impact of AI implementation have been reported. Methods: In this systematic literature review, comprehensive literature searches will be performed in relevant electronic databases, including PubMed/MEDLINE, Embase, PsycINFO, Web of Science, IEEE Xplore, and CENTRAL. Studies in English and German published from 2000 onwards will be included. The following inclusion criteria will be applied: empirical studies targeting the workflow integration or adoption of AI-based software in medical imaging used for diagnostic purposes in a health care setting. The efficiency outcomes of interest include workflow adaptation, time to complete tasks, and workload. Two reviewers will independently screen all retrieved records, full-text articles, and extract data. The study's methodological quality will be appraised using suitable tools. The findings will be described qualitatively, and a meta-analysis will be performed, if possible. Furthermore, a narrative synthesis approach that focuses on work system factors affecting the integration of AI technologies reported in eligible studies will be adopted. Results: This review is anticipated to begin in September 2022 and will be completed in April 2023. Conclusions: This systematic review and synthesis aims to summarize the existing knowledge on efficiency improvements in medical imaging through the integration of AI into clinical workflows. Moreover, it will extract the facilitators and barriers of the AI implementation process in clinical care settings. Therefore, our findings have implications for future clinical implementation processes of AI-based solutions, with a particular focus on diagnostic procedures. This review is additionally expected to identify research gaps regarding the focus on seamless workflow integration of novel technologies in clinical settings. 
Trial Registration: PROSPERO CRD42022303439; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=303439 International Registered Report Identifier (IRRID): PRR1-10.2196/40485 ", doi="10.2196/40485", url="https://www.researchprotocols.org/2022/12/e40485", url="http://www.ncbi.nlm.nih.gov/pubmed/36454624" } @Article{info:doi/10.2196/42185, author="Tang, Ri and Zhang, Shuyi and Ding, Chenling and Zhu, Mingli and Gao, Yuan", title="Artificial Intelligence in Intensive Care Medicine: Bibliometric Analysis", journal="J Med Internet Res", year="2022", month="Nov", day="30", volume="24", number="11", pages="e42185", keywords="intensive care medicine", keywords="artificial intelligence", keywords="bibliometric analysis", keywords="machine learning", keywords="sepsis", abstract="Background: Interest in critical care--related artificial intelligence (AI) research is growing rapidly. However, the literature is still lacking in comprehensive bibliometric studies that measure and analyze scientific publications globally. Objective: The objective of this study was to assess the global research trends in AI in intensive care medicine based on publication outputs, citations, coauthorships between nations, and co-occurrences of author keywords. Methods: A total of 3619 documents published until March 2022 were retrieved from the Scopus database. After selecting the document type as articles, the titles and abstracts were checked for eligibility. In the final bibliometric study using VOSviewer, 1198 papers were included. The growth rate of publications, preferred journals, leading research countries, international collaborations, and top institutions were computed. Results: The number of publications increased steeply between 2018 and 2022, accounting for 72.53\% (869/1198) of all the included papers. The United States and China contributed to approximately 55.17\% (661/1198) of the total publications. Of the 15 most productive institutions, 9 were among the top 100 universities worldwide. Detecting clinical deterioration, monitoring, predicting disease progression, mortality, prognosis, and classifying disease phenotypes or subtypes were some of the research hot spots for AI in patients who are critically ill. Neural networks, decision support systems, machine learning, and deep learning were all commonly used AI technologies. Conclusions: This study highlights popular areas in AI research aimed at improving health care in intensive care units, offers a comprehensive look at the research trend in AI application in the intensive care unit, and provides an insight into potential collaboration and prospects for future research. The 30 articles that received the most citations were listed in detail. For AI-based clinical research to be sufficiently convincing for routine critical care practice, collaborative research efforts are needed to increase the maturity and robustness of AI-driven models. 
", doi="10.2196/42185", url="https://www.jmir.org/2022/11/e42185", url="http://www.ncbi.nlm.nih.gov/pubmed/36449345" } @Article{info:doi/10.2196/40361, author="Chen, Anjun and Huang, Ran and Wu, Erman and Han, Ruobing and Wen, Jian and Li, Qinghua and Zhang, Zhiyong and Shen, Bairong", title="The Generation of a Lung Cancer Health Factor Distribution Using Patient Graphs Constructed From Electronic Medical Records: Retrospective Study", journal="J Med Internet Res", year="2022", month="Nov", day="25", volume="24", number="11", pages="e40361", keywords="lung cancer", keywords="risk factor", keywords="patient graph", keywords="UMLS knowledge graph", keywords="Unified Medical Language System", keywords="connection delta ratio", keywords="EMR", keywords="electronic health record", keywords="EHR", keywords="cancer", abstract="Background: Electronic medical records (EMRs) of patients with lung cancer (LC) capture a variety of health factors. Understanding the distribution of these factors will help identify key factors for risk prediction in preventive screening for LC. Objective: We aimed to generate an integrated biomedical graph from EMR data and Unified Medical Language System (UMLS) ontology for LC, and to generate an LC health factor distribution from a hospital EMR of approximately 1 million patients. Methods: The data were collected from 2 sets of 1397 patients with and those without LC. A patient-centered health factor graph was plotted with 108,000 standardized data, and a graph database was generated to integrate the graphs of patient health factors and the UMLS ontology. With the patient graph, we calculated the connection delta ratio (CDR) for each of the health factors to measure the relative strength of the factor's relationship to LC. Results: The patient graph had 93,000 relations between the 2794 patient nodes and 650 factor nodes. An LC graph with 187 related biomedical concepts and 188 horizontal biomedical relations was plotted and linked to the patient graph. Searching the integrated biomedical graph with any number or category of health factors resulted in graphical representations of relationships between patients and factors, while searches using any patient presented the patient's health factors from the EMR and the LC knowledge graph (KG) from the UMLS in the same graph. Sorting the health factors by CDR in descending order generated a distribution of health factors for LC. The top 70 CDR-ranked factors of disease, symptom, medical history, observation, and laboratory test categories were verified to be concordant with those found in the literature. Conclusions: By collecting standardized data of thousands of patients with and those without LC from the EMR, it was possible to generate a hospital-wide patient-centered health factor graph for graph search and presentation. The patient graph could be integrated with the UMLS KG for LC and thus enable hospitals to bring continuously updated international standard biomedical KGs from the UMLS for clinical use in hospitals. CDR analysis of the graph of patients with LC generated a CDR-sorted distribution of health factors, in which the top CDR-ranked health factors were concordant with the literature. The resulting distribution of LC health factors can be used to help personalize risk evaluation and preventive screening recommendations. 
", doi="10.2196/40361", url="https://www.jmir.org/2022/11/e40361", url="http://www.ncbi.nlm.nih.gov/pubmed/36427233" } @Article{info:doi/10.2196/35880, author="Perrot, Serge and Trouvin, Anne-Priscille and Clairaz-Mahiou, Beatrice and Tempremant, Gr{\'e}gory and Martial, Fran{\c{c}}ois and Br{\'e}ment, Diane and Cherkaoui, Asmaa", title="A Computerized Pharmacy Decision Support System (PDSS) for Headache Management: Observational Pilot Study", journal="Interact J Med Res", year="2022", month="Nov", day="25", volume="11", number="2", pages="e35880", keywords="headache", keywords="pharmacy", keywords="counselling", keywords="over-the-counter (OTC) medication", keywords="self-medication", keywords="decision support system", keywords="patient perception", abstract="Background: Headaches are common and often lead patients to seek advice from a pharmacist and consequently self-medicate for relief. Computerized pharmacy decision support systems (PDSSs) may be a valuable resource for health care professionals, particularly for community pharmacists when counseling patients with headache, to guide treatment with over-the-counter medications and recognize patients who require urgent or specialist care. Objective: This observational pilot study aimed to evaluate a newly developed PDSS web app for the management of patients seeking advice from a pharmacy for headache. This study examined the use of the PDSS web app and if it had an impact on patient or pharmacy personnel counseling, pharmacy personnel perception, and patient perception. Methods: The PDSS web app was developed according to Francophone des Sciences Pharmaceutiques Officinales (SFSPO) recommendations for headache management, and was made available to pharmacies in 2 regions of France: Hauts de France and New Aquitaine. Pharmacy personnel received 2 hours of training before using the PDSS web app. All people who visited the pharmacies for headache between June 29, 2020, and December 31, 2020, were offered an interview based on the PDSS web app and given information about the next steps in the management of headaches and advice on the proper use of their medication. Patients and pharmacy personnel reported satisfaction with the PDSS web app following consultations or during a follow-up period (January 18 to 25, 2021). Results: Of the 44 pharmacies that received the PDSS web app, 38 pharmacies representing 179 pharmacy personnel used the PDSS web app, and 435 people visited these pharmacies for headache during the study period. Of these, 70.0\% (305/435) asked for immediate over-the-counter analgesics for themselves and consulted with pharmacy personnel with the use of the PDSS web app. The majority of these patients were given advice and analgesics for self-medication (346/435, 79.5\%); however, 17.0\% (74/435) were given analgesics and referred to urgent medical services, and 3.5\% (15/435) were given analgesics and referred to their general practitioner. All pharmacy personnel (n=45) were satisfied or very satisfied with the use of the PDSS web app, and a majority thought it improved the quality of their care (41/44, 93.2\%). Most pharmacy personnel felt that the PDSS web app modified their approach to management of headache (29/45, 64.4\%). Most patients were very satisfied with the PDSS web app during their consultation (96/119, 80.7\%), and all felt mostly or completely reassured. 
Conclusions: Use of the PDSS web app for the management of patients with headache improved the perceived quality of care for pharmacy personnel and patients. The PDSS web app was well accepted and effectively identified patients who required specialist medical management. Further studies should identify additional ``red flags'' for more effective screening and management of patients via the PDSS web app. Larger studies can measure the impact of the PDSS web app on the lives of patients and how safe or appropriate pharmacy personnel recommendations are. ", doi="10.2196/35880", url="https://www.i-jmr.org/2022/2/e35880", url="http://www.ncbi.nlm.nih.gov/pubmed/36427228" } @Article{info:doi/10.2196/38182, author="Zhang, Yan and Chen, Chao and Huang, Lingfeng and Liu, Gang and Lian, Tingyu and Yin, Mingjuan and Zhao, Zhiguang and Xu, Jian and Chen, Ruoling and Fu, Yingbin and Liang, Dongmei and Zeng, Jinmei and Ni, Jindong", title="Associations Among Multimorbid Conditions in Hospitalized Middle-aged and Older Adults in China: Statistical Analysis of Medical Records", journal="JMIR Public Health Surveill", year="2022", month="Nov", day="24", volume="8", number="11", pages="e38182", keywords="multimorbidity", keywords="chronic conditions", keywords="aging", keywords="association rule mining", keywords="decision tree analysis", abstract="Background: Multimorbidity has become a new challenge for medical systems and public health policy. Understanding the patterns of and associations among multimorbid conditions should be given priority. It may assist with the early detection of multimorbidity and thus improve quality of life in older adults. Objective: This study aims to comprehensively analyze and compare associations among multimorbid conditions by age and sex in a large number of middle-aged and older Chinese adults. Methods: Data from the home pages of inpatient medical records in the Shenzhen National Health Information Platform were evaluated. From January 1, 2017, to December 31, 2018, inpatients aged 50 years and older who had been diagnosed with at least one of 40 conditions were included in this study. Their demographic characteristics (age and sex) and inpatient diagnoses were extracted. Association rule mining, Chi-square tests, and decision tree analyses were combined to identify associations between multiple chronic conditions. Results: In total, 306,264 hospitalized cases with available information on related chronic conditions were included in this study. The prevalence of multimorbidity in the overall population was 76.46\%. The combined results of the 3 analyses showed that, in patients aged 50 years to 64 years, lipoprotein metabolism disorder tended to be comorbid with multiple chronic conditions. Gout and lipoprotein metabolism disorder had the strongest association. Among patients aged 65 years or older, there were strong associations between cerebrovascular disease, heart disease, lipoprotein metabolism disorder, and peripheral vascular disease. The strongest associations were observed between senile cataract and glaucoma in men and women. In particular, the association between osteoporosis and malignant tumor was only observed in middle-aged and older men, while the association between anemia and chronic kidney disease was only observed in older women. Conclusions: Multimorbidity was prevalent among middle-aged and older Chinese individuals. 
The results of this comprehensive analysis of 4 age-sex subgroups suggested that associations between particular conditions within the sex and age groups occurred more frequently than expected by random chance. This provides evidence for further research on disease clusters and for health care providers to develop different strategies based on age and sex to improve the early identification and treatment of multimorbidity. ", doi="10.2196/38182", url="https://publichealth.jmir.org/2022/11/e38182", url="http://www.ncbi.nlm.nih.gov/pubmed/36422885" } @Article{info:doi/10.2196/43027, author="Ugalde, T. Irma and Chaudhari, P. Pradip and Badawy, Mohamed and Ishimine, Paul and McCarten-Gibbs, A. Kevan and Yen, Kenneth and Atigapramoj, S. Nisa and Sage, Allyson and Nielsen, Donovan and Adelson, David P. and Upperman, Jeffrey and Tancredi, Daniel and Kuppermann, Nathan and Holmes, F. James", title="Validation of Prediction Rules for Computed Tomography Use in Children With Blunt Abdominal or Blunt Head Trauma: Protocol for a Prospective Multicenter Observational Cohort Study", journal="JMIR Res Protoc", year="2022", month="Nov", day="24", volume="11", number="11", pages="e43027", keywords="pediatric trauma", keywords="intra-abdominal injury", keywords="traumatic brain injury", keywords="clinical prediction rules", keywords="emergency medicine", abstract="Background: Traumatic brain injuries (TBIs) and intra-abdominal injuries (IAIs) are 2 leading causes of traumatic death and disability in children. To avoid missed or delayed diagnoses leading to increased morbidity, computed tomography (CT) is used liberally. However, the overuse of CT leads to inefficient care and radiation-induced malignancies. Therefore, to maximize precision and minimize the overuse of CT, the Pediatric Emergency Care Applied Research Network (PECARN) previously derived clinical prediction rules for identifying children at high risk and very low risk for IAIs undergoing acute intervention and clinically important TBIs after blunt trauma in large cohorts of children who are injured. Objective: This study aimed to validate the IAI and age-based TBI clinical prediction rules for identifying children at high risk and very low risk for IAIs undergoing acute intervention and clinically important TBIs after blunt trauma. Methods: This was a prospective 6-center observational study of children aged <18 years with blunt torso or head trauma. Consistent with the original derivation studies, enrolled children underwent routine history and physical examinations, and the treating clinicians completed case report forms prior to knowledge of CT results (if performed). Medical records were reviewed to determine clinical courses and outcomes for all patients, and for those who were discharged from the emergency department, a follow-up survey via a telephone call or SMS text message was performed to identify any patients with missed IAIs or TBIs. The primary outcomes were IAI undergoing acute intervention (therapeutic laparotomy, angiographic embolization, blood transfusion, or intravenous fluid for ≥2 days for pancreatic or gastrointestinal injuries) and clinically important TBI (death from TBI, neurosurgical procedure, intubation for >24 hours for TBI, or hospital admission of ≥2 nights due to a TBI on CT).
Prediction rule accuracy was assessed by measuring rule classification performance, using standard point and 95\% CI estimates of the operational characteristics of each prediction rule (sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratios). Results: The project was funded in 2016, and enrollment was completed on September 1, 2021. Data analyses are expected to be completed by December 2022, and the primary study results are expected to be submitted for publication in 2023. Conclusions: This study will attempt to validate previously derived clinical prediction rules to accurately identify children at high and very low risk for clinically important IAIs and TBIs. Assuming successful validation, widespread implementation is then indicated, which will optimize the care of children who are injured by better aligning CT use with need. International Registered Report Identifier (IRRID): RR1-10.2196/43027 ", doi="10.2196/43027", url="https://www.researchprotocols.org/2022/11/e43027", url="http://www.ncbi.nlm.nih.gov/pubmed/36422920" } @Article{info:doi/10.2196/36929, author="Dong, Xuejie and Ding, Fang and Zhou, Shuduo and Ma, Junxiong and Li, Na and Maimaitiming, Mailikezhati and Xu, Yawei and Guo, Zhigang and Jia, Shaobin and Li, Chunjie and Luo, Suxin and Bian, Huiping and Luobu, Gesang and Yuan, Zuyi and Shi, Hong and Zheng, Zhi-jie and Jin, Yinzi and Huo, Yong", title="Optimizing an Emergency Medical Dispatch System to Improve Prehospital Diagnosis and Treatment of Acute Coronary Syndrome: Nationwide Retrospective Study in China", journal="J Med Internet Res", year="2022", month="Nov", day="23", volume="24", number="11", pages="e36929", keywords="medical priority dispatch system", keywords="acute coronary syndrome", keywords="prehospital care", keywords="emergency medical service", keywords="health service", keywords="healthcare", keywords="health care", keywords="coronary", keywords="cardiology", keywords="cardiovascular", abstract="Background: Acute coronary syndrome (ACS) is the most time-sensitive acute cardiac event that requires rapid dispatching and response. The medical priority dispatch system (MPDS), one of the most extensively used types of emergency dispatch systems, is hypothesized to provide better-quality prehospital emergency treatment. However, few studies have revealed the impact of MPDS use on the process of ACS care. Objective: This study aimed to investigate whether the use of MPDS was associated with higher prehospital diagnosis accuracy and shorter prehospital delay for patients with ACS transferred by an emergency medical service (EMS), using a national database in China. Methods: This retrospective analysis was based on an integrated database of China's MPDS and hospital registry. From January 1, 2016, to December 31, 2020, EMS-treated ACS cases were divided into before MPDS and after MPDS groups in accordance with the MPDS launch time at each EMS center. The primary outcomes included diagnosis consistency between hospital admission and discharge, and prehospital delay. Multivariable logistic regression and propensity score--matching analysis were performed to compare outcomes between the 2 groups for total ACS and subtypes. Results: A total of 9806 ACS cases (3561 before MPDS and 6245 after MPDS) treated by 43 EMS centers were included. The overall diagnosis consistency of the after MPDS group (Cohen $\kappa$=0.918, P<.001) was higher than that of the before MPDS group (Cohen $\kappa$=0.889, P<.001). 
After the use of the MPDS, the call-to-EMS arrival time was shortened in the matched ACS cases (20.0 vs 16.0 min, P<.001; adjusted difference: --1.67, 95\% CI --2.33 to --1.02; P<.001) and in the subtype of ST-elevation myocardial infarction (adjusted difference: --3.81, 95\% CI --4.63 to --2.98, P<.001), while the EMS arrival-to-door time (20.0 vs 20.0 min, P=.31) was not significantly different in all ACS cases and subtypes. Conclusions: The optimized use of MPDS in China was associated with increased diagnosis consistency and a reduced call-to-EMS arrival time among EMS-treated patients with ACS. An emergency medical dispatch system should be designed specifically to fit into different prehospital modes in the EMS system on a regional basis. ", doi="10.2196/36929", url="https://www.jmir.org/2022/11/e36929", url="http://www.ncbi.nlm.nih.gov/pubmed/36416876" } @Article{info:doi/10.2196/40516, author="Overmars, Malin L. and Niemantsverdriet, A. Michael S. and Groenhof, J. T. Katrien and De Groot, H. Mark C. and Hulsbergen-Veelken, R. Cornelia A. and Van Solinge, W. Wouter and Musson, A. Ruben E. and Ten Berg, J. Maarten and Hoefer, E. Imo and Haitjema, Saskia", title="A Wolf in Sheep's Clothing: Reuse of Routinely Obtained Laboratory Data in Research", journal="J Med Internet Res", year="2022", month="Nov", day="18", volume="24", number="11", pages="e40516", keywords="laboratory data", keywords="electronic health records", keywords="preprocessing", keywords="applied data science", keywords="laboratory", keywords="data", keywords="clinical", keywords="decision support", keywords="decision", keywords="research", keywords="analysis", keywords="patient", keywords="value", keywords="clinical care", doi="10.2196/40516", url="https://www.jmir.org/2022/11/e40516", url="http://www.ncbi.nlm.nih.gov/pubmed/36399373" } @Article{info:doi/10.2196/38677, author="Han, Feng and Zhang, ZiHeng and Zhang, Hongjian and Nakaya, Jun and Kudo, Kohsuke and Ogasawara, Katsuhiko", title="Extraction and Quantification of Words Representing Degrees of Diseases: Combining the Fuzzy C-Means Method and Gaussian Membership", journal="JMIR Form Res", year="2022", month="Nov", day="18", volume="6", number="11", pages="e38677", keywords="medical text", keywords="fuzzy c-means", keywords="cluster", keywords="algorithm", keywords="machine learning", keywords="word quantification", keywords="fuzzification", keywords="Gauss", keywords="radiology", keywords="medical report", keywords="documentation", keywords="text mining", keywords="data mining", keywords="extraction", keywords="unstructured", keywords="free text", keywords="quantification", keywords="fuzzy", keywords="diagnosis", keywords="diagnostic", keywords="EHR", keywords="support system", abstract="Background: Due to the development of medical information systems, a large amount of clinical data has been generated. These unstructured data contain substantial information. Extracting useful knowledge from these data and making scientific decisions for diagnosing and treating diseases have become increasingly necessary. Unstructured data, such as in the Medical Information Mart for Intensive Care III (MIMIC-III) data set, contain several ambiguous words that demonstrate the subjectivity of doctors, such as descriptions of patient symptoms. These data could be used to further improve the accuracy of medical diagnostic system assessments.
To the best of our knowledge, there is currently no method for extracting subjective words that express the extent of these symptoms (hereinafter, ``degree words''). Objective: Therefore, we propose using the fuzzy c-means (FCM) method and Gaussian membership to quantify the degree words in the clinical medical data set MIMIC-III. Methods: First, we preprocessed the 381,091 radiology reports collected in MIMIC-III, and then we used the FCM method to extract degree words from unstructured text. Thereafter, we used the Gaussian membership method to quantify the extracted degree words, which transform the fuzzy words extracted from the medical text into computer-recognizable numbers. Results: The results showed that the digitization of ambiguous words in medical texts is feasible. The words representing each degree of each disease had a range of corresponding values. Examples of membership medians were 2.971 (atelectasis), 3.121 (pneumonia), 2.899 (pneumothorax), 3.051 (pulmonary edema), and 2.435 (pulmonary embolus). Additionally, all extracted words contained the same subjective words (low, high, etc), which allows for an objective evaluation method. Furthermore, we will verify the specific impact of the quantification results of ambiguous words such as symptom words and degree words on the use of medical texts in subsequent studies. These same ambiguous words may be used as a new set of feature values to represent the disorders. Conclusions: This study proposes an innovative method for handling subjective words. We used the FCM method to extract the subjective degree words in the English-interpreted report of the MIMIC-III and then used the Gaussian functions to quantify the subjective degree words. In this method, words containing subjectivity in unstructured texts can be automatically processed and transformed into numerical ranges by digital processing. It was concluded that the digitization of ambiguous words in medical texts is feasible. ", doi="10.2196/38677", url="https://formative.jmir.org/2022/11/e38677", url="http://www.ncbi.nlm.nih.gov/pubmed/36399376" } @Article{info:doi/10.2196/42955, author="Shi, Jiaxiao and Fassett, J. Michael and Chiu, Y. Vicki and Avila, C. Chantal and Khadka, Nehaa and Brown, Brittany and Patel, Pooja and Mensah, Nana and Xie, Fagen and Peltier, R. Morgan and Getahun, Darios", title="Postpartum Migraine Headache Coding in Electronic Health Records of a Large Integrated Health Care System: Validation Study", journal="JMIR Form Res", year="2022", month="Nov", day="17", volume="6", number="11", pages="e42955", keywords="migraine headache", keywords="validation", keywords="diagnosis", keywords="pharmacy", keywords="postpartum", keywords="medical record", keywords="health plan", keywords="electronic health record", keywords="coding", keywords="pharmacy record", keywords="diagnostic code", keywords="EHR system", abstract="Background: Migraine is a common neurological disorder characterized by repeated headaches of varying intensity. The prevalence and severity of migraine headaches disproportionately affect women, particularly during the postpartum period. Moreover, migraines during pregnancy have been associated with adverse maternal outcomes, including preeclampsia and postpartum stroke. However, due to the lack of a validated instrument for uniform case ascertainment on postpartum migraine headache, there is uncertainty in the reported prevalence in the literature.
Objective: The aim of this study was to evaluate the completeness and accuracy of reporting postpartum migraine headache coding in a large integrated health care system's electronic health records (EHRs) and to compare the coding quality before and after the implementation of the International Classification of Diseases, 10th revision, Clinical Modification (ICD-10-CM) codes and pharmacy records in EHRs. Methods: Medical records of 200 deliveries in all 15 Kaiser Permanente Southern California hospitals during 2 time periods, that is, January 1, 2012 through December 31, 2014 (International Classification of Diseases, 9th revision, Clinical Modification [ICD-9-CM] coding period) and January 1, 2017 through December 31, 2019 (ICD-10-CM coding period), were randomly selected from EHRs for chart review. Two trained research associates reviewed the EHRs for all 200 women for postpartum migraine headache cases documented within 1 year after delivery. Women were considered to have postpartum migraine headache if either a mention of migraine headache (yes for diagnosis) or a prescription for treatment of migraine headache (yes for pharmacy records) was noted in the electronic chart. Results from the chart abstraction served as the gold standard and were compared with corresponding diagnosis and pharmacy prescription utilization records for both ICD-9-CM and ICD-10-CM coding periods through comparisons of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), as well as the summary statistics of F-score and Youden J statistic (J). The kappa statistic ($\kappa$) for interrater reliability was calculated. Results: The overall agreement between the identification of migraine headache using diagnosis codes and pharmacy records compared to the medical record review was strong. Diagnosis coding (F-score=87.8\%; J=82.5\%) did better than pharmacy records (F-score=72.7\%; J=57.5\%) when identifying cases, but combining both of these sources of data produced much greater accuracy in the identification of postpartum migraine cases (F-score=96.9\%; J=99.7\%) with sensitivity, specificity, PPV, and NPV of 100\%, 99.7\%, 93.9\%, and 100\%, respectively. Results were similar across the ICD-9-CM (F-score=98.7\%, J=99.9\%) and ICD-10-CM coding periods (F-score=94.9\%; J=99.6\%). The interrater reliability between the 2 research associates for postpartum migraine headache was 100\%. Conclusions: Neither diagnostic codes nor pharmacy records alone are sufficient for identifying postpartum migraine cases reliably, but when used together, they are quite reliable. The completeness of the data remained similar after the implementation of the ICD-10-CM coding in the EHR system. ", doi="10.2196/42955", url="https://formative.jmir.org/2022/11/e42955", url="http://www.ncbi.nlm.nih.gov/pubmed/36394937" } @Article{info:doi/10.2196/39536, author="Mir{\'o} Catalina, Queralt and Fuster-Casanovas, A{\"i}na and Sol{\'e}-Casals, Jordi and Vidal-Alaball, Josep", title="Developing an Artificial Intelligence Model for Reading Chest X-rays: Protocol for a Prospective Validation Study", journal="JMIR Res Protoc", year="2022", month="Nov", day="16", volume="11", number="11", pages="e39536", keywords="artificial intelligence", keywords="machine learning", keywords="chest x-ray", keywords="radiology", keywords="validation", abstract="Background: Chest x-rays are the most commonly used type of x-rays today, accounting for up to 26\% of all radiographic tests performed. 
However, chest radiography is a complex imaging modality to interpret. Several studies have reported discrepancies in chest x-ray interpretations among emergency physicians and radiologists. It is of vital importance to be able to offer a fast and reliable diagnosis for this kind of x-ray, using artificial intelligence (AI) to support the clinician. Oxipit has developed an AI algorithm for reading chest x-rays, available through a web platform called ChestEye. This platform is an automatic computer-aided diagnosis system where a reading of the inserted chest x-ray is performed, and an automatic report is returned with a capacity to detect 75 pathologies, covering 90\% of diagnoses. Objective: The overall objective of the study is to perform validation with prospective data of the ChestEye algorithm as a diagnostic aid. We wish to validate the algorithm for a single pathology and multiple pathologies by evaluating the accuracy, sensitivity, and specificity of the algorithm. Methods: A prospective validation study will be carried out to compare the diagnosis of the reference radiologists for the users attending the primary care center in the Osona region (Spain), with the diagnosis of the ChestEye AI algorithm. Anonymized chest x-ray images will be acquired and fed into the AI algorithm interface, which will return an automatic report. A radiologist will evaluate the same chest x-ray, and both assessments will be compared to calculate the precision, sensitivity, specificity, and accuracy of the AI algorithm. Results will be represented globally and individually for each pathology using a confusion matrix and the One-vs-All methodology. Results: Patient recruitment was conducted from February 7, 2022, and it is expected that data can be obtained in 5 to 6 months. In June 2022, more than 450 x-rays have been collected, so it is expected that 600 samples will be gathered in July 2022. We hope to obtain sufficient evidence to demonstrate that the use of AI in the reading of chest x-rays can be a good tool for diagnostic support. However, there is a decreasing number of radiology professionals and, therefore, it is necessary to develop and validate tools to support professionals who have to interpret these tests. Conclusions: If the results of the validation of the model are satisfactory, it could be implemented as a support tool and allow an increase in the accuracy and speed of diagnosis, patient safety, and agility in the primary care system, while reducing the cost of unnecessary tests. 
International Registered Report Identifier (IRRID): PRR1-10.2196/39536 ", doi="10.2196/39536", url="https://www.researchprotocols.org/2022/11/e39536", url="http://www.ncbi.nlm.nih.gov/pubmed/36383419" } @Article{info:doi/10.2196/37976, author="Ouchi, Dan and Giner-Soriano, Maria and G{\'o}mez-Lumbreras, Ainhoa and Vedia Urgell, Cristina and Torres, Ferran and Morros, Rosa", title="Automatic Estimation of the Most Likely Drug Combination in Electronic Health Records Using the Smooth Algorithm: Development and Validation Study", journal="JMIR Med Inform", year="2022", month="Nov", day="15", volume="10", number="11", pages="e37976", keywords="electronic health records", keywords="data mining", keywords="complex drug patterns", keywords="algorithms", keywords="drug utilization", keywords="polypharmacy", keywords="EHR", keywords="medication", keywords="drug combination", keywords="therapy", keywords="automation", keywords="drug exposition", keywords="treatment", keywords="adherence", abstract="Background: Since electronic health records (EHRs) began to be used in an automated way, pharmacovigilance and pharmacoepidemiology studies have used different algorithms to characterize therapy. Although progress has been made in this area for monotherapy, with combinations of 2 or more drugs the challenge to characterize the treatment increases significantly, and more research is needed. Objective: The goal of the research was to develop and describe a novel algorithm that automatically returns the most likely therapy of one drug or combinations of 2 or more drugs over time. Methods: We used the Information System for Research in Primary Care as our reference EHR platform for the smooth algorithm development. The algorithm was inspired by statistical methods based on moving averages and depends on a parameter Wt, a flexible window that determines the level of smoothing. The effect of Wt was evaluated in a simulation study on the same data set with different window lengths. To understand the algorithm performance from a clinical or pharmacological perspective, we conducted a validation study. We designed 4 pharmacological scenarios and asked 4 independent professionals to compare a traditional method against the smooth algorithm. Data from the simulation and validation studies were then analyzed. Results: The Wt parameter had an impact on the raw data. As we increased the window length, more patients were modified and the number of smoothed patients increased, although we rarely observed changes of more than 5\% of the total data. In the validation study, significant differences were obtained in the performance of the smooth algorithm over the traditional method. These differences were consistent across pharmacological scenarios. Conclusions: The smooth algorithm is an automated approach that standardizes, simplifies, and improves data processing in drug exposition studies using EHRs. This algorithm can be generalized to almost any pharmacological medication and model the drug exposure to facilitate the detection of treatment switches, discontinuations, and terminations throughout the study period.
", doi="10.2196/37976", url="https://medinform.jmir.org/2022/11/e37976", url="http://www.ncbi.nlm.nih.gov/pubmed/36378514" } @Article{info:doi/10.2196/40456, author="Gaspar, Frederic and Lutters, Monika and Beeler, Emanuel Patrick and Lang, Olivier Pierre and Burnand, Bernard and Rinaldi, Fabio and Lovis, Christian and Csajka, Chantal and Le Pogam, Marie-Annick and ", title="Automatic Detection of Adverse Drug Events in Geriatric Care: Study Proposal", journal="JMIR Res Protoc", year="2022", month="Nov", day="15", volume="11", number="11", pages="e40456", keywords="adverse drug events", keywords="adverse drug reactions", keywords="older inpatients", keywords="aged 65 and older", keywords="multimorbidity", keywords="polypharmacy", keywords="patient safety", keywords="inappropriate prescribing", keywords="medication errors", keywords="natural language processing", keywords="clinical decision support system", keywords="automated adverse drug event reporting system", keywords="electronic medical record", keywords="hospitals", keywords="multicenter study", keywords="interdisciplinary research", keywords="quality of hospital care", keywords="machine learning", keywords="antithrombotics", keywords="venous thromboembolism", keywords="hemorrhage", abstract="Background: One-third of older inpatients experience adverse drug events (ADEs), which increase their mortality, morbidity, and health care use and costs. In particular, antithrombotic drugs are among the most at-risk medications for this population. Reporting systems have been implemented at the national, regional, and provider levels to monitor ADEs and design prevention strategies. Owing to their well-known limitations, automated detection technologies based on electronic medical records (EMRs) are being developed to routinely detect or predict ADEs. Objective: This study aims to develop and validate an automated detection tool for monitoring antithrombotic-related ADEs using EMRs from 4 large Swiss hospitals. We aim to assess cumulative incidences of hemorrhages and thromboses in older inpatients associated with the prescription of antithrombotic drugs, identify triggering factors, and propose improvements for clinical practice. Methods: This project is a multicenter, cross-sectional study based on 2015 to 2016 EMR data from 4 large hospitals in Switzerland: Lausanne, Geneva, and Z{\"u}rich university hospitals, and Baden Cantonal Hospital. We have included inpatients aged ?65 years who stayed at 1 of the 4 hospitals during 2015 or 2016, received at least one antithrombotic drug during their stay, and signed or were not opposed to a general consent for participation in research. First, clinical experts selected a list of relevant antithrombotic drugs along with their side effects, risks, and confounding factors. Second, administrative, clinical, prescription, and laboratory data available in the form of free text and structured data were extracted from study participants' EMRs. Third, several automated rule-based and machine learning--based algorithms are being developed, allowing for the identification of hemorrhage and thromboembolic events and their triggering factors from the extracted information. Finally, we plan to validate the developed detection tools (one per ADE type) through manual medical record review. Performance metrics for assessing internal validity will comprise the area under the receiver operating characteristic curve, F1-score, sensitivity, specificity, and positive and negative predictive values. 
Results: After accounting for the inclusion and exclusion criteria, we will include 34,522 residents aged ≥65 years. The data will be analyzed in 2022, and the research project will run until the end of 2022 to mid-2023. Conclusions: This project will allow for the introduction of measures to improve safety in prescribing antithrombotic drugs, which today remain among the drugs most involved in ADEs. The findings will be implemented in clinical practice using indicators of adverse events for risk management and training for health care professionals; the tools and methodologies developed will be disseminated for new research in this field. The increased performance of natural language processing as an important complement to structured data will bring existing tools to another level of efficiency in the detection of ADEs. Currently, such systems are unavailable in Switzerland. International Registered Report Identifier (IRRID): DERR1-10.2196/40456 ", doi="10.2196/40456", url="https://www.researchprotocols.org/2022/11/e40456", url="http://www.ncbi.nlm.nih.gov/pubmed/36378522" } @Article{info:doi/10.2196/38088, author="Ankersmid, Wies Jet and Siesling, Sabine and Strobbe, A. Luc J. and Meulepas, M. Johanna and van Riet, A. Yvonne E. and Engels, Noel and Prick, M. Janine C. and The, Regina and Takahashi, Asako and Velting, Mirjam and van Uden-Kraan, F. Cornelia and Drossaert, C. Constance H.", title="Supporting Shared Decision-making About Surveillance After Breast Cancer With Personalized Recurrence Risk Calculations: Development of a Patient Decision Aid Using the International Patient Decision AIDS Standards Development Process in Combination With a Mixed Methods Design", journal="JMIR Cancer", year="2022", month="Nov", day="14", volume="8", number="4", pages="e38088", keywords="patient decision aid", keywords="PtDA", keywords="breast cancer", keywords="surveillance", keywords="risk information", keywords="shared decision-making", keywords="SDM", abstract="Background: Although the treatment for breast cancer is highly personalized, posttreatment surveillance remains one-size-fits-all: annual imaging and physical examination for at least five years after treatment. The INFLUENCE nomogram is a prognostic model for estimating the 5-year risk for locoregional recurrences and second primary tumors after breast cancer. The use of personalized outcome data (such as risks for recurrences) can enrich the process of shared decision-making (SDM) for personalized surveillance after breast cancer. Objective: This study aimed to develop a patient decision aid (PtDA), integrating personalized risk calculations on risks for recurrences, to support SDM for personalized surveillance after curative treatment for invasive breast cancer. Methods: For the development of the PtDA, the International Patient Decision Aids Standards development process was combined with a mixed methods design inspired by the development process of previously developed PtDAs. In the development, 8 steps were distinguished: establishing a multidisciplinary steering group; definition of the end users, scope, and purpose of the PtDA; assessment of the decisional needs of end users; defining requirements for the PtDA; determining the format and implementation strategy for the PtDA; prototyping; alpha testing; and beta testing. The composed steering group convened during regular working-group sessions throughout the development process. 
Results: The ``Breast Cancer Surveillance Decision Aid'' consists of 3 components that support the SDM process: a handout sheet on which personalized risks for recurrences, calculated using the INFLUENCE-nomogram, can be visualized and which contains an explanation about the decision for surveillance and a login code for a web-based deliberation tool; a web-based deliberation tool, including a patient-reported outcome measure on fear of cancer recurrence; and a summary sheet summarizing patient preferences and considerations. The PtDA was assessed as usable and acceptable during alpha testing. Beta testing is currently ongoing. Conclusions: We developed an acceptable and usable PtDA that integrates personalized risk calculations for the risk for recurrences to support SDM for surveillance after breast cancer. The implementation and effects of the use of the ``Breast Cancer Surveillance Decision Aid'' are being investigated in a clinical trial. ", doi="10.2196/38088", url="https://cancer.jmir.org/2022/4/e38088", url="http://www.ncbi.nlm.nih.gov/pubmed/36374536" } @Article{info:doi/10.2196/40124, author="von Wedel, Philip and Hagist, Christian and Liebe, Jan-David and Esdar, Moritz and H{\"u}bner, Ursula and Pross, Christoph", title="Effects of Hospital Digitization on Clinical Outcomes and Patient Satisfaction: Nationwide Multiple Regression Analysis Across German Hospitals", journal="J Med Internet Res", year="2022", month="Nov", day="10", volume="24", number="11", pages="e40124", keywords="health care information technology", keywords="electronic health records", keywords="hospital digitization", keywords="quality of care", keywords="clinical outcomes", keywords="patient satisfaction", keywords="user-perceived value", abstract="Background: The adoption of health information technology (HIT) by health care providers is commonly believed to improve the quality of care. Policy makers in the United States and Germany follow this logic and deploy nationwide HIT adoption programs to fund hospital investments in digital technologies. However, scientific evidence for the beneficial effects of HIT on care quality at a national level remains mostly US based, is focused on electronic health records (EHRs), and rarely accounts for the quality of digitization from a hospital user perspective. Objective: This study aimed to examine the effects of digitization on clinical outcomes and patient experience in German hospitals. Hence, this study adds to the small stream of literature in this field outside the United States. It goes beyond assessing the effects of mere HIT adoption and also considers user-perceived HIT value. In addition, the impact of a variety of technologies beyond EHRs was examined. Methods: Multiple linear regression models were estimated using emergency care outcomes, elective care outcomes, and patient satisfaction as dependent variables. The adoption and user-perceived value of HIT represented key independent variables, and case volume, hospital size, ownership status, and teaching status were included as controls. Care outcomes were captured via risk-adjusted, observed-to-expected outcome ratios for patients who had stroke, myocardial infarction, or hip replacement. The German Patient Experience Questionnaire of Weisse Liste provided information on patient satisfaction. Information on the adoption and user-perceived value of 10 subdomains of HIT and EHRs was derived from the German 2020 Healthcare IT Report. 
Results: Statistical analysis was based on an overall sample of 383 German hospitals. The analyzed data set suggested no significant effect of HIT or EHR adoption on clinical outcomes or patient satisfaction. However, a higher user-perceived value or quality of the installed tools did improve outcomes. Emergency care outcomes benefited from user-friendly overall digitization ($\beta$=-.032; P=.04), which was especially driven by the user-friendliness of admission HIT ($\beta$=-.023; P=.07). Elective care outcomes were positively impacted by user-friendly EHR installations ($\beta$=-.138; P=.008). Similarly, the results suggested user-friendly, overall digitization to have a moderate positive effect on patient satisfaction ($\beta$=-.009; P=.01). Conclusions: The results of this study suggest that hospital digitization is not an end in itself. Policy makers and hospitals are well advised to not only focus on the mere adoption of digital technologies but also continuously work toward digitization that is perceived as valuable by physicians and nurses who rely on it every day. Furthermore, hospital digitization strategies should consider that the assumed benefits of single technologies are not realized across all care domains. ", doi="10.2196/40124", url="https://www.jmir.org/2022/11/e40124", url="http://www.ncbi.nlm.nih.gov/pubmed/36355423" } @Article{info:doi/10.2196/40338, author="Jung, Sungwon and Bae, Sungchul and Seong, Donghyeong and Oh, Hee Ock and Kim, Yoomi and Yi, Byoung-Kee", title="Shared Interoperable Clinical Decision Support Service for Drug-Allergy Interaction Checks: Implementation Study", journal="JMIR Med Inform", year="2022", month="Nov", day="10", volume="10", number="11", pages="e40338", keywords="clinical decision support", keywords="drug-allergy interaction", keywords="Health Level 7", keywords="Fast Healthcare Interoperability Resources", keywords="interoperability", keywords="CDS Hooks", abstract="Background: Clinical decision support (CDS) can improve health care with respect to the quality of care, patient safety, efficiency, and effectiveness. Establishing a CDS system in a health care setting remains a challenge. A few hospitals have used self-developed in-house CDS systems or commercial CDS solutions. Since these in-house CDS systems tend to be tightly coupled with a specific electronic health record system, the functionality and knowledge base are not easily shareable. A shared interoperable CDS system facilitates the sharing of the knowledge base and extension of CDS services. Objective: The study focuses on developing and deploying the national CDS service for the drug-allergy interaction (DAI) check for health care providers in Korea that need to introduce the service but lack the budget and expertise. Methods: To provide the shared interoperable CDS service, we designed and implemented the system based on the CDS Hooks specification and Health Level Seven (HL7) Fast Healthcare Interoperability Resources (FHIR) standard. The study describes the CDS development process. The system development went through requirement analysis, design, implementation, and deployment. In particular, the concept architecture was designed based on the CDS Hooks structure. The MedicationRequest and AllergyIntolerance resources were profiled to exchange data using the FHIR standard. The discovery and DAI check application programming interfaces and rule engine were developed. Results: The CDS service was deployed on G-Cloud, a government cloud service. 
In March 2021, the CDS service was launched, and 67 health care providers participated in the CDS service. The health care providers participated in the service with 1,008,357 DAI checks for 114,694 patients, of which 33,054 (3.32\%) cases resulted in a ``warning.'' Conclusions: Korea's Ministry of Health and Welfare has been trying to build an HL7 FHIR-based ecosystem in Korea. As one of these efforts, the CDS service initiative has been conducted. To promote the rapid adoption of the HL7 FHIR standard, it is necessary to accelerate practical service development and to appeal to policy makers regarding the benefits of FHIR standardization. With the development of various case-specific implementation guides using the Korea Core implementation guide, the FHIR standards will be distributed nationwide, and more shared interoperable health care services will be introduced in Korea. ", doi="10.2196/40338", url="https://medinform.jmir.org/2022/11/e40338", url="http://www.ncbi.nlm.nih.gov/pubmed/36355401" } @Article{info:doi/10.2196/39670, author="Tennant, Ryan and Tetui, Moses and Grindrod, Kelly and Burns, M. Catherine", title="Understanding Human Factors Challenges on the Front Lines of Mass COVID-19 Vaccination Clinics: Human Systems Modeling Study", journal="JMIR Hum Factors", year="2022", month="Nov", day="10", volume="9", number="4", pages="e39670", keywords="cognitive work analysis", keywords="contextual design", keywords="COVID-19", keywords="decision making", keywords="health care system", keywords="pandemic", keywords="vaccination clinics", keywords="workplace stress", abstract="Background: Implementing mass vaccination clinics for COVID-19 immunization has been a successful public health activity worldwide. However, this tightly coupled system has many logistical challenges, leading to increased workplace stress, as evidenced throughout the pandemic. The complexities of mass vaccination clinics that combine multidisciplinary teams working within nonclinical environments are yet to be understood through a human systems perspective. Objective: This study aimed to holistically model mass COVID-19 vaccination clinics in the Region of Waterloo, Ontario, Canada, to understand the challenges centered around frontline workers and to inform clinic design and technological recommendations that can minimize the systemic inefficiencies that contribute to workplace stress. Methods: An ethnographic approach was guided by contextual inquiry to gather data on work as done in these ad-hoc immunization settings. Observation data were clarified by speaking with clinic staff, and the research team discussed the observation data regularly throughout the data collection period. Data were analyzed by combining aspects of the contextual design framework and cognitive work analysis, and building workplace models that can identify the stress points and interconnections within mass vaccination clinic flow, developed artifacts, culture, physical layouts, and decision-making. Results: Observations were conducted at 6 mass COVID-19 vaccination clinics over 4 weeks in 2021. The workflow model depicted challenges with maintaining situational awareness about client intake and vaccine preparation among decision-makers. The artifacts model visualized how separately developed tools for the vaccine lead and clinic lead may support cognitive tasks through data synthesis. However, their effectiveness depends on sharing accurate and timely data. 
The cultural model indicated that perspectives on how to effectively achieve mass immunization might impact workplace stress with changes to responsibilities. This depends on the aggressive or relaxed approach toward minimizing vaccine waste while adapting to changing policies, regulations, and vaccine scarcity. The physical model suggested that the co-location of workstations may influence decision-making coordination. Finally, the decision ladder described the decision-making steps for managing end-of-day doses, highlighting challenges with data uncertainty and ways to support expertise. Conclusions: Modeling mass COVID-19 vaccination clinics from a human systems perspective identified 2 high-level opportunities for improving the inefficiencies within this health care delivery system. First, clinics may become more resilient to unexpected changes in client intake or vaccine preparation using strategies and artifacts that standardize data gathering and synthesis, thereby reducing uncertainties for end-of-day dose decision-making. Second, improving data sharing among staff by co-locating their workstations and implementing collaborative artifacts that support a collective understanding of the state of the clinic may reduce system complexity by improving shared situational awareness. Future research should examine how the developed models apply to immunization settings beyond the Region of Waterloo and evaluate the impact of the recommendations on workflow coordination, stress, and decision-making. ", doi="10.2196/39670", url="https://humanfactors.jmir.org/2022/4/e39670", url="http://www.ncbi.nlm.nih.gov/pubmed/36219839" } @Article{info:doi/10.2196/38053, author="Zhang, Xiangzhou and Xue, Yunfei and Su, Xinyu and Chen, Shaoyong and Liu, Kang and Chen, Weiqi and Liu, Mei and Hu, Yong", title="A Transfer Learning Approach to Correct the Temporal Performance Drift of Clinical Prediction Models: Retrospective Cohort Study", journal="JMIR Med Inform", year="2022", month="Nov", day="9", volume="10", number="11", pages="e38053", keywords="transfer learning", keywords="clinical prediction model", keywords="performance drift", keywords="concept drift", keywords="acute kidney injury", abstract="Background: Clinical prediction models suffer from performance drift as the patient population shifts over time. There is a great need for model updating approaches or modeling frameworks that can effectively use the old and new data. Objective: Based on the paradigm of transfer learning, we aimed to develop a novel modeling framework that transfers old knowledge to the new environment for prediction tasks, and contributes to performance drift correction. Methods: The proposed predictive modeling framework maintains a logistic regression--based stacking ensemble of 2 gradient boosting machine (GBM) models representing old and new knowledge learned from old and new data, respectively (referred to as transfer learning gradient boosting machine [TransferGBM]). The ensemble learning procedure can dynamically balance the old and new knowledge. Using 2010-2017 electronic health record data on a retrospective cohort of 141,696 patients, we validated TransferGBM for hospital-acquired acute kidney injury prediction. Results: The baseline models (ie, transported models) that were trained on 2010 and 2011 data showed significant performance drift in the temporal validation with 2012-2017 data. Refitting these models using updated samples resulted in performance gains in nearly all cases. 
The proposed TransferGBM model succeeded in achieving uniformly better performance than the refitted models. Conclusions: Under the scenario of population shift, incorporating new knowledge while preserving old knowledge is essential for maintaining stable performance. Transfer learning combined with stacking ensemble learning can help achieve a balance of old and new knowledge in a flexible and adaptive way, even in the case of insufficient new data. ", doi="10.2196/38053", url="https://medinform.jmir.org/2022/11/e38053", url="http://www.ncbi.nlm.nih.gov/pubmed/36350705" } @Article{info:doi/10.2196/36933, author="Logaras, Evangelos and Billis, Antonis and Kyparissidis Kokkinidis, Ilias and Ketseridou, Nafsika Smaranda and Fourlis, Alexios and Tzotzis, Aristotelis and Imprialos, Konstantinos and Doumas, Michael and Bamidis, Panagiotis", title="Risk Assessment of COVID-19 Cases in Emergency Departments and Clinics With the Use of Real-World Data and Artificial Intelligence: Observational Study", journal="JMIR Form Res", year="2022", month="Nov", day="8", volume="6", number="11", pages="e36933", keywords="COVID-19 pandemic", keywords="risk assessment", keywords="wearable device", keywords="respiration evaluation", keywords="emergency department", keywords="artificial intelligence", keywords="real-world data", abstract="Background: The recent COVID-19 pandemic has highlighted the weaknesses of health care systems around the world. In the effort to improve the monitoring of cases admitted to emergency departments, it has become increasingly necessary to adopt new innovative technological solutions in clinical practice. Currently, the continuous monitoring of vital signs is only performed in patients admitted to the intensive care unit. Objective: The study aimed to develop a smart system that will dynamically prioritize patients through the continuous monitoring of vital signs using a wearable biosensor device and recording of meaningful clinical records and estimate the likelihood of deterioration of each case using artificial intelligence models. Methods: The data for the study were collected from the emergency department and COVID-19 inpatient unit of the Hippokration General Hospital of Thessaloniki. The study was carried out in the framework of the COVID-X H2020 project, which was funded by the European Union. For the training of the neural network, data collection was performed from COVID-19 cases hospitalized in the respective unit. A wearable biosensor device was placed on the wrist of each patient, which recorded the primary characteristics of the visual signal related to breathing assessment. Results: A total of 157 adult patients diagnosed with COVID-19 were recruited. Lasso penalty function was used for selecting 18 out of 48 predictors and 2 random forest--based models were implemented for comparison. The high overall performance was maintained, if not improved, by feature selection, with random forest achieving accuracies of 80.9\% and 82.1\% when trained using all predictors and a subset of them, respectively. Preliminary results, although affected by pandemic limitations and restrictions, were promising regarding breathing pattern recognition. Conclusions: This study represents a novel approach that involves the use of machine learning methods and Edge artificial intelligence to assist the prioritization and continuous monitoring procedures of patients with COVID-19 in health departments. 
Although initial results appear to be promising, further studies are required to examine its actual effectiveness. ", doi="10.2196/36933", url="https://formative.jmir.org/2022/11/e36933", url="http://www.ncbi.nlm.nih.gov/pubmed/36197836" } @Article{info:doi/10.2196/35709, author="Dong, Shengjie and Shi, Chenshu and Zeng, Wu and Jia, Zhiying and Dong, Minye and Xiao, Yuyin and Li, Guohong", title="The Application of Graph Theoretical Analysis to Complex Networks in Medical Malpractice in China: Qualitative Study", journal="JMIR Med Inform", year="2022", month="Nov", day="3", volume="10", number="11", pages="e35709", keywords="medical malpractice", keywords="complex network", keywords="scale-free network", keywords="hub nodes", keywords="patient safety management", keywords="health systems", abstract="Background: Studies have shown that hospitals or physicians with multiple malpractice claims are more likely to be involved in new claims. This finding indicates that medical malpractice may be clustered by institutions. Objective: We aimed to identify the underlying mechanisms of medical malpractice that, in the long term, may contribute to developing interventions to reduce future claims and patient harm. Methods: This study extracted the semantic network in 6610 medical litigation records (unstructured data) obtained from a public judicial database in China. They represented the most serious cases of malpractice in the country. The medical malpractice network of China was presented as a knowledge graph based on the complex network theory; it uses the International Classification of Patient Safety from the World Health Organization as a reference. Results: We found that the medical malpractice network of China was a scale-free network---the occurrence of medical malpractice in litigation cases was not random, but traceable. The results of the hub nodes revealed that orthopedics, obstetrics and gynecology, and the emergency department were the 3 most frequent specialties that incurred malpractice; inadequate informed consent work constituted the most errors. Nontechnical errors (eg, inadequate informed consent) showed a higher centrality than technical errors. Conclusions: Hospitals and medical boards could apply our approach to detect hub nodes that are likely to benefit from interventions; doing so could effectively control medical risks. 
", doi="10.2196/35709", url="https://medinform.jmir.org/2022/11/e35709", url="http://www.ncbi.nlm.nih.gov/pubmed/36326815" } @Article{info:doi/10.2196/39748, author="Istasy, Paul and Lee, Shen Wen and Iansavichene, Alla and Upshur, Ross and Gyawali, Bishal and Burkell, Jacquelyn and Sadikovic, Bekim and Lazo-Langner, Alejandro and Chin-Yee, Benjamin", title="The Impact of Artificial Intelligence on Health Equity in Oncology: Scoping Review", journal="J Med Internet Res", year="2022", month="Nov", day="1", volume="24", number="11", pages="e39748", keywords="artificial intelligence", keywords="eHealth", keywords="digital health", keywords="machine learning", keywords="oncology", keywords="cancer", keywords="health equity", keywords="health disparity", keywords="bias", keywords="global health", keywords="public health", keywords="cancer epidemiology", keywords="epidemiology", keywords="scoping", keywords="review", keywords="mobile phone", abstract="Background: The field of oncology is at the forefront of advances in artificial intelligence (AI) in health care, providing an opportunity to examine the early integration of these technologies in clinical research and patient care. Hope that AI will revolutionize health care delivery and improve clinical outcomes has been accompanied by concerns about the impact of these technologies on health equity. Objective: We aimed to conduct a scoping review of the literature to address the question, ``What are the current and potential impacts of AI technologies on health equity in oncology?'' Methods: Following PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines for scoping reviews, we systematically searched MEDLINE and Embase electronic databases from January 2000 to August 2021 for records engaging with key concepts of AI, health equity, and oncology. We included all English-language articles that engaged with the 3 key concepts. Articles were analyzed qualitatively for themes pertaining to the influence of AI on health equity in oncology. Results: Of the 14,011 records, 133 (0.95\%) identified from our review were included. We identified 3 general themes in the literature: the use of AI to reduce health care disparities (58/133, 43.6\%), concerns surrounding AI technologies and bias (16/133, 12.1\%), and the use of AI to examine biological and social determinants of health (55/133, 41.4\%). A total of 3\% (4/133) of articles focused on many of these themes. Conclusions: Our scoping review revealed 3 main themes on the impact of AI on health equity in oncology, which relate to AI's ability to help address health disparities, its potential to mitigate or exacerbate bias, and its capability to help elucidate determinants of health. Gaps in the literature included a lack of discussion of ethical challenges with the application of AI technologies in low- and middle-income countries, lack of discussion of problems of bias in AI algorithms, and a lack of justification for the use of AI technologies over traditional statistical methods to address specific research questions in oncology. Our review highlights a need to address these gaps to ensure a more equitable integration of AI in cancer research and clinical practice. The limitations of our study include its exploratory nature, its focus on oncology as opposed to all health care sectors, and its analysis of solely English-language articles. 
", doi="10.2196/39748", url="https://www.jmir.org/2022/11/e39748", url="http://www.ncbi.nlm.nih.gov/pubmed/36005841" } @Article{info:doi/10.2196/38640, author="Kim, Changgyun and Jeong, Hogul and Park, Wonse and Kim, Donghyun", title="Tooth-Related Disease Detection System Based on Panoramic Images and Optimization Through Automation: Development Study", journal="JMIR Med Inform", year="2022", month="Oct", day="31", volume="10", number="10", pages="e38640", keywords="object detection", keywords="tooth", keywords="diagnosis", keywords="panorama", keywords="dentistry", keywords="dental health", keywords="oral health", keywords="dental caries", keywords="image analysis", keywords="artificial intelligence", keywords="detection model", keywords="machine learning", keywords="automation", keywords="diagnosis system", abstract="Background: Early detection of tooth-related diseases in patients plays a key role in maintaining their dental health and preventing future complications. Since dentists are not overly attentive to tooth-related diseases that may be difficult to judge visually, many patients miss timely treatment. The 5 representative tooth-related diseases, that is, coronal caries or defect, proximal caries, cervical caries or abrasion, periapical radiolucency, and residual root can be detected on panoramic images. In this study, a web service was constructed for the detection of these diseases on panoramic images in real time, which helped shorten the treatment planning time and reduce the probability of misdiagnosis. Objective: This study designed a model to assess tooth-related diseases in panoramic images by using artificial intelligence in real time. This model can perform an auxiliary role in the diagnosis of tooth-related diseases by dentists and reduce the treatment planning time spent through telemedicine. Methods: For learning the 5 tooth-related diseases, 10,000 panoramic images were modeled: 4206 coronal caries or defects, 4478 proximal caries, 6920 cervical caries or abrasion, 8290 periapical radiolucencies, and 1446 residual roots. To learn the model, the fast region-based convolutional network (Fast R-CNN), residual neural network (ResNet), and inception models were used. Learning about the 5 tooth-related diseases completely did not provide accurate information on the diseases because of indistinct features present in the panoramic pictures. Therefore, 1 detection model was applied to each tooth-related disease, and the models for each of the diseases were integrated to increase accuracy. Results: The Fast R-CNN model showed the highest accuracy, with an accuracy of over 90\%, in diagnosing the 5 tooth-related diseases. Thus, Fast R-CNN was selected as the final judgment model as it facilitated the real-time diagnosis of dental diseases that are difficult to judge visually from radiographs and images, thereby assisting the dentists in their treatment plans. Conclusions: The Fast R-CNN model showed the highest accuracy in the real-time diagnosis of dental diseases and can therefore play an auxiliary role in shortening the treatment planning time after the dentists diagnose the tooth-related disease. In addition, by updating the captured panoramic images of patients on the web service developed in this study, we are looking forward to increasing the accuracy of diagnosing these 5 tooth-related diseases. The dental diagnosis system in this study takes 2 minutes for diagnosing 5 diseases in 1 panoramic image. 
Therefore, this system plays an effective role in setting a dental treatment schedule. ", doi="10.2196/38640", url="https://medinform.jmir.org/2022/10/e38640", url="http://www.ncbi.nlm.nih.gov/pubmed/36315222" } @Article{info:doi/10.2196/38411, author="Choudhury, Avishek and Asan, Onur and Medow, E. Joshua", title="Clinicians' Perceptions of an Artificial Intelligence--Based Blood Utilization Calculator: Qualitative Exploratory Study", journal="JMIR Hum Factors", year="2022", month="Oct", day="31", volume="9", number="4", pages="e38411", keywords="artificial intelligence", keywords="human factors", keywords="decision-making", keywords="blood transfusion", keywords="technology acceptance", keywords="complications", keywords="prevention", keywords="decision support", keywords="transfusion overload", keywords="risk", keywords="support", keywords="perception", keywords="safety", keywords="usability", abstract="Background: According to the US Food and Drug Administration Center for Biologics Evaluation and Research, health care systems have been experiencing blood transfusion overuse. To minimize the overuse of blood product transfusions, a proprietary artificial intelligence (AI)--based blood utilization calculator (BUC) was developed and integrated into a US hospital's electronic health record. Despite the promising performance of the BUC, this technology remains underused in the clinical setting. Objective: This study aims to explore how clinicians perceived this AI-based decision support system and, consequently, understand the factors hindering BUC use. Methods: We interviewed 10 clinicians (BUC users) until the data saturation point was reached. The interviews were conducted over a web-based platform and were recorded. The audiovisual recordings were then anonymously transcribed verbatim. We used an inductive-deductive thematic analysis to analyze the transcripts, which involved applying predetermined themes to the data (deductive) and consecutively identifying new themes as they emerged in the data (inductive). Results: We identified the following two themes: (1) workload and usability and (2) clinical decision-making. Clinicians acknowledged the ease of use and usefulness of the BUC for the general inpatient population. The clinicians also found the BUC to be useful in making decisions related to blood transfusion. However, some clinicians found the technology to be confusing due to inconsistent automation across different blood work processes. Conclusions: This study highlights that analytical efficacy alone does not ensure technology use or acceptance. The overall system's design, user perception, and users' knowledge of the technology are equally important and necessary (limitations, functionality, purpose, and scope). Therefore, the effective integration of AI-based decision support systems, such as the BUC, mandates multidisciplinary engagement, ensuring the adequate initial and recurrent training of AI users while maintaining high analytical efficacy and validity. As a final takeaway, the design of AI systems that are made to perform specific tasks must be self-explanatory, so that the users can easily understand how and when to use the technology. Using any technology on a population for whom it was not initially designed will hinder user perception and the technology's use. 
", doi="10.2196/38411", url="https://humanfactors.jmir.org/2022/4/e38411", url="http://www.ncbi.nlm.nih.gov/pubmed/36315238" } @Article{info:doi/10.2196/29404, author="Ye, Chao and Hu, Wenxing and Gaeta, Bruno", title="Prediction of Antibody-Antigen Binding via Machine Learning: Development of Data Sets and Evaluation of Methods", journal="JMIR Bioinform Biotech", year="2022", month="Oct", day="28", volume="3", number="1", pages="e29404", keywords="DNA sequencing", keywords="DNA", keywords="DNA sequence", keywords="sequence data", keywords="molecular biology", keywords="genomic", keywords="random forest", keywords="nearest neighbor", keywords="immunoglobulin", keywords="genetics", keywords="antibody-antigen binding", keywords="antigen", keywords="antibody", keywords="structural biology", keywords="machine learning", keywords="protein modeling", keywords="protein", keywords="proteomic", abstract="Background: The mammalian immune system is able to generate antibodies against a huge variety of antigens, including bacteria, viruses, and toxins. The ultradeep DNA sequencing of rearranged immunoglobulin genes has considerable potential in furthering our understanding of the immune response, but it is limited by the lack of a high-throughput, sequence-based method for predicting the antigen(s) that a given immunoglobulin recognizes. Objective: As a step toward the prediction of antibody-antigen binding from sequence data alone, we aimed to compare a range of machine learning approaches that were applied to a collated data set of antibody-antigen pairs in order to predict antibody-antigen binding from sequence data. Methods: Data for training and testing were extracted from the Protein Data Bank and the Coronavirus Antibody Database, and additional antibody-antigen pair data were generated by using a molecular docking protocol. Several machine learning methods, including the weighted nearest neighbor method, the nearest neighbor method with the BLOSUM62 matrix, and the random forest method, were applied to the problem. Results: The final data set contained 1157 antibodies and 57 antigens that were combined in 5041 antibody-antigen pairs. The best performance for the prediction of interactions was obtained by using the nearest neighbor method with the BLOSUM62 matrix, which resulted in around 82\% accuracy on the full data set. These results provide a useful frame of reference, as well as protocols and considerations, for machine learning and data set creation in the prediction of antibody-antigen binding. Conclusions: Several machine learning approaches were compared to predict antibody-antigen interaction from protein sequences. Both the data set (in CSV format) and the machine learning program (coded in Python) are freely available for download on GitHub. ", doi="10.2196/29404", url="https://bioinform.jmir.org/2022/1/e29404" } @Article{info:doi/10.2196/39616, author="Park, H. Eunsoo and Watson, I. Hannah and Mehendale, V. Felicity and O'Neil, Q. 
Alison and ", title="Evaluating the Impact on Clinical Task Efficiency of a Natural Language Processing Algorithm for Searching Medical Documents: Prospective Crossover Study", journal="JMIR Med Inform", year="2022", month="Oct", day="26", volume="10", number="10", pages="e39616", keywords="clinical decision support", keywords="electronic health records", keywords="natural language processing", keywords="semantic search", keywords="clinical informatics", abstract="Background: Information retrieval (IR) from the free text within electronic health records (EHRs) is time consuming and complex. We hypothesize that natural language processing (NLP)--enhanced search functionality for EHRs can make clinical workflows more efficient and reduce cognitive load for clinicians. Objective: This study aimed to evaluate the efficacy of 3 levels of search functionality (no search, string search, and NLP-enhanced search) in supporting IR for clinical users from the free text of EHR documents in a simulated clinical environment. Methods: A clinical environment was simulated by uploading 3 sets of patient notes into an EHR research software application and presenting these alongside 3 corresponding IR tasks. Tasks contained a mixture of multiple-choice and free-text questions. A prospective crossover study design was used, for which 3 groups of evaluators were recruited, which comprised doctors (n=19) and medical students (n=16). Evaluators performed the 3 tasks using each of the search functionalities in an order in accordance with their randomly assigned group. The speed and accuracy of task completion were measured and analyzed, and user perceptions of NLP-enhanced search were reviewed in a feedback survey. Results: NLP-enhanced search facilitated more accurate task completion than both string search (5.14\%; P=.02) and no search (5.13\%; P=.08). NLP-enhanced search and string search facilitated similar task speeds, both showing an increase in speed compared to the no search function, by 11.5\% (P=.008) and 16.0\% (P=.007) respectively. Overall, 93\% of evaluators agreed that NLP-enhanced search would make clinical workflows more efficient than string search, with qualitative feedback reporting that NLP-enhanced search reduced cognitive load. Conclusions: To the best of our knowledge, this study is the largest evaluation to date of different search functionalities for supporting target clinical users in realistic clinical workflows, with a 3-way prospective crossover study design. NLP-enhanced search improved both accuracy and speed of clinical EHR IR tasks compared to browsing clinical notes without search. NLP-enhanced search improved accuracy and reduced the number of searches required for clinical EHR IR tasks compared to direct search term matching. 
", doi="10.2196/39616", url="https://medinform.jmir.org/2022/10/e39616", url="http://www.ncbi.nlm.nih.gov/pubmed/36287591" } @Article{info:doi/10.2196/35860, author="Rosario, Bedda and Zhang, Andrew and Patel, Mehool and Rajmane, Amol and Xie, Ning and Weeraratne, Dilhan and Alterovitz, Gil", title="Characterizing Thrombotic Complication Risk Factors Associated With COVID-19 via Heterogeneous Patient Data: Retrospective Observational Study", journal="J Med Internet Res", year="2022", month="Oct", day="21", volume="24", number="10", pages="e35860", keywords="COVID-19", keywords="thrombotic complications", keywords="logistic regression", keywords="EHR", keywords="electronic health record", keywords="insurance claims data", abstract="Background: COVID-19 has been observed to be associated with venous and arterial thrombosis. The inflammatory disease prolongs hospitalization, and preexisting comorbidities can intensity the thrombotic burden in patients with COVID-19. However, venous thromboembolism, arterial thrombosis, and other vascular complications may go unnoticed in critical care settings. Early risk stratification is paramount in the COVID-19 patient population for proactive monitoring of thrombotic complications. Objective: The aim of this exploratory research was to characterize thrombotic complication risk factors associated with COVID-19 using information from electronic health record (EHR) and insurance claims databases. The goal is to develop an approach for analysis using real-world data evidence that can be generalized to characterize thrombotic complications and additional conditions in other clinical settings as well, such as pneumonia or acute respiratory distress syndrome?in COVID-19 patients or in the intensive care unit. Methods: We extracted deidentified patient data from the insurance claims database IBM MarketScan, and formulated hypotheses on thrombotic complications in patients with COVID-19 with respect to patient demographic and clinical factors using logistic regression. The hypotheses were then verified with analysis of deidentified patient data from the Research Patient Data Registry (RPDR) Mass General Brigham (MGB) patient EHR database. Data were analyzed according to odds ratios, 95\% CIs, and P values. Results: The analysis identified significant predictors (P<.001) for thrombotic complications in 184,831 COVID-19 patients out of the millions of records from IBM MarketScan and the MGB RPDR. With respect to age groups, patients 60 years and older had higher odds (4.866 in MarketScan and 6.357 in RPDR) to have thrombotic complications than those under 60 years old. In terms of gender, men were more likely (odds ratio of 1.245 in MarketScan and 1.693 in RPDR) to have thrombotic complications than women. Among the preexisting comorbidities, patients with heart disease, cerebrovascular diseases, hypertension, and personal history of thrombosis all had significantly higher odds of developing a thrombotic complication. Cancer and obesity were also associated with odds>1. The results from RPDR validated the IBM MarketScan findings, as they were largely consistent and afford mutual enrichment. Conclusions: The analysis approach adopted in this study can work across heterogeneous databases from diverse organizations and thus facilitates collaboration. Searching through millions of patient records, the analysis helped to identify factors influencing a phenotype. 
Use of thrombotic complications in COVID-19 patients represents only a case study; however, the same design can be used across other disease areas by extracting corresponding disease-specific patient data from available databases. ", doi="10.2196/35860", url="https://www.jmir.org/2022/10/e35860", url="http://www.ncbi.nlm.nih.gov/pubmed/36044652" } @Article{info:doi/10.2196/38325, author="Liu, Shalom David and Sawyer, Jake and Luna, Alexander and Aoun, Jihad and Wang, Janet and Boachie, Lord and Halabi, Safwan and Joe, Bina", title="Perceptions of US Medical Students on Artificial Intelligence in Medicine: Mixed Methods Survey Study", journal="JMIR Med Educ", year="2022", month="Oct", day="21", volume="8", number="4", pages="e38325", keywords="artificial intelligence", keywords="eHealth", keywords="digital health", keywords="integration", keywords="medical education", keywords="medical curriculum", keywords="education", keywords="medical student", keywords="medical school", keywords="elective course", abstract="Background: Given the rapidity with which artificial intelligence is gaining momentum in clinical medicine, current physician leaders have called for more incorporation of artificial intelligence topics into undergraduate medical education. This is to prepare future physicians to better work together with artificial intelligence technology. However, the first step in curriculum development is to survey the needs of end users. There has not been a study to determine which media and which topics are most preferred by US medical students to learn about the topic of artificial intelligence in medicine. Objective: We aimed to survey US medical students on the need to incorporate artificial intelligence in undergraduate medical education and their preferred means to do so to assist with future education initiatives. Methods: A mixed methods survey comprising both specific questions and a write-in response section was sent through Qualtrics to US medical students in May 2021. Likert scale questions were used to first assess various perceptions of artificial intelligence in medicine. Specific questions were posed regarding learning format and topics in artificial intelligence. Results: We surveyed 390 US medical students with an average age of 26 (SD 3) years from 17 different medical programs (the estimated response rate was 3.5\%). A majority (355/388, 91.5\%) of respondents agreed that training in artificial intelligence concepts during medical school would be useful for their future. While 79.4\% (308/388) were excited to use artificial intelligence technologies, 91.2\% (353/387) either reported that their medical schools did not offer resources or were unsure if they did so. Short lectures (264/378, 69.8\%), formal electives (180/378, 47.6\%), and Q and A panels (167/378, 44.2\%) were identified as preferred formats, while fundamental concepts of artificial intelligence (247/379, 65.2\%), when to use artificial intelligence in medicine (227/379, 59.9\%), and pros and cons of using artificial intelligence (224/379, 59.1\%) were the most preferred topics for enhancing their training. Conclusions: The results of this study indicate that current US medical students recognize the importance of artificial intelligence in medicine and acknowledge that current formal education and resources to study artificial intelligence--related topics are limited in most US medical schools. 
Respondents also indicated that a hybrid formal/flexible format would be most appropriate for incorporating artificial intelligence as a topic in US medical schools. Based on these data, we conclude that there is a definitive knowledge gap in artificial intelligence education within current medical education in the US. Further, the results suggest there is a disparity in opinions on the specific format and topics to be introduced. ", doi="10.2196/38325", url="https://mededu.jmir.org/2022/4/e38325", url="http://www.ncbi.nlm.nih.gov/pubmed/36269641" } @Article{info:doi/10.2196/36976, author="Silvestri, A. Jasmine and Kmiec, E. Tyler and Bishop, S. Nicholas and Regli, H. Susan and Weissman, E. Gary", title="Desired Characteristics of a Clinical Decision Support System for Early Sepsis Recognition: Interview Study Among Hospital-Based Clinicians", journal="JMIR Hum Factors", year="2022", month="Oct", day="21", volume="9", number="4", pages="e36976", keywords="sepsis", keywords="predictive information", keywords="clinical decision support", keywords="human factors", keywords="sepsis onset", abstract="Background: Sepsis is a major burden for health care systems in the United States, with over 750,000 cases annually and a total cost of approximately US \$20 billion. The hallmark of sepsis treatment is early and appropriate initiation of antibiotic therapy. Although sepsis clinical decision support (CDS) systems can provide clinicians with early predictions of suspected sepsis or imminent clinical decline, such systems have not reliably demonstrated improvements in clinical outcomes or care processes. Growing evidence suggests that the challenges of integrating sepsis CDS systems into clinical workflows, gaining the trust of clinicians, and making sepsis CDS systems clinically relevant at the bedside are all obstacles to successful deployment. However, there are significant knowledge gaps regarding the achievement of these implementation and deployment goals. Objective: We aimed to identify perceptions of predictive information in sepsis CDS systems based on clinicians' past experiences, explore clinicians' perceptions of a hypothetical sepsis CDS system, and identify the characteristics of a CDS system that would be helpful in promoting timely recognition and management of suspected sepsis in a multidisciplinary, team-based clinical setting. Methods: We conducted semistructured interviews with practicing bedside nurses, advanced practice providers, and physicians at a large academic medical center between September 2020 and March 2021. We used modified human factor methods (contextual interview and cognitive walkthrough performed over video calls because of the COVID-19 pandemic) and conducted a thematic analysis using an abductive approach for coding to identify important patterns and concepts in the interview transcripts. Results: We interviewed 6 bedside nurses and 9 clinicians responsible for ordering antibiotics (advanced practice providers or physicians) who had a median of 4 (IQR 4-6.5) years of experience working in an inpatient setting. 
We then synthesized critical content from the thematic analysis of the data into four domains: clinician perceptions of prediction models and alerts; previous experiences of clinician encounters with predictive information and risk scores; desired characteristics of a CDS system build, including predictions, supporting information, and delivery methods for a potential alert; and the clinical relevance and potential utility of a CDS system. These 4 domains were strongly linked to clinicians' perceptions of the likelihood of adoption and the impact on clinical workflows when diagnosing and managing patients with suspected sepsis. Ultimately, clinicians desired a trusted and actionable CDS system to improve sepsis care. Conclusions: Building a trusted and actionable sepsis CDS alert is paramount to achieving acceptability and use among clinicians. These findings can inform the development, implementation, and deployment strategies for CDS systems that support the early detection and treatment of sepsis. This study also highlights several key opportunities when eliciting clinician input before the development and deployment of prediction models. ", doi="10.2196/36976", url="https://humanfactors.jmir.org/2022/4/e36976", url="http://www.ncbi.nlm.nih.gov/pubmed/36269653" } @Article{info:doi/10.2196/39565, author="Alsobhi, Mashael and Khan, Fayaz and Chevidikunnan, Faisal Mohamed and Basuodan, Reem and Shawli, Lama and Neamatallah, Ziyad", title="Physical Therapists' Knowledge and Attitudes Regarding Artificial Intelligence Applications in Health Care and Rehabilitation: Cross-sectional Study", journal="J Med Internet Res", year="2022", month="Oct", day="20", volume="24", number="10", pages="e39565", keywords="artificial intelligence", keywords="physical therapy", keywords="clinicians' attitudes", keywords="health care", keywords="rehabilitation", keywords="digital health", keywords="machine learning", keywords="survey", abstract="Background: The use of artificial intelligence (AI) in the field of rehabilitation is growing rapidly. Therefore, there is a need to understand how physical therapists (PTs) perceive AI technologies in clinical practice. Objective: This study aimed to investigate the knowledge and attitude of PTs regarding AI applications in rehabilitation based on multiple explanatory factors. Methods: A web-based Google Form survey, which was divided into 4 sections, was used to collect the data. A total of 317 PTs participated voluntarily in the study. Results: The PTs' knowledge about AI applications in rehabilitation was lower than their knowledge about AI in general. We found a statistically significant difference in the PTs' knowledge regarding AI applications in the rehabilitation field based on sex (odds ratio [OR] 2.43, 95\% CI 1.53-3.87; P<.001). In addition, experience (OR 1.79, 95\% CI 1.11-2.87; P=.02) and educational qualification (OR 1.68, 95\% CI 1.05-2.70; P=.03) were found to be significant predictors of knowledge about AI applications. PTs who work in the nonacademic sector and who had <10 years of experience had positive attitudes regarding AI. Conclusions: AI technologies have been integrated into many physical therapy practices through the automation of clinical tasks. Therefore, PTs are encouraged to take advantage of the widespread development of AI technologies and enrich their knowledge about, and enhance their practice with, AI applications. 
", doi="10.2196/39565", url="https://www.jmir.org/2022/10/e39565", url="http://www.ncbi.nlm.nih.gov/pubmed/36264614" } @Article{info:doi/10.2196/40018, author="Gomez Lumbreras, Ainhoa and Reese, J. Thomas and Del Fiol, Guilherme and Tan, S. Malinda and Butler, M. Jorie and Hurwitz, T. Jason and Brown, Mary and Kawamoto, Kensaku and Thiess, Henrik and Wright, Maria and Malone, C. Daniel", title="Shared Decision-Making for Drug-Drug Interactions: Formative Evaluation of an Anticoagulant Drug Interaction", journal="JMIR Form Res", year="2022", month="Oct", day="19", volume="6", number="10", pages="e40018", keywords="decision making, shared", keywords="decision support systems, clinical", keywords="decision making", keywords="decision support", keywords="user-centered design", keywords="patient-centered care", keywords="risk management", keywords="drug interaction", keywords="pharmacotherapy", keywords="pharmacy", keywords="pharmaceutical", keywords="warfarin", keywords="unified theory of acceptance and use of technology", keywords="UTAUT", keywords="NSAID", keywords="anti-inflammatory", keywords="non-steroidal", abstract="Background: Warnings about drug-drug interactions (DDIs) between warfarin and nonsteroidal anti-inflammatory drugs (NSAIDs) within electronic health records indicate potential harm but fail to account for contextual factors and preferences. We developed a tool called DDInteract to enhance and support shared decision-making (SDM) between patients and physicians when both warfarin and NSAIDs are used concurrently. DDInteract was designed to be integrated into electronic health records using interoperability standards. Objective: The purpose of this study was to conduct a formative evaluation of a DDInteract that incorporates patient and product contextual factors to estimate the risk of bleeding. Methods: A randomized formative evaluation was conducted to compare DDInteract to usual care (UC) using physician-patient dyads. Using case vignettes, physicians and patients on warfarin participated in simulated virtual clinical encounters where they discussed the use of taking ibuprofen and warfarin concurrently and determined an appropriate therapeutic plan based on the patient's individualized risk. Dyads were randomized to either DDInteract or UC. Participants completed a postsession interview and survey of the SDM process. This included the 9-item Shared Decision-Making Questionnaire (SDM-Q-9), tool usability and workload National Aeronautics and Space Administration (NASA) Task Load Index, Unified Theory of Acceptance and Use of Technology (UTAUT), Perceived Behavioral Control (PBC) scale, System Usability Scale (SUS), and Decision Conflict Scale (DCS). They also were interviewed after the session to obtain perceptions on DDInteract and UC resources for DDIs. Results: Twelve dyad encounters were performed using virtual software. Most (n=11, 91.7\%) patients were over 50 years of age, and 9 (75\%) had been taking warfarin for more than 2 years (75\%). Regarding scores on the SDM-Q-9, participants rated DDInteract higher than UC for questions pertaining to helping patients clarify the decision (P=.03), involving patients in the decision (P=.01), displaying treatment options (P<.001), identifying advantages and disadvantages (P=.01), and facilitating patient understanding (P=.01) and discussion of preferences (P=.01). Five of the 8 UTAUT constructs showed differences between the 2 groups, favoring DDInteract (P<.05). 
Usability ratings from the SUS were significantly higher (P<.05) for physicians using DDInteract compared to those in the UC group but showed no differences from the patient's perspective. No differences in patient responses were observed between groups using the DCS. During the session debrief, physicians indicated little concern for the additional time or workload entailed by DDInteract use. Both clinicians and patients indicated that the tool was beneficial in simulated encounters to understand and mitigate the risk of harm from this DDI. Conclusions: Overall, DDInteract may improve encounters where there is a risk of bleeding due to a potential drug-drug interaction involving anticoagulants. Participants rated DDInteract as logical and useful for enhancing SDM. They reported that they would be willing to use the tool for an interaction involving warfarin and NSAIDs. ", doi="10.2196/40018", url="https://formative.jmir.org/2022/10/e40018", url="http://www.ncbi.nlm.nih.gov/pubmed/36260377" } @Article{info:doi/10.2196/38879, author="Liu, Star and Ding, Xiyu and Belouali, Anas and Bai, Haibin and Raja, Kanimozhi and Kharrazi, Hadi", title="Assessing the Racial and Socioeconomic Disparities in Postpartum Depression Using Population-Level Hospital Discharge Data: Longitudinal Retrospective Study", journal="JMIR Pediatr Parent", year="2022", month="Oct", day="17", volume="5", number="4", pages="e38879", keywords="health disparity", keywords="hospital discharge summary", keywords="phenotyping", keywords="data quality", keywords="vulnerable population", keywords="postpartum depression", keywords="maternal health", abstract="Background: In the United States, >3.6 million deliveries occur annually. Among them, up to 20\% (approximately 700,000) of women experience postpartum depression (PPD) according to the Centers for Disease Control and Prevention. Absence of accurate reporting and diagnosis has made phenotyping of patients with PPD difficult. Existing literature has shown that factors such as race, socioeconomic status, and history of substance abuse are associated with the differential risks of PPD. However, limited research has considered differential temporal associations with the outcome. Objective: This study aimed to estimate the disparities in the risk of PPD and time to diagnosis for patients of different racial and socioeconomic backgrounds. Methods: This is a longitudinal retrospective study using the statewide hospital discharge data from Maryland. We identified 160,066 individuals who had a hospital delivery from 2017 to 2019. We applied logistic regression and Cox regression to study the risk of PPD across racial and socioeconomic strata. Multinomial regression was used to estimate the risk of PPD at different postpartum stages. Results: The cumulative incidence of PPD diagnosis was highest for White patients (8779/65,028, 13.5\%) and lowest for Asian and Pacific Islander patients (248/10,760, 2.3\%). Compared with White patients, PPD diagnosis was less likely to occur for Black patients (odds ratio [OR] 0.31, 95\% CI 0.30-0.33), Asian or Pacific Islander patients (OR 0.17, 95\% CI 0.15-0.19), and Hispanic patients (OR 0.21, 95\% CI 0.19-0.22). Similar findings were observed from the Cox regression analysis. 
Multinomial regression showed that compared with White patients, Black patients (relative risk 2.12, 95\% CI 1.73-2.60) and Asian and Pacific Islander patients (relative risk 2.48, 95\% CI 1.46-4.21) were more likely to be diagnosed with PPD after 8 weeks of delivery. Conclusions: Compared with White patients, PPD diagnosis is less likely to occur in individuals of other races. We found disparate timing in PPD diagnosis across different racial groups and socioeconomic backgrounds. Our findings serve to enhance intervention strategies and policies for phenotyping patients at the highest risk of PPD and to highlight needs in data quality to support future work on racial disparities in PPD. ", doi="10.2196/38879", url="https://pediatrics.jmir.org/2022/4/e38879", url="http://www.ncbi.nlm.nih.gov/pubmed/36103575" } @Article{info:doi/10.2196/37704, author="Fuster-Casanovas, A{\"i}na and Fernandez-Luque, Luis and Nu{\~n}ez-Benjumea, J. Francisco and Moreno Conde, Alberto and Luque-Romero, G. Luis and Bilionis, Ioannis and Rubio Escudero, Cristina and Chicchi Giglioli, Alice Irene and Vidal-Alaball, Josep", title="An Artificial Intelligence--Driven Digital Health Solution to Support Clinical Management of Patients With Long COVID-19: Protocol for a Prospective Multicenter Observational Study", journal="JMIR Res Protoc", year="2022", month="Oct", day="14", volume="11", number="10", pages="e37704", keywords="COVID-19 syndrome", keywords="artificial intelligence", keywords="AI", keywords="primary health care", keywords="Postacute COVID-19 syndrome", keywords="COVID-19", keywords="health system", keywords="health care", keywords="health care resource", keywords="public health policy", keywords="long COVID-19", keywords="mHealth", keywords="digital health solution", keywords="patient", keywords="clinical information", keywords="clinical decision support", abstract="Background: The COVID-19 pandemic has revealed the weaknesses of most health systems around the world, collapsing them and depleting their available health care resources. Fortunately, the development and enforcement of specific public health policies, such as vaccination, mask wearing, and social distancing, among others, have reduced the prevalence and complications associated with COVID-19 in its acute phase. However, the aftermath of the global pandemic has called for an efficient approach to manage patients with long COVID-19. This is a great opportunity to leverage innovative digital health solutions to provide exhausted health care systems with the most cost-effective and efficient tools available to support the clinical management of this population. In this context, the SENSING-AI project focuses on research toward the implementation of an artificial intelligence--driven digital health solution that supports both the adaptive self-management of people living with long COVID-19 and the health care staff in charge of the management and follow-up of this population. Objective: The objective of this protocol is the prospective collection of psychometric and biometric data from 10 patients for training algorithms and prediction models to complement the SENSING-AI cohort. Methods: Publicly available health and lifestyle data registries will be consulted and complemented with a retrospective cohort of anonymized data collected from clinical information of patients diagnosed with long COVID-19. 
Furthermore, a prospective patient-generated data set will be captured using wearable devices and validated patient-reported outcomes questionnaires to complement the retrospective cohort. Finally, the `Findability, Accessibility, Interoperability, and Reuse' guiding principles for scientific data management and stewardship will be applied to the resulting data set to encourage the continuous process of discovery, evaluation, and reuse of information for the research community at large. Results: The SENSING-AI cohort is expected to be completed during 2022. It is expected that sufficient data will be obtained to generate artificial intelligence models based on behavior change and mental well-being techniques to improve patients' self-management, while providing useful and timely clinical decision support services to health care professionals based on risk stratification models and early detection of exacerbations. Conclusions: SENSING-AI focuses on obtaining high-quality data of patients with long COVID-19 during their daily life. Supporting these patients is of paramount importance in the current pandemic situation, including supporting their health care professionals in a cost-effective and efficient management of long COVID-19. Trial Registration: Clinicaltrials.gov NCT05204615; https://clinicaltrials.gov/ct2/show/NCT05204615 International Registered Report Identifier (IRRID): DERR1-10.2196/37704 ", doi="10.2196/37704", url="https://www.researchprotocols.org/2022/10/e37704", url="http://www.ncbi.nlm.nih.gov/pubmed/36166648" } @Article{info:doi/10.2196/42429, author="Tseng, Tzu-Wei and Su, Chang-Fu and Lai, Feipei", title="Fast Healthcare Interoperability Resources for Inpatient Deterioration Detection With Time-Series Vital Signs: Design and Implementation Study", journal="JMIR Med Inform", year="2022", month="Oct", day="13", volume="10", number="10", pages="e42429", keywords="Fast Healthcare Interoperability Resources", keywords="FHIR", keywords="Health Level 7", keywords="HL7", keywords="health research", keywords="data sharing", keywords="health information technology", keywords="clinical research", abstract="Background: Vital signs have been widely adopted in in-hospital cardiac arrest (IHCA) assessment, which plays an important role in inpatient deterioration detection. As the number of early warning systems and artificial intelligence applications increases, health care information exchange and interoperability are becoming more complex and difficult. Although Health Level 7 Fast Healthcare Interoperability Resources (FHIR) have already developed a vital signs profile, it is not sufficient to support IHCA applications or machine learning--based models. Objective: In this paper, for IHCA instances with vital signs, we define a new implementation guide that includes data mapping, a system architecture, a workflow, and FHIR applications. Methods: We interviewed 10 experts regarding health care system integration and defined an implementation guide. We then developed the FHIR Extract Transform Load to map data to FHIR resources. We also integrated an early warning system and machine learning pipeline. Results: The study data set includes electronic health records of adult inpatients who visited the En-Chu-Kong hospital. Medical staff regularly measured these vital signs at least 2 to 3 times per day during the day, night, and early morning. We used pseudonymization to protect patient privacy. 
Then, we converted the vital signs to FHIR observations in the JSON format using the FHIR Extract Transform Load application. The measured vital signs include systolic blood pressure, diastolic blood pressure, heart rate, respiratory rate, and body temperature. According to clinical requirements, we also extracted the electronic health record information to the FHIR server. Finally, we integrated an early warning system and machine learning pipeline using the FHIR RESTful application programming interface. Conclusions: We successfully demonstrated a process that standardizes health care information for inpatient deterioration detection using vital signs. Based on the FHIR definition, we also provided an implementation guide that includes data mapping, an integration process, and IHCA assessment using vital signs. We also proposed a clarifying system architecture and possible workflows. Based on FHIR, we integrated the 3 different systems in 1 dashboard system, which can effectively solve the complexity of the system in the medical staff workflow. ", doi="10.2196/42429", url="https://medinform.jmir.org/2022/10/e42429", url="http://www.ncbi.nlm.nih.gov/pubmed/36227636" } @Article{info:doi/10.2196/40344, author="Frid, Santiago and Fuentes Exp{\'o}sito, Angeles Maria and Grau-Corral, Inmaculada and Amat-Fernandez, Clara and Mu{\~n}oz Mateu, Montserrat and Pastor Duran, Xavier and Lozano-Rub{\'i}, Raimundo", title="Successful Integration of EN/ISO 13606--Standardized Extracts From a Patient Mobile App Into an Electronic Health Record: Description of a Methodology", journal="JMIR Med Inform", year="2022", month="Oct", day="12", volume="10", number="10", pages="e40344", keywords="health information interoperability", keywords="mobile app", keywords="health information standards", keywords="artificial intelligence", keywords="electronic health records", keywords="machine learning", abstract="Background: There is an increasing need to integrate patient-generated health data (PGHD) into health information systems (HISs). The use of health information standards based on the dual model allows the achievement of semantic interoperability among systems. Although there is evidence in the use of the Substitutable Medical Applications and Reusable Technologies on Fast Healthcare Interoperability Resources (SMART on FHIR) framework for standardized communication between mobile apps and electronic health records (EHRs), the use of European Norm/International Organization for Standardization (EN/ISO) 13606 has not been explored yet, despite some advantages over FHIR in terms of modeling and formalization of clinical knowledge, as well as flexibility in the creation of new concepts. Objective: This study aims to design and implement a methodology based on the dual-model paradigm to communicate clinical information between a patient mobile app (Xemio Research) and an institutional ontology-based clinical repository (OntoCR) without loss of meaning. Methods: This paper is framed within Artificial intelligence Supporting CAncer Patients across Europe (ASCAPE), a project that aims to use artificial intelligence (AI)/machine learning (ML) mechanisms to support cancer patients' health status and quality of life (QoL). First, the variables ``side effect'' and ``daily steps'' were defined and represented with EN/ISO 13606 archetypes. 
Next, ontologies that model archetyped concepts and map them to the standard were created and uploaded to OntoCR, where they were ready to receive instantiated patient data. Xemio Research used a conversion module in the ASCAPE Local Edge to transform data entered into the app to create EN/ISO 13606 extracts, which were sent to an Application Programming Interface (API) in OntoCR that maps each element in the normalized XML files to its corresponding location in the ontology. This way, instantiated data of patients are stored in the clinical repository. Results: Between December 22, 2020, and April 4, 2022, 1100 extracts of 47 patients were successfully communicated (234/1100, 21.3\%, extracts of side effects and 866/1100, 78.7\%, extracts of daily activity). Furthermore, the creation of EN/ISO 13606--standardized archetypes allows the reuse of clinical information regarding daily activity and side effects, while with the creation of ontologies, we extended the knowledge representation of our clinical repository. Conclusions: Health information interoperability is one of the requirements for continuity of health care. The dual model allows the separation of knowledge and information in HISs. EN/ISO 13606 was chosen for this project because of the operational mechanisms it offers for data exchange, as well as its flexibility for modeling knowledge and creating new concepts. To the best of our knowledge, this is the first experience reported in the literature of effective communication of EN/ISO 13606 EHR extracts between a patient mobile app and an institutional clinical repository using a scalable standard-agnostic methodology that can be applied to other projects, data sources, and institutions. ", doi="10.2196/40344", url="https://medinform.jmir.org/2022/10/e40344", url="http://www.ncbi.nlm.nih.gov/pubmed/36222792" } @Article{info:doi/10.2196/38977, author="Kopka, Marvin and Feufel, A. Markus and Balzer, Felix and Schmieding, L. Malte", title="The Triage Capability of Laypersons: Retrospective Exploratory Analysis", journal="JMIR Form Res", year="2022", month="Oct", day="12", volume="6", number="10", pages="e38977", keywords="digital health", keywords="triage", keywords="self-triage", keywords="urgency assessment", keywords="patient-centered care", keywords="care navigation", keywords="decision support", keywords="symptom checker", keywords="care", keywords="support", keywords="medical", keywords="health professional", keywords="patient", keywords="self-assessment", keywords="decision", keywords="accuracy", keywords="error", keywords="sensitivity", keywords="emergency", keywords="female", keywords="male", abstract="Background: Although medical decision-making may be thought of as a task involving health professionals, many decisions, including critical health--related decisions are made by laypersons alone. Specifically, as the first step to most care episodes, it is the patient who determines whether and where to seek health care (triage). Overcautious self-assessments (ie, overtriaging) may lead to overutilization of health care facilities and overcrowded emergency departments, whereas imprudent decisions (ie, undertriaging) constitute a risk to the patient's health. Recently, patient-facing decision support systems, commonly known as symptom checkers, have been developed to assist laypersons in these decisions. Objective: The purpose of this study is to identify factors influencing laypersons' ability to self-triage and their risk averseness in self-triage decisions. 
Methods: We analyzed publicly available data on 91 laypersons appraising 45 short fictitious patient descriptions (case vignettes; N=4095 appraisals). Using signal detection theory and descriptive and inferential statistics, we explored whether the type of medical decision laypersons face, their confidence in their decision, and sociodemographic factors influence their triage accuracy and the type of errors they make. We distinguished between 2 decisions: whether emergency care was required (decision 1) and whether self-care was sufficient (decision 2). Results: The accuracy of detecting emergencies (decision 1) was higher (mean 82.2\%, SD 5.9\%) than that of deciding whether any type of medical care is required (decision 2, mean 75.9\%, SD 5.25\%; t90=8.4; P<.001; Cohen d=0.9). Sensitivity for decision 1 was lower (mean 67.5\%, SD 16.4\%) than its specificity (mean 89.6\%, SD 8.6\%), whereas sensitivity for decision 2 was higher (mean 90.5\%, SD 8.3\%) than its specificity (mean 46.7\%, SD 15.95\%). Female participants were more risk averse and overtriaged more often than male participants, but age and level of education showed no association with participants' risk averseness. Participants' triage accuracy was higher when they were certain about their appraisal (2114/3381, 62.5\%) than when they were uncertain (378/714, 52.9\%). However, most errors occurred when participants were certain of their decision (1267/1603, 79\%). Participants were more commonly certain of their overtriage errors (mean 80.9\%, SD 23.8\%) than their undertriage errors (mean 72.5\%, SD 30.9\%; t89=3.7; P<.001; d=0.39). Conclusions: Our study suggests that laypersons are overcautious in deciding whether they require medical care at all, but they miss identifying a considerable portion of emergencies. Our results further indicate that women are more risk averse than men in both types of decisions. Layperson participants made most triage errors when they were certain of their own appraisal. Thus, they might not follow or even seek advice (eg, from symptom checkers) in most instances where advice would be useful. ", doi="10.2196/38977", url="https://formative.jmir.org/2022/10/e38977", url="http://www.ncbi.nlm.nih.gov/pubmed/36222793" } @Article{info:doi/10.2196/39102, author="Daniel, Thomas and de Chevigny, Alix and Champrigaud, Adeline and Valette, Julie and Sitbon, Marine and Jardin, Meryam and Chevalier, Delphine and Renet, Sophie", title="Answering Hospital Caregivers' Questions at Any Time: Proof-of-Concept Study of an Artificial Intelligence--Based Chatbot in a French Hospital", journal="JMIR Hum Factors", year="2022", month="Oct", day="11", volume="9", number="4", pages="e39102", keywords="chatbot", keywords="artificial intelligence", keywords="pharmacy", keywords="hospital", keywords="health care", keywords="drugs", keywords="medication", keywords="information quality", keywords="health information", keywords="caregiver", keywords="healthcare staff", keywords="digital health tool", keywords="COVID-19", keywords="information technology", abstract="Background: Access to accurate information in health care is a key point for caregivers to avoid medication errors, especially with the reorganization of staff and drug circuits during health crises such as the COVID-19 pandemic. It is, therefore, the role of the hospital pharmacy to answer caregivers' questions. 
Some may require the expertise of a pharmacist, some should be answered by pharmacy technicians, but others are simple and redundant, and automated responses may be provided. Objective: We aimed to develop and implement a chatbot to answer questions from hospital caregivers about drugs and pharmacy organization 24 hours a day and to evaluate this tool. Methods: The ADDIE (Analysis, Design, Development, Implementation, and Evaluation) model was used by a multiprofessional team composed of 3 hospital pharmacists, 2 members of the Innovation and Transformation Department, and the IT service provider. Based on an analysis of the caregivers' needs regarding drugs and pharmacy organization, we designed and developed a chatbot. The tool was then evaluated before its implementation into the hospital intranet. Its relevance and conversations with testers were monitored via the IT provider's back office. Results: Needs analysis with 5 hospital pharmacists and 33 caregivers from 5 health services allowed us to identify 7 themes about drugs and pharmacy organization (such as opening hours and specific prescriptions). After a year of chatbot design and development, the test version obtained good evaluation scores: its speed was rated 8.2 out of 10, usability 8.1 out of 10, and appearance 7.5 out of 10. Testers were generally satisfied (70\%) and hoped that the content would be enhanced. Conclusions: The chatbot seems to be a relevant tool for hospital caregivers, helping them obtain the reliable and verified information they need on drugs and pharmacy organization. In the context of significant mobility of nursing staff during the health crisis due to the COVID-19 pandemic, the chatbot could be a suitable tool for transmitting relevant information related to drug circuits or specific procedures. To our knowledge, this is the first time that such a tool has been designed for caregivers. Its development continued through tests with other users, such as pharmacy technicians, and the integration of additional data before implementation at the 2 hospital sites. ", doi="10.2196/39102", url="https://humanfactors.jmir.org/2022/4/e39102", url="http://www.ncbi.nlm.nih.gov/pubmed/35930555" } @Article{info:doi/10.2196/38464, author="Oates, John and Shafiabady, Niusha and Ambagtsheer, Rachel and Beilby, Justin and Seiboth, Chris and Dent, Elsa", title="Evolving Hybrid Partial Genetic Algorithm Classification Model for Cost-effective Frailty Screening: Investigative Study", journal="JMIR Aging", year="2022", month="Oct", day="7", volume="5", number="4", pages="e38464", keywords="machine learning", keywords="frailty screening", keywords="partial genetic algorithms", keywords="SVM", keywords="KNN", keywords="decision trees", keywords="frailty", keywords="algorithm", keywords="cost", keywords="model", keywords="index", keywords="database", keywords="ai", keywords="ageing", keywords="adults", keywords="older people", keywords="screening", keywords="tool", abstract="Background: A commonly used method for measuring frailty is the accumulation of deficits expressed as a frailty index (FI). FIs can be readily adapted to many databases, as the parameters to use are not prescribed but rather reflect a subset of extracted features (variables). Unfortunately, the structure of many databases does not permit the direct extraction of a suitable subset, requiring additional effort to determine and verify the value of features for each record and thus significantly increasing cost. 
Objective: Our objective is to describe how an artificial intelligence (AI) optimization technique called partial genetic algorithms can be used to refine the subset of features used to calculate an FI and favor features that have the least cost of acquisition. Methods: This is a secondary analysis of a residential care database compiled from 10 facilities in Queensland, Australia. The database is comprised of routinely collected administrative data and unstructured patient notes for 592 residents aged 75 years and over. The primary study derived an electronic frailty index (eFI) calculated from 36 suitable features. We then structurally modified a genetic algorithm to find an optimal predictor of the calculated eFI (0.21 threshold) from 2 sets of features. Partial genetic algorithms were used to optimize 4 underlying classification models: logistic regression, decision trees, random forest, and support vector machines. Results: Among the underlying models, logistic regression was found to produce the best models in almost all scenarios and feature set sizes. The best models were built using all the low-cost features and as few as 10 high-cost features, and they performed well enough (sensitivity 89\%, specificity 87\%) to be considered candidates for a low-cost frailty screening test. Conclusions: In this study, a systematic approach for selecting an optimal set of features with a low cost of acquisition and performance comparable to the eFI for detecting frailty was demonstrated on an aged care database. Partial genetic algorithms have proven useful in offering a trade-off between cost and accuracy to systematically identify frailty. ", doi="10.2196/38464", url="https://aging.jmir.org/2022/4/e38464", url="http://www.ncbi.nlm.nih.gov/pubmed/36206042" } @Article{info:doi/10.2196/32666, author="Sperl-Hillen, M. JoAnn and Anderson, P. Jeffrey and Margolis, L. Karen and Rossom, C. Rebecca and Kopski, M. Kristen and Averbeck, M. Beth and Rosner, A. Jeanine and Ekstrom, L. Heidi and Dehmer, P. Steven and O'Connor, J. Patrick", title="Bolstering the Business Case for Adoption of Shared Decision-Making Systems in Primary Care: Randomized Controlled Trial", journal="JMIR Form Res", year="2022", month="Oct", day="6", volume="6", number="10", pages="e32666", keywords="clinical decision support", keywords="primary care", keywords="ICD-10 diagnostic coding", keywords="CPT levels of service", keywords="shared decision-making", abstract="Background: Limited budgets may often constrain the ability of health care delivery systems to adopt shared decision-making (SDM) systems designed to improve clinical encounters with patients and quality of care. Objective: This study aimed to assess the impact of an SDM system shown to improve diabetes and cardiovascular patient outcomes on factors affecting revenue generation in primary care clinics. Methods: As part of a large multisite clinic randomized controlled trial (RCT), we explored the differences in 1 care system between clinics randomized to use an SDM intervention (n=8) versus control clinics (n=9) regarding the (1) likelihood of diagnostic coding for cardiometabolic conditions using the 10th Revision of the International Classification of Diseases (ICD-10) and (2) current procedural terminology (CPT) billing codes. 
Results: Across all 24,138 encounters with care gaps targeted by the SDM system, the proportion assigned high-complexity CPT codes for level of service 5 was significantly higher at the intervention clinics (6.1\%) compared to that in the control clinics (2.9\%), with P<.001 and adjusted odds ratio (OR) 1.64 (95\% CI 1.02-2.61). This was consistently observed across the following specific care gaps: diabetes with glycated hemoglobin A1c (HbA1c)>8\% (n=8463), 7.2\% vs 3.4\%, P<.001, and adjusted OR 1.93 (95\% CI 1.01-3.67); blood pressure above goal (n=8515), 6.5\% vs 3.7\%, P<.001, and adjusted OR 1.42 (95\% CI 0.72-2.79); suboptimal statin management (n=17,765), 5.8\% vs 3\%, P<.001, and adjusted OR 1.41 (95\% CI 0.76-2.61); tobacco dependency (n=7449), 7.5\% vs 3.4\%, P<.001, and adjusted OR 2.14 (95\% CI 1.31-3.51); BMI >30 kg/m2 (n=19,838), 6.2\% vs 2.9\%, P<.001, and adjusted OR 1.45 (95\% CI 0.75-2.8). Compared to control clinics, intervention clinics assigned ICD-10 diagnosis codes more often for observed cardiometabolic conditions with care gaps, although the difference did not reach statistical significance. Conclusions: In this randomized study, use of a clinically effective SDM system at encounters with care gaps significantly increased the proportion of encounters assigned high-complexity (level 5) CPT codes, and it was associated with a nonsignificant increase in assigning ICD-10 codes for observed cardiometabolic conditions. Trial Registration: ClinicalTrials.gov NCT02451670; https://clinicaltrials.gov/ct2/show/NCT02451670 ", doi="10.2196/32666", url="https://formative.jmir.org/2022/10/e32666", url="http://www.ncbi.nlm.nih.gov/pubmed/36201392" } @Article{info:doi/10.2196/40238, author="Sharma, Malvika and Savage, Carl and Nair, Monika and Larsson, Ingrid and Svedberg, Petra and Nygren, M. Jens", title="Artificial Intelligence Applications in Health Care Practice: Scoping Review", journal="J Med Internet Res", year="2022", month="Oct", day="5", volume="24", number="10", pages="e40238", keywords="artificial intelligence", keywords="health care", keywords="implementation", keywords="scoping review", keywords="technology adoption", abstract="Background: Artificial intelligence (AI) is often heralded as a potential disruptor that will transform the practice of medicine. The amount of data collected and available in health care, coupled with advances in computational power, has contributed to advances in AI and an exponential growth of publications. However, the development of AI applications does not guarantee their adoption into routine practice. There is a risk that despite the resources invested, benefits for patients, staff, and society will not be realized if AI implementation is not better understood. Objective: The aim of this study was to explore how the implementation of AI in health care practice has been described and researched in the literature by answering 3 questions: What are the characteristics of research on implementation of AI in practice? What types and applications of AI systems are described? What characteristics of the implementation process for AI systems are discernible? Methods: A scoping review was conducted of MEDLINE (PubMed), Scopus, Web of Science, CINAHL, and PsycINFO databases to identify empirical studies of AI implementation in health care since 2011, in addition to snowball sampling of selected reference lists. Using Rayyan software, we screened titles and abstracts and selected full-text articles. 
Data from the included articles were charted and summarized. Results: Of the 9218 records retrieved, 45 (0.49\%) articles were included. The articles cover diverse clinical settings and disciplines; most (32/45, 71\%) were published recently, were from high-income countries (33/45, 73\%), and were intended for care providers (25/45, 56\%). AI systems are predominantly intended for clinical care, particularly clinical care pertaining to patient-provider encounters. More than half (24/45, 53\%) possess no action autonomy but rather support human decision-making. The focus of most research was on establishing the effectiveness of interventions (16/45, 35\%) or related to technical and computational aspects of AI systems (11/45, 24\%). Focus on the specifics of implementation processes does not yet seem to be a priority in research, and the use of frameworks to guide implementation is rare. Conclusions: Our current empirical knowledge derives from implementations of AI systems with low action autonomy and approaches common to implementations of other types of information systems. To develop a specific and empirically based implementation framework, further research is needed on the more disruptive types of AI systems being implemented in routine care and on aspects unique to AI implementation in health care, such as building trust, addressing transparency issues, developing explainable and interpretable solutions, and addressing ethical concerns around privacy and data protection. ", doi="10.2196/40238", url="https://www.jmir.org/2022/10/e40238", url="http://www.ncbi.nlm.nih.gov/pubmed/36197712" } @Article{info:doi/10.2196/40511, author="Park, Hyunjung and Chae, Kathy Minjung and Jeong, Woohyeon and Yu, Jaeyong and Jung, Weon and Chang, Hansol and Cha, Chul Won", title="Appropriateness of Alerts and Physicians' Responses With a Medication-Related Clinical Decision Support System: Retrospective Observational Study", journal="JMIR Med Inform", year="2022", month="Oct", day="4", volume="10", number="10", pages="e40511", keywords="clinical decision support system", keywords="computerized physician order entry", keywords="alert fatigue", keywords="health personnel", keywords="decision-making support", keywords="physician behavior", keywords="physician response", keywords="alert system", abstract="Background: Alert fatigue is unavoidable when many irrelevant alerts are generated in response to a small number of useful alerts. It is necessary to increase the effectiveness of the clinical decision support system (CDSS) by understanding physicians' responses. Objective: This study aimed to understand the CDSS and physicians' behavior by evaluating the clinical appropriateness of alerts and the corresponding physicians' responses in a medication-related passive alert system. Methods: Data on medication-related orders, alerts, and patients' electronic medical records were analyzed. The analyzed data were generated between August 2019 and June 2020 while the patient was in the emergency department. We evaluated the appropriateness of alerts and physicians' responses for a subset of 382 alert cases and classified them. Results: Of the 382 alert cases, only 7.3\% (n=28) of the alerts were clinically appropriate. Regarding the appropriateness of the physicians' responses about the alerts, 92.4\% (n=353) were deemed appropriate. 
In the classification of alerts, only 3.4\% (n=13) of alerts were successfully triggered, and 2.1\% (n=8) were inappropriate in both alert clinical relevance and physician's response. In this study, the override rate was 92.9\% (n=355). Conclusions: We evaluated the appropriateness of alerts and physicians' responses through a detailed medical record review of the medication-related passive alert system. An excessive number of unnecessary alerts are generated because the algorithm is rule based and does not reflect the individual condition of the patient. It is important to maximize the value of the CDSS by understanding physicians' responses. ", doi="10.2196/40511", url="https://medinform.jmir.org/2022/10/e40511", url="http://www.ncbi.nlm.nih.gov/pubmed/36194461" } @Article{info:doi/10.2196/37900, author="Minian, Nadia and Lingam, Mathangee and Moineddin, Rahim and Thorpe, E. Kevin and Veldhuizen, Scott and Dragonetti, Rosa and Zawertailo, Laurie and Taylor, H. Valerie and Hahn, Margaret and deRuiter, K. Wayne and Melamed, C. Osnat and Selby, Peter", title="The Impact of a Clinical Decision Support System for Addressing Physical Activity and Healthy Eating During Smoking Cessation Treatment: Hybrid Type I Randomized Controlled Trial", journal="J Med Internet Res", year="2022", month="Sep", day="30", volume="24", number="9", pages="e37900", keywords="smoking cessation", keywords="physical activity", keywords="healthy eating", keywords="clinical decision support system", keywords="Canada", keywords="diet", keywords="intervention", keywords="smoking", keywords="primary care", keywords="program", keywords="treatment", keywords="clinical decision support", keywords="health behavior", abstract="Background: People who smoke have other risk factors for chronic diseases, such as low levels of physical activity and poor diet. Clinical decision support systems (CDSSs) might help health care practitioners integrate interventions for diet and physical activity into their smoking cessation programming but could worsen quit rates. Objective: The aims of this study are to assess the effects of the addition of a CDSS for physical activity and diet on smoking cessation outcomes and to assess the implementation of the study. Methods: We conducted a pragmatic hybrid type I effectiveness-implementation trial with 232 team-based primary care practices in Ontario, Canada, from November 2019 to May 2021. We used a 2-arm randomized controlled trial comparing a CDSS addressing physical activity and diet to treatment as usual and used the Reach, Effectiveness, Adoption, Implementation, and Maintenance framework to measure implementation outcomes. The primary outcome was self-reported 7-day tobacco abstinence at 6 months. Results: We enrolled 5331 participants in the study. Of these, 2732 (51.2\%) were randomized to the intervention group and 2599 (48.8\%) to the control group. At the 6-month follow-up, 29.7\% (634/2137) of respondents in the intervention arm and 27.3\% (552/2020) in the control arm reported abstinence from tobacco. After multiple imputation, the absolute group difference was 2.1\% (95\% CI -0.5 to 4.6; F1,1000.42=2.43; P=.12). Mean exercise minutes changed from 32 (SD 44.7) to 110 (SD 196.1) in the intervention arm and from 32 (SD 45.1) to 113 (SD 195.1) in the control arm (group effect: B=-3.7 minutes; 95\% CI -17.8 to 10.4; P=.61). 
Servings of fruit and vegetables changed from 2.64 servings to 2.42 servings in the intervention group and from 2.52 servings to 2.45 servings in the control group (incidence rate ratio for intervention group=0.98; 95\% CI 0.93-1.02; P=.35). Conclusions: A CDSS for physical activity and diet may be added to a smoking cessation program without affecting the outcomes. Further research is needed to improve the impact of integrated health promotion interventions in primary care smoking cessation programs. Trial Registration: ClinicalTrials.gov NCT04223336; https://www.clinicaltrials.gov/ct2/show/NCT04223336 International Registered Report Identifier (IRRID): RR2-10.2196/19157 ", doi="10.2196/37900", url="https://www.jmir.org/2022/9/e37900", url="http://www.ncbi.nlm.nih.gov/pubmed/36178716" } @Article{info:doi/10.2196/39452, author="Ferreira-Santos, Daniela and Amorim, Pedro and Silva Martins, Tiago and Monteiro-Soares, Matilde and Pereira Rodrigues, Pedro", title="Enabling Early Obstructive Sleep Apnea Diagnosis With Machine Learning: Systematic Review", journal="J Med Internet Res", year="2022", month="Sep", day="30", volume="24", number="9", pages="e39452", keywords="machine learning", keywords="obstructive sleep apnea", keywords="systematic review", keywords="polysomnography", abstract="Background: American Academy of Sleep Medicine guidelines suggest that clinical prediction algorithms can be used to screen patients with obstructive sleep apnea (OSA) without replacing polysomnography, the gold standard. Objective: We aimed to identify, gather, and analyze existing machine learning approaches that are being used for disease screening in adult patients with suspected OSA. Methods: We searched the MEDLINE, Scopus, and ISI Web of Knowledge databases to evaluate the validity of different machine learning techniques, with polysomnography as the gold standard outcome measure and used the Prediction Model Risk of Bias Assessment Tool (Kleijnen Systematic Reviews Ltd) to assess risk of bias and applicability of each included study. Results: Our search retrieved 5479 articles, of which 63 (1.15\%) articles were included. We found 23 studies performing diagnostic model development alone, 26 with added internal validation, and 14 applying the clinical prediction algorithm to an independent sample (although not all reporting the most common discrimination metrics, sensitivity or specificity). Logistic regression was applied in 35 studies, linear regression in 16, support vector machine in 9, neural networks in 8, decision trees in 6, and Bayesian networks in 4. Random forest, discriminant analysis, classification and regression tree, and nomogram were each performed in 2 studies, whereas Pearson correlation, adaptive neuro-fuzzy inference system, artificial immune recognition system, genetic algorithm, supersparse linear integer models, and k-nearest neighbors algorithm were each performed in 1 study. The best area under the receiver operating curve was 0.98 (0.96-0.99) for age, waist circumference, Epworth Somnolence Scale score, and oxygen saturation as predictors in a logistic regression. Conclusions: Although high values were obtained, they still lacked external validation results in large cohorts and a standard OSA criteria definition. 
Trial Registration: PROSPERO CRD42021221339; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=221339 ", doi="10.2196/39452", url="https://www.jmir.org/2022/9/e39452", url="http://www.ncbi.nlm.nih.gov/pubmed/36178720" } @Article{info:doi/10.2196/39234, author="Sibbald, Matthew and Abdulla, Bashayer and Keuhl, Amy and Norman, Geoffrey and Monteiro, Sandra and Sherbino, Jonathan", title="Electronic Diagnostic Support in Emergency Physician Triage: Qualitative Study With Thematic Analysis of Interviews", journal="JMIR Hum Factors", year="2022", month="Sep", day="30", volume="9", number="3", pages="e39234", keywords="electronic differential diagnostic support", keywords="clinical reasoning", keywords="natural language processing", keywords="triage", keywords="diagnostic error", keywords="human factors", keywords="diagnosis", keywords="diagnostic", keywords="emergency", keywords="artificial intelligence", keywords="adoption", keywords="attitude", keywords="support system", keywords="automation", abstract="Background: Not thinking of a diagnosis is a leading cause of diagnostic error in the emergency department, resulting in delayed treatment, morbidity, and excess mortality. Electronic differential diagnostic support (EDS) results in small but significant reductions in diagnostic error. However, the uptake of EDS by clinicians is limited. Objective: We sought to understand physician perceptions and barriers to the uptake of EDS within the emergency department triage process. Methods: We conducted a qualitative study using a research associate to rapidly prototype an embedded EDS into the emergency department triage process. Physicians involved in the triage assessment of a busy emergency department were provided the output of an EDS based on the triage complaint by an embedded researcher to simulate an automated system that would draw from the electronic medical record. Physicians were interviewed immediately after their experience. Verbatim transcripts were analyzed by a team using open and axial coding, informed by direct content analysis. Results: In all, 4 themes emerged from 14 interviews: (1) the quality of the EDS was inferred from the scope and prioritization of the diagnoses present in the EDS differential; (2) the trust of the EDS was linked to varied beliefs around the diagnostic process and potential for bias; (3) clinicians foresaw more benefit to EDS use for colleagues and trainees rather than themselves; and (4) clinicians felt strongly that EDS output should not be included in the patient record. Conclusions: The adoption of an EDS into an emergency department triage process will require a system that provides diagnostic suggestions appropriate for the scope and context of the emergency department triage process, transparency of system design, and affordances for clinician beliefs about the diagnostic process and addresses clinician concern around including EDS output in the patient record. 
", doi="10.2196/39234", url="https://humanfactors.jmir.org/2022/3/e39234", url="http://www.ncbi.nlm.nih.gov/pubmed/36178728" } @Article{info:doi/10.2196/33775, author="Li, Xiaochun and Xu, Huiping and Grannis, Shaun", title="The Data-Adaptive Fellegi-Sunter Model for Probabilistic Record Linkage: Algorithm Development and Validation for Incorporating Missing Data and Field Selection", journal="J Med Internet Res", year="2022", month="Sep", day="29", volume="24", number="9", pages="e33775", keywords="record linkage", keywords="Fellegi-Sunter model", keywords="latent class model", keywords="missing at random", keywords="matching field selection", abstract="Background: Quality patient care requires comprehensive health care data from a broad set of sources. However, missing data in medical records and matching field selection are 2 real-world challenges in patient-record linkage. Objective: In this study, we aimed to evaluate the extent to which incorporating the missing at random (MAR)--assumption in the Fellegi-Sunter model and using data-driven selected fields improve patient-matching accuracy using real-world use cases. Methods: We adapted the Fellegi-Sunter model to accommodate missing data using the MAR assumption and compared the adaptation to the common strategy of treating missing values as disagreement with matching fields specified by experts or selected by data-driven methods. We used 4 use cases, each containing a random sample of record pairs with match statuses ascertained by manual reviews. Use cases included health information exchange (HIE) record deduplication, linkage of public health registry records to HIE, linkage of Social Security Death Master File records to HIE, and deduplication of newborn screening records, which represent real-world clinical and public health scenarios. Matching performance was evaluated using the sensitivity, specificity, positive predictive value, negative predictive value, and F1-score. Results: Incorporating the MAR assumption in the Fellegi-Sunter model maintained or improved F1-scores, regardless of whether matching fields were expert-specified or selected by data-driven methods. Combining the MAR assumption and data-driven fields optimized the F1-scores in the 4 use cases. Conclusions: MAR is a reasonable assumption in real-world record linkage applications: it maintains or improves F1-scores regardless of whether matching fields are expert-specified or data-driven. Data-driven selection of fields coupled with MAR achieves the best overall performance, which can be especially useful in privacy-preserving record linkage. ", doi="10.2196/33775", url="https://www.jmir.org/2022/9/e33775", url="http://www.ncbi.nlm.nih.gov/pubmed/36173664" } @Article{info:doi/10.2196/35114, author="Winston, Luke and McCann, Michael and Onofrei, George", title="Exploring Socioeconomic Status as a Global Determinant of COVID-19 Prevalence, Using Exploratory Data Analytic and Supervised Machine Learning Techniques: Algorithm Development and Validation Study", journal="JMIR Form Res", year="2022", month="Sep", day="27", volume="6", number="9", pages="e35114", keywords="COVID-19", keywords="machine learning", keywords="data analysis", keywords="epidemiology", keywords="human development index", abstract="Background: The COVID-19 pandemic represents the most unprecedented global challenge in recent times. 
As the global community attempts to manage the pandemic in the long term, it is pivotal to understand what factors drive prevalence rates and to predict the future trajectory of the virus. Objective: This study had 2 objectives. First, it tested the statistical relationship between socioeconomic status and COVID-19 prevalence. Second, it used machine learning techniques to predict cumulative COVID-19 cases in a multicountry sample of 182 countries. Taken together, these objectives will shed light on socioeconomic status as a global risk factor of the COVID-19 pandemic. Methods: This research used exploratory data analysis and supervised machine learning methods. Exploratory analysis included variable distribution, variable correlations, and outlier detection. Following this, the following 3 supervised regression techniques were applied: linear regression, random forest, and adaptive boosting (AdaBoost). Results were evaluated using k-fold cross-validation and subsequently compared to analyze algorithmic suitability. The analysis involved 2 models. First, the algorithms were trained to predict 2021 COVID-19 prevalence using only 2020 reported case data. Following this, socioeconomic indicators were added as features and the algorithms were trained again. The Human Development Index (HDI) metrics of life expectancy, mean years of schooling, expected years of schooling, and gross national income were used to approximate socioeconomic status. Results: All variables correlated positively with the 2021 COVID-19 prevalence, with R2 values ranging from 0.55 to 0.85. Using socioeconomic indicators, COVID-19 prevalence was predicted with a reasonable degree of accuracy. Using 2020 reported case rates as a lone predictor to predict 2021 prevalence rates, the average predictive accuracy of the algorithms was low (R2=0.543). When socioeconomic indicators were added alongside 2020 prevalence rates as features, the average predictive performance improved considerably (R2=0.721) and all error statistics decreased. Thus, adding socioeconomic indicators alongside 2020 reported case data optimized the prediction of COVID-19 prevalence to a considerable degree. Linear regression was the strongest learner with R2=0.693 on the first model and R2=0.763 on the second model, followed by random forest (0.481 and 0.722) and AdaBoost (0.454 and 0.679). Following this, the second model was retrained using a selection of additional COVID-19 risk factors (population density, median age, and vaccination uptake) instead of the HDI metrics. However, average accuracy dropped to 0.649, which highlights the value of socioeconomic status as a predictor of COVID-19 cases in the chosen sample. Conclusions: The results show that socioeconomic status is an important variable to consider in future epidemiological modeling, and highlights the reality of the COVID-19 pandemic as a social phenomenon and a health care phenomenon. This paper also puts forward new considerations about the application of statistical and machine learning techniques to understand and combat the COVID-19 pandemic. ", doi="10.2196/35114", url="https://formative.jmir.org/2022/9/e35114", url="http://www.ncbi.nlm.nih.gov/pubmed/36001798" } @Article{info:doi/10.2196/40888, author="Tensen, Esm{\'e}e and van Sinderen, Femke and Bekkenk, W. Marcel and Jaspers, W. Monique and Peute, W. 
Linda", title="To Refer or Not to Refer in Teledermoscopy: Retrospective Study", journal="JMIR Dermatol", year="2022", month="Sep", day="23", volume="5", number="3", pages="e40888", keywords="teledermoscopy", keywords="dermoscopy", keywords="telemedicine", keywords="telehealth", keywords="triage", keywords="general practitioner", keywords="GP", keywords="general practice", keywords="family doctor", keywords="family physician", keywords="unnecessary referrals", keywords="refer", keywords="referral", keywords="skin", keywords="lesion", keywords="specialist", keywords="physician communication", keywords="diagnostic", keywords="interprofessional", keywords="diagnose", keywords="diagnosis", keywords="dermatology", keywords="dermatologist", abstract="Background: Challenges remain for general practitioners (GPs) in diagnosing (pre)malignant and benign skin lesions. Teledermoscopy (TDsc) supports GPs in diagnosing these skin lesions guided by teledermatologists' (TDs) diagnosis and advice and prevents unnecessary referrals to dermatology care. However, the impact of the availability of TDsc on GPs' self-reported referral decisions to dermatology care before and after the TDsc consultation is unknown. Objective: The objective of this study is to assess and compare the initial self-reported referral decisions of GPs before TDsc versus their final self-reported referral decisions after TDsc for skin lesions diagnosed by the TD as (pre)malignant or benign. Methods: TDsc consultations requested by GPs in daily practice between July 2015 and June 2020 with a TD assessment and diagnosis were extracted from a nationwide Dutch telemedicine database. Based on GP self-administered questions, the GPs' referral decisions before and their final referral decision after TDsc consultation were assessed for (pre)malignant and benign TD diagnoses. Results: GP self-administered questions and TD diagnoses were evaluated for 6364 TDsc consultations (9.3\% malignant, 8.8\% premalignant, and 81.9\% benign skin lesions). In half of the TDsc consultations, GPs adjusted their initial referral decision after TD advice and TD diagnosis. Initially, GPs did not have the intention to refer 67 (56.8\%) of 118 patients with a malignant TD diagnosis and 26 (16.0\%) of 162 patients with a premalignant TD diagnosis but then decided to refer these patients after the TDsc consultation. Furthermore, GPs adjusted their decision from referral to nonreferral for 2534 (74.9\%) benign skin lesions (including 676 seborrheic keratosis and 131 vascular lesions). Conclusions: GPs adjusted their referral decision in 52\% (n=3306) of the TDsc consultations after the TD assessment. The availability of TDsc is thus of added value and assists GPs in their (non)referral for patients with skin lesions to dermatology care. TDsc resulted in referrals of patients with (pre)malignant skin lesions that GPs would not have referred directly to the dermatologist. TDsc also led to a reduction of unnecessary referrals of patients with low complex benign skin lesions (eg, seborrheic keratosis and vascular lesions). 
", doi="10.2196/40888", url="https://derma.jmir.org/2022/3/e40888", url="http://www.ncbi.nlm.nih.gov/pubmed/37632902" } @Article{info:doi/10.2196/37951, author="Kurasawa, Hisashi and Waki, Kayo and Chiba, Akihiro and Seki, Tomohisa and Hayashi, Katsuyoshi and Fujino, Akinori and Haga, Tsuneyuki and Noguchi, Takashi and Ohe, Kazuhiko", title="Treatment Discontinuation Prediction in Patients With Diabetes Using a Ranking Model: Machine Learning Model Development", journal="JMIR Bioinform Biotech", year="2022", month="Sep", day="23", volume="3", number="1", pages="e37951", keywords="machine learning", keywords="machine-learned ranking model", keywords="treatment discontinuation", keywords="diabetes", keywords="prediction", keywords="electronic health record", keywords="EHR", keywords="big data", keywords="ranking", keywords="algorithm", abstract="Background: Treatment discontinuation (TD) is one of the major prognostic issues in diabetes care, and several models have been proposed to predict a missed appointment that may lead to TD in patients with diabetes by using binary classification models for the early detection of TD and for providing intervention support for patients. However, as binary classification models output the probability of a missed appointment occurring within a predetermined period, they are limited in their ability to estimate the magnitude of TD risk in patients with inconsistent intervals between appointments, making it difficult to prioritize patients for whom intervention support should be provided. Objective: This study aimed to develop a machine-learned prediction model that can output a TD risk score defined by the length of time until TD and prioritize patients for intervention according to their TD risk. Methods: This model included patients with diagnostic codes indicative of diabetes at the University of Tokyo Hospital between September 3, 2012, and May 17, 2014. The model was internally validated with patients from the same hospital from May 18, 2014, to January 29, 2016. The data used in this study included 7551 patients who visited the hospital after January 1, 2004, and had diagnostic codes indicative of diabetes. In particular, data that were recorded in the electronic medical records between September 3, 2012, and January 29, 2016, were used. The main outcome was the TD of a patient, which was defined as missing a scheduled clinical appointment and having no hospital visits within 3 times the average number of days between the visits of the patient and within 60 days. The TD risk score was calculated by using the parameters derived from the machine-learned ranking model. The prediction capacity was evaluated by using test data with the C-index for the performance of ranking patients, area under the receiver operating characteristic curve, and area under the precision-recall curve for discrimination, in addition to a calibration plot. Results: The means (95\% confidence limits) of the C-index, area under the receiver operating characteristic curve, and area under the precision-recall curve for the TD risk score were 0.749 (0.655, 0.823), 0.758 (0.649, 0.857), and 0.713 (0.554, 0.841), respectively. The observed and predicted probabilities were correlated with the calibration plots. Conclusions: A TD risk score was developed for patients with diabetes by combining a machine-learned method with electronic medical records. 
The score calculation can be integrated into medical records to identify patients at high risk of TD, which would be useful in supporting diabetes care and preventing TD. ", doi="10.2196/37951", url="https://bioinform.jmir.org/2022/1/e37951" } @Article{info:doi/10.2196/38461, author="Perry, M. Laura and Morken, Victoria and Peipert, D. John and Yanez, Betina and Garcia, F. Sofia and Barnard, Cynthia and Hirschhorn, R. Lisa and Linder, A. Jeffrey and Jordan, Neil and Ackermann, T. Ronald and Harris, Alexandra and Kircher, Sheetal and Mohindra, Nisha and Aggarwal, Vikram and Frazier, Rebecca and Coughlin, Ava and Bedjeti, Katy and Weitzel, Melissa and Nelson, C. Eugene and Elwyn, Glyn and Van Citters, D. Aricca and O'Connor, Mary and Cella, David", title="Patient-Reported Outcome Dashboards Within the Electronic Health Record to Support Shared Decision-making: Protocol for Co-design and Clinical Evaluation With Patients With Advanced Cancer and Chronic Kidney Disease", journal="JMIR Res Protoc", year="2022", month="Sep", day="21", volume="11", number="9", pages="e38461", keywords="patient-reported outcome measures", keywords="shared decision-making", keywords="medical informatics", keywords="coproduction", keywords="learning health system", keywords="cancer", keywords="chronic kidney disease", abstract="Background: Patient-reported outcomes---symptoms, treatment side effects, and health-related quality of life---are important to consider in chronic illness care. The increasing availability of health IT to collect patient-reported outcomes and integrate results within the electronic health record provides an unprecedented opportunity to support patients' symptom monitoring, shared decision-making, and effective use of the health care system. Objective: The objectives of this study are to co-design a dashboard that displays patient-reported outcomes along with other clinical data (eg, laboratory tests, medications, and appointments) within an electronic health record and conduct a longitudinal demonstration trial to evaluate whether the dashboard is associated with improved shared decision-making and disease management outcomes. Methods: Co-design teams comprising study investigators, patients with advanced cancer or chronic kidney disease, their care partners, and their clinicians will collaborate to develop the dashboard. Investigators will work with clinic staff to implement the co-designed dashboard for clinical testing during a demonstration trial. The primary outcome of the demonstration trial is whether the quality of shared decision-making increases from baseline to the 3-month follow-up. Secondary outcomes include longitudinal changes in satisfaction with care, self-efficacy in managing treatments and symptoms, health-related quality of life, and use of costly and potentially avoidable health care services. Implementation outcomes (ie, fidelity, appropriateness, acceptability, feasibility, reach, adoption, and sustainability) during the co-design process and demonstration trial will also be collected and summarized. Results: The dashboard co-design process was completed in May 2020, and data collection for the demonstration trial is anticipated to be completed by the end of July 2022. The results will be disseminated in at least one manuscript per study objective. Conclusions: This protocol combines stakeholder engagement, health care coproduction frameworks, and health IT to develop a clinically feasible model of person-centered care delivery. 
The results will inform our current understanding of how best to integrate patient-reported outcome measures into clinical workflows to improve outcomes and reduce the burden of chronic disease on patients and health care systems. International Registered Report Identifier (IRRID): DERR1-10.2196/38461 ", doi="10.2196/38461", url="https://www.researchprotocols.org/2022/9/e38461", url="http://www.ncbi.nlm.nih.gov/pubmed/36129747" } @Article{info:doi/10.2196/40249, author="Lin, Senlin and Li, Liping and Zou, Haidong and Xu, Yi and Lu, Lina", title="Medical Staff and Resident Preferences for Using Deep Learning in Eye Disease Screening: Discrete Choice Experiment", journal="J Med Internet Res", year="2022", month="Sep", day="20", volume="24", number="9", pages="e40249", keywords="discrete choice experiment", keywords="preference", keywords="artificial intelligence", keywords="AI", keywords="vision health", keywords="screening", abstract="Background: Deep learning--assisted eye disease diagnosis technology is increasingly applied in eye disease screening. However, no research has suggested the prerequisites for health care service providers and residents willing to use it. Objective: The aim of this paper is to reveal the preferences of health care service providers and residents for using artificial intelligence (AI) in community-based eye disease screening, particularly their preference for accuracy. Methods: Discrete choice experiments for health care providers and residents were conducted in Shanghai, China. In total, 34 medical institutions with adequate AI-assisted screening experience participated. A total of 39 medical staff and 318 residents were asked to answer the questionnaire and make a trade-off among alternative screening strategies with different attributes, including missed diagnosis rate, overdiagnosis rate, screening result feedback efficiency, level of ophthalmologist involvement, organizational form, cost, and screening result feedback form. Conditional logit models with the stepwise selection method were used to estimate the preferences. Results: Medical staff preferred high accuracy: The specificity of deep learning models should be more than 90\% (odds ratio [OR]=0.61 for 10\% overdiagnosis; P<.001), which was much higher than the Food and Drug Administration standards. However, accuracy was not the residents' preference. Rather, they preferred to have the doctors involved in the screening process. In addition, when compared with a fully manual diagnosis, AI technology was more favored by the medical staff (OR=2.08 for semiautomated AI model and OR=2.39 for fully automated AI model; P<.001), while the residents were in disfavor of the AI technology without doctors' supervision (OR=0.24; P<.001). Conclusions: Deep learning model under doctors' supervision is strongly recommended, and the specificity of the model should be more than 90\%. In addition, digital transformation should help medical staff move away from heavy and repetitive work and spend more time on communicating with residents. 
", doi="10.2196/40249", url="https://www.jmir.org/2022/9/e40249", url="http://www.ncbi.nlm.nih.gov/pubmed/36125854" } @Article{info:doi/10.2196/39386, author="Adisso, Lionel {\'E}v{\`e}hou{\'e}nou and Taljaard, Monica and Stacey, Dawn and Bri{\`e}re, Nathalie and Zomahoun, Vignon Herv{\'e} Tchala and Durand, Jacob Pierre and Rivest, Louis-Paul and L{\'e}gar{\'e}, France", title="Shared Decision-Making Training for Home Care Teams to Engage Frail Older Adults and Caregivers in Housing Decisions: Stepped-Wedge Cluster Randomized Trial", journal="JMIR Aging", year="2022", month="Sep", day="20", volume="5", number="3", pages="e39386", keywords="shared decision-making", keywords="home care", keywords="nursing homes", keywords="patient engagement", abstract="Background: Frail older adults and caregivers need support from their home care teams in making difficult housing decisions, such as whether to remain at home, with or without assistance, or move into residential care. However, home care teams are often understaffed and busy, and shared decision-making training is costly. Nevertheless, overall awareness of shared decision-making is increasing. We hypothesized that distributing a decision aid could be sufficient for providing decision support without the addition of shared decision-making training for home care teams. Objective: We evaluated the effectiveness of adding web-based training and workshops for care teams in interprofessional shared decision-making to passive dissemination of a decision guide on the proportion of frail older adults or caregivers of cognitively-impaired frail older adults reporting active roles in housing decision-making. Methods: We conducted a stepped-wedge cluster randomized trial with home care teams in 9 health centers in Quebec, Canada. Participants were frail older adults or caregivers of cognitively impaired frail older adults facing housing decisions and receiving care from the home care team at one of the participating health centers. The intervention consisted of a 1.5-hour web-based tutorial for the home care teams plus a 3.5-hour interactive workshop in interprofessional shared decision-making using a decision guide that was designed to support frail older adults and caregivers in making housing decisions. The control was passive dissemination of the decision guide. The primary outcome was an active role in decision-making among frail older adults and caregivers, measured using the Control Preferences Scale. Secondary outcomes included decisional conflict and perceptions of how much care teams involved frail older adults and caregivers in decision-making. We performed an intention-to-treat analysis. Results: A total of 311 frail older adults were included in the analysis, including 208 (66.9\%) women, with a mean age of 81.2 (SD 7.5) years. Among 339 caregivers of cognitively-impaired frail older adults, 239 (70.5\%) were female and their mean age was 66.4 (SD 11.7) years. The intervention increased the proportion of frail older adults reporting an active role in decision-making by 3.3\% (95\% CI --5.8\% to 12.4\%, P=.47) and the proportion of caregivers of cognitively-impaired frail older adults by 6.1\% (95\% CI -11.2\% to 23.4\%, P=.49). There was no significant impact on the secondary outcomes. 
However, the mean score for the frail older adults' perception of how much health professionals involved them in decision-making increased by 5.4 (95\% CI --0.6 to 11.4, P=.07) and the proportion of caregivers who reported decisional conflict decreased by 7.5\% (95\% CI --16.5\% to 1.6\%, P=.10). Conclusions: Although it slightly reduced decisional conflict for caregivers, shared decision-making training did not equip home care teams significantly better than provision of a decision aid for involving frail older adults and their caregivers in decision-making. Trial Registration: ClinicalTrials.gov NCT02592525; https://clinicaltrials.gov/show/NCT02592525 ", doi="10.2196/39386", url="https://aging.jmir.org/2022/3/e39386", url="http://www.ncbi.nlm.nih.gov/pubmed/35759791" } @Article{info:doi/10.2196/38364, author="Fraser, F. Hamish S. and Cohan, Gregory and Koehler, Christopher and Anderson, Jared and Lawrence, Alexis and Pate{\~n}a, John and Bacher, Ian and Ranney, L. Megan", title="Evaluation of Diagnostic and Triage Accuracy and Usability of a Symptom Checker in an Emergency Department: Observational Study", journal="JMIR Mhealth Uhealth", year="2022", month="Sep", day="19", volume="10", number="9", pages="e38364", keywords="mobile health", keywords="mHealth", keywords="symptom checker", keywords="diagnosis", keywords="user experience", abstract="Background: Symptom checkers are clinical decision support apps for patients, used by tens of millions of people annually. They are designed to provide diagnostic and triage advice and assist users in seeking the appropriate level of care. Little evidence is available regarding their diagnostic and triage accuracy with direct use by patients for urgent conditions. Objective: The aim of this study is to determine the diagnostic and triage accuracy and usability of a symptom checker in use by patients presenting to an emergency department (ED). Methods: We recruited a convenience sample of English-speaking patients presenting for care in an urban ED. Each consenting patient used a leading symptom checker from Ada Health before the ED evaluation. Diagnostic accuracy was evaluated by comparing the symptom checker's diagnoses and those of 3 independent emergency physicians viewing the patient-entered symptom data, with the final diagnoses from the ED evaluation. The Ada diagnoses and triage were also critiqued by the independent physicians. The patients completed a usability survey based on the Technology Acceptance Model. Results: A total of 40 (80\%) of the 50 participants approached completed the symptom checker assessment and usability survey. Their mean age was 39.3 (SD 15.9; range 18-76) years, and they were 65\% (26/40) female, 68\% (27/40) White, 48\% (19/40) Hispanic or Latino, and 13\% (5/40) Black or African American. Some cases had missing data or a lack of a clear ED diagnosis; 75\% (30/40) were included in the analysis of diagnosis, and 93\% (37/40) for triage. The sensitivity for at least one of the final ED diagnoses by Ada (based on its top 5 diagnoses) was 70\% (95\% CI 54\%-86\%), close to the mean sensitivity for the 3 physicians (on their top 3 diagnoses) of 68.9\%. The physicians rated the Ada triage decisions as 62\% (23/37) fully agree and 24\% (9/37) safe but too cautious. It was rated as unsafe and too risky in 22\% (8/37) of cases by at least one physician, in 14\% (5/37) of cases by at least two physicians, and in 5\% (2/37) of cases by all 3 physicians. 
Usability was rated highly; participants agreed or strongly agreed with the 7 Technology Acceptance Model usability questions with a mean score of 84.6\%, although ``satisfaction'' and ``enjoyment'' were rated low. Conclusions: This study provides preliminary evidence that a symptom checker can provide acceptable usability and diagnostic accuracy for patients with various urgent conditions. A total of 14\% (5/37) of symptom checker triage recommendations were deemed unsafe and too risky by at least two physicians based on the symptoms recorded, similar to the results of studies on telephone and nurse triage. Larger studies are needed of diagnosis and triage performance with direct patient use in different clinical environments. ", doi="10.2196/38364", url="https://mhealth.jmir.org/2022/9/e38364", url="http://www.ncbi.nlm.nih.gov/pubmed/36121688" } @Article{info:doi/10.2196/34533, author="Abdel-Hafez, Ahmad and Scott, A. Ian and Falconer, Nazanin and Canaris, Stephen and Bonilla, Oscar and Marxen, Sven and Van Garderen, Aaron and Barras, Michael", title="Predicting Therapeutic Response to Unfractionated Heparin Therapy: Machine Learning Approach", journal="Interact J Med Res", year="2022", month="Sep", day="19", volume="11", number="2", pages="e34533", keywords="heparin", keywords="activated partial thromboplastin time", keywords="aPTT", keywords="predictive modeling", keywords="machine learning", keywords="personalized medicine", abstract="Background: Unfractionated heparin (UFH) is an anticoagulant drug that is considered a high-risk medication because an excessive dose can cause bleeding, whereas an insufficient dose can lead to a recurrent embolic event. Therapeutic response to the initiation of intravenous UFH is monitored using activated partial thromboplastin time (aPTT) as a measure of blood clotting time. Clinicians iteratively adjust the dose of UFH toward a target, indication-defined therapeutic aPTT range using nomograms, but this process can be imprecise and can take $\geq$36 hours to achieve the target range. Thus, a more efficient approach is required. Objective: In this study, we aimed to develop and validate a machine learning (ML) algorithm to predict aPTT within 12 hours after a specified bolus and maintenance dose of UFH. Methods: This was a retrospective cohort study of 3019 patient episodes of care from January 2017 to August 2020 using data collected from electronic health records of 5 hospitals in Queensland, Australia. Data from 4 hospitals were used to build and test ensemble models using cross-validation, whereas data from the fifth hospital were used for external validation. We built 2 ML models: a regression model to predict the aPTT value after a UFH bolus dose and a multiclass model to predict the aPTT, classified as subtherapeutic (aPTT <70 seconds), therapeutic (aPTT 70-100 seconds), or supratherapeutic (aPTT >100 seconds). Modeling was performed using Driverless AI (H2O), an automated ML tool, and 17 different experiments were iteratively conducted to optimize model accuracy. Results: In predicting aPTT, the best performing model was an ensemble with 4x LightGBM models with a root mean square error of 31.35 (SD 1.37). In predicting the aPTT class using a repurposed data set, the best performing ensemble model achieved an accuracy of 0.599 (SD 0.0289) and an area under the receiver operating characteristic curve of 0.735. 
External validation yielded similar results: root mean square error of 30.52 (SD 1.29) for the aPTT prediction model, and accuracy of 0.568 (SD 0.0315) and area under the receiver operating characteristic curve of 0.724 for the aPTT multiclassification model. Conclusions: To the best of our knowledge, this is the first ML model applied to intravenous UFH dosing that has been developed and externally validated in a multisite adult general medical and surgical inpatient setting. We present the processes of data collection, preparation, and feature engineering for replication. ", doi="10.2196/34533", url="https://www.i-jmr.org/2022/2/e34533", url="http://www.ncbi.nlm.nih.gov/pubmed/35993617" } @Article{info:doi/10.2196/34568, author="Pfeuffer, Nils and Beyer, Angelika and Penndorf, Peter and Leiz, Maren and Radicke, Franziska and Hoffmann, Wolfgang and van den Berg, Neeltje", title="Evaluation of a Health Information Exchange System for Geriatric Health Care in Rural Areas: Development and Technical Acceptance Study", journal="JMIR Hum Factors", year="2022", month="Sep", day="15", volume="9", number="3", pages="e34568", keywords="electronic health records", keywords="health information exchange", keywords="geriatrics", keywords="community-based participatory research", keywords="technical acceptance", keywords="usability", keywords="health information network", keywords="postacute care", keywords="patient-centered care", abstract="Background: Patients of geriatrics are often treated by several health care providers at the same time. The spatial, informational, and organizational separation of these health care providers can hinder the effective treatment of these patients. Objective: This study aimed to develop a regional health information exchange (HIE) system to improve HIE in geriatric treatment. This study also evaluated the usability of the regional HIE system and sought to identify barriers to and facilitators of its implementation. Methods: The development of the regional HIE system followed the community-based participatory research approach. The primary outcomes were the usability of the regional HIE system, expected implementation barriers and facilitators, and the quality of the developmental process. Data were collected and analyzed using a mixed methods approach. Results: A total of 3 focus regions were identified, 22 geriatric health care providers participated in the development of the regional HIE system, and 11 workshops were conducted between October 2019 and September 2020. In total, 12 participants responded to a questionnaire. The main results were that the regional HIE system should support the exchange of assessments, diagnoses, medication, assistive device supply, and social information. The regional HIE system was expected to be able to improve the quality and continuity of care. In total, 5 adoption facilitators were identified. The main points were adaptability of the regional HIE system to local needs, availability to different patient groups and treatment documents, web-based design, trust among the users, and computer literacy. A total of 13 barriers to adoption were identified. The main expected barriers to implementation were lack of resources, interoperability issues, computer illiteracy, lack of trust, privacy concerns, and ease-of-use issues. 
Conclusions: Participating health care professionals shared similar motivations for developing the regional HIE system, including improved quality of care, reduction of unnecessary examinations, and more effective health care provision. An overly complicated registration process for health care professionals and the patients' free choice of their health care providers hinder the effectiveness of the regional HIE system, resulting in incomplete patient health information. However, the web-based design of the system bridges interoperability problems that exist owing to the different technical and organizational structures of the health care facilities involved. The regional HIE system is better accepted by health care professionals who are already engaged in an interdisciplinary, geriatric-focused network. This might indicate that pre-existing cross-organizational structures and processes are prerequisites for using HIE systems. The participatory design supports the development of technologies that are adaptable to regional needs. Health care providers are interested in participating in the development of an HIE system, but they often lack the required time, knowledge, and resources. ", doi="10.2196/34568", url="https://humanfactors.jmir.org/2022/3/e34568", url="http://www.ncbi.nlm.nih.gov/pubmed/36107474" } @Article{info:doi/10.2196/37701, author="Huang, Yu-Shan and Hsu, Ching and Chune, Yu-Chang and Liao, I-Cheng and Wang, Hsin and Lin, Yi-Lin and Hwu, Wuh-Liang and Lee, Ni-Chung and Lai, Feipei", title="Diagnosis of a Single-Nucleotide Variant in Whole-Exome Sequencing Data for Patients With Inherited Diseases: Machine Learning Study Using Artificial Intelligence Variant Prioritization", journal="JMIR Bioinform Biotech", year="2022", month="Sep", day="15", volume="3", number="1", pages="e37701", keywords="next-generation sequencing", keywords="genetic variation analysis", keywords="machine learning", keywords="artificial intelligence", keywords="whole-exome sequencing", abstract="Background: In recent years, thanks to the rapid development of next-generation sequencing (NGS) technology, an entire human genome can be sequenced in a short period. As a result, NGS technology is now being widely introduced into clinical diagnosis practice, especially for diagnosis of hereditary disorders. Although the exome data of single-nucleotide variant (SNV) can be generated using these approaches, processing the DNA sequence data of a patient requires multiple tools and complex bioinformatics pipelines. Objective: This study aims to assist physicians to automatically interpret the genetic variation information generated by NGS in a short period. To determine the true causal variants of a patient with genetic disease, currently, physicians often need to view numerous features on every variant manually and search for literature in different databases to understand the effect of genetic variation. Methods: We constructed a machine learning model for predicting disease-causing variants in exome data. We collected sequencing data from whole-exome sequencing (WES) and gene panel as training set, and then integrated variant annotations from multiple genetic databases for model training. The model built ranked SNVs and output the most possible disease-causing candidates. For model testing, we collected WES data from 108 patients with rare genetic disorders in National Taiwan University Hospital. 
We applied sequencing data and phenotypic information automatically extracted by a keyword extraction tool from patient's electronic medical records into our machine learning model. Results: We succeeded in locating 92.5\% (124/134) of the causative variant in the top 10 ranking list among an average of 741 candidate variants per person after filtering. AI Variant Prioritizer was able to assign the target gene to the top rank for around 61.1\% (66/108) of the patients, followed by Variant Prioritizer, which assigned it for 44.4\% (48/108) of the patients. The cumulative rank result revealed that our AI Variant Prioritizer has the highest accuracy at ranks 1, 5, 10, and 20. It also shows that AI Variant Prioritizer presents better performance than other tools. After adopting the Human Phenotype Ontology (HPO) terms by looking up the databases, the top 10 ranking list can be increased to 93.5\% (101/108). Conclusions: We successfully applied sequencing data from WES and free-text phenotypic information of patient's disease automatically extracted by the keyword extraction tool for model training and testing. By interpreting our model, we identified which features of variants are important. Besides, we achieved a satisfactory result on finding the target variant in our testing data set. After adopting the HPO terms by looking up the databases, the top 10 ranking list can be increased to 93.5\% (101/108). The performance of the model is similar to that of manual analysis, and it has been used to help National Taiwan University Hospital with a genetic diagnosis. ", doi="10.2196/37701", url="https://bioinform.jmir.org/2022/1/e37701" } @Article{info:doi/10.2196/35675, author="Mulder, Tahar Skander and Omidvari, Amir-Houshang and Rueten-Budde, J. Anja and Huang, Pei-Hua and Kim, Ki-Hun and Bais, Babette and Rousian, Melek and Hai, Rihan and Akgun, Can and van Lennep, Roeters Jeanine and Willemsen, Sten and Rijnbeek, R. Peter and Tax, MJ David and Reinders, Marcel and Boersma, Eric and Rizopoulos, Dimitris and Visch, Valentijn and Steegers-Theunissen, R{\'e}gine", title="Dynamic Digital Twin: Diagnosis, Treatment, Prediction, and Prevention of Disease During the Life Course", journal="J Med Internet Res", year="2022", month="Sep", day="14", volume="24", number="9", pages="e35675", keywords="digital health", keywords="digital twin", keywords="machine learning", keywords="artifical intelligence", keywords="obstetrics", keywords="cardiovascular", keywords="disease", keywords="health", doi="10.2196/35675", url="https://www.jmir.org/2022/9/e35675", url="http://www.ncbi.nlm.nih.gov/pubmed/36103220" } @Article{info:doi/10.2196/37812, author="Yu, Yuncui and Zhao, Qiuye and Cao, Wang and Wang, Xiaochuan and Li, Yanming and Xie, Yuefeng and Wang, Xiaoling", title="Mining Severe Drug Hypersensitivity Reaction Cases in Pediatric Electronic Health Records: Methodology Development and Applications", journal="JMIR Med Inform", year="2022", month="Sep", day="13", volume="10", number="9", pages="e37812", keywords="drug hypersensitivity reactions", keywords="electronic health records", keywords="clinical notes", keywords="phenotyping", keywords="natural language processing", keywords="medical language processing", keywords="bidirectional encoder representation from transformers", abstract="Background: Severe drug hypersensitivity reactions (DHRs) refer to allergic reactions caused by drugs and usually present with severe skin rashes and internal damage as the main symptoms. 
Reporting of severe DHRs in hospitals currently occurs solely through spontaneous reporting systems (SRSs), which are operated by the clinicians in charge. An automatic identification system that scrutinizes clinical notes could report potential severe DHR cases that would otherwise be missed. Objective: The goal of the research was to develop an automatic identification system for mining severe DHR cases and discover more DHR cases for further study. The proposed method was applied to 9 years of data in pediatrics electronic health records (EHRs) of Beijing Children's Hospital. Methods: The phenotyping task was approached as a document classification problem. A DHR dataset containing tagged documents for training was prepared. Each document contains all the clinical notes generated during 1 inpatient visit in this data set. Document-level tags correspond to DHR types and a negative category. Strategies were evaluated for long document classification on the openly available National NLP Clinical Challenges 2016 smoking task. Four strategies were evaluated in this work: document truncation, hierarchy representation, efficient self-attention, and key sentence selection. In-domain and open-domain pretrained embeddings were evaluated on the DHR dataset. An automatic grid search was performed to tune statistical classifiers for the best performance over the transformed data. Inference efficiency and memory requirements of the best performing models were analyzed. The most efficient model for mining DHR cases from millions of documents in the EHR system was run. Results: For long document classification, key sentence selection with guideline keywords achieved the best performance and was 9 times faster than hierarchy representation models for inference. The best model discovered 1155 DHR cases in Beijing Children's Hospital EHR system. After double-checking by clinician experts, 357 cases of severe DHRs were finally identified. For the smoking challenge, our model approached the state-of-the-art performance (94.1\% vs 94.2\%). Conclusions: The proposed method discovered 357 positive DHR cases from a large archive of EHR records, about 90\% of which were missed by SRSs. SRSs reported only 36 cases during the same period. The case analysis also found more suspected drugs associated with severe DHRs in pediatrics. ", doi="10.2196/37812", url="https://medinform.jmir.org/2022/9/e37812", url="http://www.ncbi.nlm.nih.gov/pubmed/36099001" } @Article{info:doi/10.2196/40064, author="Liu, W. Andrew and Odisho, Y. Anobel and Brown III, William and Gonzales, Ralph and Neinstein, B. Aaron and Judson, J. Timothy", title="Patient Experience and Feedback After Using an Electronic Health Record--Integrated COVID-19 Symptom Checker: Survey Study", journal="JMIR Hum Factors", year="2022", month="Sep", day="13", volume="9", number="3", pages="e40064", keywords="COVID-19", keywords="patient portals", keywords="digital health", keywords="diagnostic self evaluation", keywords="medical informatics", keywords="internet", keywords="telemedicine", keywords="triage", keywords="feedback", keywords="medical records systems", keywords="San Francisco", keywords="user experience", keywords="user satisfaction", keywords="self-triage", keywords="symptom checker", keywords="health system", keywords="workflow", keywords="integration", keywords="electronic health record", abstract="Background: Symptom checkers have been widely used during the COVID-19 pandemic to alleviate strain on health systems and offer patients a 24-7 self-service triage option. 
Although studies suggest that users may positively perceive web-based symptom checkers, no studies have quantified user feedback after use of an electronic health record--integrated COVID-19 symptom checker with self-scheduling functionality. Objective: In this paper, we aimed to understand user experience, user satisfaction, and user-reported alternatives to the use of a COVID-19 symptom checker with self-triage and self-scheduling functionality. Methods: We launched a patient-portal--based self-triage and self-scheduling tool in March 2020 for patients with COVID-19 symptoms, exposures, or questions. We made an optional, anonymous Qualtrics survey available to patients immediately after they completed the symptom checker. Results: Between December 16, 2021, and March 28, 2022, there were 395 unique responses to the survey. Overall, the respondents reported high satisfaction across all demographics, with a median rating of 8 out of 10 and 288/395 (47.6\%) of the respondents giving a rating of 9 or 10 out of 10. User satisfaction scores were not associated with any demographic factors. The most common user-reported alternatives had the web-based tool not been available were calling the COVID-19 telephone hotline and sending a patient-portal message to their physician for advice. The ability to schedule a test online was the most important symptom checker feature for the respondents. The most common categories of user feedback were regarding other COVID-19 services (eg, telephone hotline), policies, or procedures, and requesting additional features or functionality. Conclusions: This analysis suggests that COVID-19 symptom checkers with self-triage and self-scheduling functionality may have high overall user satisfaction, regardless of user demographics. By allowing users to self-triage and self-schedule tests and visits, tools such as this may prevent unnecessary calls and messages to clinicians. Individual feedback suggested that the user experience for this type of tool is highly dependent on the organization's operational workflows for COVID-19 testing and care. This study provides insight for the implementation and improvement of COVID-19 symptom checkers to ensure high user satisfaction. ", doi="10.2196/40064", url="https://humanfactors.jmir.org/2022/3/e40064", url="http://www.ncbi.nlm.nih.gov/pubmed/35960593" } @Article{info:doi/10.2196/38845, author="Monahan, Corneille Ann and Feldman, S. Sue and Fitzgerald, P. Tony", title="Reducing Crowding in Emergency Departments With Early Prediction of Hospital Admission of Adult Patients Using Biomarkers Collected at Triage: Retrospective Cohort Study", journal="JMIR Bioinform Biotech", year="2022", month="Sep", day="13", volume="3", number="1", pages="e38845", keywords="emergency care", keywords="prehospital", keywords="emergency", keywords="information system", keywords="crowding", keywords="boarding", keywords="exit block", keywords="medical informatics", keywords="application", keywords="health service research", keywords="personalized medicine", keywords="predictive medicine", keywords="model", keywords="probabilistic", keywords="polynomial model", keywords="decision support technique", keywords="decision support", keywords="evidence-based health care", keywords="management information systems", keywords="algorithm", keywords="machine learning", keywords="predict", keywords="risk", abstract="Background: Emergency department crowding continues to threaten patient safety and cause poor patient outcomes. 
Prior models designed to predict hospital admission have had biases. Predictive models that successfully estimate the probability of patient hospital admission would be useful in reducing or preventing emergency department ``boarding'' and hospital ``exit block'' and would reduce emergency department crowding by initiating earlier hospital admission and avoiding protracted bed procurement processes. Objective: To develop a model to predict imminent adult patient hospital admission from the emergency department early in the patient visit by utilizing existing clinical descriptors (ie, patient biomarkers) that are routinely collected at triage and captured in the hospital's electronic medical records. Biomarkers are advantageous for modeling due to their early and routine collection at triage; instantaneous availability; standardized definition, measurement, and interpretation; and their freedom from the confines of patient histories (ie, they are not affected by inaccurate patient reports on medical history, unavailable reports, or delayed report retrieval). Methods: This retrospective cohort study evaluated 1 year of consecutive data events among adult patients admitted to the emergency department and developed an algorithm that predicted which patients would require imminent hospital admission. Eight predictor variables were evaluated for their roles in the outcome of the patient emergency department visit. Logistic regression was used to model the study data. Results: The 8-predictor model included the following biomarkers: age, systolic blood pressure, diastolic blood pressure, heart rate, respiration rate, temperature, gender, and acuity level. The model used these biomarkers to identify emergency department patients who required hospital admission. Our model performed well, with good agreement between observed and predicted admissions, indicating a well-fitting and well-calibrated model that showed good ability to discriminate between patients who would and would not be admitted. Conclusions: This prediction model based on primary data identified emergency department patients with an increased risk of hospital admission. This actionable information can be used to improve patient care and hospital operations, especially by reducing emergency department crowding by looking ahead to predict which patients are likely to be admitted following triage, thereby providing needed information to initiate the complex admission and bed assignment processes much earlier in the care continuum. 
", doi="10.2196/38845", url="https://bioinform.jmir.org/2022/1/e38845" } @Article{info:doi/10.2196/40387, author="Belmin, Jo{\"e}l and Villani, Patrick and Gay, Mathias and Fabries, St{\'e}phane and Havreng-Th{\'e}ry, Charlotte and Malvoisin, St{\'e}phanie and Denis, Fabrice and Veyron, Jacques-Henri", title="Real-world Implementation of an eHealth System Based on Artificial Intelligence Designed to Predict and Reduce Emergency Department Visits by Older Adults: Pragmatic Trial", journal="J Med Internet Res", year="2022", month="Sep", day="8", volume="24", number="9", pages="e40387", keywords="emergency department visits", keywords="home care aides", keywords="community-dwelling older adults", keywords="smartphone", keywords="mobile phone", keywords="predictive tool", keywords="health intervention", keywords="machine learning", keywords="predict", keywords="risk", keywords="algorithm", keywords="model", keywords="user experience", keywords="alert", keywords="monitoring", abstract="Background: Frail older people use emergency services extensively, and digital systems that monitor health remotely could be useful in reducing these visits by earlier detection of worsening health conditions. Objective: We aimed to implement a system that produces alerts when the machine learning algorithm identifies a short-term risk for an emergency department (ED) visit and examine health interventions delivered after these alerts and users' experience. This study highlights the feasibility of the general system and its performance in reducing ED visits. It also evaluates the accuracy of alerts' prediction. Methods: An uncontrolled multicenter trial was conducted in community-dwelling older adults receiving assistance from home aides (HAs). We implemented an eHealth system that produces an alert for a high risk of ED visits. After each home visit, the HAs completed a questionnaire on participants' functional status, using a smartphone app, and the information was processed in real time by a previously developed machine learning algorithm that identifies patients at risk of an ED visit within 14 days. In case of risk, the eHealth system alerted a coordinating nurse who could then inform the family carer and the patient's nurses or general practitioner. The primary outcomes were the rate of ED visits and the number of deaths after alert-triggered health interventions (ATHIs) and users' experience with the eHealth system; the secondary outcome was the accuracy of the eHealth system in predicting ED visits. Results: We included 206 patients (mean age 85, SD 8 years; 161/206, 78\% women) who received aid from 109 HAs, and the mean follow-up period was 10 months. The HAs monitored 2656 visits, which resulted in 405 alerts. Two ED visits were recorded following 131 alerts with an ATHI (2/131, 1.5\%), whereas 36 ED visits were recorded following 274 alerts that did not result in an ATHI (36/274, 13.4\%), corresponding to an odds ratio of 0.10 (95\% IC 0.02-0.43; P<.001). Five patients died during the study. All had alerts, 4 did not have an ATHI and were hospitalized, and 1 had an ATHI (P=.04). In terms of overall usability, the digital system was easy to use for 90\% (98/109) of HAs, and response time was acceptable for 89\% (98/109) of them. Conclusions: The eHealth system has been successfully implemented, was appreciated by users, and produced relevant alerts. 
ATHIs were associated with a lower rate of ED visits, suggesting that the eHealth system might be effective in lowering the number of ED visits in this population. Trial Registration: ClinicalTrials.gov NCT05221697; https://clinicaltrials.gov/ct2/show/NCT05221697. ", doi="10.2196/40387", url="https://www.jmir.org/2022/9/e40387", url="http://www.ncbi.nlm.nih.gov/pubmed/35921685" } @Article{info:doi/10.2196/38727, author="Reis, Nogueira Zilma Silveira and Romanelli, Castro Roberta Maia de and Guimar{\~a}es, Nascimento Rodney and Gaspar, Souza Juliano de and Neves, Silveira Gabriela and do Vale, Silva Marynea and Nader, Jesus Paulo de and de Moura, Rocha Martha David and Vitral, Nogueira Gabriela Lu{\'i}za and dos Reis, Aguiar Marconi Augusto and Pereira, Mendon{\c{c}}a Marcia Margarida and Marques, Franco Patr{\'i}cia and Nader, Salgado Silvana and Harff, Luize Augusta and Beleza, Oliveira Ludmylla de and de Castro, Canellas Maria Eduarda and Souza, Guilherme Rayner and Pappa, Lobo Gisele and de Aguiar, Lopes Regina Am{\'e}lia Pessoa", title="Newborn Skin Maturity Medical Device Validation for Gestational Age Prediction: Clinical Trial", journal="J Med Internet Res", year="2022", month="Sep", day="7", volume="24", number="9", pages="e38727", keywords="gestational age", keywords="prematurity", keywords="childbirth", keywords="skin physiological phenomena", keywords="machine learning", keywords="equipment and supplies", keywords="pregnancy", keywords="reproductive health", keywords="pregnant", keywords="skin", keywords="age", keywords="medical", keywords="device", keywords="newborn", keywords="baby", keywords="trimester", keywords="therapy", keywords="learning model", keywords="ultrasound", abstract="Background: Early access to antenatal care and high-cost technologies for pregnancy dating challenge early neonatal risk assessment at birth in resource-constrained settings. To overcome the absence or inaccuracy of postnatal gestational age (GA), we developed a new medical device to assess GA based on the photobiological properties of newborns' skin and predictive models. Objective: This study aims to validate a device that uses the photobiological model of skin maturity adjusted to the clinical data to detect GA and establish its accuracy in discriminating preterm newborns. Methods: A multicenter, single-blinded, and single-arm intention-to-diagnosis clinical trial evaluated the accuracy of a novel device for the detection of GA and preterm newborns. The first-trimester ultrasound, a second comparator ultrasound, and data regarding the last menstrual period (LMP) from antenatal reports were used as references for GA at birth. The new test for validation was performed using a portable multiband reflectance photometer device that assessed the skin maturity of newborns and used machine learning models to predict GA, adjusted for birth weight and antenatal corticosteroid therapy exposure. Results: The study group comprised 702 pregnant women who gave birth to 781 newborns, of which 366 (46.9\%) were preterm newborns. As the primary outcome, the GA predicted by the new test was in line with the reference GA, as assessed by the intraclass correlation coefficient (0.969, 95\% CI 0.964-0.973). The paired difference between predicted and reference GAs was --1.34 days, with Bland-Altman limits of --21.2 to 18.4 days. As a secondary outcome, the new test achieved 66.6\% (95\% CI 62.9\%-70.1\%) agreement with the reference GA within an error of 1 week. 
This agreement was similar to that of comparator-LMP-GAs (64.1\%, 95\% CI 60.7\%-67.5\%). The discrimination between preterm and term newborns via the device had a similar area under the receiver operating characteristic curve (0.970, 95\% CI 0.959-0.981) compared with that for comparator-LMP-GAs (0.957, 95\% CI 0.941-0.974). In newborns with absent or unreliable LMPs (n=451), the intent-to-discriminate analysis showed correct preterm versus term classifications with the new test, which achieved an accuracy of 89.6\% (95\% CI 86.4\%-92.2\%), while the accuracy for comparator-LMP-GA was 69.6\% (95\% CI 65.3\%-73.7\%). Conclusions: The assessment of newborn's skin maturity (adjusted by learning models) promises accurate pregnancy dating at birth, even without the antenatal ultrasound reference. Thus, the novel device could add value to the set of clinical parameters that direct the delivery of neonatal care in birth scenarios where GA is unknown or unreliable. International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2018-027442 ", doi="10.2196/38727", url="https://www.jmir.org/2022/9/e38727", url="http://www.ncbi.nlm.nih.gov/pubmed/36069805" } @Article{info:doi/10.2196/39681, author="Lapp, Linda and Egan, Kieren and McCann, Lisa and Mackenzie, Moira and Wales, Ann and Maguire, Roma", title="Decision Support Tools in Adult Long-term Care Facilities: Scoping Review", journal="J Med Internet Res", year="2022", month="Sep", day="6", volume="24", number="9", pages="e39681", keywords="decision support", keywords="care home", keywords="nursing home", keywords="digital health", abstract="Background: Digital innovations are yet to make real impacts in the care home sector despite the considerable potential of digital health approaches to help with continued staff shortages and to improve quality of care. To understand the current landscape of digital innovation in long-term care facilities such as nursing and care homes, it is important to find out which clinical decision support tools are currently used in long-term care facilities, what their purpose is, how they were developed, and what types of data they use. Objective: The aim of this review was to analyze studies that evaluated clinical decision support tools in long-term care facilities based on the purpose and intended users of the tools, the evidence base used to develop the tools, how the tools are used and their effectiveness, and the types of data the tools use to contribute to the existing scientific evidence to inform a roadmap for digital innovation, specifically for clinical decision support tools, in long-term care facilities. Methods: A review of the literature published between January 1, 2010, and July 21, 2021, was conducted, using key search terms in 3 scientific journal databases: PubMed, Cochrane Library, and the British Nursing Index. Only studies evaluating clinical decision support tools in long-term care facilities were included in the review. Results: In total, 17 papers were included in the final review. The clinical decision support tools described in these papers were evaluated for medication management, pressure ulcer prevention, dementia management, falls prevention, hospitalization, malnutrition prevention, urinary tract infection, and COVID-19 infection. In general, the included studies show that decision support tools can show improvements in delivery of care and in health outcomes. 
Conclusions: Although the studies demonstrate the potential of positive impact of clinical decision support tools, there is variability in results, in part because of the diversity of types of decision support tools, users, and contexts as well as limited validation of the tools in use and in part because of the lack of clarity in defining the whole intervention. ", doi="10.2196/39681", url="https://www.jmir.org/2022/9/e39681", url="http://www.ncbi.nlm.nih.gov/pubmed/36066928" } @Article{info:doi/10.2196/39556, author="Tougas, Hailee and Chan, Steven and Shahrvini, Tara and Gonzalez, Alvaro and Chun Reyes, Ruth and Burke Parish, Michelle and Yellowlees, Peter", title="The Use of Automated Machine Translation to Translate Figurative Language in a Clinical Setting: Analysis of a Convenience Sample of Patients Drawn From a Randomized Controlled Trial", journal="JMIR Ment Health", year="2022", month="Sep", day="6", volume="9", number="9", pages="e39556", keywords="telepsychiatry", keywords="automated machine translation", keywords="language barriers", keywords="psychiatry", keywords="assessment", keywords="automated translation", keywords="automated", keywords="translation", keywords="artificial intelligence", keywords="AI", keywords="speech recognition", keywords="limited English proficiency", keywords="LEP", keywords="asynchronous telepsychiatry", keywords="ATP", keywords="automated speech recognition", keywords="ASR", keywords="AMT", keywords="figurative language device", keywords="FLD", keywords="language concordant", keywords="language discordant", keywords="AI interpretation", abstract="Background: Patients with limited English proficiency frequently receive substandard health care. Asynchronous telepsychiatry (ATP) has been established as a clinically valid method for psychiatric assessments. The addition of automated speech recognition (ASR) and automated machine translation (AMT) technologies to asynchronous telepsychiatry may be a viable artificial intelligence (AI)--language interpretation option. Objective: This project measures the frequency and accuracy of the translation of figurative language devices (FLDs) and patient word count per minute, in a subset of psychiatric interviews from a larger trial, as an approximation to patient speech complexity and quantity in clinical encounters that require interpretation. Methods: A total of 6 patients were selected from the original trial, where they had undergone 2 assessments, once by an English-speaking psychiatrist through a Spanish-speaking human interpreter and once in Spanish by a trained mental health interviewer-researcher with AI interpretation. 3 (50\%) of the 6 selected patients were interviewed via videoconferencing because of the COVID-19 pandemic. Interview transcripts were created by automated speech recognition with manual corrections for transcriptional accuracy and assessment for translational accuracy of FLDs. Results: AI-interpreted interviews were found to have a significant increase in the use of FLDs and patient word count per minute. Both human and AI-interpreted FLDs were frequently translated inaccurately, however FLD translation may be more accurate on videoconferencing. Conclusions: AI interpretation is currently not sufficiently accurate for use in clinical settings. However, this study suggests that alternatives to human interpretation are needed to circumvent modifications to patients' speech. 
While AI interpretation technologies are being further developed, using videoconferencing for human interpreting may be more accurate than in-person interpreting. Trial Registration: ClinicalTrials.gov NCT03538860; https://clinicaltrials.gov/ct2/show/NCT03538860 ", doi="10.2196/39556", url="https://mental.jmir.org/2022/9/e39556", url="http://www.ncbi.nlm.nih.gov/pubmed/36066959" } @Article{info:doi/10.2196/39235, author="Cook, Lily and Espinoza, Juan and Weiskopf, G. Nicole and Mathews, Nisha and Dorr, A. David and Gonzales, L. Kelly and Wilcox, Adam and Madlock-Brown, Charisse and ", title="Issues With Variability in Electronic Health Record Data About Race and Ethnicity: Descriptive Analysis of the National COVID Cohort Collaborative Data Enclave", journal="JMIR Med Inform", year="2022", month="Sep", day="6", volume="10", number="9", pages="e39235", keywords="social determinants of health", keywords="health equity", keywords="bias", keywords="data quality", keywords="data harmonization", keywords="data standards", keywords="terminology", keywords="data aggregation", abstract="Background: The adverse impact of COVID-19 on marginalized and under-resourced communities of color has highlighted the need for accurate, comprehensive race and ethnicity data. However, a significant technical challenge related to integrating race and ethnicity data in large, consolidated databases is the lack of consistency in how data about race and ethnicity are collected and structured by health care organizations. Objective: This study aims to evaluate and describe variations in how health care systems collect and report information about the race and ethnicity of their patients and to assess how well these data are integrated when aggregated into a large clinical database. Methods: At the time of our analysis, the National COVID Cohort Collaborative (N3C) Data Enclave contained records from 6.5 million patients contributed by 56 health care institutions. We quantified the variability in the harmonized race and ethnicity data in the N3C Data Enclave by analyzing the conformance to health care standards for such data. We conducted a descriptive analysis by comparing the harmonized data available for research purposes in the database to the original source data contributed by health care institutions. To make the comparison, we tabulated the original source codes, enumerating how many patients had been reported with each encoded value and how many distinct ways each category was reported. The nonconforming data were also cross tabulated by 3 factors: patient ethnicity, the number of data partners using each code, and which data models utilized those particular encodings. For the nonconforming data, we used an inductive approach to sort the source encodings into categories. For example, values such as ``Declined'' were grouped with ``Refused,'' and ``Multiple Race'' was grouped with ``Two or more races'' and ``Multiracial.'' Results: ``No matching concept'' was the second largest harmonized concept used by the N3C to describe the race of patients in their database. In addition, 20.7\% of the race data did not conform to the standard; the largest category was data that were missing. Hispanic or Latino patients were overrepresented in the nonconforming racial data, and data from American Indian or Alaska Native patients were obscured. 
Although only a small proportion of the source data had not been mapped to the correct concepts (0.6\%), Black or African American and Hispanic/Latino patients were overrepresented in this category. Conclusions: Differences in how race and ethnicity data are conceptualized and encoded by health care institutions can affect the quality of the data in aggregated clinical databases. The impact of data quality issues in the N3C Data Enclave was not equal across all races and ethnicities, which has the potential to introduce bias in analyses and conclusions drawn from these data. Transparency about how data have been transformed can help users make accurate analyses and inferences and eventually better guide clinical care and public policy. ", doi="10.2196/39235", url="https://medinform.jmir.org/2022/9/e39235", url="http://www.ncbi.nlm.nih.gov/pubmed/35917481" } @Article{info:doi/10.2196/38385, author="Yan, Xiaowei and Husby, Hannah and Mudiganti, Satish and Gbotoe, Madina and Delatorre-Reimer, Jake and Knobel, Kevin and Hudnut, Andrew and Jones, B. J.", title="Evaluating the Impact of a Point-of-Care Cardiometabolic Clinical Decision Support Tool on Clinical Efficiency Using Electronic Health Record Audit Log Data: Algorithm Development and Validation", journal="JMIR Med Inform", year="2022", month="Sep", day="6", volume="10", number="9", pages="e38385", keywords="digital health", keywords="electronic health record", keywords="EHR audit logs", keywords="workflow efficiency", keywords="cardiometabolic conditions", abstract="Background: Electronic health record (EHR) systems are becoming increasingly complicated, leading to concerns about rising physician burnout, particularly for primary care physicians (PCPs). Managing the most common cardiometabolic chronic conditions by PCPs during a limited clinical time with a patient is challenging. Objective: This study aimed to evaluate the impact of the Cardiometabolic Sutter Health Advanced Reengineered Encounter (CM-SHARE), a web-based application that visualizes key EHR data, on EHR use efficiency. Methods: We developed algorithms to identify key clinic workflow measures (eg, total encounter time, total physician time in the examination room, and physician EHR time in the examination room) using audit data, and we validated and calibrated the measures with time-motion data. We used a pre-post parallel design to identify propensity score--matched CM-SHARE users (cases), nonusers (controls), and nested-matched patients. Cardiometabolic encounters from matched case and control patients were used for the workflow evaluation. Outcome measures were compared between the cases and controls. We applied this approach separately to both the CM-SHARE pilot and spread phases. Results: Time-motion observation was conducted on 101 primary care encounters for 9 PCPs in 3 clinics. There was little difference (<0.8 minutes) between the audit data--derived workflow measures and the time-motion observation. Two key unobservable times from audit data, physician entry into and exiting the examination room, were imputed based on time-motion studies. CM-SHARE was launched with 6 pilot PCPs in April 2016. During the prestudy period (April 1, 2015, to April 1, 2016), 870 control patients with 2845 encounters were matched with 870 case patients and encounters, and 727 case patients with 852 encounters were matched with 727 control patients and 3754 encounters in the poststudy period (June 1, 2016, to June 30, 2017). 
Total encounter time was slightly shorter (mean --2.7, SD 1.4 minutes, 95\% CI --4.7 to --0.9; mean --1.6, SD 1.1 minutes, 95\% CI --3.2 to --0.1) for cases than controls for both periods. CM-SHARE saves physicians approximately 2 minutes EHR time in the examination room (mean --2.0, SD 1.3, 95\% CI --3.4 to --0.9) compared with prestudy period and poststudy period controls (mean --1.9, SD 0.9, 95\% CI --3.8 to --0.5). In the spread phase, 48 CM-SHARE spread PCPs were matched with 84 control PCPs and 1272 cases with 3412 control patients, having 1119 and 4240 encounters, respectively. A significant reduction in total encounter time for the CM-SHARE group was observed for short appointments ($\leq$20 minutes; 5.3-minute reduction on average) only. Total physician EHR time was significantly reduced for both longer and shorter appointments (17\%-33\% reductions). Conclusions: Combining EHR audit log files and clinical information, our approach offers an innovative and scalable method and new measures that can be used to evaluate clinical EHR efficiency of digital tools used in clinical settings. ", doi="10.2196/38385", url="https://medinform.jmir.org/2022/9/e38385", url="http://www.ncbi.nlm.nih.gov/pubmed/36066940" } @Article{info:doi/10.2196/37896, author="Xie, Fagen and Khadka, Nehaa and Fassett, J. Michael and Chiu, Y. Vicki and Avila, C. Chantal and Shi, Jiaxiao and Yeh, Meiyu and Kawatkar, Aniket and Mensah, A. Nana and Sacks, A. David and Getahun, Darios", title="Identification of Preterm Labor Evaluation Visits and Extraction of Cervical Length Measures from Electronic Health Records Within a Large Integrated Health Care System: Algorithm Development and Validation", journal="JMIR Med Inform", year="2022", month="Sep", day="6", volume="10", number="9", pages="e37896", keywords="preterm labor", keywords="preterm birth", keywords="fetal fibronectin", keywords="transvaginal ultrasound", keywords="cervical length", keywords="natural language processing", keywords="computerized algorithm", keywords="data extraction", keywords="patient records", keywords="clinical notes", keywords="evaluation notes", keywords="patient care", keywords="patient notes", keywords="electronic health records", abstract="Background: Preterm birth (PTB) represents a significant public health problem in the United States and throughout the world. Accurate identification of preterm labor (PTL) evaluation visits is the first step in conducting PTB-related research. Objective: We aimed to develop a validated computerized algorithm to identify PTL evaluation visits and extract cervical length (CL) measures from electronic health records (EHRs) within a large integrated health care system. Methods: We used data extracted from the EHRs at Kaiser Permanente Southern California between 2009 and 2020. First, we identified triage and hospital encounters with fetal fibronectin (fFN) tests, transvaginal ultrasound (TVUS) procedures, PTL medications, or PTL diagnosis codes within 24 0/7-34 6/7 gestational weeks. Second, clinical notes associated with triage and hospital encounters within 24 0/7-34 6/7 gestational weeks were extracted from EHRs. A computerized algorithm and an automated process were developed and refined by multiple iterations of chart review and adjudication to search the following PTL indicators: fFN tests, TVUS procedures, abdominal pain, uterine contractions, PTL medications, and descriptions of PTL evaluations. 
An additional process was constructed to extract the CLs from the corresponding clinical notes of these identified PTL evaluation visits. Results: A total of 441,673 live birth pregnancies were identified between 2009 and 2020. Of these, 103,139 pregnancies (23.35\%) had documented PTL evaluation visits identified by the computerized algorithm. The trend of pregnancies with PTL evaluation visits slightly decreased from 24.41\% (2009) to 17.42\% (2020). Of the 103,139 first PTL visits, 19,439 (18.85\%) and 44,423 (43.97\%) had an fFN test and a TVUS, respectively. The percentage of first PTL visits with an fFN test decreased from 18.06\% at 24 0/7 gestational weeks to 2.32\% at 34 6/7 gestational weeks, and TVUS from 54.67\% at 24 0/7 gestational weeks to 12.05\% at 34 6/7 gestational weeks. The mean (SD) of the CL was 3.66 (0.99) cm with a mean range of 3.61-3.69 cm that remained stable across the study period. Of the pregnancies with PTL evaluation visits, the rate of PTB remained stable over time (20,399, 19.78\%). Validation of the computerized algorithms against 100 randomly selected records from these potential PTL visits showed positive predictive values of 97\%, 94.44\%, 100\%, and 96.43\% for the PTL evaluation visits, fFN tests, TVUS, and CL, respectively, along with sensitivity values of 100\%, 90\%, and 90\%, and specificity values of 98.8\%, 100\%, and 98.6\% for the fFN test, TVUS, and CL, respectively. Conclusions: The developed computerized algorithm effectively identified PTL evaluation visits and extracted the corresponding CL measures from the EHRs. Validation of this algorithm showed a high level of accuracy. This computerized algorithm can be used for conducting PTL- or PTB-related pharmacoepidemiologic studies and patient care reviews. ", doi="10.2196/37896", url="https://medinform.jmir.org/2022/9/e37896", url="http://www.ncbi.nlm.nih.gov/pubmed/36066930" } @Article{info:doi/10.2196/38414, author="Weng, Heng and Chen, Jielong and Ou, Aihua and Lao, Yingrong", title="Leveraging Representation Learning for the Construction and Application of a Knowledge Graph for Traditional Chinese Medicine: Framework Development Study", journal="JMIR Med Inform", year="2022", month="Sep", day="2", volume="10", number="9", pages="e38414", keywords="knowledge graph", keywords="knowledge embedding", keywords="traditional Chinese medicine", keywords="knowledge discovery", keywords="medicine", keywords="clinical", keywords="framework", abstract="Background: Knowledge discovery from treatment data records from Chinese physicians is a dramatic challenge in the application of artificial intelligence (AI) models to the research of traditional Chinese medicine (TCM). Objective: This paper aims to construct a TCM knowledge graph (KG) from Chinese physicians and apply it to the decision-making related to diagnosis and treatment in TCM. Methods: A new framework leveraging a representation learning method for TCM KG construction and application was designed. A transformer-based Contextualized Knowledge Graph Embedding (CoKE) model was applied to KG representation learning and knowledge distillation. Automatic identification and expansion of multihop relations were integrated with the CoKE model as a pipeline. Based on the framework, a TCM KG containing 59,882 entities (eg, diseases, symptoms, examinations, drugs), 17 relations, and 604,700 triples was constructed. The framework was validated through a link prediction task. 
Results: Experiments showed that the framework outperforms a set of baseline models in the link prediction task using the standard metrics mean reciprocal rank (MRR) and Hits@N. The knowledge graph embedding (KGE) multitagged TCM discriminative diagnosis metrics also indicated the improvement of our framework compared with the baseline models. Conclusions: Experiments showed that the clinical KG representation learning and application framework is effective for knowledge discovery and decision-making assistance in diagnosis and treatment. Our framework shows superiority of application prospects in tasks such as KG-fused multimodal information diagnosis, KGE-based text classification, and knowledge inference--based medical question answering. ", doi="10.2196/38414", url="https://medinform.jmir.org/2022/9/e38414", url="http://www.ncbi.nlm.nih.gov/pubmed/36053574" } @Article{info:doi/10.2196/41451, author="Fogarty, Justin and Siriruchatanon, Mutita and Makarov, Danil and Langford, Aisha and Kang, Stella", title="An Evaluation of a Web-Based Decision Aid for Treatment Planning of Small Kidney Tumors: Pilot Randomized Controlled Trial", journal="JMIR Res Protoc", year="2022", month="Sep", day="2", volume="11", number="9", pages="e41451", keywords="small kidney mass", keywords="decision aid", keywords="renal tumor", keywords="randomized controlled trial", keywords="shared decision-making", keywords="decisional conflict", abstract="Background: Surgery is the most common treatment for localized small kidney masses (SKMs) up to 4 cm, despite a lack of evidence for improved overall survival. Nonsurgical management options are gaining recognition, as evidence supports the indolence of most SKMs. Decision aids (DAs) have been shown to improve patient comprehension of the trade-offs of treatment options and overall decision quality, and may improve consideration of all major options according to individual health priorities and preferences. Objective: This pilot randomized controlled trial (RCT) primarily aims to evaluate the impact of a new web-based DA on treatment decisions for patients with SKM; that is, selection of surgical versus nonsurgical treatment options. Secondary objectives include an assessment of decision-making outcomes: decisional conflict, decision satisfaction, and an understanding of individual preferences for treatment that incorporate the trade-offs associated with surgical versus nonsurgical interventions. Methods: Three phases comprise the construction and evaluation of a new web-based DA on SKM treatment. In phase 1, this DA was developed in print format through a multidisciplinary design committee incorporating patient focus groups. Phase 2 was an observational study on patient knowledge and decision-making measures after randomization to receive the printed DA or institutional educational materials, which identified further educational needs applied to a web-based DA. Phase 3 will preliminarily evaluate the web-based DA: in a pilot RCT, 50 adults diagnosed with SKMs will receive the web-based DA or an existing web-based institutional website at urology clinics at a large academic medical center. The web-based DA applies risk communication and information about diagnosis and treatment options, elicits preferences regarding treatment options, and provides a set of options to consider with their doctor based on a decision-analytic model of benefits/harm analysis that accounts for comorbidity, age group, and tumor features. 
Questionnaires and treatment decision data will be gathered before and after viewing the educational material. Results: This phase will consist of a pilot RCT from August 2022 to January 2023 to establish feasibility and preliminarily evaluate decision outcomes. Previous study phases from 2018 to 2020 supported the feasibility of providing the printed DA in urology clinics before clinical consultation and demonstrated increased patient knowledge about the diagnosis and treatment options and greater likelihood of favoring nonsurgical treatment just before consultation. This study was funded by the National Cancer Institute. Recruitment will begin in August 2022. Conclusions: A web-based DA has been designed to address educational needs for patients making treatment decisions for SKM, accounting for comorbidities and treatment-related benefits and risks. Outcomes from the pilot trial will evaluate the potential of a web-based DA in personalizing treatment decisions and in helping patients weigh attributes of surgical versus nonsurgical treatment options for their SKMs. Trial Registration: ClinicalTrials.gov NCT05387863; https://clinicaltrials.gov/ct2/show/NCT05387863 International Registered Report Identifier (IRRID): PRR1-10.2196/41451 ", doi="10.2196/41451", url="https://www.researchprotocols.org/2022/9/e41451", url="http://www.ncbi.nlm.nih.gov/pubmed/36053558" } @Article{info:doi/10.2196/39782, author="Seino, Yusuke and Sato, Nobuo and Idei, Masafumi and Nomura, Takeshi", title="The Reduction in Medical Errors on Implementing an Intensive Care Information System in a Setting Where a Hospital Electronic Medical Record System is Already in Use: Retrospective Analysis", journal="JMIR Perioper Med", year="2022", month="Aug", day="31", volume="5", number="1", pages="e39782", keywords="clinical information system", keywords="electronic medical record", keywords="intensive care unit", keywords="medical error", abstract="Background: Although the various advantages of clinical information systems in intensive care units (ICUs), such as intensive care information systems (ICISs), have been reported, their role in preventing medical errors remains unclear. Objective: This study aimed to investigate the changes in the incidence and type of errors in the ICU before and after ICIS implementation in a setting where a hospital electronic medical record system is already in use. Methods: An ICIS was introduced to the general ICU of a university hospital. After a step-by-step implementation lasting 3 months, the ICIS was used for all patients starting from April 2019. We performed a retrospective analysis of the errors in the ICU during the 6-month period before and after ICIS implementation by using data from an incident reporting system, and the number, incidence rate, type, and patient outcome level of errors were determined. Results: From April 2018 to September 2018, 755 patients were admitted to the ICU, and 719 patients were admitted from April 2019 to September 2019. The number of errors was 153 in the 2018 study period and 71 in the 2019 study period. The error incidence rates in 2018 and 2019 were 54.1 (95\% CI 45.9-63.4) and 27.3 (95\% CI 21.3-34.4) events per 1000 patient-days, respectively (P<.001). During both periods, there were no significant changes in the composition of the types of errors (P=.16), and the most common type of error was medication error. 
Conclusions: ICIS implementation was temporally associated with a 50\% reduction in the number and incidence rate of errors in the ICU. Although the most common type of error was medication error in both study periods, ICIS implementation significantly reduced the number and incidence rate of medication errors. Trial Registration: University Hospital Medical Information Network Clinical Trials Registry UMIN000041471; https://center6.umin.ac.jp/cgi-open-bin/ctr\_e/ctr\_view.cgi?recptno=R000047345 ", doi="10.2196/39782", url="https://periop.jmir.org/2022/1/e39782", url="http://www.ncbi.nlm.nih.gov/pubmed/35964333" } @Article{info:doi/10.2196/37531, author="Escal{\'e}-Besa, Anna and Fuster-Casanovas, A{\"i}na and B{\"o}rve, Alexander and Y{\'e}lamos, Oriol and Fust{\`a}-Novell, Xavier and Esquius Rafat, Mireia and Marin-Gomez, X. Francesc and Vidal-Alaball, Josep", title="Using Artificial Intelligence as a Diagnostic Decision Support Tool in Skin Disease: Protocol for an Observational Prospective Cohort Study", journal="JMIR Res Protoc", year="2022", month="Aug", day="31", volume="11", number="8", pages="e37531", keywords="machine learning", keywords="artificial intelligence", keywords="data accuracy", keywords="computer-assisted diagnosis", keywords="neural network computer", keywords="support tool", keywords="skin disease", keywords="cohort study", keywords="dermatology", abstract="Background: Dermatological conditions are a relevant health problem. Each person has an average of 1.6 skin diseases per year, and consultations for skin pathology represent 20\% of the total annual visits to primary care and around 35\% are referred to a dermatology specialist. Machine learning (ML) models can be a good tool to help primary care professionals, as they can analyze and optimize complex sets of data. In addition, ML models are increasingly being applied to dermatology as a diagnostic decision support tool using image analysis, especially for skin cancer detection and classification. Objective: This study aims to perform a prospective validation of an image analysis ML model as a diagnostic decision support tool for the diagnosis of dermatological conditions. Methods: In this prospective study, 100 consecutive patients who visited a participating general practitioner (GP) with a skin problem in central Catalonia were recruited. Data collection was planned to last 7 months. Anonymized pictures of skin diseases were taken and introduced to the ML model interface (capable of screening for 44 different skin diseases), which returned the top 5 diagnoses by probability. The same image was also sent as a teledermatology consultation following the currently established workflow. The GP, ML model, and dermatologist's assessments will be compared to calculate the precision, sensitivity, specificity, and accuracy of the ML model. The results will be represented globally and individually for each skin disease class using a confusion matrix and one-versus-all methodology. The time taken to make the diagnosis will also be taken into consideration. Results: Patient recruitment began in June 2021 and lasted for 5 months. Currently, all patients have been recruited and the images have been shown to the GPs and dermatologists. The analysis of the results has already started. Conclusions: This study will provide information about ML models' effectiveness and limitations. External testing is essential for regulating these diagnostic systems to deploy ML models in a primary care practice setting. 
", doi="10.2196/37531", url="https://www.researchprotocols.org/2022/8/e37531", url="http://www.ncbi.nlm.nih.gov/pubmed/36044249" } @Article{info:doi/10.2196/38352, author="Lowe, Cabella and Browne, Mitchell and Marsh, William and Morrissey, Dylan", title="Usability Testing of a Digital Assessment Routing Tool for Musculoskeletal Disorders: Iterative, Convergent Mixed Methods Study", journal="J Med Internet Res", year="2022", month="Aug", day="30", volume="24", number="8", pages="e38352", keywords="mobile health", keywords="mHealth", keywords="eHealth", keywords="digital health", keywords="digital technology", keywords="musculoskeletal", keywords="triage", keywords="physiotherapy triage", keywords="usability", keywords="acceptability", keywords="mobile phone", abstract="Background: Musculoskeletal disorders negatively affect millions of patients worldwide, placing significant demand on health care systems. Digital technologies that improve clinical outcomes and efficiency across the care pathway are development priorities. We developed the musculoskeletal Digital Assessment Routing Tool (DART) to enable self-assessment and immediate direction to the right care. Objective: We aimed to assess and resolve all serious DART usability issues to create a positive user experience and enhance system adoption before conducting randomized controlled trials for the integration of DART into musculoskeletal management pathways. Methods: An iterative, convergent mixed methods design was used, with 22 adult participants assessing 50 different clinical presentations over 5 testing rounds across 4 DART iterations. Participants were recruited using purposive sampling, with quotas for age, habitual internet use, and English-language ability. Quantitative data collection was defined by the constructs within the International Organization for Standardization 9241-210-2019 standard, with user satisfaction measured by the System Usability Scale. Study end points were resolution of all grade 1 and 2 usability problems and a mean System Usability Scale score of $\geq$80 across a minimum of 3 user group sessions. Results: All participants (mean age 48.6, SD 15.2; range 20-77 years) completed the study. Every assessment resulted in a recommendation with no DART system errors and a mean completion time of 5.2 (SD 4.44, range 1-18) minutes. Usability problems were reduced from 12 to 0, with trust and intention to act improving during the study. The relationship between eHealth literacy and age was explored with a scatter plot and calculation of the Pearson correlation coefficient for all participants (r=-0.2; 20/22, 91\%) and repeated with a potential outlier removed (r=-0.23), with no meaningful relationship observed for either. The mean satisfaction for daily internet users was highest (19/22, 86\%; mean 86.5, SD 4.48; 90\% confidence level [CL] 1.78 or -1.78), with nonnative English speakers (6/22, 27\%; mean 78.1, SD 4.60; 90\% CL 3.79 or -3.79) and infrequent internet users scoring the lowest (3/22, 14\%; mean 70.8, SD 5.44; 90\% CL 9.17 or -9.17), although the CIs overlap. The mean score across all groups was 84.3 (SD 4.67), corresponding to an excellent system, with qualitative data from all participants confirming that DART was simple to use. 
Conclusions: All serious DART usability issues were resolved, and a good level of satisfaction, trust, and willingness to act on the DART recommendation was demonstrated, thus allowing progression to randomized controlled trials that assess safety and effectiveness against usual care comparators. The iterative, convergent mixed methods design proved highly effective in fully evaluating DART from a user perspective and could provide a blueprint for other researchers of mobile health systems. International Registered Report Identifier (IRRID): RR2-10.2196/27205 ", doi="10.2196/38352", url="https://www.jmir.org/2022/8/e38352", url="http://www.ncbi.nlm.nih.gov/pubmed/36040787" } @Article{info:doi/10.2196/40384, author="Noori, Ayush and Magdamo, Colin and Liu, Xiao and Tyagi, Tanish and Li, Zhaozhi and Kondepudi, Akhil and Alabsi, Haitham and Rudmann, Emily and Wilcox, Douglas and Brenner, Laura and Robbins, K. Gregory and Moura, Lidia and Zafar, Sahar and Benson, M. Nicole and Hsu, John and R Dickson, John and Serrano-Pozo, Alberto and Hyman, T. Bradley and Blacker, Deborah and Westover, Brandon M. and Mukerji, S. Shibani and Das, Sudeshna", title="Development and Evaluation of a Natural Language Processing Annotation Tool to Facilitate Phenotyping of Cognitive Status in Electronic Health Records: Diagnostic Study", journal="J Med Internet Res", year="2022", month="Aug", day="30", volume="24", number="8", pages="e40384", keywords="chart review", keywords="cognition", keywords="cognitive status", keywords="dementia", keywords="diagnostic", keywords="electronic health record", keywords="health care", keywords="natural language processing", keywords="research cohort", abstract="Background: Electronic health records (EHRs) with large sample sizes and rich information offer great potential for dementia research, but current methods of phenotyping cognitive status are not scalable. Objective: The aim of this study was to evaluate whether natural language processing (NLP)--powered semiautomated annotation can improve the speed and interrater reliability of chart reviews for phenotyping cognitive status. Methods: In this diagnostic study, we developed and evaluated a semiautomated NLP-powered annotation tool (NAT) to facilitate phenotyping of cognitive status. Clinical experts adjudicated the cognitive status of 627 patients at Mass General Brigham (MGB) health care, using NAT or traditional chart reviews. Patient charts contained EHR data from two data sets: (1) records from January 1, 2017, to December 31, 2018, for 100 Medicare beneficiaries from the MGB Accountable Care Organization and (2) records from 2 years prior to COVID-19 diagnosis to the date of COVID-19 diagnosis for 527 MGB patients. All EHR data from the relevant period were extracted; diagnosis codes, medications, and laboratory test values were processed and summarized; clinical notes were processed through an NLP pipeline; and a web tool was developed to present an integrated view of all data. Cognitive status was rated as cognitively normal, cognitively impaired, or undetermined. Assessment time and interrater agreement of NAT compared to manual chart reviews for cognitive status phenotyping was evaluated. Results: NAT adjudication provided higher interrater agreement (Cohen $\kappa$=0.89 vs $\kappa$=0.80) and significant speed up (time difference mean 1.4, SD 1.3 minutes; P<.001; ratio median 2.2, min-max 0.4-20) over manual chart reviews. 
There was moderate agreement with manual chart reviews (Cohen $\kappa$=0.67). In the cases that exhibited disagreement with manual chart reviews, NAT adjudication was able to produce assessments that had broader clinical consensus due to its integrated view of highlighted relevant information and semiautomated NLP features. Conclusions: NAT adjudication improves the speed and interrater reliability for phenotyping cognitive status compared to manual chart reviews. This study underscores the potential of an NLP-based clinically adjudicated method to build large-scale dementia research cohorts from EHRs. ", doi="10.2196/40384", url="https://www.jmir.org/2022/8/e40384", url="http://www.ncbi.nlm.nih.gov/pubmed/36040790" } @Article{info:doi/10.2196/38943, author="Choudhary, Soumya and Thomas, Nikita and Alshamrani, Sultan and Srinivasan, Girish and Ellenberger, Janine and Nawaz, Usman and Cohen, Roy", title="A Machine Learning Approach for Continuous Mining of Nonidentifiable Smartphone Data to Create a Novel Digital Biomarker Detecting Generalized Anxiety Disorder: Prospective Cohort Study", journal="JMIR Med Inform", year="2022", month="Aug", day="30", volume="10", number="8", pages="e38943", keywords="digital phenotyping", keywords="machine learning", keywords="mental health", keywords="profiling metric", keywords="smartphone data", keywords="anxiety assessment", keywords="mining technique", keywords="algorithm prediction", keywords="digital marker", keywords="behavioral marker", keywords="anxiety", abstract="Background: Anxiety is one of the leading causes of mental health disability around the world. Currently, a majority of the population who experience anxiety go undiagnosed or untreated. New and innovative ways of diagnosing and monitoring anxiety have emerged using smartphone sensor--based monitoring as a metric for the management of anxiety. This is a novel study as it adds to the field of research through the use of nonidentifiable smartphone usage to help detect and monitor anxiety remotely and in a continuous and passive manner. Objective: This study aims to evaluate the accuracy of a novel mental behavioral profiling metric derived from smartphone usage for the identification and tracking of generalized anxiety disorder (GAD). Methods: Smartphone data and self-reported 7-item GAD anxiety assessments were collected from 229 participants using an Android operating system smartphone in an observational study over an average of 14 days (SD 29.8). A total of 34 features were mined to be constructed as a potential digital phenotyping marker from continuous smartphone usage data. We further analyzed the correlation of these digital behavioral markers against each item of the 7-item Generalized Anxiety Disorder Scale (GAD-7) and its influence on the predictions of machine learning algorithms. Results: A total of 229 participants were recruited in this study who had completed the GAD-7 assessment and had at least one set of passive digital data collected within a 24-hour period. The mean GAD-7 score was 11.8 (SD 5.7). Regression modeling was tested against classification modeling and the highest prediction accuracy was achieved from a binary XGBoost classification model (precision of 73\%-81\%; recall of 68\%-87\%; F1-score of 71\%-79\%; accuracy of 76\%; area under the curve of 80\%). 
Nonparametric permutation testing with Pearson correlation results indicated that the proposed metric (Mental Health Similarity Score [MHSS]) had a collinear relationship with GAD-7 items 1, 3, and 7. Conclusions: The proposed MHSS metric demonstrates the feasibility of using passively collected nonintrusive smartphone data and machine learning--based data mining techniques to track an individual's daily anxiety levels with a 76\% accuracy that directly relates to the GAD-7 scale. ", doi="10.2196/38943", url="https://medinform.jmir.org/2022/8/e38943", url="http://www.ncbi.nlm.nih.gov/pubmed/36040777" } @Article{info:doi/10.2196/37578, author="Gopukumar, Deepika and Ghoshal, Abhijeet and Zhao, Huimin", title="Predicting Readmission Charges Billed by Hospitals: Machine Learning Approach", journal="JMIR Med Inform", year="2022", month="Aug", day="30", volume="10", number="8", pages="e37578", keywords="readmission charges", keywords="readmission analytics", keywords="predictive models", keywords="machine learning", keywords="readmissions", keywords="predictive analytics", abstract="Background: The Centers for Medicare and Medicaid Services projects that health care costs will continue to grow over the next few years. Rising readmission costs contribute significantly to increasing health care costs. Multiple areas of health care, including readmissions, have benefited from the application of various machine learning algorithms in several ways. Objective: We aimed to identify suitable models for predicting readmission charges billed by hospitals. Our literature review revealed that this application of machine learning is underexplored. We used various predictive methods, ranging from glass-box models (such as regularization techniques) to black-box models (such as deep learning--based models). Methods: We defined readmissions as readmission with the same major diagnostic category (RSDC) and all-cause readmission category (RADC). For these readmission categories, 576,701 and 1,091,580 individuals, respectively, were identified from the Nationwide Readmission Database of the Healthcare Cost and Utilization Project by the Agency for Healthcare Research and Quality for 2013. Linear regression, lasso regression, elastic net, ridge regression, eXtreme gradient boosting (XGBoost), and a deep learning model based on multilayer perceptron (MLP) were the 6 machine learning algorithms we tested for RSDC and RADC through 10-fold cross-validation. Results: Our preliminary analysis using a data-driven approach revealed that within RADC, the subsequent readmission charge billed per patient was higher than the previous charge for 541,090 individuals, and this number was 319,233 for RSDC. The top 3 major diagnostic categories (MDCs) for such instances were the same for RADC and RSDC. The average readmission charge billed was higher than the previous charge for 21 of the MDCs in the case of RSDC, whereas it was only for 13 of the MDCs in RADC. We recommend XGBoost and the deep learning model based on MLP for predicting readmission charges. The following performance metrics were obtained for XGBoost: (1) RADC (mean absolute percentage error [MAPE]=3.121\%; root mean squared error [RMSE]=0.414; mean absolute error [MAE]=0.317; root relative squared error [RRSE]=0.410; relative absolute error [RAE]=0.399; normalized RMSE [NRMSE]=0.040; mean absolute deviation [MAD]=0.031) and (2) RSDC (MAPE=3.171\%; RMSE=0.421; MAE=0.321; RRSE=0.407; RAE=0.393; NRMSE=0.041; MAD=0.031). 
The performance metrics obtained for the MLP-based deep neural network are as follows: (1) RADC (MAPE=3.103\%; RMSE=0.413; MAE=0.316; RRSE=0.410; RAE=0.397; NRMSE=0.040; MAD=0.031) and (2) RSDC (MAPE=3.202\%; RMSE=0.427; MAE=0.326; RRSE=0.413; RAE=0.399; NRMSE=0.041; MAD=0.032). Repeated measures ANOVA revealed that the mean RMSE differed significantly across models with P<.001. Post hoc tests using the Bonferroni correction method indicated that the mean RMSE of the deep learning/XGBoost models was statistically significantly (P<.001) lower than that of all other models, namely linear regression/elastic net/lasso/ridge regression. Conclusions: Models built using XGBoost and MLP are suitable for predicting readmission charges billed by hospitals. The MDCs allow models to accurately predict hospital readmission charges. ", doi="10.2196/37578", url="https://medinform.jmir.org/2022/8/e37578", url="http://www.ncbi.nlm.nih.gov/pubmed/35896038" } @Article{info:doi/10.2196/37850, author="Xu, Xianglong and Yu, Zhen and Ge, Zongyuan and Chow, F. Eric P. and Bao, Yining and Ong, J. Jason and Li, Wei and Wu, Jinrong and Fairley, K. Christopher and Zhang, Lei", title="Web-Based Risk Prediction Tool for an Individual's Risk of HIV and Sexually Transmitted Infections Using Machine Learning Algorithms: Development and External Validation Study", journal="J Med Internet Res", year="2022", month="Aug", day="25", volume="24", number="8", pages="e37850", keywords="HIV", keywords="sexually transmitted infections", keywords="syphilis", keywords="gonorrhea", keywords="chlamydia", keywords="sexual health", keywords="sexual transmission", keywords="sexually transmitted", keywords="prediction", keywords="web-based", keywords="risk assessment", keywords="machine learning", keywords="model", keywords="algorithm", keywords="predictive", keywords="risk", keywords="development", keywords="validation", abstract="Background: HIV and sexually transmitted infections (STIs) are major global public health concerns. Over 1 million curable STIs occur every day among people aged 15 years to 49 years worldwide. Insufficient testing or screening substantially impedes the elimination of HIV and STI transmission. Objective: The aim of our study was to develop an HIV and STI risk prediction tool using machine learning algorithms. Methods: We used clinic consultations that tested for HIV and STIs at the Melbourne Sexual Health Centre between March 2, 2015, and December 31, 2018, as the development data set (training and testing data set). We also used 2 external validation data sets, including data from 2019 as external ``validation data 1'' and data from January 2020 to January 2021 as external ``validation data 2.'' We developed 34 machine learning models to assess the risk of acquiring HIV, syphilis, gonorrhea, and chlamydia. We created an online tool to generate an individual's risk of HIV or an STI. Results: The important predictors for HIV and STI risk were gender, age, men who reported having sex with men, number of casual sexual partners, and condom use. 
Our machine learning--based risk prediction tool, named MySTIRisk, performed at an acceptable or excellent level on testing data sets (area under the curve [AUC] for HIV=0.78; AUC for syphilis=0.84; AUC for gonorrhea=0.78; AUC for chlamydia=0.70) and had stable performance on both external validation data from 2019 (AUC for HIV=0.79; AUC for syphilis=0.85; AUC for gonorrhea=0.81; AUC for chlamydia=0.69) and data from 2020-2021 (AUC for HIV=0.71; AUC for syphilis=0.84; AUC for gonorrhea=0.79; AUC for chlamydia=0.69). Conclusions: Our web-based risk prediction tool could accurately predict the risk of HIV and STIs for clinic attendees using simple self-reported questions. MySTIRisk could serve as an HIV and STI screening tool on clinic websites or digital health platforms to encourage individuals at risk of HIV or an STI to be tested or start HIV pre-exposure prophylaxis. The public can use this tool to assess their risk and then decide if they would attend a clinic for testing. Clinicians or public health workers can use this tool to identify high-risk individuals for further interventions. ", doi="10.2196/37850", url="https://www.jmir.org/2022/8/e37850", url="http://www.ncbi.nlm.nih.gov/pubmed/36006685" } @Article{info:doi/10.2196/38826, author="Gray, Caroline and Wray, Charlie and Tisdale, Rebecca and Chaudary, Camila and Slightam, Cindie and Zulman, Donna", title="Factors Influencing How Providers Assess the Appropriateness of Video Visits: Interview Study With Primary and Specialty Health Care Providers", journal="J Med Internet Res", year="2022", month="Aug", day="24", volume="24", number="8", pages="e38826", keywords="virtual care", keywords="decision-making", keywords="qualitative", keywords="virtual visits", keywords="web-based", keywords="carer", keywords="video", keywords="telephone", keywords="telemedicine", keywords="appointments", keywords="caregiver", abstract="Background: The rapid implementation of virtual care (ie, telephone or video-based clinic appointments) during the COVID-19 pandemic resulted in many providers offering virtual care with little or no formal training and without clinical guidelines and tools to assist with decision-making. As new guidelines for virtual care provision take shape, it is critical that they are informed by an in-depth understanding of how providers make decisions about virtual care in their clinical practices. Objective: In this paper, we sought to identify the most salient factors that influence how providers decide when to offer patients video appointments instead of or in conjunction with in-person care. Methods: We conducted semistructured interviews with 28 purposefully selected primary and specialty health care providers from the US Department of Veteran's Affairs health care system. We used an inductive approach to identify factors that impact provider decision-making. Results: Qualitative analysis revealed distinct clinical, patient, and provider factors that influence provider decisions to initiate or continue with virtual visits. Clinical factors include patient acuity, the need for additional tests or labs, changes in patients' health status, and whether the patient is new or has no recent visit. Patient factors include patients' ability to articulate symptoms or needs, availability and accessibility of technology, preferences for or against virtual visits, and access to caregiver assistance. 
Provider factors include provider comfort with and acceptance of virtual technology as well as virtual physical exam skills and training. Conclusions: Providers within the US Department of Veterans Affairs health administration system consider a complex set of factors when deciding whether to offer or continue a video or telephone visit. These factors can inform the development and further refinement of decision tools, guides, and other policies to ensure that virtual care expands access to high-quality care. ", doi="10.2196/38826", url="https://www.jmir.org/2022/8/e38826", url="http://www.ncbi.nlm.nih.gov/pubmed/36001364" } @Article{info:doi/10.2196/38226, author="Xu, Lingxiao and Liu, Jun and Han, Chunxia and Ai, Zisheng", title="The Application of Machine Learning in Predicting Mortality Risk in Patients With Severe Femoral Neck Fractures: Prediction Model Development Study", journal="JMIR Bioinform Biotech", year="2022", month="Aug", day="19", volume="3", number="1", pages="e38226", keywords="machine learning", keywords="femoral neck fracture", keywords="hospital mortality", keywords="hip", keywords="fracture", keywords="mortality", keywords="prediction", keywords="intensive care unit", keywords="ICU", keywords="decision-making", keywords="risk", keywords="assessment", keywords="prognosis", abstract="Background: Femoral neck fracture (FNF) accounts for approximately 3.58\% of all fractures in the entire body, exhibiting an increasing trend each year. According to a survey, in 1990, the total number of hip fractures in men and women worldwide was approximately 338,000 and 917,000, respectively. In China, FNFs account for 48.22\% of hip fractures. Currently, many studies have been conducted on postdischarge mortality and mortality risk in patients with FNF. However, there have been no definitive studies on in-hospital mortality or its influencing factors in patients with severe FNF admitted to the intensive care unit. Objective: In this paper, 3 machine learning methods were used to construct a nosocomial death prediction model for patients admitted to intensive care units to assist clinicians in early clinical decision-making. Methods: A retrospective analysis was conducted using information of a patient with FNF from the Medical Information Mart for Intensive Care III. After balancing the data set using the Synthetic Minority Oversampling Technique algorithm, patients were randomly separated into a 70\% training set and a 30\% testing set for the development and validation, respectively, of the prediction model. Random forest, extreme gradient boosting, and backpropagation neural network prediction models were constructed with nosocomial death as the outcome. Model performance was assessed using the area under the receiver operating characteristic curve, accuracy, precision, sensitivity, and specificity. The predictive value of the models was verified in comparison to the traditional logistic model. Results: A total of 366 patients with FNFs were selected, including 48 cases (13.1\%) of in-hospital death. Data from 636 patients were obtained by balancing the data set with the in-hospital death group to survival group as 1:1. The 3 machine learning models exhibited high predictive accuracy, and the area under the receiver operating characteristic curve of the random forest, extreme gradient boosting, and backpropagation neural network were 0.98, 0.97, and 0.95, respectively, all with higher predictive performance than the traditional logistic regression model. 
Ranking the importance of the feature variables, the top 10 feature variables that were meaningful for predicting the risk of in-hospital death of patients were the Simplified Acute Physiology Score II, lactate, creatinine, gender, vitamin D, calcium, creatine kinase, creatine kinase isoenzyme, white blood cell, and age. Conclusions: Death risk assessment models constructed using machine learning have positive significance for predicting the in-hospital mortality of patients with severe disease and provide a valid basis for reducing in-hospital mortality and improving patient prognosis. ", doi="10.2196/38226", url="https://bioinform.jmir.org/2022/1/e38226" } @Article{info:doi/10.2196/37584, author="Rose, Christian and D{\'i}az, Mark and D{\'i}az, Tom{\'a}s", title="Addressing Medicine's Dark Matter", journal="Interact J Med Res", year="2022", month="Aug", day="17", volume="11", number="2", pages="e37584", keywords="big data", keywords="AI", keywords="artificial intelligence", keywords="equity", keywords="data collection", keywords="health care", keywords="prediction", keywords="model", keywords="predict", keywords="representative", keywords="unrepresented", doi="10.2196/37584", url="https://www.i-jmr.org/2022/2/e37584", url="http://www.ncbi.nlm.nih.gov/pubmed/35976194" } @Article{info:doi/10.2196/38052, author="Wang, Xin and Wang, Jian and Xu, Bo and Lin, Hongfei and Zhang, Bo and Yang, Zhihao", title="Exploiting Intersentence Information for Better Question-Driven Abstractive Summarization: Algorithm Development and Validation", journal="JMIR Med Inform", year="2022", month="Aug", day="15", volume="10", number="8", pages="e38052", keywords="question-driven abstractive summarization", keywords="transformer", keywords="multi-head attention", keywords="pointer network", keywords="question answering", keywords="factual consistency", keywords="algorithm", keywords="validation", keywords="natural language processing", abstract="Background: Question-driven summarization has become a practical and accurate approach to summarizing the source document. The generated summary should be concise and consistent with the concerned question, and thus, it could be regarded as the answer to the nonfactoid question. Existing methods do not fully exploit question information over documents and dependencies across sentences. Besides, most existing summarization evaluation tools like recall-oriented understudy for gisting evaluation (ROUGE) calculate N-gram overlaps between the generated summary and the reference summary while neglecting the factual consistency problem. Objective: This paper proposes a novel question-driven abstractive summarization model based on transformer, including a two-step attention mechanism and an overall integration mechanism, which can generate concise and consistent summaries for nonfactoid question answering. Methods: Specifically, the two-step attention mechanism is proposed to exploit the mutual information both of question to context and sentence over other sentences. We further introduced an overall integration mechanism and a novel pointer network for information integration. We conducted a question-answering task to evaluate the factual consistency between the generated summary and the reference summary. 
Results: The experimental results of question-driven summarization on the PubMedQA data set showed that our model achieved ROUGE-1, ROUGE-2, and ROUGE-L measures of 36.01, 15.59, and 30.22, respectively, which is superior to the state-of-the-art methods with a gain of 0.79 (absolute) in the ROUGE-2 score. The question-answering task demonstrates that the generated summaries of our model have better factual consistency. Our method achieved 94.2\% accuracy and a 77.57\% F1 score. Conclusions: Our proposed question-driven summarization model effectively exploits the mutual information among the question, document, and summary to generate concise and consistent summaries. ", doi="10.2196/38052", url="https://medinform.jmir.org/2022/8/e38052", url="http://www.ncbi.nlm.nih.gov/pubmed/35969463" } @Article{info:doi/10.2196/36199, author="Abbasgholizadeh Rahimi, Samira and Cwintal, Michelle and Huang, Yuhui and Ghadiri, Pooria and Grad, Roland and Poenaru, Dan and Gore, Genevieve and Zomahoun, Vignon Herv{\'e} Tchala and L{\'e}gar{\'e}, France and Pluye, Pierre", title="Application of Artificial Intelligence in Shared Decision Making: Scoping Review", journal="JMIR Med Inform", year="2022", month="Aug", day="9", volume="10", number="8", pages="e36199", keywords="artificial intelligence", keywords="machine learning", keywords="shared decision making", keywords="patient-centered care", keywords="scoping review", abstract="Background: Artificial intelligence (AI) has shown promising results in various fields of medicine. It has the potential to facilitate shared decision making (SDM). However, there is no comprehensive mapping of how AI may be used for SDM. Objective: We aimed to identify and evaluate published studies that have tested or implemented AI to facilitate SDM. Methods: We performed a scoping review informed by the methodological framework proposed by Levac et al, modifications to the original Arksey and O'Malley framework of a scoping review, and the Joanna Briggs Institute scoping review framework. We reported our results based on the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) reporting guideline. At the identification stage, an information specialist performed a comprehensive search of 6 electronic databases from their inception to May 2021. The inclusion criteria were: all populations; all AI interventions that were used to facilitate SDM (if the AI intervention was not used for the decision-making point in SDM, it was excluded); any outcome related to patients, health care providers, or health care systems; studies in any health care setting; only studies published in the English language; and all study types. Overall, 2 reviewers independently performed the study selection process and extracted data. Any disagreements were resolved by a third reviewer. A descriptive analysis was performed. Results: The search process yielded 1445 records. After removing duplicates, 894 documents were screened, and 6 peer-reviewed publications met our inclusion criteria. Overall, 2 of them were conducted in North America, 2 in Europe, 1 in Australia, and 1 in Asia. Most articles were published after 2017. Overall, 3 articles focused on primary care, and 3 articles focused on secondary care. All studies used machine learning methods. 
Moreover, 3 articles included health care providers in the validation stage of the AI intervention, and 1 article included both health care providers and patients in clinical validation, but none of the articles included health care providers or patients in the design and development of the AI intervention. All used AI to support SDM by providing clinical recommendations or predictions. Conclusions: Evidence of the use of AI in SDM is in its infancy. We found AI supporting SDM in similar ways across the included articles. We observed a lack of emphasis on patients' values and preferences, as well as poor reporting of AI interventions, resulting in a lack of clarity about different aspects. Little effort was made to address the topics of explainability of AI interventions and to include end-users in the design and development of the interventions. Further efforts are required to strengthen and standardize the use of AI in different steps of SDM and to evaluate its impact on various decisions, populations, and settings. ", doi="10.2196/36199", url="https://medinform.jmir.org/2022/8/e36199", url="http://www.ncbi.nlm.nih.gov/pubmed/35943793" } @Article{info:doi/10.2196/38082, author="Li, Jili and Liu, Siru and Hu, Yundi and Zhu, Lingfeng and Mao, Yujia and Liu, Jialin", title="Predicting Mortality in Intensive Care Unit Patients With Heart Failure Using an Interpretable Machine Learning Model: Retrospective Cohort Study", journal="J Med Internet Res", year="2022", month="Aug", day="9", volume="24", number="8", pages="e38082", keywords="heart failure", keywords="mortality", keywords="intensive care unit", keywords="prediction", keywords="XGBoost", keywords="SHAP", keywords="SHapley Additive exPlanation", abstract="Background: Heart failure (HF) is a common disease and a major public health problem. HF mortality prediction is critical for developing individualized prevention and treatment plans. However, due to their lack of interpretability, most HF mortality prediction models have not yet reached clinical practice. Objective: We aimed to develop an interpretable model to predict the mortality risk for patients with HF in intensive care units (ICUs) and used the SHapley Additive exPlanation (SHAP) method to explain the extreme gradient boosting (XGBoost) model and explore prognostic factors for HF. Methods: In this retrospective cohort study, we achieved model development and performance comparison on the eICU Collaborative Research Database (eICU-CRD). We extracted data during the first 24 hours of each ICU admission, and the data set was randomly divided, with 70\% used for model training and 30\% used for model validation. The prediction performance of the XGBoost model was compared with three other machine learning models by the area under the curve. We used the SHAP method to explain the XGBoost model. Results: A total of 2798 eligible patients with HF were included in the final cohort for this study. The observed in-hospital mortality of patients with HF was 9.97\%. Comparatively, the XGBoost model had the highest predictive performance among four models with an area under the curve (AUC) of 0.824 (95\% CI 0.7766-0.8708), whereas support vector machine had the poorest generalization ability (AUC=0.701, 95\% CI 0.6433-0.7582). The decision curve showed that the net benefit of the XGBoost model surpassed those of other machine learning models at 10\%{\textasciitilde}28\% threshold probabilities. 
The SHAP method revealed the top 20 predictors of HF according to the importance ranking, and the mean blood urea nitrogen level was recognized as the most important predictor variable. Conclusions: The interpretable predictive model helps physicians more accurately predict the mortality risk in ICU patients with HF and, therefore, develop better treatment plans and allocate resources optimally for their patients. In addition, the interpretable framework can increase the transparency of the model and help physicians understand the reliability of the predictive model. ", doi="10.2196/38082", url="https://www.jmir.org/2022/8/e38082", url="http://www.ncbi.nlm.nih.gov/pubmed/35943767" } @Article{info:doi/10.2196/36877, author="Wendelboe, Aaron and Saber, Ibrahim and Dvorak, Justin and Adamski, Alys and Feland, Natalie and Reyes, Nimia and Abe, Karon and Ortel, Thomas and Raskob, Gary", title="Exploring the Applicability of Using Natural Language Processing to Support Nationwide Venous Thromboembolism Surveillance: Model Evaluation Study", journal="JMIR Bioinform Biotech", year="2022", month="Aug", day="5", volume="3", number="1", pages="e36877", keywords="venous thromboembolism", keywords="public health surveillance", keywords="machine learning", keywords="natural language processing", keywords="medical imaging review", keywords="public health", abstract="Background: Venous thromboembolism (VTE) is a preventable, common vascular disease that has been estimated to affect up to 900,000 people per year. It has been associated with risk factors such as recent surgery, cancer, and hospitalization. VTE surveillance for patient management and safety can be improved via natural language processing (NLP). NLP tools have the ability to access electronic medical records, identify patients who meet the VTE case definition, and subsequently enter the relevant information into a database for hospital review. Objective: We aimed to evaluate the performance of a VTE identification model of IDEAL-X (Information and Data Extraction Using Adaptive Learning; Emory University)---an NLP tool---in automatically classifying cases of VTE by ``reading'' unstructured text from diagnostic imaging records collected from 2012 to 2014. Methods: After accessing imaging records from pilot surveillance systems for VTE from Duke University and the University of Oklahoma Health Sciences Center (OUHSC), we used a VTE identification model of IDEAL-X to classify cases of VTE that had previously been manually classified. Experts reviewed the technicians' comments in each record to determine if a VTE event occurred. The performance measures calculated (with 95\% CIs) were accuracy, sensitivity, specificity, and positive and negative predictive values. Chi-square tests of homogeneity were conducted to evaluate differences in performance measures by site, using a significance level of .05. Results: The VTE model of IDEAL-X ``read'' 1591 records from Duke University and 1487 records from the OUHSC, for a total of 3078 records. The combined performance measures were 93.7\% accuracy (95\% CI 93.7\%-93.8\%), 96.3\% sensitivity (95\% CI 96.2\%-96.4\%), 92\% specificity (95\% CI 91.9\%-92\%), an 89.1\% positive predictive value (95\% CI 89\%-89.2\%), and a 97.3\% negative predictive value (95\% CI 97.3\%-97.4\%). 
The sensitivity was higher at Duke University (97.9\%, 95\% CI 97.8\%-98\%) than at the OUHSC (93.3\%, 95\% CI 93.1\%-93.4\%; P<.001), but the specificity was higher at the OUHSC (95.9\%, 95\% CI 95.8\%-96\%) than at Duke University (86.5\%, 95\% CI 86.4\%-86.7\%; P<.001). Conclusions: The VTE model of IDEAL-X accurately classified cases of VTE from the pilot surveillance systems of two separate health systems in Durham, North Carolina, and Oklahoma City, Oklahoma. NLP is a promising tool for the design and implementation of an automated, cost-effective national surveillance system for VTE. Conducting public health surveillance at a national scale is important for measuring disease burden and the impact of prevention measures. We recommend additional studies to identify how integrating IDEAL-X in a medical record system could further automate the surveillance process. ", doi="10.2196/36877", url="https://bioinform.jmir.org/2022/1/e36877", url="http://www.ncbi.nlm.nih.gov/pubmed/37206160" } @Article{info:doi/10.2196/37486, author="Huang, Yanqun and Zheng, Zhimin and Ma, Moxuan and Xin, Xin and Liu, Honglei and Fei, Xiaolu and Wei, Lan and Chen, Hui", title="Improving the Performance of Outcome Prediction for Inpatients With Acute Myocardial Infarction Based on Embedding Representation Learned From Electronic Medical Records: Development and Validation Study", journal="J Med Internet Res", year="2022", month="Aug", day="3", volume="24", number="8", pages="e37486", keywords="representation learning", keywords="skip-gram", keywords="feature association strengths", keywords="feature importance", keywords="mortality risk prediction", keywords="acute myocardial infarction", abstract="Background: The widespread secondary use of electronic medical records (EMRs) promotes health care quality improvement. Representation learning that can automatically extract hidden information from EMR data has gained increasing attention. Objective: We aimed to propose a patient representation with more feature associations and task-specific feature importance to improve the outcome prediction performance for inpatients with acute myocardial infarction (AMI). Methods: Medical concepts, including patients' age, gender, disease diagnoses, laboratory tests, structured radiological features, procedures, and medications, were first embedded into real-value vectors using the improved skip-gram algorithm, where concepts in the context windows were selected by feature association strengths measured by association rule confidence. Then, each patient was represented as the sum of the feature embeddings weighted by the task-specific feature importance, which was applied to facilitate predictive model prediction from global and local perspectives. We finally applied the proposed patient representation into mortality risk prediction for 3010 and 1671 AMI inpatients from a public data set and a private data set, respectively, and compared it with several reference representation methods in terms of the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and F1-score. 
Results: Compared with the reference methods, the proposed embedding-based representation showed consistently superior predictive performance on the 2 data sets, achieving mean AUROCs of 0.878 and 0.973, AUPRCs of 0.220 and 0.505, and F1-scores of 0.376 and 0.674 for the public and private data sets, respectively, while the greatest AUROCs, AUPRCs, and F1-scores among the reference methods were 0.847 and 0.939, 0.196 and 0.283, and 0.344 and 0.361 for the public and private data sets, respectively. Feature importance integrated in patient representation reflected features that were also critical in prediction tasks and clinical practice. Conclusions: The introduction of feature associations and feature importance facilitated an effective patient representation and contributed to prediction performance improvement and model interpretation. ", doi="10.2196/37486", url="https://www.jmir.org/2022/8/e37486", url="http://www.ncbi.nlm.nih.gov/pubmed/35921141" } @Article{info:doi/10.2196/34126, author="Yu, Fangzhou and Wu, Peixia and Deng, Haowen and Wu, Jingfang and Sun, Shan and Yu, Huiqian and Yang, Jianming and Luo, Xianyang and He, Jing and Ma, Xiulan and Wen, Junxiong and Qiu, Danhong and Nie, Guohui and Liu, Rizhao and Hu, Guohua and Chen, Tao and Zhang, Cheng and Li, Huawei", title="A Questionnaire-Based Ensemble Learning Model to Predict the Diagnosis of Vertigo: Model Development and Validation Study", journal="J Med Internet Res", year="2022", month="Aug", day="3", volume="24", number="8", pages="e34126", keywords="vestibular disorders", keywords="machine learning", keywords="diagnostic model", keywords="vertigo", keywords="ENT", keywords="questionnaire", abstract="Background: Questionnaires have been used in the past 2 decades to predict the diagnosis of vertigo and assist clinical decision-making. A questionnaire-based machine learning model is expected to improve the efficiency of diagnosis of vestibular disorders. Objective: This study aims to develop and validate a questionnaire-based machine learning model that predicts the diagnosis of vertigo. Methods: In this multicenter prospective study, patients presenting with vertigo entered a consecutive cohort at their first visit to the ENT and vertigo clinics of 7 tertiary referral centers from August 2019 to March 2021, with a follow-up period of 2 months. All participants completed a diagnostic questionnaire after eligibility screening. Patients who received only 1 final diagnosis by their treating specialists for their primary complaint were included in model development and validation. The data of patients enrolled before February 1, 2021 were used for modeling and cross-validation, while patients enrolled afterward entered external validation. Results: A total of 1693 patients were enrolled, with a response rate of 96.2\% (1693/1760). The median age was 51 (IQR 38-61) years, with 991 (58.5\%) females; 1041 (61.5\%) patients received the final diagnosis during the study period. Among them, 928 (54.8\%) patients were included in model development and validation, and 113 (6.7\%) patients who enrolled later were used as a test set for external validation. They were classified into 5 diagnostic categories. We compared 9 candidate machine learning methods, and the recalibrated model of light gradient boosting machine achieved the best performance, with an area under the curve of 0.937 (95\% CI 0.917-0.962) in cross-validation and 0.954 (95\% CI 0.944-0.967) in external validation. 
Conclusions: The questionnaire-based light gradient boosting machine was able to predict common vestibular disorders and assist decision-making in ENT and vertigo clinics. Further studies with a larger sample size and the participation of neurologists will help assess the generalization and robustness of this machine learning method. ", doi="10.2196/34126", url="https://www.jmir.org/2022/8/e34126", url="http://www.ncbi.nlm.nih.gov/pubmed/35921135" } @Article{info:doi/10.2196/27990, author="Rom{\'a}n-Villar{\'a}n, Esther and Alvarez-Romero, Celia and Mart{\'i}nez-Garc{\'i}a, Alicia and Escobar-Rodr{\'i}guez, Antonio German and Garc{\'i}a-Lozano, Jos{\'e} Mar{\'i}a and Bar{\'o}n-Franco, Bosco and Moreno-Gavi{\~n}o, Lourdes and Moreno-Conde, Jes{\'u}s and Rivas-Gonz{\'a}lez, Antonio Jos{\'e} and Parra-Calder{\'o}n, Luis Carlos", title="A Personalized Ontology-Based Decision Support System for Complex Chronic Patients: Retrospective Observational Study", journal="JMIR Form Res", year="2022", month="Aug", day="2", volume="6", number="8", pages="e27990", keywords="adherence", keywords="ontology", keywords="clinical decision support system", keywords="CDSS", keywords="complex chronic patients", keywords="functional validation", keywords="multimorbidity", keywords="polypharmacy", keywords="atrial fibrillation", keywords="anticoagulants", abstract="Background: Due to an increase in life expectancy, the prevalence of chronic diseases is also on the rise. Clinical practice guidelines (CPGs) provide recommendations for suitable interventions regarding different chronic diseases, but a deficiency in the implementation of these CPGs has been identified. The PITeS-TiiSS (Telemedicine and eHealth Innovation Platform: Information Communications Technology for Research and Information Challenges in Health Services) tool, a personalized ontology-based clinical decision support system (CDSS), aims to reduce variability, prevent errors, and consider interactions between different CPG recommendations, among other benefits. Objective: The aim of this study is to design, develop, and validate an ontology-based CDSS that provides personalized recommendations related to drug prescription. The target population is older adult patients with chronic diseases and polypharmacy, and the goal is to reduce complications related to these types of conditions while offering integrated care. Methods: A study scenario about atrial fibrillation and treatment with anticoagulants was selected to validate the tool. After this, a series of knowledge sources were identified, including CPGs, PROFUND index, LESS/CHRON criteria, and STOPP/START criteria, to extract the information. Modeling was carried out using an ontology, and mapping was done with Health Level 7 Fast Healthcare Interoperability Resources (HL7 FHIR) and Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT; International Health Terminology Standards Development Organisation). Once the CDSS was developed, validation was carried out by using a retrospective case study. Results: This project was funded in January 2015 and approved by the Virgen del Rocio University Hospital ethics committee on November 24, 2015. Two different tasks were carried out to test the functioning of the tool. First, retrospective data from a real patient who met the inclusion criteria were used. 
Second, the analysis of an adoption model was performed through the study of the requirements and characteristics that a CDSS must meet in order to be well accepted and used by health professionals. The results are favorable and allow the proposed research to continue to the next phase. Conclusions: An ontology-based CDSS was successfully designed, developed, and validated. However, in future work, validation in a real environment should be performed to ensure the tool is usable and reliable. ", doi="10.2196/27990", url="https://formative.jmir.org/2022/8/e27990", url="http://www.ncbi.nlm.nih.gov/pubmed/35916719" } @Article{info:doi/10.2196/37928, author="Yoo, Junsang and Lee, Jeonghoon and Min, Young Ji and Choi, Won Sae and Kwon, Joon-myoung and Cho, Insook and Lim, Chiyeon and Choi, Young Mi and Cha, Chul Won", title="Development of an Interoperable and Easily Transferable Clinical Decision Support System Deployment Platform: System Design and Development Study", journal="J Med Internet Res", year="2022", month="Jul", day="27", volume="24", number="7", pages="e37928", keywords="clinical decision support system", keywords="decision making", keywords="decision aid", keywords="decision support", keywords="common data model", keywords="model", keywords="development", keywords="electronic health record", keywords="medical record", keywords="EHR", keywords="EMR", keywords="Fast Healthcare Interoperability Resource", keywords="interoperability", keywords="machine learning", keywords="clinical decision", keywords="health technology", keywords="algorithm", keywords="intelligent algorithm network", keywords="modeling", abstract="Background: A clinical decision support system (CDSS) is recognized as a technology that enhances clinical efficacy and safety. However, its full potential has not been realized, mainly due to clinical data standards and noninteroperable platforms. Objective: In this paper, we introduce the common data model--based intelligent algorithm network environment (CANE) platform that supports the implementation and deployment of a CDSS. Methods: CDSS reasoning engines, usually represented as R or Python objects, are deployed into the CANE platform and converted into C\# objects. When a clinician requests CANE-based decision support in the electronic health record (EHR) system, patients' information is transformed into Health Level 7 Fast Healthcare Interoperability Resources (FHIR) format and transmitted to the CANE server inside the hospital firewall. Upon receiving the necessary data, the CANE system's modules perform the following tasks: (1) the preprocessing module converts the FHIRs into the input data required by the specific reasoning engine, (2) the reasoning engine module operates the target algorithms, (3) the integration module communicates with the other institutions' CANE systems to request and transmit a summary report to aid in decision support, and (4) creates a user interface by integrating the summary report and the results calculated by the reasoning engine. Results: We developed a CANE system such that any algorithm implemented in the system can be directly called through the RESTful application programming interface when it is integrated with an EHR system. Eight algorithms were developed and deployed in the CANE system. Using a knowledge-based algorithm, physicians can screen patients who are prone to sepsis and obtain treatment guides for patients with sepsis with the CANE system. 
Further, using a nonknowledge-based algorithm, the CANE system supports emergency physicians' clinical decisions about optimum resource allocation by predicting a patient's acuity and prognosis during triage. Conclusions: We successfully developed a common data model--based platform that adheres to medical informatics standards and could aid artificial intelligence model deployment using R or Python. ", doi="10.2196/37928", url="https://www.jmir.org/2022/7/e37928", url="http://www.ncbi.nlm.nih.gov/pubmed/35896020" } @Article{info:doi/10.2196/37913, author="Kawazoe, Yoshimasa and Shimamoto, Kiminori and Shibata, Daisaku and Shinohara, Emiko and Kawaguchi, Hideaki and Yamamoto, Tomotaka", title="Impact of a Clinical Text--Based Fall Prediction Model on Preventing Extended Hospital Stays for Elderly Inpatients: Model Development and Performance Evaluation", journal="JMIR Med Inform", year="2022", month="Jul", day="27", volume="10", number="7", pages="e37913", keywords="accidental falls", keywords="accident prevention", keywords="inpatients", keywords="machine learning", keywords="natural language processing", keywords="propensity score", keywords="hospital", keywords="elderly", keywords="prediction model", keywords="patient", keywords="risk assessment", abstract="Background: Falls may cause elderly people to be bedridden, requiring professional intervention; thus, fall prevention is crucial. The use of electronic health records (EHRs) is expected to provide highly accurate risk assessment and length-of-stay data related to falls, which may be used to estimate the costs and benefits of prevention. However, no studies to date have investigated the extent to which hospital stays could be shortened through fall avoidance resulting from the use of prediction tools. Objective: We first estimated the extended length of hospital stay caused by falls among elderly inpatients. Next, we developed a model that predicts falls using clinical text as input and evaluated its accuracy. Finally, we estimated the potentially shortened hospital stay that would be made possible by appropriate interventions based on the prediction model. Methods: Patients aged 65 years or older were selected as subjects, and the EHRs of 1728 falls and 70,586 nonfalls were subjected to analysis. The extended-stay lengths were estimated using propensity score matching of 49 associated variables. Bidirectional encoder representations from transformers and bidirectional long short-term memory methods were used to predict falls from clinical text. The estimated length of stay and the outputs of the prediction model were used to determine stay reductions. Results: The extended length of hospital stay due to falls was estimated to be 17.8 days (95\% CI 16.6-19.0), which dropped to 8.6 days when there were unobserved covariates at an odds ratio of 2.0. The accuracy of the prediction model was as follows: area under the receiver operating characteristic curve, 0.851; F-value, 0.165; recall, 0.737; precision, 0.093; and specificity, 0.839. When assuming interventions with 25\% or 100\% effectiveness against cases where the model predicted a fall, the stay reduction was estimated at 0.022 and 0.099 days/day, respectively. Conclusions: The accuracy of the prediction model using clinical text is considered to be higher than the prediction accuracy of conventional assessments. However, our model's precision remained low at 9.3\%. 
This may be due, in part, to the inclusion of cases in which falls did not occur because of preventative interventions during hospitalization. Nonetheless, it is estimated that interventions for cases when falls were predicted will reduce medical costs by 886 Yen/day ({\textasciitilde}US \$6.50/day) of intervention, even if the preventative effect is 25\%. Limitations include the fact that these results cannot be extrapolated to short- or long-term hospitalization cases, and that this was a single-center study. ", doi="10.2196/37913", url="https://medinform.jmir.org/2022/7/e37913", url="http://www.ncbi.nlm.nih.gov/pubmed/35896017" } @Article{info:doi/10.2196/34108, author="Westcott, M. Jill and Hughes, Francine and Liu, Wenke and Grivainis, Mark and Hoskins, Iffath and Fenyo, David", title="Prediction of Maternal Hemorrhage Using Machine Learning: Retrospective Cohort Study", journal="J Med Internet Res", year="2022", month="Jul", day="18", volume="24", number="7", pages="e34108", keywords="predictive modeling", keywords="maternal morbidity", keywords="postpartum hemorrhage", keywords="machine learning", keywords="obstetrics", keywords="pregnancy", keywords="post partum", keywords="maternal", abstract="Background: Postpartum hemorrhage remains one of the largest causes of maternal morbidity and mortality in the United States. Objective: The aim of this paper is to use machine learning techniques to identify patients at risk for postpartum hemorrhage at obstetric delivery. Methods: Women aged 18 to 55 years delivering at a major academic center from July 2013 to October 2018 were included for analysis (N=30,867). A total of 497 variables were collected from the electronic medical record including the following: demographic information; obstetric, medical, surgical, and family history; vital signs; laboratory results; labor medication exposures; and delivery outcomes. Postpartum hemorrhage was defined as a blood loss of ≥1000 mL at the time of delivery, regardless of delivery method, with 2179 (7.1\%) positive cases observed. Supervised learning with regression-, tree-, and kernel-based machine learning methods was used to create classification models based upon training (21,606/30,867, 70\%) and validation (4630/30,867, 15\%) cohorts. Models were tuned using feature selection algorithms and domain knowledge. An independent test cohort (4631/30,867, 15\%) determined final performance by assessing for accuracy, area under the receiver operating curve (AUROC), and sensitivity for proper classification of postpartum hemorrhage. Separate models were created using all collected data versus models limited to data available prior to the second stage of labor or at the time of decision to proceed with cesarean delivery. Additional models examined patients by mode of delivery. Results: Gradient boosted decision trees achieved the best discrimination in the overall model. The model including all data mildly outperformed the second stage model (AUROC 0.979, 95\% CI 0.971-0.986 vs AUROC 0.955, 95\% CI 0.939-0.970). Optimal model accuracy was 98.1\% with a sensitivity of 0.763 for positive prediction of postpartum hemorrhage. The second stage model achieved an accuracy of 98.0\% with a sensitivity of 0.737. Other selected algorithms returned models that performed with decreased discrimination. Models stratified by mode of delivery achieved good to excellent discrimination but lacked the sensitivity necessary for clinical applicability. 
Conclusions: Machine learning methods can be used to identify women at risk for postpartum hemorrhage who may benefit from individualized preventative measures. Models limited to data available prior to delivery perform nearly as well as those with more complete data sets, supporting their potential utility in the clinical setting. Further work is necessary to create successful models based upon mode of delivery and to validate the findings of this study. An unbiased approach to hemorrhage risk prediction may be superior to human risk assessment and represents an area for future research. ", doi="10.2196/34108", url="https://www.jmir.org/2022/7/e34108", url="http://www.ncbi.nlm.nih.gov/pubmed/35849436" } @Article{info:doi/10.2196/36176, author="Witte, Harald and Nakas, Christos and Bally, Lia and Leichtle, Benedikt Alexander", title="Machine Learning Prediction of Hypoglycemia and Hyperglycemia From Electronic Health Records: Algorithm Development and Validation", journal="JMIR Form Res", year="2022", month="Jul", day="18", volume="6", number="7", pages="e36176", keywords="diabetes", keywords="blood glucose decompensation", keywords="multiclass prediction model", keywords="dysglycemia", keywords="hyperglycemia", keywords="hypoglycemia", abstract="Background: Acute blood glucose (BG) decompensations (hypoglycemia and hyperglycemia) represent a frequent and significant risk for inpatients and adversely affect patient outcomes and safety. The increasing need for BG management in inpatients poses a high demand on clinical staff and health care systems in addition. Objective: This study aimed to generate a broadly applicable multiclass classification model for predicting BG decompensation events from patients' electronic health records to indicate where adjustments in patient monitoring and therapeutic interventions are required. This should allow for taking proactive measures before BG levels are derailed. Methods: A retrospective cohort study was conducted on patients who were hospitalized at a tertiary hospital in Bern, Switzerland. Using patient details and routine data from electronic health records, a multiclass prediction model for BG decompensation events (<3.9 mmol/L [hypoglycemia] or >10, >13.9, or >16.7 mmol/L [representing different degrees of hyperglycemia]) was generated based on a second-level ensemble of gradient-boosted binary trees. Results: A total of 63,579 hospital admissions of 38,250 patients were included in this study. The multiclass prediction model reached specificities of 93.7\%, 98.9\%, and 93.9\% and sensitivities of 67.1\%, 59\%, and 63.6\% for the main categories of interest, which were nondecompensated cases, hypoglycemia, or hyperglycemia, respectively. The median prediction horizon was 7 hours and 4 hours for hypoglycemia and hyperglycemia, respectively. Conclusions: Electronic health records have the potential to reliably predict all types of BG decompensation. Readily available patient details and routine laboratory data can support the decisions for proactive interventions and thus help to reduce the detrimental health effects of hypoglycemia and hyperglycemia. 
", doi="10.2196/36176", url="https://formative.jmir.org/2022/7/e36176", url="http://www.ncbi.nlm.nih.gov/pubmed/35526139" } @Article{info:doi/10.2196/37233, author="Granda Morales, Fernando Luis and Valdiviezo-Diaz, Priscila and Re{\'a}tegui, Ruth and Barba-Guaman, Luis", title="Drug Recommendation System for Diabetes Using a Collaborative Filtering and Clustering Approach: Development and Performance Evaluation", journal="J Med Internet Res", year="2022", month="Jul", day="15", volume="24", number="7", pages="e37233", keywords="clustering", keywords="collaborative filtering", keywords="diabetes", keywords="recommender system", keywords="recommend", keywords="drug", keywords="chronic disease", keywords="patient information", keywords="data mining", keywords="machine learning", abstract="Background: Diabetes is a public health problem worldwide. Although diabetes is a chronic and incurable disease, measures and treatments can be taken to control it and keep the patient stable. Diabetes has been the subject of extensive research, ranging from disease prevention to the use of technologies for its diagnosis and control. Health institutions obtain information required for the diagnosis of diabetes through various tests, and appropriate treatment is provided according to the diagnosis. These institutions have databases with large volumes of information that can be analyzed and used in different applications such as pattern discovery and outcome prediction, which can help health personnel in making decisions about treatments or determining the appropriate prescriptions for diabetes management. Objective: The aim of this study was to develop a drug recommendation system for patients with diabetes based on collaborative filtering and clustering techniques as a complement to the treatments given by the treating doctor. Methods: The data set used contains information from patients with diabetes available in the University of California Irvine Machine Learning Repository. Data mining techniques were applied for processing and analysis of the data set. Unsupervised learning techniques were used for dimensionality reduction and patient clustering. Drug predictions were obtained with a user-based collaborative filtering approach, which enabled creating a patient profile that can be compared with the profiles of other patients with similar characteristics. Finally, recommendations were made considering the identified patient groups. The performance of the system was evaluated using metrics to assess the quality of the groups and the quality of the predictions and recommendations. Results: Principal component analysis to reduce the dimensionality of the data showed that eight components best explained the variability of the data. We identified six groups of patients using the clustering algorithm, which were evenly distributed. These groups were identified based on the available information of patients with diabetes, and then the variation between groups was examined to predict a suitable medication for a target patient. The recommender system achieved good results in the quality of predictions with a mean squared error metric of 0.51 and accuracy in the quality of recommendations of 0.61, which is acceptable. Conclusions: This work presents a recommendation system that suggests medications according to drug information and the characteristics of patients with diabetes. Some aspects related to this disease were analyzed based on the data set used from patients with diabetes. 
The experimental results with clustering and prediction techniques were found to be acceptable for the recommendation process. This system can provide a novel perspective for health institutions that require technologies to support health care personnel in the management of diabetes treatment and control. ", doi="10.2196/37233", url="https://www.jmir.org/2022/7/e37233", url="http://www.ncbi.nlm.nih.gov/pubmed/35838763" } @Article{info:doi/10.2196/21994, author="von Tottleben, Malte and Grinyer, Katie and Arfa, Ali and Traore, Lamine and Verdoy, Dolores and Lim Choi Keung, N. Sarah and Larranaga, Igor and Jaulent, Marie-Christine and De Manuel Keenoy, Esteban and Lilja, Mikael and Beach, Marie and Marguerie, Christopher and Yuksel, Mustafa and Laleci Erturkmen, Banu Gokce and Klein, O. Gunnar and Lindman, Pontus and Mar, Javier and Kalra, Dipak and and Arvanitis, N. Theodoros", title="An Integrated Care Platform System (C3-Cloud) for Care Planning, Decision Support, and Empowerment of Patients With Multimorbidity: Protocol for a Technology Trial", journal="JMIR Res Protoc", year="2022", month="Jul", day="13", volume="11", number="7", pages="e21994", keywords="multimorbidity", keywords="polypharmacy", keywords="guidelines reconciliation", keywords="clinical decision support", keywords="personalized care plans", keywords="diabetes mellitus type 2", keywords="heart failure", keywords="depression", keywords="renal failure", keywords="acceptability", keywords="usability", keywords="evaluation", keywords="cost-benefit evaluation", keywords="predictive modeling", abstract="Background: There is an increasing need to organize the care around the patient and not the disease, while considering the complex realities of multiple physical and psychosocial conditions, and polypharmacy. Integrated patient-centered care delivery platforms have been developed for both patients and clinicians. These platforms could provide a promising way to achieve a collaborative environment that improves the provision of integrated care for patients via enhanced information and communication technology solutions for semiautomated clinical decision support. Objective: The Collaborative Care and Cure Cloud project (C3-Cloud) has developed 2 collaborative computer platforms for patients and members of the multidisciplinary team (MDT) and deployed these in 3 different European settings. The objective of this study is to pilot test the platforms and evaluate their impact on patients with 2 or more chronic conditions (diabetes mellitus type 2, heart failure, kidney failure, depression), their informal caregivers, health care professionals, and, to some extent, health care systems. Methods: This paper describes the protocol for conducting an evaluation of user experience, acceptability, and usefulness of the platforms. For this, 2 ``testing and evaluation'' phases have been defined, involving multiple qualitative methods (focus groups and surveys) and advanced impact modeling (predictive modeling and cost-benefit analysis). Patients and health care professionals were identified and recruited from 3 partnering regions in Spain, Sweden, and the United Kingdom via electronic health record screening. Results: The technology trial in this 4-year funded project (2016-2020) concluded in April 2020. The pilot technology trial for evaluation phases 3 and 4 was launched in November 2019 and carried out until April 2020. 
Data collection for these phases is completed with promising results on platform acceptance and socioeconomic impact. We believe that the phased, iterative approach taken is useful as it involves relevant stakeholders at crucial stages in the platform development and allows for a sound user acceptance assessment of the final product. Conclusions: Patients with multiple chronic conditions often experience shortcomings in the care they receive. It is hoped that personalized care plan platforms for patients and collaboration platforms for members of MDTs can help tackle the specific challenges of clinical guideline reconciliation for patients with multimorbidity and improve the management of polypharmacy. The initial evaluative phases have indicated promising results of platform usability. Results of phases 3 and 4 were methodologically useful, yet limited due to the COVID-19 pandemic. Trial Registration: ClinicalTrials.gov NCT03834207; https://clinicaltrials.gov/ct2/show/NCT03834207 International Registered Report Identifier (IRRID): RR1-10.2196/21994 ", doi="10.2196/21994", url="https://www.researchprotocols.org/2022/7/e21994", url="http://www.ncbi.nlm.nih.gov/pubmed/35830239" } @Article{info:doi/10.2196/39003, author="Leis, Angela and Casadevall, David and Albanell, Joan and Posso, Margarita and Maci{\`a}, Francesc and Castells, Xavier and Ram{\'i}rez-Anguita, Manuel Juan and Mart{\'i}nez Rold{\'a}n, Jordi and Furlong, I. Laura and Sanz, Ferran and Ronzano, Francesco and Mayer, A. Miguel", title="Exploring the Association of Cancer and Depression in Electronic Health Records: Combining Encoded Diagnosis and Mining Free-Text Clinical Notes", journal="JMIR Cancer", year="2022", month="Jul", day="11", volume="8", number="3", pages="e39003", keywords="cancer", keywords="depression", keywords="electronic health records", keywords="text mining", keywords="natural language processing", abstract="Background: A cancer diagnosis is a source of psychological and emotional stress, which are often maintained for sustained periods of time that may lead to depressive disorders. Depression is one of the most common psychological conditions in patients with cancer. According to the Global Cancer Observatory, breast and colorectal cancers are the most prevalent cancers in both sexes and across all age groups in Spain. Objective: This study aimed to compare the prevalence of depression in patients before and after the diagnosis of breast or colorectal cancer, as well as to assess the usefulness of the analysis of free-text clinical notes in 2 languages (Spanish or Catalan) for detecting depression in combination with encoded diagnoses. Methods: We carried out an analysis of the electronic health records from a general hospital by considering the different sources of clinical information related to depression in patients with breast and colorectal cancer. This analysis included ICD-9-CM (International Classification of Diseases, Ninth Revision, Clinical Modification) diagnosis codes and unstructured information extracted by mining free-text clinical notes via natural language processing tools based on Systematized Nomenclature of Medicine Clinical Terms that mentions symptoms and drugs used for the treatment of depression. Results: We observed that the percentage of patients diagnosed with depressive disorders significantly increased after cancer diagnosis in the 2 types of cancer considered---breast and colorectal cancers. 
We managed to identify a higher number of patients with depression by mining free-text clinical notes than the group selected exclusively on ICD-9-CM codes, increasing the number of patients diagnosed with depression by 34.8\% (441/1269). In addition, the number of patients with depression who received chemotherapy was higher than those who did not receive this treatment, with significant differences (P<.001). Conclusions: This study provides new clinical evidence of the depression-cancer comorbidity and supports the use of natural language processing for extracting and analyzing free-text clinical notes from electronic health records, contributing to the identification of additional clinical data that complements those provided by coded data to improve the management of these patients. ", doi="10.2196/39003", url="https://cancer.jmir.org/2022/3/e39003", url="http://www.ncbi.nlm.nih.gov/pubmed/35816382" } @Article{info:doi/10.2196/37301, author="Fukaguchi, Kiyomitsu and Goto, Tadahiro and Yamamoto, Tadatsugu and Yamagami, Hiroshi", title="Experimental Implementation of NSER Mobile App for Efficient Real-Time Sharing of Prehospital Patient Information With Emergency Departments: Interrupted Time-Series Analysis", journal="JMIR Form Res", year="2022", month="Jul", day="6", volume="6", number="7", pages="e37301", keywords="emergency department", keywords="emergency medical services", keywords="mobile apps", keywords="interrupted time series analysis", keywords="emergency", keywords="patient record", keywords="implementation", keywords="patient care", keywords="app", keywords="implement", keywords="medical informatics", keywords="clinical informatics", keywords="decision support", keywords="electronic health record", keywords="eHealth", keywords="digital health", abstract="Background: With the aging society, the number of emergency transportations has been growing. Although it is important that a patient be immediately transported to an appropriate hospital for proper management, accurate diagnosis in the prehospital setting is challenging. However, at present, patient information is mainly communicated by telephone, which has a potential risk of communication errors such as mishearing. Sharing correct and detailed prehospital information with emergency departments (EDs) should facilitate optimal patient care and resource use. Therefore, the implementation of an app that provides on-site, real-time information to emergency physicians could be useful for early preparation, intervention, and effective use of medical and human resources. Objective: In this paper, we aimed to examine whether the implementation of a mobile app for emergency medical service (EMS) would improve patient outcomes and reduce transportation time as well as communication time by phone (ie, phone-communication time). Methods: We performed an interrupted time-series analysis (ITSA) on the data from a tertiary care hospital in Japan from July 2021 to October 2021 (8 weeks before and 8 weeks after the implementation period). We included all patients transported by EMS. Using the mobile app, EMS can send information on patient demographics, vital signs, medications, and photos of the scene to the ED. The outcome measure was inpatient mortality and transportation time, as well as phone-communication time, which was the time for EMS to negotiate with ED staffs for transport requests. 
Results: During the study period, 1966 emergency transportations were made (n=1033, 53\% patients during the preimplementation period and n=933, 47\% patients after the implementation period). The ITSA did not reveal a significant decrease in patient mortality and transportation time before and after the implementation. However, the ITSA revealed a significant decrease in mean phone-communication time between pre- and postimplementation periods (from 216 to 171 seconds; --45 seconds; 95\% CI --71 to --18 seconds). From the pre- to postimplementation period, the mean transportation time from EMS request to ED arrival decreased by 0.29 minutes (from 36.1 minutes to 35.9 minutes; 95\% CI --2.20 to 1.60 minutes), without change in time trends. We also introduced cases where the app allowed EMS to share accurate and detailed prehospital information with the emergency department, resulting in timely intervention and reducing the burden on the ED. Conclusions: The implementation of a mobile app for EMS was associated with reduced phone-communication time by 45 seconds (22\%) without increasing mortality or overall transportation time despite the implementation of new methods in the real clinical setting. In addition, real-time patient information sharing, such as the transfer of monitor images and photos of the accident site, could facilitate optimal patient care and resource use. ", doi="10.2196/37301", url="https://formative.jmir.org/2022/7/e37301", url="http://www.ncbi.nlm.nih.gov/pubmed/35793142" } @Article{info:doi/10.2196/35403, author="van Kasteren, Yasmin and Strobel, J{\"o}rg and Bastiampillai, Tarun and Linedale, Ecushla and Bidargaddi, Niranjan", title="Automated Decision Support For Community Mental Health Services Using National Electronic Health Records: Qualitative Implementation Case Study", journal="JMIR Hum Factors", year="2022", month="Jul", day="5", volume="9", number="3", pages="e35403", keywords="implementation", keywords="computerised clinical decision system", keywords="decision system", keywords="decision support", keywords="participatory action framework", keywords="psychotropic medication", keywords="psychotropic", keywords="nonadherence", keywords="monitoring", keywords="medication adherence", keywords="algorithms", keywords="algorithm", keywords="electronic health records", keywords="EHR", keywords="health record", keywords="normalization process theory", keywords="automated alerts", keywords="automated alert", keywords="mental health", keywords="mental illness", keywords="adherence", keywords="medication", keywords="eHealth", keywords="web-based", abstract="Background: A high proportion of patients with severe mental illness relapse due to nonadherence to psychotropic medication. In this paper, we use the normalization process theory (NPT) to describe the implementation of a web-based clinical decision support system (CDSS) for Community Mental Health Services (CMHS) called Actionable Intime Insights or AI2. AI2 has two distinct functions: (1) it provides an overview of medication and treatment history to assist in reviewing patient adherence and (2) gives alerts indicating nonadherence to support early intervention. Objective: Our objective is to evaluate the pilot implementation of the AI2 application to better understand the challenges of implementing a web-based CDSS to support medication adherence and early intervention in CMHS. Methods: The NPT and participatory action framework were used to both explore and support implementation. 
Qualitative data were collected over the course of the 14-month implementation, in which researchers were active participants. Data were analyzed and coded using the NPT framework. Qualitative data included discussions, meetings, and work products, including emails and documents. Results: This study explores the barriers and enablers of implementing a CDSS to support early intervention within CMHS using Medicare data from Australia's national electronic record system, My Health Record (MyHR). The implementation was a series of ongoing negotiations, which resulted in a staged implementation with compromises on both sides. Clinicians were initially hesitant about using a CDSS based on MyHR data and expressed concerns about the changes to their work practice required to support early intervention. Substantial workarounds were required to move the implementation forward. This pilot implementation allowed us to better understand the challenges of implementation and the resources and support required to implement and sustain a model of care based on automated alerts to support early intervention. Conclusions: The use of decision support based on electronic health records is growing, and while implementation is challenging, the potential benefits of early intervention to prevent relapse and hospitalization and ensure increased efficiency of the health care system are worth pursuing. ", doi="10.2196/35403", url="https://humanfactors.jmir.org/2022/3/e35403", url="http://www.ncbi.nlm.nih.gov/pubmed/35788103" } @Article{info:doi/10.2196/37456, author="Karabukayeva, Aizhan and Anderson, L. Jami and Hall, G. Allyson and Feldman, S. Sue and Mehta, Tapan", title="Exploring a Need for a Cardiometabolic Disease Staging System as a Computerized Clinical Decision Support Tool: Qualitative Study", journal="JMIR Form Res", year="2022", month="Jul", day="1", volume="6", number="7", pages="e37456", keywords="cardiometabolic disease staging system", keywords="risk assessment", keywords="cardiometabolic disease", keywords="clinical decision support system", keywords="primary care", keywords="obesity", keywords="overweight", keywords="medical management", abstract="Background: Although cardiometabolic diseases are leading causes of morbidity and mortality in the United States, computerized tools for risk assessment of cardiometabolic disease are rarely integral components of primary care practice. Embedding cardiometabolic disease staging systems (CMDS) into computerized clinical decision support systems (CDSS) may assist with identifying and treating patients at greatest risk for developing cardiometabolic disease. Objective: This study aimed to explore the current approach to medical management of obesity and the need for CMDS designed to aid medical management of people living with obesity, at risk of being obese, or diabetic at the point of care. Methods: Using a general inductive approach, this qualitative research study was guided by an interpretive epistemology. The method included semistructured, in-depth interviews with primary care providers (PCPs) from university-based community health clinics. The literature informed the interview protocol and included questions on PCPs' experiences and the need for a tool to improve their ability to manage and prevent complications from overweight and obesity. 
Results: PCPs (N=10) described their current approaches and emphasized behavioral treatments consisting of combined diet, physical activity, and behavior therapy as the first line of treatment for people who were overweight or obese. Results suggest that beneficial features of CDSS include (1) clinically relevant and customizable support, (2) provision of a comprehensive medical summary with trends, (3) availability of patient education materials and community resources, and (4) simplicity and ease of navigation. Conclusions: Implementation of a CMDS via a CDSS could enable PCPs to conduct comprehensive cardiometabolic disease risk assessments, supporting clinical management of overweight, obesity, and diabetes. Results from this study provide unique insights to developers and researchers by identifying areas for design optimization, improved end user experience, and successful adoption of the CDSS. ", doi="10.2196/37456", url="https://formative.jmir.org/2022/7/e37456", url="http://www.ncbi.nlm.nih.gov/pubmed/35776499" } @Article{info:doi/10.2196/39586, author="Cole, Amy and Richardson, R. Daniel and Adapa, Karthik and Khasawneh, Amro and Crossnohere, Norah and Bridges, P. John F. and Mazur, Lukasz", title="Development of a Patient-Centered Preference Tool for Patients With Hematologic Malignancies: Protocol for a Mixed Methods Study", journal="JMIR Res Protoc", year="2022", month="Jun", day="29", volume="11", number="6", pages="e39586", keywords="co-design", keywords="informed decision making", keywords="mHealth", keywords="electronic health care tools", keywords="shared decision making", keywords="patient engagement", keywords="hematologic malignancies", abstract="Background: The approval of novel therapies for patients diagnosed with hematologic malignancies have improved survival outcomes but increased the challenge of aligning chemotherapy choices with patient preferences. We previously developed paper versions of a discrete choice experiment (DCE) and a best-worst scaling (BWS) instrument to quantify the treatment outcome preferences of patients with hematologic malignancies to inform shared decision making. Objective: We aim to develop an electronic health care tool (EHT) to guide clinical decision making that uses either a BWS or DCE instrument to capture patient preferences. The primary objective of this study is to use both qualitative and quantitative methods to evaluate the perceived usability, cognitive workload (CWL), and performance of electronic prototypes that include the DCE and BWS instrument. Methods: This mixed methods study includes iterative co-design methods that will involve healthy volunteers, patient-caregiver pairs, and health care workers to evaluate the perceived usability, CWL, and performance of tasks within distinct prototypes. Think-aloud sessions and semistructured interviews will be conducted to collect qualitative data to develop an affinity diagram for thematic analysis. Validated assessments (Post-Study System Usability Questionnaire [PSSUQ] and the National Aeronautical and Space Administration's Task Load Index [NASA-TLX]) will be used to evaluate the usability and CWL required to complete tasks within the prototypes. Performance assessments of the DCE and BWS will include the evaluation of tasks using the Single Easy Questionnaire (SEQ), time to complete using the prototype, and the number of errors. 
Additional qualitative assessments will be conducted to gather participants' feedback on visualizations used in the Personalized Treatment Preferences Dashboard that provides a representation of user results after completing the choice tasks within the prototype. Results: Ethical approval was obtained in June 2021 from the Institutional Review Board of the University of North Carolina at Chapel Hill. The DCE and BWS instruments were developed and incorporated into the PRIME (Preference Reporting to Improve Management and Experience) prototype in early 2021 and prototypes were completed by June 2021. Heuristic evaluations were conducted in phase 1 and completed by July 2021. Recruitment of healthy volunteers began in August 2021 and concluded in September 2021. In December 2021, our findings from phase 2 were accepted for publication. Phase 3 recruitment began in January 2022 and is expected to conclude in September 2022. The data analysis from phase 3 is expected to be completed by November 2022. Conclusions: Our findings will help differentiate the usability, CWL, and performance of the DCE and BWS within the prototypes. These findings will contribute to the optimization of the prototypes, leading to the development of an EHT that helps facilitate shared decision making. This evaluation will inform the development of EHTs to be used clinically with patients and health care workers. International Registered Report Identifier (IRRID): DERR1-10.2196/39586 ", doi="10.2196/39586", url="https://www.researchprotocols.org/2022/6/e39586", url="http://www.ncbi.nlm.nih.gov/pubmed/35767340" } @Article{info:doi/10.2196/36774, author="Zhuang, Yan and Zhang, Luxia and Gao, Xiyuan and Shae, Zon-Yin and Tsai, P. Jeffrey J. and Li, Pengfei and Shyu, Chi-Ren", title="Re-engineering a Clinical Trial Management System Using Blockchain Technology: System Design, Development, and Case Studies", journal="J Med Internet Res", year="2022", month="Jun", day="27", volume="24", number="6", pages="e36774", keywords="blockchain", keywords="clinical trials", keywords="clinical trial management system", keywords="electronic data capture", keywords="smart contract", abstract="Background: A clinical trial management system (CTMS) is a suite of specialized productivity tools that manage clinical trial processes from study planning to closeout. Using CTMSs has shown remarkable benefits in delivering efficient, auditable, and visualizable clinical trials. However, the current CTMS market is fragmented, and most CTMSs fail to meet expectations because of their inability to support key functions, such as inconsistencies in data captured across multiple sites. Blockchain technology, an emerging distributed ledger technology, is considered to potentially provide a holistic solution to current CTMS challenges by using its unique features, such as transparency, traceability, immutability, and security. Objective: This study aimed to re-engineer the traditional CTMS by leveraging the unique properties of blockchain technology to create a secure, auditable, efficient, and generalizable CTMS. Methods: A comprehensive, blockchain-based CTMS that spans all stages of clinical trials, including a sharable trial master file system; a fast recruitment and simplified enrollment system; a timely, secure, and consistent electronic data capture system; a reproducible data analytics system; and an efficient, traceable payment and reimbursement system, was designed and implemented using the Quorum blockchain. 
Compared with traditional blockchain technologies, such as Ethereum, Quorum blockchain offers higher transaction throughput and lowers transaction latency. Case studies on each application of the CTMS were conducted to assess the feasibility, scalability, stability, and efficiency of the proposed blockchain-based CTMS. Results: A total of 21.6 million electronic data capture transactions were generated and successfully processed through blockchain, with an average of 335.4 transactions per second. Of the 6000 patients, 1145 were matched in 1.39 seconds using 10 recruitment criteria with an automated matching mechanism implemented by the smart contract. Key features, such as immutability, traceability, and stability, were also tested and empirically proven through case studies. Conclusions: This study proposed a comprehensive blockchain-based CTMS that covers all stages of the clinical trial process. Compared with our previous research, the proposed system showed an overall better performance. Our system design, implementation, and case studies demonstrated the potential of blockchain technology as a potential solution to CTMS challenges and its ability to perform more health care tasks. ", doi="10.2196/36774", url="https://www.jmir.org/2022/6/e36774", url="http://www.ncbi.nlm.nih.gov/pubmed/35759315" } @Article{info:doi/10.2196/37028, author="Hong, Grace and Smith, Margaret and Lin, Steven", title="The AI Will See You Now: Feasibility and Acceptability of a Conversational AI Medical Interviewing System", journal="JMIR Form Res", year="2022", month="Jun", day="27", volume="6", number="6", pages="e37028", keywords="artificial intelligence", keywords="feasibility studies", keywords="patient acceptance of health care", keywords="diagnostic errors", keywords="patient-generated health data", keywords="clinical", keywords="medical history", keywords="healthcare", keywords="health care", abstract="Background: Primary care physicians (PCPs) are often limited in their ability to collect detailed medical histories from patients, which can lead to errors or delays in diagnosis. Recent advances in artificial intelligence (AI) show promise in augmenting current human-driven methods of collecting personal and family histories; however, such tools are largely unproven. Objective: The main aim of this pilot study was to evaluate the feasibility and acceptability of a conversational AI medical interviewing system among patients. Methods: The study was conducted among adult patients empaneled at a family medicine clinic within a large academic medical center in Northern California. Participants were asked to test an AI medical interviewing system, which uses a conversational avatar and chatbot to capture medical histories and identify patients with risk factors. After completing an interview with the AI system, participants completed a web-based survey inquiring about the performance of the system, the ease of using the system, and attitudes toward the system. Responses on a 7-point Likert scale were collected and evaluated using descriptive statistics. Results: A total of 20 patients with a mean age of 50 years completed an interview with the AI system, including 12 females (60\%) and 8 males (40\%); 11 were White (55\%), 8 were Asian (40\%), and 1 was Black (5\%), and 19 had at least a bachelor's degree (95\%). 
Most participants agreed that using the system to collect histories could help their PCPs have a better understanding of their health (16/20, 80\%) and help them stay healthy through identification of their health risks (14/20, 70\%). Those who reported that the system was clear and understandable, and that they were able to learn it quickly, tended to be younger; those who reported that the tool could motivate them to share more comprehensive histories with their PCPs tended to be older. Conclusions: In this feasibility and acceptability pilot of a conversational AI medical interviewing system, the majority of patients believed that it could help clinicians better understand their health and identify health risks; however, patients were split on the effort required to use the system, and whether AI should be used for medical interviewing. Our findings suggest areas for further research, such as understanding the user interface factors that influence ease of use and adoption, and the reasons behind patients' attitudes toward AI-assisted history-taking. ", doi="10.2196/37028", url="https://formative.jmir.org/2022/6/e37028", url="http://www.ncbi.nlm.nih.gov/pubmed/35759326" } @Article{info:doi/10.2196/37209, author="Nguyen, Hai and Meczner, Andras and Burslam-Dawe, Krista and Hayhoe, Benedict", title="Triage Errors in Primary and Pre--Primary Care", journal="J Med Internet Res", year="2022", month="Jun", day="24", volume="24", number="6", pages="e37209", keywords="triage errors", keywords="pre-primary care", keywords="digital symptom checker", keywords="primary care", keywords="viewpoint", keywords="triage", keywords="symptom checker", keywords="emergency care", doi="10.2196/37209", url="https://www.jmir.org/2022/6/e37209", url="http://www.ncbi.nlm.nih.gov/pubmed/35749166" } @Article{info:doi/10.2196/33834, author="Ge, Wendong and Alabsi, Haitham and Jain, Aayushee and Ye, Elissa and Sun, Haoqi and Fernandes, Marta and Magdamo, Colin and Tesh, A. Ryan and Collens, I. Sarah and Newhouse, Amy and MVR Moura, Lidia and Zafar, Sahar and Hsu, John and Akeju, Oluwaseun and Robbins, K. Gregory and Mukerji, S. Shibani and Das, Sudeshna and Westover, Brandon M.", title="Identifying Patients With Delirium Based on Unstructured Clinical Notes: Observational Study", journal="JMIR Form Res", year="2022", month="Jun", day="24", volume="6", number="6", pages="e33834", keywords="delirium", keywords="electronic health records", keywords="clinical notes", keywords="machine learning", keywords="natural language processing", abstract="Background: Delirium in hospitalized patients is a syndrome of acute brain dysfunction. Diagnostic (International Classification of Diseases [ICD]) codes are often used in studies using electronic health records (EHRs), but they are inaccurate. Objective: We sought to develop a more accurate method using natural language processing (NLP) to detect delirium episodes on the basis of unstructured clinical notes. Methods: We collected 1.5 million notes from >10,000 patients from among 9 hospitals. Seven experts iteratively labeled 200,471 sentences. Using these, we trained three NLP classifiers: Support Vector Machine, Recurrent Neural Networks, and Transformer. Testing was performed using an external data set. We also evaluated associations with delirium billing (ICD) codes, medications, orders for restraints and sitters, direct assessments (Confusion Assessment Method?[CAM] scores), and in-hospital mortality. 
F1 scores, confusion matrices, and areas under the receiver operating characteristic curve (AUCs) were used to compare NLP models. We used the $\phi$ coefficient to measure associations with other delirium indicators. Results: The transformer NLP performed best on the following parameters: micro F1=0.978, macro F1=0.918, positive AUC=0.984, and negative AUC=0.992. NLP detections exhibited higher correlations ($\phi$) than ICD codes with deliriogenic medications (0.194 vs 0.073 for ICD codes), restraints and sitter orders (0.358 vs 0.177), mortality (0.216 vs 0.000), and CAM scores (0.256 vs --0.028). Conclusions: Clinical notes are an attractive alternative to ICD codes for EHR delirium studies but require automated methods. Our NLP model detects delirium with high accuracy, similar to manual chart review. Our NLP approach can provide more accurate determination of delirium for large-scale EHR-based studies regarding delirium, quality improvement, and clinical trials. ", doi="10.2196/33834", url="https://formative.jmir.org/2022/6/e33834", url="http://www.ncbi.nlm.nih.gov/pubmed/35749214" } @Article{info:doi/10.2196/33446, author="Ismail, Kamaran Rawa and van Breeschoten, Jesper and van der Flier, Silvia and van Loosen, Caspar and Pasmooij, Gerdina Anna Maria and van Dartel, Maaike and van den Eertwegh, Alfons and de Boer, Anthonius and Wouters, Michel and Hilarius, Doranne", title="Medication Use and Clinical Outcomes by the Dutch Institute for Clinical Auditing Medicines Program: Quantitative Analysis", journal="J Med Internet Res", year="2022", month="Jun", day="23", volume="24", number="6", pages="e33446", keywords="real-world data", keywords="quality of care", keywords="medicines", keywords="cancer", abstract="Background: The Dutch Institute for Clinical Auditing (DICA) Medicines Program was set up in September 2018 to evaluate expensive medicine use in daily practice in terms of real-world effectiveness using only existing data sources. Objective: The aim of this study is to describe the potential of the addition of declaration data to quality registries to provide participating centers with benchmark information about the use of medicines and outcomes among patients. Methods: A total of 3 national population-based registries were linked to clinical and financial data from the hospital pharmacy, the Dutch diagnosis treatment combinations information system including in-hospital activities, and survival data from health care insurers. The first results of the real-world data (RWD) linkage are presented using descriptive statistics to assess patient, tumor, and treatment characteristics. Time-to-next-treatment (TTNT) and overall survival (OS) were estimated using the Kaplan-Meier method. Results: A total of 21 Dutch hospitals participated in the DICA Medicines Program, which included 7412 patients with colorectal cancer, 1981 patients with metastasized colon cancer, 3860 patients with lung cancer, 1253 patients with metastasized breast cancer, and 7564 patients with rheumatic disease. The data were used for hospital benchmarking to gain insights into medication use in specific patient populations, treatment information, clinical outcomes, and costs. Detailed treatment information (duration and treatment steps) led to insights into differences between hospitals in daily clinical practices. Furthermore, exploratory analyses on clinical outcomes (TTNT and OS) were possible. 
Conclusions: The DICA Medicines Program shows that it is possible to gather and link RWD about medicines to 4 disease-specific population-based registries. Since these RWD became available with minimal registration burden and effort for hospitals, this method can be explored in other population-based registries to evaluate real-world efficacy. ", doi="10.2196/33446", url="https://www.jmir.org/2022/6/e33446", url="http://www.ncbi.nlm.nih.gov/pubmed/35737449" } @Article{info:doi/10.2196/36931, author="Tanwani, Jaya and Alam, Fahad and Matava, Clyde and Choi, Stephen and McHardy, Paul and Singer, Oskar and Cheong, Geraldine and Wiegelmann, Julian", title="Development of a Head-Mounted Holographic Needle Guidance System for Enhanced Ultrasound-Guided Neuraxial Anesthesia: System Development and Observational Evaluation", journal="JMIR Form Res", year="2022", month="Jun", day="23", volume="6", number="6", pages="e36931", keywords="mixed reality", keywords="virtual reality", keywords="augmented reality", keywords="HoloLens", keywords="holograms", keywords="neuraxial anesthesia", abstract="Background: Neuraxial anesthesia is conventionally performed using a landmark-based technique. Preprocedural ultrasound is often used in challenging clinical scenarios to identify an ideal needle path. The procedure is then carried out by the operator recreating the ultrasound needle path from memory. We suggest that a needle guidance system using the Microsoft HoloLens mixed reality headset, which projects a hologram of the ideal needle path, can assist operators in replicating the correct needle angulation and result in fewer needle passes. Objective: The objective of the study was to develop software for the mixed reality HoloLens headset, which could be used to augment the performance of neuraxial anesthesia, and establish its face validity in lumbar spine phantom models. Methods: We developed an ultrasound transducer marker and software for the HoloLens, which registers the position and angulation of the ultrasound transducer during preprocedural scans. Once an image of a clear path from skin to the intrathecal space is acquired, a hologram of the ideal needle path is projected onto the user's visual field. The ultrasound probe is removed while the hologram remains in the correct spatial position to visualize the needle trajectory during the procedure as if conducting real-time ultrasound. User testing was performed using a lumbar spine phantom. Results: Preliminary work demonstrates that novice (2 anesthesia residents) and experienced operators (5 attending anesthesiologists) can rapidly learn to use mixed reality holograms to perform neuraxial anesthesia on lumbar spine phantoms. Conclusions: Our study shows promising results for performing neuraxial anesthesia in phantoms using the HoloLens. Although this may have wide-ranging implications for image-guided therapies, further study is required to quantify the accuracy and safety benefit of using holographic guidance. 
Trial Registration: ClinicalTrials.gov NCT04028284; https://clinicaltrials.gov/ct2/show/NCT04028284 ", doi="10.2196/36931", url="https://formative.jmir.org/2022/6/e36931", url="http://www.ncbi.nlm.nih.gov/pubmed/35737430" } @Article{info:doi/10.2196/34753, author="Chekin, Nasrin and Ayatollahi, Haleh and Karimi Zarchi, Mojgan", title="A Clinical Decision Support System for Assessing the Risk of Cervical Cancer: Development and Evaluation Study", journal="JMIR Med Inform", year="2022", month="Jun", day="22", volume="10", number="6", pages="e34753", keywords="cervical cancer", keywords="clinical decision support system", keywords="risk assessment", keywords="medical informatics", keywords="cancer", keywords="oncology", keywords="decision support", keywords="risk", keywords="CDSS", keywords="cervical", keywords="prototype", keywords="evaluation", keywords="testing", abstract="Background: Cervical cancer has been recognized as a preventable type of cancer. As the assessment of all the risk factors of a disease is challenging for physicians, information technology and risk assessment models have been used to estimate the degree of risk. Objective: The aim of this study was to develop a clinical decision support system to assess the risk of cervical cancer. Methods: This study was conducted in 2 phases in 2021. In the first phase of the study, 20 gynecologists completed a questionnaire to determine the essential parameters for assessing the risk of cervical cancer, and the data were analyzed using descriptive statistics. In the second phase of the study, the prototype of the clinical decision support system was developed and evaluated. Results: The findings revealed that the most important parameters for assessing the risk of cervical cancer consisted of general and specific parameters. In total, the 8 parameters that had the greatest impact on the risk of cervical cancer were selected. After developing the clinical decision support system, it was evaluated and the mean values of sensitivity, specificity, and accuracy were 85.81\%, 93.82\%, and 91.39\%, respectively. Conclusions: The clinical decision support system developed in this study can facilitate the process of identifying people who are at risk of developing cervical cancer. In addition, it can help to increase the quality of health care and reduce the costs associated with the treatment of cervical cancer. ", doi="10.2196/34753", url="https://medinform.jmir.org/2022/6/e34753", url="http://www.ncbi.nlm.nih.gov/pubmed/35731549" } @Article{info:doi/10.2196/34141, author="Cooper, R. Ian and Lindsay, Cameron and Fraser, Keaton and Hill, T. Tiffany and Siu, Andrew and Fletcher, Sarah and Klimas, Jan and Hamilton, Michee-Ana and Frazer, D. Amanda and Humphrys, Elka and Koepke, Kira and Hedden, Lindsay and Price, Morgan and McCracken, K. Rita", title="Finding Primary Care---Repurposing Physician Registration Data to Generate a Regionally Accurate List of Primary Care Clinics: Development and Validation of an Open-Source Algorithm", journal="JMIR Form Res", year="2022", month="Jun", day="22", volume="6", number="6", pages="e34141", keywords="physicians, primary care", keywords="primary health care", keywords="health services accessibility", keywords="practice patterns, physicians", keywords="physicians' offices", keywords="computing methodologies", keywords="algorithms", abstract="Background: Some Canadians have limited access to longitudinal primary care, despite its known advantages for population health. 
Current initiatives to transform primary care aim to increase access to team-based primary care clinics. However, many regions lack a reliable method to enumerate clinics, limiting estimates of clinical capacity and ongoing access gaps. A region-based complete clinic list is needed to effectively describe clinic characteristics and to compare primary care outcomes at the clinic level. Objective: The objective of this study is to show how publicly available data sources, including the provincial physician license registry, can be used to generate a verifiable, region-wide list of primary care clinics in British Columbia, Canada, using a process named the Clinic List Algorithm (CLA). Methods: The CLA has 10 steps: (1) collect data sets, (2) develop clinic inclusion and exclusion criteria, (3) process data sets, (4) consolidate data sets, (5) transform from list of physicians to initial list of clinics, (6) add additional metadata, (7) create working lists, (8) verify working lists, (9) consolidate working lists, and (10) adjust processing steps based on learnings. Results: The College of Physicians and Surgeons of British Columbia Registry contained 13,726 physicians, at 2915 unique addresses, 6942 (50.58\%) of whom were family physicians (FPs) licensed to practice in British Columbia. The CLA identified 1239 addresses where primary care was delivered by 4262 (61.39\%) FPs. Of the included addresses, 84.50\% (n=1047) were in urban locations, and there was a median of 2 (IQR 2-4, range 1-23) FPs at each unique address. Conclusions: The CLA provides a region-wide description of primary care clinics that improves on simple counts of primary care providers or self-report lists. It identifies the number and location of primary care clinics and excludes primary care providers who are likely not providing community-based primary care. Such information may be useful for estimates of capacity of primary care, as well as for policy planning and research in regions engaged in primary care evaluation or transformation. ", doi="10.2196/34141", url="https://formative.jmir.org/2022/6/e34141", url="http://www.ncbi.nlm.nih.gov/pubmed/35731556" } @Article{info:doi/10.2196/28013, author="Fortmann, Jonas and Lutz, Marlene and Spreckelsen, Cord", title="System for Context-Specific Visualization of Clinical Practice Guidelines (GuLiNav): Concept and Software Implementation", journal="JMIR Form Res", year="2022", month="Jun", day="22", volume="6", number="6", pages="e28013", keywords="clinical practice guideline", keywords="clinical decision support system", keywords="decision support techniques", keywords="computer-assisted decision making", keywords="guideline representation", keywords="workflow control patterns", keywords="workflow", keywords="clinical", keywords="decision making", keywords="support systems", keywords="software", keywords="eHealth", keywords="electronic health", abstract="Background: Clinical decision support systems often adopt and operationalize existing clinical practice guidelines leading to higher guideline availability, increased guideline adherence, and data integration. Most of these systems use an internal state-based model of a clinical practice guideline to derive recommendations but do not provide the user with comprehensive insight into the model. Objective: Here we present a novel approach based on dynamic guideline visualization that incorporates the individual patient's current treatment context. 
Methods: We derived multiple requirements to be fulfilled by such an enhanced guideline visualization. Using business process model and notation as the representation format for computer-interpretable guidelines, a combination of graph-based representation and logical inferences is adopted for guideline processing. A context-specific guideline visualization is inferred using a business rules engine. Results: We implemented and piloted an algorithmic approach for guideline interpretation and processing. As a result of this interpretation, a context-specific guideline is derived and visualized. Our implementation can be used as a software library but also provides a representational state transfer interface. Spring, Camunda, and Drools served as the main frameworks for implementation. A formative usability evaluation of a demonstrator tool that uses the visualization yielded high acceptance among clinicians. Conclusions: The novel guideline processing and visualization concept proved to be technically feasible. The approach addresses known problems of guideline-based clinical decision support systems. Further research is necessary to evaluate the applicability of the approach in specific medical use cases. ", doi="10.2196/28013", url="https://formative.jmir.org/2022/6/e28013", url="http://www.ncbi.nlm.nih.gov/pubmed/35731571" } @Article{info:doi/10.2196/37004, author="Dang, Ting and Han, Jing and Xia, Tong and Spathis, Dimitris and Bondareva, Erika and Siegele-Brown, Chlo{\"e} and Chauhan, Jagmohan and Grammenos, Andreas and Hasthanasombat, Apinan and Floto, Andres R. and Cicuta, Pietro and Mascolo, Cecilia", title="Exploring Longitudinal Cough, Breath, and Voice Data for COVID-19 Progression Prediction via Sequential Deep Learning: Model Development and Validation", journal="J Med Internet Res", year="2022", month="Jun", day="21", volume="24", number="6", pages="e37004", keywords="COVID-19", keywords="audio", keywords="COVID-19 progression", keywords="deep learning", keywords="mobile health", keywords="longitudinal study", abstract="Background: Recent work has shown the potential of using audio data (eg, cough, breathing, and voice) in the screening for COVID-19. However, these approaches only focus on one-off detection and detect the infection, given the current audio sample, but do not monitor disease progression in COVID-19. Limited exploration has been put forward to continuously monitor COVID-19 progression, especially recovery, through longitudinal audio data. Tracking disease progression characteristics and patterns of recovery could bring insights and lead to more timely treatment or treatment adjustment, as well as better resource management in health care systems. Objective: The primary objective of this study is to explore the potential of longitudinal audio samples over time for COVID-19 progression prediction and, especially, recovery trend prediction using sequential deep learning techniques. Methods: Crowdsourced respiratory audio data, including breathing, cough, and voice samples, from 212 individuals over 5-385 days were analyzed, alongside their self-reported COVID-19 test results. We developed and validated a deep learning--enabled tracking tool using gated recurrent units (GRUs) to detect COVID-19 progression by exploring the audio dynamics of the individuals' historical audio biomarkers. 
The investigation comprised 2 parts: (1) COVID-19 detection in terms of positive and negative (healthy) tests using sequential audio signals, which was primarily assessed in terms of the area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity, with 95\% CIs, and (2) longitudinal disease progression prediction over time in terms of probability of positive tests, which was evaluated using the correlation between the predicted probability trajectory and self-reported labels. Results: We first explored the benefits of capturing longitudinal dynamics of audio biomarkers for COVID-19 detection. The strong performance, yielding an AUROC of 0.79, a sensitivity of 0.75, and a specificity of 0.71 supported the effectiveness of the approach compared to methods that do not leverage longitudinal dynamics. We further examined the predicted disease progression trajectory, which displayed high consistency with longitudinal test results with a correlation of 0.75 in the test cohort and 0.86 in a subset of the test cohort with 12 (57.1\%) of 21 COVID-19--positive participants who reported disease recovery. Our findings suggest that monitoring COVID-19 evolution via longitudinal audio data has potential in the tracking of individuals' disease progression and recovery. Conclusions: An audio-based COVID-19 progression monitoring system was developed using deep learning techniques, with strong performance showing high consistency between the predicted trajectory and the test results over time, especially for recovery trend predictions. This has good potential in the postpeak and postpandemic era that can help guide medical treatment and optimize hospital resource allocations. The changes in longitudinal audio samples, referred to as audio dynamics, are associated with COVID-19 progression; thus, modeling the audio dynamics can potentially capture the underlying disease progression process and further aid COVID-19 progression prediction. This framework provides a flexible, affordable, and timely tool for COVID-19 tracking, and more importantly, it also provides a proof of concept of how telemonitoring could be applicable to respiratory diseases monitoring, in general. ", doi="10.2196/37004", url="https://www.jmir.org/2022/6/e37004", url="http://www.ncbi.nlm.nih.gov/pubmed/35653606" } @Article{info:doi/10.2196/32867, author="Doerstling, S. Steven and Akrobetu, Dennis and Engelhard, M. Matthew and Chen, Felicia and Ubel, A. Peter", title="A Disease Identification Algorithm for Medical Crowdfunding Campaigns: Validation Study", journal="J Med Internet Res", year="2022", month="Jun", day="21", volume="24", number="6", pages="e32867", keywords="crowdfunding", keywords="natural language processing", keywords="named entity recognition", keywords="health care costs", keywords="GoFundMe", abstract="Background: Web-based crowdfunding has become a popular method to raise money for medical expenses, and there is growing research interest in this topic. However, crowdfunding data are largely composed of unstructured text, thereby posing many challenges for researchers hoping to answer questions about specific medical conditions. Previous studies have used methods that either failed to address major challenges or were poorly scalable to large sample sizes. To enable further research on this emerging funding mechanism in health care, better methods are needed. 
Objective: We sought to validate an algorithm for identifying 11 disease categories in web-based medical crowdfunding campaigns. We hypothesized that a disease identification algorithm combining a named entity recognition (NER) model and word search approach could identify disease categories with high precision and accuracy. Such an algorithm would facilitate further research using these data. Methods: Web scraping was used to collect data on medical crowdfunding campaigns from GoFundMe (GoFundMe Inc). Using pretrained NER and entity resolution models from Spark NLP for Healthcare in combination with targeted keyword searches, we constructed an algorithm to identify conditions in the campaign descriptions, translate conditions to International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) codes, and predict the presence or absence of 11 disease categories in the campaigns. The classification performance of the algorithm was evaluated against 400 manually labeled campaigns. Results: We collected data on 89,645 crowdfunding campaigns through web scraping. The interrater reliability for detecting the presence of broad disease categories in the campaign descriptions was high (Cohen $\kappa$: range 0.69-0.96). The NER and entity resolution models identified 6594 unique (276,020 total) ICD-10-CM codes among all of the crowdfunding campaigns in our sample. Through our word search, we identified 3261 additional campaigns for which a medical condition was not otherwise detected with the NER model. When averaged across all disease categories and weighted by the number of campaigns that mentioned each disease category, the algorithm demonstrated an overall precision of 0.83 (range 0.48-0.97), a recall of 0.77 (range 0.42-0.98), an F1 score of 0.78 (range 0.56-0.96), and an accuracy of 95\% (range 90\%-98\%). Conclusions: A disease identification algorithm combining pretrained natural language processing models and ICD-10-CM code--based disease categorization was able to detect 11 disease categories in medical crowdfunding campaigns with high precision and accuracy. 
", doi="10.2196/32867", url="https://www.jmir.org/2022/6/e32867", url="http://www.ncbi.nlm.nih.gov/pubmed/35727610" } @Article{info:doi/10.2196/35421, author="Choudhury, Avishek", title="Toward an Ecologically Valid Conceptual Framework for the Use of Artificial Intelligence in Clinical Settings: Need for Systems Thinking, Accountability, Decision-making, Trust, and Patient Safety Considerations in Safeguarding the Technology and Clinicians", journal="JMIR Hum Factors", year="2022", month="Jun", day="21", volume="9", number="2", pages="e35421", keywords="health care", keywords="artificial intelligence", keywords="ecological validity", keywords="trust in AI", keywords="clinical workload", keywords="patient safety", keywords="AI accountability", keywords="reliability", doi="10.2196/35421", url="https://humanfactors.jmir.org/2022/2/e35421", url="http://www.ncbi.nlm.nih.gov/pubmed/35727615" } @Article{info:doi/10.2196/34741, author="Dinh, Nhi and Agarwal, Smisha and Avery, Lisa and Ponnappan, Priya and Chelangat, Judith and Amendola, Paul and Labrique, Alain and Bartlett, Linda", title="Implementation Outcomes Assessment of a Digital Clinical Support Tool for Intrapartum Care in Rural Kenya: Observational Analysis", journal="JMIR Form Res", year="2022", month="Jun", day="20", volume="6", number="6", pages="e34741", keywords="newborn", keywords="neonatal health", keywords="maternal health", keywords="intrapartum care", keywords="labor and delivery", keywords="Kenya", keywords="digital clinical decision support", keywords="health information systems", keywords="digital health", keywords="implementation research", abstract="Background: iDeliver, a digital clinical support system for maternal and neonatal care, was developed to support quality of care improvements in Kenya. Objective: Taking an implementation research approach, we evaluated the adoption and fidelity of iDeliver over time and assessed the feasibility of its use to provide routine Ministry of Health (MOH) reports. Methods: We analyzed routinely collected data from iDeliver, which was implemented at the Transmara West Sub-County Hospital from December 2018 to September 2020. To evaluate its adoption, we assessed the proportion of actual facility deliveries that was recorded in iDeliver over time. We evaluated the fidelity of iDeliver use by studying the completeness of data entry by care providers during each stage of the labor and delivery workflow and whether the use reflected iDeliver's envisioned function. We also examined the data completeness of the maternal and neonatal indicators prioritized by the Kenya MOH. Results: A total of 1164 deliveries were registered in iDeliver, capturing 45.31\% (1164/2569) of the facility's deliveries over 22 months. This uptake of registration improved significantly over time by 6.7\% (SE 2.1) on average in each quarter-year (P=.005), from 9.6\% (15/157) in the fourth quarter of 2018 to 64\% (235/367) in the third quarter of 2020. Across iDeliver's workflow, the overall completion rate of all variables improved significantly by 2.9\% (SE 0.4) on average in each quarter-year (P<.001), from 22.25\% (257/1155) in the fourth quarter of 2018 to 49.21\% (8905/18,095) in the third quarter of 2020. Data completion was highest for the discharge-labor summary stage (16,796/23,280, 72.15\%) and lowest for the labor signs stage (848/5820, 14.57\%). 
The completion rate of the key MOH indicators also improved significantly by 4.6\% (SE 0.5) on average in each quarter-year (P<.001), from 27.1\% (69/255) in the fourth quarter of 2018 to 83.75\% (3346/3995) in the third quarter of 2020. Conclusions: iDeliver's adoption and data completeness improved significantly over time. The assessment of iDeliver's use fidelity suggested that some features were more easily used because providers had time to enter data; however, there was low use during active childbirth, which is when providers are necessarily engaged with the woman and newborn. These insights on the adoption and fidelity of iDeliver use prompted the team to adapt the application to reflect the users' culture of use and further improve the implementation of iDeliver. ", doi="10.2196/34741", url="https://formative.jmir.org/2022/6/e34741", url="http://www.ncbi.nlm.nih.gov/pubmed/35723911" } @Article{info:doi/10.2196/36958, author="Yang, Hao and Li, Jiaxi and Liu, Siru and Yang, Xiaoling and Liu, Jialin", title="Predicting Risk of Hypoglycemia in Patients With Type 2 Diabetes by Electronic Health Record--Based Machine Learning: Development and Validation", journal="JMIR Med Inform", year="2022", month="Jun", day="16", volume="10", number="6", pages="e36958", keywords="diabetes", keywords="type 2 diabetes", keywords="hypoglycemia", keywords="learning", keywords="machine learning model", keywords="EHR", keywords="electronic health record", keywords="XGBoost", keywords="natural language processing", abstract="Background: Hypoglycemia is a common adverse event in the treatment of diabetes. To efficiently cope with hypoglycemia, effective hypoglycemia prediction models need to be developed. Objective: The aim of this study was to develop and validate machine learning models to predict the risk of hypoglycemia in adult patients with type 2 diabetes. Methods: We used the electronic health records of all adult patients with type 2 diabetes admitted to West China Hospital between November 2019 and December 2021. The prediction model was developed based on XGBoost and natural language processing. F1 score, area under the receiver operating characteristic curve (AUC), and decision curve analysis (DCA) were used as the main criteria to evaluate model performance. Results: We included 29,843 patients with type 2 diabetes, of whom 2804 patients (9.4\%) developed hypoglycemia. In this study, the embedding machine learning model (XGBoost3) showed the best performance among all the models. The AUC and the accuracy of XGBoost are 0.82 and 0.93, respectively. XGBoost3 was also superior to other models in DCA. Conclusions: The Paragraph Vector--Distributed Memory model can effectively extract features and improve the performance of the XGBoost model, which can then effectively predict hypoglycemia in patients with type 2 diabetes. 
", doi="10.2196/36958", url="https://medinform.jmir.org/2022/6/e36958", url="http://www.ncbi.nlm.nih.gov/pubmed/35708754" } @Article{info:doi/10.2196/37689, author="Yang, Donghun and Kim, Jimin and Yoo, Junsang and Cha, Chul Won and Paik, Hyojung", title="Identifying the Risk of Sepsis in Patients With Cancer Using Digital Health Care Records: Machine Learning--Based Approach", journal="JMIR Med Inform", year="2022", month="Jun", day="15", volume="10", number="6", pages="e37689", keywords="sepsis", keywords="cancer", keywords="EHR", keywords="machine learning", keywords="deep learning", keywords="mortality rate", keywords="learning model", keywords="electronic health record", keywords="network based analysis", keywords="sepsis risk", keywords="risk model", keywords="prediction model", abstract="Background: Sepsis is diagnosed in millions of people every year, resulting in a high mortality rate. Although patients with sepsis present multimorbid conditions, including cancer, sepsis predictions have mainly focused on patients with severe injuries. Objective: In this paper, we present a machine learning--based approach to identify the risk of sepsis in patients with cancer using electronic health records (EHRs). Methods: We utilized deidentified anonymized EHRs of 8580 patients with cancer from the Samsung Medical Center in Korea in a longitudinal manner between 2014 and 2019. To build a prediction model based on physical status that would differ between sepsis and nonsepsis patients, we analyzed 2462 laboratory test results and 2266 medication prescriptions using graph network and statistical analyses. The medication relationships and lab test results from each analysis were used as additional learning features to train our predictive model. Results: Patients with sepsis showed differential medication trajectories and physical status. For example, in the network-based analysis, narcotic analgesics were prescribed more often in the sepsis group, along with other drugs. Likewise, 35 types of lab tests, including albumin, globulin, and prothrombin time, showed significantly different distributions between sepsis and nonsepsis patients (P<.001). Our model outperformed the model trained using only common EHRs, showing an improved accuracy, area under the receiver operating characteristic (AUROC), and F1 score by 11.9\%, 11.3\%, and 13.6\%, respectively. For the random forest--based model, the accuracy, AUROC, and F1 score were 0.692, 0.753, and 0.602, respectively. Conclusions: We showed that lab tests and medication relationships can be used as efficient features for predicting sepsis in patients with cancer. Consequently, identifying the risk of sepsis in patients with cancer using EHRs and machine learning is feasible. 
", doi="10.2196/37689", url="https://medinform.jmir.org/2022/6/e37689", url="http://www.ncbi.nlm.nih.gov/pubmed/35704364" } @Article{info:doi/10.2196/34678, author="Weinert, Lina and M{\"u}ller, Julia and Svensson, Laura and Heinze, Oliver", title="Perspective of Information Technology Decision Makers on Factors Influencing Adoption and Implementation of Artificial Intelligence Technologies in 40 German Hospitals: Descriptive Analysis", journal="JMIR Med Inform", year="2022", month="Jun", day="15", volume="10", number="6", pages="e34678", keywords="artificial intelligence", keywords="AI readiness", keywords="implementation", keywords="decision-making", keywords="descriptive analysis", keywords="quantitative study", abstract="Background: New artificial intelligence (AI) tools are being developed at a high speed. However, strategies and practical experiences surrounding the adoption and implementation of AI in health care are lacking. This is likely because of the high implementation complexity of AI, legacy IT infrastructure, and unclear business cases, thus complicating AI adoption. Research has recently started to identify the factors influencing AI readiness of organizations. Objective: This study aimed to investigate the factors influencing AI readiness as well as possible barriers to AI adoption and implementation in German hospitals. We also assessed the status quo regarding the dissemination of AI tools in hospitals. We focused on IT decision makers, a seldom studied but highly relevant group. Methods: We created a web-based survey based on recent AI readiness and implementation literature. Participants were identified through a publicly accessible database and contacted via email or invitational leaflets sent by mail, in some cases accompanied by a telephonic prenotification. The survey responses were analyzed using descriptive statistics. Results: We contacted 609 possible participants, and our database recorded 40 completed surveys. Most participants agreed or rather agreed with the statement that AI would be relevant in the future, both in Germany (37/40, 93\%) and in their own hospital (36/40, 90\%). Participants were asked whether their hospitals used or planned to use AI technologies. Of the 40 participants, 26 (65\%) answered ``yes.'' Most AI technologies were used or planned for patient care, followed by biomedical research, administration, and logistics and central purchasing. The most important barriers to AI were lack of resources (staff, knowledge, and financial). Relevant possible opportunities for using AI were increase in efficiency owing to time-saving effects, competitive advantages, and increase in quality of care. Most AI tools in use or in planning have been developed with external partners. Conclusions: Few tools have been implemented in routine care, and many hospitals do not use or plan to use AI in the future. This can likely be explained by missing or unclear business cases or the need for a modern IT infrastructure to integrate AI tools in a usable manner. These shortcomings complicate decision-making and resource attribution. As most AI technologies already in use were developed in cooperation with external partners, these relationships should be fostered. IT decision makers should assess their hospitals' readiness for AI individually with a focus on resources. Further research should continue to monitor the dissemination of AI tools and readiness factors to determine whether improvements can be made over time. 
This monitoring is especially important with regard to government-supported investments in AI technologies that could alleviate financial burdens. Qualitative studies with hospital IT decision makers should be conducted to further explore the reasons for slow AI adoption. ", doi="10.2196/34678", url="https://medinform.jmir.org/2022/6/e34678", url="http://www.ncbi.nlm.nih.gov/pubmed/35704378" } @Article{info:doi/10.2196/36787, author="Elnakib, Shatha and Vecino-Ortiz, I. Andres and Gibson, G. Dustin and Agarwal, Smisha and Trujillo, J. Antonio and Zhu, Yifan and Labrique, B. Alain", title="A Novel Score for mHealth Apps to Predict and Prevent Mortality: Further Validation and Adaptation to the US Population Using the US National Health and Nutrition Examination Survey Data Set", journal="J Med Internet Res", year="2022", month="Jun", day="14", volume="24", number="6", pages="e36787", keywords="C-Score", keywords="validation", keywords="mortality", keywords="predictive models", keywords="mobile phone", abstract="Background: The C-Score, which is an individual health score, is based on a predictive model validated in the UK and US populations. It was designed to serve as an individualized point-in-time health assessment tool that could be integrated into clinical counseling or consumer-facing digital health tools to encourage lifestyle modifications that reduce the risk of premature death. Objective: Our study aimed to conduct an external validation of the C-Score in the US population and expand the original score to improve its predictive capabilities in the US population. The C-Score is intended for mobile health apps on wearable devices. Methods: We conducted a literature review to identify relevant variables that were missing in the original C-Score. Subsequently, we used data from the 2005 to 2014 US National Health and Nutrition Examination Survey (NHANES; N=21,015) to test the capacity of the model to predict all-cause mortality. We used NHANES III data from 1988 to 1994 (N=1440) to conduct an external validation of the test. Only participants with complete data were included in this study. Discrimination and calibration tests were conducted to assess the operational characteristics of the adapted C-Score from receiver operating characteristic curves and a design-based goodness-of-fit test. Results: Higher C-Scores were associated with reduced odds of all-cause mortality (odds ratio 0.96, P<.001). We found a good fit of the C-Score for all-cause mortality with an area under the curve (AUC) of 0.72. Among participants aged between 40 and 69 years, C-Score models had a good fit for all-cause mortality and an AUC >0.72. A sensitivity analysis using NHANES III data (1988-1994) was performed, yielding similar results. The inclusion of sociodemographic and clinical variables in the basic C-Score increased the AUCs from 0.72 (95\% CI 0.71-0.73) to 0.87 (95\% CI 0.85-0.88). Conclusions: Our study shows that this digital biomarker, the C-Score, has good capabilities to predict all-cause mortality in the general US population. An expanded health score can predict 87\% of the mortality in the US population. This model can be used as an instrument to assess individual mortality risk and as a counseling tool to motivate behavior changes and lifestyle modifications. 
", doi="10.2196/36787", url="https://www.jmir.org/2022/6/e36787", url="http://www.ncbi.nlm.nih.gov/pubmed/35483022" } @Article{info:doi/10.2196/34554, author="Paquette, Fran{\c{c}}ois-Xavier and Ghassemi, Amir and Bukhtiyarova, Olga and Cisse, Moustapha and Gagnon, Natanael and Della Vecchia, Alexia and Rabearivelo, A. Hobivola and Loudiyi, Youssef", title="Machine Learning Support for Decision-Making in Kidney Transplantation: Step-by-step Development of a Technological Solution", journal="JMIR Med Inform", year="2022", month="Jun", day="14", volume="10", number="6", pages="e34554", keywords="machine learning", keywords="artificial intelligence", keywords="medical decision support", keywords="kidney transplantation", abstract="Background: Kidney transplantation is the preferred treatment option for patients with end-stage renal disease. To maximize patient and graft survival, the allocation of donor organs to potential recipients requires careful consideration. Objective: This study aimed to develop an innovative technological solution to enable better prediction of kidney transplant survival for each potential donor-recipient pair. Methods: We used deidentified data on past organ donors, recipients, and transplant outcomes in the United States from the Scientific Registry of Transplant Recipients. To predict transplant outcomes for potential donor-recipient pairs, we used several survival analysis models, including regression analysis (Cox proportional hazards), random survival forests, and several artificial neural networks (DeepSurv, DeepHit, and recurrent neural network [RNN]). We evaluated the performance of each model in terms of its ability to predict the probability of graft survival after kidney transplantation from deceased donors. Three metrics were used: the C-index, integrated Brier score, and integrated calibration index, along with calibration plots. Results: On the basis of the C-index metrics, the neural network--based models (DeepSurv, DeepHit, and RNN) had better discriminative ability than the Cox model and random survival forest model (0.650, 0.661, and 0.659 vs 0.646 and 0.644, respectively). The proposed RNN model offered a compromise between the good discriminative ability and calibration and was implemented in a technological solution of technology readiness level 4. Conclusions: Our technological solution based on the RNN model can effectively predict kidney transplant survival and provide support for medical professionals and candidate recipients in determining the most optimal donor-recipient pair. 
", doi="10.2196/34554", url="https://medinform.jmir.org/2022/6/e34554", url="http://www.ncbi.nlm.nih.gov/pubmed/35700006" } @Article{info:doi/10.2196/36501, author="Fujimori, Ryo and Liu, Keibun and Soeno, Shoko and Naraba, Hiromu and Ogura, Kentaro and Hara, Konan and Sonoo, Tomohiro and Ogura, Takayuki and Nakamura, Kensuke and Goto, Tadahiro", title="Acceptance, Barriers, and Facilitators to Implementing Artificial Intelligence--Based Decision Support Systems in Emergency Departments: Quantitative and Qualitative Evaluation", journal="JMIR Form Res", year="2022", month="Jun", day="13", volume="6", number="6", pages="e36501", keywords="clinical decision support system", keywords="preimplementation", keywords="qualitative", keywords="mixed methods", keywords="artificial intelligence", keywords="emergency medicine", keywords="CDSS", keywords="computerized decision", keywords="computerized decision support system", keywords="AI", keywords="AI-based", keywords="CFIR", keywords="quantitative analysis", abstract="Background: Despite the increasing availability of clinical decision support systems (CDSSs) and rising expectation for CDSSs based on artificial intelligence (AI), little is known about the acceptance of AI-based CDSS by physicians and its barriers and facilitators in emergency care settings. Objective: We aimed to evaluate the acceptance, barriers, and facilitators to implementing AI-based CDSSs in the emergency care setting through the opinions of physicians on our newly developed, real-time AI-based CDSS, which alerts ED physicians by predicting aortic dissection based on numeric and text information from medical charts, by using the Unified Theory of Acceptance and Use of Technology (UTAUT; for quantitative evaluation) and the Consolidated Framework for Implementation Research (CFIR; for qualitative evaluation) frameworks. Methods: This mixed methods study was performed from March to April 2021. Transitional year residents (n=6), emergency medicine residents (n=5), and emergency physicians (n=3) from two community, tertiary care hospitals in Japan were included. We first developed a real-time CDSS for predicting aortic dissection based on numeric and text information from medical charts (eg, chief complaints, medical history, vital signs) with natural language processing. This system was deployed on the internet, and the participants used the system with clinical vignettes of model cases. Participants were then involved in a mixed methods evaluation consisting of a UTAUT-based questionnaire with a 5-point Likert scale (quantitative) and a CFIR-based semistructured interview (qualitative). Cronbach $\alpha$ was calculated as a reliability estimate for UTAUT subconstructs. Interviews were sampled, transcribed, and analyzed using the MaxQDA software. The framework analysis approach was used during the study to determine the relevance of the CFIR constructs. Results: All 14 participants completed the questionnaires and interviews. Quantitative analysis revealed generally positive responses for user acceptance with all scores above the neutral score of 3.0. In addition, the mixed methods analysis identified two significant barriers (System Performance, Compatibility) and two major facilitators (Evidence Strength, Design Quality) for implementation of AI-based CDSSs in emergency care settings. Conclusions: Our mixed methods evaluation based on theoretically grounded frameworks revealed the acceptance, barriers, and facilitators of implementation of AI-based CDSS. 
Although the concern of system failure and overtrusting of the system could be barriers to implementation, the locality of the system and designing an intuitive user interface could likely facilitate the use of optimal AI-based CDSS. Alleviating and resolving these factors should be key to achieving good user acceptance of AI-based CDSS. ", doi="10.2196/36501", url="https://formative.jmir.org/2022/6/e36501", url="http://www.ncbi.nlm.nih.gov/pubmed/35699995" } @Article{info:doi/10.2196/30210, author="Chin, Kuan-Chen and Cheng, Yu-Chia and Sun, Jen-Tang and Ou, Chih-Yen and Hu, Chun-Hua and Tsai, Ming-Chi and Ma, Huei-Ming Matthew and Chiang, Wen-Chu and Chen, Y. Albert", title="Machine Learning--Based Text Analysis to Predict Severely Injured Patients in Emergency Medical Dispatch: Model Development and Validation", journal="J Med Internet Res", year="2022", month="Jun", day="10", volume="24", number="6", pages="e30210", keywords="emergency medical service", keywords="emergency medical dispatch", keywords="dispatcher", keywords="trauma", keywords="machine learning", keywords="frequency--inverse document frequency", keywords="Bernoulli na{\"i}ve Bayes", abstract="Background: Early recognition of severely injured patients in prehospital settings is of paramount importance for timely treatment and transportation of patients to further treatment facilities. The dispatching accuracy has seldom been addressed in previous studies. Objective: In this study, we aimed to build a machine learning--based model through text mining of emergency calls for the automated identification of severely injured patients after a road accident. Methods: Audio recordings of road accidents in Taipei City, Taiwan, in 2018 were obtained and randomly sampled. Data on call transfers or non-Mandarin speeches were excluded. To predict cases of severe trauma identified on-site by emergency medical technicians, all included cases were evaluated by both humans (6 dispatchers) and a machine learning model, that is, a prehospital-activated major trauma (PAMT) model. The PAMT model was developed using term frequency--inverse document frequency, rule-based classification, and a Bernoulli na{\"i}ve Bayes classifier. Repeated random subsampling cross-validation was applied to evaluate the robustness of the model. The prediction performance of dispatchers and the PAMT model, in severe cases, was compared. Performance was indicated by sensitivity, specificity, positive predictive value, negative predictive value, and accuracy. Results: Although the mean sensitivity and negative predictive value obtained by the PAMT model were higher than those of dispatchers, they obtained higher mean specificity, positive predictive value, and accuracy. The mean accuracy of the PAMT model, from certainty level 0 (lowest certainty) to level 6 (highest certainty), was higher except for levels 5 and 6. The overall performances of the dispatchers and the PAMT model were similar; however, the PAMT model had higher accuracy in cases where the dispatchers were less certain of their judgments. Conclusions: A machine learning--based model, called the PAMT model, was developed to predict severe road accident trauma. The results of our study suggest that the accuracy of the PAMT model is not superior to that of the participating dispatchers; however, it may assist dispatchers when they lack confidence while making a judgment. 
", doi="10.2196/30210", url="https://www.jmir.org/2022/6/e30210", url="http://www.ncbi.nlm.nih.gov/pubmed/35687393" } @Article{info:doi/10.2196/32439, author="Rivas Velarde, Minerva and Jagoe, Caroline and Cuculick, Jessica", title="Video Relay Interpretation and Overcoming Barriers in Health Care for Deaf Users: Scoping Review", journal="J Med Internet Res", year="2022", month="Jun", day="9", volume="24", number="6", pages="e32439", keywords="deafness", keywords="disability", keywords="accessibility", keywords="communication", keywords="video", keywords="remote interpretation", keywords="health care", keywords="system", keywords="deaf users", keywords="sign language", keywords="interpreter", keywords="medical interpretation", keywords="mobile phone", abstract="Background: Persons who are deaf are more likely to avoid health care providers than those who can hear, partially because of the lack of means of communication with these providers and the dearth of available interpreters. The use of video remote interpretation, namely the video camera on an electronic device, to connect deaf patients and health providers has rapidly expanded owing to its flexibility and advantageous cost compared with in-person sign language interpretation. Thus, we need to learn more about how this technology could effectively engage with and respond to the priorities of its users. Objective: We aimed to identify existing evidence regarding the use of video remote interpretation (VRI) in health care settings and to assess whether VRI technology can enable deaf users to overcome barriers to interpretation and improve communication outcomes between them and health care personnel. Methods: We conducted a search in 7 medical research databases (including MEDLINE, Web of Science, Embase, and Google Scholar) from 2006 including bibliographies and citations of relevant papers. The searches included articles in English, Spanish, and French. The eligibility criteria for study selection included original articles on the use of VRI for deaf or hard of hearing (DHH) sign language users for, or within, health care. Results: From the original 176 articles identified, 120 were eliminated after reading the article title and abstract, and 41 articles were excluded after they were fully read. In total, 15 articles were included in this study: 4 studies were literature reviews, 4 were surveys, 3 were qualitative studies, and 1 was a mixed methods study that combined qualitative and quantitative data, 1 brief communication, 1 quality improvement report, and 1 secondary analysis. In this scoping review, we identified a knowledge gap regarding the quality of interpretation and training in sign language interpretation for health care. It also shows that this area is underresearched, and evidence is scant. All evidence came from high-income countries, which is particularly problematic given that most DHH persons live in low- and middle-income countries. Conclusions: Furthering our understanding of the use of VRI technology is pertinent and relevant. The available literature shows that VRI may enable deaf users to overcome interpretation barriers and can potentially improve communication outcomes between them and health personnel within health care services. For VRI to be acceptable, sign language users require a VRI system supported by devices with large screens and a reliable internet connection, as well as qualified interpreters trained on medical interpretation. 
", doi="10.2196/32439", url="https://www.jmir.org/2022/6/e32439", url="http://www.ncbi.nlm.nih.gov/pubmed/35679099" } @Article{info:doi/10.2196/34295, author="Sun, Hong and Depraetere, Kristof and Meesseman, Laurent and Cabanillas Silva, Patricia and Szymanowsky, Ralph and Fliegenschmidt, Janis and Hulde, Nikolai and von Dossow, Vera and Vanbiervliet, Martijn and De Baerdemaeker, Jos and Roccaro-Waldmeyer, M. Diana and Stieg, J{\"o}rg and Dom{\'i}nguez Hidalgo, Manuel and Dahlweid, Fried-Michael", title="Machine Learning--Based Prediction Models for Different Clinical Risks in Different Hospitals: Evaluation of Live Performance", journal="J Med Internet Res", year="2022", month="Jun", day="7", volume="24", number="6", pages="e34295", keywords="machine learning", keywords="clinical risk prediction", keywords="prediction", keywords="model", keywords="model evaluation", keywords="scalability", keywords="risk", keywords="live clinical workflow", keywords="delirium", keywords="sepsis", keywords="acute kidney injury", keywords="kidney", keywords="EHR", keywords="electronic health record", keywords="workflow", keywords="algorithm", abstract="Background: Machine learning algorithms are currently used in a wide array of clinical domains to produce models that can predict clinical risk events. Most models are developed and evaluated with retrospective data, very few are evaluated in a clinical workflow, and even fewer report performances in different hospitals. In this study, we provide detailed evaluations of clinical risk prediction models in live clinical workflows for three different use cases in three different hospitals. Objective: The main objective of this study was to evaluate clinical risk prediction models in live clinical workflows and compare their performance in these setting with their performance when using retrospective data. We also aimed at generalizing the results by applying our investigation to three different use cases in three different hospitals. Methods: We trained clinical risk prediction models for three use cases (ie, delirium, sepsis, and acute kidney injury) in three different hospitals with retrospective data. We used machine learning and, specifically, deep learning to train models that were based on the Transformer model. The models were trained using a calibration tool that is common for all hospitals and use cases. The models had a common design but were calibrated using each hospital's specific data. The models were deployed in these three hospitals and used in daily clinical practice. The predictions made by these models were logged and correlated with the diagnosis at discharge. We compared their performance with evaluations on retrospective data and conducted cross-hospital evaluations. Results: The performance of the prediction models with data from live clinical workflows was similar to the performance with retrospective data. The average value of the area under the receiver operating characteristic curve (AUROC) decreased slightly by 0.6 percentage points (from 94.8\% to 94.2\% at discharge). The cross-hospital evaluations exhibited severely reduced performance: the average AUROC decreased by 8 percentage points (from 94.2\% to 86.3\% at discharge), which indicates the importance of model calibration with data from the deployment hospital. Conclusions: Calibrating the prediction model with data from different deployment hospitals led to good performance in live settings. 
The performance degradation in the cross-hospital evaluation identified limitations in developing a generic model for different hospitals. Designing a generic process for model development to generate specialized prediction models for each hospital guarantees model performance in different hospitals. ", doi="10.2196/34295", url="https://www.jmir.org/2022/6/e34295", url="http://www.ncbi.nlm.nih.gov/pubmed/35502887" } @Article{info:doi/10.2196/34298, author="Schmude, Marcel and Salim, Nahya and Azadzoy, Hila and Bane, Mustafa and Millen, Elizabeth and O'Donnell, Lisa and Bode, Philipp and T{\"u}rk, Ewelina and Vaidya, Ria and Gilbert, Stephen", title="Investigating the Potential for Clinical Decision Support in Sub-Saharan Africa With AFYA (Artificial Intelligence-Based Assessment of Health Symptoms in Tanzania): Protocol for a Prospective, Observational Pilot Study", journal="JMIR Res Protoc", year="2022", month="Jun", day="7", volume="11", number="6", pages="e34298", keywords="differential diagnosis", keywords="artificial intelligence", keywords="clinical decision support systems", keywords="decision support", keywords="diagnostic decision support systems", keywords="diagnosis", keywords="Africa", keywords="low income", keywords="middle income", keywords="user centred design", keywords="user centered design", keywords="symptom assessment", keywords="chatbot", keywords="health app", keywords="prototype", abstract="Background: Low- and middle-income countries face difficulties in providing adequate health care. One of the reasons is a shortage of qualified health workers. Diagnostic decision support systems are designed to aid clinicians in their work and have the potential to mitigate pressure on health care systems. Objective: The Artificial Intelligence--Based Assessment of Health Symptoms in Tanzania (AFYA) study will evaluate the potential of an English-language artificial intelligence--based prototype diagnostic decision support system for mid-level health care practitioners in a low- or middle-income setting. Methods: This is an observational, prospective clinical study conducted in a busy Tanzanian district hospital. In addition to usual care visits, study participants will consult a mid-level health care practitioner, who will use a prototype diagnostic decision support system, and a study physician. The accuracy and comprehensiveness of the differential diagnosis provided by the diagnostic decision support system will be evaluated against a gold-standard differential diagnosis provided by an expert panel. Results: Patient recruitment started in October 2021. Participants were recruited directly in the waiting room of the outpatient clinic at the hospital. Data collection will conclude in May 2022. Data analysis is planned to be finished by the end of June 2022. The results will be published in a peer-reviewed journal. Conclusions: Most diagnostic decision support systems have been developed and evaluated in high-income countries, but there is great potential for these systems to improve the delivery of health care in low- and middle-income countries. The findings of this real-patient study will provide insights based on the performance and usability of a prototype diagnostic decision support system in low- or middle-income countries. 
Trial Registration: ClinicalTrials.gov NCT04958577; http://clinicaltrials.gov/ct2/show/NCT04958577 International Registered Report Identifier (IRRID): DERR1-10.2196/34298 ", doi="10.2196/34298", url="https://www.researchprotocols.org/2022/6/e34298", url="http://www.ncbi.nlm.nih.gov/pubmed/35671073" } @Article{info:doi/10.2196/37213, author="Li, Shicheng and Deng, Lizong and Zhang, Xu and Chen, Luming and Yang, Tao and Qi, Yifan and Jiang, Taijiao", title="Deep Phenotyping of Chinese Electronic Health Records by Recognizing Linguistic Patterns of Phenotypic Narratives With a Sequence Motif Discovery Tool: Algorithm Development and Validation", journal="J Med Internet Res", year="2022", month="Jun", day="3", volume="24", number="6", pages="e37213", keywords="deep phenotyping", keywords="Chinese EHRs", keywords="linguistic pattern", keywords="motif discovery", keywords="pattern recognition", abstract="Background: Phenotype information in electronic health records (EHRs) is mainly recorded in unstructured free text, which cannot be directly used for clinical research. EHR-based deep-phenotyping methods can structure phenotype information in EHRs with high fidelity, making it the focus of medical informatics. However, developing a deep-phenotyping method for non-English EHRs (ie, Chinese EHRs) is challenging. Although numerous EHR resources exist in China, fine-grained annotation data that are suitable for developing deep-phenotyping methods are limited. It is challenging to develop a deep-phenotyping method for Chinese EHRs in such a low-resource scenario. Objective: In this study, we aimed to develop a deep-phenotyping method with good generalization ability for Chinese EHRs based on limited fine-grained annotation data. Methods: The core of the methodology was to identify linguistic patterns of phenotype descriptions in Chinese EHRs with a sequence motif discovery tool and perform deep phenotyping of Chinese EHRs by recognizing linguistic patterns in free text. Specifically, 1000 Chinese EHRs were manually annotated based on a fine-grained information model, PhenoSSU (Semantic Structured Unit of Phenotypes). The annotation data set was randomly divided into a training set (n=700, 70\%) and a testing set (n=300, 30\%). The process for mining linguistic patterns was divided into three steps. First, free text in the training set was encoded as single-letter sequences (P: phenotype, A: attribute). Second, a biological sequence analysis tool---MEME (Multiple Expectation Maximums for Motif Elicitation)---was used to identify motifs in the single-letter sequences. Finally, the identified motifs were reduced to a series of regular expressions representing linguistic patterns of PhenoSSU instances in Chinese EHRs. Based on the discovered linguistic patterns, we developed a deep-phenotyping method for Chinese EHRs, including a deep learning--based method for named entity recognition and a pattern recognition--based method for attribute prediction. Results: In total, 51 sequence motifs with statistical significance were mined from 700 Chinese EHRs in the training set and were combined into six regular expressions. It was found that these six regular expressions could be learned from a mean of 134 (SD 9.7) annotated EHRs in the training set. The deep-phenotyping algorithm for Chinese EHRs could recognize PhenoSSU instances with an overall accuracy of 0.844 on the test set. 
For the subtask of entity recognition, the algorithm achieved an F1 score of 0.898 with the Bidirectional Encoder Representations from Transformers--bidirectional long short-term memory and conditional random field model; for the subtask of attribute prediction, the algorithm achieved a weighted accuracy of 0.940 with the linguistic pattern--based method. Conclusions: We developed a simple but effective strategy to perform deep phenotyping of Chinese EHRs with limited fine-grained annotation data. Our work will promote the secondary use of Chinese EHRs and give inspiration to other non--English-speaking countries. ", doi="10.2196/37213", url="https://www.jmir.org/2022/6/e37213", url="http://www.ncbi.nlm.nih.gov/pubmed/35657661" } @Article{info:doi/10.2196/32168, author="Yuan, Yingzhe and Price, Megan and Schmidt, F. David and Ward, Merry and Nebeker, Jonathan and Pizer, Steven", title="Integrated Health Record Viewers and Reduction in Duplicate Medical Imaging: Retrospective Observational Analysis", journal="JMIR Med Inform", year="2022", month="May", day="20", volume="10", number="5", pages="e32168", keywords="health informatics", keywords="duplicate medical imaging", keywords="health record viewer", keywords="health care system", keywords="health care", keywords="health records", keywords="electronic health records", keywords="health information exchange", abstract="Background: Health information exchange and multiplatform health record viewers support more informed medical decisions, improve quality of care, and reduce the risk of adverse outcomes due to fragmentation and discontinuity in care during transition of care. An example of a multiplatform health record viewer is the VA/DoD Joint Longitudinal Viewer (JLV), which supports the Department of Veterans Affairs (VA) and Department of Defense (DoD) health care providers with read-only access to patient medical records integrated from multiple sources. JLV is intended to support more informed medical decisions such as reducing duplicate medical imaging when previous image study results may meet current clinical needs. Objective: We estimated the impact of provider usage of JLV on duplicate imaging for service members transitioning from the DoD to the VA health care system. Methods: We conducted a retrospective cross-sectional study in fiscal year 2018 to examine the relationship between providers' use of JLV and the likelihood of ordering duplicate images. Our sample included recently separated service members who had a VA primary care visit in fiscal year 2018 within 90 days of a DoD imaging study. Patients who received at least one imaging study at VA within 90 days of a DoD imaging study of the same imaging mode and on the same body part are considered to have received potentially duplicate imaging studies. We use a logistic regression model with ``JLV provider'' (providers with 1 or more JLV audits in the prior 6 months) as the independent variable to estimate the relationship between JLV use and ordering of duplicate images. Control variables included provider image ordering rates in the prior 6 months, provider type, patient demographics (age, race, gender), and clinical characteristics (Elixhauser comorbidity score). Results: Providers known to utilize JLV in the prior 6 months order fewer duplicate images relative to providers not utilizing JLV for similar visits over time (odds ratio 0.44, 95\% CI 0.24-0.78; P=.005). This effect is robust across multiple specifications of linear and logistic regression models. 
The provider's practice pattern of ordering image studies and the patient's health status are powerful confounders. Conclusions: This study provides evidence that adoption of a longitudinal viewer of health records from multiple electronic health record systems is associated with a reduced likelihood of ordering duplicate images. Investments in health information exchange systems may be effective ways to improve the quality of care and reduce adverse outcomes for patients experiencing fragmentation and discontinuity of care. ", doi="10.2196/32168", url="https://medinform.jmir.org/2022/5/e32168", url="http://www.ncbi.nlm.nih.gov/pubmed/35594070" } @Article{info:doi/10.2196/37975, author="LeBaron, Virginia and Boukhechba, Mehdi and Edwards, James and Flickinger, Tabor and Ling, David and Barnes, E. Laura", title="Exploring the Use of Wearable Sensors and Natural Language Processing Technology to Improve Patient-Clinician Communication: Protocol for a Feasibility Study", journal="JMIR Res Protoc", year="2022", month="May", day="20", volume="11", number="5", pages="e37975", keywords="communication", keywords="technology", keywords="ubiquitous computing, natural language processing", keywords="cancer", keywords="palliative care", abstract="Background: Effective communication is the bedrock of quality health care, but it continues to be a major problem for patients, family caregivers, health care providers, and organizations. Although progress related to communication skills training for health care providers has been made, clinical practice and research gaps persist, particularly regarding how to best monitor, measure, and evaluate the implementation of communication skills in the actual clinical setting and provide timely feedback about communication effectiveness and quality. Objective: Our interdisciplinary team of investigators aims to develop, and pilot test, a novel sensing system and associated natural language processing algorithms (CommSense) that can (1) be used on mobile devices, such as smartwatches; (2) reliably capture patient-clinician interactions in a clinical setting; and (3) process these communications to extract key markers of communication effectiveness and quality. The long-term goal of this research is to use CommSense in a variety of health care contexts to provide real-time feedback to end users to improve communication and patient health outcomes. Methods: This is a 1-year pilot study. During Phase I (Aim 1), we will identify feasible metrics of communication to extract from conversations using CommSense. To achieve this, clinical investigators will conduct a thorough review of the recent health care communication and palliative care literature to develop an evidence-based ``ideal and optimal'' list of communication metrics. This list will be discussed collaboratively within the study team and consensus will be reached regarding the included items. In Phase II (Aim 2), we will develop the CommSense software by sharing the ``ideal and optimal'' list of communication metrics with engineering investigators to gauge technical feasibility. CommSense will build upon prior work using an existing Android smartwatch platform (SWear) and will include sensing modules that can collect (1) physiological metrics via embedded sensors to measure markers of stress (eg, heart rate variability), (2) gesture data via embedded accelerometer and gyroscope sensors, and (3) voice and ultimately textual features via the embedded microphone. 
In Phase III (Aim 3), we will pilot test the ability of CommSense to accurately extract identified communication metrics using simulated clinical scenarios with nurse and physician participants. Results: Development of the CommSense platform began in November 2021, with participant recruitment expected to begin in summer 2022. We anticipate that preliminary results will be available in fall 2022. Conclusions: CommSense is poised to make a valuable contribution to communication science, ubiquitous computing technologies, and natural language processing. We are particularly eager to explore the ability of CommSense to support effective virtual and remote health care interactions and reduce disparities related to patient-clinician communication in the context of serious illness. International Registered Report Identifier (IRRID): PRR1-10.2196/37975 ", doi="10.2196/37975", url="https://www.researchprotocols.org/2022/5/e37975", url="http://www.ncbi.nlm.nih.gov/pubmed/35594139" } @Article{info:doi/10.2196/34681, author="Zheng, Yaguang and Dickson, Vaughan Victoria and Blecker, Saul and Ng, M. Jason and Rice, Campbell Brynne and Melkus, D'Eramo Gail and Shenkar, Liat and Mortejo, R. Marie Claire and Johnson, B. Stephen", title="Identifying Patients With Hypoglycemia Using Natural Language Processing: Systematic Literature Review", journal="JMIR Diabetes", year="2022", month="May", day="16", volume="7", number="2", pages="e34681", keywords="hypoglycemia", keywords="natural language processing", keywords="electronic health records", keywords="diabetes", abstract="Background: Accurately identifying patients with hypoglycemia is key to preventing adverse events and mortality. Natural language processing (NLP), a form of artificial intelligence, uses computational algorithms to extract information from text data. NLP is a scalable, efficient, and quick method to extract hypoglycemia-related information when using electronic health record data sources from a large population. Objective: The objective of this systematic review was to synthesize the literature on the application of NLP to extract hypoglycemia from electronic health record clinical notes. Methods: Literature searches were conducted electronically in PubMed, Web of Science Core Collection, CINAHL (EBSCO), PsycINFO (Ovid), IEEE Xplore, Google Scholar, and ACL Anthology. Keywords included hypoglycemia, low blood glucose, NLP, and machine learning. Inclusion criteria included studies that applied NLP to identify hypoglycemia, reported the outcomes related to hypoglycemia, and were published in English as full papers. Results: This review (n=8 studies) revealed heterogeneity of the reported results related to hypoglycemia. Of the 8 included studies, 4 (50\%) reported that the prevalence rate of any level of hypoglycemia was 3.4\% to 46.2\%. The use of NLP to analyze clinical notes improved the capture of undocumented or missed hypoglycemic events using International Classification of Diseases, Ninth Revision (ICD-9), and International Classification of Diseases, Tenth Revision (ICD-10), and laboratory testing. The combination of NLP and ICD-9 or ICD-10 codes significantly increased the identification of hypoglycemic events compared with individual methods; for example, the prevalence rates of hypoglycemia were 12.4\% for International Classification of Diseases codes, 25.1\% for an NLP algorithm, and 32.2\% for combined algorithms. All the reviewed studies applied rule-based NLP algorithms to identify hypoglycemia. 
Conclusions: The findings provided evidence that the application of NLP to analyze clinical notes improved the capture of hypoglycemic events, particularly when combined with the ICD-9 or ICD-10 codes and laboratory testing. ", doi="10.2196/34681", url="https://diabetes.jmir.org/2022/2/e34681", url="http://www.ncbi.nlm.nih.gov/pubmed/35576579" } @Article{info:doi/10.2196/35981, author="Lai, Chao-Han and Li, Kai-Wen and Hu, Fang-Wen and Su, Pei-Fang and Hsu, I-Lin and Huang, Min-Hsin and Huang, Yen-Ta and Liu, Ping-Yen and Shen, Meng-Ru", title="Integration of an Intensive Care Unit Visualization Dashboard (i-Dashboard) as a Platform to Facilitate Multidisciplinary Rounds: Cluster-Randomized Controlled Trial", journal="J Med Internet Res", year="2022", month="May", day="13", volume="24", number="5", pages="e35981", keywords="Intensive care unit", keywords="multidisciplinary round", keywords="visualization dashboard", keywords="large screen", keywords="information management strategy", keywords="electronic health record", keywords="medical record", keywords="digital health", keywords="dashboard", keywords="i-Dashboard", keywords="electronic medical record", keywords="information exchange", abstract="Background: Multidisciplinary rounds (MDRs) are scheduled, patient-focused communication mechanisms among multidisciplinary providers in the intensive care unit (ICU). Objective: i-Dashboard is a custom-developed visualization dashboard that supports (1) key information retrieval and reorganization, (2) time-series data, and (3) display on large touch screens during MDRs. This study aimed to evaluate the performance, including the efficiency of prerounding data gathering, communication accuracy, and information exchange, and clinical satisfaction of integrating i-Dashboard as a platform to facilitate MDRs. Methods: A cluster-randomized controlled trial was performed in 2 surgical ICUs at a university hospital. Study participants included all multidisciplinary care team members. The performance and clinical satisfaction of i-Dashboard during MDRs were compared with those of the established electronic medical record (EMR) through direct observation and questionnaire surveys. Results: Between April 26 and July 18, 2021, a total of 78 and 91 MDRs were performed with the established EMR and i-Dashboard, respectively. For prerounding data gathering, the median time was 10.4 (IQR 9.1-11.8) and 4.6 (IQR 3.5-5.8) minutes using the established EMR and i-Dashboard (P<.001), respectively. During MDRs, data misrepresentations were significantly less frequent with i-Dashboard (median 0, IQR 0-0) than with the established EMR (4, IQR 3-5; P<.001). Further, effective recommendations were significantly more frequent with i-Dashboard than with the established EMR (P<.001). The questionnaire results revealed that participants favored using i-Dashboard in association with the enhancement of care plan development and team participation during MDRs. Conclusions: i-Dashboard increases efficiency in data gathering. Displaying i-Dashboard on large touch screens in MDRs may enhance communication accuracy, information exchange, and clinical satisfaction. The design concepts of i-Dashboard may help develop visualization dashboards that are more applicable for ICU MDRs. 
Trial Registration: ClinicalTrials.gov NCT04845698; https://clinicaltrials.gov/ct2/show/NCT04845698 ", doi="10.2196/35981", url="https://www.jmir.org/2022/5/e35981", url="http://www.ncbi.nlm.nih.gov/pubmed/35560107" } @Article{info:doi/10.2196/33960, author="Schwartz, M. Jessica and George, Maureen and Rossetti, Collins Sarah and Dykes, C. Patricia and Minshall, R. Simon and Lucas, Eugene and Cato, D. Kenrick", title="Factors Influencing Clinician Trust in Predictive Clinical Decision Support Systems for In-Hospital Deterioration: Qualitative Descriptive Study", journal="JMIR Hum Factors", year="2022", month="May", day="12", volume="9", number="2", pages="e33960", keywords="clinical decision support systems", keywords="machine learning", keywords="inpatient", keywords="nurses", keywords="physicians", keywords="qualitative research", abstract="Background: Clinician trust in machine learning--based clinical decision support systems (CDSSs) for predicting in-hospital deterioration (a type of predictive CDSS) is essential for adoption. Evidence shows that clinician trust in predictive CDSSs is influenced by perceived understandability and perceived accuracy. Objective: The aim of this study was to explore the phenomenon of clinician trust in predictive CDSSs for in-hospital deterioration by confirming and characterizing factors known to influence trust (understandability and accuracy), uncovering and describing other influencing factors, and comparing nurses' and prescribing providers' trust in predictive CDSSs. Methods: We followed a qualitative descriptive methodology conducting directed deductive and inductive content analysis of interview data. Directed deductive analyses were guided by the human-computer trust conceptual framework. Semistructured interviews were conducted with nurses and prescribing providers (physicians, physician assistants, or nurse practitioners) working with a predictive CDSS at 2 hospitals in Mass General Brigham. Results: A total of 17 clinicians were interviewed. Concepts from the human-computer trust conceptual framework---perceived understandability and perceived technical competence (ie, perceived accuracy)---were found to influence clinician trust in predictive CDSSs for in-hospital deterioration. The concordance between clinicians' impressions of patients' clinical status and system predictions influenced clinicians' perceptions of system accuracy. Understandability was influenced by system explanations, both global and local, as well as training. In total, 3 additional themes emerged from the inductive analysis. The first, perceived actionability, captured the variation in clinicians' desires for predictive CDSSs to recommend a discrete action. The second, evidence, described the importance of both macro- (scientific) and micro- (anecdotal) evidence for fostering trust. The final theme, equitability, described fairness in system predictions. The findings were largely similar between nurses and prescribing providers. Conclusions: Although there is a perceived trade-off between machine learning--based CDSS accuracy and understandability, our findings confirm that both are important for fostering clinician trust in predictive CDSSs for in-hospital deterioration. We found that reliance on the predictive CDSS in the clinical workflow may influence clinicians' requirements for trust. 
Future research should explore the impact of reliance, the optimal explanation design for enhancing understandability, and the role of perceived actionability in driving trust. ", doi="10.2196/33960", url="https://humanfactors.jmir.org/2022/2/e33960", url="http://www.ncbi.nlm.nih.gov/pubmed/35550304" } @Article{info:doi/10.2196/34436, author="Panaite, Vanessa and Devendorf, R. Andrew and Finch, Dezon and Bouayad, Lina and Luther, L. Stephen and Schultz, K. Susan", title="The Value of Extracting Clinician-Recorded Affect for Advancing Clinical Research on Depression: Proof-of-Concept Study Applying Natural Language Processing to Electronic Health Records", journal="JMIR Form Res", year="2022", month="May", day="12", volume="6", number="5", pages="e34436", keywords="depression", keywords="affect", keywords="natural language processing", keywords="electronic health records", keywords="vocabularies", abstract="Background: Affective characteristics are associated with depression severity, course, and prognosis. Patients' affect captured by clinicians during sessions may provide a rich source of information that more naturally aligns with the depression course and patient-desired depression outcomes. Objective: In this paper, we propose an information extraction vocabulary used to pilot the feasibility and reliability of identifying clinician-recorded patient affective states in clinical notes from electronic health records. Methods: Affect and mood were annotated in 147 clinical notes of 109 patients by 2 independent coders across 3 pilots. Intercoder discrepancies were settled by a third coder. This reference annotation set was used to test a proof-of-concept natural language processing (NLP) system using a named entity recognition approach. Results: Concepts were frequently addressed in templated format and free text in clinical notes. Annotated data demonstrated that affective characteristics were identified in 87.8\% (129/147) of the notes, while mood was identified in 97.3\% (143/147) of the notes. The intercoder reliability was consistently good across the pilots (interannotator agreement [IAA] >70\%). The final NLP system showed good reliability with the final reference annotation set (mood IAA=85.8\%; affect IAA=80.9\%). Conclusions: Affect and mood can be reliably identified in clinician reports and are good targets for NLP. We discuss several next steps to expand on this proof of concept and the value of this research for depression clinical research. 
", doi="10.2196/34436", url="https://formative.jmir.org/2022/5/e34436", url="http://www.ncbi.nlm.nih.gov/pubmed/35551066" } @Article{info:doi/10.2196/26801, author="Kwon, Osung and Na, Wonjun and Kang, Heejun and Jun, Joon Tae and Kweon, Jihoon and Park, Gyung-Min and Cho, YongHyun and Hur, Cinyoung and Chae, Jungwoo and Kang, Do-Yoon and Lee, Hyung Pil and Ahn, Jung-Min and Park, Duk-Woo and Kang, Soo-Jin and Lee, Seung-Whan and Lee, Whan Cheol and Park, Seong-Wook and Park, Seung-Jung and Yang, Hyun Dong and Kim, Young-Hak", title="Electronic Medical Record--Based Machine Learning Approach to Predict the Risk of 30-Day Adverse Cardiac Events After Invasive Coronary Treatment: Machine Learning Model Development and Validation", journal="JMIR Med Inform", year="2022", month="May", day="11", volume="10", number="5", pages="e26801", keywords="big data", keywords="electronic medical record", keywords="machine learning", keywords="mortality", keywords="adverse cardiac event", keywords="coronary artery disease", keywords="prediction", abstract="Background: Although there is a growing interest in prediction models based on electronic medical records (EMRs) to identify patients at risk of adverse cardiac events following invasive coronary treatment, robust models fully utilizing EMR data are limited. Objective: We aimed to develop and validate machine learning (ML) models by using diverse fields of EMR to predict the risk of 30-day adverse cardiac events after percutaneous intervention or bypass surgery. Methods: EMR data of 5,184,565 records of 16,793 patients at a quaternary hospital between 2006 and 2016 were categorized into static basic (eg, demographics), dynamic time-series (eg, laboratory values), and cardiac-specific data (eg, coronary angiography). The data were randomly split into training, tuning, and testing sets in a ratio of 3:1:1. Each model was evaluated with 5-fold cross-validation and with an external EMR-based cohort at a tertiary hospital. Logistic regression (LR), random forest (RF), gradient boosting machine (GBM), and feedforward neural network (FNN) algorithms were applied. The primary outcome was 30-day mortality following invasive treatment. Results: GBM showed the best performance with area under the receiver operating characteristic curve (AUROC) of 0.99; RF had a similar AUROC of 0.98. AUROCs of FNN and LR were 0.96 and 0.93, respectively. GBM had the highest area under the precision-recall curve (AUPRC) of 0.80, and the AUPRCs of RF, LR, and FNN were 0.73, 0.68, and 0.63, respectively. All models showed low Brier scores of <0.1 as well as highly fitted calibration plots, indicating a good fit of the ML-based models. On external validation, the GBM model demonstrated maximal performance with an AUROC of 0.90, while FNN had an AUROC of 0.85. The AUROCs of LR and RF were slightly lower at 0.80 and 0.79, respectively. The AUPRCs of GBM, LR, and FNN were similar at 0.47, 0.43, and 0.41, respectively, while that of RF was lower at 0.33. Among the categories in the GBM model, time-series dynamic data demonstrated a high AUROC of >0.95, contributing majorly to the excellent results. Conclusions: Exploiting the diverse fields of the EMR data set, the ML-based 30-day adverse cardiac event prediction models demonstrated outstanding results, and the applied framework could be generalized for various health care prediction models. 
", doi="10.2196/26801", url="https://medinform.jmir.org/2022/5/e26801", url="http://www.ncbi.nlm.nih.gov/pubmed/35544292" } @Article{info:doi/10.2196/37092, author="Farrow, Luke and Ashcroft, Patrick George and Zhong, Mingjun and Anderson, Lesley", title="Using Artificial Intelligence to Revolutionise the Patient Care Pathway in Hip and Knee Arthroplasty (ARCHERY): Protocol for the Development of a Clinical Prediction Model", journal="JMIR Res Protoc", year="2022", month="May", day="11", volume="11", number="5", pages="e37092", keywords="orthopedics", keywords="prediction modelling", keywords="machine learning", keywords="artificial intelligence", keywords="imaging", keywords="hip", keywords="knee", keywords="arthroplasty", keywords="health care", keywords="patient care", keywords="arthritis", abstract="Background: Hip and knee osteoarthritis is substantially prevalent worldwide, with large numbers of older adults undergoing joint replacement (arthroplasty) every year. A backlog of elective surgery due to the COVID-19 pandemic, and an aging population, has led to substantial issues with access to timely arthroplasty surgery. A potential method to improve the efficiency of arthroplasty services is by increasing the percentage of patients who are listed for surgery from primary care referrals. The use of artificial intelligence (AI) techniques, specifically machine learning, provides a potential unexplored solution to correctly and rapidly select suitable patients for arthroplasty surgery. Objective: This study has 2 objectives: (1) develop a cohort of patients with referrals by general practitioners regarding assessment of suitability for hip or knee replacement from National Health Service (NHS) Grampian data via the Grampian Data Safe Haven and (2) determine the demographic, clinical, and imaging characteristics that influence the selection of patients to undergo hip or knee arthroplasty, and develop a tested and validated patient-specific predictive model to guide arthroplasty referral pathways. Methods: The AI to Revolutionise the Patient Care Pathway in Hip and Knee Arthroplasty (ARCHERY) project will be delivered through 2 linked work packages conducted within the Grampian Data Safe Haven and Safe Haven Artificial Intelligence Platform. The data set will include a cohort of individuals aged ?16 years with referrals for the consideration of elective primary hip or knee replacement from January 2015 to January 2022. Linked pseudo-anonymized NHS Grampian health care data will be acquired including patient demographics, medication records, laboratory data, theatre records, text from clinical letters, and radiological images and reports. Following the creation of the data set, machine learning techniques will be used to develop pattern classification and probabilistic prediction models based on radiological images. Supplemental demographic and clinical data will be used to improve the predictive capabilities of the models. The sample size is predicted to be approximately 2000 patients---a sufficient size for satisfactory assessment of the primary outcome. Cross-validation will be used for development, testing, and internal validation. Evaluation will be performed through standard techniques, such as the C statistic (area under curve) metric, calibration characteristics (Brier score), and a confusion matrix. Results: The study was funded by the Chief Scientist Office Scotland as part of a Clinical Research Fellowship that runs from August 2021 to August 2024. 
Approval from the North Node Privacy Advisory Committee was confirmed on October 13, 2021. Data collection started in May 2022, with the results expected to be published in the first quarter of 2024. ISRCTN registration has been completed. Conclusions: This project provides a first step toward delivering an automated solution for arthroplasty selection using routinely collected health care data. Following appropriate external validation and clinical testing, this project could substantially improve the proportion of referred patients that are selected to undergo surgery, with a subsequent reduction in waiting time for arthroplasty appointments. Trial Registration: ISRCTN Registry ISRCTN18398037; https://www.isrctn.com/ISRCTN18398037 International Registered Report Identifier (IRRID): PRR1-10.2196/37092 ", doi="10.2196/37092", url="https://www.researchprotocols.org/2022/5/e37092", url="http://www.ncbi.nlm.nih.gov/pubmed/35544289" } @Article{info:doi/10.2196/31810, author="Schmieding, L. Malte and Kopka, Marvin and Schmidt, Konrad and Schulz-Niethammer, Sven and Balzer, Felix and Feufel, A. Markus", title="Triage Accuracy of Symptom Checker Apps: 5-Year Follow-up Evaluation", journal="J Med Internet Res", year="2022", month="May", day="10", volume="24", number="5", pages="e31810", keywords="digital health", keywords="triage", keywords="symptom checker", keywords="patient-centered care", keywords="eHealth apps", keywords="mobile phone", abstract="Background: Symptom checkers are digital tools assisting laypersons in self-assessing the urgency and potential causes of their medical complaints. They are widely used but face concerns from both patients and health care professionals, especially regarding their accuracy. A 2015 landmark study substantiated these concerns using case vignettes to demonstrate that symptom checkers commonly err in their triage assessment. Objective: This study aims to revisit the landmark index study to investigate whether and how symptom checkers' capabilities have evolved since 2015 and how they currently compare with laypersons' stand-alone triage appraisal. Methods: In early 2020, we searched for smartphone and web-based applications providing triage advice. We evaluated these apps on the same 45 case vignettes as the index study. Using descriptive statistics, we compared our findings with those of the index study and with publicly available data on laypersons' triage capability. Results: We retrieved 22 symptom checkers providing triage advice. The median triage accuracy in 2020 (55.8\%, IQR 15.1\%) was close to that in 2015 (59.1\%, IQR 15.5\%). The apps in 2020 were less risk averse (odds 1.11:1, the ratio of overtriage errors to undertriage errors) than those in 2015 (odds 2.82:1), missing >40\% of emergencies. Few apps outperformed laypersons in either deciding whether emergency care was required or whether self-care was sufficient. No apps outperformed the laypersons on both decisions. Conclusions: Triage performance of symptom checkers has, on average, not improved over the course of 5 years. It decreased in 2 use cases (advice on when emergency care is required and when no health care is needed for the moment). However, triage capability varies widely within the sample of symptom checkers. Whether it is beneficial to seek advice from symptom checkers depends on the app chosen and on the specific question to be answered. 
Future research should develop resources (eg, case vignette repositories) to audit the capabilities of symptom checkers continuously and independently and provide guidance on when and to whom they should be recommended. ", doi="10.2196/31810", url="https://www.jmir.org/2022/5/e31810", url="http://www.ncbi.nlm.nih.gov/pubmed/35536633" } @Article{info:doi/10.2196/38241, author="Chen, Pei-Fu and Chen, Lichin and Lin, Yow-Kuan and Li, Guo-Hung and Lai, Feipei and Lu, Cheng-Wei and Yang, Chi-Yu and Chen, Kuan-Chih and Lin, Tzu-Yu", title="Predicting Postoperative Mortality With Deep Neural Networks and Natural Language Processing: Model Development and Validation", journal="JMIR Med Inform", year="2022", month="May", day="10", volume="10", number="5", pages="e38241", keywords="bidirectional encoder representations from transformers", keywords="deep neural network", keywords="natural language processing", keywords="postoperative mortality prediction", keywords="unstructured text", keywords="machine learning", keywords="preoperative medicine", keywords="anesthesia", keywords="prediction model", keywords="anesthesiologist", keywords="deep learning model", keywords="electronic health record", keywords="neural network", abstract="Background: Machine learning (ML) achieves better predictions of postoperative mortality than previous prediction tools. Free-text descriptions of the preoperative diagnosis and the planned procedure are available preoperatively. Because reading these descriptions helps anesthesiologists evaluate the risk of the surgery, we hypothesized that deep learning (DL) models with unstructured text could improve postoperative mortality prediction. However, it is challenging to extract meaningful concept embeddings from this unstructured clinical text. Objective: This study aims to develop a fusion DL model containing structured and unstructured features to predict the in-hospital 30-day postoperative mortality before surgery. ML models for predicting postoperative mortality using preoperative data with or without free clinical text were assessed. Methods: We retrospectively collected preoperative anesthesia assessments, surgical information, and discharge summaries of patients undergoing general and neuraxial anesthesia from electronic health records (EHRs) from 2016 to 2020. We first compared the deep neural network (DNN) with other models using the same input features to demonstrate effectiveness. Then, we combined the DNN model with bidirectional encoder representations from transformers (BERT) to extract information from clinical texts. The effects of adding text information on the model performance were compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Statistical significance was evaluated using P<.05. Results: The final cohort contained 121,313 patients who underwent surgeries. A total of 1562 (1.29\%) patients died within 30 days of surgery. Our BERT-DNN model achieved the highest AUROC (0.964, 95\% CI 0.961-0.967) and AUPRC (0.336, 95\% CI 0.276-0.402). The AUROC of the BERT-DNN was significantly higher compared to logistic regression (AUROC=0.952, 95\% CI 0.949-0.955) and the American Society of Anesthesiologist Physical Status (ASAPS AUROC=0.892, 95\% CI 0.887-0.896) but not significantly higher compared to the DNN (AUROC=0.959, 95\% CI 0.956-0.962) and the random forest (AUROC=0.961, 95\% CI 0.958-0.964). 
The AUPRC of the BERT-DNN was significantly higher compared to the DNN (AUPRC=0.319, 95\% CI 0.260-0.384), the random forest (AUPRC=0.296, 95\% CI 0.239-0.360), logistic regression (AUPRC=0.276, 95\% CI 0.220-0.339), and the ASAPS (AUPRC=0.149, 95\% CI 0.107-0.203). Conclusions: Our BERT-DNN model has an AUPRC significantly higher compared to previously proposed models using no text and an AUROC significantly higher compared to logistic regression and the ASAPS. This technique helps identify patients with higher risk from the surgical description text in EHRs. ", doi="10.2196/38241", url="https://medinform.jmir.org/2022/5/e38241", url="http://www.ncbi.nlm.nih.gov/pubmed/35536634" } @Article{info:doi/10.2196/31758, author="Cho, Hwayoung and Keenan, Gail and Madandola, O. Olatunde and Dos Santos, Cristina Fabiana and Macieira, R. Tamara G. and Bjarnadottir, I. Ragnhildur and Priola, B. Karen J. and Dunn Lopez, Karen", title="Assessing the Usability of a Clinical Decision Support System: Heuristic Evaluation", journal="JMIR Hum Factors", year="2022", month="May", day="10", volume="9", number="2", pages="e31758", keywords="usability", keywords="heuristic", keywords="clinical decision support", keywords="electronic health record", keywords="expert review", keywords="evaluation", keywords="user interface", keywords="human-computer interaction", abstract="Background: Poor usability is a primary cause of unintended consequences related to the use of electronic health record (EHR) systems, which negatively impacts patient safety. Due to the cost and time needed to carry out iterative evaluations, many EHR components, such as clinical decision support systems (CDSSs), have not undergone rigorous usability testing prior to their deployment in clinical practice. Usability testing in the predeployment phase is crucial to eliminating usability issues and preventing costly fixes that will be needed if these issues are found after the system's implementation. Objective: This study presents an example application of a systematic evaluation method that uses clinician experts with human-computer interaction (HCI) expertise to evaluate the usability of an electronic clinical decision support (CDS) intervention prior to its deployment in a randomized controlled trial. Methods: We invited 6 HCI experts to participate in a heuristic evaluation of our CDS intervention. Each expert was asked to independently explore the intervention at least twice. After completing the assigned tasks using patient scenarios, each expert completed a heuristic evaluation checklist developed by Bright et al based on Nielsen's 10 heuristics. The experts also rated the overall severity of each identified heuristic violation on a scale of 0 to 4, where 0 indicates no problems and 4 indicates a usability catastrophe. Data from the experts' coded comments were synthesized, and the severity of each identified usability heuristic was analyzed. Results: The 6 HCI experts included professionals from the fields of nursing (n=4), pharmaceutical science (n=1), and systems engineering (n=1). The mean overall severity scores of the identified heuristic violations ranged from 0.66 (flexibility and efficiency of use) to 2.00 (user control and freedom and error prevention), in which scores closer to 0 indicate a more usable system. The heuristic principle user control and freedom was identified as the most in need of refinement and, particularly by nonnursing HCI experts, considered as having major usability problems. 
In response to the heuristic match between system and the real world, the experts pointed to the reversed direction of our system's pain scale scores (1=severe pain) compared to those commonly used in clinical practice (typically 1=mild pain); although this was identified as a minor usability problem, its refinement was repeatedly emphasized by nursing HCI experts. Conclusions: Our heuristic evaluation process is simple and systematic and can be used at multiple stages of system development to reduce the time and cost needed to establish the usability of a system before its widespread implementation. Furthermore, heuristic evaluations can help organizations develop transparent reporting protocols for usability, as required by Title IV of the 21st Century Cures Act. Testing of EHRs and CDSSs by clinicians with HCI expertise in heuristic evaluation processes has the potential to reduce the frequency of testing while increasing its quality, which may reduce clinicians' cognitive workload and errors and enhance the adoption of EHRs and CDSSs. ", doi="10.2196/31758", url="https://humanfactors.jmir.org/2022/2/e31758", url="http://www.ncbi.nlm.nih.gov/pubmed/35536613" } @Article{info:doi/10.2196/35061, author="Ackermann, Khalia and Baker, Jannah and Festa, Marino and McMullan, Brendan and Westbrook, Johanna and Li, Ling", title="Computerized Clinical Decision Support Systems for the Early Detection of Sepsis Among Pediatric, Neonatal, and Maternal Inpatients: Scoping Review", journal="JMIR Med Inform", year="2022", month="May", day="6", volume="10", number="5", pages="e35061", keywords="sepsis", keywords="early detection of disease", keywords="computerized clinical decision support", keywords="patient safety", keywords="electronic health records", keywords="sepsis care pathway", abstract="Background: Sepsis is a severe condition associated with extensive morbidity and mortality worldwide. Pediatric, neonatal, and maternal patients represent a considerable proportion of the sepsis burden. Identifying sepsis cases as early as possible is a key pillar of sepsis management and has prompted the development of sepsis identification rules and algorithms that are embedded in computerized clinical decision support (CCDS) systems. Objective: This scoping review aimed to systematically describe studies reporting on the use and evaluation of CCDS systems for the early detection of pediatric, neonatal, and maternal inpatients at risk of sepsis. Methods: MEDLINE, Embase, CINAHL, Cochrane, Latin American and Caribbean Health Sciences Literature (LILACS), Scopus, Web of Science, OpenGrey, ClinicalTrials.gov, and ProQuest Dissertations and Theses Global (PQDT) were searched by using a search strategy that incorporated terms for sepsis, clinical decision support, and early detection. Title, abstract, and full-text screening was performed by 2 independent reviewers, who consulted a third reviewer as needed. One reviewer performed data charting with a sample of data. This was checked by a second reviewer and via discussions with the review team, as necessary. Results: A total of 33 studies were included in this review---13 (39\%) pediatric studies, 18 (55\%) neonatal studies, and 2 (6\%) maternal studies. All studies were published after 2011, and 27 (82\%) were published from 2017 onward. The most common outcome investigated in pediatric studies was the accuracy of sepsis identification (9/13, 69\%). 
Pediatric CCDS systems used different combinations of 18 diverse clinical criteria to detect sepsis across the 13 identified studies. In neonatal studies, 78\% (14/18) of the studies investigated the Kaiser Permanente early-onset sepsis risk calculator. All studies investigated sepsis treatment and management outcomes, with 83\% (15/18) reporting on antibiotics-related outcomes. Usability and cost-related outcomes were each reported in only 2 (6\%) of the 31 pediatric or neonatal studies. Both studies on maternal populations were short abstracts. Conclusions: This review found limited research investigating CCDS systems to support the early detection of sepsis among pediatric, neonatal, and maternal patients, despite the high burden of sepsis in these vulnerable populations. We have highlighted the need for a consensus definition for pediatric and neonatal sepsis and the study of usability and cost-related outcomes as critical areas for future research. International Registered Report Identifier (IRRID): RR2-10.2196/24899 ", doi="10.2196/35061", url="https://medinform.jmir.org/2022/5/e35061", url="http://www.ncbi.nlm.nih.gov/pubmed/35522467" } @Article{info:doi/10.2196/32456, author="Tendedez, Helena and Ferrario, Maria-Angela and McNaney, Roisin and Gradinar, Adrian", title="Exploring Human-Data Interaction in Clinical Decision-making Using Scenarios: Co-design Study", journal="JMIR Hum Factors", year="2022", month="May", day="6", volume="9", number="2", pages="e32456", keywords="data-supported decision-making", keywords="health care professionals", keywords="respiratory care", keywords="scenario-based design", keywords="clinical decision-making", keywords="decision support", keywords="COPD", keywords="respiratory conditions", keywords="digital health", keywords="user-centered design", keywords="health technologies", abstract="Background: When caring for patients with chronic conditions such as chronic obstructive pulmonary disease (COPD), health care professionals (HCPs) rely on multiple data sources to make decisions. Collating and visualizing these data, for example, on clinical dashboards, holds the potential to support timely and informed decision-making. Most studies on data-supported decision-making (DSDM) technologies for health care have focused on their technical feasibility or quantitative effectiveness. Although these studies are an important contribution to the literature, they do not further our limited understanding of how HCPs engage with these technologies and how they can be designed to support specific contexts of use. To advance our knowledge in this area, we must work with HCPs to explore this space and the real-world complexities of health care work and service structures. Objective: This study aimed to qualitatively explore how DSDM technologies could support HCPs in their decision-making regarding COPD care. We created a scenario-based research tool called Respire, which visualizes HCPs' data needs about their patients with COPD and services. We used Respire with HCPs to uncover rich and nuanced findings about human-data interaction in this context, focusing on the real-world challenges that HCPs face when carrying out their work and making decisions. Methods: We engaged 9 respiratory HCPs from 2 collaborating health care organizations to design Respire. We then used Respire as a tool to investigate human-data interaction in the context of decision-making about COPD care. 
The study followed a co-design approach that had 3 stages and spanned 2 years. The first stage involved 5 workshops with HCPs to identify data interaction scenarios that would support their work. The second stage involved creating Respire, an interactive scenario-based web app that visualizes HCPs' data needs, incorporating feedback from HCPs. The final stage involved 11 one-to-one sessions with HCPs to use Respire, focusing on how they envisaged that it could support their work and decisions about care. Results: We found that HCPs trust data differently depending on where it came from and who recorded it, sporadic and subjective data generated by patients have value but create challenges for decision-making, and HCPs require support in interpreting and responding to new data and its use cases. Conclusions: Our study uncovered important lessons for the design of DSDM technologies to support health care contexts. We show that although DSDM technologies have the potential to support patient care and health care delivery, important sociotechnical and human-data interaction challenges influence the design and deployment of these technologies. Exploring these considerations during the design process can ensure that DSDM technologies are designed with a holistic view of how decision-making and engagement with data occur in health care contexts. ", doi="10.2196/32456", url="https://humanfactors.jmir.org/2022/2/e32456", url="http://www.ncbi.nlm.nih.gov/pubmed/35522463" } @Article{info:doi/10.2196/34830, author="Bandari, Ela and Beuzen, Tomas and Habashy, Lara and Raza, Javairia and Yang, Xudong and Kapeluto, Jordanna and Meneilly, Graydon and Madden, Kenneth", title="Machine Learning Decision Support for Detecting Lipohypertrophy With Bedside Ultrasound: Proof-of-Concept Study", journal="JMIR Form Res", year="2022", month="May", day="6", volume="6", number="5", pages="e34830", keywords="insulin", keywords="lipoma", keywords="machine learning", keywords="diagnostic ultrasound", keywords="lipohypertrophy", keywords="diabetes", keywords="ultrasound images", abstract="Background: The most common dermatological complication of insulin therapy is lipohypertrophy. Objective: As a proof of concept, we built and tested an automated model using a convolutional neural network (CNN) to detect the presence of lipohypertrophy in ultrasound images. Methods: Ultrasound images were obtained in a blinded fashion using a portable GE LOGIQ e machine with an L8-18I-D probe (5-18 MHz; GE Healthcare). The data were split into train, validation, and test splits of 70\%, 15\%, and 15\%, respectively. Given the small size of the data set, image augmentation techniques were used to expand the size of the training set and improve the model's generalizability. To compare the performance of the different architectures, the team considered the accuracy and recall of the models when tested on our test set. Results: The DenseNet CNN architecture was found to have the highest accuracy (76\%) and recall (76\%) in detecting lipohypertrophy in ultrasound images compared to other CNN architectures. Additional work showed that the YOLOv5m object detection model could be used to help detect the approximate location of lipohypertrophy in ultrasound images identified as containing lipohypertrophy by the DenseNet CNN. Conclusions: We were able to demonstrate the ability of machine learning approaches to automate the process of detecting and locating lipohypertrophy. 
", doi="10.2196/34830", url="https://formative.jmir.org/2022/5/e34830", url="http://www.ncbi.nlm.nih.gov/pubmed/35404833" } @Article{info:doi/10.2196/37522, author="Gustafson, H. David and Mares, Marie-Louise and Johnston, C. Darcie and Landucci, Gina and Pe-Romashko, Klaren and Vjorn, J. Olivia and Hu, Yaxin and Maus, Adam and Mahoney, E. Jane and Mutlu, Bilge", title="Using Smart Displays to Implement an eHealth System for Older Adults With Multiple Chronic Conditions: Protocol for a Randomized Controlled Trial", journal="JMIR Res Protoc", year="2022", month="May", day="5", volume="11", number="5", pages="e37522", keywords="eHealth", keywords="aged", keywords="geriatrics", keywords="multiple chronic conditions", keywords="chronic pain", keywords="smart displays", keywords="smart speakers", keywords="quality of life", keywords="primary care", keywords="health expenditures", keywords="mobile phone", abstract="Background: Voice-controlled smart speakers and displays have a unique but unproven potential for delivering eHealth interventions. Many laptop- and smartphone-based interventions have been shown to improve multiple outcomes, but voice-controlled platforms have not been tested in large-scale rigorous trials. Older adults with multiple chronic health conditions, who need tools to help with their daily management, may be especially good candidates for interventions on voice-controlled devices because these patients often have physical limitations, such as tremors or vision problems, that make the use of laptops and smartphones challenging. Objective: The aim of this study is to assess whether participants using an evidence-based intervention (ElderTree) on a smart display will experience decreased pain interference and improved quality of life and related measures in comparison with participants using ElderTree on a laptop and control participants who are given no device or access to ElderTree. Methods: A total of 291 adults aged ?60 years with chronic pain and ?3 additional chronic conditions will be recruited from primary care clinics and community organizations and randomized 1:1:1 to ElderTree access on a smart display along with their usual care, ElderTree access on a touch screen laptop along with usual care, or usual care alone. All patients will be followed for 8 months. The primary outcomes are differences between groups in measures of pain interference and psychosocial quality of life. The secondary outcomes are between-group differences in system use at 8 months, physical quality of life, pain intensity, hospital readmissions, communication with medical providers, health distress, well-being, loneliness, and irritability. We will also examine mediators and moderators of the effects of ElderTree on both platforms. At baseline, 4 months, and 8 months, patients will complete written surveys comprising validated scales selected for good psychometric properties with similar populations. ElderTree use data will be collected continuously in system logs. We will use linear mixed-effects models to evaluate outcomes over time, with treatment condition and time acting as between-participant factors. Separate analyses will be conducted for each outcome. Results: Recruitment began in August 2021 and will run through April 2023. The intervention period will end in December 2023. The findings will be disseminated via peer-reviewed publications. 
Conclusions: To our knowledge, this is the first study with a large sample and long time frame to examine whether a voice-controlled smart device can perform as well as or better than a laptop in implementing a health intervention for older patients with multiple chronic health conditions. As patients with multiple conditions are such a large cohort, the implications for cost as well as patient well-being are significant. Making the best use of current and developing technologies is a critical part of this effort. Trial Registration: ClinicalTrials.gov NCT04798196; https://clinicaltrials.gov/ct2/show/NCT04798196 International Registered Report Identifier (IRRID): PRR1-10.2196/37522 ", doi="10.2196/37522", url="https://www.researchprotocols.org/2022/5/e37522", url="http://www.ncbi.nlm.nih.gov/pubmed/35511229" } @Article{info:doi/10.2196/38513, author="Weir, Charlene", title="Through the Narrative Looking Glass: Commentary on ``Impact of Electronic Health Records on Information Practices in Mental Health Contexts: Scoping Review''", journal="J Med Internet Res", year="2022", month="May", day="4", volume="24", number="5", pages="e38513", keywords="electronic health records", keywords="psychiatry", keywords="mental health", keywords="electronic medical records", keywords="health informatics", keywords="mental illness", keywords="scoping review", keywords="clinical decision support", doi="10.2196/38513", url="https://www.jmir.org/2022/5/e38513", url="http://www.ncbi.nlm.nih.gov/pubmed/35507399" } @Article{info:doi/10.2196/30405, author="Kariotis, Charles Timothy and Prictor, Megan and Chang, Shanton and Gray, Kathleen", title="Impact of Electronic Health Records on Information Practices in Mental Health Contexts: Scoping Review", journal="J Med Internet Res", year="2022", month="May", day="4", volume="24", number="5", pages="e30405", keywords="electronic health records", keywords="psychiatry", keywords="mental health", keywords="electronic medical records", keywords="health informatics", keywords="mental illness", keywords="scoping review", keywords="clinical decision support", abstract="Background: The adoption of electronic health records (EHRs) and electronic medical records (EMRs) has been slow in the mental health context, partly because of concerns regarding the collection of sensitive information, the standardization of mental health data, and the risk of negatively affecting therapeutic relationships. However, EHRs and EMRs are increasingly viewed as critical to improving information practices such as the documentation, use, and sharing of information and, more broadly, the quality of care provided. Objective: This paper aims to undertake a scoping review to explore the impact of EHRs on information practices in mental health contexts and also explore how sensitive information, data standardization, and therapeutic relationships are managed when using EHRs in mental health contexts. Methods: We considered a scoping review to be the most appropriate method for this review because of the relatively recent uptake of EHRs in mental health contexts. A comprehensive search of electronic databases was conducted with no date restrictions for articles that described the use of EHRs, EMRs, or associated systems in the mental health context. One of the authors reviewed all full texts, with 2 other authors each screening half of the full-text articles. The fourth author mediated the disagreements. Data regarding study characteristics were charted. 
A narrative and thematic synthesis approach was taken to analyze the included studies' results and address the research questions. Results: The final review included 40 articles. The included studies were highly heterogeneous with a variety of study designs, objectives, and settings. Several themes and subthemes were identified that explored the impact of EHRs on information practices in the mental health context. EHRs improved the amount of information documented compared with paper. However, mental health--related information was regularly missing from EHRs, especially sensitive information. EHRs introduced more standardized and formalized documentation practices that raised issues because of the focus on narrative information in the mental health context. EHRs were found to disrupt information workflows in the mental health context, especially when they did not include appropriate templates or care plans. Usability issues also contributed to workflow concerns. Managing the documentation of sensitive information in EHRs was problematic; clinicians sometimes watered down sensitive information or chose to keep it in separate records. Concerningly, the included studies rarely involved service user perspectives. Furthermore, many studies provided limited information on the functionality or technical specifications of the EHR being used. Conclusions: We identified several areas in which work is needed to ensure that EHRs benefit clinicians and service users in the mental health context. As EHRs are increasingly considered critical for modern health systems, health care decision-makers should consider how EHRs can better reflect the complexity and sensitivity of information practices and workflows in the mental health context. ", doi="10.2196/30405", url="https://www.jmir.org/2022/5/e30405", url="http://www.ncbi.nlm.nih.gov/pubmed/35507393" } @Article{info:doi/10.2196/35219, author="Kopka, Marvin and Schmieding, L. Malte and Rieger, Tobias and Roesler, Eileen and Balzer, Felix and Feufel, A. Markus", title="Determinants of Laypersons' Trust in Medical Decision Aids: Randomized Controlled Trial", journal="JMIR Hum Factors", year="2022", month="May", day="3", volume="9", number="2", pages="e35219", keywords="symptom checkers", keywords="disposition advice", keywords="anthropomorphism", keywords="artificial intelligence", keywords="urgency assessment", keywords="patient-centered care", keywords="human-computer interaction", keywords="consumer health", keywords="information technology", keywords="IT", keywords="mobile phone", abstract="Background: Symptom checker apps are patient-facing decision support systems aimed at providing advice to laypersons on whether, where, and how to seek health care (disposition advice). Such advice can improve laypersons' self-assessment and ultimately improve medical outcomes. Past research has mainly focused on the accuracy of symptom checker apps' suggestions. To support decision-making, such apps need to provide not only accurate but also trustworthy advice. To date, only few studies have addressed the question of the extent to which laypersons trust symptom checker app advice or the factors that moderate their trust. Studies on general decision support systems have shown that framing automated systems (anthropomorphic or emphasizing expertise), for example, by using icons symbolizing artificial intelligence (AI), affects users' trust. 
Objective: This study aims to identify the factors influencing laypersons' trust in the advice provided by symptom checker apps. Primarily, we investigated whether designs using anthropomorphic framing or framing the app as an AI increases users' trust compared with no such framing. Methods: Through a web-based survey, we recruited 494 US residents with no professional medical training. The participants had to first appraise the urgency of a fictitious patient description (case vignette). Subsequently, a decision aid (mock symptom checker app) provided disposition advice contradicting the participants' appraisal, and they had to subsequently reappraise the vignette. Participants were randomized into 3 groups: 2 experimental groups using visual framing (anthropomorphic, 160/494, 32.4\%, vs AI, 161/494, 32.6\%) and a neutral group without such framing (173/494, 35\%). Results: Most participants (384/494, 77.7\%) followed the decision aid's advice, regardless of its urgency level. Neither anthropomorphic framing (odds ratio 1.120, 95\% CI 0.664-1.897) nor framing as AI (odds ratio 0.942, 95\% CI 0.565-1.570) increased behavioral or subjective trust (P=.99) compared with the no-frame condition. Even participants who were extremely certain in their own decisions (ie, 100\% certain) commonly changed it in favor of the symptom checker's advice (19/34, 56\%). Propensity to trust and eHealth literacy were associated with increased subjective trust in the symptom checker (propensity to trust b=0.25; eHealth literacy b=0.2), whereas sociodemographic variables showed no such link with either subjective or behavioral trust. Conclusions: Contrary to our expectation, neither the anthropomorphic framing nor the emphasis on AI increased trust in symptom checker advice compared with that of a neutral control condition. However, independent of the interface, most participants trusted the mock app's advice, even when they were very certain of their own assessment. Thus, the question arises as to whether laypersons use such symptom checkers as substitutes rather than as aids in their own decision-making. With trust in symptom checkers already high at baseline, the benefit of symptom checkers depends on interface designs that enable users to adequately calibrate their trust levels during usage. Trial Registration: Deutsches Register Klinischer Studien DRKS00028561; https://tinyurl.com/rv4utcfb (retrospectively registered). ", doi="10.2196/35219", url="https://humanfactors.jmir.org/2022/2/e35219", url="http://www.ncbi.nlm.nih.gov/pubmed/35503248" } @Article{info:doi/10.2196/29118, author="Gisladottir, Undina and Nakikj, Drashko and Jhunjhunwala, Rashi and Panton, Jasmine and Brat, Gabriel and Gehlenborg, Nils", title="Effective Communication of Personalized Risks and Patient Preferences During Surgical Informed Consent Using Data Visualization: Qualitative Semistructured Interview Study With Patients After Surgery", journal="JMIR Hum Factors", year="2022", month="Apr", day="29", volume="9", number="2", pages="e29118", keywords="data visualization", keywords="surgical informed consent", keywords="shared decision-making", keywords="biomedical informatics", abstract="Background: There is no consensus on which risks to communicate to a prospective surgical patient during informed consent or how. Complicating the process, patient preferences may diverge from clinical assumptions and are often not considered for discussion. 
Such discrepancies can lead to confusion and resentment, raising the potential for legal action. To overcome these issues, we propose a visual consent tool that incorporates patient preferences and communicates personalized risks to patients using data visualization. We used this platform to identify key effective visual elements to communicate personalized surgical risks. Objective: Our main focus is to understand how to best communicate personalized risks using data visualization. To contextualize patient responses to the main question, we examine how patients perceive risks before surgery (research question 1), how suitably the visual consent tool is able to present personalized surgical risks (research question 2), how well our visualizations convey those personalized surgical risks (research question 3), and how the visual consent tool could improve the informed consent process and how it can be used (research question 4). Methods: We designed a visual consent tool to meet the objectives of our study. To calculate and list personalized surgical risks, we used the American College of Surgeons risk calculator. We created multiple visualization mock-ups using visual elements previously determined to be well-received for risk communication. Semistructured interviews were conducted with patients after surgery, and each of the mock-ups was presented and evaluated independently and in the context of our visual consent tool design. The interviews were transcribed, and thematic analysis was performed to identify major themes. We also applied a quantitative approach to the analysis to assess the prevalence of different perceptions of the visualizations presented in our tool. Results: In total, 20 patients were interviewed, with a median age of 59 (range 29-87) years. Thematic analysis revealed factors that influenced the perception of risk (the surgical procedure, the cognitive capacity of the patient, and the timing of consent; research question 1); factors that influenced the perceived value of risk visualizations (preference for rare event communication, preference for risk visualization, and usefulness of comparison with the average; research question 3); and perceived usefulness and use cases of the visual consent tool (research questions 2 and 4). Most importantly, we found that patients preferred the visual consent tool to current text-based documents and had no unified preferences for risk visualization. Furthermore, our findings suggest that patient concerns were not often represented in existing risk calculators. Conclusions: We identified key elements that influence effective visual risk communication in the perioperative setting and pointed out the limitations of the existing calculators in addressing patient concerns. Patient preference is highly variable and should influence choices regarding risk presentation and visualization. 
", doi="10.2196/29118", url="https://humanfactors.jmir.org/2022/2/e29118", url="http://www.ncbi.nlm.nih.gov/pubmed/35486432" } @Article{info:doi/10.2196/34042, author="Lin, Yuchen and Lemos, Martin and Neuschaefer-Rube, Christiane", title="Digital Health and Learning in Speech-Language Pathology, Phoniatrics, and Otolaryngology: Survey Study for Designing a Digital Learning Toolbox App", journal="JMIR Med Educ", year="2022", month="Apr", day="27", volume="8", number="2", pages="e34042", keywords="digital learning", keywords="mLearning", keywords="mHealth", keywords="speech-language pathology", keywords="phoniatrics", keywords="otolaryngology", keywords="communication disorders", keywords="mobile phone", abstract="Background: The digital age has introduced opportunities and challenges for clinical education and practice caused by infinite incoming information and novel technologies for health. In the interdisciplinary field of communication sciences and disorders (CSD), engagement with digital topics has emerged slower than in other health fields, and effective strategies for accessing, managing, and focusing on digital resources are greatly needed. Objective: We aimed to conceptualize and investigate preferences of stakeholders regarding a digital learning toolbox, an app containing a library of current resources for CSD. This cross-sectional survey study conducted in German-speaking countries investigated professional and student perceptions and preferences regarding such an app's features, functions, content, and associated concerns. Methods: An open web-based survey was disseminated to professionals and students in the field of CSD, including speech-language pathologists (SLPs; German: Logop{\"a}d*innen), speech-language pathology students, phoniatricians, otolaryngologists, and medical students. Insights into preferences and perceptions across professions, generations, and years of experience regarding a proposed app were investigated. Results: Of the 164 participants, an overwhelming majority (n=162, 98.8\%) indicated readiness to use such an app, and most participants (n=159, 96.9\%) perceived the proposed app to be helpful. Participants positively rated app functions that would increase utility (eg, tutorial, quality rating function, filters based on content or topic, and digital format); however, they had varied opinions regarding an app community feature. Regarding app settings, most participants rated the option to share digital resources through social media links (144/164, 87.8\%), receive and manage push notifications (130/164, 79.3\%), and report technical issues (160/164, 97.6\%) positively. However, significant variance was noted across professions (H3=8.006; P=.046) and generations (H3=9.309; P=.03) regarding a username-password function, with SLPs indicating greater perceived usefulness in comparison to speech-language pathology students (P=.045), as was demonstrated by Generation X versus Generation Z (P=.04). Participants perceived a range of clinical topics to be important; however, significant variance was observed across professions, between physicians and SLPs regarding the topic of diagnostics (H3=9.098; P=.03) and therapy (H3=21.236; P<.001). Concerns included technical challenges, data protection, quality of the included resources, and sustainability of the proposed app. 
Conclusions: This investigation demonstrated that professionals and students show initial readiness to engage in the co-design and use of an interdisciplinary digital learning toolbox app. Specifically, this app could support effective access, sharing, evaluation, and knowledge management in a digital age of rapid change. Formalized digital skills education in the field of CSD is just a part of the solution. It will be crucial to explore flexible, adaptive strategies collaboratively for managing digital resources and tools to optimize targeted selection and use of relevant, high-quality evidence in a world of bewildering data. ", doi="10.2196/34042", url="https://mededu.jmir.org/2022/2/e34042", url="http://www.ncbi.nlm.nih.gov/pubmed/35475980" } @Article{info:doi/10.2196/35475, author="Hu, Danqing and Li, Shaolei and Zhang, Huanyao and Wu, Nan and Lu, Xudong", title="Using Natural Language Processing and Machine Learning to Preoperatively Predict Lymph Node Metastasis for Non--Small Cell Lung Cancer With Electronic Medical Records: Development and Validation Study", journal="JMIR Med Inform", year="2022", month="Apr", day="25", volume="10", number="4", pages="e35475", keywords="non--small cell lung cancer", keywords="lymph node metastasis prediction", keywords="natural language processing", keywords="electronic medical records", keywords="lung cancer", keywords="prediction models", keywords="decision making", keywords="machine learning", keywords="algorithm", keywords="forest modeling", abstract="Background: Lymph node metastasis (LNM) is critical for treatment decision making of patients with resectable non--small cell lung cancer, but it is difficult to precisely diagnose preoperatively. Electronic medical records (EMRs) contain a large volume of valuable information about LNM, but some key information is recorded in free text, which hinders its secondary use. Objective: This study aims to develop LNM prediction models based on EMRs using natural language processing (NLP) and machine learning algorithms. Methods: We developed a multiturn question answering NLP model to extract features about the primary tumor and lymph nodes from computed tomography (CT) reports. We then combined these features with other structured clinical characteristics to develop LNM prediction models using machine learning algorithms. We conducted extensive experiments to explore the effectiveness of the predictive models and compared them with size criteria based on CT image findings (the maximum short axis diameter of lymph node >10 mm was regarded as a metastatic node) and clinician's evaluation. Since the NLP model may extract features with mistakes, we also calculated the concordance correlation between the predicted probabilities of models using NLP-extracted features and gold standard features to explore the influence of NLP-driven automatic extraction. Results: Experimental results show that the random forest models achieved the best performances with 0.792 area under the receiver operating characteristic curve (AUC) value and 0.456 average precision (AP) value for pN2 LNM prediction and 0.768 AUC value and 0.524 AP value for pN1\&N2 LNM prediction. And all machine learning models outperformed the size criteria and clinician's evaluation. The concordance correlation between the random forest models using NLP-extracted features and gold standard features is 0.950 and improved to 0.984 when the top 5 important NLP-extracted features were replaced with gold standard features. 
Conclusions: The LNM models developed can achieve competitive performance using only limited EMR data such as CT reports and tumor markers in comparison with the clinician's evaluation. The multiturn question answering NLP model can extract features effectively to support the development of LNM prediction models, which may facilitate the clinical application of predictive models. ", doi="10.2196/35475", url="https://medinform.jmir.org/2022/4/e35475", url="http://www.ncbi.nlm.nih.gov/pubmed/35468085" } @Article{info:doi/10.2196/33320, author="Dupont, Charl{\`e}ss and Smets, Tinne and Monnet, Fanny and Pivodic, Lara and De Vleminck, Aline and Van Audenhove, Chantal and Van den Block, Lieve", title="Publicly Available, Interactive Web-Based Tools to Support Advance Care Planning: Systematic Review", journal="J Med Internet Res", year="2022", month="Apr", day="20", volume="24", number="4", pages="e33320", keywords="advance care planning", keywords="systematic review", keywords="web-based tools", keywords="health communication", keywords="quality of online content", abstract="Background: There is an increasing number of interactive web-based advance care planning (ACP) support tools, which are web-based aids in any format encouraging reflection, communication, and processing of publicly available information, most of which cannot be found in the peer-reviewed literature. Objective: This study aims to conduct a systematic review of web-based ACP support tools to describe the characteristics, readability, and quality of content and investigate whether and how they are evaluated. Methods: We systematically searched the web-based gray literature databases OpenGrey, ClinicalTrials.gov, ProQuest, British Library, Grey Literature in the Netherlands, and Health Services Research Projects in Progress, as well as Google and app stores, and consulted experts using the following eligibility criteria: web-based, designed for the general population, accessible to everyone, interactive (encouraging reflection, communication, and processing of information), and in English or Dutch. The quality of content was evaluated using the Quality Evaluation Scoring Tool (score 0-28---a higher score indicates better quality). To synthesize the characteristics of the ACP tools, readability and quality of content, and whether and how they were evaluated, we used 4 data extraction tables. Results: A total of 30 tools met the eligibility criteria, including 15 (50\%) websites, 10 (33\%) web-based portals, 3 (10\%) apps, and 2 (7\%) with a combination of formats. Of the 30 tools, 24 (80\%) mentioned a clear aim, including 7 (23\%) that supported reflection or communication, 8 (27\%) that supported people in making decisions, 7 (23\%) that provided support to document decisions, and 2 (7\%) that aimed to achieve all these aims. Of the 30 tools, 7 (23\%) provided information on the development, all of which were developed in collaboration with health care professionals, and 3 (10\%) with end users. Quality scores ranged between 11 and 28, with most of the lower-scoring tools not referring to information sources. Conclusions: A variety of ACP support tools are available on the web, varying in the quality of content. In the future, users should be involved in the development process of ACP support tools, and the content should be substantiated by scientific evidence. 
Trial Registration: PROSPERO CRD42020184112; https://tinyurl.com/mruf8b43 ", doi="10.2196/33320", url="https://www.jmir.org/2022/4/e33320", url="http://www.ncbi.nlm.nih.gov/pubmed/35442207" } @Article{info:doi/10.2196/33799, author="Yang, Xinyu and Mu, Dongmei and Peng, Hao and Li, Hua and Wang, Ying and Wang, Ping and Wang, Yue and Han, Siqi", title="Research and Application of Artificial Intelligence Based on Electronic Health Records of Patients With Cancer: Systematic Review", journal="JMIR Med Inform", year="2022", month="Apr", day="20", volume="10", number="4", pages="e33799", keywords="electronic health records", keywords="artificial intelligence", keywords="neoplasms", keywords="machine learning", abstract="Background: With the accumulation of electronic health records and the development of artificial intelligence, patients with cancer urgently need new evidence of more personalized clinical and demographic characteristics and more sophisticated treatment and prevention strategies. However, no research has systematically analyzed the application and significance of artificial intelligence based on electronic health records in cancer care. Objective: The aim of this study was to conduct a review to introduce the current state and limitations of artificial intelligence based on electronic health records of patients with cancer and to summarize the performance of artificial intelligence in mining electronic health records and its impact on cancer care. Methods: Three databases were systematically searched to retrieve potentially relevant papers published from January 2009 to October 2020. Four principal reviewers assessed the quality of the papers and reviewed them for eligibility based on the inclusion criteria in the extracted data. The summary measures used in this analysis were the number and frequency of occurrence of the themes. Results: Of the 1034 papers considered, 148 papers met the inclusion criteria. Cancer care, especially cancers of female organs and digestive organs, could benefit from artificial intelligence based on electronic health records through cancer emergencies and prognostic estimates, cancer diagnosis and prediction, tumor stage detection, cancer case detection, and treatment pattern recognition. The models can always achieve an area under the curve of 0.7. Ensemble methods and deep learning are on the rise. In addition, electronic medical records in the existing studies are mainly in English and from private institutional databases. Conclusions: Artificial intelligence based on electronic health records performed well and could be useful for cancer care. Improving the performance of artificial intelligence can help patients receive more scientific-based and accurate treatments. There is a need for the development of new methods and electronic health record data sharing and for increased passion and support from cancer specialists. 
", doi="10.2196/33799", url="https://medinform.jmir.org/2022/4/e33799", url="http://www.ncbi.nlm.nih.gov/pubmed/35442195" } @Article{info:doi/10.2196/34626, author="Sch{\"o}pfer, C{\'e}line and Ehrler, Frederic and Berger, Antoine and Bollondi Pauly, Catherine and Buytaert, Laurence and De La Serna, Camille and Hartheiser, Florence and Fassier, Thomas and Clavien, Christine", title="A Mobile App for Advance Care Planning and Advance Directives (Accordons-nous): Development and Usability Study", journal="JMIR Hum Factors", year="2022", month="Apr", day="20", volume="9", number="2", pages="e34626", keywords="usability", keywords="mobile apps", keywords="advance directives", keywords="advance care planning", keywords="mHealth", keywords="mobile health", keywords="palliative care", keywords="mobile phone", abstract="Background: Advance care planning, including advance directives, is an important tool that allows patients to express their preferences for care if they are no longer able to express themselves. We developed Accordons-nous, a smartphone app that informs patients about advance care planning and advance directives, facilitates communication on these sensitive topics, and helps patients express their values and preferences for care. Objective: The first objective of this study is to conduct a usability test of this app. The second objective is to collect users' critical opinions on the usability and relevance of the tool. Methods: We conducted a usability test by means of a think-aloud method, asking 10 representative patients to complete 7 browsing tasks. We double coded the filmed sessions to obtain descriptive data on task completion (with or without help), time spent, number of clicks, and the types of problems encountered. We assessed the severity of the problems encountered and identified the modifications needed to address these problems. We evaluated the readability of the app using Scolarius, a French equivalent of the Flesch Reading Ease test. By means of a posttest questionnaire, we asked participants to assess the app's usability (System Usability Scale), relevance (Mobile App Rating Scale, section F), and whether they would recommend the app to the target groups: patients, health professionals, and patients' caring relatives. Results: Participants completed the 7 think-aloud tasks in 80\% (56/70) of the cases without any help from the experimenter, in 16\% (11/70) of the cases with some help, and failed in 4\% (3/70) of the cases. The analysis of failures and difficulties encountered revealed a series of major usability problems that could be addressed with minor modifications to the app. Accordons-nous obtained high scores on readability (overall score of 87.4 on Scolarius test, corresponding to elementary school level), usability (85.3/100 on System Usability Scale test), relevance (4.3/5 on the Mobile App Rating Scale, section F), and overall subjective endorsement on 3 I would recommend questions (4.7/5). Conclusions: This usability test helped us make the final changes to our app before its official launch. 
", doi="10.2196/34626", url="https://humanfactors.jmir.org/2022/2/e34626", url="http://www.ncbi.nlm.nih.gov/pubmed/35442206" } @Article{info:doi/10.2196/33395, author="Wang, Jinwan and Wang, Shuai and Zhu, Xuefang Mark and Yang, Tao and Yin, Qingfeng and Hou, Ya", title="Risk Prediction of Major Adverse Cardiovascular Events Occurrence Within 6 Months After Coronary Revascularization: Machine Learning Study", journal="JMIR Med Inform", year="2022", month="Apr", day="20", volume="10", number="4", pages="e33395", keywords="major adverse cardiovascular events", keywords="risk prediction", keywords="machine learning", keywords="oversampling", keywords="data imbalance", abstract="Background: As a major health hazard, the incidence of coronary heart disease has been increasing year by year. Although coronary revascularization, mainly percutaneous coronary intervention, has played an important role in the treatment of coronary heart disease, major adverse cardiovascular events (MACE) such as recurrent or persistent angina pectoris after coronary revascularization remain a very difficult problem in clinical practice. Objective: Given the high probability of MACE after coronary revascularization, the aim of this study was to develop and validate a predictive model for MACE occurrence within 6 months based on machine learning algorithms. Methods: A retrospective study was performed including 1004 patients who had undergone coronary revascularization at The People's Hospital of Liaoning Province and Affiliated Hospital of Liaoning University of Traditional Chinese Medicine from June 2019 to December 2020. According to the characteristics of available data, an oversampling strategy was adopted for initial preprocessing. We then employed six machine learning algorithms, including decision tree, random forest, logistic regression, na{\"i}ve Bayes, support vector machine, and extreme gradient boosting (XGBoost), to develop prediction models for MACE depending on clinical information and 6-month follow-up information. Among all samples, 70\% were randomly selected for training and the remaining 30\% were used for model validation. Model performance was assessed based on accuracy, precision, recall, F1-score, confusion matrix, area under the receiver operating characteristic (ROC) curve (AUC), and visualization of the ROC curve. Results: Univariate analysis showed that 21 patient characteristic variables were statistically significant (P<.05) between the groups without and with MACE. Coupled with these significant factors, among the six machine learning algorithms, XGBoost stood out with an accuracy of 0.7788, precision of 0.8058, recall of 0.7345, F1-score of 0.7685, and AUC of 0.8599. Further exploration of the models to identify factors affecting the occurrence of MACE revealed that use of anticoagulant drugs and course of the disease consistently ranked in the top two predictive factors in three developed models. Conclusions: The machine learning risk models constructed in this study can achieve acceptable performance of MACE prediction, with XGBoost performing the best, providing a valuable reference for pointed intervention and clinical decision-making in MACE prevention. ", doi="10.2196/33395", url="https://medinform.jmir.org/2022/4/e33395", url="http://www.ncbi.nlm.nih.gov/pubmed/35442202" } @Article{info:doi/10.2196/29455, author="Tardini, Elisa and Zhang, Xinhua and Canahuate, Guadalupe and Wentzel, Andrew and Mohamed, R. Abdallah S. 
and Van Dijk, Lisanne and Fuller, D. Clifton and Marai, Elisabeta G.", title="Optimal Treatment Selection in Sequential Systemic and Locoregional Therapy of Oropharyngeal Squamous Carcinomas: Deep Q-Learning With a Patient-Physician Digital Twin Dyad", journal="J Med Internet Res", year="2022", month="Apr", day="20", volume="24", number="4", pages="e29455", keywords="digital twin dyad", keywords="reinforcement learning", keywords="head and neck cancer", abstract="Background: Currently, selection of patients for sequential versus concurrent chemotherapy and radiation regimens lacks evidentiary support and it is based on locally optimal decisions for each step. Objective: We aim to optimize the multistep treatment of patients with head and neck cancer and predict multiple patient survival and toxicity outcomes, and we develop, apply, and evaluate a first application of deep Q-learning (DQL) and simulation to this problem. Methods: The treatment decision DQL digital twin and the patient's digital twin were created, trained, and evaluated on a data set of 536 patients with oropharyngeal squamous cell carcinoma with the goal of, respectively, determining the optimal treatment decisions with respect to survival and toxicity metrics and predicting the outcomes of the optimal treatment on the patient. Of the data set of 536 patients, the models were trained on a subset of 402 (75\%) patients (split randomly) and evaluated on a separate set of 134 (25\%) patients. Training and evaluation of the digital twin dyad was completed in August 2020. The data set includes 3-step sequential treatment decisions and complete relevant history of the patient cohort treated at MD Anderson Cancer Center between 2005 and 2013, with radiomics analysis performed for the segmented primary tumor volumes. Results: On the test set, we found mean 87.35\% (SD 11.15\%) and median 90.85\% (IQR 13.56\%) accuracies in treatment outcome prediction, matching the clinicians' outcomes and improving the (predicted) survival rate by +3.73\% (95\% CI --0.75\% to 8.96\%) and the dysphagia rate by +0.75\% (95\% CI --4.48\% to 6.72\%) when following DQL treatment decisions. Conclusions: Given the prediction accuracy and predicted improvement regarding the medically relevant outcomes yielded by this approach, this digital twin dyad of the patient-physician dynamic treatment problem has the potential of aiding physicians in determining the optimal course of treatment and in assessing its outcomes. ", doi="10.2196/29455", url="https://www.jmir.org/2022/4/e29455", url="http://www.ncbi.nlm.nih.gov/pubmed/35442211" } @Article{info:doi/10.2196/21208, author="Ivanova, Julia and Tang, Tianyu and Idouraine, Nassim and Murcko, Anita and Whitfield, Jo Mary and Dye, Christy and Chern, Darwyn and Grando, Adela", title="Behavioral Health Professionals' Perceptions on Patient-Controlled Granular Information Sharing (Part 1): Focus Group Study", journal="JMIR Ment Health", year="2022", month="Apr", day="20", volume="9", number="4", pages="e21208", keywords="behavioral health professional", keywords="granular information", keywords="granular information sharing", keywords="electronic health record", keywords="integrated health care", keywords="electronic consent tool", abstract="Background: Patient-controlled granular information sharing (PC-GIS) allows a patient to select specific health information ``granules,'' such as diagnoses and medications; choose with whom the information is shared; and decide how the information can be used. 
Previous studies suggest that health professionals have mixed or concerned opinions about the process and impact of PC-GIS for care and research. Further understanding of behavioral health professionals' views on PC-GIS is needed for successful implementation and use of this technology. Objective: The aim of this study was to evaluate changes in health professionals' opinions on PC-GIS before and after a demonstrative case study. Methods: Four focus groups were conducted at two integrated health care facilities: one serious mental illness facility and one general behavioral health facility. A total of 28 participants were given access to outcomes of a previous study where patients had control over medical record sharing. Participants were surveyed before and after focus groups on their views about PC-GIS. Thematic analysis of focus group output was paired with descriptive statistics and exploratory factor analysis of surveys. Results: Behavioral health professionals showed a significant opinion shift toward concern after the focus group intervention, specifically on the topics of patient understanding (P=.001), authorized electronic health record access (P=.03), patient-professional relationship (P=.006), patient control acceptance (P<.001), and patient rights (P=.02). Qualitative methodology supported these results. The themes of professional considerations (2234/4025, 55.5\% of codes) and necessity of health information (260/766, 33.9\%) identified key aspects of PC-GIS concerns. Conclusions: Behavioral health professionals agreed that a trusting patient-professional relationship is integral to the optimal implementation of PC-GIS, but were concerned about the potential negative impacts of PC-GIS on patient safety and quality of care. ", doi="10.2196/21208", url="https://mental.jmir.org/2022/4/e21208", url="http://www.ncbi.nlm.nih.gov/pubmed/35442199" } @Article{info:doi/10.2196/18792, author="Ivanova, Julia and Tang, Tianyu and Idouraine, Nassim and Murcko, Anita and Whitfield, Jo Mary and Dye, Christy and Chern, Darwyn and Grando, Adela", title="Behavioral Health Professionals' Perceptions on Patient-Controlled Granular Information Sharing (Part 2): Focus Group Study", journal="JMIR Ment Health", year="2022", month="Apr", day="20", volume="9", number="4", pages="e18792", keywords="behavioral health", keywords="patient information", keywords="granular information", keywords="electronic health record", keywords="integrated health care", keywords="electronic consent tool", abstract="Background: Patient-directed selection and sharing of health information ``granules'' is known as granular information sharing. In a previous study, patients with behavioral health conditions categorized their own health information into sensitive categories (eg, mental health) and chose the health professionals (eg, pharmacists) who should have access to those records. Little is known about behavioral health professionals' perspectives of patient-controlled granular information sharing (PC-GIS). Objective: This study aimed to assess behavioral health professionals' (1) understanding of and opinions about PC-GIS; (2) accuracy in assessing redacted medical information; (3) reactions to patient rationale for health data categorization, assignment of sensitivity, and sharing choices; and (4) recommendations to improve PC-GIS. Methods: Four 2-hour focus groups and pre- and postsurveys were conducted at 2 facilities. 
During the focus groups, outcomes from a previous study on patients' choices for medical record sharing were discussed. Thematic analysis was applied to focus group transcripts to address study objectives. Results: A total of 28 health professionals were recruited. Over half (14/25, 56\%) were unaware of granular information sharing or provided incorrect definitions of it. After PC-GIS was explained, all professionals demonstrated understanding of the terminology and process. Most (26/32 codes, 81\%) recognized that key medical data had been redacted from the study case. A majority (41/62 codes, 66\%) found the patient rationale for categorization and data sharing choices to be unclear. Finally, education and other approaches to inform and engage patients in granular information sharing were recommended. Conclusions: This study provides detailed insights from behavioral health professionals on granular information sharing. Outcomes will inform the development, deployment, and evaluation of an electronic consent tool for granular health data sharing. ", doi="10.2196/18792", url="https://mental.jmir.org/2022/4/e18792", url="http://www.ncbi.nlm.nih.gov/pubmed/35442213" } @Article{info:doi/10.2196/35257, author="Bae, Ho Jung and Han, Wook Hyun and Yang, Young Sun and Song, Gyuseon and Sa, Soonok and Chung, Eun Goh and Seo, Yeon Ji and Jin, Hyo Eun and Kim, Heecheon and An, DongUk", title="Natural Language Processing for Assessing Quality Indicators in Free-Text Colonoscopy and Pathology Reports: Development and Usability Study", journal="JMIR Med Inform", year="2022", month="Apr", day="15", volume="10", number="4", pages="e35257", keywords="natural language processing", keywords="colonoscopy", keywords="adenoma", keywords="endoscopy", abstract="Background: Manual data extraction of colonoscopy quality indicators is time- and labor-intensive. Natural language processing (NLP), a computer-based linguistics technique, can automate the extraction of important clinical information, such as adverse events, from unstructured free-text reports. NLP information extraction can facilitate the optimization of clinical work by helping to improve quality control and patient management. Objective: We developed an NLP pipeline to analyze free-text colonoscopy and pathology reports and evaluated its ability to automatically assess adenoma detection rate (ADR), sessile serrated lesion detection rate (SDR), and postcolonoscopy surveillance intervals. Methods: The NLP tool for extracting colonoscopy quality indicators was developed using a data set of 2000 screening colonoscopy reports from a single health care system, with an associated 1425 pathology reports. The NLP system was then tested on a data set of 1000 colonoscopy reports and its performance was compared with that of 5 human annotators. Additionally, data from 54,562 colonoscopies performed between 2010 and 2019 were analyzed using the NLP pipeline. Results: The NLP pipeline achieved an overall accuracy of 0.99-1.00 for identifying polyp subtypes, 0.99-1.00 for identifying the anatomical location of polyps, and 0.98 for counting the number of neoplastic polyps. The NLP pipeline achieved performance similar to that of clinical experts for assessing ADR, SDR, and surveillance intervals. NLP analysis of a 10-year colonoscopy data set identified great individual variance in colonoscopy quality indicators among 25 endoscopists. 
Conclusions: The NLP pipeline could accurately extract information from colonoscopy and pathology reports and demonstrated clinical efficacy for assessing ADR, SDR, and surveillance intervals in these reports. Implementation of the system enabled automated analysis and feedback on quality indicators, which could motivate endoscopists to improve the quality of their performance and improve clinical decision-making in colorectal cancer screening programs. ", doi="10.2196/35257", url="https://medinform.jmir.org/2022/4/e35257", url="http://www.ncbi.nlm.nih.gov/pubmed/35436226" } @Article{info:doi/10.2196/34954, author="Fong, Allan and Iscoe, Mark and Sinsky, A. Christine and Haimovich, D. Adrian and Williams, Brian and O'Connell, T. Ryan and Goldstein, Richard and Melnick, Edward", title="Cluster Analysis of Primary Care Physician Phenotypes for Electronic Health Record Use: Retrospective Cohort Study", journal="JMIR Med Inform", year="2022", month="Apr", day="15", volume="10", number="4", pages="e34954", keywords="electronic health record", keywords="phenotypes", keywords="cluster analysis", keywords="unsupervised machine learning", keywords="machine learning", keywords="EHR", keywords="primary care", abstract="Background: Electronic health records (EHRs) have become ubiquitous in US office-based physician practices. However, the different ways in which users engage with EHRs remain poorly characterized. Objective: The aim of this study is to explore EHR use phenotypes among ambulatory care physicians. Methods: In this retrospective cohort analysis, we applied affinity propagation, an unsupervised clustering machine learning technique, to identify EHR user types among primary care physicians. Results: We identified 4 distinct phenotype clusters generalized across internal medicine, family medicine, and pediatrics specialties. Total EHR use varied for physicians in 2 clusters with above-average ratios of work outside of scheduled hours. This finding suggested that one cluster of physicians may have worked outside of scheduled hours out of necessity, whereas the other preferred ad hoc work hours. The two remaining clusters represented physicians with below-average EHR time and physicians who spend the largest proportion of their EHR time on documentation. Conclusions: These findings demonstrate the utility of cluster analysis for exploring EHR use phenotypes and may offer opportunities for interventions to improve interface design to better support users' needs. ", doi="10.2196/34954", url="https://medinform.jmir.org/2022/4/e34954", url="http://www.ncbi.nlm.nih.gov/pubmed/35275070" } @Article{info:doi/10.2196/35013, author="Costa, Silva Thiago Bulh{\~o}es da and Shinoda, Lucas and Moreno, Alfredo Ramon and Krieger, E. Jose and Gutierrez, Marco", title="Blockchain-Based Architecture Design for Personal Health Record: Development and Usability Study", journal="J Med Internet Res", year="2022", month="Apr", day="13", volume="24", number="4", pages="e35013", keywords="electronic health record", keywords="personal health record", keywords="blockchain", keywords="smart contract", abstract="Background: The importance of blockchain-based architectures for personal health record (PHR) lies in the fact that they are thought and developed to allow patients to control and at least partly collect their health data. Ideally, these systems should provide the full control of such data to the respective owner. 
In spite of this importance, most works focus on describing how blockchain models can be used in a PHR scenario rather than on whether these models are in fact feasible and robust enough to support a large number of users. Objective: To achieve a consistent, reproducible, and comparable PHR system, we build a novel ledger-oriented architecture out of a permissioned distributed network, providing patients with a way to securely collect, store, share, and manage their health data. We also emphasize the importance of suitable ledgers and smart contracts to operate the blockchain network as well as discuss the necessity of standardizing evaluation metrics to compare related (net)works. Methods: We adopted the Hyperledger Fabric platform to implement our blockchain-based architecture design and the Hyperledger Caliper framework to provide a detailed assessment of our system: first, under workload ranging from 100 to 2500 simultaneous record submissions, and second, increasing the network size from 3 to 13 peers. In both experiments, we used throughput and average latency as the primary metrics. We also created a health database, a cryptographic unit, and a server to complement the blockchain network. Results: With a 3-peer network, smart contracts that write on the ledger have throughputs, measured in transactions per second (tps), in an order of magnitude close to $10^2$ tps, while those contracts that only read have rates close to $10^3$ tps. Smart contracts that write also have latencies, measured in seconds, in an order of magnitude close to $10^1$ seconds, while those that only read have delays close to $10^0$ seconds. In particular, smart contracts that retrieve, list, and view history have throughputs varying, respectively, from 1100 tps to 1300 tps, 650 tps to 750 tps, and 850 tps to 950 tps, impacting the overall system response if they are equally requested under the same workload. When the network size is varied under an equal fixed load, in turn, writing throughputs go from $10^2$ tps to $10^1$ tps and latencies go from $10^1$ seconds to $10^2$ seconds, while reading ones maintain similar values. Conclusions: To the best of our knowledge, we are the first to evaluate, using Hyperledger Caliper, the performance of a PHR blockchain architecture and the first to evaluate each smart contract separately. Nevertheless, blockchain systems achieve performances far below what traditional distributed databases achieve, indicating that the assessment of blockchain solutions for PHR is a major concern to be addressed before putting them into real production. 
", doi="10.2196/35013", url="https://www.jmir.org/2022/4/e35013", url="http://www.ncbi.nlm.nih.gov/pubmed/35416782" } @Article{info:doi/10.2196/30523, author="Davidson, Brittany and Ferrer Portillo, Mara Katiuska and Wac, Marceli and McWilliams, Chris and Bourdeaux, Christopher and Craddock, Ian", title="Requirements for a Bespoke Intensive Care Unit Dashboard in Response to the COVID-19 Pandemic: Semistructured Interview Study", journal="JMIR Hum Factors", year="2022", month="Apr", day="13", volume="9", number="2", pages="e30523", keywords="intensive care", keywords="critical care", keywords="COVID-19", keywords="human-centered design", keywords="dashboard", keywords="eHealth", keywords="disease monitoring", keywords="monitoring", keywords="ICU", keywords="design", keywords="development", keywords="interview", abstract="Background: Intensive care units (ICUs) around the world are in high demand due to patients with COVID-19 requiring hospitalization. As researchers at the University of Bristol, we were approached to develop a bespoke data visualization dashboard to assist two local ICUs during the pandemic that will centralize disparate data sources in the ICU to help reduce the cognitive load on busy ICU staff in the ever-evolving pandemic. Objective: The aim of this study was to conduct interviews with ICU staff in University Hospitals Bristol and Weston National Health Service Foundation Trust to elicit requirements for a bespoke dashboard to monitor the high volume of patients, particularly during the COVID-19 pandemic. Methods: We conducted six semistructured interviews with clinical staff to obtain an overview of their requirements for the dashboard and to ensure its ultimate suitability for end users. Interview questions aimed to understand the job roles undertaken in the ICU, potential uses of the dashboard, specific issues associated with managing COVID-19 patients, key data of interest, and any concerns about the introduction of a dashboard into the ICU. Results: From our interviews, we found the following design requirements: (1) a flexible dashboard, where the functionality can be updated quickly and effectively to respond to emerging information about the management of this new disease; (2) a mobile dashboard, which allows staff to move around on wards with a dashboard, thus potentially replacing paper forms to enable detailed and consistent data entry; (3) a customizable and intuitive dashboard, where individual users would be able to customize the appearance of the dashboard to suit their role; (4) real-time data and trend analysis via informative data visualizations that help busy ICU staff to understand a patient's clinical trajectory; and (5) the ability to manage tasks and staff, tracking both staff and patient movements, handovers, and task monitoring to ensure the highest quality of care. Conclusions: The findings of this study confirm that digital solutions for ICU use would potentially reduce the cognitive load of ICU staff and reduce clinical errors at a time of notably high demand of intensive health care. 
", doi="10.2196/30523", url="https://humanfactors.jmir.org/2022/2/e30523", url="http://www.ncbi.nlm.nih.gov/pubmed/35038301" } @Article{info:doi/10.2196/29982, author="Park, Yeongjun James and Hsu, Tzu-Chun and Hu, Jiun-Ruey and Chen, Chun-Yuan and Hsu, Wan-Ting and Lee, Matthew and Ho, Joshua and Lee, Chien-Chang", title="Predicting Sepsis Mortality in a Population-Based National Database: Machine Learning Approach", journal="J Med Internet Res", year="2022", month="Apr", day="13", volume="24", number="4", pages="e29982", keywords="sepsis", keywords="mortality", keywords="machine learning", keywords="SuperLearner", abstract="Background: Although machine learning (ML) algorithms have been applied to point-of-care sepsis prognostication, ML has not been used to predict sepsis mortality in an administrative database. Therefore, we examined the performance of common ML algorithms in predicting sepsis mortality in adult patients with sepsis and compared it with that of the conventional context knowledge--based logistic regression approach. Objective: The aim of this study is to examine the performance of common ML algorithms in predicting sepsis mortality in adult patients with sepsis and compare it with that of the conventional context knowledge--based logistic regression approach. Methods: We examined inpatient admissions for sepsis in the US National Inpatient Sample using hospitalizations in 2010-2013 as the training data set. We developed four ML models to predict in-hospital mortality: logistic regression with least absolute shrinkage and selection operator regularization, random forest, gradient-boosted decision tree, and deep neural network. To estimate their performance, we compared our models with the Super Learner model. Using hospitalizations in 2014 as the testing data set, we examined the models' area under the receiver operating characteristic curve (AUC), confusion matrix results, and net reclassification improvement. Results: Hospitalizations of 923,759 adults were included in the analysis. Compared with the reference logistic regression (AUC: 0.786, 95\% CI 0.783-0.788), all ML models showed superior discriminative ability (P<.001), including logistic regression with least absolute shrinkage and selection operator regularization (AUC: 0.878, 95\% CI 0.876-0.879), random forest (AUC: 0.878, 95\% CI 0.877-0.880), xgboost (AUC: 0.888, 95\% CI 0.886-0.889), and neural network (AUC: 0.893, 95\% CI 0.891-0.895). All 4 ML models showed higher sensitivity, specificity, positive predictive value, and negative predictive value compared with the reference logistic regression model (P<.001). We obtained similar results from the Super Learner model (AUC: 0.883, 95\% CI 0.881-0.885). Conclusions: ML approaches can improve sensitivity, specificity, positive predictive value, negative predictive value, discrimination, and calibration in predicting in-hospital mortality in patients hospitalized with sepsis in the United States. These models need further validation and could be applied to develop more accurate models to compare risk-standardized mortality rates across hospitals and geographic regions, paving the way for research and policy initiatives studying disparities in sepsis care. ", doi="10.2196/29982", url="https://www.jmir.org/2022/4/e29982", url="http://www.ncbi.nlm.nih.gov/pubmed/35416785" } @Article{info:doi/10.2196/31461, author="Emani, Srinivas and Rui, Angela and Rocha, Lima Hermano Alexandre and Rizvi, F. 
Rubina and Jua{\c{c}}aba, Ferreira Sergio and Jackson, Purcell Gretchen and Bates, W. David", title="Physicians' Perceptions of and Satisfaction With Artificial Intelligence in Cancer Treatment: A Clinical Decision Support System Experience and Implications for Low-Middle--Income Countries", journal="JMIR Cancer", year="2022", month="Apr", day="7", volume="8", number="2", pages="e31461", keywords="artificial intelligence", keywords="cancer", keywords="low-middle--income countries", keywords="physicians", keywords="perceptions", keywords="Watson for Oncology", keywords="implementation", keywords="local context", doi="10.2196/31461", url="https://cancer.jmir.org/2022/2/e31461", url="http://www.ncbi.nlm.nih.gov/pubmed/35389353" } @Article{info:doi/10.2196/35543, author="van Veenendaal, Haske and Peters, J. Loes and Ubbink, T. Dirk and Stubenrouch, E. Fabienne and Stiggelbout, M. Anne and Brand, LP Paul and Vreugdenhil, Gerard and Hilders, GJM Carina", title="Effectiveness of Individual Feedback and Coaching on Shared Decision-making Consultations in Oncology Care: Protocol for a Randomized Clinical Trial", journal="JMIR Res Protoc", year="2022", month="Apr", day="6", volume="11", number="4", pages="e35543", keywords="decision-making", keywords="shared", keywords="education", keywords="professional", keywords="feedback learning", keywords="coaching", keywords="medical consultation", keywords="medical oncology", keywords="palliative care", abstract="Background: Shared decision-making (SDM) is particularly important in oncology as many treatments involve serious side effects, and treatment decisions involve a trade-off between benefits and risks. However, the implementation of SDM in oncology care is challenging, and clinicians state that it is difficult to apply SDM in their actual workplace. Training clinicians is known to be an effective means of improving SDM but is considered time consuming. Objective: This study aims to address the effectiveness of an individual SDM training program using the concept of deliberate practice. Methods: This multicenter, single-blinded randomized clinical trial will be performed at 12 Dutch hospitals. Clinicians involved in decisions with oncology patients will be invited to participate in the study and allocated to the control or intervention group. All clinicians will record 3 decision-making processes with 3 different oncology patients. Clinicians in the intervention group will receive the following SDM intervention: completing e-learning, reflecting on feedback reports, performing a self-assessment and defining 1 to 3 personal learning questions, and participating in face-to-face coaching. Clinicians in the control group will not receive the SDM intervention until the end of the study. The primary outcome will be the extent to which clinicians involve their patients in the decision-making process, as scored using the Observing Patient Involvement--5 instrument. As secondary outcomes, patients will rate their perceived involvement in decision-making, and the duration of the consultations will be registered. All participating clinicians and their patients will receive information about the study and complete an informed consent form beforehand. Results: This trial was retrospectively registered on August 03, 2021. Approval for the study was obtained from the ethical review board (medical research ethics committee Delft and Leiden, the Netherlands [N20.170]). 
Recruitment and data collection procedures are ongoing and are expected to be completed by July 2022; we plan to complete data analyses by December 2022. As of February 2022, a total of 12 hospitals have been recruited to participate in the study, and 30 clinicians have started the SDM training program. Conclusions: This theory-based and blended approach will increase our knowledge of effective and feasible training methods for clinicians in the field of SDM. The intervention will be tailored to the context of individual clinicians and will target the knowledge, attitude, and skills of clinicians. The patients will also be involved in the design and implementation of the study. Trial Registration: Netherlands Trial Registry NL9647; https://www.trialregister.nl/trial/9647 International Registered Report Identifier (IRRID): DERR1-10.2196/35543 ", doi="10.2196/35543", url="https://www.researchprotocols.org/2022/4/e35543", url="http://www.ncbi.nlm.nih.gov/pubmed/35383572" } @Article{info:doi/10.2196/32399, author="Lowery, Julie and Fagerlin, Angela and Larkin, R. Angela and Wiener, S. Renda and Skurla, E. Sarah and Caverly, J. Tanner", title="Implementation of a Web-Based Tool for Shared Decision-making in Lung Cancer Screening: Mixed Methods Quality Improvement Evaluation", journal="JMIR Hum Factors", year="2022", month="Apr", day="1", volume="9", number="2", pages="e32399", keywords="shared decision-making", keywords="lung cancer", keywords="screening", keywords="clinical decision support", keywords="academic detailing", keywords="quality improvement", keywords="implementation", abstract="Background: Lung cancer risk and life expectancy vary substantially across patients eligible for low-dose computed tomography lung cancer screening (LCS), which has important consequences for optimizing LCS decisions for different patients. To account for this heterogeneity during decision-making, web-based decision support tools are needed to enable quick calculations and streamline the process of obtaining individualized information that more accurately informs patient-clinician LCS discussions. We created DecisionPrecision, a clinician-facing web-based decision support tool, to help tailor the LCS discussion to a patient's individualized lung cancer risk and estimated net benefit. Objective: The objective of our study is to test two strategies for implementing DecisionPrecision in primary care at eight Veterans Affairs medical centers: a quality improvement (QI) training approach and academic detailing (AD). Methods: Phase 1 comprised a multisite, cluster randomized trial comparing the effectiveness of standard implementation (adding a link to DecisionPrecision in the electronic health record vs standard implementation plus the Learn, Engage, Act, and Process [LEAP] QI training program). The primary outcome measure was the use of DecisionPrecision at each site before versus after LEAP QI training. The second phase of the study examined the potential effectiveness of AD as an implementation strategy for DecisionPrecision at all 8 medical centers. Outcomes were assessed by comparing absolute tool use before and after AD visits and conducting semistructured interviews with a subset of primary care physicians (PCPs) following the AD visits. 
Results: Phase 1 findings showed that sites that participated in the LEAP QI training program used DecisionPrecision significantly more often than the standard implementation sites (tool used 190.3, SD 174.8 times on average over 6 months at LEAP sites vs 3.5, SD 3.7 at standard sites; P<.001). However, this finding was confounded by the lack of screening coordinators at standard implementation sites. In phase 2, there was no difference in 6-month tool use before versus after AD (95\% CI --5.06 to 6.40; P=.82). Follow-up interviews with PCPs indicated that the AD strategy increased provider awareness and appreciation for the benefits of the tool. However, other priorities and limited time prevented PCPs from using the tool during routine clinical visits. Conclusions: The phase 1 findings did not provide conclusive evidence of the benefit of a QI training approach for implementing a decision support tool for LCS among PCPs. In addition, phase 2 findings showed that our light-touch, single-visit AD strategy did not increase tool use. To enable tool use by PCPs, prediction-based tools must be fully automated and integrated into electronic health records, thereby helping providers personalize LCS discussions among their many competing demands. PCPs also need more time to engage in shared decision-making discussions with their patients. Trial Registration: ClinicalTrials.gov NCT02765412; https://clinicaltrials.gov/ct2/show/NCT02765412 ", doi="10.2196/32399", url="https://humanfactors.jmir.org/2022/2/e32399", url="http://www.ncbi.nlm.nih.gov/pubmed/35363144" } @Article{info:doi/10.2196/35373, author="Thapa, Rahul and Garikipati, Anurag and Shokouhi, Sepideh and Hurtado, Myrna and Barnes, Gina and Hoffman, Jana and Calvert, Jacob and Katzmann, Lynne and Mao, Qingqing and Das, Ritankar", title="Predicting Falls in Long-term Care Facilities: Machine Learning Study", journal="JMIR Aging", year="2022", month="Apr", day="1", volume="5", number="2", pages="e35373", keywords="vital signs", keywords="machine learning", keywords="blood pressure", keywords="skilled nursing facilities", keywords="independent living facilities", keywords="assisted living facilities", keywords="fall prediction", keywords="elderly care", keywords="elderly population", keywords="older adult", keywords="aging", abstract="Background: Short-term fall prediction models that use electronic health records (EHRs) may enable the implementation of dynamic care practices that specifically address changes in individualized fall risk within senior care facilities. Objective: The aim of this study is to implement machine learning (ML) algorithms that use EHR data to predict a 3-month fall risk in residents from a variety of senior care facilities providing different levels of care. Methods: This retrospective study obtained EHR data (2007-2021) from Juniper Communities' proprietary database of 2785 individuals primarily residing in skilled nursing facilities, independent living facilities, and assisted living facilities across the United States. We assessed the performance of 3 ML-based fall prediction models and the Juniper Communities' fall risk assessment. Additional analyses were conducted to examine how changes in the input features, training data sets, and prediction windows affected the performance of these models. 
Results: The Extreme Gradient Boosting model exhibited the highest performance, with an area under the receiver operating characteristic curve of 0.846 (95\% CI 0.794-0.894), specificity of 0.848, diagnostic odds ratio of 13.40, and sensitivity of 0.706, while achieving the best trade-off in balancing true positive and negative rates. The number of active medications was the most significant feature associated with fall risk, followed by a resident's number of active diseases and several variables associated with vital signs, including diastolic blood pressure and changes in weight and respiratory rates. The combination of vital signs with traditional risk factors as input features achieved higher prediction accuracy than using either group of features alone. Conclusions: This study shows that the Extreme Gradient Boosting technique can use a large number of features from EHR data to make short-term fall predictions with a better performance than that of conventional fall risk assessments and other ML models. The integration of routinely collected EHR data, particularly vital signs, into fall prediction models may generate more accurate fall risk surveillance than models without vital signs. Our data support the use of ML models for dynamic, cost-effective, and automated fall predictions in different types of senior care facilities. ", doi="10.2196/35373", url="https://aging.jmir.org/2022/2/e35373", url="http://www.ncbi.nlm.nih.gov/pubmed/35363146" } @Article{info:doi/10.2196/33145, author="Al-Zubaidy, Mohaimen and Hogg, Jeffry H. D. and Maniatopoulos, Gregory and Talks, James and Teare, Dawn Marion and Keane, A. Pearse and R Beyer, Fiona", title="Stakeholder Perspectives on Clinical Decision Support Tools to Inform Clinical Artificial Intelligence Implementation: Protocol for a Framework Synthesis for Qualitative Evidence", journal="JMIR Res Protoc", year="2022", month="Apr", day="1", volume="11", number="4", pages="e33145", keywords="artificial intelligence", keywords="clinical decision support tools", keywords="digital health", keywords="implementation", keywords="qualitative evidence synthesis", keywords="stakeholders", keywords="clinical decision", keywords="decision support", abstract="Background: Quantitative systematic reviews have identified clinical artificial intelligence (AI)-enabled tools with adequate performance for real-world implementation. To our knowledge, no published report or protocol synthesizes the full breadth of stakeholder perspectives. The absence of such a rigorous foundation perpetuates the ``AI chasm,'' which continues to delay patient benefit. Objective: The aim of this research is to synthesize stakeholder perspectives of computerized clinical decision support tools in any health care setting. Synthesized findings will inform future research and the implementation of AI into health care services. Methods: The search strategy will use MEDLINE (Ovid), Scopus, CINAHL (EBSCO), ACM Digital Library, and Science Citation Index (Web of Science). Following deduplication, title, abstract, and full text screening will be performed by 2 independent reviewers with a third topic expert arbitrating. The quality of included studies will be appraised to support interpretation. Best-fit framework synthesis will be performed, with line-by-line coding completed by 2 independent reviewers. Where appropriate, these findings will be assigned to 1 of 22 a priori themes defined by the Nonadoption, Abandonment, Scale-up, Spread, and Sustainability framework. 
New domains will be inductively generated for outlying findings. The placement of findings within themes will be reviewed iteratively by a study advisory group including patient and lay representatives. Results: Study registration was obtained from PROSPERO (CRD42021256005) in May 2021. Final searches were executed in April, and screening is ongoing at the time of writing. Full text data analysis is due to be completed in October 2021. We anticipate that the study will be submitted for open-access publication in late 2021. Conclusions: This paper describes the protocol for a qualitative evidence synthesis aiming to define barriers and facilitators to the implementation of computerized clinical decision support tools from all relevant stakeholders. The results of this study are intended to expedite the delivery of patient benefit from AI-enabled clinical tools. Trial Registration: PROSPERO CRD42021256005; https://tinyurl.com/r4x3thvp International Registered Report Identifier (IRRID): DERR1-10.2196/33145 ", doi="10.2196/33145", url="https://www.researchprotocols.org/2022/4/e33145", url="http://www.ncbi.nlm.nih.gov/pubmed/35363141" } @Article{info:doi/10.2196/32949, author="Jung, Christian and Mamandipoor, Behrooz and Fj{\o}lner, Jesper and Bruno, Romano Raphael and Wernly, Bernhard and Artigas, Antonio and Bollen Pinto, Bernardo and Schefold, C. Joerg and Wolff, Georg and Kelm, Malte and Beil, Michael and Sviri, Sigal and van Heerden, V. Peter and Szczeklik, Wojciech and Czuczwar, Miroslaw and Elhadi, Muhammed and Joannidis, Michael and Oeyen, Sandra and Zafeiridis, Tilemachos and Marsh, Brian and Andersen, H. Finn and Moreno, Rui and Cecconi, Maurizio and Leaver, Susannah and De Lange, W. Dylan and Guidet, Bertrand and Flaatten, Hans and Osmani, Venet", title="Disease-Course Adapting Machine Learning Prognostication Models in Elderly Patients Critically Ill With COVID-19: Multicenter Cohort Study With External Validation", journal="JMIR Med Inform", year="2022", month="Mar", day="31", volume="10", number="3", pages="e32949", keywords="machine-based learning", keywords="outcome prediction", keywords="COVID-19", keywords="pandemic", keywords="machine learning", keywords="prediction models", keywords="clinical informatics", keywords="patient data", keywords="elderly population", abstract="Background: The COVID-19 pandemic caused by SARS-CoV-2 is challenging health care systems globally. The disease disproportionately affects the elderly population, both in terms of disease severity and mortality risk. Objective: The aim of this study was to evaluate machine learning--based prognostication models for critically ill elderly COVID-19 patients, which dynamically incorporated multifaceted clinical information on evolution of the disease. Methods: This multicenter cohort study (COVIP study) obtained patient data from 151 intensive care units (ICUs) from 26 countries. Different models based on the Sequential Organ Failure Assessment (SOFA) score, logistic regression (LR), random forest (RF), and extreme gradient boosting (XGB) were derived as baseline models that included admission variables only. We subsequently included clinical events and time-to-event as additional variables to derive the final models using the same algorithms and compared their performance with that of the baseline group. Furthermore, we derived baseline and final models on a European patient cohort, which were externally validated on a non-European cohort that included Asian, African, and US patients. 
Results: In total, 1432 elderly (≥70 years old) COVID-19--positive patients admitted to an ICU were included for analysis. Of these, 809 (56.49\%) patients survived up to 30 days after admission. The average length of stay was 21.6 (SD 18.2) days. Final models that incorporated clinical events and time-to-event information provided superior performance (area under the receiver operating characteristic curve of 0.81; 95\% CI 0.804-0.811), with respect to both the baseline models that used admission variables only and conventional ICU prediction models (SOFA score, P<.001). The average precision increased from 0.65 (95\% CI 0.650-0.655) to 0.77 (95\% CI 0.759-0.770). Conclusions: Integrating important clinical events and time-to-event information led to a superior accuracy of 30-day mortality prediction compared with models based on the admission information and conventional ICU prediction models. This study shows that machine-learning models provide additional information and may support complex decision-making in critically ill elderly COVID-19 patients. Trial Registration: ClinicalTrials.gov NCT04321265; https://clinicaltrials.gov/ct2/show/NCT04321265 ", doi="10.2196/32949", url="https://medinform.jmir.org/2022/3/e32949", url="http://www.ncbi.nlm.nih.gov/pubmed/35099394" } @Article{info:doi/10.2196/34096, author="McLeod, Graeme and Kennedy, Iain and Simpson, Eilidh and Joss, Judith and Goldmann, Katriona", title="Pilot Project for a Web-Based Dynamic Nomogram to Predict Survival 1 Year After Hip Fracture Surgery: Retrospective Observational Study", journal="Interact J Med Res", year="2022", month="Mar", day="30", volume="11", number="1", pages="e34096", keywords="hip fracture", keywords="survival", keywords="prediction", keywords="nomogram", keywords="web", keywords="surgery", keywords="postoperative", keywords="machine learning", keywords="model", keywords="mortality", keywords="hip", keywords="fracture", abstract="Background: Hip fracture is associated with high mortality. Identification of individual risk informs anesthetic and surgical decision-making and can reduce the risk of death. However, interpreting mathematical models and applying them in clinical practice can be difficult. There is a need to simplify risk indices for clinicians and laypeople alike. Objective: Our primary objective was to develop a web-based nomogram for prediction of survival up to 365 days after hip fracture surgery. Methods: We collected data from 329 patients. Our variables included sex; age; BMI; white cell count; levels of lactate, creatinine, hemoglobin, and C-reactive protein; physical status according to the American Society of Anesthesiologists Physical Status Classification System; socioeconomic status; duration of surgery; total time in the operating room; side of surgery; and procedure urgency. Thereafter, we internally calibrated and validated a Cox proportional hazards model of survival 365 days after hip fracture surgery; logistic regression models of survival 30, 120, and 365 days after surgery; and a binomial model. To present the models on a laptop, tablet, or mobile phone in a user-friendly way, we built an app using Shiny (RStudio). The app showed a drop-down box for model selection and horizontal sliders for data entry, model summaries, and prediction and survival plots. A slider represented patient follow-up over 365 days. Results: Of the 329 patients, 24 (7.3\%) died within 30 days of surgery, 65 (19.8\%) within 120 days, and 94 (28.6\%) within 365 days. 
In all models, the independent predictors of mortality were age, BMI, creatinine level, and lactate level. The logistic model also incorporated white cell count as a predictor. The Cox proportional hazards model showed that mortality differed as follows: age 80 vs 60 years had a hazard ratio (HR) of 0.6 (95\% CI 0.3-1.1), a plasma lactate level of 2 vs 1 mmol/L had an HR of 2.4 (95\% CI 1.5-3.9), and a plasma creatinine level of 60 vs 90 µmol/L had an HR of 2.3 (95\% CI 1.3-3.9). Conclusions: In conclusion, we provide an easy-to-read web-based nomogram that predicts survival up to 365 days after hip fracture. The Cox proportional hazards model and logistic models showed good discrimination, with concordance index values of 0.732 and 0.781, respectively. ", doi="10.2196/34096", url="https://www.i-jmr.org/2022/1/e34096", url="http://www.ncbi.nlm.nih.gov/pubmed/35238320" } @Article{info:doi/10.2196/30577, author="Oommen, Thomas and Thommandram, Anirudh and Palanica, Adam and Fossat, Yan", title="A Free Open-Source Bayesian Vancomycin Dosing App for Adults: Design and Evaluation Study", journal="JMIR Form Res", year="2022", month="Mar", day="30", volume="6", number="3", pages="e30577", keywords="medical informatics", keywords="therapeutic drug monitoring", keywords="vancomycin", keywords="Bayesian prediction", keywords="drug monitoring", keywords="clinical data", keywords="tool development", keywords="digital health tools", abstract="Background: It has been suggested that Bayesian dosing apps can assist in the therapeutic drug monitoring of patients receiving vancomycin. Unfortunately, Bayesian dosing tools are often unaffordable to resource-limited hospitals. Our aim was to improve vancomycin dosing in adults. We created a free and open-source dose adjustment app, VancoCalc, which uses Bayesian inference to aid clinicians in dosing and monitoring of vancomycin. Objective: The aim of this paper is to describe the design, development, usability, and evaluation of a free open-source Bayesian vancomycin dosing app, VancoCalc. Methods: The app build and model fitting process were described. Previously published pharmacokinetic models were used as priors. The ability of the app to predict vancomycin concentrations was assessed using a small data set comprising 52 patients, aged 18 years and over, who received at least 1 dose of intravenous vancomycin and had at least 2 vancomycin concentrations drawn between July 2018 and January 2021 at Lakeridge Health Corporation, Ontario, Canada. With these estimated and actual concentrations, median prediction error (bias), median absolute error (accuracy), and root mean square error (precision) were calculated to evaluate the accuracy of the Bayesian estimated pharmacokinetic parameters. Results: A total of 52 unique patients' initial vancomycin concentrations were used to predict subsequent concentration; 104 total vancomycin concentrations were assessed. The median prediction error was --0.600 µg/mL (IQR --3.06, 2.95), the median absolute error was 3.05 µg/mL (IQR 1.44, 4.50), and the root mean square error was 5.34. Conclusions: We described a free, open-source Bayesian vancomycin dosing calculator based on revisions of currently available calculators. Based on this small retrospective preliminary sample of patients, the app offers reasonable accuracy and bias, which may be used in everyday practice. By offering this free, open-source app, further prospective validation could be implemented in the near future. 
", doi="10.2196/30577", url="https://formative.jmir.org/2022/3/e30577", url="http://www.ncbi.nlm.nih.gov/pubmed/35353046" } @Article{info:doi/10.2196/34207, author="Subramanian, Hemang and Subramanian, Susmitha", title="Improving Diagnosis Through Digital Pathology: Proof-of-Concept Implementation Using Smart Contracts and Decentralized File Storage", journal="J Med Internet Res", year="2022", month="Mar", day="28", volume="24", number="3", pages="e34207", keywords="digital pathology", keywords="nonfungible token standard", keywords="decentralized storage", keywords="security and patient data confidentiality using design", keywords="pathology", keywords="storage", keywords="security", keywords="confidentiality", keywords="data", keywords="design", keywords="diagnosis", keywords="proof of concept", keywords="implementation", keywords="software", keywords="blockchain", keywords="limitation", keywords="privacy", abstract="Background: Recent advancements in digital pathology resulting from advances in imaging and digitization have increased the convenience and usability of pathology for disease diagnosis, especially in oncology, urology, and gastroenteric diagnosis. However, despite the possibilities to include low-cost diagnosis and viable telemedicine, digital pathology is not yet accessible owing to expensive storage, data security requirements, and network bandwidth limitations to transfer high-resolution images and associated data. The increase in storage, transmission, and security complexity concerning data collection and diagnosis makes it even more challenging to use artificial intelligence algorithms for machine-assisted disease diagnosis. We designed and prototyped a digital pathology system that uses blockchain-based smart contracts using the nonfungible token (NFT) standard and the Interplanetary File System for data storage. Our design remediates shortcomings in the existing digital pathology systems infrastructure, which is centralized. The proposed design is extendable to other fields of medicine that require high-fidelity image and data storage. Our solution is implemented in data systems that can improve access quality of care and reduce the cost of access to specialized pathological diagnosis, reducing cycle times for diagnosis. Objective: The main objectives of this study are to highlight the issues in digital pathology and suggest that a software architecture--based blockchain and the Interplanetary File System create a low-cost data storage and transmission technology. Methods: We used the design science research method consisting of 6 stages to inform our design overall. We innovated over existing public-private designs for blockchains but using a 2-layered approach that separates actual file storage from metadata and data persistence. Results: Here, we identified key challenges to adopting digital pathology, including challenges concerning long-term storage and the transmission of information. Next, using accepted frameworks in NFT-based intelligent contracts and recent innovations in distributed secure storage, we proposed a decentralized, secure, and privacy-preserving digital pathology system. Our design and prototype implementation using Solidity, web3.js, Ethereum, and node.js helped us address several challenges facing digital pathology. 
We demonstrated how our solution, which combines NFT smart contract standard with persistent decentralized file storage, solves most of the challenges of digital pathology and sets the stage for reducing costs and improving patient care and speed of diagnosis. Conclusions: We identified technical limitations that increase costs and reduce the mass adoption of digital pathology. We presented several design innovations using NFT decentralized storage standards to prototype a system. We also presented the implementation details of a unique security architecture for a digital pathology system. We illustrated how this design can overcome privacy, security, network-based storage, and data transmission limitations. We illustrated how improving these factors sets the stage for improving data quality and standardized application of machine learning and artificial intelligence to such data. ", doi="10.2196/34207", url="https://www.jmir.org/2022/3/e34207", url="http://www.ncbi.nlm.nih.gov/pubmed/35343905" } @Article{info:doi/10.2196/32508, author="Su, Po-Yuan and Wei, Yi-Chia and Luo, Hao and Liu, Chi-Hung and Huang, Wen-Yi and Chen, Kuan-Fu and Lin, Ching-Po and Wei, Hung-Yu and Lee, Tsong-Hai", title="Machine Learning Models for Predicting Influential Factors of Early Outcomes in Acute Ischemic Stroke: Registry-Based Study", journal="JMIR Med Inform", year="2022", month="Mar", day="25", volume="10", number="3", pages="e32508", keywords="cerebrovascular disease", keywords="acute ischemic stroke", keywords="machine learning", keywords="random forest", keywords="early outcome", keywords="prediction", keywords="explanation", keywords="SHapley Additive exPlanations", abstract="Background: Timely and accurate outcome prediction plays a vital role in guiding clinical decisions on acute ischemic stroke. Early condition deterioration and severity after the acute stage are determinants for long-term outcomes. Therefore, predicting early outcomes is crucial in acute stroke management. However, interpreting the predictions and transforming them into clinically explainable concepts are as important as the predictions themselves. Objective: This work focused on machine learning model analysis in predicting the early outcomes of ischemic stroke and used model explanation skills in interpreting the results. Methods: Acute ischemic stroke patients registered on the Stroke Registry of the Chang Gung Healthcare System (SRICHS) in 2009 were enrolled for machine learning predictions of the two primary outcomes: modified Rankin Scale (mRS) at hospital discharge and in-hospital deterioration. We compared 4 machine learning models, namely support vector machine (SVM), random forest (RF), light gradient boosting machine (LGBM), and deep neural network (DNN), with the area under the curve (AUC) of the receiver operating characteristic curve. Further, 3 resampling methods, random under sampling (RUS), random over sampling, and the synthetic minority over-sampling technique, dealt with the imbalanced data. The models were explained based on the ranking of feature importance and the SHapley Additive exPlanations (SHAP). Results: RF performed well in both outcomes (discharge mRS: mean AUC 0.829, SD 0.018; in-hospital deterioration: mean AUC 0.710, SD 0.023 on original data and 0.728, SD 0.036 on resampled data with RUS for imbalanced data). In addition, DNN outperformed other models in predicting in-hospital deterioration on data without resampling (mean AUC 0.732, SD 0.064). 
In general, resampling contributed to the limited improvement of model performance in predicting in-hospital deterioration using imbalanced data. The features obtained from the National Institutes of Health Stroke Scale (NIHSS), white blood cell differential counts, and age were the key features for predicting discharge mRS. In contrast, the NIHSS total score, initial blood pressure, having diabetes mellitus, and features from hemograms were the most important features in predicting in-hospital deterioration. The SHAP summary described the impacts of the feature values on each outcome prediction. Conclusions: Machine learning models are feasible in predicting early stroke outcomes. An enriched feature bank could improve model performance. Initial neurological levels and age determined the activity independence at hospital discharge. In addition, physiological and laboratory surveillance aided in predicting in-hospital deterioration. The use of the SHAP explanatory method successfully transformed machine learning predictions into clinically meaningful results. ", doi="10.2196/32508", url="https://medinform.jmir.org/2022/3/e32508", url="http://www.ncbi.nlm.nih.gov/pubmed/35072631" } @Article{info:doi/10.2196/33325, author="Rosen, K. Rochelle and Garbern, C. Stephanie and Gainey, Monique and Lantini, Ryan and Nasrin, Sabiha and Nelson, J. Eric and Elshabassi, Nour and Alam, H. Nur and Sultana, Sufia and Hasnin, Tahmida and Qu, Kexin and Schmid, H. Christopher and Levine, C. Adam", title="Designing a Novel Clinician Decision Support Tool for the Management of Acute Diarrhea in Bangladesh: Formative Qualitative Study", journal="JMIR Hum Factors", year="2022", month="Mar", day="25", volume="9", number="1", pages="e33325", keywords="clinical decision support tools", keywords="diarrhea management", keywords="focus group", keywords="formative qualitative research", keywords="low- and middle-income countries", keywords="mobile phone", abstract="Background: The availability of mobile clinical decision support (CDS) tools has grown substantially with the increased prevalence of smartphone devices and apps. Although health care providers express interest in integrating mobile health (mHealth) technologies into their clinical settings, concerns have been raised, including perceived disagreements between information provided by mobile CDS tools and standard guidelines. Despite their potential to transform health care delivery, there remains limited literature on the provider's perspective on the clinical utility of mobile CDS tools for improving patient outcomes, especially in low- and middle-income countries. Objective: This study aims to describe providers' perceptions about the utility of a mobile CDS tool accessed via a smartphone app for diarrhea management in Bangladesh. In addition, feedback was collected on the preliminary components of the mobile CDS tool to address clinicians' concerns and incorporate their preferences. Methods: From November to December 2020, qualitative data were gathered through 8 web-based focus group discussions with physicians and nurses from 3 Bangladeshi hospitals. Each discussion was conducted in the local language---Bangla---and audio recorded for transcription and translation by the local research team. 
Transcripts and codes were entered into NVivo (version 12; QSR International), and applied thematic analysis was used to identify themes that explore the clinical utility of an mHealth app for assessing dehydration severity in patients with acute diarrhea. Summaries of concepts and themes were generated from reviews of the aggregated coded data; thematic memos were written and used for the final analysis. Results: Of the 27 focus group participants, 14 (52\%) were nurses and 13 (48\%) were physicians; 15 (56\%) worked at a diarrhea specialty hospital and 12 (44\%) worked in government district or subdistrict hospitals. Participants' experience in their current position ranged from 2 to 14 years, with an average of 10.3 (SD 9.0) years. Key themes from the qualitative data analysis included current experience with CDS, overall perception of the app's utility and its potential role in clinical care, barriers to and facilitators of app use, considerations of overtreatment and undertreatment, and guidelines for the app's clinical recommendations. Participants felt that the tool would initially take time to use, but once learned, it could be useful during epidemic cholera. Some felt that clinical experience remains an important part of treatment that can be supplemented, but not replaced, by a CDS tool. In addition, diagnostic information, including mid-upper arm circumference and blood pressure, might not be available to directly inform programming decisions. Conclusions: Participants were positive about the mHealth app and its potential to inform diarrhea management. They provided detailed feedback, which developers used to revise the mobile CDS tool. These formative qualitative data provided timely and relevant feedback to improve the utility of a CDS tool for diarrhea treatment in Bangladesh. ", doi="10.2196/33325", url="https://humanfactors.jmir.org/2022/1/e33325", url="http://www.ncbi.nlm.nih.gov/pubmed/35333190" } @Article{info:doi/10.2196/34201, author="Liu, Nan and Xie, Feng and Siddiqui, Javaid Fahad and Ho, Wah Andrew Fu and Chakraborty, Bibhas and Nadarajan, Devi Gayathri and Tan, Kiat Kenneth Boon and Ong, Hock Marcus Eng", title="Leveraging Large-Scale Electronic Health Records and Interpretable Machine Learning for Clinical Decision Making at the Emergency Department: Protocol for System Development and Validation", journal="JMIR Res Protoc", year="2022", month="Mar", day="25", volume="11", number="3", pages="e34201", keywords="electronic health records", keywords="machine learning", keywords="clinical decision making", keywords="emergency department", abstract="Background: There is a growing demand globally for emergency department (ED) services. An increase in ED visits has resulted in overcrowding and longer waiting times. The triage process plays a crucial role in assessing and stratifying patients' risks and ensuring that the critically ill promptly receive appropriate priority and emergency treatment. A substantial amount of research has been conducted on the use of machine learning tools to construct triage and risk prediction models; however, the black box nature of these models has limited their clinical application and interpretation. Objective: In this study, we plan to develop an innovative, dynamic, and interpretable System for Emergency Risk Triage (SERT) for risk stratification in the ED by leveraging large-scale electronic health records (EHRs) and machine learning. 
Methods: To achieve this objective, we will conduct a retrospective, single-center study based on a large, longitudinal data set obtained from the EHRs of the largest tertiary hospital in Singapore. Study outcomes include adverse events experienced by patients, such as the need for an intensive care unit and inpatient death. With preidentified candidate variables drawn from expert opinions and relevant literature, we will apply an interpretable machine learning--based AutoScore to develop 3 SERT scores. These 3 scores can be used at different times in the ED, that is, on arrival, during ED stay, and at admission. Furthermore, we will compare our novel SERT scores with established clinical scores and previously described black box machine learning models as baselines. Receiver operating characteristic analysis will be conducted on the testing cohorts for performance evaluation. Results: The study is currently being conducted. The extracted data indicate approximately 1.8 million ED visits by over 810,000 unique patients. Modelling results are expected to be published in 2022. Conclusions: The SERT scoring system proposed in this study will be unique and innovative because of its dynamic nature and modelling transparency. If successfully validated, our proposed solution will establish a standard for data processing and modelling by taking advantage of large-scale EHRs and interpretable machine learning tools. International Registered Report Identifier (IRRID): DERR1-10.2196/34201 ", doi="10.2196/34201", url="https://www.researchprotocols.org/2022/3/e34201", url="http://www.ncbi.nlm.nih.gov/pubmed/35333179" } @Article{info:doi/10.2196/28639, author="Knop, Michael and Weber, Sebastian and Mueller, Marius and Niehaves, Bjoern", title="Human Factors and Technological Characteristics Influencing the Interaction of Medical Professionals With Artificial Intelligence--Enabled Clinical Decision Support Systems: Literature Review", journal="JMIR Hum Factors", year="2022", month="Mar", day="24", volume="9", number="1", pages="e28639", keywords="artificial intelligence", keywords="clinical decision support systems", keywords="CDSS", keywords="decision-making", keywords="diagnostic decision support", keywords="human--computer interaction", keywords="human--AI collaboration", keywords="machine learning", keywords="patient outcomes", keywords="deep learning", keywords="trust", keywords="literature review", abstract="Background: The digitization and automation of diagnostics and treatments promise to alter the quality of health care and improve patient outcomes, whereas the undersupply of medical personnel, high workload on medical professionals, and medical case complexity increase. Clinical decision support systems (CDSSs) have been proven to help medical professionals in their everyday work through their ability to process vast amounts of patient information. However, comprehensive adoption is partially disrupted by specific technological and personal characteristics. With the rise of artificial intelligence (AI), CDSSs have become an adaptive technology with human-like capabilities and are able to learn and change their characteristics over time. However, research has not reflected on the characteristics and factors essential for effective collaboration between human actors and AI-enabled CDSSs. Objective: Our study aims to summarize the factors influencing effective collaboration between medical professionals and AI-enabled CDSSs. 
These factors are essential for medical professionals, management, and technology designers to reflect on the adoption, implementation, and development of an AI-enabled CDSS. Methods: We conducted a literature review including 3 different meta-databases, screening over 1000 articles and including 101 articles for full-text assessment. Of the 101 articles, 7 (6.9\%) met our inclusion criteria and were analyzed for our synthesis. Results: We identified the technological characteristics and human factors that appear to have an essential effect on the collaboration of medical professionals and AI-enabled CDSSs in accordance with our research objective, namely, training data quality, performance, explainability, adaptability, medical expertise, technological expertise, personality, cognitive biases, and trust. Comparing our results with those from research on non-AI CDSSs, some characteristics and factors retain their importance, whereas others gain or lose relevance owing to the uniqueness of human-AI interactions. However, only a few (1/7, 14\%) studies have mentioned the theoretical foundations and patient outcomes related to AI-enabled CDSSs. Conclusions: Our study provides a comprehensive overview of the relevant characteristics and factors that influence the interaction and collaboration between medical professionals and AI-enabled CDSSs. Rather limited theoretical foundations currently hinder the possibility of creating adequate concepts and models to explain and predict the interrelations between these characteristics and factors. For an appropriate evaluation of the human-AI collaboration, patient outcomes and the role of patients in the decision-making process should be considered. ", doi="10.2196/28639", url="https://humanfactors.jmir.org/2022/1/e28639", url="http://www.ncbi.nlm.nih.gov/pubmed/35323118" } @Article{info:doi/10.2196/29943, author="Timiliotis, Joanna and Bl{\"u}mke, Bibiana and Serf{\"o}z{\"o}, Daniel Peter and Gilbert, Stephen and Ondr{\'e}sik, Marta and T{\"u}rk, Ewelina and Hirsch, Christian Martin and Eckstein, Jens", title="A Novel Diagnostic Decision Support System for Medical Professionals: Prospective Feasibility Study", journal="JMIR Form Res", year="2022", month="Mar", day="24", volume="6", number="3", pages="e29943", keywords="diagnostic decision support system", keywords="DDSS", keywords="probabilistic reasoning", keywords="artificial intelligence", keywords="dyspnea", keywords="emergency department", keywords="internal medicine", keywords="symptom checker", abstract="Background: Continuously growing medical knowledge and the increasing amount of data make it difficult for medical professionals to keep track of all new information and to place it in the context of existing information. A variety of digital technologies and artificial intelligence--based methods are currently available as persuasive tools to empower physicians in clinical decision-making and improve health care quality. A novel diagnostic decision support system (DDSS) prototype developed by Ada Health GmbH with a focus on traceability, transparency, and usability will be examined more closely in this study. Objective: The aim of this study is to test the feasibility and functionality of a novel DDSS prototype, exploring its potential and performance in identifying the underlying cause of acute dyspnea in patients at the University Hospital Basel. 
Methods: A prospective, observational feasibility study was conducted at the emergency department (ED) and internal medicine ward of the University Hospital Basel, Switzerland. A convenience sample of 20 adult patients admitted to the ED with dyspnea as the chief complaint and a high probability of inpatient admission was selected. A study physician followed the patients admitted to the ED throughout the hospitalization without interfering with the routine clinical work. Routinely collected health-related personal data from these patients were entered into the DDSS prototype. The DDSS prototype's resulting disease probability list was compared with the gold-standard main diagnosis provided by the treating physician. Results: The DDSS presented information with high clarity and had a user-friendly, novel, and transparent interface. The DDSS prototype was not perfectly suited for the ED as case entry was time-consuming (1.5-2 hours per case). It provided accurate decision support in the clinical inpatient setting (average of cases in which the correct diagnosis was the first diagnosis listed: 6/20, 30\%, SD 2.10\%; average of cases in which the correct diagnosis was listed as one of the top 3: 11/20, 55\%, SD 2.39\%; average of cases in which the correct diagnosis was listed as one of the top 5: 14/20, 70\%, SD 2.26\%) in patients with dyspnea as the main presenting complaint. Conclusions: The study of the feasibility and functionality of the tool was successful, with some limitations. Used in the right place, the DDSS has the potential to support physicians in their decision-making process by showing new pathways and unintentionally ignored diagnoses. The DDSS prototype had some limitations regarding the process of data input, diagnostic accuracy, and completeness of the integrated medical knowledge. The results of this study provide a basis for the tool's further development. In addition, future studies should be conducted with the aim to overcome the current limitations of the tool and study design. Trial Registration: ClinicalTrials.gov NCT04827342; https://clinicaltrials.gov/ct2/show/NCT04827342 ", doi="10.2196/29943", url="https://formative.jmir.org/2022/3/e29943", url="http://www.ncbi.nlm.nih.gov/pubmed/35323125" } @Article{info:doi/10.2196/27210, author="Mitchell, Ross Joseph and Szepietowski, Phillip and Howard, Rachel and Reisman, Phillip and Jones, D. Jennie and Lewis, Patricia and Fridley, L. Brooke and Rollison, E. Dana", title="A Question-and-Answer System to Extract Data From Free-Text Oncological Pathology Reports (CancerBERT Network): Development Study", journal="J Med Internet Res", year="2022", month="Mar", day="23", volume="24", number="3", pages="e27210", keywords="natural language processing", keywords="NLP", keywords="BERT", keywords="transformer", keywords="pathology", keywords="ICD-O-3", keywords="deep learning", keywords="cancer", abstract="Background: Information in pathology reports is critical for cancer care. Natural language processing (NLP) systems used to extract information from pathology reports are often narrow in scope or require extensive tuning. Consequently, there is growing interest in automated deep learning approaches. A powerful new NLP algorithm, bidirectional encoder representations from transformers (BERT), was published in late 2018. BERT set new performance standards on tasks as diverse as question answering, named entity recognition, speech recognition, and more. 
Objective: The aim of this study is to develop a BERT-based system to automatically extract detailed tumor site and histology information from free-text oncological pathology reports. Methods: We pursued three specific aims: extract accurate tumor site and histology descriptions from free-text pathology reports, accommodate the diverse terminology used to indicate the same pathology, and provide accurate standardized tumor site and histology codes for use by downstream applications. We first trained a base language model to comprehend the technical language in pathology reports. This involved unsupervised learning on a training corpus of 275,605 electronic pathology reports from 164,531 unique patients that included 121 million words. Next, we trained a question-and-answer (Q\&A) model that connects a Q\&A layer to the base pathology language model to answer pathology questions. Our Q\&A system was designed to search for the answers to two predefined questions in each pathology report: What organ contains the tumor? and What is the kind of tumor or carcinoma? This involved supervised training on 8197 pathology reports, each with ground truth answers to these 2 questions determined by certified tumor registrars. The data set included 214 tumor sites and 193 histologies. The tumor site and histology phrases extracted by the Q\&A model were used to predict International Classification of Diseases for Oncology, Third Edition (ICD-O-3), site and histology codes. This involved fine-tuning two additional BERT models: one to predict site codes and another to predict histology codes. Our final system includes a network of 3 BERT-based models. We call this CancerBERT network (caBERTnet). We evaluated caBERTnet using a sequestered test data set of 2050 pathology reports with ground truth answers determined by certified tumor registrars. Results: caBERTnet's accuracies for predicting group-level site and histology codes were 93.53\% (1895/2026) and 97.6\% (1993/2042), respectively. The top 5 accuracies for predicting fine-grained ICD-O-3 site and histology codes with 5 or more samples each in the training data set were 92.95\% (1794/1930) and 96.01\% (1853/1930), respectively. Conclusions: We have developed an NLP system that outperforms existing algorithms at predicting ICD-O-3 codes across an extensive range of tumor sites and histologies. Our new system could help reduce treatment delays, increase enrollment in clinical trials of new therapies, and improve patient outcomes. ", doi="10.2196/27210", url="https://www.jmir.org/2022/3/e27210", url="http://www.ncbi.nlm.nih.gov/pubmed/35319481" } @Article{info:doi/10.2196/30130, author="Strauss, T. Alexandra and Morgan, Cameron and El Khuri, Christopher and Slogeris, Becky and Smith, G. Aria and Klein, Eili and Toerper, Matt and DeAngelo, Anthony and Debraine, Arnaud and Peterson, Susan and Gurses, P. 
Ayse and Levin, Scott and Hinson, Jeremiah", title="A Patient Outcomes--Driven Feedback Platform for Emergency Medicine Clinicians: Human-Centered Design and Usability Evaluation of Linking Outcomes Of Patients (LOOP)", journal="JMIR Hum Factors", year="2022", month="Mar", day="23", volume="9", number="1", pages="e30130", keywords="emergency medicine", keywords="usability", keywords="human-centered design", keywords="health informatics", keywords="feedback", keywords="practice-based learning and improvement", keywords="emergency room", keywords="ER", keywords="platform", keywords="outcomes", keywords="closed-loop learning", abstract="Background: The availability of patient outcomes--based feedback is limited in episodic care environments such as the emergency department. Emergency medicine (EM) clinicians set care trajectories for a majority of hospitalized patients and provide definitive care to an even larger number of those discharged into the community. EM clinicians are often unaware of the short- and long-term health outcomes of patients and how their actions may have contributed. Despite large volumes of patients and data, outcomes-driven learning that targets individual clinician experiences is meager. Integrated electronic health record (EHR) systems provide opportunity, but they do not have readily available functionality intended for outcomes-based learning. Objective: This study sought to unlock insights from routinely collected EHR data through the development of an individualizable patient outcomes feedback platform for EM clinicians. Here, we describe the iterative development of this platform, Linking Outcomes Of Patients (LOOP), under a human-centered design framework, including structured feedback obtained from its use. Methods: This multimodal study consisting of human-centered design studios, surveys (24 physicians), interviews (11 physicians), and a LOOP application usability evaluation (12 EM physicians for ?30 minutes each) was performed between August 2019 and February 2021. The study spanned 3 phases: (1) conceptual development under a human-centered design framework, (2) LOOP technical platform development, and (3) usability evaluation comparing pre- and post-LOOP feedback gathering practices in the EHR. Results: An initial human-centered design studio and EM clinician surveys revealed common themes of disconnect between EM clinicians and their patients after the encounter. Fundamental postencounter outcomes of death (15/24, 63\% respondents identified as useful), escalation of care (20/24, 83\%), and return to ED (16/24, 67\%) were determined high yield for demonstrating proof-of-concept in our LOOP application. The studio aided the design and development of LOOP, which integrated physicians throughout the design and content iteration. A final LOOP prototype enabled usability evaluation and iterative refinement prior to launch. Usability evaluation compared to status quo (ie, pre-LOOP) feedback gathering practices demonstrated a shift across all outcomes from ``not easy'' to ``very easy'' to obtain and from ``not confident'' to ``very confident'' in estimating outcomes after using LOOP. On a scale from 0 (unlikely) to 10 (most likely), the users were very likely (9.5) to recommend LOOP to a colleague. Conclusions: This study demonstrates the potential for human-centered design of a patient outcomes--driven feedback platform for individual EM providers. 
We have outlined a framework for working alongside clinicians with a multidisciplined team to develop and test a tool that augments their clinical experience and enables closed-loop learning. ", doi="10.2196/30130", url="https://humanfactors.jmir.org/2022/1/e30130", url="http://www.ncbi.nlm.nih.gov/pubmed/35319469" } @Article{info:doi/10.2196/23844, author="Lim, Sahnah and Islam, S. Nadia", title="Small Practices, Big (QI) Dreams: Customizing Quality Improvement (QI) Efforts for Under-Resourced Primary Care Practices to Improve Diabetes Disparities", journal="JMIR Diabetes", year="2022", month="Mar", day="18", volume="7", number="1", pages="e23844", keywords="electronic health record", keywords="quality improvement", keywords="health equity", keywords="clinical practice guidelines", keywords="diabetes", abstract="International Registered Report Identifier (IRRID): RR2-10.1186/s13063-019-3711-y ", doi="10.2196/23844", url="https://diabetes.jmir.org/2022/1/e23844", url="http://www.ncbi.nlm.nih.gov/pubmed/35302500" } @Article{info:doi/10.2196/29019, author="Jones, K. Emma and Banks, Alyssa and Melton, B. Genevieve and Porta, M. Carolyn and Tignanelli, J. Christopher", title="Barriers to and Facilitators for Acceptance of Comprehensive Clinical Decision Support System--Driven Care Maps for Patients With Thoracic Trauma: Interview Study Among Health Care Providers and Nurses", journal="JMIR Hum Factors", year="2022", month="Mar", day="16", volume="9", number="1", pages="e29019", keywords="clinical decision support systems", keywords="rib fractures", keywords="trauma", keywords="Unified Theory of Acceptance and Use of Technology", keywords="human computer interaction", abstract="Background: Comprehensive clinical decision support (CDS) care maps can improve the delivery of care and clinical outcomes. However, they are frequently plagued by usability problems and poor user acceptance. Objective: This study aims to characterize factors influencing successful design and use of comprehensive CDS care maps and identify themes associated with end-user acceptance of a thoracic trauma CDS care map earlier in the process than has traditionally been done. This was a planned adaptive redesign stage of a User Acceptance and System Adaptation Design development and implementation strategy for a CDS care map. This stage was based on a previously developed prototype CDS care map guided by the Unified Theory of Acceptance and Use of Technology. Methods: A total of 22 multidisciplinary end users (physicians, advanced practice providers, and nurses) were identified and recruited using snowball sampling. Qualitative interviews were conducted, audio-recorded, and transcribed verbatim. Generation of prespecified codes and the interview guide was informed by the Unified Theory of Acceptance and Use of Technology constructs and investigative team experience. Interviews were blinded and double-coded. Thematic analysis of interview scripts was conducted and yielded descriptive themes about factors influencing the construction and potential use of an acceptable CDS care map. Results: A total of eight dominant themes were identified: alert fatigue (theme 1), automation (theme 2), redundancy (theme 3), minimalistic design (theme 4), evidence based (theme 5), prevent errors (theme 6), comprehensive across the spectrum of disease (theme 7), and malleability (theme 8). 
Themes 1 to 4 addressed factors directly affecting end users, and themes 5 to 8 addressed factors affecting patient outcomes. More experienced providers prioritized a system that is easy to use. Nurses prioritized a system that incorporated evidence into decision support. Clinicians across specialties, roles, and ages agreed that the amount of extra work generated should be minimal and that the system should help them administer optimal care efficiently. Conclusions: End user feedback reinforces attention toward factors that improve the acceptance and use of a CDS care map for patients with thoracic trauma. Common themes focused on system complexity, the ability of the system to fit different populations and settings, and optimal care provision. Identifying these factors early in the development and implementation process may facilitate user-centered design and improve adoption. ", doi="10.2196/29019", url="https://humanfactors.jmir.org/2022/1/e29019", url="http://www.ncbi.nlm.nih.gov/pubmed/35293873" } @Article{info:doi/10.2196/24680, author="Tamori, Honoka and Yamashina, Hiroko and Mukai, Masami and Morii, Yasuhiro and Suzuki, Teppei and Ogasawara, Katsuhiko", title="Acceptance of the Use of Artificial Intelligence in Medicine Among Japan's Doctors and the Public: A Questionnaire Survey", journal="JMIR Hum Factors", year="2022", month="Mar", day="16", volume="9", number="1", pages="e24680", keywords="artificial intelligence", keywords="technology acceptance", keywords="surveys and questionnaires", keywords="doctors vs public", abstract="Background: The use of artificial intelligence (AI) in the medical industry promises many benefits, so AI has been introduced to medical practice primarily in developed countries. In Japan, the government is preparing for the rollout of AI in the medical industry. This rollout depends on doctors and the public accepting the technology. Therefore, it is necessary to consider acceptance among doctors and among the public. However, little is known about the acceptance of AI in medicine in Japan. Objective: This study aimed to obtain detailed data on the acceptance of AI in medicine by comparing the acceptance among Japanese doctors with that among the Japanese public. Methods: We conducted an online survey, and the responses of doctors and members of the public were compared. AI in medicine was defined as the use of AI to determine diagnosis and treatment without requiring a doctor. A questionnaire was prepared with reference to the unified theory of acceptance and use of technology, a model of behavior toward new technologies. It comprises 20 items, and each item was rated on a five-point scale. Using this questionnaire, we conducted an online survey in 2018 among 399 doctors and 600 members of the public. The sample-wide responses were analyzed, and then the responses of the doctors were compared with those of the public using t tests. Results: Regarding the sample-wide responses (N=999), 653 (65.4\%) of the respondents believed that, in the future, AI in medicine would be necessary, whereas only 447 (44.7\%) expressed an intention to use AI-driven medicine. Additionally, 730 (73.1\%) believed that regulatory legislation was necessary, and 734 (73.5\%) were concerned about where accountability lies. 
Regarding the comparison between doctors and the public, doctors (mean 3.43, SD 1.00) were more likely than members of the public (mean 3.23, SD 0.92) to express intention to use AI-driven medicine (P<.001), suggesting that optimism about AI in medicine is greater among doctors compared to the public. Conclusions: Many of the respondents were optimistic about the role of AI in medicine. However, when asked whether they would like to use AI-driven medicine, they tended to give a negative response. This trend suggests that concerns about the lack of regulation and about accountability hindered acceptance. Additionally, the results revealed that doctors were more enthusiastic than members of the public regarding AI-driven medicine. For the successful implementation of AI in medicine, it would be necessary to inform the public and doctors about the relevant laws and to take measures to remove their concerns about them. ", doi="10.2196/24680", url="https://humanfactors.jmir.org/2022/1/e24680", url="http://www.ncbi.nlm.nih.gov/pubmed/35293878" } @Article{info:doi/10.2196/28880, author="Liao, JunHua and Liu, LunXin and Duan, HaiHan and Huang, YunZhi and Zhou, LiangXue and Chen, LiangYin and Wang, ChaoHua", title="Using a Convolutional Neural Network and Convolutional Long Short-term Memory to Automatically Detect Aneurysms on 2D Digital Subtraction Angiography Images: Framework Development and Validation", journal="JMIR Med Inform", year="2022", month="Mar", day="16", volume="10", number="3", pages="e28880", keywords="convolutional neural network", keywords="convolutional long short-term memory", keywords="cerebral aneurysm", keywords="deep learning", abstract="Background: It is hard to distinguish cerebral aneurysms from overlapping vessels in 2D digital subtraction angiography (DSA) images due to these images' lack of spatial information. Objective: The aims of this study were to (1) construct a deep learning diagnostic system to improve the ability to detect posterior communicating artery aneurysms on 2D DSA images and (2) validate the efficiency of the deep learning diagnostic system in 2D DSA aneurysm detection. Methods: We proposed a 2-stage detection system. First, we established the region localization stage to automatically locate specific detection regions of raw 2D DSA sequences. Second, in the intracranial aneurysm detection stage, we constructed a bi-input+RetinaNet+convolutional long short-term memory (C-LSTM) framework to compare its performance for aneurysm detection with that of 3 existing frameworks. Each of the frameworks had a 5-fold cross-validation scheme. The receiver operating characteristic curve, the area under the curve (AUC) value, mean average precision, sensitivity, specificity, and accuracy were used to assess the abilities of different frameworks. Results: A total of 255 patients with posterior communicating artery aneurysms and 20 patients without aneurysms were included in this study. The best AUC values of the RetinaNet, RetinaNet+C-LSTM, bi-input+RetinaNet, and bi-input+RetinaNet+C-LSTM frameworks were 0.95, 0.96, 0.92, and 0.97, respectively. The mean sensitivities of the RetinaNet, RetinaNet+C-LSTM, bi-input+RetinaNet, and bi-input+RetinaNet+C-LSTM frameworks and human experts were 89\% (range 67.02\%-98.43\%), 88\% (range 65.76\%-98.06\%), 87\% (range 64.53\%-97.66\%), 89\% (range 67.02\%-98.43\%), and 90\% (range 68.30\%-98.77\%), respectively. 
The mean specificities of the RetinaNet, RetinaNet+C-LSTM, bi-input+RetinaNet, and bi-input+RetinaNet+C-LSTM frameworks and human experts were 80\% (range 56.34\%-94.27\%), 89\% (range 67.02\%-98.43\%), 86\% (range 63.31\%-97.24\%), 93\% (range 72.30\%-99.56\%), and 90\% (range 68.30\%-98.77\%), respectively. The mean accuracies of the RetinaNet, RetinaNet+C-LSTM, bi-input+RetinaNet, and bi-input+RetinaNet+C-LSTM frameworks and human experts were 84.50\% (range 69.57\%-93.97\%), 88.50\% (range 74.44\%-96.39\%), 86.50\% (range 71.97\%-95.22\%), 91\% (range 77.63\%-97.72\%), and 90\% (range 76.34\%-97.21\%), respectively. Conclusions: According to our results, more spatial and temporal information can help improve the performance of the frameworks. Therefore, the bi-input+RetinaNet+C-LSTM framework had the best performance when compared to that of the other frameworks. Our study demonstrates that our system can assist physicians in detecting intracranial aneurysms on 2D DSA images. ", doi="10.2196/28880", url="https://medinform.jmir.org/2022/3/e28880", url="http://www.ncbi.nlm.nih.gov/pubmed/35294371" } @Article{info:doi/10.2196/26634, author="Zhang, Zheqing and Yang, Luqian and Han, Wentao and Wu, Yaoyu and Zhang, Linhui and Gao, Chun and Jiang, Kui and Liu, Yun and Wu, Huiqun", title="Machine Learning Prediction Models for Gestational Diabetes Mellitus: Meta-analysis", journal="J Med Internet Res", year="2022", month="Mar", day="16", volume="24", number="3", pages="e26634", keywords="digital health", keywords="gestational diabetes mellitus", keywords="machine learning", keywords="prediction model", keywords="prognostic model", abstract="Background: Gestational diabetes mellitus (GDM) is a common endocrine metabolic disease, involving a carbohydrate intolerance of variable severity during pregnancy. The incidence of GDM-related complications and adverse pregnancy outcomes has declined, in part, due to early screening. Machine learning (ML) models are increasingly used to identify risk factors and enable the early prediction of GDM. Objective: The aim of this study was to perform a meta-analysis and comparison of published prognostic models for predicting the risk of GDM and identify predictors applicable to the models. Methods: Four reliable electronic databases were searched for studies that developed ML prediction models for GDM in the general population instead of among high-risk groups only. The novel Prediction Model Risk of Bias Assessment Tool (PROBAST) was used to assess the risk of bias of the ML models. The Meta-DiSc software program (version 1.4) was used to perform the meta-analysis and determination of heterogeneity. To limit the influence of heterogeneity, we also performed sensitivity analyses, a meta-regression, and subgroup analysis. Results: A total of 25 studies that included women older than 18 years without a history of vital disease were analyzed. The pooled area under the receiver operating characteristic curve (AUROC) for ML models predicting GDM was 0.8492; the pooled sensitivity was 0.69 (95\% CI 0.68-0.69; P<.001; I2=99.6\%) and the pooled specificity was 0.75 (95\% CI 0.75-0.75; P<.001; I2=100\%). As one of the most commonly employed ML methods, logistic regression achieved an overall pooled AUROC of 0.8151, while non--logistic regression models performed better, with an overall pooled AUROC of 0.8891. 
Additionally, maternal age, family history of diabetes, BMI, and fasting blood glucose were the four most commonly used features of models established by the various feature selection methods. Conclusions: Compared to current screening strategies, ML methods are attractive for predicting GDM. To expand their use, the importance of quality assessments and unified diagnostic criteria should be further emphasized. ", doi="10.2196/26634", url="https://www.jmir.org/2022/3/e26634", url="http://www.ncbi.nlm.nih.gov/pubmed/35294369" } @Article{info:doi/10.2196/33357, author="Moghisi, Reihaneh and El Morr, Christo and Pace, T. Kenneth and Hajiha, Mohammad and Huang, Jimmy", title="A Machine Learning Approach to Predict the Outcome of Urinary Calculi Treatment Using Shock Wave Lithotripsy: Model Development and Validation Study", journal="Interact J Med Res", year="2022", month="Mar", day="16", volume="11", number="1", pages="e33357", keywords="lithotripsy", keywords="urolithiasis", keywords="machine learning", keywords="treatment outcome", keywords="ensemble learning", keywords="AdaBoost", keywords="renal stones", keywords="kidney disease", abstract="Background: Shock wave lithotripsy (SWL), ureteroscopy, and percutaneous nephrolithotomy are established treatments for renal stones. Historically, SWL has been a predominant and commonly used procedure for treating upper tract renal stones smaller than 20 mm in diameter due to its noninvasive nature. However, the reported failure rate of SWL after one treatment session ranges from 30\% to 89\%. The failure rate can be reduced by identifying candidates likely to benefit from SWL and managing patients who are likely to fail SWL with other treatment modalities. This would enhance and optimize treatment results for SWL candidates. Objective: We proposed to develop a machine learning model that can predict SWL outcomes to assist practitioners in the decision-making process when considering patients for stone treatment. Methods: A data set including 58,349 SWL procedures performed during 31,569 patient visits for SWL to a single hospital between 1990 and 2016 was used to construct and validate the predictive model. The AdaBoost algorithm was applied to a data set with 17 predictive attributes related to patient demographics and stone characteristics, with success or failure as an outcome. The AdaBoost algorithm was also applied to a training data set. The generated model's performance was compared to that of 5 other machine learning algorithms, namely C4.5 decision tree, na{\"i}ve Bayes, Bayesian network, K-nearest neighbors, and multilayer perceptron. Results: The developed model was validated with a testing data set and performed significantly better than the models generated by the other 5 predictive algorithms. The sensitivity and specificity of the model were 0.875 and 0.653, respectively, while its positive predictive value was 0.7159 and negative predictive value was 0.839. The C-statistic of the receiver operating characteristic (ROC) analysis was 0.843, which reflects an excellent test. Conclusions: We have developed a rigorous machine learning model to assist physicians and decision-makers to choose patients with renal stones who are most likely to have successful SWL treatment based on their demographics and stone characteristics. 
The proposed machine learning model can assist physicians and decision-makers in planning for SWL treatment and allow for more effective use of limited health care resources and improve patient prognoses. ", doi="10.2196/33357", url="https://www.i-jmr.org/2022/1/e33357", url="http://www.ncbi.nlm.nih.gov/pubmed/35293872" } @Article{info:doi/10.2196/30587, author="Nam, Borum and Kim, Young Joo and Kim, Young In and Cho, Hwan Baek", title="Selective Prediction With Long Short-term Memory Using Unit-Wise Batch Standardization for Time Series Health Data Sets: Algorithm Development and Validation", journal="JMIR Med Inform", year="2022", month="Mar", day="15", volume="10", number="3", pages="e30587", keywords="artificial intelligence", keywords="recurrent neural networks", keywords="biomedical informatics", keywords="computer-aided analysis", keywords="mobile phone", abstract="Background: In any health care system, both the classification of data and the confidence level of such classifications are important. Therefore, a selective prediction model is required to classify time series health data according to confidence levels of prediction. Objective: This study aims to develop a method using long short-term memory (LSTM) models with a reject option for time series health data classification. Methods: An existing selective prediction method was adopted to implement an option for rejecting a classification output in LSTM models. However, a conventional selection function approach to LSTM does not achieve acceptable performance during learning stages. To tackle this problem, we proposed a unit-wise batch standardization that attempts to normalize each hidden unit in LSTM to apply the structural characteristics of LSTM models that concern the selection function. Results: The ability of our method to approximate the target confidence level was compared by coverage violations for 2 time series of health data sets consisting of human activity and arrhythmia. For both data sets, our approach yielded lower average coverage violations (0.98\% and 1.79\% for each data set) than those of the conventional approach. In addition, the classification performance when using the reject option was compared with that of other normalization methods. Our method demonstrated superior performance for selective risk (12.63\% and 17.82\% for each data set), false-positive rates (2.09\% and 5.8\% for each data set), and false-negative rates (10.58\% and 17.24\% for each data set). Conclusions: Our normalization approach can help make selective predictions for time series health data. We expect this technique to enhance the confidence of users in classification systems and improve collaborative efforts between humans and artificial intelligence in the medical field through the use of classification that considers confidence. ", doi="10.2196/30587", url="https://medinform.jmir.org/2022/3/e30587", url="http://www.ncbi.nlm.nih.gov/pubmed/35289753" } @Article{info:doi/10.2196/33182, author="Lu, Sheng-Chieh and Xu, Cai and Nguyen, H. 
Chandler and Geng, Yimin and Pfob, Andr{\'e} and Sidey-Gibbons, Chris", title="Machine Learning--Based Short-Term Mortality Prediction Models for Patients With Cancer Using Electronic Health Record Data: Systematic Review and Critical Appraisal", journal="JMIR Med Inform", year="2022", month="Mar", day="14", volume="10", number="3", pages="e33182", keywords="machine learning", keywords="cancer mortality", keywords="artificial intelligence", keywords="clinical prediction models", keywords="end-of-life care", abstract="Background: In the United States, national guidelines suggest that aggressive cancer care should be avoided in the final months of life. However, guideline compliance currently requires clinicians to make judgments based on their experience as to when a patient is nearing the end of their life. Machine learning (ML) algorithms may facilitate improved end-of-life care provision for patients with cancer by identifying patients at risk of short-term mortality. Objective: This study aims to summarize the evidence for applying ML in ≤1-year cancer mortality prediction to assist with the transition to end-of-life care for patients with cancer. Methods: We searched MEDLINE, Embase, Scopus, Web of Science, and IEEE to identify relevant articles. We included studies describing ML algorithms predicting ≤1-year mortality in patients of oncology. We used the prediction model risk of bias assessment tool to assess the quality of the included studies. Results: We included 15 articles involving 110,058 patients in the final synthesis. Of the 15 studies, 12 (80\%) had a high or unclear risk of bias. The model performance was good: the area under the receiver operating characteristic curve ranged from 0.72 to 0.92. We identified common issues leading to biased models, including using a single performance metric, incomplete reporting of or inappropriate modeling practice, and small sample size. Conclusions: We found encouraging signs of ML performance in predicting short-term cancer mortality. Nevertheless, no included ML algorithms are suitable for clinical practice at the current stage because of the high risk of bias and uncertainty regarding real-world performance. Further research is needed to develop ML models using the modern standards of algorithm development and reporting. ", doi="10.2196/33182", url="https://medinform.jmir.org/2022/3/e33182", url="http://www.ncbi.nlm.nih.gov/pubmed/35285816" } @Article{info:doi/10.2196/32903, author="Humbert-Droz, Marie and Mukherjee, Pritam and Gevaert, Olivier", title="Strategies to Address the Lack of Labeled Data for Supervised Machine Learning Training With Electronic Health Records: Case Study for the Extraction of Symptoms From Clinical Notes", journal="JMIR Med Inform", year="2022", month="Mar", day="14", volume="10", number="3", pages="e32903", keywords="clinical text mining", keywords="weak supervision", keywords="text classification", keywords="symptom extraction", keywords="EHR", keywords="machine learning", keywords="natural language processing", abstract="Background: Automated extraction of symptoms from clinical notes is a challenging task owing to the multidimensional nature of symptom description. The availability of labeled training data is extremely limited owing to the nature of the data containing protected health information. Natural language processing and machine learning to process clinical text for such a task have great potential. 
However, supervised machine learning requires a great amount of labeled data to train a model, which is at the origin of the main bottleneck in model development. Objective: The aim of this study is to address the lack of labeled data by proposing 2 alternatives to manual labeling for the generation of training labels for supervised machine learning with English clinical text. We aim to demonstrate that using lower-quality labels for training leads to good classification results. Methods: We addressed the lack of labels with 2 strategies. The first approach took advantage of the structured part of electronic health records and used diagnosis codes (International Classification of Disease--10th revision) to derive training labels. The second approach used weak supervision and data programming principles to derive training labels. We propose to apply the developed framework to the extraction of symptom information from outpatient visit progress notes of patients with cardiovascular diseases. Results: We used >500,000 notes for training our classification model with International Classification of Disease--10th revision codes as labels and >800,000 notes for training using labels derived from weak supervision. We show that the dependence between prevalence and recall becomes flat provided a sufficiently large training set is used (>500,000 documents). We further demonstrate that using weak labels for training rather than the electronic health record codes derived from the patient encounter leads to an overall improved recall score (10\% improvement, on average). Finally, the external validation of our models shows excellent predictive performance and transferability, with an overall increase of 20\% in the recall score. Conclusions: This work demonstrates the power of using a weak labeling pipeline to annotate and extract symptom mentions in clinical text, with the prospects to facilitate symptom information integration for a downstream clinical task such as clinical decision support. ", doi="10.2196/32903", url="https://medinform.jmir.org/2022/3/e32903", url="http://www.ncbi.nlm.nih.gov/pubmed/35285805" } @Article{info:doi/10.2196/33006, author="Yang, Ting-Ya and Chien, Tsair-Wei and Lai, Feng-Jie", title="Web-Based Skin Cancer Assessment and Classification Using Machine Learning and Mobile Computerized Adaptive Testing in a Rasch Model: Development Study", journal="JMIR Med Inform", year="2022", month="Mar", day="9", volume="10", number="3", pages="e33006", keywords="skin cancer assessment", keywords="computerized adaptive testing", keywords="na{\"i}ve Bayes", keywords="k-nearest neighbors", keywords="logistic regression", keywords="Rasch partial credit model", keywords="receiver operating characteristic curve", keywords="mobile phone", abstract="Background: Web-based computerized adaptive testing (CAT) implementation of the skin cancer (SC) risk scale could substantially reduce participant burden without compromising measurement precision. However, the CAT of SC classification has not been reported in academics thus far. Objective: We aim to build a CAT-based model using machine learning to develop an app for automatic classification of SC to help patients assess the risk at an early stage. Methods: We extracted data from a population-based Australian cohort study of SC risk (N=43,794) using the Rasch simulation scheme. All 30 feature items were calibrated using the Rasch partial credit model. 
A total of 1000 cases following a normal distribution (mean 0, SD 1) based on the item and threshold difficulties were simulated using three techniques of machine learning---na{\"i}ve Bayes, k-nearest neighbors, and logistic regression---to compare the model accuracy in training and testing data sets with a proportion of 70:30, where the former was used to predict the latter. We calculated the sensitivity, specificity, receiver operating characteristic curve (area under the curve [AUC]), and CIs along with the accuracy and precision across the proposed models for comparison. An app that classifies the SC risk of the respondent was developed. Results: We observed that the 30-item k-nearest neighbors model yielded higher AUC values of 99\% and 91\% for the 700 training and 300 testing cases, respectively, than its 2 counterparts using the hold-out validation but had lower AUC values of 85\% (95\% CI 83\%-87\%) in the k-fold cross-validation and that an app that predicts SC classification for patients was successfully developed and demonstrated in this study. Conclusions: The 30-item SC prediction model, combined with the Rasch web-based CAT, is recommended for classifying SC in patients. An app we developed to help patients self-assess SC risk at an early stage is required for application in the future. ", doi="10.2196/33006", url="https://medinform.jmir.org/2022/3/e33006", url="http://www.ncbi.nlm.nih.gov/pubmed/35262505" } @Article{info:doi/10.2196/35768, author="Ma, Zhuo and Huang, Sijia and Wu, Xiaoqing and Huang, Yinying and Chan, Wai-Chi Sally and Lin, Yilan and Zheng, Xujuan and Zhu, Jiemin", title="Development of a Prognostic App (iCanPredict) to Predict Survival for Chinese Women With Breast Cancer: Retrospective Study", journal="J Med Internet Res", year="2022", month="Mar", day="9", volume="24", number="3", pages="e35768", keywords="app", keywords="breast cancer", keywords="survival prediction model", keywords="iCanPredict", abstract="Background: Accurate prediction of survival is crucial for both physicians and women with breast cancer to enable clinical decision making on appropriate treatments. The currently available survival prediction tools were developed based on demographic and clinical data obtained from specific populations and may underestimate or overestimate the survival of women with breast cancer in China. Objective: This study aims to develop and validate a prognostic app to predict the overall survival of women with breast cancer in China. Methods: Nine-year (January 2009-December 2017) clinical data of women with breast cancer who received surgery and adjuvant therapy from 2 hospitals in Xiamen were collected and matched against the death data from the Xiamen Center of Disease Control and Prevention. All samples were randomly divided (7:3 ratio) into a training set for model construction and a test set for model external validation. Multivariable Cox regression analysis was used to construct a survival prediction model. The model performance was evaluated by receiver operating characteristic (ROC) curve and Brier score. Finally, by running the survival prediction model in the app background thread, the prognostic app, called iCanPredict, was developed for women with breast cancer in China. Results: A total of 1592 samples were included for data analysis. The training set comprised 1114 individuals and the test set comprised 478 individuals. 
Age at diagnosis, clinical stage, molecular classification, operative type, axillary lymph node dissection, chemotherapy, and endocrine therapy were incorporated into the model, where age at diagnosis (hazard ratio [HR] 1.031, 95\% CI 1.011-1.051; P=.002), clinical stage (HR 3.044, 95\% CI 2.347-3.928; P<.001), and endocrine therapy (HR 0.592, 95\% CI 0.384-0.914; P=.02) significantly influenced the survival of women with breast cancer. The operative type (P=.81) and the other 4 variables (molecular classification [P=.91], breast reconstruction [P=.36], axillary lymph node dissection [P=.32], and chemotherapy [P=.84]) were not significant. The ROC curve of the training set showed that the model exhibited good discrimination for predicting 1- (area under the curve [AUC] 0.802, 95\% CI 0.713-0.892), 5- (AUC 0.813, 95\% CI 0.760-0.865), and 10-year (AUC 0.740, 95\% CI 0.672-0.808) overall survival. The Brier scores at 1, 5, and 10 years after diagnosis were 0.005, 0.055, and 0.103 in the training set, respectively, and were less than 0.25, indicating good predictive ability. The test set externally validated model discrimination and calibration. In the iCanPredict app, when physicians or women input women's clinical information and their choice of surgery and adjuvant therapy, the corresponding 10-year survival prediction will be presented. Conclusions: This survival prediction model provided good model discrimination and calibration. iCanPredict is the first tool of its kind in China to provide survival predictions to women with breast cancer. iCanPredict will increase women's awareness of the similar survival rate of different surgeries and the importance of adherence to endocrine therapy, ultimately helping women to make informed decisions regarding treatment for breast cancer. 
", doi="10.2196/35768", url="https://www.jmir.org/2022/3/e35768", url="http://www.ncbi.nlm.nih.gov/pubmed/35262503" } @Article{info:doi/10.2196/32800, author="Tajirian, Tania and Jankowicz, Damian and Lo, Brian and Sequeira, Lydia and Strudwick, Gillian and Almilaji, Khaled and Stergiopoulos, Vicky", title="Tackling the Burden of Electronic Health Record Use Among Physicians in a Mental Health Setting: Physician Engagement Strategy", journal="J Med Internet Res", year="2022", month="Mar", day="8", volume="24", number="3", pages="e32800", keywords="burnout", keywords="organizational strategy", keywords="electronic health record use", keywords="clinical informatics", keywords="medical informatics", doi="10.2196/32800", url="https://www.jmir.org/2022/3/e32800", url="http://www.ncbi.nlm.nih.gov/pubmed/35258473" } @Article{info:doi/10.2196/32313, author="Choi, Eun-Ji and Jun, Joon Tae and Park, Han-Seung and Lee, Jung-Hee and Lee, Kyoo-Hyung and Kim, Young-Hak and Lee, Young-Shin and Kang, Young-Ah and Jeon, Mijin and Kang, Hyeran and Woo, Jimin and Lee, Je-Hwan", title="Predicting Long-term Survival After Allogeneic Hematopoietic Cell Transplantation in Patients With Hematologic Malignancies: Machine Learning--Based Model Development and Validation", journal="JMIR Med Inform", year="2022", month="Mar", day="7", volume="10", number="3", pages="e32313", keywords="machine learning", keywords="hematopoietic cell transplantation", keywords="hematologic malignancies", keywords="prediction", keywords="survival", keywords="stem cell", keywords="transplant", keywords="malignancy", keywords="model", keywords="outcome", keywords="algorithm", keywords="bias", keywords="validation", abstract="Background: Scoring systems developed for predicting survival after allogeneic hematopoietic cell transplantation (HCT) show suboptimal prediction power, and various factors affect posttransplantation outcomes. Objective: A prediction model using a machine learning--based algorithm can be an alternative for concurrently applying multiple variables and can reduce potential biases. In this regard, the aim of this study is to establish and validate a machine learning--based predictive model for survival after allogeneic HCT in patients with hematologic malignancies. Methods: Data from 1470 patients with hematologic malignancies who underwent allogeneic HCT between December 1993 and June 2020 at Asan Medical Center, Seoul, South Korea, were retrospectively analyzed. Using the gradient boosting machine algorithm, we evaluated a model predicting the 5-year posttransplantation survival through 10-fold cross-validation. Results: The prediction model showed good performance with a mean area under the receiver operating characteristic curve of 0.788 (SD 0.03). Furthermore, we developed a risk score predicting probabilities of posttransplantation survival in 294 randomly selected patients, and an agreement between the estimated predicted and observed risks of overall death, nonrelapse mortality, and relapse incidence was observed according to the risk score. Additionally, the calculated score demonstrated the possibility of predicting survival according to the different transplantation-related factors, with the visualization of the importance of each variable. Conclusions: We developed a machine learning--based model for predicting long-term survival after allogeneic HCT in patients with hematologic malignancies. 
Our model provides a method for making decisions regarding patient and donor candidates or selecting transplantation-related resources, such as conditioning regimens. ", doi="10.2196/32313", url="https://medinform.jmir.org/2022/3/e32313", url="http://www.ncbi.nlm.nih.gov/pubmed/35254275" } @Article{info:doi/10.2196/37419, author="Stausberg, J{\"u}rgen and Uslu, Aykut", title="Authors' Reply to: Interpretation Bias Toward the Positive Impacts of Digital Interventions in Health Care. Comment on ``Value of the Electronic Medical Record for Hospital Care: Update From the Literature''", journal="J Med Internet Res", year="2022", month="Mar", day="4", volume="24", number="3", pages="e37419", keywords="cost analysis", keywords="costs and cost analyses", keywords="economic advantage", keywords="electronic medical records", keywords="electronic records", keywords="health care", keywords="hospitals", keywords="computerized medical records systems", keywords="quality of health care", keywords="secondary data", doi="10.2196/37419", url="https://www.jmir.org/2022/3/e37419", url="http://www.ncbi.nlm.nih.gov/pubmed/35254272" } @Article{info:doi/10.2196/37208, author="Shakibaei Bonakdeh, Erfan", title="Interpretation Bias Toward the Positive Impacts of Digital Interventions in Health Care. Comment on ``Value of the Electronic Medical Record for Hospital Care: Update From the Literature''", journal="J Med Internet Res", year="2022", month="Mar", day="4", volume="24", number="3", pages="e37208", keywords="cost analysis", keywords="costs and cost analyses", keywords="economic advantage", keywords="electronic medical records", keywords="electronic records", keywords="health care", keywords="hospitals", keywords="computerized medical records system", keywords="quality of health care", keywords="secondary data", doi="10.2196/37208", url="https://www.jmir.org/2022/3/e37208", url="http://www.ncbi.nlm.nih.gov/pubmed/35254276" } @Article{info:doi/10.2196/33310, author="Van Cleve, Raymond and Edmond, Sara and Snow, Jennifer and Black, C. Anne and Pomeranz, L. Jamie and Becker, William", title="Classification of Patients for Whom Benefit of Long-term Opioid Therapy No Longer Outweighs Harm: Protocol for a Delphi Study", journal="JMIR Res Protoc", year="2022", month="Mar", day="4", volume="11", number="3", pages="e33310", keywords="modified Delphi technique", keywords="long-term opioid treatment", keywords="chronic pain", keywords="opioid therapy", keywords="opioids", keywords="pain management", keywords="Delphi study", abstract="Background: Patients with chronic pain prescribed long-term opioid therapy may come to a point where the benefits of the therapy are outweighed by the risks and tapering is indicated. At the 2019 Veterans Health Administration State of the Art Conference, there was an acknowledgment of a lack of clinical guidance with regard to treating this subset of patients. Some of the participants believed clinicians and patients would both benefit from a new diagnostic entity describing this situation. Objective: The aim of this study was to determine if a new diagnostic entity was needed and what the criteria of the diagnostic entity would be. Given the ability of the Delphi method to synthesize input from a broad range of experts, we felt this technique was the most appropriate for this study. Methods: We designed a modified Delphi technique involving 3 rounds. 
The first round is a series of open-ended questions asking about the necessity of this diagnostic entity, how this condition is different from opioid use disorder, and what its possible diagnostic criteria would be. After synthesizing the responses collected, a second round will be conducted to ask participants to rate the different responses offered by their peers. These ratings will be collected and analyzed, and will generate a preliminary definition for this clinical phenomena. In the third round, we will circulate this definition with the aim of achieving consensus. Results: The modified Delphi study was initiated in July of 2020 and analysis is currently underway. Conclusions: This protocol has been approved by the Internal Review Board at the Connecticut Veterans Affairs and the study is in process. This protocol may assist other researchers conducting similar studies. International Registered Report Identifier (IRRID): DERR1-10.2196/33310 ", doi="10.2196/33310", url="https://www.researchprotocols.org/2022/3/e33310", url="http://www.ncbi.nlm.nih.gov/pubmed/35254277" } @Article{info:doi/10.2196/31760, author="Pappy, George and Aczon, Melissa and Wetzel, Randall and Ledbetter, David", title="Predicting High Flow Nasal Cannula Failure in an Intensive Care Unit Using a Recurrent Neural Network With Transfer Learning and Input Data Perseveration: Retrospective Analysis", journal="JMIR Med Inform", year="2022", month="Mar", day="3", volume="10", number="3", pages="e31760", keywords="high flow nasal cannula", keywords="HFNC failure", keywords="predictive model", keywords="deep learning", keywords="transfer learning", keywords="LSTM", keywords="RNN", keywords="input data perseveration", abstract="Background: High flow nasal cannula (HFNC) provides noninvasive respiratory support for children who are critically ill who may tolerate it more readily than other noninvasive ventilation (NIV) techniques such as bilevel positive airway pressure and continuous positive airway pressure. Moreover, HFNC may preclude the need for mechanical ventilation (intubation). Nevertheless, NIV or intubation may ultimately be necessary for certain patients. Timely prediction of HFNC failure can provide an indication for increasing respiratory support. Objective: The aim of this study is to develop and compare machine learning (ML) models to predict HFNC failure. Methods: A retrospective study was conducted using the Virtual Pediatric Intensive Care Unit database of electronic medical records of patients admitted to a tertiary pediatric intensive care unit between January 2010 and February 2020. Patients aged <19 years, without apnea, and receiving HFNC treatment were included. A long short-term memory (LSTM) model using 517 variables (vital signs, laboratory data, and other clinical parameters) was trained to generate a continuous prediction of HFNC failure, defined as escalation to NIV or intubation within 24 hours of HFNC initiation. For comparison, 7 other models were trained: a logistic regression (LR) using the same 517 variables, another LR using only 14 variables, and 5 additional LSTM-based models using the same 517 variables as the first LSTM model and incorporating additional ML techniques (transfer learning, input perseveration, and ensembling). Performance was assessed using the area under the receiver operating characteristic (AUROC) curve at various times following HFNC initiation. 
The sensitivity, specificity, and positive and negative predictive values of predictions at 2 hours after HFNC initiation were also evaluated. These metrics were also computed for a cohort with primarily respiratory diagnoses. Results: A total of 834 HFNC trials (455 [54.6\%] training, 173 [20.7\%] validation, and 206 [24.7\%] test) met the inclusion criteria, of which 175 (21\%; training: 103/455, 22.6\%; validation: 30/173, 17.3\%; test: 42/206, 20.4\%) escalated to NIV or intubation. The LSTM models trained with transfer learning generally performed better than the LR models, with the best LSTM model achieving an AUROC of 0.78 versus 0.66 for the 14-variable LR and 0.71 for the 517-variable LR 2 hours after initiation. All models except for the 14-variable LR achieved higher AUROCs in the respiratory cohort than in the general intensive care unit population. Conclusions: ML models trained using electronic medical record data were able to identify children at risk of HFNC failure within 24 hours of initiation. LSTM models that incorporated transfer learning, input data perseveration, and ensembling showed improved performance compared with the LR and standard LSTM models. ", doi="10.2196/31760", url="https://medinform.jmir.org/2022/3/e31760", url="http://www.ncbi.nlm.nih.gov/pubmed/35238792" } @Article{info:doi/10.2196/30104, author="Ip, Wui and Prahalad, Priya and Palma, Jonathan and Chen, H. Jonathan", title="A Data-Driven Algorithm to Recommend Initial Clinical Workup for Outpatient Specialty Referral: Algorithm Development and Validation Using Electronic Health Record Data and Expert Surveys", journal="JMIR Med Inform", year="2022", month="Mar", day="3", volume="10", number="3", pages="e30104", keywords="recommender system", keywords="electronic health records", keywords="clinical decision support", keywords="specialty consultation", keywords="machine learning", keywords="EHR", keywords="algorithm", keywords="algorithm development", keywords="algorithm validation", keywords="automation", keywords="prediction", keywords="patient needs", abstract="Background: Millions of people have limited access to specialty care. The problem is exacerbated by ineffective specialty visits due to incomplete prereferral workup, leading to delays in diagnosis and treatment. Existing processes to guide prereferral diagnostic workup are labor-intensive (ie, building a consensus guideline between primary care doctors and specialists) and require the availability of the specialists (ie, electronic consultation). Objective: Using pediatric endocrinology as an example, we develop a recommender algorithm to anticipate patients' initial workup needs at the time of specialty referral and compare it to a reference benchmark using the most common workup orders. We also evaluate the clinical appropriateness of the algorithm recommendations. Methods: Electronic health record data were extracted from 3424 pediatric patients with new outpatient endocrinology referrals at an academic institution from 2015 to 2020. Using item co-occurrence statistics, we predicted the initial workup orders that would be entered by specialists and assessed the recommender's performance in a holdout data set based on what the specialists actually ordered. We surveyed endocrinologists to assess the clinical appropriateness of the predicted orders and to understand the initial workup process. 
Results: Specialists (n=12) indicated that <50\% of new patient referrals arrive with complete initial workup for common referral reasons. The algorithm achieved an area under the receiver operating characteristic curve of 0.95 (95\% CI 0.95-0.96). Compared to a reference benchmark using the most common orders, precision and recall improved from 37\% to 48\% (P<.001) and from 27\% to 39\% (P<.001) for the top 4 recommendations, respectively. The top 4 recommendations generated for common referral conditions (abnormal thyroid studies, obesity, amenorrhea) were considered clinically appropriate the majority of the time by specialists surveyed and practice guidelines reviewed. Conclusions: An item association--based recommender algorithm can predict appropriate specialists' workup orders with high discriminatory accuracy. This could support future clinical decision support tools to increase effectiveness and access to specialty referrals. Our study demonstrates important first steps toward a data-driven paradigm for outpatient specialty consultation with a tier of automated recommendations that proactively enable initial workup that would otherwise be delayed by awaiting an in-person visit. ", doi="10.2196/30104", url="https://medinform.jmir.org/2022/3/e30104", url="http://www.ncbi.nlm.nih.gov/pubmed/35238788" } @Article{info:doi/10.2196/30956, author="Weaver, Wyllie Colin George and Basmadjian, B. Robert and Williamson, Tyler and McBrien, Kerry and Sajobi, Tolu and Boyne, Devon and Yusuf, Mohamed and Ronksley, Everett Paul", title="Reporting of Model Performance and Statistical Methods in Studies That Use Machine Learning to Develop Clinical Prediction Models: Protocol for a Systematic Review", journal="JMIR Res Protoc", year="2022", month="Mar", day="3", volume="11", number="3", pages="e30956", keywords="machine learning", keywords="clinical prediction", keywords="research reporting", keywords="statistics", keywords="research methods", keywords="clinical prediction models", keywords="artificial intelligence", keywords="modeling", keywords="eHealth", keywords="digital medicine", keywords="prediction", abstract="Background: With the growing excitement of the potential benefits of using machine learning and artificial intelligence in medicine, the number of published clinical prediction models that use these approaches has increased. However, there is evidence (albeit limited) that suggests that the reporting of machine learning--specific aspects in these studies is poor. Further, there are no reviews assessing the reporting quality or broadly accepted reporting guidelines for these aspects. Objective: This paper presents the protocol for a systematic review that will assess the reporting quality of machine learning--specific aspects in studies that use machine learning to develop clinical prediction models. Methods: We will include studies that use a supervised machine learning algorithm to develop a prediction model for use in clinical practice (ie, for diagnosis or prognosis of a condition or identification of candidates for health care interventions). We will search MEDLINE for studies published in 2019, pseudorandomly sort the records, and screen until we obtain 100 studies that meet our inclusion criteria. We will assess reporting quality with a novel checklist developed in parallel with this review, which includes content derived from existing reporting guidelines, textbooks, and consultations with experts. 
The checklist will cover 4 key areas where the reporting of machine learning studies is unique: modelling steps (order and data used for each step), model performance (eg, reporting the performance of each model compared), statistical methods (eg, describing the tuning approach), and presentation of models (eg, specifying the predictors that contributed to the final model). Results: We completed data analysis in August 2021 and are writing the manuscript. We expect to submit the results to a peer-reviewed journal in early 2022. Conclusions: This review will contribute to more standardized and complete reporting in the field by identifying areas where reporting is poor and can be improved. Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42020206167; https://www.crd.york.ac.uk/PROSPERO/display\_record.php?RecordID=206167 International Registered Report Identifier (IRRID): RR1-10.2196/30956 ", doi="10.2196/30956", url="https://www.researchprotocols.org/2022/3/e30956", url="http://www.ncbi.nlm.nih.gov/pubmed/35238322" } @Article{info:doi/10.2196/33026, author="Tian, Simiao and Bi, Mei and Bi, Yanhong and Che, Xiaoyu and Liu, Yazhuo", title="A Bayesian Network Analysis of the Probabilistic Relationships Between Various Obesity Phenotypes and Cardiovascular Disease Risk in Chinese Adults: Chinese Population-Based Observational Study", journal="JMIR Med Inform", year="2022", month="Mar", day="2", volume="10", number="3", pages="e33026", keywords="Bayesian network", keywords="metabolic health", keywords="obesity", keywords="cardiovascular disease risk", abstract="Background: Cardiovascular disease (CVD) risk among individuals with different BMI levels might depend on their metabolic health. The extent to which metabolic health status and BMI affect CVD risk, either directly or through a mediator, in the Chinese population remains unclear. Objective: In this study, the Bayesian network (BN) perspective is adopted to characterize the multivariable probabilistic connections between CVD risk and metabolic health and obesity status and identify potential factors that influence these relationships among Chinese adults. Methods: The study population comprised 6276 Chinese adults aged 30 to 74 years who participated in the China Health and Nutrition Survey 2009. BMI was used to categorize participants as normal weight, overweight, or obese, and metabolic health was defined by the Adult Treatment Panel-3 criteria. Participants were categorized into 6 phenotypes according to their metabolic health and BMI categorization. The 10-year risk of CVD was determined using the Framingham Risk Score. BN modeling was used to identify the network structure of the variables and compute the conditional probability of CVD risk for the different metabolic obesity phenotypes with the given structure. Results: Of 6276 participants, 64.67\% (n=4059), 20.37\% (n=1279), and 14.95\% (n=938) had a low, moderate, and high 10-year CVD risk. An averaged BN with a stable network structure was constructed by learning 300 bootstrapped networks from the data. Using BN reasoning, the conditional probability of high CVD risk increased as age progressed. The conditional probability of high CVD risk was 0.43\% (95\% CI 0.2\%-0.87\%) for the 30 to 40 years age group, 2.25\% (95\% CI 1.75\%-2.88\%) for the 40 to 50 years age group, 16.13\% (95\% CI 14.86\%-17.5\%) for the 50 to 60 years age group, and 52.02\% (95\% CI 47.62\%-56.38\%) for those aged ≥70 years. 
When metabolic health and BMI categories were instantiated to their different statuses, the conditional probability of high CVD risk increased from 7.01\% (95\% CI 6.27\%-7.83\%) for participants who were metabolically healthy normal weight to 10.47\% (95\% CI 7.63\%-14.18\%) for their metabolically healthy obese (MHO) counterparts and up to 21.74\% and 34.48\% among participants who were metabolically unhealthy normal weight and metabolically unhealthy obese (MUO), respectively. Sex was a significant modifier of the conditional probability distribution of metabolic obesity phenotypes and high CVD risk, with a conditional probability of high CVD risk of only 2.02\% and 22.7\% among MHO and MUO women, respectively, compared with 21.92\% and 48.21\% for their male MHO and MUO counterparts, respectively. Conclusions: BN modeling was applied to investigate the relationship between CVD risk and metabolic health and obesity phenotypes in Chinese adults. The results suggest that both metabolic health and obesity status are important for CVD prevention; closer attention should be paid to BMI and metabolic status changes over time. ", doi="10.2196/33026", url="https://medinform.jmir.org/2022/3/e33026", url="http://www.ncbi.nlm.nih.gov/pubmed/35234651" } @Article{info:doi/10.2196/31615, author="Seedahmed, I. Mohamed and Mogilnicka, Izabella and Zeng, Siyang and Luo, Gang and Whooley, A. Mary and McCulloch, E. Charles and Koth, Laura and Arjomandi, Mehrdad", title="Performance of a Computational Phenotyping Algorithm for Sarcoidosis Using Diagnostic Codes in Electronic Medical Records: Case Validation Study From 2 Veterans Affairs Medical Centers", journal="JMIR Form Res", year="2022", month="Mar", day="2", volume="6", number="3", pages="e31615", keywords="sarcoidosis", keywords="electronic medical records", keywords="EMRs", keywords="computational phenotype", keywords="diagnostic codes", keywords="Veterans Affairs", keywords="VA", keywords="practice guidelines", abstract="Background: Electronic medical records (EMRs) offer the promise of computationally identifying sarcoidosis cases. However, the accuracy of identifying these cases in the EMR is unknown. Objective: The aim of this study is to determine the statistical performance of using the International Classification of Diseases (ICD) diagnostic codes to identify patients with sarcoidosis in the EMR. Methods: We used the ICD diagnostic codes to identify sarcoidosis cases by searching the EMRs of the San Francisco and Palo Alto Veterans Affairs medical centers and randomly selecting 200 patients. To improve the diagnostic accuracy of the computational algorithm in cases where histopathological data are unavailable, we developed an index of suspicion to identify cases with a high index of suspicion for sarcoidosis (confirmed and probable) based on clinical and radiographic features alone using the American Thoracic Society practice guideline. Through medical record review, we determined the positive predictive value (PPV) of diagnosing sarcoidosis by two computational methods: using ICD codes alone and using ICD codes plus the high index of suspicion. Results: Among the 200 patients, 158 (79\%) had a high index of suspicion for sarcoidosis. Of these 158 patients, 142 (89.9\%) had documentation of nonnecrotizing granuloma, confirming biopsy-proven sarcoidosis. 
The PPV of using ICD codes alone was 79\% (95\% CI 78.6\%-80.5\%) for identifying sarcoidosis cases and 71\% (95\% CI 64.7\%-77.3\%) for identifying histopathologically confirmed sarcoidosis in the EMRs. The inclusion of the generated high index of suspicion to identify confirmed sarcoidosis cases increased the PPV significantly to 100\% (95\% CI 96.5\%-100\%). Histopathology documentation alone was 90\% sensitive compared with high index of suspicion. Conclusions: ICD codes are reasonable classifiers for identifying sarcoidosis cases within EMRs with a PPV of 79\%. Using a computational algorithm to capture index of suspicion data elements could significantly improve the case-identification accuracy. ", doi="10.2196/31615", url="https://formative.jmir.org/2022/3/e31615", url="http://www.ncbi.nlm.nih.gov/pubmed/35081036" } @Article{info:doi/10.2196/30883, author="Zhang, Zhan and Joy, Karen and Harris, Richard and Ozkaynak, Mustafa and Adelgais, Kathleen and Munjal, Kevin", title="Applications and User Perceptions of Smart Glasses in Emergency Medical Services: Semistructured Interview Study", journal="JMIR Hum Factors", year="2022", month="Feb", day="28", volume="9", number="1", pages="e30883", keywords="smart glasses", keywords="hands-free technologies", keywords="emergency medical services", keywords="user studies", keywords="mobile phone", abstract="Background: Smart glasses have been gaining momentum as a novel technology because of their advantages in enabling hands-free operation and see-what-I-see remote consultation. Researchers have primarily evaluated this technology in hospital settings; however, limited research has investigated its application in prehospital operations. Objective: The aim of this study is to understand the potential of smart glasses to support the work practices of prehospital providers, such as emergency medical services (EMS) personnel. Methods: We conducted semistructured interviews with 13 EMS providers recruited from 4 hospital-based EMS agencies in an urban area in the east coast region of the United States. The interview questions covered EMS workflow, challenges encountered, technology needs, and users' perceptions of smart glasses in supporting daily EMS work. During the interviews, we demonstrated a system prototype to elicit more accurate and comprehensive insights regarding smart glasses. Interviews were transcribed verbatim and analyzed using the open coding technique. Results: We identified four potential application areas for smart glasses in EMS: enhancing teleconsultation between distributed prehospital and hospital providers, semiautomating patient data collection and documentation in real time, supporting decision-making and situation awareness, and augmenting quality assurance and training. Compared with the built-in touch pad, voice commands and hand gestures were indicated as the most preferred and suitable interaction mechanisms. EMS providers expressed positive attitudes toward using smart glasses during prehospital encounters. However, several potential barriers and user concerns need to be considered and addressed before implementing and deploying smart glasses in EMS practice. They are related to hardware limitations, human factors, reliability, workflow, interoperability, and privacy. Conclusions: Smart glasses can be a suitable technological means for supporting EMS work. We conclude this paper by discussing several design considerations for realizing the full potential of this hands-free technology. 
", doi="10.2196/30883", url="https://humanfactors.jmir.org/2022/1/e30883", url="http://www.ncbi.nlm.nih.gov/pubmed/35225816" } @Article{info:doi/10.2196/33043, author="Zeng, Siyang and Arjomandi, Mehrdad and Luo, Gang", title="Automatically Explaining Machine Learning Predictions on Severe Chronic Obstructive Pulmonary Disease Exacerbations: Retrospective Cohort Study", journal="JMIR Med Inform", year="2022", month="Feb", day="25", volume="10", number="2", pages="e33043", keywords="chronic obstructive pulmonary disease", keywords="forecasting", keywords="machine learning", keywords="patient care management", abstract="Background: Chronic obstructive pulmonary disease (COPD) is a major cause of death and places a heavy burden on health care. To optimize the allocation of precious preventive care management resources and improve the outcomes for high-risk patients with COPD, we recently built the most accurate model to date to predict severe COPD exacerbations, which need inpatient stays or emergency department visits, in the following 12 months. Our model is a machine learning model. As is the case with most machine learning models, our model does not explain its predictions, forming a barrier for clinical use. Previously, we designed a method to automatically provide rule-type explanations for machine learning predictions and suggest tailored interventions with no loss of model performance. This method has been tested before for asthma outcome prediction but not for COPD outcome prediction. Objective: This study aims to assess the generalizability of our automatic explanation method for predicting severe COPD exacerbations. Methods: The patient cohort included all patients with COPD who visited the University of Washington Medicine facilities between 2011 and 2019. In a secondary analysis of 43,576 data instances, we used our formerly developed automatic explanation method to automatically explain our model's predictions and suggest tailored interventions. Results: Our method explained the predictions for 97.1\% (100/103) of the patients with COPD whom our model correctly predicted to have severe COPD exacerbations in the following 12 months and the predictions for 73.6\% (134/182) of the patients with COPD who had ≥1 severe COPD exacerbation in the following 12 months. Conclusions: Our automatic explanation method worked well for predicting severe COPD exacerbations. After further improving our method, we hope to use it to facilitate future clinical use of our model. International Registered Report Identifier (IRRID): RR2-10.2196/13783 ", doi="10.2196/33043", url="https://medinform.jmir.org/2022/2/e33043", url="http://www.ncbi.nlm.nih.gov/pubmed/35212634" } @Article{info:doi/10.2196/29124, author="Hibler, A. Elizabeth and Fought, J. Angela and Kershaw, N. Kiarri and Molsberry, Rebecca and Nowakowski, Virginia and Lindner, Deborah", title="Novel Interactive Tool for Breast and Ovarian Cancer Risk Assessment (Bright Pink Assess Your Risk): Development and Usability Study", journal="J Med Internet Res", year="2022", month="Feb", day="24", volume="24", number="2", pages="e29124", keywords="breast cancer", keywords="ovarian cancer", keywords="risk assessment", keywords="genetic testing", abstract="Background: The lifetime risk of breast and ovarian cancer is significantly higher among women with genetic susceptibility or a strong family history. 
However, current risk assessment tools and clinical practices may identify only 10\% of asymptomatic carriers of susceptibility genes. Bright Pink developed the Assess Your Risk (AYR) tool to estimate breast and ovarian cancer risk through a user-friendly, informative web-based quiz for risk assessment at the population level. Objective: This study aims to present the AYR tool, describe AYR users, and present evidence that AYR works as expected by comparing classification using the AYR tool with gold standard genetic testing guidelines. Methods: The AYR is a recently developed population-level risk assessment tool that includes 26 questions based on the National Comprehensive Cancer Network (NCCN) guidelines and factors from other commonly used risk assessment tools. We included all women who completed the AYR between November 2018 and January 2019, with the exception of self-reported cancer or no knowledge of family history. We compared AYR classifications with those that were independently created using NCCN criteria using measures of validity and the McNemar test. Results: There were 143,657 AYR completions, and most participants were either at increased or average risk for breast cancer or ovarian cancer (137,315/143,657, 95.59\%). Using our estimates of increased and average risk as the gold standard, based on the NCCN guidelines, we estimated the sensitivity and specificity for the AYR algorithm--generated risk categories as 100\% and 89.9\%, respectively (P<.001). The specificity improved when we considered the additional questions asked by the AYR to define increased risk, which were not examined by the NCCN criteria. By race, ethnicity, and age group; we found that the lowest observed specificity was for the Asian race (85.9\%) and the 30 to 39 years age group (87.6\%) for the AYR-generated categories compared with the NCCN criteria. Conclusions: These results demonstrate that Bright Pink's AYR is an accurate tool for use by the general population to identify women at increased risk of breast and ovarian cancer. We plan to validate the tool longitudinally in future studies, including the impact of race, ethnicity, and age on breast and ovarian cancer risk assessment. ", doi="10.2196/29124", url="https://www.jmir.org/2022/2/e29124", url="http://www.ncbi.nlm.nih.gov/pubmed/35200148" } @Article{info:doi/10.2196/29803, author="Schwartz, L. Jessica and Tseng, Eva and Maruthur, M. Nisa and Rouhizadeh, Masoud", title="Identification of Prediabetes Discussions in Unstructured Clinical Documentation: Validation of a Natural Language Processing Algorithm", journal="JMIR Med Inform", year="2022", month="Feb", day="24", volume="10", number="2", pages="e29803", keywords="prediabetes", keywords="prediabetes discussions", keywords="prediabetes management", keywords="chronic disease management", keywords="physician-patient communication", keywords="natural language processing", keywords="machine learning", abstract="Background: Prediabetes affects 1 in 3 US adults. Most are not receiving evidence-based interventions, so understanding how providers discuss prediabetes with patients will inform how to improve their care. Objective: This study aimed to develop a natural language processing (NLP) algorithm using machine learning techniques to identify discussions of prediabetes in narrative documentation. Methods: We developed and applied a keyword search strategy to identify discussions of prediabetes in clinical documentation for patients with prediabetes. 
We manually reviewed matching notes to determine which represented actual prediabetes discussions. We applied 7 machine learning models against our manual annotation. Results: Machine learning classifiers were able to achieve classification results that were close to human performance with up to 98\% precision and recall to identify prediabetes discussions in clinical documentation. Conclusions: We demonstrated that prediabetes discussions can be accurately identified using an NLP algorithm. This approach can be used to understand and identify prediabetes management practices in primary care, thereby informing interventions to improve guideline-concordant care. ", doi="10.2196/29803", url="https://medinform.jmir.org/2022/2/e29803", url="http://www.ncbi.nlm.nih.gov/pubmed/35200154" } @Article{info:doi/10.2196/34392, author="Shah, K. Sumit and McElfish, A. Pearl", title="Cancer Screening Recommendations During the COVID-19 Pandemic: Scoping Review", journal="JMIR Cancer", year="2022", month="Feb", day="24", volume="8", number="1", pages="e34392", keywords="COVID-19", keywords="cancer prevention and early detection", keywords="cancer screenings", keywords="breast cancer screening", keywords="cervical cancer screening", keywords="colorectal cancer screening", abstract="Background: Cancer screening tests are recommended to prevent cancer-associated mortality by detecting precancerous and cancerous lesions in early stages. The COVID-19 pandemic disrupted the use of preventive health care services. Although there was an increase in the number of cancer screening tests beginning in late 2020, screenings remained 29\% to 36\% lower than in the prepandemic era. Objective: The aim of this review is to assist health care providers in identifying approaches for prioritizing patients and increasing breast, cervical, and colorectal cancer screening during the uncertainty of the COVID-19 pandemic. Methods: We used the scoping review framework to identify articles on PubMed and EBSCO databases. A total of 403 articles were identified, and 23 articles were selected for this review. The literature review ranged from January 1, 2020, to September 30, 2021. Results: The articles included two primary categories of recommendations: (1) risk stratification and triage to prioritize screenings and (2) alternative methods to conduct cancer screenings. Risk stratification and triage recommendations focused on prioritizing high-risk patients with an abnormal or suspicious result on the previous screening test, patients in certain age groups and sex, patients with a personal medical or family cancer history, patients that are currently symptomatic, and patients that are predisposed to hereditary cancers and cancer-causing mutations. Other recommended strategies included identifying areas facing the most disparities, creating algorithms and using artificial intelligence to create cancer risk scores, leveraging in-person visits to assess cancer risk, and providing the option of open access screenings where patients can schedule screenings and can be assigned a priority category by health care staff. Some recommended using telemedicine to categorize patients and determine screening eligibility for patients with new complaints. Several articles noted the importance of implementing preventive measures such as COVID-19 screening prior to the procedures, maintaining hygiene measures, and social distancing in waiting rooms. 
Alternative screening methods that do not require an in-person clinic visit and can effectively screen patients for cancers included mailing self-collection sampling kits for cervical and colorectal cancers, and implementing or expanding mobile screening units. Conclusions: Although the COVID-19 pandemic had devastating effects on population health globally, it could be an opportunity to adapt and evolve cancer screening methods. Disruption often creates innovation, and focus on alternative methods for cancer screenings may help reach rural and underresourced areas after the pandemic has ended. ", doi="10.2196/34392", url="https://cancer.jmir.org/2022/1/e34392", url="http://www.ncbi.nlm.nih.gov/pubmed/35142621" } @Article{info:doi/10.2196/31083, author="Ackermann, Khalia and Baker, Jannah and Green, Malcolm and Fullick, Mary and Varinli, Hilal and Westbrook, Johanna and Li, Ling", title="Computerized Clinical Decision Support Systems for the Early Detection of Sepsis Among Adult Inpatients: Scoping Review", journal="J Med Internet Res", year="2022", month="Feb", day="23", volume="24", number="2", pages="e31083", keywords="sepsis", keywords="early detection of disease", keywords="clinical decision support systems", keywords="patient safety", keywords="electronic health records", keywords="sepsis care pathway", abstract="Background: Sepsis is a significant cause of morbidity and mortality worldwide. Early detection of sepsis followed promptly by treatment initiation improves patient outcomes and saves lives. Hospitals are increasingly using computerized clinical decision support (CCDS) systems for the rapid identification of adult patients with sepsis. Objective: This scoping review aims to systematically describe studies reporting on the use and evaluation of CCDS systems for the early detection of adult inpatients with sepsis. Methods: The protocol for this scoping review was previously published. A total of 10 electronic databases (MEDLINE, Embase, CINAHL, the Cochrane database, LILACS [Latin American and Caribbean Health Sciences Literature], Scopus, Web of Science, OpenGrey, ClinicalTrials.gov, and PQDT [ProQuest Dissertations and Theses]) were comprehensively searched using terms for sepsis, CCDS, and detection to identify relevant studies. Title, abstract, and full-text screening were performed by 2 independent reviewers using predefined eligibility criteria. Data charting was performed by 1 reviewer with a second reviewer checking a random sample of studies. Any disagreements were discussed with input from a third reviewer. In this review, we present the results for adult inpatients, including studies that do not specify patient age. Results: A search of the electronic databases retrieved 12,139 studies following duplicate removal. We identified 124 studies for inclusion after title, abstract, full-text screening, and hand searching were complete. Nearly all studies (121/124, 97.6\%) were published after 2009. Half of the studies were journal articles (65/124, 52.4\%), and the remainder were conference abstracts (54/124, 43.5\%) and theses (5/124, 4\%). Most studies used a single cohort (54/124, 43.5\%) or before-after (42/124, 33.9\%) approach. Across all 124 included studies, patient outcomes were the most frequently reported outcomes (107/124, 86.3\%), followed by sepsis treatment and management (75/124, 60.5\%), CCDS usability (14/124, 11.3\%), and cost outcomes (9/124, 7.3\%). 
For sepsis identification, the systemic inflammatory response syndrome criteria were the most commonly used, alone (50/124, 40.3\%), combined with organ dysfunction (28/124, 22.6\%), or combined with other criteria (23/124, 18.5\%). Over half of the CCDS systems (68/124, 54.8\%) were implemented alongside other sepsis-related interventions. Conclusions: The current body of literature investigating the implementation of CCDS systems for the early detection of adult inpatients with sepsis is extremely diverse. There is substantial variability in study design, CCDS criteria and characteristics, and outcomes measured across the identified literature. Future research on CCDS system usability, cost, and impact on sepsis morbidity is needed. International Registered Report Identifier (IRRID): RR2-10.2196/24899 ", doi="10.2196/31083", url="https://www.jmir.org/2022/2/e31083", url="http://www.ncbi.nlm.nih.gov/pubmed/35195528" } @Article{info:doi/10.2196/34907, author="Ellis, A. Louise and Sarkies, Mitchell and Churruca, Kate and Dammery, Genevieve and Meulenbroeks, Isabelle and Smith, L. Carolynn and Pomare, Chiara and Mahmoud, Zeyad and Zurynski, Yvonne and Braithwaite, Jeffrey", title="The Science of Learning Health Systems: Scoping Review of Empirical Research", journal="JMIR Med Inform", year="2022", month="Feb", day="23", volume="10", number="2", pages="e34907", keywords="learning health systems", keywords="learning health care systems", keywords="implementation science", keywords="evaluation", keywords="health system", keywords="health care system", keywords="empirical research", keywords="medical informatics", keywords="review", abstract="Background: The development and adoption of a learning health system (LHS) has been proposed as a means to address key challenges facing current and future health care systems. The first review of the LHS literature was conducted 5 years ago, identifying only a small number of published papers that had empirically examined the implementation or testing of an LHS. It is timely to look more closely at the published empirical research and to ask the question, Where are we now? 5 years on from that early LHS review. Objective: This study performed a scoping review of empirical research within the LHS domain. Taking an ``implementation science'' lens, the review aims to map out the empirical research that has been conducted to date, identify limitations, and identify future directions for the field. Methods: Two academic databases (PubMed and Scopus) were searched using the terms ``learning health* system*'' for papers published between January 1, 2016, to January 31, 2021, that had an explicit empirical focus on LHSs. Study information was extracted relevant to the review objective, including each study's publication details; primary concern or focus; context; design; data type; implementation framework, model, or theory used; and implementation determinants or outcomes examined. Results: A total of 76 studies were included in this review. Over two-thirds of the studies were concerned with implementing a particular program, system, or platform (53/76, 69.7\%) designed to contribute to achieving an LHS. Most of these studies focused on a particular clinical context or patient population (37/53, 69.8\%), with far fewer studies focusing on whole hospital systems (4/53, 7.5\%) or on other broad health care systems encompassing multiple facilities (12/53, 22.6\%). 
Over two-thirds of the program-specific studies utilized quantitative methods (37/53, 69.8\%), with a smaller number utilizing qualitative methods (10/53, 18.9\%) or mixed-methods designs (6/53, 11.3\%). The remaining 23 studies were classified into 1 of 3 key areas: ethics, policies, and governance (10/76, 13.2\%); stakeholder perspectives of LHSs (5/76, 6.6\%); or LHS-specific research strategies and tools (8/76, 10.5\%). Overall, relatively few studies were identified that incorporated an implementation science framework. Conclusions: Although there has been considerable growth in empirical applications of LHSs within the past 5 years, paralleling the recent emergence of LHS-specific research strategies and tools, there are few high-quality studies. Comprehensive reporting of implementation and evaluation efforts is an important step to moving the LHS field forward. In particular, the routine use of implementation determinant and outcome frameworks will improve the assessment and reporting of barriers, enablers, and implementation outcomes in this field and will enable comparison and identification of trends across studies. ", doi="10.2196/34907", url="https://medinform.jmir.org/2022/2/e34907", url="http://www.ncbi.nlm.nih.gov/pubmed/35195529" } @Article{info:doi/10.2196/29279, author="Suraj, Varun and Del Vecchio Fitz, Catherine and Kleiman, B. Laura and Bhavnani, K. Suresh and Jani, Chinmay and Shah, Surbhi and McKay, R. Rana and Warner, Jeremy and Alterovitz, Gil", title="SMART COVID Navigator, a Clinical Decision Support Tool for COVID-19 Treatment: Design and Development Study", journal="J Med Internet Res", year="2022", month="Feb", day="18", volume="24", number="2", pages="e29279", keywords="COVID-19", keywords="clinical decision support", keywords="precision medicine", keywords="web application", keywords="FHIR", abstract="Background: COVID-19 caused by SARS-CoV-2 has infected 219 million individuals at the time of writing of this paper. A large volume of research findings from observational studies about disease interactions with COVID-19 is being produced almost daily, making it difficult for physicians to keep track of the latest information on COVID-19's effect on patients with certain pre-existing conditions. Objective: In this paper, we describe the creation of a clinical decision support tool, the SMART COVID Navigator, a web application to assist clinicians in treating patients with COVID-19. Our application allows clinicians to access a patient's electronic health records and identify disease interactions from a large set of observational research studies that affect the severity and fatality due to COVID-19. Methods: The SMART COVID Navigator takes a 2-pronged approach to clinical decision support. The first part is a connection to electronic health record servers, allowing the application to access a patient's medical conditions. The second is accessing data sets with information from various observational studies to determine the latest research findings about COVID-19 outcomes for patients with certain medical conditions. By connecting these 2 data sources, users can see how a patient's medical history will affect their COVID-19 outcomes. Results: The SMART COVID Navigator aggregates patient health information from multiple Fast Healthcare Interoperability Resources--enabled electronic health record systems. This allows physicians to see a comprehensive view of patient health records. 
The application accesses 2 data sets of over 1100 research studies to provide information on the fatality and severity of COVID-19 for several pre-existing conditions. We also analyzed the results of the collected studies to determine which medical conditions result in an increased chance of severity and fatality of COVID-19 progression. We found that certain conditions were associated with higher probabilities of severe disease and fatality. We also analyzed various cancer tissues and found that the probabilities for fatality vary greatly depending on the tissue being examined. Conclusions: The SMART COVID Navigator allows physicians to predict the fatality and severity of COVID-19 progression given a particular patient's medical conditions. This can allow physicians to determine how aggressively to treat patients infected with COVID-19 and to prioritize different patients for treatment considering their prior medical conditions. ", doi="10.2196/29279", url="https://www.jmir.org/2022/2/e29279", url="http://www.ncbi.nlm.nih.gov/pubmed/34932493" } @Article{info:doi/10.2196/33440, author="Xiao, Jialong and Mo, Miao and Wang, Zezhou and Zhou, Changming and Shen, Jie and Yuan, Jing and He, Yulian and Zheng, Ying", title="The Application and Comparison of Machine Learning Models for the Prediction of Breast Cancer Prognosis: Retrospective Cohort Study", journal="JMIR Med Inform", year="2022", month="Feb", day="18", volume="10", number="2", pages="e33440", keywords="breast cancer", keywords="machine learning", keywords="survival analysis", keywords="random survival forest", keywords="support vector machine", keywords="medical informatics", keywords="prediction models", abstract="Background: In recent years, machine learning methods have been increasingly explored in cancer prognosis because of the development of improved machine learning algorithms. These algorithms can use censored data for modeling, such as support vector machines for survival analysis and random survival forest (RSF). However, it is still debated whether traditional (Cox proportional hazard regression) or machine learning--based prognostic models have better predictive performance. Objective: This study aimed to compare the performance of breast cancer prognostic prediction models based on machine learning and Cox regression. Methods: This retrospective cohort study included all patients diagnosed with breast cancer and subsequently hospitalized in Fudan University Shanghai Cancer Center between January 1, 2008, and December 31, 2016. After all exclusions, a total of 22,176 cases with 21 features were eligible for model development. The data set was randomly split into a training set (15,523 cases, 70\%) and a test set (6653 cases, 30\%) for developing 4 models and predicting the overall survival of patients diagnosed with breast cancer. The discriminative ability of models was evaluated by the concordance index (C-index), the time-dependent area under the curve, and D-index; the calibration ability of models was evaluated by the Brier score. Results: The RSF model revealed the best discriminative performance among the 4 models with 3-year, 5-year, and 10-year time-dependent area under the curve of 0.857, 0.838, and 0.781, a D-index of 7.643 (95\% CI 6.542, 8.930) and a C-index of 0.827 (95\% CI 0.809, 0.845). 
The statistical difference of the C-index was tested, and the RSF model significantly outperformed the Cox-EN (elastic net) model (C-index 0.816, 95\% CI 0.796, 0.836; P=.01), the Cox model (C-index 0.814, 95\% CI 0.794, 0.835; P=.003), and the support vector machine model (C-index 0.812, 95\% CI 0.793, 0.832; P<.001). The 4 models' 3-year, 5-year, and 10-year Brier scores were very close, ranging from 0.027 to 0.094 and less than 0.1, which meant all models had good calibration. In the context of feature importance, elastic net and RSF both indicated that TNM staging, neoadjuvant therapy, number of lymph node metastases, age, and tumor diameter were the top 5 important features for predicting the prognosis of breast cancer. A final online tool was developed to predict the overall survival of patients with breast cancer. Conclusions: The RSF model slightly outperformed the other models on discriminative ability, revealing the potential of the RSF method as an effective approach to building prognostic prediction models in the context of survival analysis. ", doi="10.2196/33440", url="https://medinform.jmir.org/2022/2/e33440", url="http://www.ncbi.nlm.nih.gov/pubmed/35179504" } @Article{info:doi/10.2196/30345, author="Montoto, Carmen and Gisbert, P. Javier and Guerra, Iv{\'a}n and Plaza, Roc{\'i}o and Pajares Villarroya, Ram{\'o}n and Moreno Almaz{\'a}n, Luis and L{\'o}pez Mart{\'i}n, Carmen Mar{\'i}a Del and Dom{\'i}nguez Antonaya, Mercedes and Vera Mendoza, Isabel and Aparicio, Jes{\'u}s and Mart{\'i}nez, Vicente and Tagarro, Ignacio and Fernandez-Nistal, Alonso and Canales, Lea and Menke, Sebastian and Gomoll{\'o}n, Fernando and ", title="Evaluation of Natural Language Processing for the Identification of Crohn Disease--Related Variables in Spanish Electronic Health Records: A Validation Study for the PREMONITION-CD Project", journal="JMIR Med Inform", year="2022", month="Feb", day="18", volume="10", number="2", pages="e30345", keywords="natural language processing", keywords="linguistic validation", keywords="artificial intelligence", keywords="electronic health records", keywords="Crohn disease", keywords="inflammatory bowel disease", abstract="Background: The exploration of clinically relevant information in the free text of electronic health records (EHRs) holds the potential to positively impact clinical practice as well as knowledge regarding Crohn disease (CD), an inflammatory bowel disease that may affect any segment of the gastrointestinal tract. The EHRead technology, a clinical natural language processing (cNLP) system, was designed to detect and extract clinical information from narratives in the clinical notes contained in EHRs. Objective: The aim of this study is to validate the performance of the EHRead technology in identifying information of patients with CD. Methods: We used the EHRead technology to explore and extract CD-related clinical information from EHRs. To validate this tool, we compared the output of the EHRead technology with a manually curated gold standard to assess the quality of our cNLP system in detecting records containing any reference to CD and its related variables. Results: The validation metrics for the main variable (CD) were a precision of 0.88, a recall of 0.98, and an F1 score of 0.93. 
Regarding the secondary variables, we obtained a precision of 0.91, a recall of 0.71, and an F1 score of 0.80 for CD flare, while for the variable vedolizumab (treatment), a precision, recall, and F1 score of 0.86, 0.94, and 0.90 were obtained, respectively. Conclusions: This evaluation demonstrates the ability of the EHRead technology to identify patients with CD and their related variables from the free text of EHRs. To the best of our knowledge, this study is the first to use a cNLP system for the identification of CD in EHRs written in Spanish. ", doi="10.2196/30345", url="https://medinform.jmir.org/2022/2/e30345", url="http://www.ncbi.nlm.nih.gov/pubmed/35179507" } @Article{info:doi/10.2196/29806, author="Sung, Sheng-Feng and Hsieh, Cheng-Yang and Hu, Ya-Han", title="Early Prediction of Functional Outcomes After Acute Ischemic Stroke Using Unstructured Clinical Text: Retrospective Cohort Study", journal="JMIR Med Inform", year="2022", month="Feb", day="17", volume="10", number="2", pages="e29806", keywords="acute ischemic stroke", keywords="bag-of-words", keywords="extreme gradient boosting", keywords="machine learning", keywords="MetaMap", keywords="natural language processing", keywords="outcome prediction", keywords="text classification", keywords="unstructured clinical text", abstract="Background: Several prognostic scores have been proposed to predict functional outcomes after an acute ischemic stroke (AIS). Most of these scores are based on structured information and have been used to develop prediction models via the logistic regression method. With the increased use of electronic health records and the progress in computational power, data-driven predictive modeling by using machine learning techniques is gaining popularity in clinical decision-making. Objective: We aimed to investigate whether machine learning models created by using unstructured text could improve the prediction of functional outcomes at an early stage after AIS. Methods: We identified all consecutive patients who were hospitalized for the first time for AIS from October 2007 to December 2019 by using a hospital stroke registry. The study population was randomly split into a training (n=2885) and test set (n=962). Free text in histories of present illness and computed tomography reports was transformed into input variables via natural language processing. Models were trained by using the extreme gradient boosting technique to predict a poor functional outcome at 90 days poststroke. Model performance on the test set was evaluated by using the area under the receiver operating characteristic curve (AUC). Results: The AUCs of text-only models ranged from 0.768 to 0.807 and were comparable to that of the model using National Institutes of Health Stroke Scale (NIHSS) scores (0.811). Models using both patient age and text achieved AUCs of 0.823 and 0.825, which were similar to those of the model containing age and NIHSS scores (0.841); the model containing preadmission comorbidities, level of consciousness, age, and neurological deficit (PLAN) scores (0.837); and the model containing Acute Stroke Registry and Analysis of Lausanne (ASTRAL) scores (0.840). Adding variables from clinical text improved the predictive performance of the model containing age and NIHSS scores, the model containing PLAN scores, and the model containing ASTRAL scores (the AUC increased from 0.841 to 0.861, from 0.837 to 0.856, and from 0.840 to 0.860, respectively). 
Conclusions: Unstructured clinical text can be used to improve the performance of existing models for predicting poststroke functional outcomes. However, considering the different terminologies that are used across health systems, each individual health system may consider using the proposed methods to develop and validate its own models. ", doi="10.2196/29806", url="https://medinform.jmir.org/2022/2/e29806", url="http://www.ncbi.nlm.nih.gov/pubmed/35175201" } @Article{info:doi/10.2196/33651, author="Fyhr, AnnSofie and Persson, Johanna and Ek, {\AA}sa", title="Usage and Usability of a National e-Library for Chemotherapy Regimens: Mixed Methods Study", journal="JMIR Hum Factors", year="2022", month="Feb", day="17", volume="9", number="1", pages="e33651", keywords="chemotherapy regimens", keywords="user evaluation", keywords="standardization", keywords="patient safety", keywords="chemotherapy", keywords="safety", keywords="usability", keywords="e-library", keywords="medication errors", abstract="Background: Accurate information about chemotherapy drugs and regimens is needed to reduce chemotherapy errors. A national e-library, as a common knowledge source with standardized chemotherapy nomenclature and content, was developed. Because the information in the library is both complex and extensive, it is essential that users can use the resource as intended. Objective: The aim of this study was to evaluate the usage and usability of an extensive e-library for chemotherapy regimens developed to reduce medication errors, support the health care staff in their work, and increase patient safety. Methods: To obtain a comprehensive evaluation, a mixed methods study was performed for a broad view of the usage, including a compilation of subjective views of the users (web survey, spontaneous user feedback, and qualitative interviews), analysis of statistics from the website, and an expert evaluation of the usability of the webpage. Results: Statistics from the website show an average of just over 2500 visits and 870 unique visitors per month. Most visits took place Mondays to Fridays, but there were 5-10 visits per day on weekends. The web survey, with 292 answers, shows that the visitors were mainly physicians and nurses. Almost 80\% (224/292) of respondents searched for regimens, and 90\% (264/292) found what they were looking for and were satisfied with their visit. The expert evaluation shows that the e-library follows many existing design principles, while also yielding some useful improvement suggestions. A total of 86 emails with user feedback were received in 2020, most of which were from nurses. Most of these (67/86, 78\%) contained a question, and the rest reported errors, mainly in specific regimens. The interviews reveal that most hospitals use a computerized physician order entry system and use the e-library in various ways: importing XML files, transferring information, or using it as a reference. One hospital without such a system uses the administration schedules from the library. Conclusions: The user evaluation indicates that the e-library is used in the intended manner and that the users can interact without problems. Users have different needs depending on their profession and their workplace, and these can be supported. The combination of methods applied ensures that the design and content comply with the users' needs and serves as feedback for continuous design and learning. 
With a broad national usage, the e-library can become a source for organizational and national learning and a source for continuous improvement of cancer care in Sweden. ", doi="10.2196/33651", url="https://humanfactors.jmir.org/2022/1/e33651", url="http://www.ncbi.nlm.nih.gov/pubmed/35175199" } @Article{info:doi/10.2196/23355, author="Wang, Le and Goh, Huat Kim and Yeow, Adrian and Poh, Hermione and Li, Ke and Yeow, Lin Joannas Jie and Tan, Gamaliel and Soh, Christina", title="Habit and Automaticity in Medical Alert Override: Cohort Study", journal="J Med Internet Res", year="2022", month="Feb", day="16", volume="24", number="2", pages="e23355", keywords="alert systems", keywords="habits", keywords="electronic medical record", keywords="health personnel alert fatigue", abstract="Background: Prior literature suggests that alert dismissal could be linked to physicians' habits and automaticity. The evidence for this perspective has been mainly observational data. This study uses log data from an electronic medical records system to empirically validate this perspective. Objective: We seek to quantify the association between habit and alert dismissal in physicians. Methods: We conducted a retrospective analysis using the log data comprising 66,049 alerts generated from hospitalized patients in a hospital from March 2017 to December 2018. We analyzed 1152 physicians exposed to a specific clinical support alert triggered in a hospital's electronic medical record system to estimate the extent to which the physicians' habit strength, which had been developed from habitual learning, impacted their propensity toward alert dismissal. We further examined the association between a physician's habit strength and their subsequent incidences of alert dismissal. Additionally, we recorded the time taken by the physician to respond to the alert and collected data on other clinical and environmental factors related to the alerts as covariates for the analysis. Results: We found that a physician's prior dismissal of alerts leads to their increased habit strength to dismiss alerts. Furthermore, a physician's habit strength to dismiss alerts was found to be positively associated with incidences of subsequent alert dismissals after their initial alert dismissal. Alert dismissal due to habitual learning was also found to be pervasive across all physician ranks, from junior interns to senior attending specialists. Further, the dismissal of alerts had been observed to typically occur after a very short processing time. Our study found that 72.5\% of alerts were dismissed in under 3 seconds after the alert appeared, and 13.2\% of all alerts were dismissed in under 1 second after the alert appeared. We found empirical support that habitual dismissal is one of the key factors associated with alert dismissal. We also found that habitual dismissal of alerts is self-reinforcing, which suggests significant challenges in disrupting or changing alert dismissal habits once they are formed. Conclusions: Habitual tendencies are associated with the dismissal of alerts. This relationship is pervasive across all levels of physician rank and experience, and the effect is self-reinforcing. ", doi="10.2196/23355", url="https://www.jmir.org/2022/2/e23355", url="http://www.ncbi.nlm.nih.gov/pubmed/35171102" } @Article{info:doi/10.2196/34560, author="Bove, Riley and Schleimer, Erica and Sukhanov, Paul and Gilson, Michael and Law, M. Sindy and Barnecut, Andrew and Miller, L. Bruce and Hauser, L. 
Stephen and Sanders, J. Stephan and Rankin, P. Katherine", title="Building a Precision Medicine Delivery Platform for Clinics: The University of California, San Francisco, BRIDGE Experience", journal="J Med Internet Res", year="2022", month="Feb", day="15", volume="24", number="2", pages="e34560", keywords="precision medicine", keywords="clinical implementation", keywords="in silico trials", keywords="clinical dashboard", keywords="precision", keywords="implementation", keywords="dashboard", keywords="design", keywords="experience", keywords="analytic", keywords="tool", keywords="analysis", keywords="decision-making", keywords="real time", keywords="platform", keywords="human-centered design", doi="10.2196/34560", url="https://www.jmir.org/2022/2/e34560", url="http://www.ncbi.nlm.nih.gov/pubmed/35166689" } @Article{info:doi/10.2196/29541, author="Garcia, P. Angely and De La Vega, F. Shelley and Mercado, P. Susan", title="Health Information Systems for Older Persons in Select Government Tertiary Hospitals and Health Centers in the Philippines: Cross-sectional Study", journal="J Med Internet Res", year="2022", month="Feb", day="14", volume="24", number="2", pages="e29541", keywords="health information systems", keywords="the Philippines", keywords="aged", keywords="hospitals", keywords="community health centers", keywords="database", keywords="geriatric assessment", keywords="elderly", keywords="digital health", keywords="medical records", keywords="health policy", abstract="Background: The rapid aging of the world's population requires systems that support health facilities' provision of integrated care at multiple levels of the health care system. The use of health information systems (HISs) at the point of care has shown positive effects on clinical processes and patient health in several settings of care. Objective: We sought to describe HISs for older persons (OPs) in select government tertiary hospitals and health centers in the Philippines. Specifically, we aimed to review the existing policies and guidelines related to HISs for OPs in the country, determine the proportion of select government hospitals and health centers with existing health information specific for OPs, and describe the challenges related to HISs in select health facilities. Methods: We utilized the data derived from the findings of the Focused Interventions for Frail Older Adults Research and Development Project (FITforFrail), a cross-sectional and ethics committee--approved study. A facility-based listing of services and human resources specific to geriatric patients was conducted in purposively sampled 27 tertiary government hospitals identified as geriatric centers and 16 health centers across all regions in the Philippines. We also reviewed the existing policies and guidelines related to HISs for OPs in the country. Results: Based on the existing guidelines, multiple agencies were involved in the provision of services for OPs, with several records containing health information of OPs. However, there is no existing HIS specific for OPs in the country. Only 14 (52\%) of the 27 hospitals and 4 (25\%) of the 16 health centers conduct comprehensive geriatric assessment (CGA). All tertiary hospitals and health centers are able to maintain medical records of their patients, and almost all (26/27, 96\%) hospitals and all (16/16, 100\%) health centers have data on top causes of morbidity and mortality. Meanwhile, the presence of specific disease registries varied per hospitals and health centers. 
Challenges to HISs include the inability to update databases due to inadequately trained personnel, use of an offline facility--based HIS, an unstable internet connection, and technical issues and nonuniform reporting of categories for age group classification. Conclusions: Current HISs for OPs are characterized by fragmentation, multiple sources, and inaccessibility. Barriers to achieving appropriate HISs for OPs include the inability to update HISs in hospitals and health centers and a lack of standardization by age group and disease classification. Thus, we recommend a 1-person, 1-record electronic medical record system for OPs and the disaggregation and analysis across demographic and socioeconomic parameters to inform policies and programs that address the complex needs of OPs. CGA as a required routine procedure for all OPs and its integration with the existing HISs in the country are also recommended. ", doi="10.2196/29541", url="https://www.jmir.org/2022/2/e29541", url="http://www.ncbi.nlm.nih.gov/pubmed/35156927" } @Article{info:doi/10.2196/27936, author="Matsui, Hiroki and Yamana, Hayato and Fushimi, Kiyohide and Yasunaga, Hideo", title="Development of Deep Learning Models for Predicting In-Hospital Mortality Using an Administrative Claims Database: Retrospective Cohort Study", journal="JMIR Med Inform", year="2022", month="Feb", day="11", volume="10", number="2", pages="e27936", keywords="prognostic model", keywords="deep learning", keywords="real-world data", keywords="acute care", keywords="claims data", keywords="myocardial infarction", keywords="heart failure", keywords="stroke", keywords="pneumonia", abstract="Background: Administrative claims databases have been used widely in studies because they have large sample sizes and are easily available. However, studies using administrative databases lack information on disease severity, so a risk adjustment method needs to be developed. Objective: We aimed to develop and validate deep learning--based prediction models for in-hospital mortality of acute care patients. Methods: The main model was developed using only administrative claims data (age, sex, diagnoses, and procedures on the day of admission). We also constructed disease-specific models for acute myocardial infarction, heart failure, stroke, and pneumonia using common severity indices for these diseases. Using the Japanese Diagnosis Procedure Combination data from July 2010 to March 2017, we identified 46,665,933 inpatients and divided them into derivation and validation cohorts in a ratio of 95:5. The main model was developed using a 9-layer deep neural network with 4 hidden dense layers that had 1000 nodes and were fully connected to adjacent layers. We evaluated model discrimination ability by an area under the receiver operating characteristic curve (AUC) and calibration ability by calibration plot. Results: Among the eligible patients, 2,005,035 (4.3\%) died. Discrimination and calibration of the models were satisfactory. The AUC of the main model in the validation cohort was 0.954 (95\% CI 0.954-0.955). The main model had higher discrimination ability than the disease-specific models. Conclusions: Our deep learning--based model using diagnoses and procedures produced valid predictions of in-hospital mortality. ", doi="10.2196/27936", url="https://medinform.jmir.org/2022/2/e27936", url="http://www.ncbi.nlm.nih.gov/pubmed/34997958" } @Article{info:doi/10.2196/34932, author="Shara, Nawar and Anderson, M. 
Kelley and Falah, Noor and Ahmad, F. Maryam and Tavazoei, Darya and Hughes, M. Justin and Talmadge, Bethany and Crovatt, Samantha and Dempers, Ramon", title="Early Identification of Maternal Cardiovascular Risk Through Sourcing and Preparing Electronic Health Record Data: Machine Learning Study", journal="JMIR Med Inform", year="2022", month="Feb", day="10", volume="10", number="2", pages="e34932", keywords="electronic health record", keywords="maternal health", keywords="machine learning", keywords="maternal morbidity and mortality", keywords="cardiovascular risk", keywords="data transformation", keywords="extract", keywords="transform", keywords="load", keywords="artificial intelligence", keywords="electronic medical record", abstract="Background: Health care data are fragmenting as patients seek care from diverse sources. Consequently, patient care is negatively impacted by disparate health records. Machine learning (ML) offers a disruptive force in its ability to inform and improve patient care and outcomes. However, the differences that exist in each individual's health records, combined with the lack of health data standards, in addition to systemic issues that render the data unreliable and that fail to create a single view of each patient, create challenges for ML. Although these problems exist throughout health care, they are especially prevalent within maternal health and exacerbate the maternal morbidity and mortality crisis in the United States. Objective: This study aims to demonstrate that patient records extracted from the electronic health records (EHRs) of a large tertiary health care system can be made actionable for the goal of effectively using ML to identify maternal cardiovascular risk before evidence of diagnosis or intervention within the patient's record. Maternal patient records were extracted from the EHRs of a large tertiary health care system and made into patient-specific, complete data sets through a systematic method. Methods: We outline the effort that was required to define the specifications of the computational systems, the data set, and access to relevant systems, while ensuring that data security, privacy laws, and policies were met. Data acquisition included the concatenation, anonymization, and normalization of health data across multiple EHRs in preparation for their use by a proprietary risk stratification algorithm designed to establish patient-specific baselines to identify and establish cardiovascular risk based on deviations from the patient's baselines to inform early interventions. Results: Patient records can be made actionable for the goal of effectively using ML, specifically to identify cardiovascular risk in pregnant patients. Conclusions: Upon acquiring data, including their concatenation, anonymization, and normalization across multiple EHRs, the use of an ML-based tool can provide early identification of cardiovascular risk in pregnant patients. 
", doi="10.2196/34932", url="https://medinform.jmir.org/2022/2/e34932", url="http://www.ncbi.nlm.nih.gov/pubmed/35142637" } @Article{info:doi/10.2196/30512, author="P{\'e}rez-Mart{\'i}, Montserrat and Casad{\'o}-Mar{\'i}n, Lina and Guill{\'e}n-Villar, Abraham", title="Electronic Records With Tablets at the Point of Care in an Internal Medicine Unit: Before-After Time Motion Study", journal="JMIR Hum Factors", year="2022", month="Feb", day="10", volume="9", number="1", pages="e30512", keywords="electronic health records", keywords="nursing", keywords="computer handheld", keywords="equipment and supplies (devices tablets mobile phones, devices and technologies)", keywords="workflow", abstract="Background: There are many benefits of nursing professionals being able to consult electronic health records (EHRs) at the point of care. It promotes quality and patient security, communication, continuity of care, and time dedicated to records. Objective: The aim of this study was to evaluate whether making EHRs available at the point of care with tablets reduces nurses' time spent on records compared with the current system. The analysis included sociodemographic and qualitative variables, time spent per patient, and work shift. This time difference can be used for direct patient care. Methods: A before-after time motion study was carried out in the internal medicine unit. There was a total of 130 observations of 2 hours to 3 hours in duration of complete patient records that were carried out at the beginning of the nurses' work shifts. We calculated the time dedicated to measuring vital signs, patient evaluation, and EHR recording. The main variable was time spent per patient. Results: The average time spent per patient (total time/patients admitted) was lower with the tablet group (mean 4.22, SD 0.14 minutes) than with the control group (mean 4.66, SD 0.12 minutes); there were statistically significant differences (W=3.20, P=.001) and a low effect (d=.44) between groups. The tablet group saved an average of 0.44 (SD 0.13) minutes per patient. Similar results were obtained for the afternoon shift, which saved an average of 0.60 (SD 0.15) minutes per patient (t34=3.82, P=.01) and high effect (d=.77). However, although there was a mean difference of 0.26 (SD 0.22) minutes per patient for the night shift, this was not statistically significant (t29=1.16, P=.25). The ``nonparticipating'' average age was higher (49.57, SD 2.92 years) compared with the ``afternoon shift participants'' and ``night shift participants'' (P=.007). ``Nonparticipants'' of the night shift had a worse perception of the project. Conclusions: This investigation determined that, with EHRs at the point of care, the time spent for registration by the nursing staff decreases, because of reduced movements and avoiding data transcription. It eliminates unnecessary work that does not add value, and therefore, care is improved. So, we think EHRs at the point of care should be the future or natural method for nursing to undertake. However, variables that could have a negative effect include age, night shift, and nurses' perceptions. Therefore, it is proposed that training in the different work platforms and the participation of nurses are fundamental axes that any institution should consider before their implementation. 
", doi="10.2196/30512", url="https://humanfactors.jmir.org/2022/1/e30512", url="http://www.ncbi.nlm.nih.gov/pubmed/35142624" } @Article{info:doi/10.2196/32714, author="Khan, Ullah Waqas and Shachak, Aviv and Seto, Emily", title="Understanding Decision-Making in the Adoption of Digital Health Technology: The Role of Behavioral Economics' Prospect Theory", journal="J Med Internet Res", year="2022", month="Feb", day="7", volume="24", number="2", pages="e32714", keywords="decision-making", keywords="digital health technology adoption", keywords="prospect theory", doi="10.2196/32714", url="https://www.jmir.org/2022/2/e32714", url="http://www.ncbi.nlm.nih.gov/pubmed/35129459" } @Article{info:doi/10.2196/28199, author="Scheder-Bieschin, Justus and Bl{\"u}mke, Bibiana and de Buijzer, Erwin and Cotte, Fabienne and Echterdiek, Fabian and Nacsa, J{\'u}lia and Ondresik, Marta and Ott, Matthias and Paul, Gregor and Schilling, Tobias and Schmitt, Anne and Wicks, Paul and Gilbert, Stephen", title="Improving Emergency Department Patient-Physician Conversation Through an Artificial Intelligence Symptom-Taking Tool: Mixed Methods Pilot Observational Study", journal="JMIR Form Res", year="2022", month="Feb", day="7", volume="6", number="2", pages="e28199", keywords="symptom assessment application", keywords="anamnesis", keywords="health care system", keywords="patient history taking", keywords="diagnosis", keywords="emergency department", abstract="Background: Establishing rapport and empathy between patients and their health care provider is important but challenging in the context of a busy and crowded emergency department (ED). Objective: We explore the hypotheses that rapport building, documentation, and time efficiency might be improved in the ED by providing patients a digital tool that uses Bayesian reasoning--based techniques to gather relevant symptoms and history for handover to clinicians. Methods: A 2-phase pilot evaluation was carried out in the ED of a German tertiary referral and major trauma hospital that treats an average of 120 patients daily. Phase 1 observations guided iterative improvement of the digital tool, which was then further evaluated in phase 2. All patients who were willing and able to provide consent were invited to participate, excluding those with severe injury or illness requiring immediate treatment, with traumatic injury, incapable of completing a health assessment, and aged <18 years. Over an 18-day period with 1699 patients presenting to the ED, 815 (47.96\%) were eligible based on triage level. With available recruitment staff, 135 were approached, of whom 81 (60\%) were included in the study. In a mixed methods evaluation, patients entered information into the tool, accessed by clinicians through a dashboard. All users completed evaluation Likert-scale questionnaires rating the tool's performance. The feasibility of a larger trial was evaluated through rates of recruitment and questionnaire completion. Results: Respondents strongly endorsed the tool for facilitating conversation (61/81, 75\% of patients, 57/78, 73\% of physician ratings, and 10/10, 100\% of nurse ratings). Most nurses judged the tool as potentially time saving, whereas most physicians only agreed for a subset of medical specialties (eg, surgery). Patients reported high usability and understood the tool's questions. The tool was recommended by most patients (63/81, 78\%), in 53\% (41/77) of physician ratings, and in 76\% (61/80) of nurse ratings. 
Questionnaire completion rates were 100\% (81/81) by patients and 96\% (78/81 enrolled patients) by physicians. Conclusions: This pilot confirmed that a larger study in the setting would be feasible. The tool has clear potential to improve patient--health care provider interaction and could also contribute to ED efficiency savings. Future research and development will extend the range of patients for whom the history-taking tool has clinical utility. Trial Registration: German Clinical Trials Register DRKS00024115; https://drks.de/drks\_web/navigate.do?navigationId=trial.HTML\&TRIAL\_ID=DRKS00024115 ", doi="10.2196/28199", url="https://formative.jmir.org/2022/2/e28199", url="http://www.ncbi.nlm.nih.gov/pubmed/35129452" } @Article{info:doi/10.2196/30351, author="Durojaiye, Ashimiyu and Fackler, James and McGeorge, Nicolette and Webster, Kristen and Kharrazi, Hadi and Gurses, Ayse", title="Examining Diurnal Differences in Multidisciplinary Care Teams at a Pediatric Trauma Center Using Electronic Health Record Data: Social Network Analysis", journal="J Med Internet Res", year="2022", month="Feb", day="4", volume="24", number="2", pages="e30351", keywords="pediatric trauma", keywords="multidisciplinary health team", keywords="multi-team systems", keywords="social network analysis", keywords="electronic health record", keywords="process mining", keywords="fluid teams", abstract="Background: The care of pediatric trauma patients is delivered by multidisciplinary care teams with high fluidity that may vary in composition and organization depending on the time of day. Objective: This study aims to identify and describe diurnal variations in multidisciplinary care teams taking care of pediatric trauma patients using social network analysis on electronic health record (EHR) data. Methods: Metadata of clinical activities were extracted from the EHR and processed into an event log, which was divided into 6 different event logs based on shift (day or night) and location (emergency department, pediatric intensive care unit, and floor). Social networks were constructed from each event log by creating an edge among the functional roles captured within a similar time interval during a shift. Overlapping communities were identified from the social networks. Day and night network structures for each care location were compared and validated via comparison with secondary analysis of qualitatively derived care team data, obtained through semistructured interviews; and member-checking interviews with clinicians. Results: There were 413 encounters in the 1-year study period, with 65.9\% (272/413) and 34.1\% (141/413) beginning during day and night shifts, respectively. A single community was identified at all locations during the day and in the pediatric intensive care unit at night, whereas multiple communities corresponding to individual specialty services were identified in the emergency department and on the floor at night. Members of the trauma service belonged to all communities, suggesting that they were responsible for care coordination. Health care professionals found the networks to be largely accurate representations of the composition of the care teams and the interactions among them. Conclusions: Social network analysis was successfully used on EHR data to identify and describe diurnal differences in the composition and organization of multidisciplinary care teams at a pediatric trauma center. 
", doi="10.2196/30351", url="https://www.jmir.org/2022/2/e30351", url="http://www.ncbi.nlm.nih.gov/pubmed/35119372" } @Article{info:doi/10.2196/27887, author="Khanbhai, Mustafa and Symons, Joshua and Flott, Kelsey and Harrison-White, Stephanie and Spofforth, Jamie and Klaber, Robert and Manton, David and Darzi, Ara and Mayer, Erik", title="Enriching the Value of Patient Experience Feedback: Web-Based Dashboard Development Using Co-design and Heuristic Evaluation", journal="JMIR Hum Factors", year="2022", month="Feb", day="3", volume="9", number="1", pages="e27887", keywords="patient experience", keywords="friends and family test", keywords="quality dashboard", keywords="co-design", keywords="heuristic evaluation", keywords="usability", abstract="Background: There is an abundance of patient experience data held within health care organizations, but stakeholders and staff are often unable to use the output in a meaningful and timely way to improve care delivery. Dashboards, which use visualized data to summarize key patient experience feedback, have the potential to address these issues. Objective: The aim of this study is to develop a patient experience dashboard with an emphasis on Friends and Family Test (FFT) reporting, as per the national policy drive. Methods: A 2-stage approach was used---participatory co-design involving 20 co-designers to develop a dashboard prototype, followed by iterative dashboard testing. Language analysis was performed on free-text patient experience data from the FFT, and the themes and sentiments generated were used to populate the dashboard with associated FFT metrics. Heuristic evaluation and usability testing were conducted to refine the dashboard and assess user satisfaction using the system usability score. Results: The qualitative analysis from the co-design process informed the development of the dashboard prototype with key dashboard requirements and a significant preference for bubble chart display. The heuristic evaluation revealed that most cumulative scores had no usability problems (18/20, 90\%), had cosmetic problems only (7/20, 35\%), or had minor usability problems (5/20, 25\%). The mean System Usability Scale score was 89.7 (SD 7.9), suggesting an excellent rating. Conclusions: The growing capacity to collect and process patient experience data suggests that data visualization will be increasingly important in turning feedback into improvements to care. Through heuristic usability, we demonstrated that very large FFT data can be presented in a thematically driven, simple visual display without the loss of the nuances and still allow for the exploration of the original free-text comments. This study establishes guidance for optimizing the design of patient experience dashboards that health care providers find meaningful, which in turn drives patient-centered care. ", doi="10.2196/27887", url="https://humanfactors.jmir.org/2022/1/e27887", url="http://www.ncbi.nlm.nih.gov/pubmed/35113022" } @Article{info:doi/10.2196/30483, author="Pi{\v c}ulin, Matej and Smole, Tim and ?unkovi{\v c}, Bojan and Kokalj, Enja and Robnik-{\vS}ikonja, Marko and Kukar, Matja? and Fotiadis, I. Dimitrios and Pezoulas, C. Vasileios and Tachos, S. Nikolaos and Barlocco, Fausto and Mazzarotto, Francesco and Popovi{\'c}, Dejana and Maier, S. Lars and Velicki, Lazar and Olivotto, Iacopo and MacGowan, A. Guy and Jakovljevi{\'c}, G. 
Djordje and Filipovi{\'c}, Nenad and Bosni{\'c}, Zoran", title="Disease Progression of Hypertrophic Cardiomyopathy: Modeling Using Machine Learning", journal="JMIR Med Inform", year="2022", month="Feb", day="2", volume="10", number="2", pages="e30483", keywords="hypertrophic cardiomyopathy", keywords="disease progression", keywords="machine learning", keywords="artificial intelligence", keywords="AI", keywords="ML", keywords="cardiomyopathy", keywords="cardiovascular disease", keywords="sudden cardiac death", keywords="SCD", keywords="prediction", keywords="prediction model", keywords="validation", abstract="Background: Cardiovascular disorders in general are responsible for 30\% of deaths worldwide. Among them, hypertrophic cardiomyopathy (HCM) is a genetic cardiac disease that is present in about 1 of 500 young adults and can cause sudden cardiac death (SCD). Objective: Although the current state-of-the-art methods model the risk of SCD for patients, to the best of our knowledge, no methods are available for modeling the patient's clinical status up to 10 years ahead. In this paper, we propose a novel machine learning (ML)-based tool for predicting disease progression for patients diagnosed with HCM in terms of adverse remodeling of the heart during a 10-year period. Methods: The method consisted of 6 predictive regression models that independently predict future values of 6 clinical characteristics: left atrial size, left atrial volume, left ventricular ejection fraction, New York Heart Association functional classification, left ventricular internal diastolic diameter, and left ventricular internal systolic diameter. We supplemented each prediction with the explanation that is generated using the Shapley additive explanation method. Results: The final experiments showed that predictive error is lower on 5 of the 6 constructed models in comparison to experts (on average, by 0.34) or a consortium of experts (on average, by 0.22). The experiments revealed that semisupervised learning and the artificial data from virtual patients help improve predictive accuracies. The best-performing random forest model improved R2 from 0.3 to 0.6. Conclusions: By engaging medical experts to provide interpretation and validation of the results, we determined the models' favorable performance compared to the performance of experts for 5 of 6 targets. ", doi="10.2196/30483", url="https://medinform.jmir.org/2022/2/e30483", url="http://www.ncbi.nlm.nih.gov/pubmed/35107432" } @Article{info:doi/10.2196/29978, author="Schilling, Maximilian and Rickmann, Lennart and Hutschenreuter, Gabriele and Spreckelsen, Cord", title="Reduction of Platelet Outdating and Shortage by Forecasting Demand With Statistical Learning and Deep Neural Networks: Modeling Study", journal="JMIR Med Inform", year="2022", month="Feb", day="1", volume="10", number="2", pages="e29978", keywords="platelets", keywords="demand forecasting", keywords="time series forecasting", keywords="blood inventory management", keywords="statistical learning", keywords="deep learning", keywords="LASSO", keywords="LSTM", abstract="Background: Platelets are a valuable and perishable blood product. Managing platelet inventory is a demanding task because of short shelf lives and high variation in daily platelet use patterns. Predicting platelet demand is a promising step toward avoiding obsolescence and shortages and ensuring optimal care. 
Objective: The aim of this study is to forecast platelet demand for a given hospital using both a statistical model and a deep neural network. In addition, we aim to calculate the possible reduction in waste and shortage of platelets using said predictions in a retrospective simulation of the platelet inventory. Methods: Predictions of daily platelet demand were made by a least absolute shrinkage and selection operator (LASSO) model and a recurrent neural network (RNN) with long short-term memory (LSTM). Both models used the same set of 81 clinical features. Predictions were passed to a simulation of the blood inventory to calculate the possible reduction in waste and shortage as compared with historical data. Results: From January 1, 2008, to December 31, 2018, the waste and shortage rates for platelets were 10.1\% and 6.5\%, respectively. In simulations of platelet inventory, waste could be lowered to 4.9\% with the LASSO and 5\% with the RNN, whereas shortages were 2.1\% and 1.7\% with the LASSO and RNN, respectively. Daily predictions of platelet demand for the next 2 days had mean absolute percent errors of 25.5\% (95\% CI 24.6\%-26.6\%) with the LASSO and 26.3\% (95\% CI 25.3\%-27.4\%) with the LSTM (P=.01). Predictions for the next 4 days had mean absolute percent errors of 18.1\% (95\% CI 17.6\%-18.6\%) with the LASSO and 19.2\% (95\% CI 18.6\%-19.8\%) with the LSTM (P<.001). Conclusions: Both models allow for predictions of platelet demand with similar and sufficient accuracy to significantly reduce waste and shortage in a retrospective simulation study. The possible improvements in platelet inventory management are roughly equivalent to US \$250,000 per year. ", doi="10.2196/29978", url="https://medinform.jmir.org/2022/2/e29978", url="http://www.ncbi.nlm.nih.gov/pubmed/35103612" } @Article{info:doi/10.2196/29458, author="Hui, Matthew Ka Ho and Lam, Simon Hugh and Chow, Twinny Cheuk Hin and Li, Janice Yuen Shun and Leung, Tom Pok Him and Chan, Brian Long Yin and Lee, Ping Chui and Ewig, Ying Celeste Lom and Cheung, Ting Yin and Lam, Teddy Tai Ning", title="Using Electronic Health Records for Personalized Dosing of Intravenous Vancomycin in Critically Ill Neonates: Model and Web-Based Interface Development Study", journal="JMIR Med Inform", year="2022", month="Jan", day="31", volume="10", number="1", pages="e29458", keywords="digital health", keywords="web-based user interface", keywords="personalized medicine", keywords="dose individualization", keywords="therapeutic drug monitoring", keywords="Bayesian estimation", keywords="antibiotics", keywords="vancomycin", keywords="infectious disease", keywords="neonate", abstract="Background: Intravenous (IV) vancomycin is used in the treatment of severe infection in neonates. However, its efficacy is compromised by elevated risks of acute kidney injury. The risk is even higher among neonates admitted to the neonatal intensive care unit (NICU), in whom the pharmacokinetics of vancomycin vary widely. Therapeutic drug monitoring is an integral part of vancomycin treatment to balance efficacy against toxicity. It involves individual dose adjustments based on the observed serum vancomycin concentration (VCs). However, the existing trough-based approach shows poor evidence for clinical benefits. The updated clinical practice guideline recommends population pharmacokinetic (popPK) model--based approaches, targeting area under curve, preferably through the Bayesian approach. 
Since Bayesian methods cannot be performed manually and require specialized computer programs, there is a need to provide clinicians with a user-friendly interface to facilitate accurate personalized dosing recommendations for vancomycin in critically ill neonates. Objective: We used medical data from electronic health records (EHRs) to develop a popPK model and subsequently build a web-based interface to perform model-based individual dose optimization of IV vancomycin for NICU patients in local medical institutions. Methods: Medical data of subjects prescribed IV vancomycin in the NICUs of Prince of Wales Hospital and Queen Elizabeth Hospital in Hong Kong were extracted from EHRs, namely the Clinical Information System, In-Patient Medication Order Entry, and electronic Patient Record. Patient demographics, such as body weight and postmenstrual age (PMA), serum creatinine (SCr), vancomycin administration records, and VCs were collected. The popPK model employed a 2-compartment infusion model. Various covariate models were tested against body weight, PMA, and SCr, and were evaluated for the best goodness of fit. A previously published web-based dosing interface was adapted to develop the interface in this study. Results: The final data set included EHR data extracted from 207 subjects, with a total of 689 VCs measurements. The final model chosen explained 82\% of the variability in vancomycin clearance. All parameter estimates were within the bootstrapping CIs. Predictive plots, residual plots, and visual predictive checks demonstrated good model predictability. Model approximations showed that the model-based Bayesian approach consistently promoted a probability of target attainment (PTA) above 75\% for all subjects, while only half of the subjects could achieve a PTA over 50\% with the trough-based approach. The dosing interface was developed with the capability to optimize individual doses with the model-based empirical or Bayesian approach. Conclusions: Using EHRs, a satisfactory popPK model was verified and adopted to develop a web-based individual dose optimization interface. The interface is expected to improve treatment outcomes of IV vancomycin for severe infections among critically ill neonates. This study provides the foundation for a cohort study to demonstrate the utility of the new approach compared with previous dosing methods. ", doi="10.2196/29458", url="https://medinform.jmir.org/2022/1/e29458", url="http://www.ncbi.nlm.nih.gov/pubmed/35099393" } @Article{info:doi/10.2196/29289, author="Ritchie, B. Jordon and Frey, J. Lewis and Lamy, Jean-Baptiste and Bellcross, Cecelia and Morrison, Heath and Schiffman, D. Joshua and Welch, M. Brandon", title="Automated Clinical Practice Guideline Recommendations for Hereditary Cancer Risk Using Chatbots and Ontologies: System Description", journal="JMIR Cancer", year="2022", month="Jan", day="31", volume="8", number="1", pages="e29289", keywords="service-oriented architecture", keywords="restful API", keywords="hereditary cancer", keywords="risk assessment", keywords="clinical practice guidelines", keywords="consumer health informatics", abstract="Background: Identifying patients at risk of hereditary cancer based on their family health history is a highly nuanced task. Frequently, patients at risk are not referred for genetic counseling as providers lack the time and training to collect and assess their family health history. 
Consequently, patients at risk do not receive genetic counseling and testing that they need to determine the preventive steps they should take to mitigate their risk. Objective: This study aims to automate clinical practice guideline recommendations for hereditary cancer risk based on patient family health history. Methods: We combined chatbots, web application programming interfaces, clinical practice guidelines, and ontologies into a web service--oriented system that can automate family health history collection and assessment. We used Owlready2 and Prot{\'e}g{\'e} to develop a lightweight, patient-centric clinical practice guideline domain ontology using hereditary cancer criteria from the American College of Medical Genetics and Genomics and the National Comprehensive Cancer Network. Results: The domain ontology has 758 classes, 20 object properties, 23 datatype properties, and 42 individuals and encompasses 44 cancers, 144 genes, and 113 clinical practice guideline criteria. So far, it has been used to assess >5000 family health history cases. We created 192 test cases to ensure concordance with clinical practice guidelines. The average test case completes in 4.5 (SD 1.9) seconds, the longest in 19.6 seconds, and the shortest in 2.9 seconds. Conclusions: Web service--enabled, chatbot-oriented family health history collection and ontology-driven clinical practice guideline criteria risk assessment is a simple and effective method for automating hereditary cancer risk screening. ", doi="10.2196/29289", url="https://cancer.jmir.org/2022/1/e29289", url="http://www.ncbi.nlm.nih.gov/pubmed/35099392" } @Article{info:doi/10.2196/29015, author="Moon, Sungrim and Carlson, A. Luke and Moser, D. Ethan and Agnikula Kshatriya, Singh Bhavani and Smith, Y. Carin and Rocca, A. Walter and Gazzuola Rocca, Liliana and Bielinski, J. Suzette and Liu, Hongfang and Larson, B. Nicholas", title="Identifying Information Gaps in Electronic Health Records by Using Natural Language Processing: Gynecologic Surgery History Identification", journal="J Med Internet Res", year="2022", month="Jan", day="28", volume="24", number="1", pages="e29015", keywords="information gap", keywords="health information interoperability", keywords="natural language processing", keywords="electronic health records", keywords="gynecologic surgery", keywords="surgery", keywords="medical informatics", keywords="digital health", keywords="eHealth", keywords="gynecology", abstract="Background: Electronic health records (EHRs) are a rich source of longitudinal patient data. However, missing information due to clinical care that predated the implementation of EHR system(s) or care that occurred at different medical institutions impedes complete ascertainment of a patient's medical history. Objective: This study aimed to investigate information discrepancies and to quantify information gaps by comparing the gynecological surgical history extracted from an EHR of a single institution by using natural language processing (NLP) techniques with the manually curated surgical history information through chart review of records from multiple independent regional health care institutions. Methods: To facilitate high-throughput evaluation, we developed a rule-based NLP algorithm to detect gynecological surgery history from the unstructured narrative of the Mayo Clinic EHR.
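Editor's note: as an aside, the following minimal Python sketch illustrates the general shape of a rule-based surgery-mention detector of the kind described above; the regular expressions, the crude negation handling, and the surgery labels are hypothetical examples and do not reproduce the Mayo Clinic algorithm.

# Hedged sketch: rule-based detection of gynecological surgery mentions.
import re

SURGERY_PATTERNS = {
    "hysterectomy": re.compile(r"\b(total|partial|supracervical)?\s*hysterectomy\b", re.I),
    "oophorectomy": re.compile(r"\b(bilateral|unilateral)?\s*(salpingo-)?oophorectomy\b", re.I),
}
NEGATION = re.compile(r"\b(no|denies|without|never had)\b[^.]{0,40}$", re.I)

def detect_surgeries(note):
    """Return surgery types mentioned affirmatively in a free-text note."""
    found = set()
    for label, pattern in SURGERY_PATTERNS.items():
        for match in pattern.finditer(note):
            preceding = note[:match.start()]
            if not NEGATION.search(preceding):  # skip mentions preceded by a negation cue
                found.add(label)
    return found

print(detect_surgeries("Status post total hysterectomy in 2005. Denies oophorectomy."))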
These results were compared to a gold standard cohort of 3870 women with gynecological surgery status adjudicated using the Rochester Epidemiology Project medical records--linkage system. We quantified and characterized the information gaps observed that led to misclassification of the surgical status. Results: The NLP algorithm achieved precision of 0.85, recall of 0.82, and F1-score of 0.83 in the test set (n=265) relative to outcomes abstracted from the Mayo EHR. This performance attenuated when directly compared to the gold standard (precision 0.79, recall 0.76, and F1-score 0.76), with the majority of misclassifications being false negatives in nature. We then applied the algorithm to the remaining patients (n=3340) and identified 2 types of information gaps through error analysis. First, 6\% (199/3340) of women in this study had no recorded surgery information or partial information in the EHR. Second, 4.3\% (144/3340) of women had inconsistent or inaccurate information within the clinical narrative owing to misinterpreted information, erroneous ``copy and paste,'' or incorrect information provided by patients. Additionally, the NLP algorithm misclassified the surgery status of 3.6\% (121/3340) of women. Conclusions: Although NLP techniques were able to adequately recreate the gynecologic surgical status from the clinical narrative, missing or inaccurately reported and recorded information resulted in much of the misclassification observed. Therefore, alternative approaches to collect or curate surgical history are needed. ", doi="10.2196/29015", url="https://www.jmir.org/2022/1/e29015", url="http://www.ncbi.nlm.nih.gov/pubmed/35089141" } @Article{info:doi/10.2196/32215, author="Gama, F{\'a}bio and Tyskbo, Daniel and Nygren, Jens and Barlow, James and Reed, Julie and Svedberg, Petra", title="Implementation Frameworks for Artificial Intelligence Translation Into Health Care Practice: Scoping Review", journal="J Med Internet Res", year="2022", month="Jan", day="27", volume="24", number="1", pages="e32215", keywords="implementation framework", keywords="artificial intelligence", keywords="scoping review", abstract="Background: Significant efforts have been made to develop artificial intelligence (AI) solutions for health care improvement. Despite the enthusiasm, health care professionals still struggle to implement AI in their daily practice. Objective: This paper aims to identify the implementation frameworks used to understand the application of AI in health care practice. Methods: A scoping review was conducted using the Cochrane, Evidence Based Medicine Reviews, Embase, MEDLINE, and PsycINFO databases to identify publications that reported frameworks, models, and theories concerning AI implementation in health care. This review focused on studies published in English and investigating AI implementation in health care since 2000. A total of 2541 unique publications were retrieved from the databases and screened on titles and abstracts by 2 independent reviewers. Selected articles were thematically analyzed against the Nilsen taxonomy of implementation frameworks, and the Greenhalgh framework for the nonadoption, abandonment, scale-up, spread, and sustainability (NASSS) of health care technologies. Results: In total, 7 articles met all eligibility criteria for inclusion in the review, and 2 articles included formal frameworks that directly addressed AI implementation, whereas the other articles provided limited descriptions of elements influencing implementation. 
Collectively, the 7 articles identified elements that aligned with all the NASSS domains, but no single article comprehensively considered the factors known to influence technology implementation. New domains were identified, including dependency on data input and existing processes, shared decision-making, the role of human oversight, and ethics of population impact and inequality, suggesting that existing frameworks do not fully consider the unique needs of AI implementation. Conclusions: This literature review demonstrates that understanding how to implement AI in health care practice is still in its early stages of development. Our findings suggest that further research is needed to provide the knowledge necessary to develop implementation frameworks to guide the future implementation of AI in clinical practice and highlight the opportunity to draw on existing knowledge from the field of implementation science. ", doi="10.2196/32215", url="https://www.jmir.org/2022/1/e32215", url="http://www.ncbi.nlm.nih.gov/pubmed/35084349" } @Article{info:doi/10.2196/35225, author="Kawamura, Ren and Harada, Yukinori and Sugimoto, Shu and Nagase, Yuichiro and Katsukura, Shinichi and Shimizu, Taro", title="Incidence of Diagnostic Errors Among Unexpectedly Hospitalized Patients Using an Automated Medical History--Taking System With a Differential Diagnosis Generator: Retrospective Observational Study", journal="JMIR Med Inform", year="2022", month="Jan", day="27", volume="10", number="1", pages="e35225", keywords="artificial intelligence", keywords="automated medical history--taking", keywords="diagnostic errors", keywords="outpatient", keywords="Safer Dx", abstract="Background: Automated medical history--taking systems that generate differential diagnosis lists have been suggested to contribute to improved diagnostic accuracy. However, the effect of these systems on diagnostic errors in clinical practice remains unknown. Objective: This study aimed to assess the incidence of diagnostic errors in an outpatient department, where an artificial intelligence (AI)--driven automated medical history--taking system that generates differential diagnosis lists was implemented in clinical practice. Methods: We conducted a retrospective observational study using data from a community hospital in Japan. We included patients aged 20 years and older who used an AI-driven, automated medical history--taking system that generates differential diagnosis lists in the outpatient department of internal medicine for whom the index visit was between July 1, 2019, and June 30, 2020, followed by unplanned hospitalization within 14 days. The primary endpoint was the incidence of diagnostic errors, which were detected using the Revised Safer Dx Instrument by at least two independent reviewers. To evaluate the effect of differential diagnosis lists from the AI system on the incidence of diagnostic errors, we compared the incidence of these errors between a group where the AI system generated the final diagnosis in the differential diagnosis list and a group where the AI system did not generate the final diagnosis in the list; the Fisher exact test was used for comparison between these groups. For cases with confirmed diagnostic errors, further review was conducted to identify the contributing factors of these errors via discussion among three reviewers, using the Safer Dx Process Breakdown Supplement as a reference. Results: A total of 146 patients were analyzed. 
A final diagnosis was confirmed for 138 patients and was observed in the differential diagnosis list from the AI system for 69 patients. Diagnostic errors occurred in 16 out of 146 patients (11.0\%, 95\% CI 6.4\%-17.2\%). Although statistically insignificant, the incidence of diagnostic errors was lower in cases where the final diagnosis was included in the differential diagnosis list from the AI system than in cases where the final diagnosis was not included in the list (7.2\% vs 15.9\%, P=.18). Conclusions: The incidence of diagnostic errors among patients in the outpatient department of internal medicine who used an automated medical history--taking system that generates differential diagnosis lists seemed to be lower than the previously reported incidence of diagnostic errors. This result suggests that the implementation of an automated medical history--taking system that generates differential diagnosis lists could be beneficial for diagnostic safety in the outpatient department of internal medicine. ", doi="10.2196/35225", url="https://medinform.jmir.org/2022/1/e35225", url="http://www.ncbi.nlm.nih.gov/pubmed/35084347" } @Article{info:doi/10.2196/28934, author="Liu, Yun-Chung and Cheng, Hao-Yuan and Chang, Tu-Hsuan and Ho, Te-Wei and Liu, Ting-Chi and Yen, Ting-Yu and Chou, Chia-Ching and Chang, Luan-Yin and Lai, Feipei", title="Evaluation of the Need for Intensive Care in Children With Pneumonia: Machine Learning Approach", journal="JMIR Med Inform", year="2022", month="Jan", day="27", volume="10", number="1", pages="e28934", keywords="child pneumonia", keywords="intensive care", keywords="machine learning", keywords="decision making", keywords="clinical index", abstract="Background: Timely decision-making regarding intensive care unit (ICU) admission for children with pneumonia is crucial for a better prognosis. Despite attempts to establish a guideline or triage system for evaluating ICU care needs, no clinically applicable paradigm is available. Objective: The aim of this study was to develop machine learning (ML) algorithms to predict ICU care needs for pediatric pneumonia patients within 24 hours of admission, evaluate their performance, and identify clinical indices for making decisions for pediatric pneumonia patients. Methods: Pneumonia patients admitted to National Taiwan University Hospital from January 2010 to December 2019 aged under 18 years were enrolled. Their underlying diseases, clinical manifestations, and laboratory data at admission were collected. The outcome of interest was ICU transfer within 24 hours of hospitalization. We compared clinically relevant features between early ICU transfer patients and patients without ICU care. ML algorithms were developed to predict ICU admission. The performance of the algorithms was evaluated using sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and average precision. The relative feature importance of the best-performing algorithm was compared with physician-rated feature importance for explainability. Results: A total of 8464 pediatric hospitalizations due to pneumonia were recorded, and 1166 (1166/8464, 13.8\%) hospitalized patients were transferred to the ICU within 24 hours. 
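Editor's note: for readers unfamiliar with the evaluation setup described above, the following Python sketch shows how a random forest classifier can be assessed with AUC, average precision, and feature importances; it uses synthetic data with a roughly similar class imbalance and is not the study's cohort, feature set, or tuned model.

# Hedged sketch: random forest evaluation with AUC, average precision, feature importances.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.86], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
prob = rf.predict_proba(X_te)[:, 1]

print("AUC:", round(roc_auc_score(y_te, prob), 3))
print("Average precision:", round(average_precision_score(y_te, prob), 3))
print("Top feature indices:", rf.feature_importances_.argsort()[::-1][:5])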
Early ICU transfer patients were younger (P<.001), had higher rates of underlying diseases (eg, cardiovascular, neuropsychological, and congenital anomaly/genetic disorders; P<.001), had abnormal laboratory data, had higher pulse rates (P<.001), had higher breath rates (P<.001), had lower oxygen saturation (P<.001), and had lower peak body temperature (P<.001) at admission than patients without ICU transfer. The random forest (RF) algorithm achieved the best performance (sensitivity 0.94, 95\% CI 0.92-0.95; specificity 0.94, 95\% CI 0.92-0.95; AUC 0.99, 95\% CI 0.98-0.99; and average precision 0.93, 95\% CI 0.90-0.96). The lowest systolic blood pressure and presence of cardiovascular and neuropsychological diseases ranked in the top 10 in both RF relative feature importance and clinician judgment. Conclusions: The ML approach could provide a clinically applicable triage algorithm and identify important clinical indices, such as age, underlying diseases, abnormal vital signs, and laboratory data for evaluating the need for intensive care in children with pneumonia. ", doi="10.2196/28934", url="https://medinform.jmir.org/2022/1/e28934", url="http://www.ncbi.nlm.nih.gov/pubmed/35084358" } @Article{info:doi/10.2196/28916, author="Buck, Christoph and Doctor, Eileen and Hennrich, Jasmin and J{\"o}hnk, Jan and Eymann, Torsten", title="General Practitioners' Attitudes Toward Artificial Intelligence--Enabled Systems: Interview Study", journal="J Med Internet Res", year="2022", month="Jan", day="27", volume="24", number="1", pages="e28916", keywords="artificial intelligence", keywords="AI", keywords="attitude", keywords="primary care", keywords="general practitioner", keywords="GP", keywords="qualitative interview", keywords="diagnosis", keywords="clinical decision support system", abstract="Background: General practitioners (GPs) care for a large number of patients with various diseases in very short timeframes under high uncertainty. Thus, systems enabled by artificial intelligence (AI) are promising and time-saving solutions that may increase the quality of care. Objective: This study aims to understand GPs' attitudes toward AI-enabled systems in medical diagnosis. Methods: We interviewed 18 GPs from Germany between March 2020 and May 2020 to identify determinants of GPs' attitudes toward AI-based systems in diagnosis. By analyzing the interview transcripts, we identified 307 open codes, which we then further structured to derive relevant attitude determinants. Results: We merged the open codes into 21 concepts and finally into five categories: concerns, expectations, environmental influences, individual characteristics, and minimum requirements of AI-enabled systems. Concerns included all doubts and fears of the participants regarding AI-enabled systems. Expectations reflected GPs' thoughts and beliefs about expected benefits and limitations of AI-enabled systems in terms of GP care. Environmental influences included influences resulting from an evolving working environment, key stakeholders' perspectives and opinions, the available information technology hardware and software resources, and the media environment. Individual characteristics were determinants that describe a physician as a person, including character traits, demographic characteristics, and knowledge. In addition, the interviews also revealed the minimum requirements of AI-enabled systems, which were preconditions that must be met for GPs to contemplate using AI-enabled systems. 
Moreover, we identified relationships among these categories, which we conflate in our proposed model. Conclusions: This study provides a thorough understanding of the perspective of future users of AI-enabled systems in primary care and lays the foundation for successful market penetration. We contribute to the research stream of analyzing and designing AI-enabled systems and the literature on attitudes toward technology and practice by fostering the understanding of GPs and their attitudes toward such systems. Our findings provide relevant information to technology developers, policymakers, and stakeholder institutions of GP care. ", doi="10.2196/28916", url="https://www.jmir.org/2022/1/e28916", url="http://www.ncbi.nlm.nih.gov/pubmed/35084342" } @Article{info:doi/10.2196/31356, author="Triep, Karen and Leichtle, Benedikt Alexander and Meister, Martin and Fiedler, Martin Georg and Endrich, Olga", title="Real-world Health Data and Precision for the Diagnosis of Acute Kidney Injury, Acute-on-Chronic Kidney Disease, and Chronic Kidney Disease: Observational Study", journal="JMIR Med Inform", year="2022", month="Jan", day="25", volume="10", number="1", pages="e31356", keywords="acute kidney injury", keywords="chronic kidney disease", keywords="acute-on-chronic", keywords="real-world health data", keywords="clinical decision support", keywords="KDIGO", keywords="ICD coding", abstract="Background: The criteria for the diagnosis of kidney disease outlined in the Kidney Disease: Improving Global Outcomes guidelines are based on a patient's current, historical, and baseline data. The diagnosis of acute kidney injury, chronic kidney disease, and acute-on-chronic kidney disease requires previous measurements of creatinine, back-calculation, and the interpretation of several laboratory values over a certain period. Diagnoses may be hindered by unclear definitions of the individual creatinine baseline and rough ranges of normal values that are set without adjusting for age, ethnicity, comorbidities, and treatment. The classification of correct diagnoses and sufficient staging improves coding, data quality, reimbursement, the choice of therapeutic approach, and a patient's outcome. Objective: In this study, we aim to apply a data-driven approach to assign diagnoses of acute, chronic, and acute-on-chronic kidney diseases with the help of a complex rule engine. Methods: Real-time and retrospective data from the hospital's clinical data warehouse of inpatient and outpatient cases treated between 2014 and 2019 were used. Delta serum creatinine, baseline values, and admission and discharge data were analyzed. A Kidney Disease: Improving Global Outcomes--based SQL algorithm applied specific diagnosis-based International Classification of Diseases (ICD) codes to inpatient stays. Text mining on discharge documentation was also conducted to measure the effects on diagnosis. Results: We show that this approach yielded an increased number of diagnoses (4491 cases in 2014 vs 11,124 cases of ICD-coded kidney disease and injury in 2019) and higher precision in documentation and coding. The percentage of unspecific ICD N19-coded diagnoses of N19 codes generated dropped from 19.71\% (1544/7833) in 2016 to 4.38\% (416/9501) in 2019. The percentage of specific ICD N18-coded diagnoses of N19 codes generated increased from 50.1\% (3924/7833) in 2016 to 62.04\% (5894/9501) in 2019. 
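Editor's note: as one concrete example of the kind of rule such an engine might encode, the Python sketch below checks the commonly cited KDIGO serum creatinine criteria for acute kidney injury; the thresholds follow the published KDIGO definition, but the function, units (mg/dL), and example values are illustrative, and the urine-output criteria and the study's SQL implementation are not reproduced.

# Hedged sketch: KDIGO-style creatinine trigger for acute kidney injury.
from datetime import datetime, timedelta

def meets_aki_creatinine_criteria(measurements, baseline):
    """measurements: list of (datetime, serum creatinine in mg/dL); baseline in mg/dL."""
    measurements = sorted(measurements)
    for i, (t_i, scr_i) in enumerate(measurements):
        # Rise >= 0.3 mg/dL within any 48-hour window
        for t_j, scr_j in measurements[i + 1:]:
            if t_j - t_i <= timedelta(hours=48) and scr_j - scr_i >= 0.3:
                return True
        # Rise to >= 1.5x the individual baseline (assumed to be within the prior 7 days)
        if scr_i >= 1.5 * baseline:
            return True
    return False

obs = [(datetime(2019, 3, 1, 8), 1.0), (datetime(2019, 3, 2, 8), 1.4)]
print(meets_aki_creatinine_criteria(obs, baseline=1.0))  # True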
Conclusions: Our data-driven method supports the process and reliability of diagnosis and staging and improves the quality of documentation and data. Measuring patient outcomes will be the next step in this project. ", doi="10.2196/31356", url="https://medinform.jmir.org/2022/1/e31356", url="http://www.ncbi.nlm.nih.gov/pubmed/35076410" } @Article{info:doi/10.2196/28036, author="Yu, Jia-Ruei and Chen, Chun-Hsien and Huang, Tsung-Wei and Lu, Jang-Jih and Chung, Chia-Ru and Lin, Ting-Wei and Wu, Min-Hsien and Tseng, Yi-Ju and Wang, Hsin-Yao", title="Energy Efficiency of Inference Algorithms for Clinical Laboratory Data Sets: Green Artificial Intelligence Study", journal="J Med Internet Res", year="2022", month="Jan", day="25", volume="24", number="1", pages="e28036", keywords="medical informatics", keywords="machine learning", keywords="algorithms", keywords="energy consumption", keywords="artificial intelligence", keywords="energy efficient", keywords="medical domain", keywords="medical data sets", keywords="informatics", abstract="Background: The use of artificial intelligence (AI) in the medical domain has attracted considerable research interest. Inference applications in the medical domain require energy-efficient AI models. In contrast to other types of data in visual AI, data from medical laboratories usually comprise features with strong signals. Numerous energy optimization techniques have been developed to relieve the burden on the hardware required to deploy a complex learning model. However, the energy efficiency levels of different AI models used for medical applications have not been studied. Objective: The aim of this study was to explore and compare the energy efficiency levels of commonly used machine learning algorithms---logistic regression (LR), k-nearest neighbor, support vector machine, random forest (RF), and extreme gradient boosting (XGB) algorithms, as well as four different variants of neural network (NN) algorithms---when applied to clinical laboratory datasets. Methods: We applied the aforementioned algorithms to two distinct clinical laboratory data sets: a mass spectrometry data set regarding Staphylococcus aureus for predicting methicillin resistance (3338 cases; 268 features) and a urinalysis data set for predicting Trichomonas vaginalis infection (839,164 cases; 9 features). We compared the performance of the nine inference algorithms in terms of accuracy, area under the receiver operating characteristic curve (AUROC), time consumption, and power consumption. The time and power consumption levels were determined using performance counter data from Intel Power Gadget 3.5. Results: The experimental results indicated that the RF and XGB algorithms achieved the two highest AUROC values for both data sets (84.7\% and 83.9\%, respectively, for the mass spectrometry data set; 91.1\% and 91.4\%, respectively, for the urinalysis data set). The XGB and LR algorithms exhibited the shortest inference time for both data sets (0.47 milliseconds for both in the mass spectrometry data set; 0.39 and 0.47 milliseconds, respectively, for the urinalysis data set). Compared with the RF algorithm, the XGB and LR algorithms exhibited a 45\% and 53\%-60\% reduction in inference time for the mass spectrometry and urinalysis data sets, respectively. 
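Editor's note: a minimal Python sketch of how per-model inference time can be measured, in the spirit of the comparison above; the synthetic data, the two models shown, and the timing loop are assumptions for illustration, and the power measurements obtained with Intel Power Gadget are not reproduced.

# Hedged sketch: comparing inference time of two fitted models.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=20000, n_features=9, random_state=0)
models = {"LR": LogisticRegression(max_iter=1000), "RF": RandomForestClassifier(n_estimators=100)}

for name, model in models.items():
    model.fit(X, y)
    start = time.perf_counter()
    model.predict(X)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{name}: {elapsed_ms:.1f} ms for {len(X)} predictions")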
In terms of energy efficiency, the XGB algorithm exhibited the lowest power consumption for the mass spectrometry data set (9.42 Watts) and the LR algorithm exhibited the lowest power consumption for the urinalysis data set (9.98 Watts). Compared with a five-hidden-layer NN, the XGB and LR algorithms achieved 16\%-24\% and 9\%-13\% lower power consumption levels for the mass spectrometry and urinalysis data sets, respectively. In all experiments, the XGB algorithm exhibited the best performance in terms of accuracy, run time, and energy efficiency. Conclusions: The XGB algorithm achieved balanced performance levels in terms of AUROC, run time, and energy efficiency for the two clinical laboratory data sets. Considering the energy constraints in real-world scenarios, the XGB algorithm is ideal for medical AI applications. ", doi="10.2196/28036", url="https://www.jmir.org/2022/1/e28036", url="http://www.ncbi.nlm.nih.gov/pubmed/35076405" } @Article{info:doi/10.2196/28366, author="Yamanaka, Syunsuke and Goto, Tadahiro and Morikawa, Koji and Watase, Hiroko and Okamoto, Hiroshi and Hagiwara, Yusuke and Hasegawa, Kohei", title="Machine Learning Approaches for Predicting Difficult Airway and First-Pass Success in the Emergency Department: Multicenter Prospective Observational Study", journal="Interact J Med Res", year="2022", month="Jan", day="25", volume="11", number="1", pages="e28366", keywords="intubation", keywords="machine learning", keywords="difficult airway", keywords="first-pass success", abstract="Background: There is still room for improvement in the modified LEMON (look, evaluate, Mallampati, obstruction, neck mobility) criteria for difficult airway prediction, and there is no prediction tool for first-pass success in the emergency department (ED). Objective: We applied modern machine learning approaches to predict difficult airways and first-pass success. Methods: In a multicenter prospective study that enrolled consecutive patients who underwent tracheal intubation in 13 EDs, we developed 7 machine learning models (eg, random forest model) using routinely collected data (eg, demographics, initial airway assessment). The outcomes were difficult airway and first-pass success. Model performance was evaluated using c-statistics, calibration slopes, and association measures (eg, sensitivity) in the test set (randomly selected 20\% of the data). Their performance was compared with the modified LEMON criteria for difficult airway prediction and a logistic regression model for first-pass success. Results: Of 10,741 patients who underwent intubation, 543 patients (5.1\%) had a difficult airway, and 7690 patients (71.6\%) had first-pass success. In predicting a difficult airway, machine learning models---except for k-point nearest neighbor and multilayer perceptron---had higher discrimination ability than the modified LEMON criteria (all P$\leq$.001). For example, the ensemble method had the highest c-statistic (0.74 vs 0.62 with the modified LEMON criteria; P<.001). Machine learning models---except k-point nearest neighbor and random forest models---had higher discrimination ability for first-pass success. In particular, the ensemble model had the highest c-statistic (0.81 vs 0.76 with the reference regression; P<.001). Conclusions: Machine learning models demonstrated greater ability for predicting difficult airway and first-pass success in the ED.
", doi="10.2196/28366", url="https://www.i-jmr.org/2022/1/e28366", url="http://www.ncbi.nlm.nih.gov/pubmed/35076398" } @Article{info:doi/10.2196/26555, author="Svendsen, Jagd Malene and Sandal, Fleng Louise and Kj{\ae}r, Per and Nicholl, I. Barbara and Cooper, Kay and Mair, Frances and Hartvigsen, Jan and Stochkendahl, Jensen Mette and S{\o}gaard, Karen and Mork, Jarle Paul and Rasmussen, Charlotte", title="Using Intervention Mapping to Develop a Decision Support System--Based Smartphone App (selfBACK) to Support Self-management of Nonspecific Low Back Pain: Development and Usability Study", journal="J Med Internet Res", year="2022", month="Jan", day="24", volume="24", number="1", pages="e26555", keywords="intervention mapping", keywords="behavior change", keywords="low back pain", keywords="self-management", keywords="mHealth", keywords="app-based intervention", keywords="decision support system", keywords="digital health intervention", keywords="mobile phone", abstract="Background: International guidelines consistently endorse the promotion of self-management for people with low back pain (LBP); however, implementation of these guidelines remains a challenge. Digital health interventions, such as those that can be provided by smartphone apps, have been proposed as a promising mode of supporting self-management in people with chronic conditions, including LBP. However, the evidence base for digital health interventions to support self-management of LBP is weak, and detailed descriptions and documentation of the interventions are lacking. Structured intervention mapping (IM) constitutes a 6-step process that can be used to guide the development of complex interventions. Objective: The aim of this paper is to describe the IM process for designing and creating an app-based intervention designed to support self-management of nonspecific LBP to reduce pain-related disability. Methods: The first 5 steps of the IM process were systematically applied. The core processes included literature reviews, brainstorming and group discussions, and the inclusion of stakeholders and representatives from the target population. Over a period of >2 years, the intervention content and the technical features of delivery were created, tested, and revised through user tests, feasibility studies, and a pilot study. Results: A behavioral outcome was identified as a proxy for reaching the overall program goal, that is, increased use of evidence-based self-management strategies. Physical exercises, education, and physical activity were the main components of the self-management intervention and were designed and produced to be delivered via a smartphone app. All intervention content was theoretically underpinned by the behavior change theory and the normalization process theory. Conclusions: We describe a detailed example of the application of the IM approach for the development of a theory-driven, complex, and digital intervention designed to support self-management of LBP. This description provides transparency in the developmental process of the intervention and can be a possible blueprint for designing and creating future digital health interventions for self-management. 
", doi="10.2196/26555", url="https://www.jmir.org/2022/1/e26555", url="http://www.ncbi.nlm.nih.gov/pubmed/35072645" } @Article{info:doi/10.2196/34333, author="Parra, Federico and Benezeth, Yannick and Yang, Fan", title="Automatic Assessment of Emotion Dysregulation in American, French, and Tunisian Adults and New Developments in Deep Multimodal Fusion: Cross-sectional Study", journal="JMIR Ment Health", year="2022", month="Jan", day="24", volume="9", number="1", pages="e34333", keywords="emotion dysregulation", keywords="deep multimodal fusion", keywords="small data", keywords="psychometrics", abstract="Background: Emotion dysregulation is a key dimension of adult psychological functioning. There is an interest in developing a computer-based, multimodal, and automatic measure. Objective: We wanted to train a deep multimodal fusion model to estimate emotion dysregulation in adults based on their responses to the Multimodal Developmental Profile, a computer-based psychometric test, using only a small training sample and without transfer learning. Methods: Two hundred and forty-eight participants from 3 different countries took the Multimodal Developmental Profile test, which exposed them to 14 picture and music stimuli and asked them to express their feelings about them, while the software extracted the following features from the video and audio signals: facial expressions, linguistic and paralinguistic characteristics of speech, head movements, gaze direction, and heart rate variability derivatives. Participants also responded to the brief version of the Difficulties in Emotional Regulation Scale. We separated and averaged the feature signals that corresponded to the responses to each stimulus, building a structured data set. We transformed each person's per-stimulus structured data into a multimodal codex, a grayscale image created by projecting each feature's normalized intensity value onto a cartesian space, deriving each pixel's position by applying the Uniform Manifold Approximation and Projection method. The codex sequence was then fed to 2 network types. First, 13 convolutional neural networks dealt with the spatial aspect of the problem, estimating emotion dysregulation by analyzing each of the codified responses. These convolutional estimations were then fed to a transformer network that decoded the temporal aspect of the problem, estimating emotional dysregulation based on the succession of responses. We introduce a Feature Map Average Pooling layer, which computes the mean of the convolved feature maps produced by our convolution layers, dramatically reducing the number of learnable weights and increasing regularization through an ensembling effect. We implemented 8-fold cross-validation to provide a good enough estimation of the generalization ability to unseen samples. Most of the experiments mentioned in this paper are easily replicable using the associated Google Colab system. Results: We found an average Pearson correlation (r) of 0.55 (with an average P value of <.001) between ground truth emotion dysregulation and our system's estimation of emotion dysregulation. An average mean absolute error of 0.16 and a mean concordance correlation coefficient of 0.54 were also found. Conclusions: In psychometry, our results represent excellent evidence of convergence validity, suggesting that the Multimodal Developmental Profile could be used in conjunction with this methodology to provide a valid measure of emotion dysregulation in adults. 
Future studies should replicate our findings using a hold-out test sample. Our methodology could be implemented more generally to train deep neural networks where only small training samples are available. ", doi="10.2196/34333", url="https://mental.jmir.org/2022/1/e34333", url="http://www.ncbi.nlm.nih.gov/pubmed/35072643" } @Article{info:doi/10.2196/31549, author="He, Fang and Page, H. John and Weinberg, R. Kerry and Mishra, Anirban", title="The Development and Validation of Simplified Machine Learning Algorithms to Predict Prognosis of Hospitalized Patients With COVID-19: Multicenter, Retrospective Study", journal="J Med Internet Res", year="2022", month="Jan", day="21", volume="24", number="1", pages="e31549", keywords="COVID-19", keywords="predictive algorithm", keywords="prognostic model", keywords="machine learning", abstract="Background: The current COVID-19 pandemic is unprecedented; under resource-constrained settings, predictive algorithms can help to stratify disease severity, alerting physicians of high-risk patients; however, there are only a few risk scores derived from a substantially large electronic health record (EHR) data set, using simplified predictors as input. Objective: The objectives of this study were to develop and validate simplified machine learning algorithms that predict COVID-19 adverse outcomes; to evaluate the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and calibration of the algorithms; and to derive clinically meaningful thresholds. Methods: We performed machine learning model development and validation via a cohort study using multicenter, patient-level, longitudinal EHRs from the Optum COVID-19 database that provides anonymized, longitudinal EHR data from across the United States. The models were developed based on clinical characteristics to predict 28-day in-hospital mortality, intensive care unit (ICU) admission, respiratory failure, and mechanical ventilator usage in the inpatient setting. Data from patients who were admitted from February 1, 2020, to September 7, 2020, were randomly sampled into development, validation, and test data sets; data collected from September 7, 2020, to November 15, 2020, were reserved as the postdevelopment prospective test data set. Results: Of the 3.7 million patients in the analysis, 585,867 patients were diagnosed or tested positive for SARS-CoV-2, and 50,703 adult patients were hospitalized with COVID-19 between February 1 and November 15, 2020. Among the study cohort (n=50,703), there were 6204 deaths, 9564 ICU admissions, 6478 mechanically ventilated or ECMO patients, and 25,169 patients who developed acute respiratory distress syndrome or respiratory failure within 28 days of hospital admission. The algorithms demonstrated high accuracy (AUC 0.89, 95\% CI 0.89-0.89 on the test data set [n=10,752]), consistent prediction through the second wave of the pandemic from September to November (AUC 0.85, 95\% CI 0.85-0.86 on the postdevelopment prospective test data set [n=14,863]), and great clinical relevance and utility. In addition, a comprehensive set of 386 input covariates from baseline or at admission was included in the analysis; the end-to-end pipeline automates feature selection and model development.
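Editor's note: a hedged sketch of what an automated feature-selection-plus-model pipeline can look like in Python; the selector, the logistic regression model, the choice of k=10, and the synthetic 386-column matrix are illustrative assumptions, not the study's actual pipeline or covariates.

# Hedged sketch: automated feature selection wrapped with a model in one pipeline.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=3000, n_features=386, n_informative=10, random_state=0)
pipeline = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),   # keep the 10 strongest predictors
    ("model", LogisticRegression(max_iter=1000)),
])
print("cross-validated AUC:", cross_val_score(pipeline, X, y, scoring="roc_auc").mean().round(3))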
The parsimonious model with only 10 input predictors produced comparably accurate predictions; these 10 predictors (age, blood urea nitrogen, SpO2, systolic and diastolic blood pressures, respiration rate, pulse, temperature, albumin, and major cognitive disorder excluding stroke) are commonly measured and concordant with recognized risk factors for COVID-19. Conclusions: The systematic approach and rigorous validation demonstrate consistent model performance to predict even beyond the period of data collection, with satisfactory discriminatory power and great clinical utility. Overall, the study offers an accurate, validated, and reliable prediction model based on only 10 clinical features as a prognostic tool to stratifying patients with COVID-19 into intermediate-, high-, and very high-risk groups. This simple predictive tool is shared with a wider health care community, to enable service as an early warning system to alert physicians of possible high-risk patients, or as a resource triaging tool to optimize health care resources. ", doi="10.2196/31549", url="https://www.jmir.org/2022/1/e31549", url="http://www.ncbi.nlm.nih.gov/pubmed/34951865" } @Article{info:doi/10.2196/33518, author="Willis, C. Van and Thomas Craig, Jean Kelly and Jabbarpour, Yalda and Scheufele, L. Elisabeth and Arriaga, E. Yull and Ajinkya, Monica and Rhee, B. Kyu and Bazemore, Andrew", title="Digital Health Interventions to Enhance Prevention in Primary Care: Scoping Review", journal="JMIR Med Inform", year="2022", month="Jan", day="21", volume="10", number="1", pages="e33518", keywords="digital technology", keywords="primary health care", keywords="preventive medicine", keywords="telemedicine", keywords="clinical decision support systems", abstract="Background: Disease prevention is a central aspect of primary care practice and is comprised of primary (eg, vaccinations), secondary (eg, screenings), tertiary (eg, chronic condition monitoring), and quaternary (eg, prevention of overmedicalization) levels. Despite rapid digital transformation of primary care practices, digital health interventions (DHIs) in preventive care have yet to be systematically evaluated. Objective: This review aimed to identify and describe the scope and use of current DHIs for preventive care in primary care settings. Methods: A scoping review to identify literature published from 2014 to 2020 was conducted across multiple databases using keywords and Medical Subject Headings terms covering primary care professionals, prevention and care management, and digital health. A subgroup analysis identified relevant studies conducted in US primary care settings, excluding DHIs that use the electronic health record (EHR) as a retrospective data capture tool. Technology descriptions, outcomes (eg, health care performance and implementation science), and study quality as per Oxford levels of evidence were abstracted. Results: The search yielded 5274 citations, of which 1060 full-text articles were identified. Following a subgroup analysis, 241 articles met the inclusion criteria. Studies primarily examined DHIs among health information technologies, including EHRs (166/241, 68.9\%), clinical decision support (88/241, 36.5\%), telehealth (88/241, 36.5\%), and multiple technologies (154/241, 63.9\%). DHIs were predominantly used for tertiary prevention (131/241, 54.4\%). Of the core primary care functions, comprehensiveness was addressed most frequently (213/241, 88.4\%). 
DHI users were providers (205/241, 85.1\%), patients (111/241, 46.1\%), or multiple types (89/241, 36.9\%). Reported outcomes were primarily clinical (179/241, 70.1\%), and statistically significant improvements were common (192/241, 79.7\%). Results were summarized across the following 5 topics for the most novel/distinct DHIs: population-centered, patient-centered, care access expansion, panel-centered (dashboarding), and application-driven DHIs. The quality of the included studies was moderate to low. Conclusions: Preventive DHIs in primary care settings demonstrated meaningful improvements in both clinical and nonclinical outcomes, and across user types; however, adoption and implementation in the US were limited primarily to EHR platforms, and users were mainly clinicians receiving alerts regarding care management for their patients. Evaluations of negative results, effects on health disparities, and many other gaps remain to be explored. ", doi="10.2196/33518", url="https://medinform.jmir.org/2022/1/e33518", url="http://www.ncbi.nlm.nih.gov/pubmed/35060909" } @Article{info:doi/10.2196/17278, author="Elangovan, Deepa and Long, Soon Chiau and Bakrin, Safina Faizah and Tan, Siang Ching and Goh, Wen Khang and Yeoh, Fei Siang and Loy, Jun Mei and Hussain, Zahid and Lee, Seng Kah and Idris, Che Azam and Ming, Chiau Long", title="The Use of Blockchain Technology in the Health Care Sector: Systematic Review", journal="JMIR Med Inform", year="2022", month="Jan", day="20", volume="10", number="1", pages="e17278", keywords="blockchain", keywords="health care", keywords="hospital information system", keywords="data integrity", keywords="access control", keywords="data logging", keywords="health informatics", abstract="Background: Blockchain technology is a part of Industry 4.0's new Internet of Things applications: decentralized systems, distributed ledgers, and immutable and cryptographically secure technology. This technology entails a series of transaction lists with identical copies shared and retained by different groups or parties. One field where blockchain technology has tremendous potential is health care, due to the more patient-centric approach to the health care system as well as blockchain's ability to connect disparate systems and increase the accuracy of electronic health records. Objective: The aim of this study was to systematically review studies on the use of blockchain technology in health care and to analyze the characteristics of the studies that have implemented blockchain technology. Methods: This study used a systematic review methodology to find literature related to the implementation aspect of blockchain technology in health care. Relevant papers were searched for using PubMed, SpringerLink, IEEE Xplore, Embase, Scopus, and EBSCOhost. A quality assessment of literature was performed on the 22 selected papers by assessing their trustworthiness and relevance. Results: After full screening, 22 papers were included. A table of evidence was constructed, and the results of the selected papers were interpreted. The results of scoring for measuring the quality of the publications were obtained and interpreted. Out of 22 papers, a total of 3 (14\%) high-quality papers, 9 (41\%) moderate-quality papers, and 10 (45\%) low-quality papers were identified. 
Conclusions: Blockchain technology was found to be useful in real health care environments, including for the management of electronic medical records, biomedical research and education, remote patient monitoring, pharmaceutical supply chains, health insurance claims, health data analytics, and other potential areas. The main reasons for the implementation of blockchain technology in the health care sector were identified as data integrity, access control, data logging, data versioning, and nonrepudiation. The findings could help the scientific community to understand the implementation aspect of blockchain technology. The results from this study help in recognizing the accessibility and use of blockchain technology in the health care sector. ", doi="10.2196/17278", url="https://medinform.jmir.org/2022/1/e17278", url="http://www.ncbi.nlm.nih.gov/pubmed/35049516" } @Article{info:doi/10.2196/28842, author="Kumar, Sajit and Nanelia, Alicia and Mariappan, Ragunathan and Rajagopal, Adithya and Rajan, Vaibhav", title="Patient Representation Learning From Heterogeneous Data Sources and Knowledge Graphs Using Deep Collective Matrix Factorization: Evaluation Study", journal="JMIR Med Inform", year="2022", month="Jan", day="20", volume="10", number="1", pages="e28842", keywords="representation learning", keywords="deep collective matrix factorization", keywords="electronic medical records", keywords="knowledge graphs", keywords="multiview learning", keywords="graph embeddings", keywords="clinical decision support", abstract="Background: Patient representation learning aims to learn features, also called representations, from input sources automatically, often in an unsupervised manner, for use in predictive models. This obviates the need for cumbersome, time- and resource-intensive manual feature engineering, especially from unstructured data such as text, images, or graphs. Most previous techniques have used neural network--based autoencoders to learn patient representations, primarily from clinical notes in electronic medical records (EMRs). Knowledge graphs (KGs), with clinical entities as nodes and their relations as edges, can be extracted automatically from biomedical literature and provide complementary information to EMR data that have been found to provide valuable predictive signals. Objective: This study aims to evaluate the efficacy of collective matrix factorization (CMF), both the classical variant and a recent neural architecture called deep CMF (DCMF), in integrating heterogeneous data sources from EMR and KG to obtain patient representations for clinical decision support tasks. Methods: Using a recent formulation for obtaining graph representations through matrix factorization within the context of CMF, we infused auxiliary information during patient representation learning. We also extended the DCMF architecture to create a task-specific end-to-end model that learns to simultaneously find effective patient representations and predictions. We compared the efficacy of such a model to that of first learning unsupervised representations and then independently learning a predictive model. We evaluated patient representation learning using CMF-based methods and autoencoders for 2 clinical decision support tasks on a large EMR data set. Results: Our experiments show that DCMF provides a seamless way for integrating multiple sources of data to obtain patient representations, both in unsupervised and supervised settings. 
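Editor's note: to make the idea concrete, the following numpy sketch shows classical collective matrix factorization in its simplest form: two matrices that share a patient dimension are factorized jointly so that patients receive a single shared embedding. The toy data, rank, learning rate, and plain gradient descent are assumptions, and the DCMF neural architecture evaluated in the study is not reproduced.

# Hedged sketch: classical CMF with a shared patient factor matrix U.
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.random((50, 12))   # toy patients x EMR features
X2 = rng.random((50, 8))    # toy patients x knowledge-graph concepts
k, lr = 5, 0.01
U = rng.normal(scale=0.1, size=(50, k))   # shared patient factors
V = rng.normal(scale=0.1, size=(12, k))   # EMR feature factors
W = rng.normal(scale=0.1, size=(8, k))    # KG concept factors

for _ in range(500):
    E1, E2 = U @ V.T - X1, U @ W.T - X2   # reconstruction errors for each matrix
    U -= lr * (E1 @ V + E2 @ W)           # patient factors receive gradients from both losses
    V -= lr * (E1.T @ U)
    W -= lr * (E2.T @ U)

loss = ((U @ V.T - X1) ** 2).sum() + ((U @ W.T - X2) ** 2).sum()
print("joint reconstruction error:", round(float(loss), 2), "| patient embedding shape:", U.shape)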
Its performance in single-source settings is comparable with that of previous autoencoder-based representation learning methods. When DCMF is used to obtain representations from a combination of EMR and KG, where most previous autoencoder-based methods cannot be used directly, its performance is superior to that of previous nonneural methods for CMF. Infusing information from KGs into patient representations using DCMF was found to improve downstream predictive performance. Conclusions: Our experiments indicate that DCMF is a versatile model that can be used to obtain representations from single and multiple data sources and combine information from EMR data and KGs. Furthermore, DCMF can be used to learn representations in both supervised and unsupervised settings. Thus, DCMF offers an effective way of integrating heterogeneous data sources and infusing auxiliary knowledge into patient representations. ", doi="10.2196/28842", url="https://medinform.jmir.org/2022/1/e28842", url="http://www.ncbi.nlm.nih.gov/pubmed/35049514" } @Article{info:doi/10.2196/28659, author="Hwang, Jeonghwan and Lee, Taeheon and Lee, Honggu and Byun, Seonjeong", title="A Clinical Decision Support System for Sleep Staging Tasks With Explanations From Artificial Intelligence: User-Centered Design and Evaluation Study", journal="J Med Internet Res", year="2022", month="Jan", day="19", volume="24", number="1", pages="e28659", keywords="sleep staging", keywords="clinical decision support", keywords="user-centered design", keywords="medical artificial intelligence", abstract="Background: Despite the unprecedented performance of deep learning algorithms in clinical domains, full reviews of algorithmic predictions by human experts remain mandatory. Under these circumstances, artificial intelligence (AI) models are primarily designed as clinical decision support systems (CDSSs). However, from the perspective of clinical practitioners, the lack of clinical interpretability and user-centered interfaces hinders the adoption of these AI systems in practice. Objective: This study aims to develop an AI-based CDSS for assisting polysomnographic technicians in reviewing AI-predicted sleep staging results. This study proposed and evaluated a CDSS that provides clinically sound explanations for AI predictions in a user-centered manner. Methods: Our study is based on a user-centered design framework for developing explanations in a CDSS that identifies why explanations are needed, what information should be contained in explanations, and how explanations can be provided in the CDSS. We conducted user interviews, user observation sessions, and an iterative design process to identify three key aspects for designing explanations in the CDSS. After constructing the CDSS, the tool was evaluated to investigate how the CDSS explanations helped technicians. We measured the accuracy of sleep staging and interrater reliability with macro-F1 and Cohen $\kappa$ scores to assess quantitative improvements after our tool was adopted. We assessed qualitative improvements through participant interviews that established how participants perceived and used the tool. Results: The user study revealed that technicians desire explanations that are relevant to key electroencephalogram (EEG) patterns for sleep staging when assessing the correctness of AI predictions. Here, technicians wanted explanations that could be used to evaluate whether the AI models properly locate and use these patterns during prediction. 
On the basis of this, information that is closely related to sleep EEG patterns was formulated for the AI models. In the iterative design phase, we developed a different visualization strategy for each pattern based on how technicians interpreted the EEG recordings with these patterns during their workflows. Our evaluation study on 9 polysomnographic technicians quantitatively and qualitatively investigated the helpfulness of the tool. For technicians with <5 years of work experience, their quantitative sleep staging performance improved significantly from 56.75 to 60.59 with a P value of .05. Qualitatively, participants reported that the information provided effectively supported them, and they could develop notable adoption strategies for the tool. Conclusions: Our findings indicate that formulating clinical explanations for automated predictions using the information in the AI with a user-centered design process is an effective strategy for developing a CDSS for sleep staging. ", doi="10.2196/28659", url="https://www.jmir.org/2022/1/e28659", url="http://www.ncbi.nlm.nih.gov/pubmed/35044311" } @Article{info:doi/10.2196/34573, author="Petsani, Despoina and Ahmed, Sara and Petronikolou, Vasileia and Kehayia, Eva and Alastalo, Mika and Santonen, Teemu and Merino-Barbancho, Beatriz and Cea, Gloria and Segkouli, Sofia and Stavropoulos, G. Thanos and Billis, Antonis and Doumas, Michael and Almeida, Rosa and Nagy, Enik{\H{o}} and Broeckx, Leen and Bamidis, Panagiotis and Konstantinidis, Evdokimos", title="Digital Biomarkers for Supporting Transitional Care Decisions: Protocol for a Transnational Feasibility Study", journal="JMIR Res Protoc", year="2022", month="Jan", day="19", volume="11", number="1", pages="e34573", keywords="Living Lab", keywords="cocreation", keywords="transitional care", keywords="technology", keywords="feasibility study", abstract="Background: Virtual Health and Wellbeing Living Lab Infrastructure is a Horizon 2020 project that aims to harmonize Living Lab procedures and facilitate access to European health and well-being research infrastructures. In this context, this study presents a joint research activity that will be conducted within Virtual Health and Wellbeing Living Lab Infrastructure in the transitional care domain to test and validate the harmonized Living Lab procedures and infrastructures. The collection of data from various sources (information and communications technology and clinical and patient-reported outcome measures) demonstrated the capacity to assess risk and support decisions during care transitions, but there is no harmonized way of combining this information. Objective: This study primarily aims to evaluate the feasibility and benefit of collecting multichannel data across Living Labs on the topic of transitional care and to harmonize data processes and collection. In addition, the authors aim to investigate the collection and use of digital biomarkers and explore initial patterns in the data that demonstrate the potential to predict transition outcomes, such as readmissions and adverse events. Methods: The current research protocol presents a multicenter, prospective, observational cohort study that will consist of three phases, running consecutively in multiple sites: a cocreation phase, a testing and simulation phase, and a transnational pilot phase.
The cocreation phase aims to build a common understanding among different sites, investigate the differences in hospitalization discharge management among countries, and explore the willingness of different stakeholders to use technological solutions in the transitional care process. The testing and simulation phase aims to explore ways of integrating observation of a patient's clinical condition, patient involvement, and discharge education in transitional care. The objective of the simulation phase is to evaluate the feasibility and the barriers faced by health care professionals in assessing transition readiness. Results: The cocreation phase will be completed by April 2022. The testing and simulation phase will begin in September 2022 and will partially overlap with the deployment of the transnational pilot phase that will start in the same month. The data collection of the transnational pilots will be finalized by the end of June 2023. Data processing is expected to be completed by March 2024. The results will consist of guidelines and implementation pathways for large-scale studies and an analysis for identifying initial patterns in the acquired data. Conclusions: The knowledge acquired through this research will lead to harmonized procedures and data collection for Living Labs that support transitions in care. International Registered Report Identifier (IRRID): PRR1-10.2196/34573 ", doi="10.2196/34573", url="https://www.researchprotocols.org/2022/1/e34573", url="http://www.ncbi.nlm.nih.gov/pubmed/35044303" } @Article{info:doi/10.2196/27434, author="Ahne, Adrian and Fagherazzi, Guy and Tannier, Xavier and Czernichow, Thomas and Orchard, Francisco", title="Improving Diabetes-Related Biomedical Literature Exploration in the Clinical Decision-making Process via Interactive Classification and Topic Discovery: Methodology Development Study", journal="J Med Internet Res", year="2022", month="Jan", day="18", volume="24", number="1", pages="e27434", keywords="evidence-based medicine", keywords="clinical decision making", keywords="clinical decision support", keywords="digital health", keywords="medical informatics", keywords="transparency", keywords="hierarchical clustering", keywords="active learning", keywords="classification", keywords="memory consumption", keywords="natural language processing", abstract="Background: The amount of available textual health data, such as scientific and biomedical literature, is constantly growing, making it increasingly challenging for health professionals to properly summarize those data and practice evidence-based clinical decision making. Moreover, the exploration of unstructured health text data is challenging for professionals without computer science knowledge due to limited time, resources, and skills. Current tools for exploring text data lack ease of use, require high computational effort, and struggle to incorporate domain knowledge and focus on topics of interest. Objective: We developed a methodology able to explore and target topics of interest via an interactive user interface for health professionals with limited computer science knowledge. We aim to reach near state-of-the-art performance while reducing memory consumption, increasing scalability, and minimizing user interaction effort to improve the clinical decision-making process. The performance was evaluated on diabetes-related abstracts from PubMed.
Methods: The methodology consists of 4 parts: (1) a novel interpretable hierarchical clustering of documents where each node is defined by headwords (words that best represent the documents in the node), (2) an efficient classification system to target topics, (3) minimized user interaction effort through active learning, and (4) a visual user interface. We evaluated our approach on 50,911 diabetes-related abstracts providing a hierarchical Medical Subject Headings (MeSH) structure, a unique identifier for a topic. Hierarchical clustering performance was compared against the implementation in the machine learning library scikit-learn. On a subset of 2000 randomly chosen diabetes abstracts, our active learning strategy was compared against 3 other strategies: random selection of training instances, uncertainty sampling that chooses instances about which the model is most uncertain, and an expected gradient length strategy based on convolutional neural networks (CNNs). Results: For the hierarchical clustering performance, we achieved an F1 score of 0.73 compared to 0.76 achieved by scikit-learn. Concerning active learning performance, after 200 chosen training samples based on these strategies, the weighted F1 score of all MeSH codes resulted in a satisfying 0.62 F1 score using our approach, 0.61 using the uncertainty strategy, 0.63 using the CNN, and 0.45 using the random strategy. Moreover, our methodology showed a constant low memory use with increased number of documents. Conclusions: We proposed an easy-to-use tool for health professionals with limited computer science knowledge who combine their domain knowledge with topic exploration and target specific topics of interest while improving transparency. Furthermore, our approach is memory efficient and highly parallelizable, making it interesting for large Big Data sets. This approach can be used by health professionals to gain deep insights into biomedical literature to ultimately improve the evidence-based clinical decision making process. ", doi="10.2196/27434", url="https://www.jmir.org/2022/1/e27434", url="http://www.ncbi.nlm.nih.gov/pubmed/35040795" } @Article{info:doi/10.2196/33470, author="Larsen, Kevin and Akindele, Bilikis and Head, Henry and Evans, Rick and Mehta, Purvi and Hlatky, Quinn and Krause, Brendan and Chen, Sydney and King, Dominic", title="Developing a User-Centered Digital Clinical Decision Support App for Evidence-Based Medication Recommendations for Type 2 Diabetes Mellitus: Prototype User Testing and Validation Study", journal="JMIR Hum Factors", year="2022", month="Jan", day="18", volume="9", number="1", pages="e33470", keywords="clinical decision support", keywords="user-centered design", keywords="user testing", keywords="type 2 diabetes mellitus", keywords="evidence-based guidelines", keywords="validation", keywords="workflows", keywords="electronic health record", keywords="decision support", keywords="design", keywords="diabetes", abstract="Background: Closing the gap between care recommended by evidence-based guidelines and care delivered in practice is an ongoing challenge across systems and delivery models. Clinical decision support systems (CDSSs) are widely deployed to augment clinicians in their complex decision-making processes. Despite published success stories, the poor usability of many CDSSs has contributed to fragmented workflows and alert fatigue. 
Objective: This study aimed to validate the application of a user-centered design (UCD) process in the development of a standards-based medication recommender for type 2 diabetes mellitus in a simulated setting. The prototype app was evaluated for effectiveness, efficiency, and user satisfaction. Methods: We conducted interviews with 8 clinical leaders with 8 rounds of iterative user testing with 2-8 prescribers in each round to inform app development. With the resulting prototype app, we conducted a validation study with 43 participants. The participants were assigned to one of two groups and completed a 2-hour remote user testing session. Both groups reviewed mock patient facts and ordered diabetes medications for the patients. The Traditional group used a mock electronic health record (EHR) for the review in Period 1 and used the prototype app in Period 2, while the Tool group used the prototype app during both time periods. The perceived cognitive load associated with task performance during each period was assessed with the National Aeronautics and Space Administration Task Load Index. Participants also completed the System Usability Scale (SUS) questionnaire and Kano Survey. Results: Average SUS scores from the questionnaire, taken at the end of 5 of the 8 user testing sessions, ranged from 68-86. The results of the validation study are as follows: percent adherence to evidence-based guidelines was greater with the use of the prototype app than with the EHR across time periods with the Traditional group (prototype app mean 96.2 vs EHR mean 72.0, P<.001) and between groups during Period 1 (Tool group mean 92.6 vs Traditional group mean 72.0, P<.001). Task completion times did not differ between groups (P=.23), but the Tool group completed medication ordering more quickly in Period 2 (Period 1 mean 130.7 seconds vs Period 2 mean 107.7 seconds, P<.001). Based on an adjusted $\alpha$ level owing to violation of the assumption of homogeneity of variance (Ps>.03), there was no effect on screens viewed and on perceived cognitive load (all Ps>.14). Conclusions: Through deployment of the UCD process, a point-of-care medication recommender app holds promise of improving adherence to evidence-based guidelines; in this case, those from the American Diabetes Association. Task-time performance suggests that with practice the T2DM app may support a more efficient ordering process for providers, and SUS scores indicate provider satisfaction with the app. ", doi="10.2196/33470", url="https://humanfactors.jmir.org/2022/1/e33470", url="http://www.ncbi.nlm.nih.gov/pubmed/34784293" } @Article{info:doi/10.2196/32939, author="Chew, Jocelyn Han Shi and Achananuparp, Palakorn", title="Perceptions and Needs of Artificial Intelligence in Health Care to Increase Adoption: Scoping Review", journal="J Med Internet Res", year="2022", month="Jan", day="14", volume="24", number="1", pages="e32939", keywords="artificial intelligence", keywords="health care", keywords="service delivery", keywords="perceptions", keywords="needs", keywords="scoping", keywords="review", abstract="Background: Artificial intelligence (AI) has the potential to improve the efficiency and effectiveness of health care service delivery. However, the perceptions and needs of such systems remain elusive, hindering efforts to promote AI adoption in health care. Objective: This study aims to provide an overview of the perceptions and needs of AI to increase its adoption in health care. 
Methods: A systematic scoping review was conducted according to the 5-stage framework by Arksey and O'Malley. Articles that described the perceptions and needs of AI in health care were searched across nine databases: ACM Library, CINAHL, Cochrane Central, Embase, IEEE Xplore, PsycINFO, PubMed, Scopus, and Web of Science for studies that were published from inception until June 21, 2021. Articles that were not specific to AI, not research studies, and not written in English were omitted. Results: Of the 3666 articles retrieved, 26 (0.71\%) were eligible and included in this review. The mean age of the participants ranged from 30 to 72.6 years, the proportion of men ranged from 0\% to 73.4\%, and the sample sizes for primary studies ranged from 11 to 2780. The perceptions and needs of various populations in the use of AI were identified for general, primary, and community health care; chronic diseases self-management and self-diagnosis; mental health; and diagnostic procedures. The use of AI was perceived to be positive because of its availability, ease of use, and potential to improve efficiency and reduce the cost of health care service delivery. However, concerns were raised regarding the lack of trust in data privacy, patient safety, technological maturity, and the possibility of full automation. Suggestions for improving the adoption of AI in health care were highlighted: enhancing personalization and customizability; enhancing empathy and personification of AI-enabled chatbots and avatars; enhancing user experience, design, and interconnectedness with other devices; and educating the public on AI capabilities. Several corresponding mitigation strategies were also identified in this study. Conclusions: The perceptions and needs of AI in its use in health care are crucial in improving its adoption by various stakeholders. Future studies and implementations should consider the points highlighted in this study to enhance the acceptability and adoption of AI in health care. This would facilitate an increase in the effectiveness and efficiency of health care service delivery to improve patient outcomes and satisfaction. ", doi="10.2196/32939", url="https://www.jmir.org/2022/1/e32939", url="http://www.ncbi.nlm.nih.gov/pubmed/35029538" } @Article{info:doi/10.2196/33873, author="Andrade, Q. Andre and Calabretto, Jean-Pierre and Pratt, L. Nicole and Kalisch-Ellett, M. Lisa and Kassie, M. Gizat and LeBlanc, T. Vanessa and Ramsay, Emmae and Roughead, E. Elizabeth", title="Implementation and Evaluation of a Digitally Enabled Precision Public Health Intervention to Reduce Inappropriate Gabapentinoid Prescription: Cluster Randomized Controlled Trial", journal="J Med Internet Res", year="2022", month="Jan", day="10", volume="24", number="1", pages="e33873", keywords="audit and feedback", keywords="digital health", keywords="precision public health", keywords="digital intervention", keywords="primary care", keywords="physician", keywords="health professional", keywords="health education", abstract="Background: Digital technologies can enable rapid targeted delivery of audit and feedback interventions at scale. Few studies have evaluated how mode of delivery affects clinical professional behavior change and none have assessed the feasibility of such an initiative at a national scale. Objective: The aim of this study was to develop and evaluate the effect of audit and feedback by digital versus postal (letter) mode of delivery on primary care physician behavior. 
Methods: This study was developed as part of the Veterans' Medicines Advice and Therapeutics Education Services (MATES) program, an intervention funded by the Australian Government Department of Veterans' Affairs that provides targeted education and patient-specific audit with feedback to Australian general practitioners, as well as educational material to veterans and other health professionals. We performed a cluster randomized controlled trial of a multifaceted intervention to reduce inappropriate gabapentinoid prescription, comparing digital and postal mode of delivery. All veteran patients targeted also received an educational intervention (postal delivery). Efficacy was measured using a linear mixed-effects model as the average number of gabapentinoid prescriptions standardized by defined daily dose (individual level), and number of veterans visiting a psychologist in the 6 and 12 months following the intervention. Results: The trial involved 2552 general practitioners in Australia and took place in March 2020. Both intervention groups had a significant reduction in total gabapentinoid prescription by the end of the study period (digital: mean reduction of 11.2\%, P=.004; postal: mean reduction of 11.2\%, P=.001). We found no difference between digital and postal mode of delivery in reduction of gabapentinoid prescriptions at 12 months (digital: --0.058, postal: --0.058, P=.98). Digital delivery increased initiations to psychologists at 12 months (digital: 3.8\%, postal: 2.0\%, P=.02). Conclusions: Our digitally delivered professional behavior change intervention was feasible, had comparable effectiveness to the postal intervention with regard to changes in medicine use, and had increased effectiveness with regard to referrals to a psychologist. Given the logistical benefits of digital delivery in nationwide programs, the results encourage exploration of this mode in future interventions. ", doi="10.2196/33873", url="https://www.jmir.org/2022/1/e33873", url="http://www.ncbi.nlm.nih.gov/pubmed/35006086" } @Article{info:doi/10.2196/28762, author="Garcia, Gracie and Crenner, Christopher", title="Comparing International Experiences With Electronic Health Records Among Emergency Medicine Physicians in the United States and Norway: Semistructured Interview Study", journal="JMIR Hum Factors", year="2022", month="Jan", day="7", volume="9", number="1", pages="e28762", keywords="electronic health records", keywords="electronic medical records", keywords="health information technology", keywords="health information exchange", keywords="health policy", keywords="international", keywords="emergency medicine", keywords="medical informatics", keywords="meaningful use", keywords="burnout", abstract="Background: The variability in physicians' attitudes regarding electronic health records (EHRs) is widely recognized. Both human and technological factors contribute to user satisfaction. This exploratory study considers these variables by comparing emergency medicine physician experiences with EHRs in the United States and Norway. Objective: This study is unique as it aims to compare individual experiences with EHRs. It creates an opportunity to expand perspective, challenge the unknown, and explore how this technology affects clinicians globally. Research often highlights the challenge that health information technology has created for users: Are the negative consequences of this technology shared among countries? Does it affect medical practice? What determines user satisfaction? 
Can this be measured internationally? Do specific factors account for similarities or differences? This study begins by investigating these questions by comparing cohort experiences. Fundamental differences between nations will also be addressed. Methods: We used semistructured, participant-driven, in-depth interviews (N=12) for data collection in conjunction with ethnographic observations. The conversations were recorded and transcribed. Texts were then analyzed using NVivo software (QSR International) to develop codes for direct comparison among countries. Comprehensive understanding of the data required triangulation, specifically using thematic and interpretive phenomenological analysis. Narrative analysis ensured appropriate context of the NVivo (QSR International) query results. Results: Each interview resulted in mixed discussions regarding the benefits and disadvantages of EHRs. All the physicians recognized health care's dependence on this technology. In Norway, physicians perceived more benefits compared with those based in the United States. Americans reported fewer benefits and disproportionally high disadvantages. Both cohorts believed that EHRs have increased user workload. However, this was mentioned 2.6 times more frequently by Americans (United States [n=40] vs Norway [n=15]). Financial influences regarding health information technology use were of great concern for American physicians but rarely mentioned among Norwegian physicians (United States [n=37] vs Norway [n=6]). Technology dysfunctions were the most common complaint from Norwegian physicians. Participants from each country noted increased frustration among older colleagues. Conclusions: Despite differences spanning geographical, organizational, and cultural boundaries, much is to be learned by comparing individual experiences. Both cohorts experienced EHR-related frustrations, although etiology differed. The overall number of complaints was significantly higher among American physicians. This study augments the idea that policy, regulation, and administration have compelling influence on user experience. Global EHR optimization requires additional investigation, and these results help to establish a foundation for future research. ", doi="10.2196/28762", url="https://humanfactors.jmir.org/2022/1/e28762", url="http://www.ncbi.nlm.nih.gov/pubmed/34994702" } @Article{info:doi/10.2196/28953, author="Zeng, Siyang and Arjomandi, Mehrdad and Tong, Yao and Liao, C. Zachary and Luo, Gang", title="Developing a Machine Learning Model to Predict Severe Chronic Obstructive Pulmonary Disease Exacerbations: Retrospective Cohort Study", journal="J Med Internet Res", year="2022", month="Jan", day="6", volume="24", number="1", pages="e28953", keywords="chronic obstructive pulmonary disease", keywords="machine learning", keywords="forecasting", keywords="symptom exacerbation", keywords="patient care management", abstract="Background: Chronic obstructive pulmonary disease (COPD) poses a large burden on health care. Severe COPD exacerbations require emergency department visits or inpatient stays, often cause an irreversible decline in lung function and health status, and account for 90.3\% of the total medical cost related to COPD. Many severe COPD exacerbations are deemed preventable with appropriate outpatient care. 
Current models for predicting severe COPD exacerbations lack accuracy, making it difficult to effectively target patients at high risk for preventive care management to reduce severe COPD exacerbations and improve outcomes. Objective: The aim of this study is to develop a more accurate model to predict severe COPD exacerbations. Methods: We examined all patients with COPD who visited the University of Washington Medicine facilities between 2011 and 2019 and identified 278 candidate features. By performing secondary analysis on 43,576 University of Washington Medicine data instances from 2011 to 2019, we created a machine learning model to predict severe COPD exacerbations in the next year for patients with COPD. Results: The final model had an area under the receiver operating characteristic curve of 0.866. When using the top 9.99\% (752/7529) of the patients with the largest predicted risk to set the cutoff threshold for binary classification, the model gained an accuracy of 90.33\% (6801/7529), a sensitivity of 56.6\% (103/182), and a specificity of 91.17\% (6698/7347). Conclusions: Our model provided a more accurate prediction of severe COPD exacerbations in the next year compared with prior published models. After further improvement of its performance measures (eg, by adding features extracted from clinical notes), our model could be used in a decision support tool to guide the identification of patients with COPD and at high risk for care management to improve outcomes. International Registered Report Identifier (IRRID): RR2-10.2196/13783 ", doi="10.2196/28953", url="https://www.jmir.org/2022/1/e28953", url="http://www.ncbi.nlm.nih.gov/pubmed/34989686" } @Article{info:doi/10.2196/32724, author="Kraus, Moritz and Saller, Michael Maximilian and Baumbach, Felix Sebastian and Neuerburg, Carl and Stumpf, Cordula Ulla and B{\"o}cker, Wolfgang and Keppler, Martin Alexander", title="Prediction of Physical Frailty in Orthogeriatric Patients Using Sensor Insole--Based Gait Analysis and Machine Learning Algorithms: Cross-sectional Study", journal="JMIR Med Inform", year="2022", month="Jan", day="5", volume="10", number="1", pages="e32724", keywords="wearables", keywords="insole sensors", keywords="orthogeriatric", keywords="artificial intelligence", keywords="prediction models", keywords="machine learning", keywords="gait analysis", keywords="digital sensors", keywords="digital health", keywords="aging", keywords="prediction algorithms", keywords="geriatric", keywords="mobile health", keywords="mobile insoles", abstract="Background: Assessment of the physical frailty of older patients is of great importance in many medical disciplines to be able to implement individualized therapies. For physical tests, time is usually used as the only objective measure. To record other objective factors, modern wearables offer great potential for generating valid data and integrating the data into medical decision-making. Objective: The aim of this study was to compare the predictive value of insole data, which were collected during the Timed-Up-and-Go (TUG) test, to the benchmark standard questionnaire for sarcopenia (SARC-F: strength, assistance with walking, rising from a chair, climbing stairs, and falls) and physical assessment (TUG test) for evaluating physical frailty, defined by the Short Physical Performance Battery (SPPB), using machine learning algorithms. 
Methods: This cross-sectional study included patients aged >60 years with independent ambulation and no mental or neurological impairment. A comprehensive set of parameters associated with physical frailty were assessed, including body composition, questionnaires (European Quality of Life 5-dimension [EQ 5D 5L], SARC-F), and physical performance tests (SPPB, TUG), along with digital sensor insole gait parameters collected during the TUG test. Physical frailty was defined as an SPPB score $\leq$8. Advanced statistics, including random forest (RF) feature selection and machine learning algorithms (K-nearest neighbor [KNN] and RF) were used to compare the diagnostic value of these parameters to identify patients with physical frailty. Results: Classified by the SPPB, 23 of the 57 eligible patients were defined as having physical frailty. Several gait parameters were significantly different between the two groups (with and without physical frailty). The area under the receiver operating characteristic curve (AUROC) of the TUG test was superior to that of the SARC-F (0.862 vs 0.639). The recursive feature elimination algorithm identified 9 parameters, 8 of which were digital insole gait parameters. Both the KNN and RF algorithms trained with these parameters resulted in excellent results (AUROC of 0.801 and 0.919, respectively). Conclusions: A gait analysis based on machine learning algorithms using sensor soles is superior to the SARC-F and the TUG test to identify physical frailty in orthogeriatric patients. ", doi="10.2196/32724", url="https://medinform.jmir.org/2022/1/e32724", url="http://www.ncbi.nlm.nih.gov/pubmed/34989684" } @Article{info:doi/10.2196/28783, author="Savoy, April and Saleem, J. Jason and Barker, C. Barry and Patel, Himalaya and Kara, Areeba", title="Clinician Perspectives on Unmet Needs for Mobile Technology Among Hospitalists: Workflow Analysis Based on Semistructured Interviews", journal="JMIR Hum Factors", year="2022", month="Jan", day="4", volume="9", number="1", pages="e28783", keywords="electronic health records", keywords="hospital medicine", keywords="user-computer interface", keywords="human-computer interaction", keywords="usability", keywords="mental workload", keywords="workflow analysis", abstract="Background: The hospitalist workday is cognitively demanding and dominated by activities away from patients' bedsides. Although mobile technologies are offered as solutions, clinicians report lower expectations of mobile technology after actual use. Objective: The purpose of this study is to better understand opportunities for integrating mobile technology and apps into hospitalists' workflows. We aim to identify difficult tasks and contextual factors that introduce inefficiencies and characterize hospitalists' perspectives on mobile technology and apps. Methods: We conducted a workflow analysis based on semistructured interviews. At a Midwestern US medical center, we recruited physicians and nurse practitioners from hospitalist and inpatient teaching teams and internal medicine residents. Interviews focused on tasks perceived as frequent, redundant, and difficult. Additionally, participants were asked to describe opportunities for mobile technology interventions. We analyzed contributing factors, impacted workflows, and mobile app ideas. Results: Over 3 months, we interviewed 12 hospitalists. Participants collectively identified chart reviews, orders, and documentation as the most frequent, redundant, and difficult tasks. 
Based on those tasks, the intake, discharge, and rounding workflows were characterized as difficult and inefficient. The difficulty was associated with a lack of access to electronic health records at the bedside. Contributing factors for inefficiencies were poor usability and inconsistent availability of health information technology combined with organizational policies. Participants thought mobile apps designed to improve team communications would be most beneficial. Based on our analysis, mobile apps focused on data entry and presentation supporting specific tasks should also be prioritized. Conclusions: Based on our results, there are prioritized opportunities for mobile technology to decrease difficulty and increase the efficiency of hospitalists' workflows. Mobile technology and task-specific mobile apps with enhanced usability could decrease overreliance on hospitalists' memory and fragmentation of clinical tasks across locations. This study informs the design and implementation processes of future health information technologies to improve continuity in hospital-based medicine. ", doi="10.2196/28783", url="https://humanfactors.jmir.org/2022/1/e28783", url="http://www.ncbi.nlm.nih.gov/pubmed/34643530" } @Article{info:doi/10.2196/32635, author="Kumar, Bharat and Zetumer, Samuel and Swee, Melissa and Endelman, Keyser Ellen L. and Suneja, Manish and Davis, Benjamin", title="Reducing Delays in Diagnosing Primary Immunodeficiency Through the Development and Implementation of a Clinical Decision Support Tool: Protocol for a Quality Improvement Project", journal="JMIR Res Protoc", year="2022", month="Jan", day="4", volume="11", number="1", pages="e32635", keywords="immunology", keywords="clinical decision support", keywords="diagnostic decision-making", abstract="Background: Primary immunodeficiencies (PIs) are a set of heterogeneous chronic disorders characterized by immune dysfunction. They are diagnostically challenging because of their clinical heterogeneity, knowledge gaps among primary care physicians, and continuing shortages of clinically trained immunologists. As a result, patients with undiagnosed PIs are at increased risk for recurrent infections, cancers, and autoimmune diseases. Objective: The aim of this research is to develop and implement a clinical decision support (CDS) tool for the identification of underlying PIs. Methods: We will develop and implement a CDS tool for the identification of underlying PIs among patients who receive primary care through a health care provider at the University of Iowa Hospitals and Clinics. The CDS tool will function through an algorithm that is based on the Immune Deficiency Foundation's 10 Warning Signs for Primary Immunodeficiency. Over the course of a year, we will use Lean Six Sigma principles and the Define, Measure, Analyze, Improve, and Control (DMAIC) framework to guide the project. The primary measure is the number of newly diagnosed PI patients per month. Secondary measures include the following: (1) the number of new patients identified by the CDS as being at high risk for PI, (2) the number of new PI cases in which immunoglobulin replacement or rotating antibiotics are started, (3) the cost of evaluation of each patient identified by the CDS tool as being at high risk for PIs, (4) the number of new consults not diagnosed with a PI, and (5) patient satisfaction with the process of referral to the Immunology Clinic. 
Results: This study was determined to not be Human Subjects Research by the Institutional Review Board at the University of Iowa. Data collection will begin in August 2021. Conclusions: The development and implementation of a CDS tool is a promising approach to identifying patients with underlying PI. This protocol assesses whether such an approach will be able to achieve its objective of reducing diagnostic delays. The disciplined approach, using Lean Six Sigma and the DMAIC framework, will guide implementation to maximize opportunities for a successful intervention that meets the study's goals and objectives as well as to allow for replication and adaptation of these methods at other sites. International Registered Report Identifier (IRRID): PRR1-10.2196/32635 ", doi="10.2196/32635", url="https://www.researchprotocols.org/2022/1/e32635", url="http://www.ncbi.nlm.nih.gov/pubmed/34587114" } @Article{info:doi/10.2196/34415, author="Ko, Hoon and Huh, Jimi and Kim, Won Kyung and Chung, Heewon and Ko, Yousun and Kim, Keun Jai and Lee, Hee Jei and Lee, Jinseok", title="A Deep Residual U-Net Algorithm for Automatic Detection and Quantification of Ascites on Abdominopelvic Computed Tomography Images Acquired in the Emergency Department: Model Development and Validation", journal="J Med Internet Res", year="2022", month="Jan", day="3", volume="24", number="1", pages="e34415", keywords="ascites", keywords="computed tomography", keywords="deep residual U-Net", keywords="artificial intelligence", abstract="Background: Detection and quantification of intra-abdominal free fluid (ie, ascites) on computed tomography (CT) images are essential processes for finding emergent or urgent conditions in patients. In an emergency department, automatic detection and quantification of ascites will be beneficial. Objective: We aimed to develop an artificial intelligence (AI) algorithm for the automatic detection and quantification of ascites simultaneously using a single deep learning model (DLM). Methods: We developed 2D DLMs based on deep residual U-Net, U-Net, bidirectional U-Net, and recurrent residual U-Net (R2U-Net) algorithms to segment areas of ascites on abdominopelvic CT images. Based on segmentation results, the DLMs detected ascites by classifying CT images into ascites images and nonascites images. The AI algorithms were trained using 6337 CT images from 160 subjects (80 with ascites and 80 without ascites) and tested using 1635 CT images from 40 subjects (20 with ascites and 20 without ascites). The performance of the AI algorithms was evaluated for diagnostic accuracy of ascites detection and for segmentation accuracy of ascites areas. Of these DLMs, we proposed an AI algorithm with the best performance. Results: The segmentation accuracy was the highest for the deep residual U-Net model with a mean intersection over union (mIoU) value of 0.87, followed by U-Net, bidirectional U-Net, and R2U-Net models (mIoU values of 0.80, 0.77, and 0.67, respectively). The detection accuracy was the highest for the deep residual U-Net model (0.96), followed by U-Net, bidirectional U-Net, and R2U-Net models (0.90, 0.88, and 0.82, respectively). The deep residual U-Net model also achieved high sensitivity (0.96) and high specificity (0.96). Conclusions: We propose a deep residual U-Net--based AI algorithm for automatic detection and quantification of ascites on abdominopelvic CT scans, which provides excellent performance. 
", doi="10.2196/34415", url="https://www.jmir.org/2022/1/e34415", url="http://www.ncbi.nlm.nih.gov/pubmed/34982041" } @Article{info:doi/10.2196/27008, author="Yao, Li-Hung and Leung, Ka-Chun and Tsai, Chu-Lin and Huang, Chien-Hua and Fu, Li-Chen", title="A Novel Deep Learning--Based System for Triage in the Emergency Department Using Electronic Medical Records: Retrospective Cohort Study", journal="J Med Internet Res", year="2021", month="Dec", day="27", volume="23", number="12", pages="e27008", keywords="emergency department", keywords="triage system", keywords="deep learning", keywords="hospital admission", keywords="data to text", keywords="electronic health record", abstract="Background: Emergency department (ED) crowding has resulted in delayed patient treatment and has become a universal health care problem. Although a triage system, such as the 5-level emergency severity index, somewhat improves the process of ED treatment, it still heavily relies on the nurse's subjective judgment and triages too many patients to emergency severity index level 3 in current practice. Hence, a system that can help clinicians accurately triage a patient's condition is imperative. Objective: This study aims to develop a deep learning--based triage system using patients' ED electronic medical records to predict clinical outcomes after ED treatments. Methods: We conducted a retrospective study using data from an open data set from the National Hospital Ambulatory Medical Care Survey from 2012 to 2016 and data from a local data set from the National Taiwan University Hospital from 2009 to 2015. In this study, we transformed structured data into text form and used convolutional neural networks combined with recurrent neural networks and attention mechanisms to accomplish the classification task. We evaluated our performance using area under the receiver operating characteristic curve (AUROC). Results: A total of 118,602 patients from the National Hospital Ambulatory Medical Care Survey were included in this study for predicting hospitalization, and the accuracy and AUROC were 0.83 and 0.87, respectively. On the other hand, an external experiment was to use our own data set from the National Taiwan University Hospital that included 745,441 patients, where the accuracy and AUROC were similar, that is, 0.83 and 0.88, respectively. Moreover, to effectively evaluate the prediction quality of our proposed system, we also applied the model to other clinical outcomes, including mortality and admission to the intensive care unit, and the results showed that our proposed method was approximately 3\% to 5\% higher in accuracy than other conventional methods. Conclusions: Our proposed method achieved better performance than the traditional method, and its implementation is relatively easy, it includes commonly used variables, and it is better suited for real-world clinical settings. It is our future work to validate our novel deep learning--based triage algorithm with prospective clinical trials, and we hope to use it to guide resource allocation in a busy ED once the validation succeeds. 
", doi="10.2196/27008", url="https://www.jmir.org/2021/12/e27008", url="http://www.ncbi.nlm.nih.gov/pubmed/34958305" } @Article{info:doi/10.2196/25328, author="Madalinski, Mariusz and Prudham, Roger", title="Can Real-time Computer-Aided Detection Systems Diminish the Risk of Postcolonoscopy Colorectal Cancer?", journal="JMIR Med Inform", year="2021", month="Dec", day="24", volume="9", number="12", pages="e25328", keywords="artificial intelligence", keywords="colonoscopy", keywords="adenoma", keywords="real-time computer-aided detection", keywords="colonic polyp", doi="10.2196/25328", url="https://medinform.jmir.org/2021/12/e25328", url="http://www.ncbi.nlm.nih.gov/pubmed/34571490" } @Article{info:doi/10.2196/31042, author="Burgon, Trever and Casebeer, Linda and Aasen, Holly and Valdenor, Czarlota and Tamondong-Lachica, Diana and de Belen, Enrico and Paculdo, David and Peabody, John", title="Measuring and Improving Evidence-Based Patient Care Using a Web-Based Gamified Approach in Primary Care (QualityIQ): Randomized Controlled Trial", journal="J Med Internet Res", year="2021", month="Dec", day="23", volume="23", number="12", pages="e31042", keywords="quality improvement", keywords="physician engagement", keywords="MIPS", keywords="case simulation", keywords="feedback", keywords="value-based care", keywords="care standardization", keywords="simulation", keywords="gamification", keywords="medical education", keywords="continuing education", keywords="outcome", keywords="serious game", keywords="decision-support", abstract="Background: Unwarranted variability in clinical practice is a challenging problem in practice today, leading to poor outcomes for patients and low-value care for providers, payers, and patients. Objective: In this study, we introduced a novel tool, QualityIQ, and determined the extent to which it helps primary care physicians to align care decisions with the latest best practices included in the Merit-Based Incentive Payment System (MIPS). Methods: We developed the fully automated QualityIQ patient simulation platform with real-time evidence-based feedback and gamified peer benchmarking. Each case included workup, diagnosis, and management questions with explicit evidence-based scoring criteria. We recruited practicing primary care physicians across the United States into the study via the web and conducted a cross-sectional study of clinical decisions among a national sample of primary care physicians, randomized to continuing medical education (CME) and non-CME study arms. Physicians ``cared'' for 8 weekly cases that covered typical primary care scenarios. We measured participation rates, changes in quality scores (including MIPS scores), self-reported practice change, and physician satisfaction with the tool. The primary outcomes for this study were evidence-based care scores within each case, adherence to MIPS measures, and variation in clinical decision-making among the primary care providers caring for the same patient. Results: We found strong, scalable engagement with the tool, with 75\% of participants (61 non-CME and 59 CME) completing at least 6 of 8 total cases. We saw significant improvement in evidence-based clinical decisions across multiple conditions, such as diabetes (+8.3\%, P<.001) and osteoarthritis (+7.6\%, P=.003) and with MIPS-related quality measures, such as diabetes eye examinations (+22\%, P<.001), depression screening (+11\%, P<.001), and asthma medications (+33\%, P<.001). 
Although the CME availability did not increase enrollment in the study, participants who were offered CME credits were more likely to complete at least 6 of the 8 cases. Conclusions: Although CME availability did not prove to be important, the short, clinically detailed case simulations with real-time feedback and gamified peer benchmarking did lead to significant improvements in evidence-based care decisions among all practicing physicians. Trial Registration: ClinicalTrials.gov NCT03800901; https://clinicaltrials.gov/ct2/show/NCT03800901 ", doi="10.2196/31042", url="https://www.jmir.org/2021/12/e31042", url="http://www.ncbi.nlm.nih.gov/pubmed/34941547" } @Article{info:doi/10.2196/26323, author="Uslu, Aykut and Stausberg, J{\"u}rgen", title="Value of the Electronic Medical Record for Hospital Care: Update From the Literature", journal="J Med Internet Res", year="2021", month="Dec", day="23", volume="23", number="12", pages="e26323", keywords="cost analysis", keywords="costs and cost analyses", keywords="economic advantage", keywords="electronic medical records", keywords="electronic records", keywords="health care", keywords="hospitals", keywords="medical records systems computerized", keywords="quality of health care", keywords="secondary data", abstract="Background: Electronic records could improve quality and efficiency of health care. National and international bodies propagate this belief worldwide. However, the evidence base concerning the effects and advantages of electronic records is questionable. The outcome of health care systems is influenced by many components, making assertions about specific types of interventions difficult. Moreover, electronic records itself constitute a complex intervention offering several functions with possibly positive as well as negative effects on the outcome of health care systems. Objective: The aim of this review is to summarize empirical studies about the value of electronic medical records (EMRs) for hospital care published between 2010 and spring 2019. Methods: The authors adopted their method from a series of literature reviews. The literature search was performed on MEDLINE with ``Medical Record System, Computerized'' as the essential keyword. The selection process comprised 2 phases looking for a consent of both authors. Starting with 1345 references, 23 were finally included in the review. The evaluation combined a scoring of the studies' quality, a description of data sources in case of secondary data analyses, and a qualitative assessment of the publications' conclusions concerning the medical record's impact on quality and efficiency of health care. Results: The majority of the studies stemmed from the United States (19/23, 83\%). Mostly, the studies used publicly available data (``secondary data studies''; 17/23, 74\%). A total of 18 studies analyzed the effect of an EMR on the quality of health care (78\%), 16 the effect on the efficiency of health care (70\%). The primary data studies achieved a mean score of 4.3 (SD 1.37; theoretical maximum 10); the secondary data studies a mean score of 7.1 (SD 1.26; theoretical maximum 9). From the primary data studies, 2 demonstrated a reduction of costs. There was not one study that failed to demonstrate a positive effect on the quality of health care. Overall, 9/16 respective studies showed a reduction of costs (56\%); 14/18 studies showed an increase of health care quality (78\%); the remaining 4 studies missed explicit information about the proposed positive effect. 
Conclusions: This review revealed clear evidence of the value of EMRs. In addition to a majority of studies demonstrating economic advantages, the review also showed improvements in quality of care in all respective studies. The use of secondary data studies has prevailed over primary data studies in the meantime. Future work could focus on specific aspects of electronic records to guide their implementation and operation. ", doi="10.2196/26323", url="https://www.jmir.org/2021/12/e26323", url="http://www.ncbi.nlm.nih.gov/pubmed/34941544" } @Article{info:doi/10.2196/26613, author="Hannan, Jafrul Md and Parveen, Kohinoor Mosammat and Hoque, Mozammel Md and Chowdhury, Kabir Tanvir and Hasan, Samiul Md and Nandy, Alak", title="Management of Acute Appendicitis in Children During COVID-19 and Perspectives of Pediatric Surgeons From South Asia: Survey Study", journal="JMIR Perioper Med", year="2021", month="Dec", day="21", volume="4", number="2", pages="e26613", keywords="COVID-19", keywords="gastrointestinal", keywords="pediatric", keywords="global surgery", abstract="Background: Nonoperative treatment (NOT) of pediatric appendicitis as opposed to surgery elicits great debate and is potentially influenced by physician preferences. Owing to the effects of the COVID-19 pandemic on health care, the practice of NOT has generally increased by necessity and may, in a post--COVID-19 world, change surgeons' perceptions of NOT. Objective: The aim of this study was to determine whether the use of NOT has increased in South Asia and whether these levels of practice would be sustained after the pandemic subsides. Methods: A survey was conducted among pediatric surgeons regarding their position, institute, and country; the number of appendicitis cases they managed; and their mode of treatment between identical time periods in 2019 and 2020 (April 1 to August 31). The survey also directly posed the question as to whether they would continue with the COVID-19--imposed level of NOT after the effect of the pandemic diminishes. Results: A total of 134 responses were collected out of 200 (67.0\%). A significant increase in the practice of NOT was observed for the entire cohort, although no effect was observed when grouped by country or institute. When grouped by position, senior physicians increased the practice of NOT the most, while junior physicians reported the least change. The data suggest that only professors would be inclined to maintain the COVID-19--level of NOT practice after the pandemic. Conclusions: Increased practice of NOT during the COVID-19 pandemic was observed in South Asia, particularly by senior surgeons. Only professors appeared inclined to consider maintaining this increased level of practice in the post--COVID-19 world. ", doi="10.2196/26613", url="https://periop.jmir.org/2021/2/e26613", url="http://www.ncbi.nlm.nih.gov/pubmed/34818209" } @Article{info:doi/10.2196/25899, author="Pecina, L. Jennifer and Nigon, M. Leah and Penza, S. Kristine and Murray, A. Martha and Kronebusch, J. Beckie and Miller, E. Nathaniel and Jensen, B. 
Teresa", title="Use of the McIsaac Score to Predict Group A Streptococcal Pharyngitis in Outpatient Nurse Phone Triage and Electronic Visits Compared With In-Person Visits: Retrospective Observational Study", journal="J Med Internet Res", year="2021", month="Dec", day="20", volume="23", number="12", pages="e25899", keywords="strep pharygitis", keywords="e-visit", keywords="electronic visit", keywords="telemedicine", keywords="telecare", keywords="virtual visit", keywords="McIssac score", keywords="nurse phone triage", keywords="scoring system", keywords="sore throat", keywords="group A streptococcus", keywords="telehealth", keywords="nurse", keywords="phone", keywords="triage", abstract="Background: The McIsaac criteria are a validated scoring system used to determine the likelihood of an acute sore throat being caused by group A streptococcus (GAS) to stratify patients who need strep testing. Objective: We aim to compare McIsaac criteria obtained during face-to-face (f2f) and non-f2f encounters. Methods: This retrospective study compared the percentage of positive GAS tests by McIsaac score for scores calculated during nurse protocol phone encounters, e-visits (electronic visits), and in person f2f clinic visits. Results: There was no difference in percentages of positive strep tests between encounter types for any of the McIsaac scores. There were significantly more phone and e-visit encounters with any missing score components compared with f2f visits. For individual score components, there were significantly fewer e-visits missing fever and cough information compared with phone encounters and f2f encounters. F2f encounters were significantly less likely to be missing descriptions of tonsils and lymphadenopathy compared with phone and e-visit encounters. McIsaac scores of 4 had positive GAS rates of 55\% to 68\% across encounter types. There were 4 encounters not missing any score components with a McIsaac score of 0. None of these 4 encounters had a positive GAS test. Conclusions: McIsaac scores of 4 collected during non-f2f care could be used to consider empiric treatment for GAS without testing if significant barriers to testing exist such as the COVID-19 pandemic or geographic barriers. Future studies should evaluate further whether non-f2f encounters with McIsaac scores of 0 can be safely excluded from GAS testing. ", doi="10.2196/25899", url="https://www.jmir.org/2021/12/e25899", url="http://www.ncbi.nlm.nih.gov/pubmed/34932016" } @Article{info:doi/10.2196/33540, author="Hah, Hyeyoung and Goldin, Shevit Deana", title="How Clinicians Perceive Artificial Intelligence--Assisted Technologies in Diagnostic Decision Making: Mixed Methods Approach", journal="J Med Internet Res", year="2021", month="Dec", day="16", volume="23", number="12", pages="e33540", keywords="artificial intelligence algorithms", keywords="AI", keywords="diagnostic capability", keywords="virtual care", keywords="multilevel modeling", keywords="human-AI teaming", keywords="natural language understanding", abstract="Background: With the rapid development of artificial intelligence (AI) and related technologies, AI algorithms are being embedded into various health information technologies that assist clinicians in clinical decision making. Objective: This study aimed to explore how clinicians perceive AI assistance in diagnostic decision making and suggest the paths forward for AI-human teaming for clinical decision making in health care. 
Methods: This study used a mixed methods approach, utilizing hierarchical linear modeling and sentiment analysis through natural language understanding techniques. Results: A total of 114 clinicians participated in online simulation surveys in 2020 and 2021. These clinicians studied family medicine and used AI algorithms to aid in patient diagnosis. Their overall sentiment toward AI-assisted diagnosis was positive and comparable with diagnoses made without the assistance of AI. However, AI-guided decision making was not congruent with the way clinicians typically made decisions in diagnosing illnesses. In a quantitative survey, clinicians reported perceiving current AI assistance as not likely to enhance diagnostic capability and negatively influenced their overall performance ($\beta$=--0.421, P=.02). Instead, clinicians' diagnostic capabilities tended to be associated with well-known parameters, such as education, age, and daily habit of technology use on social media platforms. Conclusions: This study elucidated clinicians' current perceptions and sentiments toward AI-enabled diagnosis. Although the sentiment was positive, the current form of AI assistance may not be linked with efficient decision making, as AI algorithms are not well aligned with subjective human reasoning in clinical diagnosis. Developers and policy makers in health could gather behavioral data from clinicians in various disciplines to help align AI algorithms with the unique subjective patterns of reasoning that humans employ in clinical diagnosis. ", doi="10.2196/33540", url="https://www.jmir.org/2021/12/e33540", url="http://www.ncbi.nlm.nih.gov/pubmed/34924356" } @Article{info:doi/10.2196/33267, author="Bang, Seok Chang and Lee, Jun Jae and Baik, Ho Gwang", title="Computer-Aided Diagnosis of Gastrointestinal Ulcer and Hemorrhage Using Wireless Capsule Endoscopy: Systematic Review and Diagnostic Test Accuracy Meta-analysis", journal="J Med Internet Res", year="2021", month="Dec", day="14", volume="23", number="12", pages="e33267", keywords="artificial intelligence", keywords="computer-aided diagnosis", keywords="capsule endoscopy", keywords="ulcer", keywords="hemorrhage", keywords="gastrointestinal", keywords="endoscopy", keywords="review", keywords="accuracy", keywords="meta-analysis", keywords="diagnostic", keywords="performance", keywords="machine learning", keywords="prediction models", abstract="Background: Interpretation of capsule endoscopy images or movies is operator-dependent and time-consuming. As a result, computer-aided diagnosis (CAD) has been applied to enhance the efficacy and accuracy of the review process. Two previous meta-analyses reported the diagnostic performance of CAD models for gastrointestinal ulcers or hemorrhage in capsule endoscopy. However, insufficient systematic reviews have been conducted, which cannot determine the real diagnostic validity of CAD models. Objective: To evaluate the diagnostic test accuracy of CAD models for gastrointestinal ulcers or hemorrhage using wireless capsule endoscopic images. Methods: We conducted core databases searching for studies based on CAD models for the diagnosis of ulcers or hemorrhage using capsule endoscopy and presenting data on diagnostic performance. Systematic review and diagnostic test accuracy meta-analysis were performed. Results: Overall, 39 studies were included. 
The pooled area under the curve, sensitivity, specificity, and diagnostic odds ratio of CAD models for the diagnosis of ulcers (or erosions) were .97 (95\% confidence interval, .95--.98), .93 (.89--.95), .92 (.89--.94), and 138 (79--243), respectively. The pooled area under the curve, sensitivity, specificity, and diagnostic odds ratio of CAD models for the diagnosis of hemorrhage (or angioectasia) were .99 (.98--.99), .96 (.94--0.97), .97 (.95--.99), and 888 (343--2303), respectively. Subgroup analyses showed robust results. Meta-regression showed that published year, number of training images, and target disease (ulcers vs erosions, hemorrhage vs angioectasia) was found to be the source of heterogeneity. No publication bias was detected. Conclusions: CAD models showed high performance for the optical diagnosis of gastrointestinal ulcer and hemorrhage in wireless capsule endoscopy. ", doi="10.2196/33267", url="https://www.jmir.org/2021/12/e33267", url="http://www.ncbi.nlm.nih.gov/pubmed/34904949" } @Article{info:doi/10.2196/26611, author="Ploug, Thomas and Sundby, Anna and Moeslund, B. Thomas and Holm, S{\o}ren", title="Population Preferences for Performance and Explainability of Artificial Intelligence in Health Care: Choice-Based Conjoint Survey", journal="J Med Internet Res", year="2021", month="Dec", day="13", volume="23", number="12", pages="e26611", keywords="artificial Intelligence", keywords="performance", keywords="transparency", keywords="explainability", keywords="population preferences", keywords="public policy", abstract="Background: Certain types of artificial intelligence (AI), that is, deep learning models, can outperform health care professionals in particular domains. Such models hold considerable promise for improved diagnostics, treatment, and prevention, as well as more cost-efficient health care. They are, however, opaque in the sense that their exact reasoning cannot be fully explicated. Different stakeholders have emphasized the importance of the transparency/explainability of AI decision making. Transparency/explainability may come at the cost of performance. There is need for a public policy regulating the use of AI in health care that balances the societal interests in high performance as well as in transparency/explainability. A public policy should consider the wider public's interests in such features of AI. Objective: This study elicited the public's preferences for the performance and explainability of AI decision making in health care and determined whether these preferences depend on respondent characteristics, including trust in health and technology and fears and hopes regarding AI. Methods: We conducted a choice-based conjoint survey of public preferences for attributes of AI decision making in health care in a representative sample of the adult Danish population. Initial focus group interviews yielded 6 attributes playing a role in the respondents' views on the use of AI decision support in health care: (1) type of AI decision, (2) level of explanation, (3) performance/accuracy, (4) responsibility for the final decision, (5) possibility of discrimination, and (6) severity of the disease to which the AI is applied. In total, 100 unique choice sets were developed using fractional factorial design. In a 12-task survey, respondents were asked about their preference for AI system use in hospitals in relation to 3 different scenarios. Results: Of the 1678 potential respondents, 1027 (61.2\%) participated. 
The respondents consider the physician having the final responsibility for treatment decisions the most important attribute, with 46.8\% of the total weight of attributes, followed by explainability of the decision (27.3\%) and whether the system has been tested for discrimination (14.8\%). Other factors, such as gender, age, level of education, whether respondents live rurally or in towns, respondents' trust in health and technology, and respondents' fears and hopes regarding AI, do not play a significant role in the majority of cases. Conclusions: The 3 factors that are most important to the public are, in descending order of importance, (1) that physicians are ultimately responsible for diagnostics and treatment planning, (2) that the AI decision support is explainable, and (3) that the AI system has been tested for discrimination. Public policy on AI system use in health care should give priority to such AI system use and ensure that patients are provided with information. ", doi="10.2196/26611", url="https://www.jmir.org/2021/12/e26611", url="http://www.ncbi.nlm.nih.gov/pubmed/34898454" } @Article{info:doi/10.2196/31333, author="Koldeweij, Charlotte and Clarke, Jonathan and Nijman, Joppe and Feather, Calandra and de Wildt, N. Saskia and Appelbaum, Nicholas", title="CE Accreditation and Barriers to CE Marking of Pediatric Drug Calculators for Mobile Devices: Scoping Review and Qualitative Analysis", journal="J Med Internet Res", year="2021", month="Dec", day="13", volume="23", number="12", pages="e31333", keywords="pediatric", keywords="drug dosage calculator", keywords="European regulations", keywords="safety", keywords="medical devices", keywords="medical errors", keywords="app", keywords="application", keywords="mobile health", keywords="pharmacy", abstract="Background: Pediatric drug calculators (PDCs) intended for clinical use qualify as medical devices under the Medical Device Directive and the Medical Device Regulation. The extent to which they comply with European standards on quality and safety is unknown. Objective: This study determines the number of PDCs available as mobile apps for use in the Netherlands that bear a CE mark, and explore the factors influencing the CE marking of such devices among app developers. Methods: A scoping review of Google Play Store and Apple App Store was conducted to identify PDCs available for download in the Netherlands. CE accreditation of the sampled apps was determined by consulting the app landing pages on app stores, by screening the United Kingdom Medicines and Healthcare products Regulatory Agency's online registry of medical devices, and by surveying app developers. The barriers to CE accreditation were also explored through a survey of app developers. Results: Of 632 screened apps, 74 were eligible, including 60 pediatric drug dosage calculators and 14 infusion rate calculators. One app was CE marked. Of the 20 (34\%) respondents to the survey, 8 considered their apps not to be medical devices based on their intent of use or functionality. Three developers had not aimed to make their app available for use in Europe. Other barriers that may explain the limited CE accreditation of sampled PDC apps included poor awareness of European regulations among developers and a lack of restrictions when placing PDCs in app stores. Conclusions: The compliance of PDCs with European standards on medical devices is poor. 
This puts clinicians and their patients at risk of medical errors resulting from the largely unrestricted use of these apps. ", doi="10.2196/31333", url="https://www.jmir.org/2021/12/e31333", url="http://www.ncbi.nlm.nih.gov/pubmed/34898456" } @Article{info:doi/10.2196/28120, author="Guedalia, Joshua and Lipschuetz, Michal and Cohen, M. Sarah and Sompolinsky, Yishai and Walfisch, Asnat and Sheiner, Eyal and Sergienko, Ruslan and Rosenbloom, Joshua and Unger, Ron and Yagel, Simcha and Hochler, Hila", title="Transporting an Artificial Intelligence Model to Predict Emergency Cesarean Delivery: Overcoming Challenges Posed by Interfacility Variation", journal="J Med Internet Res", year="2021", month="Dec", day="10", volume="23", number="12", pages="e28120", keywords="machine learning", keywords="algorithm transport", keywords="health outcomes", keywords="health care facilities", keywords="artificial intelligence", keywords="AI", keywords="ML", keywords="pregnancy", keywords="birth", keywords="pediatrics", keywords="neonatal", keywords="prenatal", doi="10.2196/28120", url="https://www.jmir.org/2021/12/e28120", url="http://www.ncbi.nlm.nih.gov/pubmed/34890352" } @Article{info:doi/10.2196/30238, author="Rossetti, Collins Sarah and Dykes, C. Patricia and Knaplund, Christopher and Kang, Min-Jeoung and Schnock, Kumiko and Garcia Jr, Pedro Jose and Fu, Li-Heng and Chang, Frank and Thai, Tien and Fred, Matthew and Korach, Z. Tom and Zhou, Li and Klann, G. Jeffrey and Albers, David and Schwartz, Jessica and Lowenthal, Graham and Jia, Haomiao and Liu, Fang and Cato, Kenrick", title="The Communicating Narrative Concerns Entered by Registered Nurses (CONCERN) Clinical Decision Support Early Warning System: Protocol for a Cluster Randomized Pragmatic Clinical Trial", journal="JMIR Res Protoc", year="2021", month="Dec", day="10", volume="10", number="12", pages="e30238", keywords="nursing documentation", keywords="prediction", keywords="early warning system", keywords="deterioration", keywords="clinical trial", keywords="clinical decision support system", keywords="natural language processing", abstract="Background: Every year, hundreds of thousands of inpatients die from cardiac arrest and sepsis, which could be avoided if those patients' risk for deterioration were detected and timely interventions were initiated. Thus, a system is needed to convert real-time, raw patient data into consumable information that clinicians can utilize to identify patients at risk of deterioration and thus prevent mortality and improve patient health outcomes. The overarching goal of the COmmunicating Narrative Concerns Entered by Registered Nurses (CONCERN) study is to implement and evaluate an early warning score system that provides clinical decision support (CDS) in electronic health record systems. With a combination of machine learning and natural language processing, the CONCERN CDS utilizes nursing documentation patterns as indicators of nurses' increased surveillance to predict when patients are at the risk of clinical deterioration. Objective: The objective of this cluster randomized pragmatic clinical trial is to evaluate the effectiveness and usability of the CONCERN CDS system at 2 different study sites. The specific aim is to decrease hospitalized patients' negative health outcomes (in-hospital mortality, length of stay, cardiac arrest, unanticipated intensive care unit transfers, and 30-day hospital readmission rates). 
Methods: A multiple time-series intervention consisting of 3 phases will be performed through a 1-year period during the cluster randomized pragmatic clinical trial. Phase 1 evaluates the adoption of our algorithm through pilot and trial testing, phase 2 activates optimized versions of the CONCERN CDS based on experience from phase 1, and phase 3 will be a silent release mode where no CDS is viewable to the end user. The intervention deals with a series of processes from system release to evaluation. The system release includes CONCERN CDS implementation and user training. Then, a mixed methods approach will be used with end users to assess the system and clinician perspectives. Results: Data collection and analysis are expected to conclude by August 2022. Based on our previous work on CONCERN, we expect the system to have a positive impact on the mortality rate and length of stay. Conclusions: The CONCERN CDS will increase team-based situational awareness and shared understanding of patients predicted to be at risk for clinical deterioration in need of intervention to prevent mortality and associated harm. Trial Registration: ClinicalTrials.gov NCT03911687; https://clinicaltrials.gov/ct2/show/NCT03911687 International Registered Report Identifier (IRRID): DERR1-10.2196/30238 ", doi="10.2196/30238", url="https://www.researchprotocols.org/2021/12/e30238", url="http://www.ncbi.nlm.nih.gov/pubmed/34889766" } @Article{info:doi/10.2196/32698, author="Pan, Youcheng and Wang, Chenghao and Hu, Baotian and Xiang, Yang and Wang, Xiaolong and Chen, Qingcai and Chen, Junjie and Du, Jingcheng", title="A BERT-Based Generation Model to Transform Medical Texts to SQL Queries for Electronic Medical Records: Model Development and Validation", journal="JMIR Med Inform", year="2021", month="Dec", day="8", volume="9", number="12", pages="e32698", keywords="electronic medical record", keywords="text-to-SQL generation", keywords="BERT", keywords="grammar-based decoding", keywords="tree-structured intermediate representation", abstract="Background: Electronic medical records (EMRs) are usually stored in relational databases that require SQL queries to retrieve information of interest. Effectively completing such queries can be a challenging task for medical experts due to the barriers in expertise. Existing text-to-SQL generation studies have not been fully embraced in the medical domain. Objective: The objective of this study was to propose a neural generation model that can jointly consider the characteristics of medical text and the SQL structure to automatically transform medical texts to SQL queries for EMRs. Methods: We proposed a medical text--to-SQL model (MedTS), which employed a pretrained Bidirectional Encoder Representations From Transformers model as the encoder and leveraged a grammar-based long short-term memory network as the decoder to predict the intermediate representation that can easily be transformed into the final SQL query. We adopted the syntax tree as the intermediate representation rather than directly regarding the SQL query as an ordinary word sequence, which is more in line with the tree-structure nature of SQL and can also effectively reduce the search space during generation. Experiments were conducted on the MIMICSQL dataset, and 5 competitor methods were compared. 
Results: Experimental results demonstrated that MedTS achieved accuracies of 0.784 and 0.899 on the test set in terms of logic form and execution, respectively, which significantly outperformed the existing state-of-the-art methods. Further analyses proved that the performance on each component of the generated SQL was relatively balanced and offered substantial improvements. Conclusions: The proposed MedTS was effective and robust for improving the performance of medical text--to-SQL generation, indicating strong potential for application in real medical scenarios. ", doi="10.2196/32698", url="https://medinform.jmir.org/2021/12/e32698", url="http://www.ncbi.nlm.nih.gov/pubmed/34889749" } @Article{info:doi/10.2196/25022, author="Singh, Janmajay and Sato, Masahiro and Ohkuma, Tomoko", title="On Missingness Features in Machine Learning Models for Critical Care: Observational Study", journal="JMIR Med Inform", year="2021", month="Dec", day="8", volume="9", number="12", pages="e25022", keywords="electronic health records", keywords="informative missingness", keywords="machine learning", keywords="missing data", keywords="hospital mortality", keywords="sepsis", abstract="Background: Missing data in electronic health records are inevitable and considered to be nonrandom. Several studies have found that features indicating missing patterns (missingness) encode useful information about a patient's health and advocate for their inclusion in clinical prediction models. However, their effectiveness has not been comprehensively evaluated. Objective: The goal of the research is to study the effect of including informative missingness features in machine learning models for various clinically relevant outcomes and to explore the robustness of these features across patient subgroups and task settings. Methods: A total of 48,336 electronic health records from the 2012 and 2019 PhysioNet Challenges were used, and mortality, length of stay, and sepsis outcomes were chosen. The latter data set was multicenter, allowing external validation. Gated recurrent units were used to learn sequential patterns in the data and classify or predict labels of interest. Models were evaluated on various criteria and across population subgroups, assessing discriminative ability and calibration. Results: Model performance in retrospective tasks generally improved when missingness features were included. The extent of improvement depended on the outcome of interest (area under the receiver operating characteristic curve [AUROC] improved from 1.2\% to 7.7\%) and even on the patient subgroup. However, missingness features did not display utility in a simulated prospective setting, being outperformed (0.9\% difference in AUROC) by the model relying only on pathological features. This was despite missingness features leading to earlier detection of disease (true positives), because including them also led to a concomitant rise in false positive detections. Conclusions: This study comprehensively evaluated the effectiveness of missingness features in machine learning models. A detailed understanding of how these features affect model performance may lead to their informed use in clinical settings, especially for administrative tasks like length of stay prediction, where they present the greatest benefit. While missingness features, representative of health care processes, vary greatly due to intra- and interhospital factors, they may still be used in prediction models for clinically relevant outcomes. 
However, their use in prospective models producing frequent predictions needs to be explored further. ", doi="10.2196/25022", url="https://medinform.jmir.org/2021/12/e25022", url="http://www.ncbi.nlm.nih.gov/pubmed/34889756" } @Article{info:doi/10.2196/33049, author="Cha, Dongchul and Pae, Chongwon and Lee, A. Se and Na, Gina and Hur, Kyun Young and Lee, Young Ho and Cho, Ra A. and Cho, Joon Young and Han, Gil Sang and Kim, Huhn Sung and Choi, Young Jae and Park, Hae-Jeong", title="Differential Biases and Variabilities of Deep Learning--Based Artificial Intelligence and Human Experts in Clinical Diagnosis: Retrospective Cohort and Survey Study", journal="JMIR Med Inform", year="2021", month="Dec", day="8", volume="9", number="12", pages="e33049", keywords="human-machine cooperation", keywords="convolutional neural network", keywords="deep learning, class imbalance problem", keywords="otoscopy", keywords="eardrum", keywords="artificial intelligence", keywords="otology", keywords="computer-aided diagnosis", abstract="Background: Deep learning (DL)--based artificial intelligence may have different diagnostic characteristics than human experts in medical diagnosis. As a data-driven knowledge system, heterogeneous population incidence in the clinical world is considered to cause more bias to DL than clinicians. Conversely, by experiencing limited numbers of cases, human experts may exhibit large interindividual variability. Thus, understanding how the 2 groups classify given data differently is an essential step for the cooperative usage of DL in clinical application. Objective: This study aimed to evaluate and compare the differential effects of clinical experience in otoendoscopic image diagnosis in both computers and physicians exemplified by the class imbalance problem and guide clinicians when utilizing decision support systems. Methods: We used digital otoendoscopic images of patients who visited the outpatient clinic in the Department of Otorhinolaryngology at Severance Hospital, Seoul, South Korea, from January 2013 to June 2019, for a total of 22,707 otoendoscopic images. We excluded similar images, and 7500 otoendoscopic images were selected for labeling. We built a DL-based image classification model to classify the given image into 6 disease categories. Two test sets of 300 images were populated: balanced and imbalanced test sets. We included 14 clinicians (otolaryngologists and nonotolaryngology specialists including general practitioners) and 13 DL-based models. We used accuracy (overall and per-class) and kappa statistics to compare the results of individual physicians and the ML models. Results: Our ML models had consistently high accuracies (balanced test set: mean 77.14\%, SD 1.83\%; imbalanced test set: mean 82.03\%, SD 3.06\%), equivalent to those of otolaryngologists (balanced: mean 71.17\%, SD 3.37\%; imbalanced: mean 72.84\%, SD 6.41\%) and far better than those of nonotolaryngologists (balanced: mean 45.63\%, SD 7.89\%; imbalanced: mean 44.08\%, SD 15.83\%). However, ML models suffered from class imbalance problems (balanced test set: mean 77.14\%, SD 1.83\%; imbalanced test set: mean 82.03\%, SD 3.06\%). This was mitigated by data augmentation, particularly for low incidence classes, but rare disease classes still had low per-class accuracies. Human physicians, despite being less affected by prevalence, showed high interphysician variability (ML models: kappa=0.83, SD 0.02; otolaryngologists: kappa=0.60, SD 0.07). 
Conclusions: Even though ML models deliver excellent performance in classifying ear disease, physicians and ML models have their own strengths. ML models have consistent and high accuracy while considering only the given image and show bias toward prevalence, whereas human physicians have varying performance but do not show bias toward prevalence and may also consider extra information that is not images. To deliver the best patient care in the shortage of otolaryngologists, our ML model can serve a cooperative role for clinicians with diverse expertise, as long as it is kept in mind that models consider only images and could be biased toward prevalent diseases even after data augmentation. ", doi="10.2196/33049", url="https://medinform.jmir.org/2021/12/e33049", url="http://www.ncbi.nlm.nih.gov/pubmed/34889764" } @Article{info:doi/10.2196/33296, author="Izadi, Neda and Etemad, Koorosh and Mehrabi, Yadollah and Eshrati, Babak and Hashemi Nazari, Saeed Seyed", title="The Standardization of Hospital-Acquired Infection Rates Using Prediction Models in Iran: Observational Study of National Nosocomial Infection Registry Data", journal="JMIR Public Health Surveill", year="2021", month="Dec", day="7", volume="7", number="12", pages="e33296", keywords="hospital-acquired infections", keywords="standardized infection ratio", keywords="prediction model", keywords="Iran", abstract="Background: Many factors contribute to the spreading of hospital-acquired infections (HAIs). Objective: This study aimed to standardize the HAI rate using prediction models in Iran based on the National Healthcare Safety Network (NHSN) method. Methods: In this study, the Iranian nosocomial infections surveillance system (INIS) was used to gather data on patients with HAIs (126,314 infections). In addition, the hospital statistics and information system (AVAB) was used to collect data on hospital characteristics. First, well-performing hospitals, including 357 hospitals from all over the country, were selected. Data were randomly split into training (70\%) and testing (30\%) sets. Finally, the standardized infection ratio (SIR) and the corrected SIR were calculated for the HAIs. Results: The mean age of the 100,110 patients with an HAI was 40.02 (SD 23.56) years. The corrected SIRs based on the observed and predicted infections for respiratory tract infections (RTIs), urinary tract infections (UTIs), surgical site infections (SSIs), and bloodstream infections (BSIs) were 0.03 (95\% CI 0-0.09), 1.02 (95\% CI 0.95-1.09), 0.93 (95\% CI 0.85-1.007), and 0.91 (95\% CI 0.54-1.28), respectively. Moreover, the corrected SIRs for RTIs in the infectious disease, burn, obstetrics and gynecology, and internal medicine wards; UTIs in the burn, infectious disease, internal medicine, and intensive care unit wards; SSIs in the burn and infectious disease wards; and BSIs in most wards were >1, indicating that more HAIs were observed than expected. Conclusions: The results of this study can help to promote preventive measures based on scientific evidence. They can also lead to the continuous improvement of the monitoring system by collecting and systematically analyzing data on HAIs and encourage the hospitals to better control their infection rates by establishing a benchmarking system. ", doi="10.2196/33296", url="https://publichealth.jmir.org/2021/12/e33296", url="http://www.ncbi.nlm.nih.gov/pubmed/34879002" } @Article{info:doi/10.2196/29225, author="Kim, S. 
Rachel and Simon, Steven and Powers, Brett and Sandhu, Amneet and Sanchez, Jose and Borne, T. Ryan and Tumolo, Alexis and Zipse, Matthew and West, Jason J. and Aleong, Ryan and Tzou, Wendy and Rosenberg, A. Michael", title="Machine Learning Methodologies for Prediction of Rhythm-Control Strategy in Patients Diagnosed With Atrial Fibrillation: Observational, Retrospective, Case-Control Study", journal="JMIR Med Inform", year="2021", month="Dec", day="6", volume="9", number="12", pages="e29225", keywords="atrial fibrillation", keywords="rhythm-control", keywords="machine learning", keywords="ablation", keywords="antiarrhythmia agents", keywords="data science", keywords="biostatistics", keywords="artificial intelligence", abstract="Background: The identification of an appropriate rhythm management strategy for patients diagnosed with atrial fibrillation (AF) remains a major challenge for providers. Although clinical trials have identified subgroups of patients in whom a rate- or rhythm-control strategy might be indicated to improve outcomes, the wide range of presentations and risk factors among patients presenting with AF makes such approaches challenging. The strength of electronic health records is the ability to build in logic to guide management decisions, such that the system can automatically identify patients in whom a rhythm-control strategy is more likely and can promote efficient referrals to specialists. However, like any clinical decision support tool, there is a balance between interpretability and accurate prediction. Objective: This study aims to create an electronic health record--based prediction tool to guide patient referral to specialists for rhythm-control management by comparing different machine learning algorithms. Methods: We compared machine learning models of increasing complexity and used up to 50,845 variables to predict the rhythm-control strategy in 42,022 patients within the University of Colorado Health system at the time of AF diagnosis. Models were evaluated on the basis of their classification accuracy, defined by the F1 score and other metrics, and interpretability, captured by inspection of the relative importance of each predictor. Results: We found that age was by far the strongest single predictor of a rhythm-control strategy but that greater accuracy could be achieved with more complex models incorporating neural networks and more predictors for each participant. We determined that the impact of better prediction models was notable primarily in the rate of inappropriate referrals for rhythm-control, in which more complex models provided an average of 20\% fewer inappropriate referrals than simpler, more interpretable models. Conclusions: We conclude that any health care system seeking to incorporate algorithms to guide rhythm management for patients with AF will need to address this trade-off between prediction accuracy and model interpretability. ", doi="10.2196/29225", url="https://medinform.jmir.org/2021/12/e29225", url="http://www.ncbi.nlm.nih.gov/pubmed/34874889" } @Article{info:doi/10.2196/23571, author="M{\"u}ller, Lars and Srinivasan, Aditya and Abeles, R. Shira and Rajagopal, Amutha and Torriani, J. 
Francesca and Aronoff-Spencer, Eliah", title="A Risk-Based Clinical Decision Support System for Patient-Specific Antimicrobial Therapy (iBiogram): Design and Retrospective Analysis", journal="J Med Internet Res", year="2021", month="Dec", day="3", volume="23", number="12", pages="e23571", keywords="antimicrobial resistance", keywords="clinical decision support", keywords="antibiotic stewardship", keywords="data visualization", abstract="Background: There is a pressing need for digital tools that can leverage big data to help clinicians select effective antibiotic treatments in the absence of timely susceptibility data. Clinical presentation and local epidemiology can inform therapy selection to balance the risk of antimicrobial resistance and patient risk. However, data and clinical expertise must be appropriately integrated into clinical workflows. Objective: The aim of this study is to leverage available data in electronic health records, to develop a data-driven, user-centered, clinical decision support system to navigate patient safety and population health. Methods: We analyzed 5 years of susceptibility testing (1,078,510 isolates) and patient data (30,761 patients) across a large academic medical center. After curating the data according to the Clinical and Laboratory Standards Institute guidelines, we analyzed and visualized the impact of risk factors on clinical outcomes. On the basis of this data-driven understanding, we developed a probabilistic algorithm that maps these data to individual cases and implemented iBiogram, a prototype digital empiric antimicrobial clinical decision support system, which we evaluated against actual prescribing outcomes. Results: We determined patient-specific factors across syndromes and contexts and identified relevant local patterns of antimicrobial resistance by clinical syndrome. Mortality and length of stay differed significantly depending on these factors and could be used to generate heuristic targets for an acceptable risk of underprescription. Combined with the developed remaining risk algorithm, these factors can be used to inform clinicians' reasoning. A retrospective comparison of the iBiogram-suggested therapies versus the actual prescription by physicians showed similar performance for low-risk diseases such as urinary tract infections, whereas iBiogram recognized risk and recommended more appropriate coverage in high mortality conditions such as sepsis. Conclusions: The application of such data-driven, patient-centered tools may guide empirical prescription for clinicians to balance morbidity and mortality with antimicrobial stewardship. ", doi="10.2196/23571", url="https://www.jmir.org/2021/12/e23571", url="http://www.ncbi.nlm.nih.gov/pubmed/34870601" } @Article{info:doi/10.2196/31053, author="van Gils, M. Aniek and Visser, NC Leonie and Hendriksen, MA Heleen and Georges, Jean and Muller, Majon and Bouwman, H. Femke and van der Flier, M. 
Wiesje and Rhodius-Meester, FM Hanneke", title="Assessing the Views of Professionals, Patients, and Care Partners Concerning the Use of Computer Tools in Memory Clinics: International Survey Study", journal="JMIR Form Res", year="2021", month="Dec", day="3", volume="5", number="12", pages="e31053", keywords="artificial intelligence", keywords="clinical decision support systems", keywords="dementia", keywords="diagnostic testing", keywords="diagnosis", keywords="prognosis", keywords="communication", abstract="Background: Computer tools based on artificial intelligence could aid clinicians in memory clinics in several ways, such as by supporting diagnostic decision-making, web-based cognitive testing, and the communication of diagnosis and prognosis. Objective: This study aims to identify the preferences as well as the main barriers and facilitators related to using computer tools in memory clinics for all end users, that is, clinicians, patients, and care partners. Methods: Between July and October 2020, we sent out invitations to a web-based survey to clinicians using the European Alzheimer's Disease Centers network and the Dutch Memory Clinic network, and 109 clinicians participated (mean age 45 years, SD 10; 53/109, 48.6\% female). A second survey was created for patients and care partners. They were invited via Alzheimer Europe, Alzheimer's Society United Kingdom, Amsterdam Dementia Cohort, and Amsterdam Aging Cohort. A total of 50 patients with subjective cognitive decline, mild cognitive impairment, or dementia (mean age 73 years, SD 8; 17/34, 34\% female) and 46 care partners (mean age 65 years, SD 12; 25/54, 54\% female) participated in this survey. Results: Most clinicians reported a willingness to use diagnostic (88/109, 80.7\%) and prognostic (83/109, 76.1\%) computer tools. User-friendliness (71/109, 65.1\%; Likert scale mean 4.5, SD 0.7) and increasing diagnostic accuracy (76/109, 69.7\%; mean 4.3, SD 0.7) were reported as the main factors stimulating the adoption of a tool. Tools should also save time and provide clear information on reliability and validity. Inadequate integration with electronic patient records (46/109, 42.2\%; mean 3.8, SD 1.0) and fear of losing important clinical information (48/109, 44\%; mean 3.7, SD 1.2) were most frequently indicated as barriers. Patients and care partners were equally positive about the use of computer tools by clinicians, both for diagnosis (69/96, 72\%) and prognosis (73/96, 76\%). In addition, most of them thought favorably about the possibility of using the tools themselves. Conclusions: This study showed that computer tools in memory clinics are positively valued by most end users. For further development and implementation, it is essential to overcome the technical and practical barriers of a tool while paying utmost attention to its reliability and validity. ", doi="10.2196/31053", url="https://formative.jmir.org/2021/12/e31053", url="http://www.ncbi.nlm.nih.gov/pubmed/34870612" } @Article{info:doi/10.2196/27024, author="Mosa, Mohammad Abu Saleh and Rana, Zaman Md Kamruz and Islam, Humayera and Hossain, Mosharraf A. K. M. 
and Yoo, Illhoi", title="A Smartphone-Based Decision Support Tool for Predicting Patients at Risk of Chemotherapy-Induced Nausea and Vomiting: Retrospective Study on App Development Using Decision Tree Induction", journal="JMIR Mhealth Uhealth", year="2021", month="Dec", day="2", volume="9", number="12", pages="e27024", keywords="chemotherapy", keywords="CINV risk factors", keywords="data mining", keywords="prediction", keywords="decision trees", keywords="clinical decision support", keywords="smartphone app", abstract="Background: Chemotherapy-induced nausea and vomiting (CINV) are the two most frightful and unpleasant side effects of chemotherapy. CINV is accountable for poor treatment outcomes, treatment failure, or even death. It can affect patients' overall quality of life, leading to many social, economic, and clinical consequences. Objective: This study compared the performances of different data mining models for predicting the risk of CINV among the patients and developed a smartphone app for clinical decision support to recommend the risk of CINV at the point of care. Methods: Data were collected by retrospective record review from the electronic medical records used at the University of Missouri Ellis Fischel Cancer Center. Patients who received chemotherapy and standard antiemetics at the oncology outpatient service from June 1, 2010, to July 31, 2012, were included in the study. There were six independent data sets of patients based on emetogenicity (low, moderate, and high) and two phases of CINV (acute and delayed). A total of 14 risk factors of CINV were chosen for data mining. For our study, we used five popular data mining algorithms: (1) naive Bayes algorithm, (2) logistic regression classifier, (3) neural network, (4) support vector machine (using sequential minimal optimization), and (5) decision tree. Performance measures, such as accuracy, sensitivity, and specificity with 10-fold cross-validation, were used for model comparisons. A smartphone app called CINV Risk Prediction Application was developed using the ResearchKit in iOS utilizing the decision tree algorithm, which conforms to the criteria of explainable, usable, and actionable artificial intelligence. The app was created using both the bulk questionnaire approach and the adaptive approach. Results: The decision tree performed well in both phases of high emetogenic chemotherapies, with a significant margin compared to the other algorithms. The accuracy measure for the six patient groups ranged from 79.3\% to 94.8\%. The app was developed using the results from the decision tree because of its consistent performance and simple, explainable nature. The bulk questionnaire approach asks 14 questions in the smartphone app, while the adaptive approach can determine questions based on the previous questions' answers. The adaptive approach saves time and can be beneficial when used at the point of care. Conclusions: This study solved a real clinical problem, and the solution can be used for personalized and precise evidence-based CINV management, leading to a better life quality for patients and reduced health care costs. 
", doi="10.2196/27024", url="https://mhealth.jmir.org/2021/12/e27024", url="http://www.ncbi.nlm.nih.gov/pubmed/34860677" } @Article{info:doi/10.2196/22798, author="Chang, Wei Che and Lai, Feipei and Christian, Mesakh and Chen, Chun Yu and Hsu, Ching and Chen, Shen Yo and Chang, Hao Dun and Roan, Luen Tyng and Yu, Che Yen", title="Deep Learning--Assisted Burn Wound Diagnosis: Diagnostic Model Development Study", journal="JMIR Med Inform", year="2021", month="Dec", day="2", volume="9", number="12", pages="e22798", keywords="deep learning", keywords="semantic segmentation", keywords="instance segmentation", keywords="burn wounds", keywords="percentage total body surface area", abstract="Background: Accurate assessment of the percentage total body surface area (\%TBSA) of burn wounds is crucial in the management of burn patients. The resuscitation fluid and nutritional needs of burn patients, their need for intensive unit care, and probability of mortality are all directly related to \%TBSA. It is difficult to estimate a burn area of irregular shape by inspection. Many articles have reported discrepancies in estimating \%TBSA by different doctors. Objective: We propose a method, based on deep learning, for burn wound detection, segmentation, and calculation of \%TBSA on a pixel-to-pixel basis. Methods: A 2-step procedure was used to convert burn wound diagnosis into \%TBSA. In the first step, images of burn wounds were collected from medical records and labeled by burn surgeons, and the data set was then input into 2 deep learning architectures, U-Net and Mask R-CNN, each configured with 2 different backbones, to segment the burn wounds. In the second step, we collected and labeled images of hands to create another data set, which was also input into U-Net and Mask R-CNN to segment the hands. The \%TBSA of burn wounds was then calculated by comparing the pixels of mask areas on images of the burn wound and hand of the same patient according to the rule of hand, which states that one's hand accounts for 0.8\% of TBSA. Results: A total of 2591 images of burn wounds were collected and labeled to form the burn wound data set. The data set was randomly split into training, validation, and testing sets in a ratio of 8:1:1. Four hundred images of volar hands were collected and labeled to form the hand data set, which was also split into 3 sets using the same method. For the images of burn wounds, Mask R-CNN with ResNet101 had the best segmentation result with a Dice coefficient (DC) of 0.9496, while U-Net with ResNet101 had a DC of 0.8545. For the hand images, U-Net and Mask R-CNN had similar performance with DC values of 0.9920 and 0.9910, respectively. Lastly, we conducted a test diagnosis in a burn patient. Mask R-CNN with ResNet101 had on average less deviation (0.115\% TBSA) from the ground truth than burn surgeons. Conclusions: This is one of the first studies to diagnose all depths of burn wounds and convert the segmentation results into \%TBSA using different deep learning models. We aimed to assist medical staff in estimating burn size more accurately, thereby helping to provide precise care to burn victims. ", doi="10.2196/22798", url="https://medinform.jmir.org/2021/12/e22798", url="http://www.ncbi.nlm.nih.gov/pubmed/34860674" } @Article{info:doi/10.2196/23440, author="Alanazi, M. 
Eman and Abdou, Aalaa and Luo, Jake", title="Predicting Risk of Stroke From Lab Tests Using Machine Learning Algorithms: Development and Evaluation of Prediction Models", journal="JMIR Form Res", year="2021", month="Dec", day="2", volume="5", number="12", pages="e23440", keywords="stroke", keywords="lab tests", keywords="machine learning technology", keywords="predictive analytics", abstract="Background: Stroke, a cerebrovascular disease, is one of the major causes of death. It causes significant health and financial burdens for both patients and health care systems. One of the important risk factors for stroke is health-related behavior, which is becoming an increasingly important focus of prevention. Many machine learning models have been built to predict the risk of stroke or to automatically diagnose stroke, using predictors such as lifestyle factors or radiological imaging. However, there have been no models built using data from lab tests. Objective: The aim of this study was to apply computational methods using machine learning techniques to predict stroke from lab test data. Methods: We used the National Health and Nutrition Examination Survey data sets with three different data selection methods (ie, without data resampling, with data imputation, and with data resampling) to develop predictive models. We used four machine learning classifiers and six performance measures to evaluate the performance of the models. Results: We found that accurate and sensitive machine learning models can be created to predict stroke from lab test data. Our results show that the data resampling approach performed the best compared to the other two data selection techniques. Prediction with the random forest algorithm, which was the best algorithm tested, achieved an accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve of 0.96, 0.97, 0.96, 0.75, 0.99, and 0.97, respectively, when all of the attributes were used. Conclusions: The predictive model, built using data from lab tests, was easy to use and had high accuracy. In future studies, we aim to use data that reflect different types of stroke and to explore the data to build a prediction model for each type. ", doi="10.2196/23440", url="https://formative.jmir.org/2021/12/e23440", url="http://www.ncbi.nlm.nih.gov/pubmed/34860663" } @Article{info:doi/10.2196/32507, author="Ben-Shabat, Niv and Sloma, Ariel and Weizman, Tomer and Kiderman, David and Amital, Howard", title="Assessing the Performance of a New Artificial Intelligence--Driven Diagnostic Support Tool Using Medical Board Exam Simulations: Clinical Vignette Study", journal="JMIR Med Inform", year="2021", month="Nov", day="30", volume="9", number="11", pages="e32507", keywords="diagnostic decision support systems", keywords="diagnostic support", keywords="medical decision-making", keywords="medical informatics", keywords="artificial intelligence", keywords="Kahun", keywords="decision support", abstract="Background: Diagnostic decision support systems (DDSS) are computer programs aimed to improve health care by supporting clinicians in the process of diagnostic decision-making. Previous studies on DDSS demonstrated their ability to enhance clinicians' diagnostic skills, prevent diagnostic errors, and reduce hospitalization costs. Despite the potential benefits, their utilization in clinical practice is limited, emphasizing the need for new and improved products. 
Objective: The aim of this study was to conduct a preliminary analysis of the diagnostic performance of ``Kahun,'' a new artificial intelligence--driven diagnostic tool. Methods: Diagnostic performance was evaluated based on the program's ability to ``solve'' clinical cases from the United States Medical Licensing Examination Step 2 Clinical Skills board exam simulations that were drawn from the case banks of 3 leading preparation companies. Each case included 3 expected differential diagnoses. The cases were entered into the Kahun platform by 3 blinded junior physicians. For each case, the presence and the rank of the correct diagnoses within the generated differential diagnoses list were recorded. Diagnostic performance was measured in two ways: first, as diagnostic sensitivity, and second, as case-specific success rates that represent diagnostic comprehensiveness. Results: The study included 91 clinical cases with 78 different chief complaints and a mean number of 38 (SD 8) findings for each case. The total number of expected diagnoses was 272, of which 174 were different (some appeared more than once). Of the 272 expected diagnoses, 231 (87.5\%; 95\% CI 76-99) were suggested within the top 20 listed diagnoses, 209 (76.8\%; 95\% CI 66-87) within the top 10, and 168 (61.8\%; 95\% CI 52-71) within the top 5. The median rank of correct diagnoses was 3 (IQR 2-6). Of the 91 cases, all 3 expected diagnoses were suggested within the top 20 listed diagnoses in 62 (68\%; 95\% CI 59-78), within the top 10 in 44 (48\%; 95\% CI 38-59), and within the top 5 in 24 (26\%; 95\% CI 17-35). In 87 of the 91 cases (96\%; 95\% CI 91-100), at least 2 of the 3 expected diagnoses were suggested within the top 20 listed diagnoses; in 78 (86\%; 95\% CI 79-93), within the top 10; and in 61 (67\%; 95\% CI 57-77), within the top 5. Conclusions: The diagnostic support tool evaluated in this study demonstrated good diagnostic accuracy and comprehensiveness; it also had the ability to manage a wide range of clinical findings. ", doi="10.2196/32507", url="https://medinform.jmir.org/2021/11/e32507", url="http://www.ncbi.nlm.nih.gov/pubmed/34672262" } @Article{info:doi/10.2196/32180, author="Chen, Rai-Fu and Hsiao, Ju-Ling", title="Health Professionals' Perspectives on Electronic Medical Record Infusion and Individual Performance: Model Development and Questionnaire Survey Study", journal="JMIR Med Inform", year="2021", month="Nov", day="30", volume="9", number="11", pages="e32180", keywords="health care professional", keywords="electronic medical records", keywords="IS infusion", keywords="individual performance", keywords="EHR", keywords="electronic health record", keywords="performance", keywords="perspective", keywords="information system", keywords="integration", keywords="decision-making", keywords="health information exchange", keywords="questionnaire", abstract="Background: Electronic medical records (EMRs) are integrated information sources generated by health care professionals (HCPs) from various health care information systems. EMRs play crucial roles in improving the quality of care and medical decision-making and in facilitating cross-hospital health information exchange. Although many hospitals have invested considerable resources and efforts to develop EMRs for several years, the factors affecting the long-term success of EMRs, particularly in the EMR infusion stage, remain unclear. 
Objective: The aim of this study was to investigate the effects of technology, user, and task characteristics on EMR infusion to determine the factors that largely affect EMR infusion. In addition, we examined the effect of EMR infusion on individual HCP performance. Methods: A questionnaire survey was used to collect data from HCPs with >6 months experience in using EMRs in a Taiwanese teaching hospital. A total of 316 questionnaires were distributed and 211 complete copies were returned, yielding a valid response rate of 66.8\%. The collected data were further analyzed using WarpPLS 5.0. Results: EMR infusion (R2=0.771) was mainly affected by user habits ($\beta$=.411), portability ($\beta$=.217), personal innovativeness ($\beta$=.198), technostress ($\beta$=.169), and time criticality ($\beta$=.168), and individual performance (R2=0.541) was affected by EMR infusion ($\beta$=.735). This finding indicated that user (habit, personal innovativeness, and technostress), technology (portability), and task (mobility and time criticality) characteristics have major effects on EMR infusion. Furthermore, the results indicated that EMR infusion positively affects individual performance. Conclusions: The factors identified in this study can extend information systems infusion theory and provide useful insights for the further improvement of EMR development in hospitals and by the government, specifically in its infusion stage. In addition, the developed instrument can be used as an assessment tool to identify the key factors for EMR infusion, and to evaluate the extent of EMR infusion and the individual performance of hospitals that have implemented EMR systems. Moreover, the results can help governments to understand the urgent needs of hospitals in implementing EMR systems, provide sufficient resources and support to improve the incentives of EMR development, and develop adequate EMR policies for the meaningful use of electronic health records among hospitals and clinics. ", doi="10.2196/32180", url="https://medinform.jmir.org/2021/11/e32180", url="http://www.ncbi.nlm.nih.gov/pubmed/34851297" } @Article{info:doi/10.2196/26123, author="Javidan, Pedram Arshia and Brand, Allan and Cameron, Andrew and D'Ovidio, Tommaso and Persaud, Martin and Lewis, Kirsten and O'Connor, Chris", title="Examination of a Canada-Wide Collaboration Platform for Order Sets: Retrospective Analysis", journal="J Med Internet Res", year="2021", month="Nov", day="29", volume="23", number="11", pages="e26123", keywords="evidence-based medicine", keywords="health informatics", keywords="knowledge translation", keywords="order sets", keywords="Web 2.0", abstract="Background: Knowledge translation and dissemination are some of the main challenges that affect evidence-based medicine. Web 2.0 platforms promote the sharing and collaborative development of content. Executable knowledge tools, such as order sets, are a knowledge translation tool whose localization is critical to its effectiveness but a challenge for organizations to develop independently. Objective: This paper describes a Web 2.0 resource, referred to as the collaborative network (TCN), for order set development designed to share executable knowledge (order sets). This paper also analyzes the scope of its use, describes its use through network analysis, and examines the provision and use of order sets in the platform by organizational size. Methods: Data were collected from Think Research's TxConnect platform. 
We measured interorganization sharing across Canadian hospitals using descriptive statistics. A weighted chi-square analysis was used to evaluate sharing volumes by institution size, with a post hoc Cramer V score to measure the strength of association. Results: TCN consisted of 12,495 order sets across 683 diagnoses or processes. Between January 2010 and March 2015, a total of 131 health care organizations representing 360 hospitals in Canada downloaded order sets 105,496 times. Order sets related to acute coronary syndrome, analgesia, and venous thromboembolism were most commonly shared. COVID-19 order sets were among the most actively shared, adjusting for order set lifetime. A weighted chi-square analysis showed nonrandom downloading behavior (P<.001), with medium-sized institutions downloading content from larger institutions acting as the most significant driver of this variance (chi-gram=124.70). Conclusions: In this paper, we have described and analyzed a Web 2.0 platform for the sharing of order set content with significant network activity. The robust use of TCN to access customized order sets reflects its value as a resource for health care organizations when they develop or update their own order sets. ", doi="10.2196/26123", url="https://www.jmir.org/2021/11/e26123", url="http://www.ncbi.nlm.nih.gov/pubmed/34847055" } @Article{info:doi/10.2196/32900, author="Mahajan, Abhishaike and Deonarine, Andrew and Bernal, Axel and Lyons, Genevieve and Norgeot, Beau", title="Developing the Total Health Profile, a Generalizable Unified Set of Multimorbidity Risk Scores Derived From Machine Learning for Broad Patient Populations: Retrospective Cohort Study", journal="J Med Internet Res", year="2021", month="Nov", day="26", volume="23", number="11", pages="e32900", keywords="multimorbidity", keywords="clinical risk score", keywords="outcome research", keywords="machine learning", keywords="electronic health record", keywords="clinical informatics", keywords="morbidity", keywords="risk", keywords="outcome", keywords="population data", keywords="diagnostic", keywords="demographic", keywords="decision making", keywords="cohort", keywords="prediction", abstract="Background: Multimorbidity clinical risk scores allow clinicians to quickly assess their patients' health for decision making, often for recommendation to care management programs. However, these scores are limited by several issues: existing multimorbidity scores (1) are generally limited to one data group (eg, diagnoses, labs) and may be missing vital information, (2) are usually limited to specific demographic groups (eg, age), and (3) do not formally provide any granularity in the form of more nuanced multimorbidity risk scores to direct clinician attention. Objective: Using diagnosis, lab, prescription, procedure, and demographic data from electronic health records (EHRs), we developed a physiologically diverse and generalizable set of multimorbidity risk scores. Methods: Using EHR data from a nationwide cohort of patients, we developed the total health profile, a set of six integrated risk scores reflecting five distinct organ systems and overall health. We selected the occurrence of an inpatient hospital visitation over a 2-year follow-up window, attributable to specific organ systems, as our risk endpoint. 
Using a physician-curated set of features, we trained six machine learning models on 794,294 patients to predict the calibrated probability of the aforementioned endpoint, producing risk scores for heart, lung, neuro, kidney, and digestive functions and a sixth score for combined risk. We evaluated the scores using a held-out test cohort of 198,574 patients. Results: Study patients closely matched national census averages, with a median age of 41 years, a median income of \$66,829, and racial averages by zip code of 73.8\% White, 5.9\% Asian, and 11.9\% African American. All models were well calibrated and demonstrated strong performance with areas under the receiver operating curve (AUROCs) of 0.83 for the total health score (THS), 0.89 for heart, 0.86 for lung, 0.84 for neuro, 0.90 for kidney, and 0.83 for digestive functions. There was consistent performance of this scoring system across sexes, diverse patient ages, and zip code income levels. Each model learned to generate predictions by focusing on appropriate clinically relevant patient features, such as heart-related hospitalizations and chronic hypertension diagnosis for the heart model. The THS outperformed the other commonly used multimorbidity scoring systems, specifically the Charlson Comorbidity Index (CCI) and the Elixhauser Comorbidity Index (ECI) overall (AUROCs: THS=0.823, CCI=0.735, ECI=0.649) as well as for every age, sex, and income bracket. Performance improvements were most pronounced for middle-aged and lower-income subgroups. Ablation tests using only diagnosis, prescription, social determinants of health, and lab feature groups, while retaining procedure-related features, showed that the combination of feature groups has the best predictive performance, though only marginally better than the diagnosis-only model on at-risk groups. Conclusions: Massive retrospective EHR data sets have made it possible to use machine learning to build practical multimorbidity risk scores that are highly predictive, personalizable, intuitive to explain, and generalizable across diverse patient populations. ", doi="10.2196/32900", url="https://www.jmir.org/2021/11/e32900", url="http://www.ncbi.nlm.nih.gov/pubmed/34842542" } @Article{info:doi/10.2196/23101, author="Chang, David and Lin, Eric and Brandt, Cynthia and Taylor, Andrew Richard", title="Incorporating Domain Knowledge Into Language Models by Using Graph Convolutional Networks for Assessing Semantic Textual Similarity: Model Development and Performance Comparison", journal="JMIR Med Inform", year="2021", month="Nov", day="26", volume="9", number="11", pages="e23101", keywords="natural language processing", keywords="graph neural networks", keywords="National NLP Clinical Challenges", keywords="bidirectional encoder representation from transformers", abstract="Background: Although electronic health record systems have facilitated clinical documentation in health care, they have also introduced new challenges, such as the proliferation of redundant information through the use of copy and paste commands or templates. One approach to trimming down bloated clinical documentation and improving clinical summarization is to identify highly similar text snippets with the goal of removing such text. Objective: We developed a natural language processing system for the task of assessing clinical semantic textual similarity. The system assigns scores to pairs of clinical text snippets based on their clinical semantic similarity. 
Methods: We leveraged recent advances in natural language processing and graph representation learning to create a model that combines linguistic and domain knowledge information from the MedSTS data set to assess clinical semantic textual similarity. We used bidirectional encoder representation from transformers (BERT)--based models as text encoders for the sentence pairs in the data set and graph convolutional networks (GCNs) as graph encoders for corresponding concept graphs that were constructed based on the sentences. We also explored techniques, including data augmentation, ensembling, and knowledge distillation, to improve the model's performance, as measured by the Pearson correlation coefficient (r). Results: Fine-tuning the BERT\_base and ClinicalBERT models on the MedSTS data set provided a strong baseline (Pearson correlation coefficients: 0.842 and 0.848, respectively) compared to those of the previous year's submissions. Our data augmentation techniques yielded moderate gains in performance, and adding a GCN-based graph encoder to incorporate the concept graphs also boosted performance, especially when the node features were initialized with pretrained knowledge graph embeddings of the concepts (r=0.868). As expected, ensembling improved performance, and performing multisource ensembling by using different language model variants, conducting knowledge distillation with the multisource ensemble model, and taking a final ensemble of the distilled models further improved the system's performance (Pearson correlation coefficients: 0.875, 0.878, and 0.882, respectively). Conclusions: This study presents a system for the MedSTS clinical semantic textual similarity benchmark task, which was created by combining BERT-based text encoders and GCN-based graph encoders in order to incorporate domain knowledge into the natural language processing pipeline. We also experimented with other techniques involving data augmentation, pretrained concept embeddings, ensembling, and knowledge distillation to further increase our system's performance. Although the task and its benchmark data set are in the early stages of development, this study, as well as the results of the competition, demonstrates the potential of modern language model--based systems to detect redundant information in clinical notes. ", doi="10.2196/23101", url="https://medinform.jmir.org/2021/11/e23101", url="http://www.ncbi.nlm.nih.gov/pubmed/34842531" } @Article{info:doi/10.2196/31214, author="Mathioudakis, Nestoras and Aboabdo, Moeen and Abusamaan, S. Mohammed and Yuan, Christina and Lewis Boyer, LaPricia and Pilla, J. Scott and Johnson, Erica and Desai, Sanjay and Knight, Amy and Greene, Peter and Golden, H. Sherita", title="Stakeholder Perspectives on an Inpatient Hypoglycemia Informatics Alert: Mixed Methods Study", journal="JMIR Hum Factors", year="2021", month="Nov", day="26", volume="8", number="4", pages="e31214", keywords="informatics alert", keywords="clinical decision support", keywords="hypoglycemia", keywords="hospital", keywords="inpatient", abstract="Background: Iatrogenic hypoglycemia is a common occurrence among hospitalized patients and is associated with poor clinical outcomes and increased mortality. Clinical decision support systems can be used to reduce the incidence of this potentially avoidable adverse event. Objective: This study aims to determine the desired features and functionality of a real-time informatics alert to prevent iatrogenic hypoglycemia in a hospital setting. 
Methods: Using the Agency for Healthcare Research and Quality Five Rights of Effective Clinical Decision Support Framework, we conducted a mixed methods study using an electronic survey and focus group sessions of hospital-based providers. The goal was to elicit stakeholder input to inform the future development of a real-time informatics alert to target iatrogenic hypoglycemia. In addition to perceptions about the importance of the problem and existing barriers, we sought input regarding the content, format, channel, timing, and recipient for the alert (ie, the Five Rights). Thematic analysis of focus group sessions was conducted using deductive and inductive approaches. Results: A 21-item electronic survey was completed by 102 inpatient-based providers, followed by 2 focus group sessions (6 providers per session). Respondents universally agreed or strongly agreed that inpatient iatrogenic hypoglycemia is an important problem that can be addressed with an informatics alert. Stakeholders expressed a preference for an alert that is nonintrusive, accurate, communicated in near real time to the ordering provider, and provides actionable treatment recommendations. Several electronic medical record tools, including alert indicators in the patient header, glucose management report, and laboratory results section, were deemed acceptable formats for consideration. Concerns regarding alert fatigue were prevalent among both survey respondents and focus group participants. Conclusions: The design preferences identified in this study will provide the framework needed for an informatics team to develop a prototype alert for pilot testing and evaluation. This alert will help meet the needs of hospital-based clinicians caring for patients with diabetes who are at a high risk of treatment-related hypoglycemia. ", doi="10.2196/31214", url="https://humanfactors.jmir.org/2021/4/e31214", url="http://www.ncbi.nlm.nih.gov/pubmed/34842544" } @Article{info:doi/10.2196/26964, author="Matthiesen, Stina and Diederichsen, Z{\"o}ga S{\o}ren and Hansen, Hartmann Mikkel Klitzing and Villumsen, Christina and Lassen, H{\o}jbjerg Mats Christian and Jacobsen, Karl Peter and Risum, Niels and Winkel, Gregers Bo and Philbert, T. Berit and Svendsen, Hastrup Jesper and Andersen, Osman Tariq", title="Clinician Preimplementation Perspectives of a Decision-Support Tool for the Prediction of Cardiac Arrhythmia Based on Machine Learning: Near-Live Feasibility and Qualitative Study", journal="JMIR Hum Factors", year="2021", month="Nov", day="26", volume="8", number="4", pages="e26964", keywords="cardiac arrhythmia", keywords="short-term prediction", keywords="clinical decision support systems", keywords="machine learning", keywords="artificial intelligence", keywords="preimplementation", keywords="qualitative study", keywords="implantable cardioverter defibrillator", keywords="remote follow-up", keywords="sociotechnical", abstract="Background: Artificial intelligence (AI), such as machine learning (ML), shows great promise for improving clinical decision-making in cardiac diseases by outperforming statistical-based models. However, few AI-based tools have been implemented in cardiology clinics because of the sociotechnical challenges during transitioning from algorithm development to real-world implementation. 
Objective: This study explored how an ML-based tool for predicting ventricular tachycardia and ventricular fibrillation (VT/VF) could support clinical decision-making in the remote monitoring of patients with an implantable cardioverter defibrillator (ICD). Methods: Seven experienced electrophysiologists participated in a near-live feasibility and qualitative study, which included walkthroughs of 5 blinded retrospective patient cases, use of the prediction tool, and questionnaires and interview questions. All sessions were video recorded, and sessions evaluating the prediction tool were transcribed verbatim. Data were analyzed through an inductive qualitative approach based on grounded theory. Results: The prediction tool was found to have potential for supporting decision-making in ICD remote monitoring by providing reassurance, increasing confidence, acting as a second opinion, reducing information search time, and enabling delegation of decisions to nurses and technicians. However, the prediction tool did not lead to changes in clinical action and was found less useful in cases where the quality of data was poor or when VT/VF predictions were found to be irrelevant for evaluating the patient. Conclusions: When transitioning from AI development to testing its feasibility for clinical implementation, we need to consider the following: expectations must be aligned with the intended use of AI; trust in the prediction tool is likely to emerge from real-world use; and AI accuracy is relational and dependent on available information and local workflows. Addressing the sociotechnical gap between the development and implementation of clinical decision-support tools based on ML in cardiac care is essential for succeeding with adoption. It is suggested to include clinical end-users, clinical contexts, and workflows throughout the overall iterative approach to design, development, and implementation. ", doi="10.2196/26964", url="https://humanfactors.jmir.org/2021/4/e26964", url="http://www.ncbi.nlm.nih.gov/pubmed/34842528" } @Article{info:doi/10.2196/26456, author="Cho, Insook and Jin, sun In and Park, Hyunchul and Dykes, C. Patricia", title="Clinical Impact of an Analytic Tool for Predicting the Fall Risk in Inpatients: Controlled Interrupted Time Series", journal="JMIR Med Inform", year="2021", month="Nov", day="25", volume="9", number="11", pages="e26456", keywords="clinical effectiveness", keywords="data analytics", keywords="event prediction", keywords="inpatient falls", keywords="process metrics", abstract="Background: Patient falls are a common cause of harm in acute-care hospitals worldwide. They are a difficult, complex, and common problem requiring a great deal of nurses' time, attention, and effort in practice. The recent rapid expansion of health care predictive analytic applications and the growing availability of electronic health record (EHR) data have resulted in the development of machine learning models that predict adverse events. However, the clinical impact of these models in terms of patient outcomes and clinicians' responses is undetermined. Objective: The purpose of this study was to determine the impact of an electronic analytic tool for predicting fall risk on patient outcomes and nurses' responses. Methods: A controlled interrupted time series (ITS) experiment was conducted in 12 medical-surgical nursing units at a public hospital between May 2017 and April 2019. In six of the units, the patients' fall risk was assessed using the St. 
Thomas' Risk Assessment Tool in Falling Elderly Inpatients (STRATIFY) system (control units), while in the other six, a predictive model for inpatient fall risks was implemented using routinely obtained data from the hospital's EHR system (intervention units). The primary outcome was the rate of patient falls; secondary outcomes included the rate of falls with injury and analysis of process metrics (nursing interventions that are designed to mitigate the risk of fall). Results: During the study period, there were 42,476 admissions, of which 707 were for falls and 134 for fall injuries. Allowing for differences in the patients' characteristics and baseline process metrics, the number of patients with falls differed between the control (n=382) and intervention (n=325) units. The mean fall rate increased from 1.95 to 2.11 in control units and decreased from 1.92 to 1.79 in intervention units. A separate ITS analysis revealed that the immediate reduction was 29.73\% in the intervention group (z=--2.06, P=.039) and 16.58\% in the control group (z=--1.28, P=.20), but there was no ongoing effect. The injury rate did not differ significantly between the two groups (0.42 vs 0.31, z=1.50, P=.134). Among the process metrics, the risk-targeted interventions increased significantly over time in the intervention group. Conclusions: This early-stage clinical evaluation revealed that implementation of an analytic tool for predicting fall risk may contribute to an awareness of fall risk, leading to positive changes in nurses' interventions over time. Trial Registration: Clinical Research Information Service (CRIS), Republic of Korea KCT0005286; https://cris.nih.go.kr/cris/search/detailSearch.do/16984 ", doi="10.2196/26456", url="https://medinform.jmir.org/2021/11/e26456", url="http://www.ncbi.nlm.nih.gov/pubmed/34626168" } @Article{info:doi/10.2196/25856, author="Esmaeilzadeh, Pouyan and Mirzaei, Tala and Dharanikota, Spurthy", title="Patients' Perceptions Toward Human--Artificial Intelligence Interaction in Health Care: Experimental Study", journal="J Med Internet Res", year="2021", month="Nov", day="25", volume="23", number="11", pages="e25856", keywords="AI clinical applications", keywords="collective intelligence", keywords="in-person examinations", keywords="perceived benefits", keywords="perceived risks", abstract="Background: It is believed that artificial intelligence (AI) will be an integral part of health care services in the near future and will be incorporated into several aspects of clinical care such as prognosis, diagnostics, and care planning. Thus, many technology companies have invested in producing AI clinical applications. Patients are one of the most important beneficiaries who potentially interact with these technologies and applications; thus, patients' perceptions may affect the widespread use of clinical AI. Patients should be assured that AI clinical applications will not harm them, and that they will instead benefit from using AI technology for health care purposes. Although human-AI interaction can enhance health care outcomes, possible dimensions of concerns and risks should be addressed before its integration with routine clinical care. Objective: The main objective of this study was to examine how potential users (patients) perceive the benefits, risks, and use of AI clinical applications for their health care purposes and how their perceptions may be different if faced with three health care service encounter scenarios. 
Methods: We designed a 2{\texttimes}3 experiment that crossed a type of health condition (ie, acute or chronic) with three different types of clinical encounters between patients and physicians (ie, AI clinical applications as substituting technology, AI clinical applications as augmenting technology, and no AI as a traditional in-person visit). We used an online survey to collect data from 634 individuals in the United States. Results: The interactions between the types of health care service encounters and health conditions significantly influenced individuals' perceptions of privacy concerns, trust issues, communication barriers, concerns about transparency in regulatory standards, liability risks, benefits, and intention to use across the six scenarios. We found no significant differences among scenarios regarding perceptions of performance risk and social biases. Conclusions: The results imply that incompatibility with instrumental, technical, ethical, or regulatory values can be a reason for rejecting AI applications in health care. Thus, there are still various risks associated with implementing AI applications in diagnostics and treatment recommendations for patients with both acute and chronic illnesses. The concerns are also evident if the AI applications are used as a recommendation system under physician experience, wisdom, and control. Prior to the widespread rollout of AI, more studies are needed to identify the challenges that may raise concerns for implementing and using AI applications. This study could provide researchers and managers with critical insights into the determinants of individuals' intention to use AI clinical applications. Regulatory agencies should establish normative standards and evaluation guidelines for implementing AI in health care in cooperation with health care institutions. Regular audits and ongoing monitoring and reporting systems can be used to continuously evaluate the safety, quality, transparency, and ethical factors of AI clinical applications. ", doi="10.2196/25856", url="https://www.jmir.org/2021/11/e25856", url="http://www.ncbi.nlm.nih.gov/pubmed/34842535" } @Article{info:doi/10.2196/31442, author="Ramachandran, Raghav and McShea, J. Michael and Howson, N. Stephanie and Burkom, S. Howard and Chang, Hsien-Yen and Weiner, P. Jonathan and Kharrazi, Hadi", title="Assessing the Value of Unsupervised Clustering in Predicting Persistent High Health Care Utilizers: Retrospective Analysis of Insurance Claims Data", journal="JMIR Med Inform", year="2021", month="Nov", day="25", volume="9", number="11", pages="e31442", keywords="persistent high users", keywords="persistent high utilizers", keywords="latent class analysis", keywords="comorbidity patterns", keywords="utilization prediction", keywords="unsupervised clustering", keywords="population health analytics", keywords="health care", keywords="prediction models", keywords="health care services", keywords="health care costs", abstract="Background: A high proportion of health care services are persistently utilized by a small subpopulation of patients. To improve clinical outcomes while reducing costs and utilization, population health management programs often provide targeted interventions to patients who may become persistent high users/utilizers (PHUs). Enhanced prediction and management of PHUs can improve health care system efficiencies and improve the overall quality of patient care. 
Objective: The aim of this study was to detect key classes of diseases and medications among the study population and to assess the predictive value of these classes in identifying PHUs. Methods: This study was a retrospective analysis of insurance claims data of patients from the Johns Hopkins Health Care system. We defined a PHU as a patient incurring health care costs in the top 20\% of all patients' costs for 4 consecutive 6-month periods. We used 2013 claims data to predict PHU status in 2014-2015. We applied latent class analysis (LCA), an unsupervised clustering approach, to identify patient subgroups with similar diagnostic and medication patterns to differentiate variations in health care utilization across PHUs. Logistic regression models were then built to predict PHUs in the full population and in select subpopulations. Predictors included LCA membership probabilities, demographic covariates, and health utilization covariates. Predictive powers of the regression models were assessed and compared using standard metrics. Results: We identified 164,221 patients with continuous enrollment between 2013 and 2015. The mean study population age was 19.7 years, 55.9\% were women, 3.3\% had $\geq$1 hospitalization, and 19.1\% had 10+ outpatient visits in 2013. A total of 8359 (5.09\%) patients were identified as PHUs in both 2014 and 2015. The LCA performed optimally when assigning patients to four probability disease/medication classes. Given the feedback provided by clinical experts, we further divided the population into four diagnostic groups for sensitivity analysis: acute upper respiratory infection (URI) (n=53,232; 4.6\% PHUs), mental health (n=34,456; 12.8\% PHUs), otitis media (n=24,992; 4.5\% PHUs), and musculoskeletal (n=24,799; 15.5\% PHUs). For the regression models predicting PHUs in the full population, the F1-score classification metric was lower using a parsimonious model that included LCA categories (F1=38.62\%) compared to that of a complex risk stratification model with a full set of predictors (F1=48.20\%). However, the LCA-enabled simple models were comparable to the complex model when predicting PHUs in the mental health and musculoskeletal subpopulations (F1-scores of 48.69\% and 48.15\%, respectively). F1-scores were lower than that of the complex model when the LCA-enabled models were limited to the otitis media and acute URI subpopulations (45.77\% and 43.05\%, respectively). Conclusions: Our study illustrates the value of LCA in identifying subgroups of patients with similar patterns of diagnoses and medications. Our results show that LCA-derived classes can simplify predictive models of PHUs without compromising predictive accuracy. Future studies should investigate the value of LCA-derived classes for predicting PHUs in other health care settings. ", doi="10.2196/31442", url="https://medinform.jmir.org/2021/11/e31442", url="http://www.ncbi.nlm.nih.gov/pubmed/34592712" } @Article{info:doi/10.2196/28620, author="May, B. Sarah and Giordano, P. 
Thomas and Gottlieb, Assaf", title="A Phenotyping Algorithm to Identify People With HIV in Electronic Health Record Data (HIV-Phen): Development and Evaluation Study", journal="JMIR Form Res", year="2021", month="Nov", day="25", volume="5", number="11", pages="e28620", keywords="phenotyping", keywords="algorithms", keywords="electronic health records", keywords="people with HIV", keywords="cohort identification", abstract="Background: Identification of people with HIV from electronic health record (EHR) data is an essential first step in the study of important HIV outcomes, such as risk assessment. This task has been historically performed via manual chart review, but the increased availability of large clinical data sets has led to the emergence of phenotyping algorithms to automate this process. Existing algorithms for identifying people with HIV rely on a combination of International Classification of Disease codes and laboratory tests or closely mimic clinical testing guidelines for HIV diagnosis. However, we found that existing algorithms in the literature missed a significant proportion of people with HIV in our data. Objective: The aim of this study is to develop and evaluate HIV-Phen, an updated criteria-based HIV phenotyping algorithm. Methods: We developed an algorithm using HIV-specific laboratory tests and medications and compared it with previously published algorithms in national and local data sets to identify cohorts of people with HIV. Cohort demographics were compared with those reported in the national and local surveillance data. Chart reviews were performed on a subsample of patients from the local database to calculate the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of the algorithm. Results: Our new algorithm identified substantially more people with HIV in both national (up to an 85.75\% increase) and local (up to an 83.20\% increase) EHR databases than the previously published algorithms. The demographic characteristics of people with HIV identified using our algorithm were similar to those reported in national and local HIV surveillance data. Our algorithm demonstrated improved sensitivity over existing algorithms (98\% vs 56\%-92\%) while maintaining a similar overall accuracy (96\% vs 80\%-96\%). Conclusions: We developed and evaluated an updated criteria-based phenotyping algorithm for identifying people with HIV in EHR data that demonstrates improved sensitivity over existing algorithms. ", doi="10.2196/28620", url="https://formative.jmir.org/2021/11/e28620", url="http://www.ncbi.nlm.nih.gov/pubmed/34842532" } @Article{info:doi/10.2196/28854, author="Alvarado, Natasha and McVey, Lynn and Elshehaly, Mai and Greenhalgh, Joanne and Dowding, Dawn and Ruddle, Roy and Gale, P. Chris and Mamas, Mamas and Doherty, Patrick and West, Robert and Feltbower, Richard and Randell, Rebecca", title="Analysis of a Web-Based Dashboard to Support the Use of National Audit Data in Quality Improvement: Realist Evaluation", journal="J Med Internet Res", year="2021", month="Nov", day="23", volume="23", number="11", pages="e28854", keywords="data", keywords="QualDash", keywords="audit", keywords="dashboards", keywords="support", keywords="quality", abstract="Background: Dashboards can support data-driven quality improvements in health care. They visualize data in ways intended to ease cognitive load and support data comprehension, but how they are best integrated into working practices needs further investigation. 
Objective: This paper reports the findings of a realist evaluation of a web-based quality dashboard (QualDash) developed to support the use of national audit data in quality improvement. Methods: QualDash was co-designed with data users and installed in 8 clinical services (3 pediatric intensive care units and 5 cardiology services) across 5 health care organizations (sites A-E) in England between July and December 2019. Champions were identified to support adoption. Data to evaluate QualDash were collected between July 2019 and August 2021 and consisted of 148.5 hours of observations including hospital wards and clinical governance meetings, log files that captured the extent of use of QualDash over 12 months, and a questionnaire designed to assess the dashboard's perceived usefulness and ease of use. Guided by the principles of realist evaluation, data were analyzed to understand how, why, and in what circumstances QualDash supported the use of national audit data in quality improvement. Results: The observations revealed that variation across sites in the amount and type of resources available to support data use, alongside staff interactions with QualDash, shaped its use and impact. Sites resourced with skilled audit support staff and established reporting systems (sites A and C) continued to use existing processes to report data. A number of constraints influenced use of QualDash in these sites including that some dashboard metrics were not configured in line with user expectations and staff were not fully aware how QualDash could be used to facilitate their work. In less well-resourced services, QualDash automated parts of their reporting process, streamlining the work of audit support staff (site B), and, in some cases, highlighted issues with data completeness that the service worked to address (site E). Questionnaire responses received from 23 participants indicated that QualDash was perceived as useful and easy to use despite its variable use in practice. Conclusions: Web-based dashboards have the potential to support data-driven improvement, providing access to visualizations that can help users address key questions about care quality. Findings from this study point to ways in which dashboard design might be improved to optimize use and impact in different contexts; this includes using data meaningful to stakeholders in the co-design process and actively engaging staff knowledgeable about current data use and routines in the scrutiny of the dashboard metrics and functions. In addition, consideration should be given to the processes of data collection and upload that underpin the quality of the data visualized and consequently its potential to stimulate quality improvement. 
International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2019-033208 ", doi="10.2196/28854", url="https://www.jmir.org/2021/11/e28854", url="http://www.ncbi.nlm.nih.gov/pubmed/34817384" } @Article{info:doi/10.2196/29532, author="Pankhurst, Tanya and Evison, Felicity and Atia, Jolene and Gallier, Suzy and Coleman, Jamie and Ball, Simon and McKee, Deborah and Ryan, Steven and Black, Ruth", title="Introduction of Systematized Nomenclature of Medicine--Clinical Terms Coding Into an Electronic Health Record and Evaluation of its Impact: Qualitative and Quantitative Study", journal="JMIR Med Inform", year="2021", month="Nov", day="23", volume="9", number="11", pages="e29532", keywords="coding standards", keywords="clinical decision support", keywords="Clinician led design", keywords="clinician reported experience", keywords="clinical usability", keywords="data sharing", keywords="diagnoses", keywords="electronic health records", keywords="electronic health record standards", keywords="health data exchange", keywords="health data research", keywords="International Classification of Diseases version 10 (ICD-10)", keywords="National Health Service Blueprint", keywords="patient diagnoses", keywords="population health", keywords="problem list", keywords="research", keywords="Systematized Nomenclature Of Medicine--Clinical Terms (SNOMED-CT)", keywords="use of electronic health data", keywords="user-led design", abstract="Background: This study describes the conversion within an existing electronic health record (EHR) from the International Classification of Diseases, Tenth Revision coding system to the SNOMED-CT (Systematized Nomenclature of Medicine--Clinical Terms) for the collection of patient histories and diagnoses. The setting is a large acute hospital that is designing and building its own EHR. Well-designed EHRs create opportunities for continuous data collection, which can be used in clinical decision support rules to drive patient safety. Collected data can be exchanged across health care systems to support patients in all health care settings. Data can be used for research to prevent diseases and protect future populations. Objective: The aim of this study was to migrate a current EHR, with all relevant patient data, to the SNOMED-CT coding system to optimize clinical use and clinical decision support, facilitate data sharing across organizational boundaries for national programs, and enable remodeling of medical pathways. Methods: The study used qualitative and quantitative data to understand the successes and gaps in the project, clinician attitudes toward the new tool, and the future use of the tool. Results: The new coding system (tool) was well received and immediately widely used in all specialties. This resulted in increased, accurate, and clinically relevant data collection. Clinicians appreciated the increased depth and detail of the new coding, welcomed the potential for both data sharing and research, and provided extensive feedback for further development. Conclusions: Successful implementation of the new system aligned the University Hospitals Birmingham NHS Foundation Trust with national strategy and can be used as a blueprint for similar projects in other health care settings. 
", doi="10.2196/29532", url="https://medinform.jmir.org/2021/11/e29532", url="http://www.ncbi.nlm.nih.gov/pubmed/34817387" } @Article{info:doi/10.2196/29749, author="Jan, Zainab and AI-Ansari, Noor and Mousa, Osama and Abd-alrazaq, Alaa and Ahmed, Arfan and Alam, Tanvir and Househ, Mowafa", title="The Role of Machine Learning in Diagnosing Bipolar Disorder: Scoping Review", journal="J Med Internet Res", year="2021", month="Nov", day="19", volume="23", number="11", pages="e29749", keywords="machine learning", keywords="bipolar disorder", keywords="diagnosis", keywords="support vector machine", keywords="clinical data", keywords="mental health", keywords="scoping review", abstract="Background: Bipolar disorder (BD) is the 10th most common cause of frailty in young individuals and has triggered morbidity and mortality worldwide. Patients with BD have a life expectancy 9 to 17 years lower than that of normal people. BD is a predominant mental disorder, but it can be misdiagnosed as depressive disorder, which leads to difficulties in treating affected patients. Approximately 60\% of patients with BD are treated for depression. However, machine learning provides advanced skills and techniques for better diagnosis of BD. Objective: This review aims to explore the machine learning algorithms used for the detection and diagnosis of bipolar disorder and its subtypes. Methods: The study protocol adopted the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. We explored 3 databases, namely Google Scholar, ScienceDirect, and PubMed. To enhance the search, we performed backward screening of all the references of the included studies. Based on the predefined selection criteria, 2 levels of screening were performed: title and abstract review, and full review of the articles that met the inclusion criteria. Data extraction was performed independently by all investigators. To synthesize the extracted data, a narrative synthesis approach was followed. Results: We retrieved 573 potential articles were from the 3 databases. After preprocessing and screening, only 33 articles that met our inclusion criteria were identified. The most commonly used data belonged to the clinical category (19, 58\%). We identified different machine learning models used in the selected studies, including classification models (18, 55\%), regression models (5, 16\%), model-based clustering methods (2, 6\%), natural language processing (1, 3\%), clustering algorithms (1, 3\%), and deep learning--based models (3, 9\%). Magnetic resonance imaging data were most commonly used for classifying bipolar patients compared to other groups (11, 34\%), whereas microarray expression data sets and genomic data were the least commonly used. The maximum ratio of accuracy was 98\%, whereas the minimum accuracy range was 64\%. Conclusions: This scoping review provides an overview of recent studies based on machine learning models used to diagnose patients with BD regardless of their demographics or if they were compared to patients with psychiatric diagnoses. Further research can be conducted to provide clinical decision support in the health industry. 
", doi="10.2196/29749", url="https://www.jmir.org/2021/11/e29749", url="http://www.ncbi.nlm.nih.gov/pubmed/34806996" } @Article{info:doi/10.2196/30079, author="Wang, Huan and Wu, Wei and Han, Chunxia and Zheng, Jiaqi and Cai, Xinyu and Chang, Shimin and Shi, Junlong and Xu, Nan and Ai, Zisheng", title="Prediction Model of Osteonecrosis of the Femoral Head After Femoral Neck Fracture: Machine Learning--Based Development and Validation Study", journal="JMIR Med Inform", year="2021", month="Nov", day="19", volume="9", number="11", pages="e30079", keywords="femoral neck fracture", keywords="osteonecrosis of the femoral head", keywords="machine learning", keywords="interpretability", abstract="Background: The absolute number of femoral neck fractures (FNFs) is increasing; however, the prediction of traumatic femoral head necrosis remains difficult. Machine learning algorithms have the potential to be superior to traditional prediction methods for the prediction of traumatic femoral head necrosis. Objective: The aim of this study is to use machine learning to construct a model for the analysis of risk factors and prediction of osteonecrosis of the femoral head (ONFH) in patients with FNF after internal fixation. Methods: We retrospectively collected preoperative, intraoperative, and postoperative clinical data of patients with FNF in 4 hospitals in Shanghai and followed up the patients for more than 2.5 years. A total of 259 patients with 43 variables were included in the study. The data were randomly divided into a training set (181/259, 69.8\%) and a validation set (78/259, 30.1\%). External data (n=376) were obtained from a retrospective cohort study of patients with FNF in 3 other hospitals. Least absolute shrinkage and selection operator regression and the support vector machine algorithm were used for variable selection. Logistic regression, random forest, support vector machine, and eXtreme Gradient Boosting (XGBoost) were used to develop the model on the training set. The validation set was used to tune the model hyperparameters to determine the final prediction model, and the external data were used to compare and evaluate the model performance. We compared the accuracy, discrimination, and calibration of the models to identify the best machine learning algorithm for predicting ONFH. Shapley additive explanations and local interpretable model-agnostic explanations were used to determine the interpretability of the black box model. Results: A total of 11 variables were selected for the models. The XGBoost model performed best on the validation set and external data. The accuracy, sensitivity, and area under the receiver operating characteristic curve of the model on the validation set were 0.987, 0.929, and 0.992, respectively. The accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve of the model on the external data were 0.907, 0.807, 0.935, and 0.933, respectively, and the log-loss was 0.279. The calibration curve demonstrated good agreement between the predicted probability and actual risk. The interpretability of the features and individual predictions were realized using the Shapley additive explanations and local interpretable model-agnostic explanations algorithms. In addition, the XGBoost model was translated into a self-made web-based risk calculator to estimate an individual's probability of ONFH. Conclusions: Machine learning performs well in predicting ONFH after internal fixation of FNF. 
The 6-variable XGBoost model predicted the risk of ONFH well and had good generalization ability on the external data, which can be used for the clinical prediction of ONFH after internal fixation of FNF. ", doi="10.2196/30079", url="https://medinform.jmir.org/2021/11/e30079", url="http://www.ncbi.nlm.nih.gov/pubmed/34806984" } @Article{info:doi/10.2196/26612, author="Lewkowicz, Daniel and Slosarek, Tamara and Wernicke, Sarah and Winne, Antonia and Wohlbrandt, M. Attila and Bottinger, Erwin", title="Digital Therapeutic Care and Decision Support Interventions for People With Low Back Pain: Systematic Review", journal="JMIR Rehabil Assist Technol", year="2021", month="Nov", day="19", volume="8", number="4", pages="e26612", keywords="digital therapeutic care", keywords="decision support interventions", keywords="low back pain", keywords="behavior change techniques", keywords="back", keywords="orthopedic", keywords="systematic review", keywords="digital therapy", keywords="decision support", keywords="mobile phone", abstract="Background: Low back pain (LBP) is the leading cause of worldwide years lost because of disability, with a tremendous economic burden for health care systems. Digital therapeutic care (DTC) programs provide a scalable, universally accessible, and low-cost approach to the multidisciplinary treatment of LBP. Moreover, novel decision support interventions such as personalized feedback messages, push notifications, and data-driven activity recommendations amplify DTC by guiding the user through the program while aiming to increase overall engagement and sustainable behavior change. Objective: This systematic review aims to synthesize recent scientific literature on the impact of DTC apps for people with LBP and outline the implementation of add-on decision support interventions, including their effect on user retention and attrition rates. Methods: We searched bibliographic databases, including MEDLINE, Cochrane Library, Web of Science, and the Physiotherapy Evidence Database, from March 1, 2016, to October 15, 2020, in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines and conducted this review based on related previously published systematic reviews. Besides randomized controlled trials (RCTs), we also included study designs with the evidence level of at least a retrospective comparative study. This enables the consideration of real-world user-generated data and provides information regarding the adoption and effectiveness of DTC apps in a real-life setting. For the appraisal of the risk of bias, we used the Risk of Bias 2 Tool and the Risk of Bias in Non-Randomized Studies of Interventions Tool for the RCTs and nonrandomized trials, respectively. The included studies were narratively synthesized regarding primary and secondary outcome measures, DTC components, applied decision support interventions, user retention, and attrition rates. Results: We retrieved 1388 citations, of which 12 studies are included in this review. Of the 12 studies, 6 (50\%) were RCTs and 6 (50\%) were nonrandomized trials. In all included studies, lower pain levels and increased functionality compared with baseline values were observed in the DTC intervention group. A between-group comparison revealed significant improvements in pain and functionality levels in 67\% (4/6) of the RCTs. The study population was mostly homogeneous, with predominantly female, young to middle-aged participants of normal to moderate weight. 
The methodological quality assessment revealed moderate to high risks of biases, especially in the nonrandomized trials. Conclusions: This systematic review demonstrates the benefits of DTC for people with LBP. There is also evidence that decision support interventions benefit overall engagement with the app and increase participants' ability to self-manage their recovery process. Finally, including retrospective evaluation studies of real-world user-generated data in future systematic reviews of digital health intervention trials can reveal new insights into the benefits, challenges, and real-life adoption of DTC programs. ", doi="10.2196/26612", url="https://rehab.jmir.org/2021/4/e26612", url="http://www.ncbi.nlm.nih.gov/pubmed/34807837" } @Article{info:doi/10.2196/30743, author="Shah, Surbhi and Switzer, Sean and Shippee, D. Nathan and Wogensen, Pamela and Kosednar, Kathryn and Jones, Emma and Pestka, L. Deborah and Badlani, Sameer and Butler, Mary and Wagner, Brittin and White, Katie and Rhein, Joshua and Benson, Bradley and Reding, Mark and Usher, Michael and Melton, B. Genevieve and Tignanelli, James Christopher", title="Implementation of an Anticoagulation Practice Guideline for COVID-19 via a Clinical Decision Support System in a Large Academic Health System and Its Evaluation: Observational Study", journal="JMIR Med Inform", year="2021", month="Nov", day="18", volume="9", number="11", pages="e30743", keywords="COVID-19", keywords="anticoagulation", keywords="clinical practice guideline", keywords="evidence-based practice", keywords="clinical decision support", keywords="implementation science", keywords="RE-AIM", abstract="Background: Studies evaluating strategies for the rapid development, implementation, and evaluation of clinical decision support (CDS) systems supporting guidelines for diseases with a poor knowledge base, such as COVID-19, are limited. Objective: We developed an anticoagulation clinical practice guideline (CPG) for COVID-19, which was delivered and scaled via CDS across a 12-hospital Midwest health care system. This study represents a preplanned 6-month postimplementation evaluation guided by the RE-AIM (Reach, Effectiveness, Adoption, Implementation, and Maintenance) framework. Methods: The implementation outcomes evaluated were reach, adoption, implementation, and maintenance. To evaluate effectiveness, the association of CPG adherence on hospital admission with clinical outcomes was assessed via multivariable logistic regression and nearest neighbor propensity score matching. A time-to-event analysis was conducted. Sensitivity analyses were also conducted to evaluate the competing risk of death prior to intensive care unit (ICU) admission. The models were risk adjusted to account for age, gender, race/ethnicity, non-English speaking status, area deprivation index, month of admission, remdesivir treatment, tocilizumab treatment, steroid treatment, BMI, Elixhauser comorbidity index, oxygen saturation/fraction of inspired oxygen ratio, systolic blood pressure, respiratory rate, treating hospital, and source of admission. A preplanned subgroup analysis was also conducted in patients who had laboratory values (D-dimer, C-reactive protein, creatinine, and absolute neutrophil to absolute lymphocyte ratio) present. The primary effectiveness endpoint was the need for ICU admission within 48 hours of hospital admission. Results: A total of 2503 patients were included in this study. CDS reach approached 95\% during implementation. 
Adherence achieved a peak of 72\% during implementation. Variation was noted in adoption across sites and nursing units. Adoption was the highest at hospitals that were specifically transformed to only provide care to patients with COVID-19 (COVID-19 cohorted hospitals; 74\%-82\%) and the lowest in academic settings (47\%-55\%). CPG delivery via the CDS system was associated with improved adherence (odds ratio [OR] 1.43, 95\% CI 1.2-1.7; P<.001). Adherence with the anticoagulation CPG was associated with a significant reduction in the need for ICU admission within 48 hours (OR 0.39, 95\% CI 0.30-0.51; P<.001) on multivariable logistic regression analysis. Similar findings were noted following 1:1 propensity score matching for patients who received adherent versus nonadherent care (21.5\% vs 34.3\% incidence of ICU admission within 48 hours; log-rank test P<.001). Conclusions: Our institutional experience demonstrated that adherence with the institutional CPG delivered via the CDS system resulted in improved clinical outcomes for patients with COVID-19. CDS systems are an effective means to rapidly scale a CPG across a heterogeneous health care system. Further research is needed to investigate factors associated with adherence at low and high adopting sites and nursing units. ", doi="10.2196/30743", url="https://medinform.jmir.org/2021/11/e30743", url="http://www.ncbi.nlm.nih.gov/pubmed/34550900" } @Article{info:doi/10.2196/30042, author="Knapp, Andreas and Harst, Lorenz and Hager, Stefan and Schmitt, Jochen and Scheibe, Madlen", title="Use of Patient-Reported Outcome Measures and Patient-Reported Experience Measures Within Evaluation Studies of Telemedicine Applications: Systematic Review", journal="J Med Internet Res", year="2021", month="Nov", day="17", volume="23", number="11", pages="e30042", keywords="telemedicine", keywords="telehealth", keywords="evaluation", keywords="outcome", keywords="patient-reported outcome measures", keywords="patient-reported outcome", keywords="patient-reported experience measures", keywords="patient-reported experience", keywords="measurement instrument", keywords="questionnaire", abstract="Background: With the rise of digital health technologies and telemedicine, the need for evidence-based evaluation is growing. Patient-reported outcome measures (PROMs) and patient-reported experience measures (PREMs) are recommended as an essential part of the evaluation of telemedicine. For the first time, a systematic review has been conducted to investigate the use of PROMs and PREMs in the evaluation studies of telemedicine covering all application types and medical purposes. Objective: This study investigates the following research questions: in which scenarios are PROMs and PREMs collected for evaluation purposes, which PROM and PREM outcome domains have been covered and how often, which outcome measurement instruments have been used and how often, does the selection and quantity of PROMs and PREMs differ between study types and application types, and has the use of PROMs and PREMs changed over time. Methods: We conducted a systematic literature search of the MEDLINE and Embase databases and included studies published from inception until April 2, 2020. We included studies evaluating telemedicine with patients as the main users; these studies reported PROMs and PREMs within randomized controlled trials, controlled trials, noncontrolled trials, and feasibility trials in English and German. 
Results: Of the identified 2671 studies, 303 (11.34\%) were included; of the 303 studies, 67 (22.1\%) were feasibility studies, 70 (23.1\%) were noncontrolled trials, 20 (6.6\%) were controlled trials, and 146 (48.2\%) were randomized controlled trials. Health-related quality of life (n=310; mean 1.02, SD 1.05), emotional function (n=244; mean 0.81, SD 1.18), and adherence (n=103; mean 0.34, SD 0.53) were the most frequently assessed outcome domains. Self-developed PROMs were used in 21.4\% (65/303) of the studies, and self-developed PREMs were used in 22.3\% (68/303). PROMs (n=884) were assessed more frequently than PREMs (n=234). As the evidence level of the studies increased, the number of PROMs also increased ($\tau$=--0.45), and the number of PREMs decreased ($\tau$=0.35). Since 2000, not only has the number of studies using PROMs and PREMs increased, but the level of evidence and the number of outcome measurement instruments used have also increased, with the number of PREMs permanently remaining at a lower level. Conclusions: There have been increasingly more studies, particularly high-evidence studies, which use PROMs and PREMs to evaluate telemedicine. PROMs have been used more frequently than PREMs. With the increasing maturity stage of telemedicine applications and higher evidence level, the use of PROMs increased in line with the recommendations of evaluation guidelines. Health-related quality of life and emotional function were measured in almost all the studies. Simultaneously, health literacy as a precondition for using the application adequately, alongside proper training and guidance, has rarely been reported. Further efforts should be pursued to standardize PROM and PREM collection in evaluation studies of telemedicine. ", doi="10.2196/30042", url="https://www.jmir.org/2021/11/e30042", url="http://www.ncbi.nlm.nih.gov/pubmed/34523604" } @Article{info:doi/10.2196/32662, author="Ahn, Imjin and Gwon, Hansle and Kang, Heejun and Kim, Yunha and Seo, Hyeram and Choi, Heejung and Cho, Na Ha and Kim, Minkyoung and Jun, Joon Tae and Kim, Young-Hak", title="Machine Learning--Based Hospital Discharge Prediction for Patients With Cardiovascular Diseases: Development and Usability Study", journal="JMIR Med Inform", year="2021", month="Nov", day="17", volume="9", number="11", pages="e32662", keywords="electronic health records", keywords="cardiovascular diseases", keywords="discharge prediction", keywords="bed management", keywords="explainable artificial intelligence", abstract="Background: Effective resource management in hospitals can improve the quality of medical services by reducing labor-intensive burdens on staff, decreasing inpatient waiting time, and securing the optimal treatment time. The use of hospital processes requires effective bed management; a stay in the hospital that is longer than the optimal treatment time hinders bed management. Therefore, predicting a patient's hospitalization period may support the making of judicious decisions regarding bed management. Objective: First, this study aims to develop a machine learning (ML)--based predictive model for predicting the discharge probability of inpatients with cardiovascular diseases (CVDs). Second, we aim to assess the outcome of the predictive model and explain the primary risk factors of inpatients for patient-specific care. 
Finally, we aim to evaluate whether our ML-based predictive model helps manage bed scheduling efficiently and detects long-term inpatients in advance to improve the use of hospital processes and enhance the quality of medical services. Methods: We set up the cohort criteria and extracted the data from CardioNet, a manually curated database that specializes in CVDs. We processed the data to create a suitable data set by reindexing the date-index, integrating the present features with past features from the previous 3 years, and imputing missing values. Subsequently, we trained the ML-based predictive models and evaluated them to find an elaborate model. Finally, we predicted the discharge probability within 3 days and explained the outcomes of the model by identifying, quantifying, and visualizing its features. Results: We experimented with 5 ML-based models using 5 cross-validations. Extreme gradient boosting, which was selected as the final model, accomplished an average area under the receiver operating characteristic curve score of 0.865, which was higher than that of the other models (ie, logistic regression, random forest, support vector machine, and multilayer perceptron). Furthermore, we performed feature reduction, represented the feature importance, and assessed prediction outcomes. One of the outcomes, the individual explainer, provides a discharge score during hospitalization and a daily feature influence score to the medical team and patients. Finally, we visualized simulated bed management to use the outcomes. Conclusions: In this study, we propose an individual explainer based on an ML-based predictive model, which provides the discharge probability and relative contributions of individual features. Our model can assist medical teams and patients in identifying individual and common risk factors in CVDs and can support hospital administrators in improving the management of hospital beds and other resources. ", doi="10.2196/32662", url="https://medinform.jmir.org/2021/11/e32662", url="http://www.ncbi.nlm.nih.gov/pubmed/34787584" } @Article{info:doi/10.2196/30432, author="Hodgson, Tobias and Burton-Jones, Andrew and Donovan, Raelene and Sullivan, Clair", title="The Role of Electronic Medical Records in Reducing Unwarranted Clinical Variation in Acute Health Care: Systematic Review", journal="JMIR Med Inform", year="2021", month="Nov", day="17", volume="9", number="11", pages="e30432", keywords="clinical variation", keywords="unwarranted clinical variation", keywords="electronic health record", keywords="EHR", keywords="electronic medical record", keywords="EMR", keywords="PowerPlan", keywords="SmartSet", keywords="acute care", keywords="eHealth", keywords="digital health", keywords="health care", keywords="health care outcomes", keywords="outcome", keywords="review", keywords="standard of care", keywords="hospital", keywords="research", keywords="literature", keywords="variation", keywords="intervention", abstract="Background: The use of electronic medical records (EMRs)/electronic health records (EHRs) provides potential to reduce unwarranted clinical variation and thereby improve patient health care outcomes. Minimization of unwarranted clinical variation may raise and refine the standard of patient care provided and satisfy the quadruple aim of health care. 
Objective: A systematic review of the impact of EMRs and specific subcomponents (PowerPlans/SmartSets) on variation in clinical care processes in hospital settings was undertaken to summarize the existing literature on the effects of EMRs on clinical variation and patient outcomes. Methods: Articles from January 2000 to November 2020 were identified through a comprehensive search that examined EMRs/EHRs and clinical variation or PowerPlans/SmartSets. Thirty-six articles met the inclusion criteria. Articles were examined for evidence for EMR-induced changes in variation and effects on health care outcomes and mapped to the quadruple aim of health care. Results: Most of the studies reported positive effects of EMR-related interventions (30/36, 83\%). All of the 36 included studies discussed clinical variation, but only half measured it (18/36, 50\%). Those studies that measured variation generally examined how changes to variation affected individual patient care (11/36, 31\%) or costs (9/36, 25\%), while other outcomes (population health and clinician experience) were seldom studied. High-quality study designs were rare. Conclusions: The literature provides some evidence that EMRs can help reduce unwarranted clinical variation and thereby improve health care outcomes. However, the evidence is surprisingly thin because of insufficient attention to the measurement of clinical variation, and to the chain of evidence from EMRs to variation in clinical practices to health care outcomes. ", doi="10.2196/30432", url="https://medinform.jmir.org/2021/11/e30432", url="http://www.ncbi.nlm.nih.gov/pubmed/34787585" } @Article{info:doi/10.2196/25192, author="Amin, Shiraz and Gupta, Vedant and Du, Gaixin and McMullen, Colleen and Sirrine, Matthew and Williams, V. Mark and Smyth, S. Susan and Chadha, Romil and Stearley, Seth and Li, Jing", title="Developing and Demonstrating the Viability and Availability of the Multilevel Implementation Strategy for Syncope Optimal Care Through Engagement (MISSION) Syncope App: Evidence-Based Clinical Decision Support Tool", journal="J Med Internet Res", year="2021", month="Nov", day="16", volume="23", number="11", pages="e25192", keywords="cardiology", keywords="medical diagnosis", keywords="medicine", keywords="mobile applications", keywords="prognostics and health", keywords="syncope", abstract="Background: Syncope evaluation and management is associated with testing overuse and unnecessary hospitalizations. The 2017 American College of Cardiology/American Heart Association (ACC/AHA) Syncope Guideline aims to standardize clinical practice and reduce unnecessary services. The use of clinical decision support (CDS) tools offers the potential to successfully implement evidence-based clinical guidelines. However, CDS tools that provide an evidence-based differential diagnosis (DDx) of syncope at the point of care are currently lacking. Objective: With input from diverse health systems, we developed and demonstrated the viability of a mobile app, the Multilevel Implementation Strategy for Syncope optImal care thrOugh eNgagement (MISSION) Syncope, as a CDS tool for syncope diagnosis and prognosis. Methods: Development of the app had three main goals: (1) reliable generation of an accurate DDx, (2) incorporation of an evidence-based clinical risk tool for prognosis, and (3) user-based design and technical development. 
To generate a DDx that incorporated assessment recommendations, we reviewed guidelines and the literature to determine clinical assessment questions (variables) and likelihood ratios (LHRs) for each variable in predicting etiology. The creation and validation of the app diagnosis occurred through an iterative clinician review and application to actual clinical cases. The review of available risk score calculators focused on identifying an easily applied and valid evidence-based clinical risk stratification tool. The review and decision-making factors included characteristics of the original study, clinical variables, and validation studies. App design and development relied on user-centered design principles. We used observations of the emergency department workflow, storyboard demonstration, multiple mock review sessions, and beta-testing to optimize functionality and usability. Results: The MISSION Syncope app is consistent with guideline recommendations on evidence-based practice (EBP), and its user interface (UI) reflects steps in a real-world patient evaluation: assessment, DDx, risk stratification, and recommendations. The app provides flexible clinical decision making, while emphasizing a care continuum; it generates recommendations for diagnosis and prognosis based on user input. The DDx in the app is deemed a pragmatic model that more closely aligns with real-world clinical practice and was validated using actual clinical cases. The beta-testing of the app demonstrated well-accepted functionality and usability of this syncope CDS tool. Conclusions: The MISSION Syncope app development integrated the current literature and clinical expertise to provide an evidence-based DDx, a prognosis using a validated scoring system, and recommendations based on clinical guidelines. This app demonstrates the importance of using research literature in the development of a CDS tool and applying clinical experience to fill the gaps in available research. It is essential for a successful app to be deliberate in pursuing a practical clinical model instead of striving for a perfect mathematical model, given available published evidence. This hybrid methodology can be applied to similar CDS tool development. ", doi="10.2196/25192", url="https://www.jmir.org/2021/11/e25192", url="http://www.ncbi.nlm.nih.gov/pubmed/34783669" } @Article{info:doi/10.2196/25455, author="Yoo, Whi Dong and Ernala, Kiranmai Sindhu and Saket, Bahador and Weir, Domino and Arenare, Elizabeth and Ali, F. Asra and Van Meter, R. Anna and Birnbaum, L. Michael and Abowd, D. Gregory and De Choudhury, Munmun", title="Clinician Perspectives on Using Computational Mental Health Insights From Patients' Social Media Activities: Design and Qualitative Evaluation of a Prototype", journal="JMIR Ment Health", year="2021", month="Nov", day="16", volume="8", number="11", pages="e25455", keywords="mental health", keywords="social media", keywords="information technology", abstract="Background: Previous studies have suggested that social media data, along with machine learning algorithms, can be used to generate computational mental health insights. These computational insights have the potential to support clinician-patient communication during psychotherapy consultations. However, how clinicians perceive and envision using computational insights during consultations has been underexplored. 
Objective: The aim of this study is to understand clinician perspectives regarding computational mental health insights from patients' social media activities. We focus on the opportunities and challenges of using these insights during psychotherapy consultations. Methods: We developed a prototype that can analyze consented patients' Facebook data and visually represent these computational insights. We incorporated the insights into existing clinician-facing assessment tools, the Hamilton Depression Rating Scale and Global Functioning: Social Scale. The design intent is that a clinician will verbally interview a patient (eg, How was your mood in the past week?) while they reviewed relevant insights from the patient's social media activities (eg, number of depression-indicative posts). Using the prototype, we conducted interviews (n=15) and 3 focus groups (n=13) with mental health clinicians: psychiatrists, clinical psychologists, and licensed clinical social workers. The transcribed qualitative data were analyzed using thematic analysis. Results: Clinicians reported that the prototype can support clinician-patient collaboration in agenda-setting, communicating symptoms, and navigating patients' verbal reports. They suggested potential use scenarios, such as reviewing the prototype before consultations and using the prototype when patients missed their consultations. They also speculated potential negative consequences: patients may feel like they are being monitored, which may yield negative effects, and the use of the prototype may increase the workload of clinicians, which is already difficult to manage. Finally, our participants expressed concerns regarding the prototype: they were unsure whether patients' social media accounts represented their actual behaviors; they wanted to learn how and when the machine learning algorithm can fail to meet their expectations of trust; and they were worried about situations where they could not properly respond to the insights, especially emergency situations outside of clinical settings. Conclusions: Our findings support the touted potential of computational mental health insights from patients' social media account data, especially in the context of psychotherapy consultations. However, sociotechnical issues, such as transparent algorithmic information and institutional support, should be addressed in future endeavors to design implementable and sustainable technology. ", doi="10.2196/25455", url="https://mental.jmir.org/2021/11/e25455", url="http://www.ncbi.nlm.nih.gov/pubmed/34783667" } @Article{info:doi/10.2196/31337, author="Kasturi, N. Suranga and Park, Jeremy and Wild, David and Khan, Babar and Haggstrom, A. 
David and Grannis, Shaun", title="Predicting COVID-19--Related Health Care Resource Utilization Across a Statewide Patient Population: Model Development Study", journal="J Med Internet Res", year="2021", month="Nov", day="15", volume="23", number="11", pages="e31337", keywords="COVID-19", keywords="machine learning", keywords="population health", keywords="health care utilization", keywords="health disparities", keywords="health information", keywords="epidemiology", keywords="public health", keywords="digital health", keywords="health data", keywords="pandemic", keywords="decision models", keywords="health informatics", keywords="healthcare resources", abstract="Background: The COVID-19 pandemic has highlighted the inability of health systems to leverage existing system infrastructure in order to rapidly develop and apply broad analytical tools that could inform state- and national-level policymaking, as well as patient care delivery in hospital settings. The COVID-19 pandemic has also highlighted systemic disparities in health outcomes and access to care based on race or ethnicity, gender, income level, and urban-rural divide. Although the United States seems to be recovering from the COVID-19 pandemic owing to widespread vaccination efforts and increased public awareness, there is an urgent need to address the aforementioned challenges. Objective: This study aims to inform the feasibility of leveraging broad, statewide datasets for population health--driven decision-making by developing robust analytical models that predict COVID-19--related health care resource utilization across patients served by Indiana's statewide Health Information Exchange. Methods: We leveraged comprehensive datasets obtained from the Indiana Network for Patient Care to train decision forest-based models that can predict patient-level need for health care resource utilization. To assess these models for potential biases, we tested model performance against subpopulations stratified by age, race or ethnicity, gender, and residence (urban vs rural). Results: For model development, we identified a cohort of 96,026 patients from across 957 zip codes in Indiana, United States. We trained the decision models that predicted health care resource utilization by using approximately 100 of the most impactful features from a total of 1172 features created. Each model and stratified subpopulation under test reported precision scores >70\%, accuracy and area under the receiver operating curve scores >80\%, and sensitivity scores approximately >90\%. We noted statistically significant variations in model performance across stratified subpopulations identified by age, race or ethnicity, gender, and residence (urban vs rural). Conclusions: This study presents the possibility of developing decision models capable of predicting patient-level health care resource utilization across a broad, statewide region with considerable predictive performance. However, our models present statistically significant variations in performance across stratified subpopulations of interest. Further efforts are necessary to identify root causes of these biases and to rectify them. 
", doi="10.2196/31337", url="https://www.jmir.org/2021/11/e31337", url="http://www.ncbi.nlm.nih.gov/pubmed/34581671" } @Article{info:doi/10.2196/29504, author="Murtas, Rossella and Morici, Nuccia and Cogliati, Chiara and Puoti, Massimo and Omazzi, Barbara and Bergamaschi, Walter and Voza, Antonio and Rovere Querini, Patrizia and Stefanini, Giulio and Manfredi, Grazia Maria and Zocchi, Teresa Maria and Mangiagalli, Andrea and Brambilla, Vittoria Carla and Bosio, Marco and Corradin, Matteo and Cortellaro, Francesca and Trivelli, Marco and Savonitto, Stefano and Russo, Giampiero Antonio", title="Algorithm for Individual Prediction of COVID-19--Related Hospitalization Based on Symptoms: Development and Implementation Study", journal="JMIR Public Health Surveill", year="2021", month="Nov", day="15", volume="7", number="11", pages="e29504", keywords="COVID-19", keywords="severe outcome", keywords="prediction", keywords="monitoring system", keywords="symptoms", keywords="risk prediction", keywords="risk", keywords="algorithms", keywords="prediction models", keywords="pandemic", keywords="digital data", keywords="health records", abstract="Background: The COVID-19 pandemic has placed a huge strain on the health care system globally. The metropolitan area of Milan, Italy, was one of the regions most impacted by the COVID-19 pandemic worldwide. Risk prediction models developed by combining administrative databases and basic clinical data are needed to stratify individual patient risk for public health purposes. Objective: This study aims to develop a stratification tool aimed at improving COVID-19 patient management and health care organization. Methods: A predictive algorithm was developed and applied to 36,834 patients with COVID-19 in Italy between March 8 and October 9, 2020, in order to foresee their risk of hospitalization. Exposures considered were age, sex, comorbidities, and symptoms associated with COVID-19 (eg, vomiting, cough, fever, diarrhea, myalgia, asthenia, headache, anosmia, ageusia, and dyspnea). The outcomes were hospitalizations and emergency department admissions for COVID-19. Discrimination and calibration of the model were also assessed. Results: The predictive model showed a good fit for predicting COVID-19 hospitalization (C-index 0.79) and a good overall prediction accuracy (Brier score 0.14). The model was well calibrated (intercept --0.0028, slope 0.9970). Based on these results, 118,804 patients diagnosed with COVID-19 from October 25 to December 11, 2020, were stratified into low, medium, and high risk for COVID-19 severity. Among the overall study population, 67,030 (56.42\%) were classified as low-risk patients; 43,886 (36.94\%), as medium-risk patients; and 7888 (6.64\%), as high-risk patients. In all, 89.37\% (106,179/118,804) of the overall study population was being assisted at home, 9\% (10,695/118,804) was hospitalized, and 1.62\% (1930/118,804) died. Among those assisted at home, most people (63,983/106,179, 60.26\%) were classified as low risk, whereas only 3.63\% (3858/106,179) were classified as high risk. According to ordinal logistic regression, the odds ratio (OR) of being hospitalized or dead was 5.0 (95\% CI 4.6-5.4) among high-risk patients and 2.7 (95\% CI 2.6-2.9) among medium-risk patients, as compared to low-risk patients. 
Conclusions: A simple monitoring system, based on primary care data sets linked to COVID-19 testing results, hospital admissions data, and death records, may assist in the proper planning and allocation of patients and resources during the ongoing COVID-19 pandemic. ", doi="10.2196/29504", url="https://publichealth.jmir.org/2021/11/e29504", url="http://www.ncbi.nlm.nih.gov/pubmed/34543227" } @Article{info:doi/10.2196/31186, author="Hammam, Nevin and Izadi, Zara and Li, Jing and Evans, Michael and Kay, Julia and Shiboski, Stephen and Schmajuk, Gabriela and Yazdany, Jinoos", title="The Relationship Between Electronic Health Record System and Performance on Quality Measures in the American College of Rheumatology's Rheumatology Informatics System for Effectiveness (RISE) Registry: Observational Study", journal="JMIR Med Inform", year="2021", month="Nov", day="12", volume="9", number="11", pages="e31186", keywords="rheumatoid arthritis", keywords="electronic health record", keywords="patient-reported outcomes", keywords="quality measures", keywords="disease activity", keywords="quality of care", keywords="performance reporting", keywords="medical informatics", keywords="clinical informatics", abstract="Background: Routine collection of disease activity (DA) and patient-reported outcomes (PROs) in rheumatoid arthritis (RA) are nationally endorsed quality measures and critical components of a treat-to-target approach. However, little is known about the role electronic health record (EHR) systems play in facilitating performance on these measures. Objective: Using the American College of Rheumatology's (ACR's) RISE registry, we analyzed the relationship between EHR system and performance on DA and functional status (FS) quality measures. Methods: We analyzed data collected in 2018 from practices enrolled in RISE. We assessed practice-level performance on quality measures that require DA and FS documentation. Multivariable linear regression and zero-inflated negative binomial models were used to examine the independent effect of EHR system on practice-level quality measure performance, adjusting for practice characteristics and patient case-mix. Results: In total, 220 included practices cared for 314,793 patients with RA. NextGen was the most commonly used EHR system (34.1\%). We found wide variation in performance on DA and FS quality measures by EHR system (median 30.1, IQR 0-74.8, and median 9.0, IQR 0-74.2, respectively). Even after adjustment, NextGen practices performed significantly better than Allscripts on the DA measure (51.4\% vs 5.0\%; P<.05) and significantly better than eClinicalWorks and eMDs on the FS measure (49.3\% vs 29.0\% and 10.9\%; P<.05). Conclusions: Performance on national RA quality measures was associated with the EHR system, even after adjusting for practice and patient characteristics. These findings suggest that future efforts to improve quality of care in RA should focus not only on provider performance reporting but also on developing and implementing rheumatology-specific standards across EHRs. 
", doi="10.2196/31186", url="https://medinform.jmir.org/2021/11/e31186", url="http://www.ncbi.nlm.nih.gov/pubmed/34766910" } @Article{info:doi/10.2196/29241, author="McKenzie, Jordan and Rajapakshe, Rasika and Shen, Hua and Rajapakshe, Shan and Lin, Angela", title="A Semiautomated Chart Review for Assessing the Development of Radiation Pneumonitis Using Natural Language Processing: Diagnostic Accuracy and Feasibility Study", journal="JMIR Med Inform", year="2021", month="Nov", day="12", volume="9", number="11", pages="e29241", keywords="chart review", keywords="natural language processing", keywords="text extraction", keywords="radiation pneumonitis", keywords="lung cancer", keywords="radiation therapy", keywords="python", keywords="electronic medical record", keywords="accuracy", abstract="Background: Health research frequently requires manual chart reviews to identify patients in a study-specific cohort and examine their clinical outcomes. Manual chart review is a labor-intensive process that requires significant time investment for clinical researchers. Objective: This study aims to evaluate the feasibility and accuracy of an assisted chart review program, using an in-house rule-based text-extraction program written in Python, to identify patients who developed radiation pneumonitis (RP) after receiving curative radiotherapy. Methods: A retrospective manual chart review was completed for patients who received curative radiotherapy for stage 2-3 lung cancer from January 1, 2013 to December 31, 2015, at British Columbia Cancer, Kelowna Centre. In the manual chart review, RP diagnosis and grading were recorded using the Common Terminology Criteria for Adverse Events version 5.0. From the charts of 50 sample patients, a total of 1413 clinical documents were obtained for review from the electronic medical record system. The text-extraction program was built using the Natural Language Toolkit Python platform (and regular expressions, also known as RegEx). Python version 3.7.2 was used to run the text-extraction program. The output of the text-extraction program was a list of the full sentences containing the key terms, document IDs, and dates from which these sentences were extracted. The results from the manual review were used as the gold standard in this study, with which the results of the text-extraction program were compared. Results: Fifty percent (25/50) of the sample patients developed grade ≥1 RP; the natural language processing program was able to ascertain 92\% (23/25) of these patients (sensitivity 0.92, 95\% CI 0.74-0.99; specificity 0.36, 95\% CI 0.18-0.57). Furthermore, the text-extraction program was able to correctly identify all 9 patients with grade ≥2 RP, which are patients with clinically significant symptoms (sensitivity 1.0, 95\% CI 0.66-1.0; specificity 0.27, 95\% CI 0.14-0.43). The program was useful for distinguishing patients with RP from those without RP. The text-extraction program in this study avoided unnecessary manual review of 22\% (11/50) of the sample patients, as these patients were identified as grade 0 RP and would not require further manual review in subsequent studies. Conclusions: This feasibility study showed that the text-extraction program was able to assist with the identification of patients who developed RP after curative radiotherapy. The program streamlines the manual chart review further by identifying the key sentences of interest. 
This work has the potential to improve future clinical research, as the text-extraction program shows promise in performing chart review in a more time-efficient manner, compared with the traditional labor-intensive manual chart review. ", doi="10.2196/29241", url="https://medinform.jmir.org/2021/11/e29241", url="http://www.ncbi.nlm.nih.gov/pubmed/34766919" } @Article{info:doi/10.2196/30277, author="Yang, Yujie and Zheng, Jing and Du, Zhenzhen and Li, Ye and Cai, Yunpeng", title="Accurate Prediction of Stroke for Hypertensive Patients Based on Medical Big Data and Machine Learning Algorithms: Retrospective Study", journal="JMIR Med Inform", year="2021", month="Nov", day="10", volume="9", number="11", pages="e30277", keywords="stroke", keywords="medical big data", keywords="electronic health records", keywords="machine learning", keywords="risk prediction", keywords="hypertension", abstract="Background: Stroke risk assessment is an important means of primary prevention, but the applicability of existing stroke risk assessment scales in the Chinese population has always been controversial. A prospective study is a common method of medical research, but it is time-consuming and labor-intensive. Medical big data has been demonstrated to promote disease risk factor discovery and prognosis, attracting broad research interest. Objective: We aimed to establish a high-precision stroke risk prediction model for hypertensive patients based on historical electronic medical record data and machine learning algorithms. Methods: Based on the Shenzhen Health Information Big Data Platform, a total of 57,671 patients were screened from 250,788 registered patients with hypertension, of whom 9421 had stroke onset during the 3-year follow-up. In addition to baseline characteristics and historical symptoms, we constructed some trend characteristics from multitemporal medical records. Stratified sampling according to gender ratio and age stratification was implemented to balance the positive and negative cases, and the final 19,953 samples were randomly divided into a training set and test set according to a ratio of 7:3. We used 4 machine learning algorithms for modeling, and the risk prediction performance was compared with the traditional risk scales. We also analyzed the nonlinear effect of continuous characteristics on stroke onset. Results: The tree-based ensemble algorithm, extreme gradient boosting, achieved the optimal performance with an area under the receiver operating characteristic curve of 0.9220, surpassing the other 3 traditional machine learning algorithms. Compared with 2 traditional risk scales, the Framingham stroke risk profiles and the Chinese Multiprovincial Cohort Study, our proposed model achieved better performance on the independent validation set, and the area under the receiver operating characteristic curve value increased by 0.17. Further nonlinear effect analysis revealed the importance of multitemporal trend characteristics in stroke risk prediction, which will benefit the standardized management of hypertensive patients. Conclusions: A high-precision 3-year stroke risk prediction model for hypertensive patients was established, and the model's performance was verified by comparing it with the traditional risk scales. 
Multitemporal trend characteristics played an important role in stroke onset, and thus the model could be deployed to electronic health record systems to assist in more pervasive, preemptive stroke risk screening, enabling higher efficiency of early disease prevention and intervention. ", doi="10.2196/30277", url="https://medinform.jmir.org/2021/11/e30277", url="http://www.ncbi.nlm.nih.gov/pubmed/34757322" } @Article{info:doi/10.2196/27748, author="Javier, J. Sarah and Wu, Justina and Smith, L. Donna and Kanwal, Fasiha and Martin, A. Lindsey and Clark, Jack and Midboe, M. Amanda", title="A Web-Based, Population-Based Cirrhosis Identification and Management System for Improving Cirrhosis Care: Qualitative Formative Evaluation", journal="JMIR Form Res", year="2021", month="Nov", day="9", volume="5", number="11", pages="e27748", keywords="cirrhosis", keywords="informatics", keywords="care coordination", keywords="implementation", keywords="Consolidated Framework for Implementation Research (CFIR)", keywords="quality improvement", abstract="Background: Cirrhosis, or scarring of the liver, is a debilitating condition that affects millions of US adults. Early identification, linkage to care, and retention of care are critical for preventing severe complications and death from cirrhosis. Objective: The purpose of this study is to conduct a preimplementation formative evaluation to identify factors that could impact implementation of the Population-Based Cirrhosis Identification and Management System (P-CIMS) in clinics serving patients with cirrhosis. P-CIMS is a web-based informatics tool designed to facilitate patient outreach and cirrhosis care management. Methods: Semistructured interviews were conducted between January and May 2016 with frontline providers in liver disease and primary care clinics at 3 Veterans Health Administration medical centers. A total of 10 providers were interviewed, including 8 physicians and midlevel providers from liver-related specialty clinics and 2 primary care providers who managed patients with cirrhosis. The Consolidated Framework for Implementation Research guided the development of the interview guides. Inductive consensus coding and content analysis were used to analyze transcribed interviews and abstracted coded passages, elucidated themes, and insights. Results: The following themes and subthemes emerged from the analyses: outer setting: needs and resources for patients with cirrhosis; inner setting: readiness for implementation (subthemes: lack of resources, lack of leadership support), and implementation climate (subtheme: competing priorities); characteristics of individuals: role within clinic; knowledge and beliefs about P-CIMS (subtheme: perceived and realized benefits; useful features; suggestions for improvement); and perceptions of current practices in managing cirrhosis cases (subthemes: preimplementation process for identifying and linking patients to cirrhosis care; structural and social barriers to follow-up). Overall, P-CIMS was viewed as a powerful tool for improving linkage and retention, but its integration in the clinical workflow required leadership support, time, and staffing. Providers also cited the need for more intuitive interface elements to enhance usability. Conclusions: P-CIMS shows promise as a powerful tool for identifying, linking, and retaining care in patients living with cirrhosis. 
The current evaluation identified several improvements and advantages of P-CIMS over current care processes and provides lessons for others implementing similar population-based identification and management tools in populations with chronic disease. ", doi="10.2196/27748", url="https://formative.jmir.org/2021/11/e27748", url="http://www.ncbi.nlm.nih.gov/pubmed/34751653" } @Article{info:doi/10.2196/23789, author="Steinkamp, Jackson and Sharma, Abhinav and Bala, Wasif and Kantrowitz, J. Jacob", title="A Fully Collaborative, Noteless Electronic Medical Record Designed to Minimize Information Chaos: Software Design and Feasibility Study", journal="JMIR Form Res", year="2021", month="Nov", day="9", volume="5", number="11", pages="e23789", keywords="electronic medical records", keywords="clinical notes", keywords="information chaos", keywords="information overload", keywords="clinician burnout", keywords="software design", keywords="problem-oriented medical record", keywords="medical records", keywords="electronic records", keywords="documentation", keywords="clinical", keywords="software", abstract="Background: Clinicians spend large amounts of their workday using electronic medical records (EMRs). Poorly designed documentation systems contribute to the proliferation of out-of-date information, increased time spent on medical records, clinician burnout, and medical errors. Beyond software interfaces, examining the underlying paradigms and organizational structures for clinical information may provide insights into ways to improve documentation systems. In particular, our attachment to the note as the major organizational unit for storing unstructured medical data may be a cause of many of the problems with modern clinical documentation. Notes, as currently understood, systematically incentivize information duplication and information scattering, both within a single clinician's notes over time and across multiple clinicians' notes. Therefore, it is worthwhile to explore alternative paradigms for unstructured data organization. Objective: The aim of this study is to demonstrate the feasibility of building an EMR that does not use notes as the core organizational unit for unstructured data and which is designed specifically to disincentivize information duplication and information scattering. Methods: We used specific design principles to minimize the incentive for users to duplicate and scatter information. By default, the majority of a patient's medical history remains the same over time, so users should not have to redocument that information. Clinicians on different teams or services mostly share the same medical information, so all data should be collaboratively shared across teams and services (while still allowing for disagreement and nuance). In all cases where a clinician must state that information has remained the same, they should be able to attest to the information without redocumenting it. We designed and built a web-based EMR based on these design principles. Results: We built a medical documentation system that does not use notes and instead treats the chart as a single, dynamically updating, and fully collaborative workspace. All information is organized by clinical topic or problem. Version history functionality is used to enable granular tracking of changes over time. 
Our system is highly customizable to individual workflows and enables each individual user to decide which data should be structured and which should be unstructured, enabling individuals to leverage the advantages of structured templating and clinical decision support as desired without requiring programming knowledge. The system is designed to facilitate real-time, fully collaborative documentation and communication among multiple clinicians. Conclusions: We demonstrated the feasibility of building a non--note-based, fully collaborative EMR system. Our attachment to the note as the only possible atomic unit of unstructured medical data should be reevaluated, and alternative models should be considered. ", doi="10.2196/23789", url="https://formative.jmir.org/2021/11/e23789", url="http://www.ncbi.nlm.nih.gov/pubmed/34751651" } @Article{info:doi/10.2196/30786, author="Baumgartner, L. Susan and Buffkin Jr, Eric D. and Rukavina, Elise and Jones, Jason and Weiler, Elizabeth and Carnes, C. Tony", title="A Novel Digital Pill System for Medication Adherence Measurement and Reporting: Usability Validation Study", journal="JMIR Hum Factors", year="2021", month="Nov", day="8", volume="8", number="4", pages="e30786", keywords="digital pills", keywords="digital medication", keywords="ingestible event marker", keywords="ingestible sensor", keywords="human factors", keywords="usability", keywords="validation study", keywords="medication adherence", keywords="medication nonadherence", keywords="remote patient monitoring", keywords="mobile phone", abstract="Background: Medication nonadherence is a costly problem that is common in clinical use and clinical trials alike, with significant adverse consequences. Digital pill systems have proved to be effective and safe solutions to the challenges of nonadherence, with documented success in improving adherence and health outcomes. Objective: The aim of this human factors validation study is to evaluate a novel digital pill system, the ID-Cap System from etectRx, for usability among patient users in a simulated real-world use environment. Methods: A total of 17 patients with diverse backgrounds who regularly take oral prescription medications were recruited. After training and a period of training decay, the participants were asked to complete 12 patient-use scenarios during which errors or difficulties were logged. The participants were also interviewed about their experiences with the ID-Cap System. Results: The participants ranged in age from 27 to 74 years (mean 51 years, SD 13.8 years), and they were heterogeneous in other demographic factors as well, such as education level, handedness, and sex. In this human factors validation study, the patient users completed 97.5\% (196/201) of the total use scenarios successfully; 75.1\% (151/201) were completed without any failures or errors. The participants found the ID-Cap System easy to use, and they were able to accurately and proficiently record ingestion events using the device. Conclusions: The participants demonstrated the ability to safely and effectively use the ID-Cap System for its intended use. The ID-Cap System has great potential as a useful tool for encouraging medication adherence and can be easily implemented by patient users. 
", doi="10.2196/30786", url="https://humanfactors.jmir.org/2021/4/e30786", url="http://www.ncbi.nlm.nih.gov/pubmed/34747709" } @Article{info:doi/10.2196/27568, author="Keniston, Angela and McBeth, Lauren and Pell, Jonathan and Bowden, Kasey and Metzger, Anna and Nordhagen, Jamie and Anthony, Amanda and Rice, John and Burden, Marisha", title="The Effectiveness of a Multidisciplinary Electronic Discharge Readiness Tool: Prospective, Single-Center, Pre-Post Study", journal="JMIR Hum Factors", year="2021", month="Nov", day="8", volume="8", number="4", pages="e27568", keywords="discharge planning", keywords="health information technology", keywords="quasi-experimental design", keywords="multidisciplinary", keywords="teamwork", abstract="Background: In the face of hospital capacity strain, hospitals have developed multifaceted plans to try to improve patient flow. Many of these initiatives have focused on the timing of discharges and on lowering lengths of stay, and they have met with variable success. We deployed a novel tool in the electronic health record to enhance discharge communication. Objective: The aim of this study is to evaluate the effectiveness of a discharge communication tool. Methods: This was a prospective, single-center, pre-post study. Hospitalist physicians and advanced practice providers (APPs) used the Discharge Today Tool to update patient discharge readiness every morning and at any time the patient status changed throughout the day. Primary outcomes were tool use, time of day the clinician entered the discharge order, time of day the patient left the hospital, and hospital length of stay. We used linear mixed modeling and generalized linear mixed modeling, with team and discharging provider included in all the models to account for patients cared for by the same team and the same provider. Results: During the pilot implementation period from March 5, 2019, to July 31, 2019, a total of 4707 patients were discharged (compared with 4558 patients discharged during the preimplementation period). A total of 352 clinical staff had used the tool, and 84.85\% (3994/4707) of the patients during the pilot period had a discharge status assigned at least once. In a survey, most respondents reported that the tool was helpful (32/34, 94\% of clinical staff) and either saved time or did not add additional time to their workflow (21/24, 88\% of providers, and 34/34, 100\% of clinical staff). Although improvements were not observed in either unadjusted or adjusted analyses, after including starting morning census per team as an effect modifier, there was a reduction in the time of day the discharge order was entered into the electronic health record by the discharging physician and in the time of day the patient left the hospital (decrease of 2.9 minutes per additional patient, P=.07, and 3 minutes per additional patient, P=.07, respectively). As an effect modifier, for teams that included an APP, there was a significant reduction in the time of day the patient left the hospital beyond the reduction seen for teams without an APP (decrease of 19.1 minutes per patient, P=.04). Finally, in the adjusted analysis, hospital length of stay decreased by an average of 3.7\% (P=.06). Conclusions: The Discharge Today tool allows for real time documentation and sharing of discharge status. 
Our results suggest an overall positive response by care team members and that the tool may be useful for improving discharge time and length of stay if a team is staffed with an APP or in higher-census situations. ", doi="10.2196/27568", url="https://humanfactors.jmir.org/2021/4/e27568", url="http://www.ncbi.nlm.nih.gov/pubmed/34747702" } @Article{info:doi/10.2196/26426, author="Sung, MinDong and Hahn, Sangchul and Han, Hoon Chang and Lee, Mo Jung and Lee, Jayoung and Yoo, Jinkyu and Heo, Jay and Kim, Sam Young and Chung, Soo Kyung", title="Event Prediction Model Considering Time and Input Error Using Electronic Medical Records in the Intensive Care Unit: Retrospective Study", journal="JMIR Med Inform", year="2021", month="Nov", day="4", volume="9", number="11", pages="e26426", keywords="machine learning", keywords="critical care", keywords="prediction model", keywords="intensive care unit", keywords="mortality", keywords="AKI", keywords="sepsis", abstract="Background: In the era of artificial intelligence, event prediction models are abundant. However, considering the limitation of the electronic medical record--based model, including the temporally skewed prediction and the record itself, these models could be delayed or could yield errors. Objective: In this study, we aim to develop multiple event prediction models in intensive care units to overcome their temporal skewness and evaluate their robustness against delayed and erroneous input. Methods: A total of 21,738 patients were included in the development cohort. Three events---death, sepsis, and acute kidney injury---were predicted. To overcome the temporal skewness, we developed three models for each event, which predicted the events in advance of three prespecified timepoints. Additionally, to evaluate the robustness against input error and delays, we added simulated errors and delayed input and calculated changes in the area under the receiver operating characteristic curve (AUROC) values. Results: Most of the AUROC and area under the precision-recall curve values of each model were higher than those of the conventional scores, as well as other machine learning models previously used. In the error input experiment, except for our proposed model, an increase in the noise added to the model lowered the resulting AUROC value. However, the delayed input did not show the performance decreased in this experiment. Conclusions: For a prediction model that was applicable in the real world, we considered not only performance but also temporal skewness, delayed input, and input error. ", doi="10.2196/26426", url="https://medinform.jmir.org/2021/11/e26426", url="http://www.ncbi.nlm.nih.gov/pubmed/34734837" } @Article{info:doi/10.2196/28999, author="Rodrigo, Hansapani and Beukes, W. Eldr{\'e} and Andersson, Gerhard and Manchaiah, Vinaya", title="Exploratory Data Mining Techniques (Decision Tree Models) for Examining the Impact of Internet-Based Cognitive Behavioral Therapy for Tinnitus: Machine Learning Approach", journal="J Med Internet Res", year="2021", month="Nov", day="2", volume="23", number="11", pages="e28999", keywords="tinnitus", keywords="internet interventions", keywords="digital therapeutics", keywords="cognitive behavioral therapy", keywords="artificial intelligence", keywords="machine learning", keywords="data mining", keywords="decision tree", keywords="random forest", abstract="Background: There is huge variability in the way that individuals with tinnitus respond to interventions. 
These experiential variations, together with a range of associated etiologies, contribute to tinnitus being a highly heterogeneous condition. Despite this heterogeneity, a ``one size fits all'' approach is taken when making management recommendations. Although there are various management approaches, not all are equally effective. Psychological approaches such as cognitive behavioral therapy have the strongest evidence base. Managing tinnitus is challenging due to the significant variations in tinnitus experiences and treatment successes. Tailored interventions based on individual tinnitus profiles may improve outcomes. Predictive models of treatment success are, however, lacking. Objective: This study aimed to use exploratory data mining techniques (ie, decision tree models) to identify the variables associated with the treatment success of internet-based cognitive behavioral therapy (ICBT) for tinnitus. Methods: Individuals (N=228) who underwent ICBT in 3 separate clinical trials were included in this analysis. The primary outcome variable was a reduction of 13 points in tinnitus severity, which was measured by using the Tinnitus Functional Index following the intervention. The predictor variables included demographic characteristics, tinnitus and hearing-related variables, and clinical factors (ie, anxiety, depression, insomnia, hyperacusis, hearing disability, cognitive function, and life satisfaction). Analyses were undertaken by using various exploratory machine learning algorithms to identify the most influential variables. In total, 6 decision tree models were implemented, namely, the classification and regression tree (CART), C5.0, gradient boosting (GB), XGBoost, AdaBoost, and random forest models. The Shapley additive explanations framework was applied to the 2 optimal decision tree models to determine relative predictor importance. Results: Among the 6 decision tree models, the CART (accuracy: mean 70.7\%, SD 2.4\%; sensitivity: mean 74\%, SD 5.5\%; specificity: mean 64\%, SD 3.7\%; area under the receiver operating characteristic curve [AUC]: mean 0.69, SD 0.001) and gradient boosting (accuracy: mean 71.8\%, SD 1.5\%; sensitivity: mean 78.3\%, SD 2.8\%; specificity: mean 58.7\%, SD 4.2\%; AUC: mean 0.68, SD 0.02) models were found to be the best predictive models. Although the other models had acceptable accuracy (range 56.3\%-66.7\%) and sensitivity (range 68.6\%-77.9\%), they all had relatively weak specificity (range 31.1\%-50\%) and AUCs (range 0.52-0.62). A higher education level was the most influential factor for ICBT outcomes. The CART decision tree model identified 3 participant groups who had at least an 85\% success probability following the undertaking of ICBT. Conclusions: Decision tree models, especially the CART and gradient boosting models, appeared to be promising in predicting ICBT outcomes. Their predictive power may be improved by using larger sample sizes and including a wider range of predictive factors in future studies. 
", doi="10.2196/28999", url="https://www.jmir.org/2021/11/e28999", url="http://www.ncbi.nlm.nih.gov/pubmed/34726612" } @Article{info:doi/10.2196/32726, author="Kim, Jeongmin and Lim, Hakyung and Ahn, Jae-Hyeon and Lee, Hwa Kyoung and Lee, Suk Kwang and Koo, Chul Kyo", title="Optimal Triage for COVID-19 Patients Under Limited Health Care Resources With a Parsimonious Machine Learning Prediction Model and Threshold Optimization Using Discrete-Event Simulation: Development Study", journal="JMIR Med Inform", year="2021", month="Nov", day="2", volume="9", number="11", pages="e32726", keywords="COVID-19", keywords="decision support techniques", keywords="machine learning", keywords="prediction", keywords="triage", abstract="Background: The COVID-19 pandemic has placed an unprecedented burden on health care systems. Objective: We aimed to effectively triage COVID-19 patients within situations of limited data availability and explore optimal thresholds to minimize mortality rates while maintaining health care system capacity. Methods: A nationwide sample of 5601 patients confirmed with COVID-19 until April 2020 was retrospectively reviewed. Extreme gradient boosting (XGBoost) and logistic regression analysis were used to develop prediction models for the maximum clinical severity during hospitalization, classified according to the World Health Organization Ordinal Scale for Clinical Improvement (OSCI). The recursive feature elimination technique was used to evaluate the maintenance of model performance when clinical and laboratory variables were eliminated. Using populations based on hypothetical patient influx scenarios, discrete-event simulation was performed to find an optimal threshold within limited resource environments that minimizes mortality rates. Results: The cross-validated area under the receiver operating characteristic curve (AUROC) of the baseline XGBoost model that utilized all 37 variables was 0.965 for OSCI ≥6. Compared to the baseline model's performance, the AUROC of the feature-eliminated model that utilized 17 variables was maintained at 0.963 with statistical insignificance. Optimal thresholds were found to minimize mortality rates in a hypothetical patient influx scenario. The benefit of utilizing an optimal triage threshold was clear, reducing mortality up to 18.1\%, compared with the conventional Youden index. Conclusions: Our adaptive triage model and its threshold optimization capability revealed that COVID-19 management can be achieved via the cooperation of both the medical and health care management sectors for maximum treatment efficacy. The model is available online for clinical implementation. ", doi="10.2196/32726", url="https://medinform.jmir.org/2021/11/e32726", url="http://www.ncbi.nlm.nih.gov/pubmed/34609319" } @Article{info:doi/10.2196/29120, author="Zanotto, Stella Bruna and Beck da Silva Etges, Paula Ana and dal Bosco, Avner and Cortes, Gabriel Eduardo and Ruschel, Renata and De Souza, Claudia Ana and Andrade, V. Claudio M. 
and Viegas, Felipe and Canuto, Sergio and Luiz, Washington and Ouriques Martins, Sheila and Vieira, Renata and Polanczyk, Carisi and Andr{\'e} Gon{\c{c}}alves, Marcos", title="Stroke Outcome Measurements From Electronic Medical Records: Cross-sectional Study on the Effectiveness of Neural and Nonneural Classifiers", journal="JMIR Med Inform", year="2021", month="Nov", day="1", volume="9", number="11", pages="e29120", keywords="natural language processing", keywords="stroke", keywords="outcomes", keywords="electronic medical records", keywords="EHR", keywords="electronic health records", keywords="text processing", keywords="data mining", keywords="text classification", keywords="patient outcomes", abstract="Background: With the rapid adoption of electronic medical records (EMRs), there is an ever-increasing opportunity to collect data and extract knowledge from EMRs to support patient-centered stroke management. Objective: This study aims to compare the effectiveness of state-of-the-art automatic text classification methods in classifying data to support the prediction of clinical patient outcomes and the extraction of patient characteristics from EMRs. Methods: Our study addressed the computational problems of information extraction and automatic text classification. We identified essential tasks to be considered in an ischemic stroke value-based program. The 30 selected tasks were classified (manually labeled by specialists) according to the following value agenda: tier 1 (achieved health care status), tier 2 (recovery process), care related (clinical management and risk scores), and baseline characteristics. The analyzed data set was retrospectively extracted from the EMRs of patients with stroke from a private Brazilian hospital between 2018 and 2019. A total of 44,206 sentences from free-text medical records in Portuguese were used to train and develop 10 supervised computational machine learning methods, including state-of-the-art neural and nonneural methods, along with ontological rules. As an experimental protocol, we used a 5-fold cross-validation procedure repeated 6 times, along with subject-wise sampling. A heatmap was used to display comparative result analyses according to the best algorithmic effectiveness (F1 score), supported by statistical significance tests. A feature importance analysis was conducted to provide insights into the results. Results: The top-performing models were support vector machines trained with lexical and semantic textual features, showing the importance of dealing with noise in EMR textual representations. The support vector machine models produced statistically superior results in 71\% (17/24) of tasks, with an F1 score >80\% regarding care-related tasks (patient treatment location, fall risk, thrombolytic therapy, and pressure ulcer risk), the process of recovery (ability to feed orally or ambulate and communicate), health care status achieved (mortality), and baseline characteristics (diabetes, obesity, dyslipidemia, and smoking status). Neural methods were largely outperformed by more traditional nonneural methods, given the characteristics of the data set. Ontological rules were also effective in tasks such as baseline characteristics (alcoholism, atrial fibrillation, and coronary artery disease) and the Rankin scale. The complementarity in effectiveness among models suggests that a combination of models could enhance the results and cover more tasks in the future. 
Conclusions: Advances in information technology capacity are essential for scalability and agility in measuring health status outcomes. This study allowed us to measure effectiveness and identify opportunities for automating the classification of outcomes of specific tasks related to clinical conditions of stroke victims, and thus ultimately assess the possibility of proactively using these machine learning techniques in real-world situations. ", doi="10.2196/29120", url="https://medinform.jmir.org/2021/11/e29120", url="http://www.ncbi.nlm.nih.gov/pubmed/34723829" } @Article{info:doi/10.2196/26524, author="Akbarian, Sina and Ghahjaverestan, Montazeri Nasim and Yadollahi, Azadeh and Taati, Babak", title="Noncontact Sleep Monitoring With Infrared Video Data to Estimate Sleep Apnea Severity and Distinguish Between Positional and Nonpositional Sleep Apnea: Model Development and Experimental Validation", journal="J Med Internet Res", year="2021", month="Nov", day="1", volume="23", number="11", pages="e26524", keywords="sleep apnea", keywords="deep learning", keywords="noncontact monitoring", keywords="computer vision", keywords="positional sleep apnea", keywords="3D convolutional neural network", keywords="3D-CNN", abstract="Background: Sleep apnea is a respiratory disorder characterized by frequent breathing cessation during sleep. Sleep apnea severity is determined by the apnea-hypopnea index (AHI), which is the hourly rate of respiratory events. In positional sleep apnea, the AHI is higher in the supine sleeping position than it is in other sleeping positions. Positional therapy is a behavioral strategy (eg, wearing an item to encourage sleeping toward the lateral position) to treat positional apnea. The gold standard of diagnosing sleep apnea and whether or not it is positional is polysomnography; however, this test is inconvenient, expensive, and has a long waiting list. Objective: The objective of this study was to develop and evaluate a noncontact method to estimate sleep apnea severity and to distinguish positional versus nonpositional sleep apnea. Methods: A noncontact deep-learning algorithm was developed to analyze infrared video of sleep for estimating AHI and to distinguish patients with positional vs nonpositional sleep apnea. Specifically, a 3D convolutional neural network (CNN) architecture was used to process movements extracted by optical flow to detect respiratory events. Positional sleep apnea patients were subsequently identified by combining the AHI information provided by the 3D-CNN model with the sleeping position (supine vs lateral) detected via a previously developed CNN model. Results: The algorithm was validated on data of 41 participants, including 26 men and 15 women with a mean age of 53 (SD 13) years, BMI of 30 (SD 7), AHI of 27 (SD 31) events/hour, and sleep duration of 5 (SD 1) hours; 20 participants had positional sleep apnea, 15 participants had nonpositional sleep apnea, and the positional status could not be discriminated for the remaining 6 participants. AHI values estimated by the 3D-CNN model correlated strongly and significantly with the gold standard (Spearman correlation coefficient 0.79, P<.001). Individuals with positional sleep apnea (based on an AHI threshold of 15) were identified with 83\% accuracy and an F1-score of 86\%. 
Conclusions: This study demonstrates the possibility of using a camera-based method for developing an accessible and easy-to-use device for screening sleep apnea at home, which can be provided in the form of a tablet or smartphone app. ", doi="10.2196/26524", url="https://www.jmir.org/2021/11/e26524", url="http://www.ncbi.nlm.nih.gov/pubmed/34723817" } @Article{info:doi/10.2196/28763, author="Teramoto, Kei and Takeda, Toshihiro and Mihara, Naoki and Shimai, Yoshie and Manabe, Shirou and Kuwata, Shigeki and Kondoh, Hiroshi and Matsumura, Yasushi", title="Detecting Adverse Drug Events Through the Chronological Relationship Between the Medication Period and the Presence of Adverse Reactions From Electronic Medical Record Systems: Observational Study", journal="JMIR Med Inform", year="2021", month="Nov", day="1", volume="9", number="11", pages="e28763", keywords="real world data", keywords="electronic medical record", keywords="adverse drug event", abstract="Background: Medicines may cause various adverse reactions. An enormous amount of money and effort is spent investigating adverse drug events (ADEs) in clinical trials and postmarketing surveillance. Real-world data from multiple electronic medical records (EMRs) can make it easy to understand the ADEs that occur in actual patients. Objective: In this study, we generated a patient medication history database from physician orders recorded in EMRs, which allowed the period of medication to be clearly identified. Methods: We developed a method for detecting ADEs based on the chronological relationship between the presence of an adverse event and the medication period. To verify our method, we detected ADEs with alanine aminotransferase elevation in patients receiving aspirin, clopidogrel, and ticlopidine. The accuracy of the detection was evaluated with a chart review and by comparison with the Roussel Uclaf Causality Assessment Method (RUCAM), which is a standard method for detecting drug-induced liver injury. Results: The calculated rates of ADE with ALT elevation in patients receiving aspirin, clopidogrel, and ticlopidine were 3.33\% (868/26,059 patients), 3.70\% (188/5076 patients), and 5.69\% (226/3974 patients), respectively, which were in line with the rates of previous reports. We reviewed the medical records of the patients in whom ADEs were detected. Our method accurately predicted ADEs in 90\% (27/30patients) treated with aspirin, 100\% (9/9 patients) treated with clopidogrel, and 100\% (4/4 patients) treated with ticlopidine. Only 3 ADEs that were detected by the RUCAM were not detected by our method. Conclusions: These findings demonstrate that the present method is effective for detecting ADEs based on EMR data. 
", doi="10.2196/28763", url="https://medinform.jmir.org/2021/11/e28763", url="http://www.ncbi.nlm.nih.gov/pubmed/33993103" } @Article{info:doi/10.2196/25378, author="Tsuji, Shintaro and Wen, Andrew and Takahashi, Naoki and Zhang, Hongjian and Ogasawara, Katsuhiko and Jiang, Gouqian", title="Developing a RadLex-Based Named Entity Recognition Tool for Mining Textual Radiology Reports: Development and Performance Evaluation Study", journal="J Med Internet Res", year="2021", month="Oct", day="29", volume="23", number="10", pages="e25378", keywords="named entity recognition (NER)", keywords="natural language processing (NLP)", keywords="RadLex", keywords="ontology", keywords="stem term", abstract="Background: Named entity recognition (NER) plays an important role in extracting the features of descriptions such as the name and location of a disease for mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities that can be extracted depends on the dictionary lookup. In particular, the recognition of compound terms is very complicated because of the variety of patterns. Objective: The aim of this study is to develop and evaluate an NER tool concerned with compound terms using RadLex for mining free-text radiology reports. Methods: We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general purpose dictionary). We manually annotated 400 radiology reports for compound terms in noun phrases and used them as the gold standard for performance evaluation (precision, recall, and F-measure). In addition, we created a compound terms--enhanced dictionary (CtED) by analyzing false negatives and false positives and applied it to another 100 radiology reports for validation. We also evaluated the stem terms of compound terms by defining two measures: occurrence ratio (OR) and matching ratio (MR). Results: The F-measure of cTAKES+RadLex+general purpose dictionary was 30.9\% (precision 73.3\% and recall 19.6\%) and that of the combined CtED was 63.1\% (precision 82.8\% and recall 51\%). The OR indicated that the stem terms of effusion, node, tube, and disease were used frequently, but it still lacks capturing compound terms. The MR showed that 71.85\% (9411/13,098) of the stem terms matched with that of the ontologies, and RadLex improved approximately 22\% of the MR from the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms would have the potential to help generate synonymous phrases using the ontologies. Conclusions: We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that CtED and stem term analysis has the potential to improve dictionary-based NER performance with regard to expanding vocabularies. 
", doi="10.2196/25378", url="https://www.jmir.org/2021/10/e25378", url="http://www.ncbi.nlm.nih.gov/pubmed/34714247" } @Article{info:doi/10.2196/25460, author="Li, Po-Hung Lieber and Han, Ji-Yan and Zheng, Wei-Zhong and Huang, Ren-Jie and Lai, Ying-Hui", title="Improved Environment-Aware--Based Noise Reduction System for Cochlear Implant Users Based on a Knowledge Transfer Approach: Development and Usability Study", journal="J Med Internet Res", year="2021", month="Oct", day="28", volume="23", number="10", pages="e25460", keywords="cochlear implants", keywords="noise reduction", keywords="deep learning", keywords="noise classification", keywords="hearing", keywords="deaf", keywords="sound", keywords="audio", keywords="cochlear", abstract="Background: Cochlear implant technology is a well-known approach to help deaf individuals hear speech again and can improve speech intelligibility in quiet conditions; however, it still has room for improvement in noisy conditions. More recently, it has been proven that deep learning--based noise reduction, such as noise classification and deep denoising autoencoder (NC+DDAE), can benefit the intelligibility performance of patients with cochlear implants compared to classical noise reduction algorithms. Objective: Following the successful implementation of the NC+DDAE model in our previous study, this study aimed to propose an advanced noise reduction system using knowledge transfer technology, called NC+DDAE\_T; examine the proposed NC+DDAE\_T noise reduction system using objective evaluations and subjective listening tests; and investigate which layer substitution of the knowledge transfer technology in the NC+DDAE\_T noise reduction system provides the best outcome. Methods: The knowledge transfer technology was adopted to reduce the number of parameters of the NC+DDAE\_T compared with the NC+DDAE. We investigated which layer should be substituted using short-time objective intelligibility and perceptual evaluation of speech quality scores as well as t-distributed stochastic neighbor embedding to visualize the features in each model layer. Moreover, we enrolled 10 cochlear implant users for listening tests to evaluate the benefits of the newly developed NC+DDAE\_T. Results: The experimental results showed that substituting the middle layer (ie, the second layer in this study) of the noise-independent DDAE (NI-DDAE) model achieved the best performance gain regarding short-time objective intelligibility and perceptual evaluation of speech quality scores. Therefore, the parameters of layer 3 in the NI-DDAE were chosen to be replaced, thereby establishing the NC+DDAE\_T. Both objective and listening test results showed that the proposed NC+DDAE\_T noise reduction system achieved similar performances compared with the previous NC+DDAE in several noisy test conditions. However, the proposed NC+DDAE\_T only required a quarter of the number of parameters compared to the NC+DDAE. Conclusions: This study demonstrated that knowledge transfer technology can help reduce the number of parameters in an NC+DDAE while keeping similar performance rates. This suggests that the proposed NC+DDAE\_T model may reduce the implementation costs of this noise reduction system and provide more benefits for cochlear implant users. 
", doi="10.2196/25460", url="https://www.jmir.org/2021/10/e25460", url="http://www.ncbi.nlm.nih.gov/pubmed/34709193" } @Article{info:doi/10.2196/19812, author="Liang, Chia-Wei and Yang, Hsuan-Chia and Islam, Mohaimenul Md and Nguyen, Alex Phung Anh and Feng, Yi-Ting and Hou, Yu Ze and Huang, Chih-Wei and Poly, Nasrin Tahmina and Li, Jack Yu-Chuan", title="Predicting Hepatocellular Carcinoma With Minimal Features From Electronic Health Records: Development of a Deep Learning Model", journal="JMIR Cancer", year="2021", month="Oct", day="28", volume="7", number="4", pages="e19812", keywords="hepatocellular carcinoma", keywords="deep learning", keywords="risk prediction", keywords="convolution neural network", keywords="deep learning model", keywords="hepatoma", abstract="Background: Hepatocellular carcinoma (HCC), usually known as hepatoma, is the third leading cause of cancer mortality globally. Early detection of HCC helps in its treatment and increases survival rates. Objective: The aim of this study is to develop a deep learning model, using the trend and severity of each medical event from the electronic health record to accurately predict the patients who will be diagnosed with HCC in 1 year. Methods: Patients with HCC were screened out from the National Health Insurance Research Database of Taiwan between 1999 and 2013. To be included, the patients with HCC had to register as patients with cancer in the catastrophic illness file and had to be diagnosed as a patient with HCC in an inpatient admission. The control cases (non-HCC patients) were randomly sampled from the same database. We used age, gender, diagnosis code, drug code, and time information as the input variables of a convolution neural network model to predict those patients with HCC. We also inspected the highly weighted variables in the model and compared them to their odds ratio at HCC to understand how the predictive model works Results: We included 47,945 individuals, 9553 of whom were patients with HCC. The area under the receiver operating curve (AUROC) of the model for predicting HCC risk 1 year in advance was 0.94 (95\% CI 0.937-0.943), with a sensitivity of 0.869 and a specificity 0.865. The AUROC for predicting HCC patients 7 days, 6 months, 1 year, 2 years, and 3 years early were 0.96, 0.94, 0.94, 0.91, and 0.91, respectively. Conclusions: The findings of this study show that the convolutional neural network model has immense potential to predict the risk of HCC 1 year in advance with minimal features available in the electronic health records. 
", doi="10.2196/19812", url="https://cancer.jmir.org/2021/4/e19812", url="http://www.ncbi.nlm.nih.gov/pubmed/34709180" } @Article{info:doi/10.2196/28752, author="Wang, Jie-Teng and Lin, Wen-Yang", title="Privacy-Preserving Anonymity for Periodical Releases of Spontaneous Adverse Drug Event Reporting Data: Algorithm Development and Validation", journal="JMIR Med Inform", year="2021", month="Oct", day="28", volume="9", number="10", pages="e28752", keywords="adverse drug reaction", keywords="data anonymization", keywords="incremental data publishing", keywords="privacy preserving data publishing", keywords="spontaneous reporting system", keywords="drug", keywords="data set", keywords="anonymous", keywords="privacy", keywords="security", keywords="algorithm", keywords="development", keywords="validation", keywords="data", abstract="Background: Spontaneous reporting systems (SRSs) have been increasingly established to collect adverse drug events for fostering adverse drug reaction (ADR) detection and analysis research. SRS data contain personal information, and so their publication requires data anonymization to prevent the disclosure of individuals' privacy. We have previously proposed a privacy model called MS(k, $\theta$*)-bounding and the associated MS-Anonymization algorithm to fulfill the anonymization of SRS data. In the real world, the SRS data usually are released periodically (eg, FDA Adverse Event Reporting System [FAERS]) to accommodate newly collected adverse drug events. Different anonymized releases of SRS data available to the attacker may thwart our single-release-focus method, that is, MS(k, $\theta$*)-bounding. Objective: We investigate the privacy threat caused by periodical releases of SRS data and propose anonymization methods to prevent the disclosure of personal privacy information while maintaining the utility of published data. Methods: We identify potential attacks on periodical releases of SRS data, namely, BFL-attacks, mainly caused by follow-up cases. We present a new privacy model called PPMS(k, $\theta$*)-bounding, and propose the associated PPMS-Anonymization algorithm and 2 improvements: PPMS+-Anonymization and PPMS++-Anonymization. Empirical evaluations were performed using 32 selected FAERS quarter data sets from 2004Q1 to 2011Q4. The performance of the proposed versions of PPMS-Anonymization was inspected against MS-Anonymization from some aspects, including data distortion, measured by normalized information loss; privacy risk of anonymized data, measured by dangerous identity ratio and dangerous sensitivity ratio; and data utility, measured by the bias of signal counting and strength (proportional reporting ratio). Results: The best version of PPMS-Anonymization, PPMS++-Anonymization, achieves nearly the same quality as MS-Anonymization in both privacy protection and data utility. Overall, PPMS++-Anonymization ensures zero privacy risk on record and attribute linkage, and exhibits 51\%-78\% and 59\%-82\% improvements on information loss over PPMS+-Anonymization and PPMS-Anonymization, respectively, and significantly reduces the bias of ADR signal. Conclusions: The proposed PPMS(k, $\theta$*)-bounding model and PPMS-Anonymization algorithm are effective in anonymizing SRS data sets in the periodical data publishing scenario, preventing the series of releases from disclosing personal sensitive information caused by BFL-attacks while maintaining the data utility for ADR signal detection. 
", doi="10.2196/28752", url="https://medinform.jmir.org/2021/10/e28752", url="http://www.ncbi.nlm.nih.gov/pubmed/34709197" } @Article{info:doi/10.2196/30093, author="Veludhandi, Anirudh and Ross, Diana and Sinha, B. Cynthia and McCracken, Courtney and Bakshi, Nitya and Krishnamurti, Lakshmanan", title="A Decision Support Tool for Allogeneic Hematopoietic Stem Cell Transplantation for Children With Sickle Cell Disease: Acceptability and Usability Study", journal="JMIR Form Res", year="2021", month="Oct", day="28", volume="5", number="10", pages="e30093", keywords="decision support tool", keywords="sickle cell disease", keywords="mobile application", keywords="mHealth", keywords="pediatrics", keywords="transplant", keywords="mobile phone", abstract="Background: Individuals living with sickle cell disease (SCD) may benefit from a variety of disease-modifying therapies, including hydroxyurea, voxelotor, crizanlizumab, L-glutamine, and chronic blood transfusions. However, allogeneic hematopoietic stem cell transplantation (HCT) remains the only nonexperimental treatment with curative intent. As HCT outcomes can be influenced by the complex interaction of several risk factors, HCT can be a difficult decision for health care providers to make for their patients with SCD. Objective: The aim of this study is to determine the acceptability and usability of a prototype decision support tool for health care providers in decision-making about HCT for SCD, together with patients and their families. Methods: On the basis of published transplant registry data, we developed the Sickle Options Decision Support Tool for Children, which provides health care providers with personalized transplant survival and risk estimates for their patients to help them make informed decisions regarding their patients' management of SCD. To evaluate the tool for its acceptability and usability, we conducted beta tests of the tool and surveys with physicians using the Ottawa Decision Support Framework and mobile health app usability questionnaire, respectively. Results: According to the mobile health app usability questionnaire survey findings, the overall usability of the tool was high (mean 6.15, SD 0.79; range 4.2-7). According to the Ottawa Decision Support Framework survey findings, acceptability of the presentation of information on the decision support tool was also high (mean 2.94, SD 0.63; range 2-4), but the acceptability regarding the amount of information was mixed (mean 2.59, SD 0.5; range 2-3). Most participants expressed that they would use the tool in their own patient consults (13/15, 87\%) and suggested that the tool would ease the decision-making process regarding HCT (8/9, 89\%). The 4 major emergent themes from the qualitative analysis of participant beta tests include user interface, data content, usefulness during a patient consult, and potential for a patient-focused decision aid. Most participants supported the idea of a patient-focused decision aid but recommended that it should include more background on HCT and a simplification of medical terminology. Conclusions: We report the development, acceptability, and usability of a prototype decision support tool app to provide individualized risk and survival estimates to patients interested in HCT in a patient consultation setting. We propose to finalize the tool by validating predictive analytics using a large data set of patients with SCD who have undergone HCT. 
Such a tool may be useful in promoting physician-patient collaboration in making shared decisions regarding HCT for SCD. Further incorporation of patient-specific measures, including the HCT comorbidity index and the quality of life after transplant, may improve the applicability of the decision support tool in a health care setting. ", doi="10.2196/30093", url="https://formative.jmir.org/2021/10/e30093", url="http://www.ncbi.nlm.nih.gov/pubmed/34709190" } @Article{info:doi/10.2196/27671, author="van Poelgeest, Rube and Schrijvers, Augustinus and Boonstra, Albert and Roes, Kit", title="Medical Specialists' Perspectives on the Influence of Electronic Medical Record Use on the Quality of Hospital Care: Semistructured Interview Study", journal="JMIR Hum Factors", year="2021", month="Oct", day="27", volume="8", number="4", pages="e27671", keywords="electronic medical record (emr)", keywords="hospitals", keywords="quality", keywords="health care", keywords="medical specialist", abstract="Background: Numerous publications show that electronic medical records (EMRs) may make an important contribution to increasing the quality of care. There are indications that particularly the medical specialist plays an important role in the use of EMRs in hospitals. Objective: The aim of this study was to examine how, and by which aspects, the relationship between EMR use and the quality of care in hospitals is influenced according to medical specialists. Methods: To answer this question, a qualitative study was conducted in the period of August-October 2018. Semistructured interviews of around 90 min were conducted with 11 medical specialists from 11 different Dutch hospitals. For analysis of the answers, we used a previously published taxonomy of factors that can influence the use of EMRs. Results: The professional experience of the participating medical specialists varied between 5 and 27 years. Using the previously published taxonomy, these medical specialists considered technical barriers the most significant for EMR use. The suboptimal change processes surrounding implementation were also perceived as a major barrier. A final major problem is related to the categories ``social'' (their relationships with the patients and fellow care providers), ``psychological'' (based on their personal issues, knowledge, and perceptions), and ``time'' (the time required to select, implement, and learn how to use EMR systems and subsequently enter data into the system). However, the medical specialists also identified potential technical facilitators, particularly in the assured availability of information to all health care professionals involved in the care of a patient. They see promise in using EMRs for medical decision support to improve the quality of care but consider these possibilities currently lacking. Conclusions: The 11 medical specialists shared positive experiences with EMR use when comparing it to formerly used paper records. The fact that involved health care professionals can access patient data at any time they need is considered important. However, in practice, potential quality improvement lags as long as decision support cannot be applied because of the lack of a fully coded patient record. ", doi="10.2196/27671", url="https://humanfactors.jmir.org/2021/4/e27671", url="http://www.ncbi.nlm.nih.gov/pubmed/34704955" } @Article{info:doi/10.2196/28723, author="Belli, M. Hayley and Troxel, B. Andrea and Blecker, B. 
Saul and Anderman, Judd and Wong, Christina and Martinez, R. Tiffany and Mann, M. Devin", title="A Behavioral Economics--Electronic Health Record Module to Promote Appropriate Diabetes Management in Older Adults: Protocol for a Pragmatic Cluster Randomized Controlled Trial", journal="JMIR Res Protoc", year="2021", month="Oct", day="27", volume="10", number="10", pages="e28723", keywords="diabetes", keywords="behavioral economics", keywords="electronic health records", keywords="clinical decision support", keywords="randomized controlled trial", keywords="pragmatic", abstract="Background: The integration of behavioral economics (BE) principles and electronic health records (EHRs) using clinical decision support (CDS) tools is a novel approach to improving health outcomes. Meanwhile, the American Geriatrics Society has created the Choosing Wisely (CW) initiative to promote less aggressive glycemic targets and reduction in pharmacologic therapy in older adults with type 2 diabetes mellitus. To date, few studies have shown the effectiveness of combined BE and EHR approaches for managing chronic conditions, and none have addressed guideline-driven deprescribing specifically in type 2 diabetes. We previously conducted a pilot study aimed at promoting appropriate CW guideline adherence using BE nudges and EHRs embedded within CDS tools at 5 clinics within the New York University Langone Health (NYULH) system. The BE-EHR module intervention was tested for usability, adoption, and early effectiveness. Preliminary results suggested a modest improvement of 5.1\% in CW compliance. Objective: This paper presents the protocol for a study that will investigate the effectiveness of a BE-EHR module intervention that leverages BE nudges with EHR technology and CDS tools to reduce overtreatment of type 2 diabetes in adults aged 76 years and older, per the CW guideline. Methods: A pragmatic, investigator-blind, cluster randomized controlled trial was designed to evaluate the BE-EHR module. A total of 66 NYULH clinics will be randomized 1:1 to receive for 18 months either (1) a 6-component BE-EHR module intervention + standard care within the NYULH EHR, or (2) standard care only. The intervention will be administered to clinicians during any patient encounter (eg, in person, telemedicine, medication refill, etc). The primary outcome will be patient-level CW compliance. Secondary outcomes will measure the frequency of intervention component firings within the NYULH EHR, and provider utilization and interaction with the BE-EHR module components. Results: Study recruitment commenced on December 7, 2020, with the activation of all 6 BE-EHR components in the NYULH EHR. Conclusions: This study will test the effectiveness of a previously developed, iteratively refined, user-tested, and pilot-tested BE-EHR module aimed at providing appropriate diabetes care to elderly adults, compared to usual care via a cluster randomized controlled trial. This innovative research will be the first pragmatic randomized controlled trial to use BE principles embedded within the EHR and delivered using CDS tools to specifically promote CW guideline adherence in type 2 diabetes. The study will also collect valuable information on clinician workflow and interaction with the BE-EHR module, guiding future research in optimizing the timely delivery of BE nudges within CDS tools. This work will address the effectiveness of BE-inspired interventions in diabetes and chronic disease management. 
Trial Registration: ClinicalTrials.gov NCT04181307; https://clinicaltrials.gov/ct2/show/NCT04181307 International Registered Report Identifier (IRRID): DERR1-10.2196/28723 ", doi="10.2196/28723", url="https://www.researchprotocols.org/2021/10/e28723", url="http://www.ncbi.nlm.nih.gov/pubmed/34704959" } @Article{info:doi/10.2196/28618, author="Reese, J. Thomas and Del Fiol, Guilherme and Morgan, Keaton and Hurwitz, T. Jason and Kawamoto, Kensaku and Gomez-Lumbreras, Ainhoa and Brown, L. Mary and Thiess, Henrik and Vazquez, R. Sara and Nelson, D. Scott and Boyce, Richard and Malone, Daniel", title="A Shared Decision-making Tool for Drug Interactions Between Warfarin and Nonsteroidal Anti-inflammatory Drugs: Design and Usability Study", journal="JMIR Hum Factors", year="2021", month="Oct", day="26", volume="8", number="4", pages="e28618", keywords="shared decision-making", keywords="user-centered design", keywords="drug interaction", keywords="clinical decision support", abstract="Background: Exposure to life-threatening drug-drug interactions (DDIs) occurs despite the widespread use of clinical decision support. The DDI between warfarin and nonsteroidal anti-inflammatory drugs is common and potentially life-threatening. Patients can play a substantial role in preventing harm from DDIs; however, the current model for DDI decision-making is clinician centric. Objective: This study aims to design and study the usability of DDInteract, a tool to support shared decision-making (SDM) between a patient and provider for the DDI between warfarin and nonsteroidal anti-inflammatory drugs. Methods: We used an SDM framework and user-centered design methods to guide the design and usability of DDInteract---an SDM electronic health record app to prevent harm from clinically significant DDIs. The design involved iterative prototypes, qualitative feedback from stakeholders, and a heuristic evaluation. The usability evaluation included patients and clinicians. Patients participated in a simulated SDM discussion using clinical vignettes. Clinicians were asked to complete eight tasks using DDInteract and to assess the tool using a survey adapted from the System Usability Scale. Results: The designed DDInteract prototype includes the following features: a patient-specific risk profile, dynamic risk icon array, patient education section, and treatment decision tree. A total of 4 patients and 11 clinicians participated in the usability study. After an SDM session where patients and clinicians review the tool concurrently, patients generally favored pain treatments with less risk of gastrointestinal bleeding. Clinicians successfully completed the tasks with a mean of 144 (SD 74) seconds and rated the usability of DDInteract as 4.32 (SD 0.52) of 5. Conclusions: This study expands the use of SDM to DDIs. The next steps are to determine if DDInteract can improve shared decision-making quality and to implement it across health systems using interoperable technology. ", doi="10.2196/28618", url="https://humanfactors.jmir.org/2021/4/e28618", url="http://www.ncbi.nlm.nih.gov/pubmed/34698649" } @Article{info:doi/10.2196/31616, author="Yung, Alan and Kay, Judy and Beale, Philip and Gibson, A. 
Kathryn and Shaw, Tim", title="Computer-Based Decision Tools for Shared Therapeutic Decision-making in Oncology: Systematic Review", journal="JMIR Cancer", year="2021", month="Oct", day="26", volume="7", number="4", pages="e31616", keywords="oncology", keywords="cancer", keywords="computer-based", keywords="decision support", keywords="decision-making", keywords="system", keywords="tool", keywords="machine learning", keywords="artificial intelligence", keywords="uncertainty", keywords="shared decision-making", abstract="Background: Therapeutic decision-making in oncology is a complex process because physicians must consider many forms of medical data and protocols. Another challenge for physicians is to clearly communicate their decision-making process to patients to ensure informed consent. Computer-based decision tools have the potential to play a valuable role in supporting this process. Objective: This systematic review aims to investigate the extent to which computer-based decision tools have been successfully adopted in oncology consultations to improve patient-physician joint therapeutic decision-making. Methods: This review was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 checklist and guidelines. A literature search was conducted on February 4, 2021, across the Cochrane Database of Systematic Reviews (from 2005 to January 28, 2021), the Cochrane Central Register of Controlled Trials (December 2020), MEDLINE (from 1946 to February 4, 2021), Embase (from 1947 to February 4, 2021), Web of Science (from 1900 to 2021), Scopus (from 1969 to 2021), and PubMed (from 1991 to 2021). We used a snowball approach to identify additional studies by searching the reference lists of the studies included for full-text review. Additional supplementary searches of relevant journals and gray literature websites were conducted. The reviewers screened the articles eligible for review for quality and inclusion before data extraction. Results: There are relatively few studies looking at the use of computer-based decision tools in oncology consultations. Of the 4431 unique articles obtained from the searches, only 10 (0.22\%) satisfied the selection criteria. From the 10 selected studies, 8 computer-based decision tools were identified. Of the 10 studies, 6 (60\%) were conducted in the United States. Communication and information-sharing were improved between physicians and patients. However, physicians did not change their habits to take advantage of computer-assisted decision-making tools or the information they provide. On average, the use of these computer-based decision tools added approximately 5 minutes to the total length of consultations. In addition, some physicians felt that the technology increased patients' anxiety. Conclusions: Of the 10 selected studies, 6 (60\%) demonstrated positive outcomes, 1 (10\%) showed negative results, and 3 (30\%) were neutral. Adoption of computer-based decision tools during oncology consultations continues to be low. This review shows that information-sharing and communication between physicians and patients can be improved with the assistance of technology. However, the lack of integration with electronic health records is a barrier. This review provides key requirements for enhancing the chance of success of future computer-based decision tools. 
However, it does not show the effects of health care policies, regulations, or business administration on physicians' propensity to adopt the technology. Nevertheless, it is important that future research address the influence of these higher-level factors as well. Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42021226087; https://www.crd.york.ac.uk/prospero/display\_record.php?ID=CRD42021226087 ", doi="10.2196/31616", url="https://cancer.jmir.org/2021/4/e31616", url="http://www.ncbi.nlm.nih.gov/pubmed/34544680" } @Article{info:doi/10.2196/28235, author="Rizk, Elsie and Swan, T. Joshua", title="Development, Validation, and Assessment of Clinical Impact of Real-time Alerts to Detect Inpatient As-Needed Opioid Orders With Duplicate Indications: Prospective Study", journal="J Med Internet Res", year="2021", month="Oct", day="25", volume="23", number="10", pages="e28235", keywords="opioid stewardship", keywords="pain", keywords="as-needed opioids", keywords="duplicate orders", keywords="automated alerts", abstract="Background: As-needed (PRN) opioid orders with duplicate indications can lead to medication errors and opioid-related adverse drug events. Objective: The objective of our study was to build and validate real-time alerts that detect duplicate PRN opioid orders and assist clinicians in optimizing the safety of opioid orders. Methods: This single-center, prospective study used an iterative, 3-step process to refine alert performance by advancing from small sample evaluations of positive predictive values (PPVs) (step 1) through intensive evaluations of accuracy (step 2) to evaluations of clinical impact (step 3). Validation cohorts were randomly sampled from eligible patients for each step. Results: During step 1, the PPV was 100\% (one-sided, 97.5\% CI 70\%-100\%) for moderate and severe pain alerts. During step 2, duplication of 1 or more PRN opioid orders was identified for 17\% (34/201; 95\% CI, 12\%-23\%) of patients during chart review. This bundle of alerts showed 94\% sensitivity (95\% CI 80\%-99\%) and 96\% specificity (95\% CI 92\%-98\%) for identifying patients who had duplicate PRN opioid orders. During step 3, at least 1 intervention was made to the medication profile for 77\% (46/60; 95\% CI 64\%-87\%) of patients, and at least 1 inappropriate duplicate PRN opioid order was discontinued for 53\% (32/60; 95\% CI 40\%-66\%) of patients. Conclusions: The bundle of alerts developed in this study was validated against chart review by a pharmacist and identified patients who benefited from medication safety interventions to optimize PRN opioid orders. ", doi="10.2196/28235", url="https://www.jmir.org/2021/10/e28235", url="http://www.ncbi.nlm.nih.gov/pubmed/34694235" } @Article{info:doi/10.2196/31862, author="Popescu, Christina and Golden, Grace and Benrimoh, David and Tanguay-Sela, Myriam and Slowey, Dominique and Lundrigan, Eryn and Williams, J{\'e}r{\^o}me and Desormeau, Bennet and Kardani, Divyesh and Perez, Tamara and Rollins, Colleen and Israel, Sonia and Perlman, Kelly and Armstrong, Caitrin and Baxter, Jacob and Whitmore, Kate and Fradette, Marie-Jeanne and Felcarek-Hope, Kaelan and Soufi, Ghassen and Fratila, Robert and Mehltretter, Joseph and Looper, Karl and Steiner, Warren and Rej, Soham and Karp, F. Jordan and Heller, Katherine and Parikh, V. 
Sagar and McGuire-Snieckus, Rebecca and Ferrari, Manuela and Margolese, Howard and Turecki, Gustavo", title="Evaluating the Clinical Feasibility of an Artificial Intelligence--Powered, Web-Based Clinical Decision Support System for the Treatment of Depression in Adults: Longitudinal Feasibility Study", journal="JMIR Form Res", year="2021", month="Oct", day="25", volume="5", number="10", pages="e31862", keywords="clinical decision support system", keywords="major depressive disorder", keywords="artificial intelligence", keywords="feasibility", keywords="usability", keywords="mobile phone", abstract="Background: Approximately two-thirds of patients with major depressive disorder do not achieve remission during their first treatment. There has been increasing interest in the use of digital, artificial intelligence--powered clinical decision support systems (CDSSs) to assist physicians in their treatment selection and management, improving the personalization and use of best practices such as measurement-based care. Previous literature shows that for digital mental health tools to be successful, the tool must be easy for patients and physicians to use and feasible within existing clinical workflows. Objective: This study aims to examine the feasibility of an artificial intelligence--powered CDSS, which combines the operationalized 2016 Canadian Network for Mood and Anxiety Treatments guidelines with a neural network--based individualized treatment remission prediction. Methods: Owing to the COVID-19 pandemic, the study was adapted to be completed entirely remotely. A total of 7 physicians recruited outpatients diagnosed with major depressive disorder according to the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition criteria. Patients completed a minimum of one visit without the CDSS (baseline) and 2 subsequent visits where the CDSS was used by the physician (visits 1 and 2). The primary outcome of interest was change in appointment length after the introduction of the CDSS as a proxy for feasibility. Feasibility and acceptability data were collected through self-report questionnaires and semistructured interviews. Results: Data were collected between January and November 2020. A total of 17 patients were enrolled in the study; of the 17 patients, 14 (82\%) completed the study. There was no significant difference in appointment length between visits (introduction of the tool did not increase appointment length; F2,24=0.805; mean squared error 58.08; P=.46). In total, 92\% (12/13) of patients and 71\% (5/7) of physicians felt that the tool was easy to use; 62\% (8/13) of patients and 71\% (5/7) of physicians rated that they trusted the CDSS. Of the 13 patients, 6 (46\%) felt that the patient-clinician relationship significantly or somewhat improved, whereas 7 (54\%) felt that it did not change. Conclusions: Our findings confirm that the integration of the tool does not significantly increase appointment length and suggest that the CDSS is easy to use and may have positive effects on the patient-physician relationship for some patients. The CDSS is feasible and ready for effectiveness studies. 
Trial Registration: ClinicalTrials.gov NCT04061642; http://clinicaltrials.gov/ct2/show/NCT04061642 ", doi="10.2196/31862", url="https://formative.jmir.org/2021/10/e31862", url="http://www.ncbi.nlm.nih.gov/pubmed/34694234" } @Article{info:doi/10.2196/28039, author="Yan, Jianjun and Cai, Xianglei and Chen, Songye and Guo, Rui and Yan, Haixia and Wang, Yiqin", title="Ensemble Learning-Based Pulse Signal Recognition: Classification Model Development Study", journal="JMIR Med Inform", year="2021", month="Oct", day="21", volume="9", number="10", pages="e28039", keywords="wrist pulse", keywords="ensemble learning", keywords="support vector machine", keywords="deep convolutional neural network", keywords="pulse signal", keywords="machine learning", keywords="traditional Chinese medicine", keywords="pulse classification", keywords="pulse analysis", keywords="fully connected neural network", keywords="synthetic minority oversampling technique", keywords="feature extraction", abstract="Background: In pulse signal analysis and identification, time domain and time frequency domain analysis methods can obtain interpretable structured data and build classification models using traditional machine learning methods. Unstructured data, such as pulse signals, contain rich information about the state of the cardiovascular system, and local features of unstructured data can be extracted and classified using deep learning. Objective: The objective of this paper was to comprehensively use machine learning and deep learning classification methods to fully exploit the information about pulse signals. Methods: Structured data were obtained by using time domain and time frequency domain analysis methods. A classification model was built using a support vector machine (SVM), a deep convolutional neural network (DCNN) kernel was used to extract local features of the unstructured data, and the stacking method was used to fuse the above classification results for decision making. Results: The highest average accuracy of 0.7914 was obtained using only a single classifier, while the average accuracy obtained using the ensemble learning approach was 0.8330. Conclusions: Ensemble learning can effectively use information from structured and unstructured data to improve classification accuracy through decision-level fusion. This study provides a new idea and method for pulse signal classification, which is of practical value for pulse diagnosis objectification. ", doi="10.2196/28039", url="https://medinform.jmir.org/2021/10/e28039", url="http://www.ncbi.nlm.nih.gov/pubmed/34673537" } @Article{info:doi/10.2196/33192, author="Li, Mengyang and Cai, Hailing and Nan, Shan and Li, Jialin and Lu, Xudong and Duan, Huilong", title="A Patient-Screening Tool for Clinical Research Based on Electronic Health Records Using OpenEHR: Development Study", journal="JMIR Med Inform", year="2021", month="Oct", day="21", volume="9", number="10", pages="e33192", keywords="openEHR", keywords="patient screening", keywords="electronic health record", keywords="clinical research", abstract="Background: The widespread adoption of electronic health records (EHRs) has facilitated the secondary use of EHR data for clinical research. However, screening eligible patients from EHRs is a challenging task. The concepts in eligibility criteria are not completely matched with EHRs, especially derived concepts. 
The lack of high-level expressiveness in Structured Query Language (SQL) makes them difficult and time-consuming to express. The openEHR Expression Language (EL), a domain-specific language based on clinical information models, shows promise for representing complex eligibility criteria. Objective: The study aims to develop a patient-screening tool based on EHRs for clinical research using openEHR to solve concept mismatch and improve query performance. Methods: A patient-screening tool based on EHRs using openEHR was proposed. It uses the advantages of information models and EL in openEHR to provide high-level expressions and improve query performance. First, openEHR archetypes and templates were chosen to define concepts called simple concepts directly from EHRs. Second, openEHR EL was used to generate derived concepts by combining simple concepts and constraints. Third, a hierarchical index corresponding to archetypes in Elasticsearch (ES) was generated to improve query performance for subqueries and join queries related to the derived concepts. Finally, we realized a patient-screening tool for clinical research. Results: In total, 500 sentences randomly selected from 4691 eligibility criteria in 389 clinical trials on stroke from the Chinese Clinical Trial Registry (ChiCTR) were evaluated. An openEHR-based clinical data repository (CDR) in a grade A tertiary hospital in China served as the experimental environment. Based on these, 589 medical concepts were found in the 500 sentences. Of them, 513 (87.1\%) concepts could be represented, while the others could not be, because of a lack of information models and coarse-grained requirements. In addition, our case study on 6 queries demonstrated that our tool achieved better query performance in 4 of the 6 cases (66.67\%). Conclusions: We developed a patient-screening tool using openEHR. It not only helps solve concept mismatch but also improves query performance to reduce the burden on researchers. In addition, we demonstrated a promising solution for secondary use of EHR data using openEHR, which can be referenced by other researchers. ", doi="10.2196/33192", url="https://medinform.jmir.org/2021/10/e33192", url="http://www.ncbi.nlm.nih.gov/pubmed/34673526" } @Article{info:doi/10.2196/26486, author="Goh, Huat Kim and Wang, Le and Yeow, Kwang Adrian Yong and Ding, Yoong Yew and Au, Yi Lydia Shu and Poh, Niang Hermione Mei and Li, Ke and Yeow, Lin Joannas Jie and Tan, Heng Gamaliel Yu", title="Prediction of Readmission in Geriatric Patients From Clinical Notes: Retrospective Text Mining Study", journal="J Med Internet Res", year="2021", month="Oct", day="19", volume="23", number="10", pages="e26486", keywords="geriatrics", keywords="readmission risk", keywords="artificial intelligence", keywords="text mining", keywords="psychosocial factors", abstract="Background: Prior literature suggests that psychosocial factors adversely impact health and health care utilization outcomes. However, psychosocial factors are typically not captured by the structured data in electronic medical records (EMRs) but are rather recorded as free text in different types of clinical notes. Objective: We here propose a text-mining approach to analyze EMRs to identify older adults with key psychosocial factors that predict adverse health care utilization outcomes, measured by 30-day readmission. 
The psychological factors were appended to the LACE (Length of stay, Acuity of the admission, Comorbidity of the patient, and Emergency department use) Index for Readmission to improve the prediction of readmission risk. Methods: We performed a retrospective analysis using EMR notes of 43,216 hospitalization encounters in a hospital from January 1, 2017 to February 28, 2019. The mean age of the cohort was 67.51 years (SD 15.87), the mean length of stay was 5.57 days (SD 10.41), and the mean intensive care unit stay was 5\% (SD 22\%). We employed text-mining techniques to extract psychosocial topics that are representative of these patients and tested the utility of these topics in predicting 30-day hospital readmission beyond the predictive value of the LACE Index for Readmission. Results: The added text-mined factors improved the area under the receiver operating characteristic curve of the readmission prediction by 8.46\% for geriatric patients, 6.99\% for the general hospital population, and 6.64\% for frequent admitters. Medical social workers and case managers captured more of the psychosocial text topics than physicians. Conclusions: The results of this study demonstrate the feasibility of extracting psychosocial factors from EMR clinical notes and the value of these notes in improving readmission risk prediction. Psychosocial profiles of patients can be curated and quantified from text mining clinical notes and these profiles can be successfully applied to artificial intelligence models to improve readmission risk prediction. ", doi="10.2196/26486", url="https://www.jmir.org/2021/10/e26486", url="http://www.ncbi.nlm.nih.gov/pubmed/34665149" } @Article{info:doi/10.2196/32303, author="Luu, S. Hung and Filkins, M. Laura and Park, Y. Jason and Rakheja, Dinesh and Tweed, Jefferson and Menzies, Christopher and Wang, J. Vincent and Mittal, Vineeta and Lehmann, U. Christoph and Sebert, E. Michael", title="Harnessing the Electronic Health Record and Computerized Provider Order Entry Data for Resource Management During the COVID-19 Pandemic: Development of a Decision Tree", journal="JMIR Med Inform", year="2021", month="Oct", day="18", volume="9", number="10", pages="e32303", keywords="COVID-19", keywords="computerized provider order entry", keywords="electronic health record", keywords="resource utilization", keywords="personal protective equipment", keywords="SARS-CoV-2 testing", keywords="clinical decision support", abstract="Background: The COVID-19 pandemic has resulted in shortages of diagnostic tests, personal protective equipment, hospital beds, and other critical resources. Objective: We sought to improve the management of scarce resources by leveraging electronic health record (EHR) functionality, computerized provider order entry, clinical decision support (CDS), and data analytics. Methods: Due to the complex eligibility criteria for COVID-19 tests and the EHR implementation--related challenges of ordering these tests, care providers have faced obstacles in selecting the appropriate test modality. As test choice is dependent upon specific patient criteria, we built a decision tree within the EHR to automate the test selection process by using a branching series of questions that linked clinical criteria to the appropriate SARS-CoV-2 test and triggered an EHR flag for patients who met our institutional persons under investigation criteria. 
Results: The percentage of tests that had to be canceled and reordered due to errors in selecting the correct testing modality was 3.8\% (23/608) before CDS implementation and 1\% (262/26,643) after CDS implementation (P<.001). Patients for whom multiple tests were ordered during a 24-hour period accounted for 0.8\% (5/608) and 0.3\% (76/26,643) of pre- and post-CDS implementation orders, respectively (P=.03). Nasopharyngeal molecular assay results were positive in 3.4\% (826/24,170) of patients who were classified as asymptomatic and 10.9\% (1421/13,074) of symptomatic patients (P<.001). Positive tests were more frequent among asymptomatic patients with a history of exposure to COVID-19 (36/283, 12.7\%) than among asymptomatic patients without such a history (790/23,887, 3.3\%; P<.001). Conclusions: The leveraging of EHRs and our CDS algorithm resulted in a decreased incidence of order entry errors and the appropriate flagging of persons under investigation. These interventions optimized reagent and personal protective equipment usage. Data regarding symptoms and COVID-19 exposure status that were collected by using the decision tree correlated with the likelihood of positive test results, suggesting that clinicians appropriately used the questions in the decision tree algorithm. ", doi="10.2196/32303", url="https://medinform.jmir.org/2021/10/e32303", url="http://www.ncbi.nlm.nih.gov/pubmed/34546942" } @Article{info:doi/10.2196/17472, author="Follmann, Andreas and Ruhl, Alexander and G{\"o}sch, Michael and Felzen, Marc and Rossaint, Rolf and Czaplik, Michael", title="Augmented Reality for Guideline Presentation in Medicine: Randomized Crossover Simulation Trial for Technically Assisted Decision-making", journal="JMIR Mhealth Uhealth", year="2021", month="Oct", day="18", volume="9", number="10", pages="e17472", keywords="augmented reality", keywords="smart glasses", keywords="wearables", keywords="guideline presentation", keywords="decision support", keywords="triage", abstract="Background: Guidelines provide instructions for diagnostics and therapy in modern medicine. Various mobile devices are used to present the potentially complex decision trees. An example of time-critical decisions is triage in case of a mass casualty incident. Objective: In this randomized controlled crossover study, the potential of augmented reality for guideline presentation was evaluated and compared with the guideline presentation provided on a tablet PC as a conventional device. Methods: A specific Android app was designed for use with smart glasses and a tablet PC for the presentation of a triage algorithm as an example of a complex guideline. Forty volunteers simulated a triage based on 30 fictional patient descriptions, each with technical support from smart glasses and a tablet PC in a crossover trial design. The time to come to a decision and the accuracy were recorded and compared between both devices. Results: A total of 2400 assessments were performed by the 40 volunteers. A significantly faster time to triage was achieved in total with the tablet PC (median 12.8 seconds, IQR 9.4-17.7; 95\% CI 14.1-14.9) compared to triage with smart glasses (median 17.5 seconds, IQR 13.2-22.8, 95\% CI 18.4-19.2; P=.001). 
Considering the difference in triage time between the devices, the additional time needed with the smart glasses decreased significantly over the course of the assessments: 21.5 seconds (IQR 16.5-27.3, 95\% CI 21.6-23.2) in the first run, 17.4 seconds (IQR 13-22.4, 95\% CI 17.6-18.9) in the second run, and 14.9 seconds (IQR 11.7-18.6, 95\% CI 15.2-16.3) in the third run (P=.001). With regard to the accuracy of the guideline decisions, there was no significant difference between the two devices. Conclusions: The presentation of a guideline on a tablet PC as well as through augmented reality achieved good results. The implementation with smart glasses took more time owing to their more complex operating concept but could be accelerated in the course of the study after adaptation. Especially in a non--time-critical working area where hands-free interfaces are useful, a guideline presentation with augmented reality can be of great use during clinical management. ", doi="10.2196/17472", url="https://mhealth.jmir.org/2021/10/e17472", url="http://www.ncbi.nlm.nih.gov/pubmed/34661548" } @Article{info:doi/10.2196/29392, author="Doyle, Riccardo", title="Machine Learning--Based Prediction of COVID-19 Mortality With Limited Attributes to Expedite Patient Prognosis and Triage: Retrospective Observational Study", journal="JMIRx Med", year="2021", month="Oct", day="15", volume="2", number="4", pages="e29392", keywords="COVID-19", keywords="coronavirus", keywords="medical informatics", keywords="machine learning", keywords="artificial intelligence", keywords="dimensionality reduction", keywords="automation", keywords="model development", keywords="prediction", keywords="hospital", keywords="resource management", keywords="mortality", keywords="prognosis", keywords="triage", keywords="comorbidities", keywords="public data", keywords="epidemiology", keywords="pre-existing conditions", abstract="Background: The onset and development of the COVID-19 pandemic have placed pressure on hospital resources and staff worldwide. The integration of more streamlined predictive modeling in prognosis and triage--related decision-making can partly ease this pressure. Objective: The objective of this study is to assess the performance impact of dimensionality reduction on COVID-19 mortality prediction models, demonstrating the high impact of a limited number of features to limit the need for complex variable gathering before reaching meaningful risk labelling in clinical settings. Methods: Standard machine learning classifiers were employed to predict an outcome of either death or recovery using 25 patient-level variables, spanning symptoms, comorbidities, and demographic information, from a geographically diverse sample representing 17 countries. The effects of feature reduction on the data were tested by running classifiers on a high-quality data set of 212 patients with populated entries for all 25 available features. The full data set was compared to two reduced variations with 7 features and 1 feature, respectively, extracted using univariate mutual information and chi-square testing. Classifier performance on each data set was then assessed on the basis of accuracy, sensitivity, specificity, and receiver operating characteristic--derived area under the curve metrics to quantify benefit or loss from reduction. 
Results: The performance of the classifiers on the 212-patient sample resulted in strong mortality detection, with the highest performing model achieving specificity of 90.7\% (95\% CI 89.1\%-92.3\%) and sensitivity of 92.0\% (95\% CI 91.0\%-92.9\%). Dimensionality reduction provided strong benefits for performance. The baseline accuracy of a random forest classifier increased from 89.2\% (95\% CI 88.0\%-90.4\%) to 92.5\% (95\% CI 91.9\%-93.0\%) when training on 7 chi-square--extracted features and to 90.8\% (95\% CI 89.8\%-91.7\%) when training on 7 mutual information--extracted features. Reduction impact on a separate logistic classifier was mixed; however, when present, losses were marginal compared to the extent of feature reduction, altogether showing that reduction either improves performance or can reduce the variable-sourcing burden at hospital admission with little performance loss. Extreme feature reduction to a single most salient feature, often age, demonstrated large standalone explanatory power, with the best-performing model achieving an accuracy of 81.6\% (95\% CI 81.1\%-82.1\%); this demonstrates the relatively marginal improvement that additional variables bring to the tested models. Conclusions: Predictive statistical models have promising performance in early prediction of death among patients with COVID-19. Strong dimensionality reduction was shown to further improve baseline performance on selected classifiers and only marginally reduce it in others, highlighting the importance of feature reduction in future model construction and the feasibility of deprioritizing large, hard-to-source, and nonessential feature sets in real world settings. ", doi="10.2196/29392", url="https://med.jmirx.org/2021/4/e29392", url="http://www.ncbi.nlm.nih.gov/pubmed/34843609" } @Article{info:doi/10.2196/29301, author="Pumplun, Luisa and Fecho, Mariska and Wahl, Nihal and Peters, Felix and Buxmann, Peter", title="Adoption of Machine Learning Systems for Medical Diagnostics in Clinics: Qualitative Interview Study", journal="J Med Internet Res", year="2021", month="Oct", day="15", volume="23", number="10", pages="e29301", keywords="machine learning", keywords="clinics", keywords="diagnostics", keywords="adoption", keywords="maturity model", abstract="Background: Recently, machine learning (ML) has been transforming our daily lives by enabling intelligent voice assistants, personalized support for purchase decisions, and efficient credit card fraud detection. In addition to its everyday applications, ML holds the potential to improve medicine as well, especially with regard to diagnostics in clinics. In a world characterized by population growth, demographic change, and the global COVID-19 pandemic, ML systems offer the opportunity to make diagnostics more effective and efficient, leading to a high interest of clinics in such systems. However, despite the high potential of ML, only a few ML systems have been deployed in clinics yet, as their adoption process differs significantly from the integration of prior health information technologies given the specific characteristics of ML. Objective: This study aims to explore the factors that influence the adoption process of ML systems for medical diagnostics in clinics to foster the adoption of these systems in clinics. 
Furthermore, this study provides insight into how these factors can be used to determine the ML maturity score of clinics, which can be applied by practitioners to measure the clinic status quo in the adoption process of ML systems. Methods: To gain more insight into the adoption process of ML systems for medical diagnostics in clinics, we conducted a qualitative study by interviewing 22 selected medical experts from clinics and their suppliers with profound knowledge in the field of ML. We used a semistructured interview guideline, asked open-ended questions, and transcribed the interviews verbatim. To analyze the transcripts, we first used a content analysis approach based on the health care--specific framework of nonadoption, abandonment, scale-up, spread, and sustainability. Then, we drew on the results of the content analysis to create a maturity model for ML adoption in clinics according to an established development process. Results: With the help of the interviews, we were able to identify 13 ML-specific factors that influence the adoption process of ML systems in clinics. We categorized these factors according to 7 domains that form a holistic ML adoption framework for clinics. In addition, we created an applicable maturity model that could help practitioners assess their current state in the ML adoption process. Conclusions: Many clinics still face major problems in adopting ML systems for medical diagnostics; thus, they do not benefit from the potential of these systems. Therefore, both the ML adoption framework and the maturity model for ML systems in clinics can not only guide future research that seeks to explore the promises and challenges associated with ML systems in a medical setting but also be a practical reference point for clinicians. ", doi="10.2196/29301", url="https://www.jmir.org/2021/10/e29301", url="http://www.ncbi.nlm.nih.gov/pubmed/34652275" } @Article{info:doi/10.2196/32771, author="Shin, Jeong Seo and Park, Jungchan and Lee, Seung-Hwa and Yang, Kwangmo and Park, Woong Rae", title="Predictability of Mortality in Patients With Myocardial Injury After Noncardiac Surgery Based on Perioperative Factors via Machine Learning: Retrospective Study", journal="JMIR Med Inform", year="2021", month="Oct", day="14", volume="9", number="10", pages="e32771", keywords="myocardial injury after noncardiac surgery", keywords="high-sensitivity cardiac troponin", keywords="machine learning", keywords="extreme gradient boosting", abstract="Background: Myocardial injury after noncardiac surgery (MINS) is associated with increased postoperative mortality, but the relevant perioperative factors that contribute to the mortality of patients with MINS have not been fully evaluated. Objective: To establish a comprehensive body of knowledge relating to patients with MINS, we researched the best performing predictive model based on machine learning algorithms. Methods: Using clinical data from 7629 patients with MINS from the clinical data warehouse, we evaluated 8 machine learning algorithms for accuracy, precision, recall, F1 score, area under the receiver operating characteristic (AUROC) curve, and area under the precision-recall curve to investigate the best model for predicting mortality. Feature importance and Shapley Additive Explanations values were analyzed to explain the role of each clinical factor in patients with MINS. Results: Extreme gradient boosting outperformed the other models. The model showed an AUROC of 0.923 (95\% CI 0.916-0.930). 
The AUROC of the model did not decrease in the test data set (0.894, 95\% CI 0.86-0.922; P=.06). Antiplatelet drug prescription, elevated C-reactive protein level, and beta blocker prescription were associated with reduced 30-day mortality. Conclusions: Predicting the mortality of patients with MINS was shown to be feasible using machine learning. By analyzing the impact of predictors, markers that should be cautiously monitored by clinicians may be identified. ", doi="10.2196/32771", url="https://medinform.jmir.org/2021/10/e32771", url="http://www.ncbi.nlm.nih.gov/pubmed/34647900" } @Article{info:doi/10.2196/30824, author="Gwon, Hansle and Ahn, Imjin and Kim, Yunha and Kang, Jun Hee and Seo, Hyeram and Cho, Na Ha and Choi, Heejung and Jun, Joon Tae and Kim, Young-Hak", title="Self--Training With Quantile Errors for Multivariate Missing Data Imputation for Regression Problems in Electronic Medical Records: Algorithm Development Study", journal="JMIR Public Health Surveill", year="2021", month="Oct", day="13", volume="7", number="10", pages="e30824", keywords="self-training", keywords="artificial intelligence", keywords="electronic medical records", keywords="imputation", abstract="Background: When using machine learning in the real world, the missing value problem is the first problem encountered. Methods to impute missing values include statistical methods, such as mean imputation, expectation-maximization, and multiple imputation by chained equations (MICE), as well as machine learning methods, such as multilayer perceptron, k-nearest neighbor, and decision tree. Objective: The objective of this study was to impute numeric medical data such as physical data and laboratory data. We aimed to effectively impute data using a progressive method called self-training in the medical field, where training data are scarce. Methods: In this paper, we propose a self-training method that gradually increases the available data. Models trained with complete data predict the missing values in incomplete data. Among the incomplete data, the data in which the missing value is validly predicted are incorporated into the complete data. Using the predicted value as the actual value is called pseudolabeling. This process is repeated until the condition is satisfied. The most important part of this process is how to evaluate the accuracy of pseudolabels. They can be evaluated by observing the effect of the pseudolabeled data on the performance of the model. Results: In self-training using random forest (RF), the mean squared error was up to 12\% lower than with pure RF, and the Pearson correlation coefficient was 0.1\% higher. This difference was confirmed statistically. In the Friedman test performed on MICE and RF, self-training showed a P value between .003 and .02. A Wilcoxon signed-rank test performed on mean imputation showed the lowest possible P value, 3.05e-5, in all situations. Conclusions: Self-training showed significant results when comparing the predicted values with the actual values, but it needs to be verified in an actual machine learning system. In addition, self-training has the potential to improve performance according to the pseudolabel evaluation method, which will be the main subject of our future research. 
", doi="10.2196/30824", url="https://publichealth.jmir.org/2021/10/e30824", url="http://www.ncbi.nlm.nih.gov/pubmed/34643539" } @Article{info:doi/10.2196/29174, author="Gaudet-Blavignac, Christophe and Rudaz, Andrea and Lovis, Christian", title="Building a Shared, Scalable, and Sustainable Source for the Problem-Oriented Medical Record: Developmental Study", journal="JMIR Med Inform", year="2021", month="Oct", day="13", volume="9", number="10", pages="e29174", keywords="medical records", keywords="problem-oriented", keywords="electronic health records", keywords="semantics", abstract="Background: Since the creation of the problem-oriented medical record, the building of problem lists has been the focus of many studies. To date, this issue is not well resolved, and building an appropriate contextualized problem list is still a challenge. Objective: This paper aims to present the process of building a shared multipurpose common problem list at the Geneva University Hospitals. This list aims to bridge the gap between clinicians' language expressed in free text and secondary uses requiring structured information. Methods: We focused on the needs of clinicians by building a list of uniquely identified expressions to support their daily activities. In the second stage, these expressions were connected to additional information to build a complex graph of information. A list of 45,946 expressions manually extracted from clinical documents was manually curated and encoded in multiple semantic dimensions, such as International Classification of Diseases, 10th revision; International Classification of Primary Care 2nd edition; Systematized Nomenclature of Medicine Clinical Terms; or dimensions dictated by specific usages, such as identifying expressions specific to a domain, a gender, or an intervention. The list was progressively deployed for clinicians with an iterative process of quality control, maintenance, and improvements, including the addition of new expressions or dimensions for specific needs. The problem management of the electronic health record allowed the measurement and correction of encoding based on real-world use. Results: The list was deployed in production in January 2017 and was regularly updated and deployed in new divisions of the hospital. Over 4 years, 684,102 problems were created using the list. The proportion of free-text entries decreased progressively from 37.47\% (8321/22,206) in December 2017 to 18.38\% (4547/24,738) in December 2020. In the last version of the list, over 14 dimensions were mapped to expressions, among which 5 were international classifications and 8 were other classifications for specific uses. The list became a central axis in the electronic health record, being used for many different purposes linked to care, such as surgical planning or emergency wards, or in research, for various predictions using machine learning techniques. Conclusions: This study breaks with common approaches primarily by focusing on real clinicians' language when expressing patients' problems and secondarily by mapping whatever is required, including controlled vocabularies to answer specific needs. This approach improves the quality of the expression of patients' problems while allowing the building of as many structured dimensions as needed to convey semantics according to specific contexts. 
The method is shown to be scalable, sustainable, and efficient at hiding the complexity of semantics or the burden of constraint-structured problem list entry for clinicians. Ongoing work is analyzing the impact of this approach on how clinicians express patients' problems. ", doi="10.2196/29174", url="https://medinform.jmir.org/2021/10/e29174", url="http://www.ncbi.nlm.nih.gov/pubmed/34643542" } @Article{info:doi/10.2196/29017, author="Meng, Weilin and Mosesso, M. Kelly and Lane, A. Kathleen and Roberts, R. Anna and Griffith, Ashley and Ou, Wanmei and Dexter, R. Paul", title="An Automated Line-of-Therapy Algorithm for Adults With Metastatic Non--Small Cell Lung Cancer: Validation Study Using Blinded Manual Chart Review", journal="JMIR Med Inform", year="2021", month="Oct", day="12", volume="9", number="10", pages="e29017", keywords="automated algorithm", keywords="line of therapy", keywords="longitudinal changes", keywords="manual chart review", keywords="non--small cell lung cancer", keywords="systemic anticancer therapy", abstract="Background: Extraction of line-of-therapy (LOT) information from electronic health record and claims data is essential for determining longitudinal changes in systemic anticancer therapy in real-world clinical settings. Objective: The aim of this retrospective cohort analysis is to validate and refine our previously described open-source LOT algorithm by comparing the output of the algorithm with results obtained through blinded manual chart review. Methods: We used structured electronic health record data and clinical documents to identify 500 adult patients treated for metastatic non--small cell lung cancer with systemic anticancer therapy from 2011 to mid-2018; we assigned patients to training (n=350) and test (n=150) cohorts, randomly divided proportional to the overall ratio of simple:complex cases (n=254:246). Simple cases were patients who received one LOT and no maintenance therapy; complex cases were patients who received more than one LOT and/or maintenance therapy. Algorithmic changes were performed using the training cohort data, after which the refined algorithm was evaluated against the test cohort. Results: For simple cases, 16 instances of discordance between the LOT algorithm and chart review prerefinement were reduced to 8 instances postrefinement; in the test cohort, there was no discordance between algorithm and chart review. For complex cases, algorithm refinement reduced the discordance from 68 to 62 instances, with 37 instances in the test cohort. The percentage agreement between LOT algorithm output and chart review for patients who received one LOT was 89\% prerefinement, 93\% postrefinement, and 93\% for the test cohort, whereas the likelihood of precise matching between algorithm output and chart review decreased with an increasing number of unique regimens. Several areas of discordance that arose from differing definitions of LOTs and maintenance therapy could not be objectively resolved because of a lack of precise definitions in the medical literature. Conclusions: Our findings identify common sources of discordance between the LOT algorithm and clinician documentation, providing the possibility of targeted algorithm refinement. 
", doi="10.2196/29017", url="https://medinform.jmir.org/2021/10/e29017", url="http://www.ncbi.nlm.nih.gov/pubmed/34636730" } @Article{info:doi/10.2196/27396, author="Berensp{\"o}hler, Sarah and Minnerup, Jens and Dugas, Martin and Varghese, Julian", title="Common Data Elements for Meaningful Stroke Documentation in Routine Care and Clinical Research: Retrospective Data Analysis", journal="JMIR Med Inform", year="2021", month="Oct", day="12", volume="9", number="10", pages="e27396", keywords="common data elements", keywords="stroke", keywords="documentation", abstract="Background: Medical information management for stroke patients is currently a very time-consuming endeavor. There are clear guidelines and procedures to treat patients having acute stroke, but it is not known how well these established practices are reflected in patient documentation. Objective: This study compares a variety of documentation processes regarding stroke. The main objective of this work is to provide an overview of the most commonly occurring medical concepts in stroke documentation and identify overlaps between different documentation contexts to allow for the definition of a core data set that could be used in potential data interfaces. Methods: Medical source documentation forms from different documentation contexts, including hospitals, clinical trials, registries, and international standards, regarding stroke treatment followed by rehabilitation were digitized in the operational data model. Each source data element was semantically annotated using the Unified Medical Language System. The concept codes were analyzed for semantic overlaps. A concept was considered common if it appeared in at least two documentation contexts. The resulting common concepts were extended with implementation details, including data types and permissible values based on frequent patterns of source data elements, using an established expert-based and semiautomatic approach. Results: In total, 3287 data elements were identified, and 1051 of these emerged as unique medical concepts. The 100 most frequent medical concepts cover 9.51\% (100/1051) of all concept occurrences in stroke documentation, and the 50 most frequent concepts cover 4.75\% (50/1051). A list of common data elements was implemented in different standardized machine-readable formats on a public metadata repository for interoperable reuse. Conclusions: Standardization of medical documentation is a prerequisite for data exchange as well as the transferability and reuse of data. In the long run, standardization would save time and money and extend the capabilities for which such data could be used. In the context of this work, a lack of standardization was observed regarding current information management. Free-form text fields and intricate questions complicate automated data access and transfer between institutions. This work also revealed the potential of a unified documentation process as a core data set of the 50 most frequent common data elements, accounting for 34\% of the documentation in medical information management. Such a data set offers a starting point for standardized and interoperable data collection in routine care, quality management, and clinical research. ", doi="10.2196/27396", url="https://medinform.jmir.org/2021/10/e27396", url="http://www.ncbi.nlm.nih.gov/pubmed/34636733" } @Article{info:doi/10.2196/26732, author="Kovoor, G. Joshua and McIntyre, Daniel and Chik, B. William W. and Chow, K. 
Clara and Thiagalingam, Aravinda", title="Clinician-Created Educational Video Resources for Shared Decision-making in the Outpatient Management of Chronic Disease: Development and Evaluation Study", journal="J Med Internet Res", year="2021", month="Oct", day="11", volume="23", number="10", pages="e26732", keywords="Shared decision-making", keywords="chronic disease", keywords="outpatients", keywords="audiovisual aids", keywords="atrial fibrillation", keywords="educational technology", keywords="teaching materials", keywords="referral and consultation", keywords="physician-patient relations", keywords="physicians", abstract="Background: The provision of reliable patient education is essential for shared decision-making. However, many clinicians are reluctant to use commonly available resources, as they are generic and may contain information of insufficient quality. Clinician-created educational materials, accessed during the waiting time prior to consultation, can potentially benefit clinical practice if developed in a time- and resource-efficient manner. Objective: The aim of this study is to evaluate the utility of educational videos in improving patient decision-making, as well as consultation satisfaction and anxiety, within the outpatient management of chronic disease (represented by atrial fibrillation). The approach involves clinicians creating audiovisual patient education in a time- and resource-efficient manner for opportunistic delivery, using mobile smart devices with internet access, during waiting time before consultation. Methods: We implemented this educational approach in outpatient clinics and collected patient responses through an electronic survey. The educational module was a web-based combination of 4 short videos viewed sequentially, followed by a patient experience survey using 5-point Likert scales and 0-100 visual analogue scales. The clinician developed the audiovisual module over a 2-day span while performing usual clinical tasks, using existing hardware and software resources (laptop and tablet). Patients presenting for the outpatient management of atrial fibrillation accessed the module during waiting time before their consultation using either a URL or Quick Response (QR) code on a provided tablet or their own mobile smart devices. The primary outcome of the study was the module's utility in improving patient decision-making ability, as measured on a 0-100 visual analogue scale. Secondary outcomes were the level of patient satisfaction with the videos, measured with 5-point Likert scales, in addition to the patient's value for clinician narration and the module's utility in improving anxiety and long-term treatment adherence, as represented on 0-100 visual analogue scales. Results: This study enrolled 116 patients presenting for the outpatient management of atrial fibrillation. The proportion of responses that were ``very satisfied'' with the educational video content across the 4 videos ranged from 93\% (86/92) to 96.3\% (104/108) and this was between 98\% (90/92) and 99.1\% (107/108) for ``satisfied'' or ``very satisfied.'' There were no reports of dissatisfaction for the first 3 videos, and only 1\% (1/92) of responders reported dissatisfaction for the fourth video. 
The median reported scores (on 0-100 visual analogue scales) were 90 (IQR 82.5-97) for improving patient decision-making, 89 (IQR 81-95) for reducing consultation anxiety, 90 (IQR 81-97) for improving treatment adherence, and 82 (IQR 70-90) for the clinician's narration adding benefit to the patient experience. Conclusions: Clinician-created educational videos for chronic disease management resulted in improvements in patient-reported informed decision-making ability and expected long-term treatment adherence, as well as anxiety reduction. This form of patient education was also time efficient as it used the sunk time cost of waiting time to provide education without requiring additional clinician input. ", doi="10.2196/26732", url="https://www.jmir.org/2021/10/e26732", url="http://www.ncbi.nlm.nih.gov/pubmed/34633292" } @Article{info:doi/10.2196/31733, author="Gold, Rachel and Sheppler, Christina and Hessler, Danielle and Bunce, Arwen and Cottrell, Erika and Yosuf, Nadia and Pisciotta, Maura and Gunn, Rose and Leo, Michael and Gottlieb, Laura", title="Using Electronic Health Record--Based Clinical Decision Support to Provide Social Risk--Informed Care in Community Health Centers: Protocol for the Design and Assessment of a Clinical Decision Support Tool", journal="JMIR Res Protoc", year="2021", month="Oct", day="8", volume="10", number="10", pages="e31733", keywords="social determinants of health", keywords="decision support systems, clinical", keywords="electronic health records", keywords="community health centers", keywords="health status disparities", abstract="Background: Consistent and compelling evidence demonstrates that social and economic adversity has an impact on health outcomes. In response, many health care professional organizations recommend screening patients for experiences of social and economic adversity or social risks---for example, food, housing, and transportation insecurity---in the context of care. Guidance on how health care providers can act on documented social risk data to improve health outcomes is nascent. A strategy recommended by the National Academy of Medicine involves using social risk data to adapt care plans in ways that accommodate patients' social risks. Objective: This study's aims are to develop electronic health record (EHR)--based clinical decision support (CDS) tools that suggest social risk--informed care plan adaptations for patients with diabetes or hypertension, assess tool adoption and its impact on selected clinical quality measures in community health centers, and examine perceptions of tool usability and impact on care quality. Methods: A systematic scoping review and several stakeholder activities will be conducted to inform development of the CDS tools. The tools will be pilot-tested to obtain user input, and their content and form will be revised based on this input. A randomized quasi-experimental design will then be used to assess the impact of the revised tools. Eligible clinics will be randomized to a control group or potential intervention group; clinics will be recruited from the potential intervention group in random order until 6 are enrolled in the study. Intervention clinics will have access to the CDS tools in their EHR, will receive minimal implementation support, and will be followed for 18 months to evaluate tool adoption and the impact of tool use on patient blood pressure and glucose control. 
Results: This study was funded in January 2020 by the National Institute on Minority Health and Health Disparities of the National Institutes of Health. Formative activities will take place from April 2020 to July 2021, the CDS tools will be developed between May 2021 and November 2022, the pilot study will be conducted from August 2021 to July 2022, and the main trial will occur from December 2022 to May 2024. Study data will be analyzed, and the results will be disseminated in 2024. Conclusions: Patients' social risk information must be presented to care teams in a way that facilitates social risk--informed care. To our knowledge, this study is the first to develop and test EHR-embedded CDS tools designed to support the provision of social risk--informed care. The study results will add a needed understanding of how to use social risk data to improve health outcomes and reduce disparities. International Registered Report Identifier (IRRID): PRR1-10.2196/31733 ", doi="10.2196/31733", url="https://www.researchprotocols.org/2021/10/e31733", url="http://www.ncbi.nlm.nih.gov/pubmed/34623308" } @Article{info:doi/10.2196/26314, author="Tong, Yao and Liao, C. Zachary and Tarczy-Hornoch, Peter and Luo, Gang", title="Using a Constraint-Based Method to Identify Chronic Disease Patients Who Are Apt to Obtain Care Mostly Within a Given Health Care System: Retrospective Cohort Study", journal="JMIR Form Res", year="2021", month="Oct", day="7", volume="5", number="10", pages="e26314", keywords="asthma", keywords="chronic kidney disease", keywords="chronic obstructive pulmonary disease", keywords="data analysis", keywords="diabetes mellitus", keywords="emergency department", keywords="health care system", keywords="inpatients", keywords="patient care management", abstract="Background: For several major chronic diseases including asthma, chronic obstructive pulmonary disease, chronic kidney disease, and diabetes, a state-of-the-art method to avert poor outcomes is to use predictive models to identify future high-cost patients for preemptive care management interventions. Frequently, an American patient obtains care from multiple health care systems, each managed by a distinct institution. As the patient's medical data are spread across these health care systems, none has complete medical data for the patient. The task of building models to predict an individual patient's cost is currently thought to be impractical with incomplete data, which limits the use of care management to improve outcomes. Recently, we developed a constraint-based method to identify patients who are apt to obtain care mostly within a given health care system. Our method was shown to work well for the cohort of all adult patients at the University of Washington Medicine for a 6-month follow-up period. It is unknown how well our method works for patients with various chronic diseases and over follow-up periods of different lengths, and subsequently, whether it is reasonable to perform this predictive modeling task on the subset of patients pinpointed by our method. Objective: To understand our method's potential to enable this predictive modeling task on incomplete medical data, this study assesses our method's performance at the University of Washington Medicine on 5 subgroups of adult patients with major chronic diseases and over follow-up periods of 2 different lengths. 
Methods: We used University of Washington Medicine data for all adult patients who obtained care at the University of Washington Medicine in 2018 and PreManage data containing usage information from all hospitals in Washington state in 2019. We evaluated our method's performance over the follow-up periods of 6 months and 12 months on 5 patient subgroups separately---asthma, chronic kidney disease, type 1 diabetes, type 2 diabetes, and chronic obstructive pulmonary disease. Results: Our method identified 21.81\% (3194/14,644) of University of Washington Medicine adult patients with asthma. Around 66.75\% (797/1194) and 67.13\% (1997/2975) of their emergency department visits and inpatient stays took place within the University of Washington Medicine system in the subsequent 6 months and in the subsequent 12 months, respectively, approximately double the corresponding percentage for all University of Washington Medicine adult patients with asthma. The performance for adult patients with chronic kidney disease, adult patients with chronic obstructive pulmonary disease, adult patients with type 1 diabetes, and adult patients with type 2 diabetes was reasonably similar to that for adult patients with asthma. Conclusions: For each of the 5 chronic diseases most relevant to care management, our method can pinpoint a reasonably large subset of patients who are apt to obtain care mostly within the University of Washington Medicine system. This opens the door to building models to predict an individual patient's cost on incomplete data, which was formerly deemed impractical. International Registered Report Identifier (IRRID): RR2-10.2196/13783 ", doi="10.2196/26314", url="https://formative.jmir.org/2021/10/e26314", url="http://www.ncbi.nlm.nih.gov/pubmed/34617906" } @Article{info:doi/10.2196/29558, author="Ziemssen, Tjalf and Giovannoni, Gavin and Alvarez, Enrique and Bhan, Virender and Hersh, Carrie and Hoffmann, Olaf and Oreja-Guevara, Celia and Robles-Cede{\~n}o, R. Rene and Trojano, Maria and Vermersch, Patrick and Dobay, Pamela and Khwaja, Mudeer and Stadler, Bianca and Rauser, Benedict and Hach, Thomas and Piani-Meier, Daniela and Burton, Jason", title="Multiple Sclerosis Progression Discussion Tool Usability and Usefulness in Clinical Practice: Cross-sectional, Web-Based Survey", journal="J Med Internet Res", year="2021", month="Oct", day="6", volume="23", number="10", pages="e29558", keywords="multiple sclerosis", keywords="relapsing remitting multiple sclerosis", keywords="secondary progressive multiple sclerosis", keywords="transition", keywords="progression", keywords="digital", keywords="usability", abstract="Background: A digital tool, Multiple Sclerosis Progression Discussion Tool (MSProDiscuss), was developed to facilitate discussions between health care professionals (HCPs) and patients in evaluating early, subtle signs of multiple sclerosis (MS) disease progression. Objective: The aim of this study is to report the findings on the usability and usefulness of MSProDiscuss in a real-world clinical setting. Methods: In this cross-sectional, web-based survey, HCPs across 34 countries completed an initial individual questionnaire (comprising 7 questions on comprehensibility, usability, and usefulness after using MSProDiscuss during each patient consultation) and a final questionnaire (comprising 13 questions on comprehensibility, usability, usefulness, and integration and adoption into clinical practice to capture the HCPs' overall experience of using the tool). 
The responses were provided on a 5-point Likert scale. All analyses were descriptive, and no statistical comparisons were made. Results: In total, 301 HCPs tested the tool in 6974 people with MS, of whom 77\% (5370/6974) had relapsing-remitting MS, including those suspected to be transitioning to secondary progressive MS. The time taken to complete MSProDiscuss was reported to be in the range of 1 to 4 minutes in 97.3\% (6786/6974; initial) to 98.2\% (269/274; final) of the cases. In 93.54\% (6524/6974; initial) to 97.1\% (266/274; final) of the cases, the HCPs agreed (4 or 5 on the Likert scale) that patients were able to comprehend the questions in the tool. The HCPs were willing to use the tool again in the same patient in 90.47\% (6310/6974; initial) of the cases. The HCPs reported MSProDiscuss to be useful in discussing MS symptoms and their impact on daily activities (6121/6974, 87.76\% initial and 252/274, 92\% final) and cognitive function (5482/6974, 78.61\% initial and 217/274, 79.2\% final), as well as in discussing progression in general (6102/6974, 87.49\% initial and 246/274, 89.8\% final). While completing the final questionnaire, 94.9\% (260/274) of the HCPs agreed that the questions were similar to those asked in regular consultation, and the tool helped to better understand the impact of MS symptoms on daily activities (249/274, 90.9\%) and cognitive function (220/274, 80.3\%). Overall, 92\% (252/274) of the HCPs reported that they would recommend MSProDiscuss to a colleague, and 85.8\% (235/274) were willing to integrate it into their clinical practice. Conclusions: MSProDiscuss is a usable and useful tool to facilitate a physician-patient discussion on MS disease progression in daily clinical practice. Most of the HCPs agreed that the tool is easy to use and were willing to integrate MSProDiscuss into their daily clinical practice. ", doi="10.2196/29558", url="https://www.jmir.org/2021/10/e29558", url="http://www.ncbi.nlm.nih.gov/pubmed/34612826" } @Article{info:doi/10.2196/27499, author="Maruster, Laura and van der Zee, Durk-Jouke and Buskens, Erik", title="Identifying Frequent Health Care Users and Care Consumption Patterns: Process Mining of Emergency Medical Services Data", journal="J Med Internet Res", year="2021", month="Oct", day="6", volume="23", number="10", pages="e27499", keywords="process mining", keywords="frequent users", keywords="hospital care", keywords="emergency medical services", keywords="regional care networks", keywords="elderly", keywords="Netherlands", abstract="Background: Tracing frequent users of health care services is highly relevant to policymakers and clinicians, enabling them to avoid wasting scarce resources. Data collection on frequent users from all possible health care providers may be cumbersome due to patient privacy, competition, incompatible information systems, and the efforts involved. Objective: This study explored the use of a single key source, emergency medical services (EMS) records, to trace and reveal frequent users' health care consumption patterns. Methods: A retrospective study was performed analyzing EMS calls from the province of Drenthe in the Netherlands between 2012 and 2017. Process mining was applied to identify the structure of patient routings (ie, their consecutive visits to hospitals, nursing homes, and EMS). Routings are used to identify and quantify frequent users, recognizing frail elderly users as a focal group. 
The structure of these routes was analyzed at the patient and group levels, aiming to gain insight into regional coordination issues and workload distributions among health care providers. Results: Frail elderly users aged 70 years or more represented over 50\% of frequent users, making 4 or more calls per year. Over the period of observation, their annual number and the number of calls increased from 395 to 628 and 2607 to 3615, respectively. Structural analysis based on process mining revealed two categories of frail elderly users: low-complexity patients who need dialysis, radiation therapy, or hyperbaric medicine, involving a few health care providers, and high-complexity patients for whom routings appear chaotic. Conclusions: This efficient approach exploits the role of EMS as the unique regional ``ferryman,'' while the combined use of EMS data and process mining allows for the effective and efficient tracing of frequent users' utilization of health care services. The approach informs regional policymakers and clinicians by quantifying and detailing frequent user consumption patterns to support subsequent policy adaptations. ", doi="10.2196/27499", url="https://www.jmir.org/2021/10/e27499", url="http://www.ncbi.nlm.nih.gov/pubmed/34612834" } @Article{info:doi/10.2196/27174, author="Lenaerts, Gerlinde and Bekkering, E. Geertruida and Goossens, Martine and De Coninck, Leen and Delvaux, Nicolas and Cordyn, Sam and Adriaenssens, Jef and Aertgeerts, Bert and Vankrunkelsven, Patrik", title="A Tool to Assess the Trustworthiness of Evidence-Based Point-of-Care Information for Health Care Professionals (CAPOCI): Design and Validation Study", journal="J Med Internet Res", year="2021", month="Oct", day="5", volume="23", number="10", pages="e27174", keywords="evidence-based medicine", keywords="evidence-based practice", keywords="point-of-care systems", keywords="health care quality", keywords="information science", keywords="practice guidelines as a topic", abstract="Background: User-friendly information at the point of care for health care professionals should be well structured, rapidly accessible, comprehensive, and trustworthy. The reliability of information and the associated methodological process must be clear. There is no standard tool to evaluate the trustworthiness of such point-of-care (POC) information. Objective: We aim to develop and validate a new tool for assessment of trustworthiness of evidence-based POC resources to enhance the quality of POC resources and facilitate evidence-based practice. Methods: We designed the Critical Appraisal of Point-of-Care Information (CAPOCI) tool based on the criteria important for assessment of trustworthiness of POC information, reported in a previously published review. A group of health care professionals and methodologists (the authors of this paper) defined criteria for the CAPOCI tool in an iterative process of discussion and pilot testing until consensus was reached. In the next step, all criteria were subject to content validation with a Delphi study. We invited an international panel of 10 experts to rate their agreement with the relevance and wording of the criteria and to give feedback. Consensus was reached when 70\% of the experts agreed. When no consensus was reached, we reformulated the criteria based on the experts' comments for a next round of the Delphi study. This process was repeated until consensus was reached for each criterion. 
In a last step, the interrater reliability of the CAPOCI tool was calculated with a 2-tailed Kendall tau correlation coefficient to quantify the agreement between 2 users who piloted the CAPOCI tool on 5 POC resources. Two scoring systems were tested: a 3-point ordinal scale and a 7-point Likert scale. Results: After validation, the CAPOCI tool was designed with 11 criteria that focused on methodological quality and author-related information. The criteria assess authorship, literature search, use of preappraised evidence, critical appraisal of evidence, expert opinions, peer review, timeliness and updating, conflict of interest, and commercial support. Interrater agreement showed substantial agreement between 2 users for scoring with the 3-point ordinal scale ($\tau$=.621, P<.01) and scoring with the 7-point Likert scale ($\tau$=.677, P<.01). Conclusions: The CAPOCI tool may support validation teams in the assessment of trustworthiness of POC resources. It may also provide guidance for producers of POC resources. ", doi="10.2196/27174", url="https://www.jmir.org/2021/10/e27174", url="http://www.ncbi.nlm.nih.gov/pubmed/34609314" } @Article{info:doi/10.2196/29200, author="Conway, Aaron and Jungquist, R. Carla and Chang, Kristina and Kamboj, Navpreet and Sutherland, Joanna and Mafeld, Sebastian and Parotto, Matteo", title="Predicting Prolonged Apnea During Nurse-Administered Procedural Sedation: Machine Learning Study", journal="JMIR Perioper Med", year="2021", month="Oct", day="5", volume="4", number="2", pages="e29200", keywords="procedural sedation and analgesia", keywords="conscious sedation", keywords="nursing", keywords="informatics", keywords="patient safety", keywords="machine learning", keywords="capnography", keywords="anesthesia", keywords="anaesthesia", keywords="medical informatics", keywords="sleep apnea", keywords="apnea", keywords="apnoea", keywords="sedation", abstract="Background: Capnography is commonly used for nurse-administered procedural sedation. Distinguishing between capnography waveform abnormalities that signal the need for clinical intervention for an event and those that do not indicate the need for intervention is essential for the successful implementation of this technology into practice. It is possible that capnography alarm management may be improved by using machine learning to create a ``smart alarm'' that can alert clinicians to apneic events that are predicted to be prolonged. Objective: To determine the accuracy of machine learning models for predicting at the 15-second time point if apnea will be prolonged (ie, apnea that persists for >30 seconds). Methods: A secondary analysis of an observational study was conducted. We selected several candidate models to evaluate, including a random forest model, generalized linear model (logistic regression), least absolute shrinkage and selection operator regression, ridge regression, and the XGBoost model. Out-of-sample accuracy of the models was calculated using 10-fold cross-validation. The net benefit decision analytic measure was used to assist with deciding whether using the models in practice would lead to better outcomes on average than using the current default capnography alarm management strategies. The default strategies are the aggressive approach, in which an alarm is triggered after brief periods of apnea (typically 15 seconds) and the conservative approach, in which an alarm is triggered for only prolonged periods of apnea (typically >30 seconds). 
Results: A total of 384 apneic events longer than 15 seconds were observed in 61 of the 102 patients (59.8\%) who participated in the observational study. Nearly half of the apneic events (180/384, 46.9\%) were prolonged. The random forest model performed the best in terms of discrimination (area under the receiver operating characteristic curve 0.66) and calibration. The net benefit associated with the random forest model exceeded that associated with the aggressive strategy but was lower than that associated with the conservative strategy. Conclusions: Decision curve analysis indicated that using a random forest model would lead to a better outcome for capnography alarm management than using an aggressive strategy in which alarms are triggered after 15 seconds of apnea. The model would not be superior to the conservative strategy in which alarms are only triggered after 30 seconds. ", doi="10.2196/29200", url="https://periop.jmir.org/2021/2/e29200", url="http://www.ncbi.nlm.nih.gov/pubmed/34609322" } @Article{info:doi/10.2196/30083, author="Fiorentino, Francesca and Prociuk, Denys and Espinosa Gonzalez, Belen Ana and Neves, Luisa Ana and Husain, Laiba and Ramtale, Christian Sonny and Mi, Emma and Mi, Ella and Macartney, Jack and Anand, N. Sneha and Sherlock, Julian and Saravanakumar, Kavitha and Mayer, Erik and de Lusignan, Simon and Greenhalgh, Trisha and Delaney, C. Brendan", title="An Early Warning Risk Prediction Tool (RECAP-V1) for Patients Diagnosed With COVID-19: Protocol for a Statistical Analysis Plan", journal="JMIR Res Protoc", year="2021", month="Oct", day="5", volume="10", number="10", pages="e30083", keywords="COVID-19", keywords="modeling", keywords="remote assessment", keywords="risk score", keywords="early warning", abstract="Background: Since the start of the COVID-19 pandemic, efforts have been made to develop early warning risk scores to help clinicians decide which patient is likely to deteriorate and require hospitalization. The RECAP (Remote COVID-19 Assessment in Primary Care) study investigates the predictive risk of hospitalization, deterioration, and death of patients with confirmed COVID-19, based on a set of parameters chosen through a Delphi process performed by clinicians. We aim to use rich data collected remotely through the use of electronic data templates integrated in the electronic health systems of several general practices across the United Kingdom to construct accurate predictive models. The models will be based on preexisting conditions and monitoring data of a patient's clinical parameters (eg, blood oxygen saturation) to make reliable predictions as to the patient's risk of hospital admission, deterioration, and death. Objective: This statistical analysis plan outlines the statistical methods to build the prediction model to be used in the prioritization of patients in the primary care setting. The statistical analysis plan for the RECAP study includes the development and validation of the RECAP-V1 prediction model as a primary outcome. This prediction model will be adapted as a three-category risk score split into red (high risk), amber (medium risk), and green (low risk) for any patient with suspected COVID-19. The model will predict the risk of deterioration and hospitalization. 
Methods: After the data have been collected, we will assess the degree of missingness and use a combination of traditional data imputation using multiple imputation by chained equations and more novel machine-learning approaches to impute the missing data for the final analysis. For predictive model development, we will use multiple logistic regression analyses to construct the model. We aim to recruit a minimum of 1317 patients for model development and validation. We will then externally validate the model on an independent dataset of 1400 patients. The model will also be applied to multiple different datasets to assess both its performance in different patient groups and its applicability for different methods of data collection. Results: As of May 10, 2021, we have recruited 3732 patients. A further 2088 patients have been recruited through the National Health Service Clinical Assessment Service, and approximately 5000 patients have been recruited through the DoctalyHealth platform. Conclusions: The methodology for the development of the RECAP-V1 prediction model as well as the risk score will provide clinicians with a statistically robust tool to help prioritize COVID-19 patients. Trial Registration: ClinicalTrials.gov NCT04435041; https://clinicaltrials.gov/ct2/show/NCT04435041 International Registered Report Identifier (IRRID): DERR1-10.2196/30083 ", doi="10.2196/30083", url="https://www.researchprotocols.org/2021/10/e30083", url="http://www.ncbi.nlm.nih.gov/pubmed/34468322" } @Article{info:doi/10.2196/28000, author="Persson, Inger and {\"O}stling, Andreas and Arlbrandt, Martin and S{\"o}derberg, Joakim and Becedas, David", title="A Machine Learning Sepsis Prediction Algorithm for Intended Intensive Care Unit Use (NAVOY Sepsis): Proof-of-Concept Study", journal="JMIR Form Res", year="2021", month="Sep", day="30", volume="5", number="9", pages="e28000", keywords="sepsis", keywords="prediction", keywords="early detection", keywords="machine learning", keywords="electronic health record", keywords="EHR", keywords="software as a medical device", keywords="algorithm", keywords="detection", keywords="intensive care unit", keywords="ICU", keywords="proof of concept", abstract="Background: Despite decades of research, sepsis remains a leading cause of mortality and morbidity in intensive care units worldwide. The key to effective management and patient outcome is early detection, for which no prospectively validated machine learning prediction algorithm is currently available for clinical use in Europe. Objective: We aimed to develop a high-performance machine learning sepsis prediction algorithm based on routinely collected intensive care unit data, designed to be implemented in European intensive care units. Methods: The machine learning algorithm was developed using convolutional neural networks, based on Massachusetts Institute of Technology Lab for Computational Physiology MIMIC-III clinical data from intensive care unit patients aged 18 years or older. The model uses 20 variables to produce hourly predictions of onset of sepsis, defined by international Sepsis-3 criteria. Predictive performance was externally validated using hold-out test data. Results: The algorithm---NAVOY Sepsis---uses 4 hours of input and can identify patients with high risk of developing sepsis, with high performance (area under the receiver operating characteristics curve 0.90; area under the precision-recall curve 0.62) for predictions up to 3 hours before sepsis onset. 
Conclusions: The prediction performance of NAVOY Sepsis was superior to that of existing sepsis early warning scoring systems and comparable with those of other prediction algorithms designed to predict sepsis onset. The algorithm has excellent predictive properties and uses variables that are routinely collected in intensive care units. ", doi="10.2196/28000", url="https://formative.jmir.org/2021/9/e28000", url="http://www.ncbi.nlm.nih.gov/pubmed/34591016" } @Article{info:doi/10.2196/28209, author="Mann, D. Kay and Good, M. Norm and Fatehi, Farhad and Khanna, Sankalp and Campbell, Victoria and Conway, Roger and Sullivan, Clair and Staib, Andrew and Joyce, Christopher and Cook, David", title="Predicting Patient Deterioration: A Review of Tools in the Digital Hospital Setting", journal="J Med Internet Res", year="2021", month="Sep", day="30", volume="23", number="9", pages="e28209", keywords="patient deterioration", keywords="early warning scores", keywords="digital tools", keywords="vital signs", keywords="electronic medical record", abstract="Background: Early warning tools identify patients at risk of deterioration in hospitals. Electronic medical records in hospitals offer real-time data and the opportunity to automate early warning tools and provide real-time, dynamic risk estimates. Objective: This review describes published studies on the development, validation, and implementation of tools for predicting patient deterioration in general wards in hospitals. Methods: An electronic database search of peer reviewed journal papers from 2008-2020 identified studies reporting the use of tools and algorithms for predicting patient deterioration, defined by unplanned transfer to the intensive care unit, cardiac arrest, or death. Studies conducted solely in intensive care units, emergency departments, or single diagnosis patient groups were excluded. Results: A total of 46 publications were eligible for inclusion. These publications were heterogeneous in design, setting, and outcome measures. Most studies were retrospective studies using cohort data to develop, validate, or statistically evaluate prediction tools. The tools consisted of early warning, screening, or scoring systems based on physiologic data, as well as more complex algorithms developed to better represent real-time data, deal with complexities of longitudinal data, and warn of deterioration risk earlier. Only a few studies detailed the results of the implementation of deterioration warning tools. Conclusions: Despite relative progress in the development of algorithms to predict patient deterioration, the literature has not shown that the deployment or implementation of such algorithms is reproducibly associated with improvements in patient outcomes. Further work is needed to realize the potential of automated predictions and update dynamic risk estimates as part of an operational early warning system for inpatient deterioration. 
", doi="10.2196/28209", url="https://www.jmir.org/2021/9/e28209", url="http://www.ncbi.nlm.nih.gov/pubmed/34591017" } @Article{info:doi/10.2196/27122, author="Zhai, Huiwen and Yang, Xin and Xue, Jiaolong and Lavender, Christopher and Ye, Tiantian and Li, Ji-Bin and Xu, Lanyang and Lin, Li and Cao, Weiwei and Sun, Ying", title="Radiation Oncologists' Perceptions of Adopting an Artificial Intelligence--Assisted Contouring Technology: Model Development and Questionnaire Study", journal="J Med Internet Res", year="2021", month="Sep", day="30", volume="23", number="9", pages="e27122", keywords="artificial intelligence", keywords="technology acceptance model", keywords="intension", keywords="resistance", abstract="Background: An artificial intelligence (AI)--assisted contouring system benefits radiation oncologists by saving time and improving treatment accuracy. Yet, there is much hope and fear surrounding such technologies, and this fear can manifest as resistance from health care professionals, which can lead to the failure of AI projects. Objective: The objective of this study was to develop and test a model for investigating the factors that drive radiation oncologists' acceptance of AI contouring technology in a Chinese context. Methods: A model of AI-assisted contouring technology acceptance was developed based on the Unified Theory of Acceptance and Use of Technology (UTAUT) model by adding the variables of perceived risk and resistance that were proposed in this study. The model included 8 constructs with 29 questionnaire items. A total of 307 respondents completed the questionnaires. Structural equation modeling was conducted to evaluate the model's path effects, significance, and fitness. Results: The overall fitness indices for the model were evaluated and showed that the model was a good fit to the data. Behavioral intention was significantly affected by performance expectancy ($\beta$=.155; P=.01), social influence ($\beta$=.365; P<.001), and facilitating conditions ($\beta$=.459; P<.001). Effort expectancy ($\beta$=.055; P=.45), perceived risk ($\beta$=?.048; P=.35), and resistance bias ($\beta$=?.020; P=.63) did not significantly affect behavioral intention. Conclusions: The physicians' overall perceptions of an AI-assisted technology for radiation contouring were high. Technology resistance among Chinese radiation oncologists was low and not related to behavioral intention. Not all of the factors in the Venkatesh UTAUT model applied to AI technology adoption among physicians in a Chinese context. ", doi="10.2196/27122", url="https://www.jmir.org/2021/9/e27122", url="http://www.ncbi.nlm.nih.gov/pubmed/34591029" } @Article{info:doi/10.2196/31311, author="Choe, Sooho and Park, Eunjeong and Shin, Wooseok and Koo, Bonah and Shin, Dongjin and Jung, Chulwoo and Lee, Hyungchul and Kim, Jeongmin", title="Short-Term Event Prediction in the Operating Room (STEP-OP) of Five-Minute Intraoperative Hypotension Using Hybrid Deep Learning: Retrospective Observational Study and Model Development", journal="JMIR Med Inform", year="2021", month="Sep", day="30", volume="9", number="9", pages="e31311", keywords="arterial pressure", keywords="artificial intelligence", keywords="biosignals", keywords="deep learning", keywords="hypotension", keywords="machine learning", abstract="Background: Intraoperative hypotension has an adverse impact on postoperative outcomes. 
However, it is difficult to predict and treat intraoperative hypotension in advance according to individual clinical parameters. Objective: The aim of this study was to develop a prediction model to forecast 5-minute intraoperative hypotension based on the weighted average ensemble of individual neural networks, utilizing the biosignals recorded during noncardiac surgery. Methods: In this retrospective observational study, arterial waveforms were recorded during noncardiac operations performed between August 2016 and December 2019, at Seoul National University Hospital, Seoul, South Korea. We analyzed the arterial waveforms from the big data in the VitalDB repository of electronic health records. We defined 2s hypotension as the moving average of arterial pressure under 65 mmHg for 2 seconds, and intraoperative hypotensive events were defined when the 2s hypotension lasted for at least 60 seconds. We developed an artificial intelligence--enabled process, named short-term event prediction in the operating room (STEP-OP), for predicting short-term intraoperative hypotension. Results: The study was performed on 18,813 subjects undergoing noncardiac surgeries. Deep-learning algorithms (convolutional neural network [CNN] and recurrent neural network [RNN]) using raw waveforms as input showed greater area under the precision-recall curve (AUPRC) scores (0.698, 95\% CI 0.690-0.705 and 0.706, 95\% CI 0.698-0.715, respectively) than that of the logistic regression algorithm (0.673, 95\% CI 0.665-0.682). STEP-OP performed better and had greater AUPRC values than those of the RNN and CNN algorithms (0.716, 95\% CI 0.708-0.723). Conclusions: We developed STEP-OP as a weighted average of deep-learning models. STEP-OP predicts intraoperative hypotension more accurately than the CNN, RNN, and logistic regression models. Trial Registration: ClinicalTrials.gov NCT02914444; https://clinicaltrials.gov/ct2/show/NCT02914444. ", doi="10.2196/31311", url="https://medinform.jmir.org/2021/9/e31311", url="http://www.ncbi.nlm.nih.gov/pubmed/34591024" } @Article{info:doi/10.2196/21990, author="Shah, Kanan and Sharma, Akarsh and Moulton, Chris and Swift, Simon and Mann, Clifford and Jones, Simon", title="Forecasting the Requirement for Nonelective Hospital Beds in the National Health Service of the United Kingdom: Model Development Study", journal="JMIR Med Inform", year="2021", month="Sep", day="30", volume="9", number="9", pages="e21990", keywords="bed occupancy", keywords="clinical decision-making", keywords="forecasting", keywords="health care delivery", keywords="models", keywords="time-series analysis", abstract="Background: Over the last decade, increasing numbers of emergency department attendances and an even greater increase in emergency admissions have placed severe strain on the bed capacity of the National Health Service (NHS) of the United Kingdom. The result has been overcrowded emergency departments with patients experiencing long wait times for admission to an appropriate hospital bed. Nevertheless, scheduling issues can still result in significant underutilization of bed capacity. Bed occupancy rates may not correlate well with bed availability. More accurate and reliable long-term prediction of bed requirements will help anticipate the future needs of a hospital's catchment population, thus resulting in greater efficiencies and better patient care. 
Objective: This study aimed to evaluate widely used automated time-series forecasting techniques to predict short-term daily nonelective bed occupancy at all trusts in the NHS. These techniques were used to develop a simple yet accurate national health system--level forecasting framework that can be utilized at a low cost and by health care administrators who do not have statistical modeling expertise. Methods: Bed occupancy models that accounted for patterns in occupancy were created for each trust in the NHS. Daily nonelective midnight trust occupancy data from April 2011 to March 2017 for 121 NHS trusts were utilized to generate these models. Forecasts were generated using the three most widely used automated forecasting techniques: exponential smoothing; Seasonal Autoregressive Integrated Moving Average; and Trigonometric, Box-Cox transform, autoregressive moving average errors, and Trend and Seasonal components. The NHS Modernisation Agency's recommended forecasting method prior to 2020 was also replicated. Results: The accuracy of the models varied on the basis of the season during which occupancy was forecasted. For the summer season, percent root-mean-square error values for each model remained relatively stable across the 6 forecasted weeks. However, only the trend and seasonal components model (median error=2.45\% for 6 weeks) outperformed the NHS Modernisation Agency's recommended method (median error=2.63\% for 6 weeks). In contrast, during the winter season, the percent root-mean-square error values increased as we forecasted further into the future. Exponential smoothing generated the most accurate forecasts (median error=4.91\% over 4 weeks), but all models outperformed the NHS Modernisation Agency's recommended method prior to 2020 (median error=8.5\% over 4 weeks). Conclusions: It is possible to create automated models, similar to those recently published by the NHS, which can be used at a hospital level for a large national health care system to predict nonelective bed admissions and thus schedule elective procedures. ", doi="10.2196/21990", url="https://medinform.jmir.org/2021/9/e21990", url="http://www.ncbi.nlm.nih.gov/pubmed/34591020" } @Article{info:doi/10.2196/30157, author="Sankaranarayanan, Saranya and Balan, Jagadheshwar and Walsh, R. Jesse and Wu, Yanhong and Minnich, Sara and Piazza, Amy and Osborne, Collin and Oliver, R. Gavin and Lesko, Jessica and Bates, L. Kathy and Khezeli, Kia and Block, R. Darci and DiGuardo, Margaret and Kreuter, Justin and O'Horo, C. John and Kalantari, John and Klee, W. Eric and Salama, E. Mohamed and Kipp, Benjamin and Morice, G. 
William and Jenkinson, Garrett", title="COVID-19 Mortality Prediction From Deep Learning in a Large Multistate Electronic Health Record and Laboratory Information System Data Set: Algorithm Development and Validation", journal="J Med Internet Res", year="2021", month="Sep", day="28", volume="23", number="9", pages="e30157", keywords="COVID-19", keywords="mortality", keywords="prediction", keywords="recurrent neural networks", keywords="missing data", keywords="time series", keywords="deep learning", keywords="machine learning", keywords="neural network", keywords="electronic health record", keywords="EHR", keywords="algorithm", keywords="development", keywords="validation", abstract="Background: COVID-19 is caused by the SARS-CoV-2 virus and has strikingly heterogeneous clinical manifestations, with most individuals contracting mild disease but a substantial minority experiencing fulminant cardiopulmonary symptoms or death. The clinical covariates and the laboratory tests performed on a patient provide robust statistics to guide clinical treatment. Deep learning approaches on a data set of this nature enable patient stratification and provide methods to guide clinical treatment. Objective: Here, we report on the development and prospective validation of a state-of-the-art machine learning model to provide mortality prediction shortly after confirmation of SARS-CoV-2 infection in the Mayo Clinic patient population. Methods: We retrospectively constructed one of the largest reported and most geographically diverse laboratory information system and electronic health record of COVID-19 data sets in the published literature, which included 11,807 patients residing in 41 states of the United States of America and treated at medical sites across 5 states in 3 time zones. Traditional machine learning models were evaluated independently as well as in a stacked learner approach by using AutoGluon, and various recurrent neural network architectures were considered. The traditional machine learning models were implemented using the AutoGluon-Tabular framework, whereas the recurrent neural networks utilized the TensorFlow Keras framework. We trained these models to operate solely using routine laboratory measurements and clinical covariates available within 72 hours of a patient's first positive COVID-19 nucleic acid test result. Results: The GRU-D recurrent neural network achieved peak cross-validation performance with 0.938 (SE 0.004) as the area under the receiver operating characteristic (AUROC) curve. This model retained strong performance by reducing the follow-up time to 12 hours (0.916 [SE 0.005] AUROC), and the leave-one-out feature importance analysis indicated that the most independently valuable features were age, Charlson comorbidity index, minimum oxygen saturation, fibrinogen level, and serum iron level. In the prospective testing cohort, this model provided an AUROC of 0.901 and a statistically significant difference in survival (P<.001, hazard ratio for those predicted to survive, 95\% CI 0.043-0.106). Conclusions: Our deep learning approach using GRU-D provides an alert system to flag mortality for COVID-19--positive patients by using clinical covariates and laboratory values within a 72-hour window after the first positive nucleic acid test result. 
", doi="10.2196/30157", url="https://www.jmir.org/2021/9/e30157", url="http://www.ncbi.nlm.nih.gov/pubmed/34449401" } @Article{info:doi/10.2196/29678, author="Park, Jung Chae and Cho, Sang Young and Chung, Jin Myung and Kim, Yi-Kyung and Kim, Hyung-Jin and Kim, Kyunga and Ko, Jae-Wook and Chung, Won-Ho and Cho, Hwan Baek", title="A Fully Automated Analytic System for Measuring Endolymphatic Hydrops Ratios in Patients With M{\'e}ni{\`e}re Disease via Magnetic Resonance Imaging: Deep Learning Model Development Study", journal="J Med Internet Res", year="2021", month="Sep", day="21", volume="23", number="9", pages="e29678", keywords="deep learning", keywords="magnetic resonance imaging", keywords="medical image segmentation", keywords="M{\'e}ni{\`e}re disease", keywords="inner ear", keywords="endolymphatic hydrops", keywords="artificial intelligence", keywords="machine learning", keywords="multi-class segmentation", keywords="convolutional neural network", keywords="end-to-end system", keywords="clinician support", keywords="clinical decision support system", keywords="image selection", keywords="clinical usability", keywords="automation", abstract="Background: Recently, the analysis of endolymphatic hydropses (EHs) via inner ear magnetic resonance imaging (MRI) for patients with M{\'e}ni{\`e}re disease has been attempted in various studies. In addition, artificial intelligence has rapidly been incorporated into the medical field. In our previous studies, an automated algorithm for EH analysis was developed by using a convolutional neural network. However, several limitations existed, and further studies were conducted to compensate for these limitations. Objective: The aim of this study is to develop a fully automated analytic system for measuring EH ratios that enhances EH analysis accuracy and clinical usability when studying M{\'e}ni{\`e}re disease via MRI. Methods: We proposed the 3into3Inception and 3intoUNet networks. Their network architectures were based on those of the Inception-v3 and U-Net networks, respectively. The developed networks were trained for inner ear segmentation by using the magnetic resonance images of 124 people and were embedded in a new, automated EH analysis system---inner-ear hydrops estimation via artificial intelligence (INHEARIT)-version 2 (INHEARIT-v2). After fivefold cross-validation, an additional test was performed by using 60 new, unseen magnetic resonance images to evaluate the performance of our system. The INHEARIT-v2 system has a new function that automatically selects representative images from a full MRI stack. Results: The average segmentation performance of the fivefold cross-validation was measured via the intersection of union method, resulting in performance values of 0.743 (SD 0.030) for the 3into3Inception network and 0.811 (SD 0.032) for the 3intoUNet network. The representative magnetic resonance slices (ie, from a data set of unseen magnetic resonance images) that were automatically selected by the INHEARIT-v2 system only differed from a maximum of 2 expert-selected slices. After comparing the ratios calculated by experienced physicians and those calculated by the INHEARIT-v2 system, we found that the average intraclass correlation coefficient for all cases was 0.941; the average intraclass correlation coefficient of the vestibules was 0.968, and that of the cochleae was 0.914. The time required for the fully automated system to accurately analyze EH ratios based on a patient's MRI stack was approximately 3.5 seconds. 
Conclusions: In this study, a fully automated full-stack magnetic resonance analysis system for measuring EH ratios was developed (named INHEARIT-v2), and the results showed that there was a high correlation between the expert-calculated EH ratio values and those calculated by the INHEARIT-v2 system. The system is an upgraded version of the INHEARIT system; it has higher segmentation performance and automatically selects representative images from an MRI stack. The new model can help clinicians by providing objective analysis results and reducing the workload for interpreting magnetic resonance images. ", doi="10.2196/29678", url="https://www.jmir.org/2021/9/e29678", url="http://www.ncbi.nlm.nih.gov/pubmed/34546181" } @Article{info:doi/10.2196/26025, author="Huang, Kai and Jiang, Zixi and Li, Yixin and Wu, Zhe and Wu, Xian and Zhu, Wu and Chen, Mingliang and Zhang, Yu and Zuo, Ke and Li, Yi and Yu, Nianzhou and Liu, Siliang and Huang, Xing and Su, Juan and Yin, Mingzhu and Qian, Buyue and Wang, Xianggui and Chen, Xiang and Zhao, Shuang", title="The Classification of Six Common Skin Diseases Based on Xiangya-Derm: Development of a Chinese Database for Artificial Intelligence", journal="J Med Internet Res", year="2021", month="Sep", day="21", volume="23", number="9", pages="e26025", keywords="artificial intelligence", keywords="skin disease", keywords="convolutional neural network", keywords="medical image processing", keywords="automatic auxiliary diagnoses", keywords="dermatology", keywords="skin", keywords="classification", keywords="China", abstract="Background: Skin and subcutaneous disease is the fourth-leading cause of the nonfatal disease burden worldwide and constitutes one of the most common burdens in primary care. However, there is a severe lack of dermatologists, particularly in rural Chinese areas. Furthermore, although artificial intelligence (AI) tools can assist in diagnosing skin disorders from images, the database for the Chinese population is limited. Objective: This study aims to establish a database for AI based on the Chinese population and presents an initial study on six common skin diseases. Methods: Each image was captured with either a digital camera or a smartphone, verified by at least three experienced dermatologists and corresponding pathology information, and finally added to the Xiangya-Derm database. Based on this database, we conducted AI-assisted classification research on six common skin diseases and then proposed a network called Xy-SkinNet. Xy-SkinNet applies a two-step strategy to identify skin diseases. First, given an input image, we segmented the regions of the skin lesion. Second, we introduced an information fusion block to combine the output of all segmented regions. We compared the performance with 31 dermatologists of varied experiences. Results: Xiangya-Derm, as a new database that consists of over 150,000 clinical images of 571 different skin diseases in the Chinese population, is the largest and most diverse dermatological data set of the Chinese population. The AI-based six-category classification achieved a top 3 accuracy of 84.77\%, which exceeded the average accuracy of dermatologists (78.15\%). Conclusions: Xiangya-Derm, the largest database for the Chinese population, was created. The classification of six common skin conditions was conducted based on Xiangya-Derm to lay a foundation for product research. 
", doi="10.2196/26025", url="https://www.jmir.org/2021/9/e26025", url="http://www.ncbi.nlm.nih.gov/pubmed/34546174" } @Article{info:doi/10.2196/30223, author="Shin, Dongyup and Kam, Jin Hye and Jeon, Min-Seok and Kim, Young Ha", title="Automatic Classification of Thyroid Findings Using Static and Contextualized Ensemble Natural Language Processing Systems: Development Study", journal="JMIR Med Inform", year="2021", month="Sep", day="21", volume="9", number="9", pages="e30223", keywords="deep learning", keywords="natural language processing", keywords="word embedding", keywords="convolution neural network", keywords="long short-term memory", keywords="transformer", keywords="ensemble", keywords="thyroid", keywords="electronic medical records", abstract="Background: In the case of Korean institutions and enterprises that collect nonstandardized and nonunified formats of electronic medical examination results from multiple medical institutions, a group of experienced nurses who can understand the results and related contexts initially classified the reports manually. The classification guidelines were established by years of workers' clinical experiences and there were attempts to automate the classification work. However, there have been problems in which rule-based algorithms or human labor--intensive efforts can be time-consuming or limited owing to high potential errors. We investigated natural language processing (NLP) architectures and proposed ensemble models to create automated classifiers. Objective: This study aimed to develop practical deep learning models with electronic medical records from 284 health care institutions and open-source corpus data sets for automatically classifying 3 thyroid conditions: healthy, caution required, and critical. The primary goal is to increase the overall accuracy of the classification, yet there are practical and industrial needs to correctly predict healthy (negative) thyroid condition data, which are mostly medical examination results, and minimize false-negative rates under the prediction of healthy thyroid conditions. Methods: The data sets included thyroid and comprehensive medical examination reports. The textual data are not only documented in fully complete sentences but also written in lists of words or phrases. Therefore, we propose static and contextualized ensemble NLP network (SCENT) systems to successfully reflect static and contextual information and handle incomplete sentences. We prepared each convolution neural network (CNN)-, long short-term memory (LSTM)-, and efficiently learning an encoder that classifies token replacements accurately (ELECTRA)-based ensemble model by training or fine-tuning them multiple times. Through comprehensive experiments, we propose 2 versions of ensemble models, SCENT-v1 and SCENT-v2, with the single-architecture--based CNN, LSTM, and ELECTRA ensemble models for the best classification performance and practical use, respectively. SCENT-v1 is an ensemble of CNN and ELECTRA ensemble models, and SCENT-v2 is a hierarchical ensemble of CNN, LSTM, and ELECTRA ensemble models. SCENT-v2 first classifies the 3 labels using an ELECTRA ensemble model and then reclassifies them using an ensemble model of CNN and LSTM if the ELECTRA ensemble model predicted them as ``healthy'' labels. Results: SCENT-v1 outperformed all the suggested models, with the highest F1 score (92.56\%). 
SCENT-v2 had the second-highest recall value (94.44\%) and the fewest misclassifications for caution-required thyroid condition while maintaining 0 classification error for the critical thyroid condition under the prediction of the healthy thyroid condition. Conclusions: The proposed SCENT demonstrates good classification performance despite the unique characteristics of the Korean language and problems of data lack and imbalance, especially for the extremely low amount of critical condition data. The result of SCENT-v1 indicates that different perspectives of static and contextual input token representations can enhance classification performance. SCENT-v2 has a strong impact on the prediction of healthy thyroid conditions. ", doi="10.2196/30223", url="https://medinform.jmir.org/2021/9/e30223", url="http://www.ncbi.nlm.nih.gov/pubmed/34546183" } @Article{info:doi/10.2196/30770, author="Yun, Hyoungju and Choi, Jinwook and Park, Ho Jeong", title="Prediction of Critical Care Outcome for Adult Patients Presenting to Emergency Department Using Initial Triage Information: An XGBoost Algorithm Analysis", journal="JMIR Med Inform", year="2021", month="Sep", day="20", volume="9", number="9", pages="e30770", keywords="triage", keywords="critical care", keywords="prediction", keywords="XGBoost", keywords="explainable machine learning", keywords="interpretable artificial intelligence", keywords="machine learning", keywords="algorithm", keywords="outcome", keywords="emergency", keywords="classify", keywords="prioritize", keywords="risk", keywords="model", abstract="Background: The emergency department (ED) triage system to classify and prioritize patients from high risk to less urgent continues to be a challenge. Objective: This study, comprising 80,433 patients, aims to develop a machine learning algorithm prediction model of critical care outcomes for adult patients using information collected during ED triage and compare the performance with that of the baseline model using the Korean Triage and Acuity Scale (KTAS). Methods: To predict the need for critical care, we used 13 predictors from triage information: age, gender, mode of ED arrival, the time interval between onset and ED arrival, reason of ED visit, chief complaints, systolic blood pressure, diastolic blood pressure, pulse rate, respiratory rate, body temperature, oxygen saturation, and level of consciousness. The baseline model with KTAS was developed using logistic regression, and the machine learning model with 13 variables was generated using extreme gradient boosting (XGB) and deep neural network (DNN) algorithms. The discrimination was measured by the area under the receiver operating characteristic (AUROC) curve. The ability of calibration with Hosmer--Lemeshow test and reclassification with net reclassification index were evaluated. The calibration plot and partial dependence plot were used in the analysis. Results: The AUROC of the model with the full set of variables (0.833-0.861) was better than that of the baseline model (0.796). The XGB model of AUROC 0.861 (95\% CI 0.848-0.874) showed a higher discriminative performance than the DNN model of 0.833 (95\% CI 0.819-0.848). The XGB and DNN models proved better reclassification than the baseline model with a positive net reclassification index. The XGB models were well-calibrated (Hosmer-Lemeshow test; P>.05); however, the DNN showed poor calibration power (Hosmer-Lemeshow test; P<.001). 
We further interpreted the nonlinear association between variables and critical care prediction. Conclusions: Our study demonstrated that the performance of the XGB model using initial information at ED triage for predicting patients in need of critical care outperformed the conventional model with KTAS. ", doi="10.2196/30770", url="https://medinform.jmir.org/2021/9/e30770", url="http://www.ncbi.nlm.nih.gov/pubmed/34346889" } @Article{info:doi/10.2196/27799, author="Wang, Xin Jonathan and Somani, Sulaiman and Chen, H. Jonathan and Murray, Sara and Sarkar, Urmimala", title="Health Equity in Artificial Intelligence and Primary Care Research: Protocol for a Scoping Review", journal="JMIR Res Protoc", year="2021", month="Sep", day="17", volume="10", number="9", pages="e27799", keywords="artificial intelligence", keywords="health information technology", keywords="health informatics", keywords="electronic health records", keywords="big data", keywords="data mining", keywords="primary care", keywords="family medicine", keywords="decision support", keywords="diagnosis", keywords="treatment", keywords="scoping review", keywords="health equity", keywords="health disparity", abstract="Background: Though artificial intelligence (AI) has the potential to augment the patient-physician relationship in primary care, bias in intelligent health care systems has the potential to differentially impact vulnerable patient populations. Objective: The purpose of this scoping review is to summarize the extent to which AI systems in primary care examine the inherent bias toward or against vulnerable populations and appraise how these systems have mitigated the impact of such biases during their development. Methods: We will conduct a search update from an existing scoping review to identify studies on AI and primary care in the following databases: Medline-OVID, Embase, CINAHL, Cochrane Library, Web of Science, Scopus, IEEE Xplore, ACM Digital Library, MathSciNet, AAAI, and arXiv. Two screeners will independently review all abstracts, titles, and full-text articles. The team will extract data using a structured data extraction form and synthesize the results in accordance with PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. Results: This review will provide an assessment of the current state of health care equity within AI for primary care. Specifically, we will identify the degree to which vulnerable patients have been included, assess how bias is interpreted and documented, and understand the extent to which harmful biases are addressed. As of October 2020, the scoping review is in the title- and abstract-screening stage. The results are expected to be submitted for publication in fall 2021. Conclusions: AI applications in primary care are becoming an increasingly common tool in health care delivery and in preventative care efforts for underserved populations. This scoping review would potentially show the extent to which studies on AI in primary care employ a health equity lens and take steps to mitigate bias. 
International Registered Report Identifier (IRRID): PRR1-10.2196/27799 ", doi="10.2196/27799", url="https://www.researchprotocols.org/2021/9/e27799", url="http://www.ncbi.nlm.nih.gov/pubmed/34533458" } @Article{info:doi/10.2196/21810, author="Alaqra, Sarah Ala and Kane, Bridget and Fischer-H{\"u}bner, Simone", title="Machine Learning--Based Analysis of Encrypted Medical Data in the Cloud: Qualitative Study of Expert Stakeholders' Perspectives", journal="JMIR Hum Factors", year="2021", month="Sep", day="16", volume="8", number="3", pages="e21810", keywords="medical data analysis", keywords="encryption", keywords="privacy-enhancing technologies", keywords="machine learning", keywords="stakeholders", keywords="tradeoffs", keywords="perspectives", keywords="eHealth", keywords="interviews", abstract="Background: Third-party cloud-based data analysis applications are proliferating in electronic health (eHealth) because of the expertise offered and their monetary advantage. However, privacy and security are critical concerns when handling sensitive medical data in the cloud. Technical advances based on ``crypto magic'' in privacy-preserving machine learning (ML) enable data analysis in encrypted form for maintaining confidentiality. Such privacy-enhancing technologies (PETs) could be counterintuitive to relevant stakeholders in eHealth, which could in turn hinder adoption; thus, more attention is needed on human factors for establishing trust and transparency. Objective: The aim of this study was to analyze eHealth expert stakeholders' perspectives and the perceived tradeoffs in regard to data analysis on encrypted medical data in the cloud, and to derive user requirements for development of a privacy-preserving data analysis tool. Methods: We used semistructured interviews and report on 14 interviews with individuals having medical, technical, or research expertise in eHealth. We used thematic analysis for analyzing interview data. In addition, we conducted a workshop for eliciting requirements. Results: Our results show differences in the understanding of and in trusting the technology; caution is advised by technical experts, whereas patient safety assurances are required by medical experts. Themes were identified with general perspectives on data privacy and practices (eg, acceptance of using external services), as well as themes highlighting specific perspectives (eg, data protection drawbacks and concerns of the data analysis on encrypted data). The latter themes result in requiring assurances and conformance testing for trusting tools such as the proposed ML-based tool. Communicating privacy, and utility benefits and tradeoffs with stakeholders is essential for trust. Furthermore, stakeholders and their organizations share accountability of patient data. Finally, stakeholders stressed the importance of informing patients about the privacy of their data. Conclusions: Understanding the benefits and risks of using eHealth PETs is crucial, and collaboration among diverse stakeholders is essential. Assurances of the tool's privacy, accuracy, and patient safety should be in place for establishing trust of ML-based PETs, especially if used in the cloud. ", doi="10.2196/21810", url="https://humanfactors.jmir.org/2021/3/e21810", url="http://www.ncbi.nlm.nih.gov/pubmed/34528892" } @Article{info:doi/10.2196/30022, author="Monahan, Corneille Ann and Feldman, S. 
Sue", title="Models Predicting Hospital Admission of Adult Patients Utilizing Prehospital Data: Systematic Review Using PROBAST and CHARMS", journal="JMIR Med Inform", year="2021", month="Sep", day="16", volume="9", number="9", pages="e30022", keywords="emergency services", keywords="hospital", keywords="decision support techniques", keywords="patient-specific modeling", keywords="crowding", keywords="boarding", keywords="exit block", keywords="systematic review", keywords="PROBAST", keywords="CHARMS", keywords="predictive model", keywords="medical informatics", keywords="health services research", keywords="prehospital assessment", keywords="process improvement", keywords="management information system", keywords="predict admission", keywords="emergency department", abstract="Background: Emergency department boarding and hospital exit block are primary causes of emergency department crowding and have been conclusively associated with poor patient outcomes and major threats to patient safety. Boarding occurs when a patient is delayed or blocked from transitioning out of the emergency department because of dysfunctional transition or bed assignment processes. Predictive models for estimating the probability of an occurrence of this type could be useful in reducing or preventing emergency department boarding and hospital exit block, to reduce emergency department crowding. Objective: The aim of this study was to identify and appraise the predictive performance, predictor utility, model application, and model utility of hospital admission prediction models that utilized prehospital, adult patient data and aimed to address emergency department crowding. Methods: We searched multiple databases for studies, from inception to September 30, 2019, that evaluated models predicting adult patients' imminent hospital admission, with prehospital patient data and regression analysis. We used PROBAST (Prediction Model Risk of Bias Assessment Tool) and CHARMS (Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies) to critically assess studies. Results: Potential biases were found in most studies, which suggested that each model's predictive performance required further investigation. We found that select prehospital patient data contribute to the identification of patients requiring hospital admission. Biomarker predictors may add superior value and advantages to models. It is, however, important to note that no models had been integrated with an information system or workflow, operated independently as electronic devices, or operated in real time within the care environment. Several models could be used at the site-of-care in real time without digital devices, which would make them suitable for low-technology or no-electricity environments. Conclusions: There is incredible potential for prehospital admission prediction models to improve patient care and hospital operations. Patient data can be utilized to act as predictors and as data-driven, actionable tools to identify patients likely to require imminent hospital admission and reduce patient boarding and crowding in emergency departments. Prediction models can be used to justify earlier patient admission and care, to lower morbidity and mortality, and models that utilize biomarker predictors offer additional advantages. 
", doi="10.2196/30022", url="https://medinform.jmir.org/2021/9/e30022", url="http://www.ncbi.nlm.nih.gov/pubmed/34528893" } @Article{info:doi/10.2196/27798, author="Chi, Chien-Yu and Ao, Shuang and Winkler, Adrian and Fu, Kuan-Chun and Xu, Jie and Ho, Yi-Lwun and Huang, Chien-Hua and Soltani, Rohollah", title="Predicting the Mortality and Readmission of In-Hospital Cardiac Arrest Patients With Electronic Health Records: A Machine Learning Approach", journal="J Med Internet Res", year="2021", month="Sep", day="13", volume="23", number="9", pages="e27798", keywords="in-hospital cardiac arrest", keywords="30-day mortality", keywords="30-day readmission", keywords="machine learning", keywords="imbalanced dataset", abstract="Background: In-hospital cardiac arrest (IHCA) is associated with high mortality and health care costs in the recovery phase. Predicting adverse outcome events, including readmission, improves the chance for appropriate interventions and reduces health care costs. However, studies related to the early prediction of adverse events of IHCA survivors are rare. Therefore, we used a deep learning model for prediction in this study. Objective: This study aimed to demonstrate that with the proper data set and learning strategies, we can predict the 30-day mortality and readmission of IHCA survivors based on their historical claims. Methods: National Health Insurance Research Database claims data, including 168,693 patients who had experienced IHCA at least once and 1,569,478 clinical records, were obtained to generate a data set for outcome prediction. We predicted the 30-day mortality/readmission after each current record (ALL-mortality/ALL-readmission) and 30-day mortality/readmission after IHCA (cardiac arrest [CA]-mortality/CA-readmission). We developed a hierarchical vectorizer (HVec) deep learning model to extract patients' information and predict mortality and readmission. To embed the textual medical concepts of the clinical records into our deep learning model, we used Text2Node to compute the distributed representations of all medical concept codes as a 128-dimensional vector. Along with the patient's demographic information, our novel HVec model generated embedding vectors to hierarchically describe the health status at the record-level and patient-level. Multitask learning involving two main tasks and auxiliary tasks was proposed. As CA-mortality and CA-readmission were rare, person upsampling of patients with CA and weighting of CA records were used to improve prediction performance. Results: With the multitask learning setting in the model learning process, we achieved an area under the receiver operating characteristic of 0.752 for CA-mortality, 0.711 for ALL-mortality, 0.852 for CA-readmission, and 0.889 for ALL-readmission. The area under the receiver operating characteristic was improved to 0.808 for CA-mortality and 0.862 for CA-readmission after solving the extremely imbalanced issue for CA-mortality/CA-readmission by upsampling and weighting. Conclusions: This study demonstrated the potential of predicting future outcomes for IHCA survivors by machine learning. The results showed that our proposed approach could effectively alleviate data imbalance problems and train a better model for outcome prediction. ", doi="10.2196/27798", url="https://www.jmir.org/2021/9/e27798", url="http://www.ncbi.nlm.nih.gov/pubmed/34515639" } @Article{info:doi/10.2196/24295, author="Geva, A. 
Gil and Ketko, Itay and Nitecki, Maya and Simon, Shoham and Inbar, Barr and Toledo, Itay and Shapiro, Michael and Vaturi, Barak and Votta, Yoni and Filler, Daniel and Yosef, Roey and Shpitzer, A. Sagi and Hir, Nabil and Peri Markovich, Michal and Shapira, Shachar and Fink, Noam and Glasberg, Elon and Furer, Ariel", title="Data Empowerment of Decision-Makers in an Era of a Pandemic: Intersection of ``Classic'' and Artificial Intelligence in the Service of Medicine", journal="J Med Internet Res", year="2021", month="Sep", day="10", volume="23", number="9", pages="e24295", keywords="COVID-19", keywords="medical informatics", keywords="decision-making", keywords="pandemic", keywords="data", keywords="policy", keywords="validation", keywords="accuracy", keywords="data analysis", abstract="Background: The COVID-19 outbreak required prompt action by health authorities around the world in response to a novel threat. With enormous amounts of information originating in sources with an uncertain degree of validation and accuracy, it is essential to provide executive-level decision-makers with the most actionable, pertinent, and updated data analysis to enable them to adapt their strategy swiftly and competently. Objective: We report here the origination of a COVID-19 dedicated response in the Israel Defense Forces with the assembly of an operational Data Center for the Campaign against Coronavirus. Methods: Spearheaded by directors with clinical, operational, and data analytics orientation, a multidisciplinary team utilized existing and newly developed platforms to collect and analyze large amounts of information on an individual level in the context of SARS-CoV-2 contraction and infection. Results: Nearly 300,000 responses to daily questionnaires were recorded and were merged with other data sets to form a unified data lake. By using basic as well as advanced analytic tools ranging from simple aggregation and display of trends to data science application, we provided commanders and clinicians with access to trusted, accurate, and personalized information and tools that were designed to foster operational changes and mitigate the propagation of the pandemic. The developed tools aided in the identification of high-risk individuals for severe disease and resulted in a 30\% decline in their attendance to their units. Moreover, the queue for laboratory examination for COVID-19 was optimized using a predictive model and resulted in a high true-positive rate of 20\%, which is more than twice as high as the baseline rate (2.28\%, 95\% CI 1.63\%-3.19\%). Conclusions: In times of ambiguity and uncertainty, along with an unprecedented flux of information, health organizations may find multidisciplinary teams working to provide intelligence from diverse and rich data a key factor in providing executives relevant and actionable support for decision-making. 
", doi="10.2196/24295", url="https://www.jmir.org/2021/9/e24295", url="http://www.ncbi.nlm.nih.gov/pubmed/34313589" } @Article{info:doi/10.2196/26220, author="Blasi, Livio and Bordonaro, Roberto and Serretta, Vincenzo and Piazza, Dario and Firenze, Alberto and Gebbia, Vittorio", title="Virtual Clinical and Precision Medicine Tumor Boards---Cloud-Based Platform--Mediated Implementation of Multidisciplinary Reviews Among Oncology Centers in the COVID-19 Era: Protocol for an Observational Study", journal="JMIR Res Protoc", year="2021", month="Sep", day="10", volume="10", number="9", pages="e26220", keywords="virtual tumor board", keywords="multidisciplinary collaboration", keywords="oncology", keywords="multidisciplinary communication", keywords="health services", keywords="multidisciplinary oncology consultations", keywords="virtual health", keywords="digital health", keywords="precision medicine", keywords="tumor", keywords="cancer", keywords="cloud-based", keywords="platform", keywords="implementation", keywords="COVID-19", abstract="Background: Multidisciplinary tumor boards play a pivotal role in the patient-centered clinical management and in the decision-making process to provide best evidence-based, diagnostic, and therapeutic care to patients with cancer. Among the barriers to achieve an efficient multidisciplinary tumor board, lack of time and geographical distance play a major role. Therefore, the elaboration of an efficient virtual multidisciplinary tumor board (VMTB) is a key point to successfully obtain an oncology team and implement a network among health professionals and institutions. This need is stronger than ever during the COVID-19 pandemic. Objective: This paper presents a research protocol for an observational study focused on exploring the structuring process and the implementation of a multi-institutional VMTB in Sicily, Italy. Other endpoints include analysis of cooperation between participants, adherence to guidelines, patients' outcomes, and patient satisfaction. Methods: This protocol encompasses a pragmatic, observational, multicenter, noninterventional, prospective trial. The study's programmed duration is 5 years, with a half-yearly analysis of the primary and secondary objectives' measurements. Oncology care health professionals from various oncology subspecialties at oncology departments in multiple hospitals (academic and general hospitals as well as tertiary centers and community hospitals) are involved in a nonhierarchic manner. VMTB employs an innovative, virtual, cloud-based platform to share anonymized medical data that are discussed via a videoconferencing system both satisfying security criteria and compliance with the Health Insurance Portability and Accountability Act. Results: The protocol is part of a larger research project on communication and multidisciplinary collaboration in oncology units and departments spread in the Sicily region. The results of this study will particularly focus on the organization of VMTBs, involving oncology units present in different hospitals spread in the area, and creating a network to allow best patient care pathways and a hub-and-spoke relationship. The present results will also include data concerning organization skills and pitfalls, barriers, efficiency, number, and types with respect to clinical cases and customer satisfaction. Conclusions: VMTB represents a unique opportunity to optimize patient management through a patient-centered approach. 
An efficient virtualization and data-banking system is potentially time-saving, a source for outcome data, and a detector of possible holes in the hull of clinical pathways. The observations and results from this VMTB study may hopefully be useful to design nonclinical and organizational interventions that enhance multidisciplinary decision-making in oncology. International Registered Report Identifier (IRRID): DERR1-10.2196/26220 ", doi="10.2196/26220", url="https://www.researchprotocols.org/2021/9/e26220", url="http://www.ncbi.nlm.nih.gov/pubmed/34387553" } @Article{info:doi/10.2196/30451, author="Tomaszewski, Tre and Morales, Alex and Lourentzou, Ismini and Caskey, Rachel and Liu, Bing and Schwartz, Alan and Chin, Jessie", title="Identifying False Human Papillomavirus (HPV) Vaccine Information and Corresponding Risk Perceptions From Twitter: Advanced Predictive Models", journal="J Med Internet Res", year="2021", month="Sep", day="9", volume="23", number="9", pages="e30451", keywords="misinformation", keywords="disinformation", keywords="social media", keywords="HPV", keywords="human papillomavirus vaccination", keywords="vaccination", keywords="causality mining", keywords="cause", keywords="effect", keywords="risk perceptions", keywords="vaccine", keywords="perception", keywords="risk", keywords="Twitter", keywords="machine learning", keywords="natural language processing", keywords="cervical cancer", abstract="Background: The vaccination uptake rates of the human papillomavirus (HPV) vaccine remain low despite the fact that the effectiveness of HPV vaccines has been established for more than a decade. Vaccine hesitancy is in part due to false information about HPV vaccines on social media. Combating false HPV vaccine information is a reasonable step to addressing vaccine hesitancy. Objective: Given the substantial harm of false HPV vaccine information, there is an urgent need to identify false social media messages before they go viral. The goal of the study is to develop a systematic and generalizable approach to identifying false HPV vaccine information on social media. Methods: This study used machine learning and natural language processing to develop a series of classification models and causality mining methods to identify and examine true and false HPV vaccine--related information on Twitter. Results: We found that the convolutional neural network model outperformed all other models in identifying tweets containing false HPV vaccine--related information (F score=91.95). We also developed completely unsupervised causality mining models to identify HPV vaccine candidate effects for capturing risk perceptions of HPV vaccines. Furthermore, we found that false information contained mostly loss-framed messages focusing on the potential risk of vaccines covering a variety of topics using more diverse vocabulary, while true information contained both gain- and loss-framed messages focusing on the effectiveness of vaccines covering fewer topics using relatively limited vocabulary. Conclusions: Our research demonstrated the feasibility and effectiveness of using predictive models to identify false HPV vaccine information and its risk perceptions on social media. 
", doi="10.2196/30451", url="https://www.jmir.org/2021/9/e30451", url="http://www.ncbi.nlm.nih.gov/pubmed/34499043" } @Article{info:doi/10.2196/27098, author="Liu, Yi-Shiuan and Yang, Chih-Yu and Chiu, Ping-Fang and Lin, Hui-Chu and Lo, Chung-Chuan and Lai, Szu-Han Alan and Chang, Chia-Chu and Lee, Kuang-Sheng Oscar", title="Machine Learning Analysis of Time-Dependent Features for Predicting Adverse Events During Hemodialysis Therapy: Model Development and Validation Study", journal="J Med Internet Res", year="2021", month="Sep", day="7", volume="23", number="9", pages="e27098", keywords="hemodialysis", keywords="intradialytic adverse events", keywords="prediction algorithm", keywords="machine learning", abstract="Background: Hemodialysis (HD) therapy is an indispensable tool used in critical care management. Patients undergoing HD are at risk for intradialytic adverse events, ranging from muscle cramps to cardiac arrest. So far, there is no effective HD device--integrated algorithm to assist medical staff in response to these adverse events a step earlier during HD. Objective: We aimed to develop machine learning algorithms to predict intradialytic adverse events in an unbiased manner. Methods: Three-month dialysis and physiological time-series data were collected from all patients who underwent maintenance HD therapy at a tertiary care referral center. Dialysis data were collected automatically by HD devices, and physiological data were recorded by medical staff. Intradialytic adverse events were documented by medical staff according to patient complaints. Features extracted from the time series data sets by linear and differential analyses were used for machine learning to predict adverse events during HD. Results: Time series dialysis data were collected during the 4-hour HD session in 108 patients who underwent maintenance HD therapy. There were a total of 4221 HD sessions, 406 of which involved at least one intradialytic adverse event. Models were built by classification algorithms and evaluated by four-fold cross-validation. The developed algorithm predicted overall intradialytic adverse events, with an area under the curve (AUC) of 0.83, sensitivity of 0.53, and specificity of 0.96. The algorithm also predicted muscle cramps, with an AUC of 0.85, and blood pressure elevation, with an AUC of 0.93. In addition, the model built based on ultrafiltration-unrelated features predicted all types of adverse events, with an AUC of 0.81, indicating that ultrafiltration-unrelated factors also contribute to the onset of adverse events. Conclusions: Our results demonstrated that algorithms combining linear and differential analyses with two-class classification machine learning can predict intradialytic adverse events in quasi-real time with high AUCs. Such a methodology implemented with local cloud computation and real-time optimization by personalized HD data could warn clinicians to take timely actions in advance. ", doi="10.2196/27098", url="https://www.jmir.org/2021/9/e27098", url="http://www.ncbi.nlm.nih.gov/pubmed/34491204" } @Article{info:doi/10.2196/26503, author="Dhaliwal, Bandna and Neil-Sztramko, E. Sarah and Boston-Fisher, Nikita and Buckeridge, L. 
David and Dobbins, Maureen", title="Assessing the Electronic Evidence System Needs of Canadian Public Health Professionals: Cross-sectional Study", journal="JMIR Public Health Surveill", year="2021", month="Sep", day="7", volume="7", number="9", pages="e26503", keywords="population surveillance", keywords="evidence-informed decision-making", keywords="needs assessment", keywords="public health", keywords="precision public health", abstract="Background: True evidence-informed decision-making in public health relies on incorporating evidence from a number of sources in addition to traditional scientific evidence. Lack of access to these types of data as well as ease of use and interpretability of scientific evidence contribute to limited uptake of evidence-informed decision-making in practice. An electronic evidence system that includes multiple sources of evidence and potentially novel computational processing approaches or artificial intelligence holds promise as a solution to overcoming barriers to evidence-informed decision-making in public health. Objective: This study aims to understand the needs and preferences for an electronic evidence system among public health professionals in Canada. Methods: An invitation to participate in an anonymous web-based survey was distributed via listservs of 2 Canadian public health organizations in February 2019. Eligible participants were English- or French-speaking individuals currently working in public health. The survey contained both multiple-choice and open-ended questions about the needs and preferences relevant to an electronic evidence system. Quantitative responses were analyzed to explore differences by public health role. Inductive and deductive analysis methods were used to code and interpret the qualitative data. Ethics review was not required by the host institution. Results: Respondents (N=371) were heterogeneous, spanning organizations, positions, and areas of practice within public health. Nearly all (364/371, 98.1\%) respondents indicated that an electronic evidence system would support their work. Respondents had high preferences for local contextual data, research and intervention evidence, and information about human and financial resources. Qualitative analyses identified several concerns, needs, and suggestions for the development of such a system. Concerns ranged from the personal use of such a system to the ability of their organization to use such a system. Recognized needs spanned the different sources of evidence, including local context, research and intervention evidence, and resources and tools. Additional suggestions were identified to improve system usability. Conclusions: Canadian public health professionals have positive perceptions toward an electronic evidence system that would bring together evidence from the local context, scientific research, and resources. Elements were also identified to increase the usability of an electronic evidence system. 
", doi="10.2196/26503", url="https://publichealth.jmir.org/2021/9/e26503", url="http://www.ncbi.nlm.nih.gov/pubmed/34491205" } @Article{info:doi/10.2196/29642, author="Shin, In-Soo and Rim, Hong Chai", title="Stepwise-Hierarchical Pooled Analysis for Synergistic Interpretation of Meta-analyses Involving Randomized and Observational Studies: Methodology Development", journal="J Med Internet Res", year="2021", month="Sep", day="2", volume="23", number="9", pages="e29642", keywords="meta-analysis", keywords="observational study", keywords="randomized study", keywords="interpretation", keywords="combination", keywords="statistics", keywords="synergy", keywords="methodology", keywords="hypothesis", keywords="validity", abstract="Background: The necessity of including observational studies in meta-analyses has been discussed in the literature, but a synergistic analysis method for combining randomized and observational studies has not been reported. Observational studies differ in validity depending on the degree of the confounders' influence. Combining interpretations may be challenging, especially if the statistical directions are similar but the magnitude of the pooled results are different between randomized and observational studies (the ''gray zone''). Objective: To overcome these hindrances, in this study, we aim to introduce a logical method for clinical interpretation of randomized and observational studies. Methods: We designed a stepwise-hierarchical pooled analysis method to analyze both distribution trends and individual pooled results by dividing the included studies into at least three stages (eg, all studies, balanced studies, and randomized studies). Results: According to the model, the validity of a hypothesis is mostly based on the pooled results of randomized studies (the highest stage). Ascending patterns in which effect size and statistical significance increase gradually with stage strengthen the validity of the hypothesis; in this case, the effect size of the observational studies is lower than that of the true effect (eg, because of the uncontrolled effect of negative confounders). Descending patterns in which decreasing effect size and statistical significance gradually weaken the validity of the hypothesis suggest that the effect size and statistical significance of the observational studies is larger than the true effect (eg, because of researchers' bias). Conclusions: We recommend using the stepwise-hierarchical pooled analysis approach for meta-analyses involving randomized and observational studies. ", doi="10.2196/29642", url="https://www.jmir.org/2021/9/e29642", url="http://www.ncbi.nlm.nih.gov/pubmed/34315697" } @Article{info:doi/10.2196/24377, author="Haroz, E. Emily and Grubin, Fiona and Goklish, Novalene and Pioche, Shardai and Cwik, Mary and Barlow, Allison and Waugh, Emma and Usher, Jason and Lenert, C. Matthew and Walsh, G. Colin", title="Designing a Clinical Decision Support Tool That Leverages Machine Learning for Suicide Risk Prediction: Development Study in Partnership With Native American Care Providers", journal="JMIR Public Health Surveill", year="2021", month="Sep", day="2", volume="7", number="9", pages="e24377", keywords="suicide prevention", keywords="machine learning", keywords="Native American health", keywords="implementation", abstract="Background: Machine learning algorithms for suicide risk prediction have been developed with notable improvements in accuracy. 
Implementing these algorithms to enhance clinical care and reduce suicide has not been well studied. Objective: This study aims to design a clinical decision support tool and appropriate care pathways for community-based suicide surveillance and case management systems operating on Native American reservations. Methods: Participants included Native American case managers and supervisors (N=9) who worked on suicide surveillance and case management programs on 2 Native American reservations. We used in-depth interviews to understand how case managers think about and respond to suicide risk. The results from interviews informed a draft clinical decision support tool, which was then reviewed with supervisors and combined with appropriate care pathways. Results: Case managers reported acceptance of risk flags based on a predictive algorithm in their surveillance system tools, particularly if the information was available in a timely manner and used in conjunction with their clinical judgment. Implementation of risk flags needed to be programmed on a dichotomous basis, so the algorithm could produce output indicating high versus low risk. To dichotomize the continuous predicted probabilities, we developed a cutoff point that favored specificity, with the understanding that case managers' clinical judgment would help increase sensitivity. Conclusions: Suicide risk prediction algorithms show promise, but implementation to guide clinical care remains relatively elusive. Our study demonstrates the utility of working with partners to develop and guide the operationalization of risk prediction algorithms to enhance clinical care in a community setting. ", doi="10.2196/24377", url="https://publichealth.jmir.org/2021/9/e24377", url="http://www.ncbi.nlm.nih.gov/pubmed/34473065" } @Article{info:doi/10.2196/24079, author="Ghanad Poor, Niema and West, C. Nicholas and Sreepada, Syamala Rama and Murthy, Srinivas and G{\"o}rges, Matthias", title="An Artificial Neural Network--Based Pediatric Mortality Risk Score: Development and Performance Evaluation Using Data From a Large North American Registry", journal="JMIR Med Inform", year="2021", month="Aug", day="31", volume="9", number="8", pages="e24079", keywords="artificial intelligence", keywords="risk assessment", keywords="decision support techniques", keywords="intensive care unit", keywords="pediatric", keywords="decision making", keywords="computer-assisted", abstract="Background: In the pediatric intensive care unit (PICU), quantifying illness severity can be guided by risk models to enable timely identification and appropriate intervention. Logistic regression models, including the pediatric index of mortality 2 (PIM-2) and pediatric risk of mortality III (PRISM-III), produce a mortality risk score using data that are routinely available at PICU admission. Artificial neural networks (ANNs) outperform regression models in some medical fields. Objective: In light of this potential, we aim to examine ANN performance, compared to that of logistic regression, for mortality risk estimation in the PICU. Methods: The analyzed data set included patients from North American PICUs whose discharge diagnostic codes indicated evidence of infection and included the data used for the PIM-2 and PRISM-III calculations and their corresponding scores. We stratified the data set into training and test sets, with approximately equal mortality rates, in an effort to replicate real-world data. 
Data preprocessing included imputing missing data through simple substitution and normalizing data into binary variables using PRISM-III thresholds. A 2-layer ANN model was built to predict pediatric mortality, along with a simple logistic regression model for comparison. Both models used the same features required by PIM-2 and PRISM-III. Alternative ANN models using single-layer or unnormalized data were also evaluated. Model performance was compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision recall curve (AUPRC) and their empirical 95\% CIs. Results: Data from 102,945 patients (including 4068 deaths) were included in the analysis. The highest performing ANN (AUROC 0.871, 95\% CI 0.862-0.880; AUPRC 0.372, 95\% CI 0.345-0.396) that used normalized data performed better than PIM-2 (AUROC 0.805, 95\% CI 0.801-0.816; AUPRC 0.234, 95\% CI 0.213-0.255) and PRISM-III (AUROC 0.844, 95\% CI 0.841-0.855; AUPRC 0.348, 95\% CI 0.322-0.367). The performance of this ANN was also significantly better than that of the logistic regression model (AUROC 0.862, 95\% CI 0.852-0.872; AUPRC 0.329, 95\% CI 0.304-0.351). The performance of the ANN that used unnormalized data (AUROC 0.865, 95\% CI 0.856-0.874) was slightly inferior to our highest performing ANN; the single-layer ANN architecture performed poorly and was not investigated further. Conclusions: A simple ANN model performed slightly better than the benchmark PIM-2 and PRISM-III scores and a traditional logistic regression model trained on the same data set. The small performance gains achieved by this two-layer ANN model may not offer clinically significant improvement; however, further research with other or more sophisticated model designs and better imputation of missing data may be warranted. ", doi="10.2196/24079", url="https://medinform.jmir.org/2021/8/e24079", url="http://www.ncbi.nlm.nih.gov/pubmed/34463636" } @Article{info:doi/10.2196/23230, author="Chen, Pei-Fu and Wang, Ssu-Ming and Liao, Wei-Chih and Kuo, Lu-Cheng and Chen, Kuan-Chih and Lin, Yu-Cheng and Yang, Chi-Yu and Chiu, Chi-Hao and Chang, Shu-Chih and Lai, Feipei", title="Automatic ICD-10 Coding and Training System: Deep Neural Network Based on Supervised Learning", journal="JMIR Med Inform", year="2021", month="Aug", day="31", volume="9", number="8", pages="e23230", keywords="natural language processing", keywords="deep learning", keywords="International Classification of Diseases", keywords="Recurrent Neural Network", keywords="text classification", abstract="Background: The International Classification of Diseases (ICD) code is widely used as the reference in medical system and billing purposes. However, classifying diseases into ICD codes still mainly relies on humans reading a large amount of written material as the basis for coding. Coding is both laborious and time-consuming. Since the conversion of ICD-9 to ICD-10, the coding task became much more complicated, and deep learning-- and natural language processing--related approaches have been studied to assist disease coders. Objective: This paper aims at constructing a deep learning model for ICD-10 coding, where the model is meant to automatically determine the corresponding diagnosis and procedure codes based solely on free-text medical notes to improve accuracy and reduce human effort. 
Methods: We used diagnosis records of the National Taiwan University Hospital as resources and applied natural language processing techniques, including global vectors, word to vectors, embeddings from language models, bidirectional encoder representations from transformers, and single head attention recurrent neural network, on the deep neural network architecture to implement ICD-10 auto-coding. Besides, we introduced the attention mechanism into the classification model to extract the keywords from diagnoses and visualize the coding reference for training freshmen in ICD-10. Sixty discharge notes were randomly selected to examine the change in the F1-score and the coding time by coders before and after using our model. Results: In experiments on the medical data set of National Taiwan University Hospital, our prediction results revealed F1-scores of 0.715 and 0.618 for the ICD-10 Clinical Modification code and Procedure Coding System code, respectively, with a bidirectional encoder representations from transformers embedding approach in the Gated Recurrent Unit classification model. The well-trained models were applied on the ICD-10 web service for coding and training to ICD-10 users. With this service, coders can code with the F1-score significantly increased from a median of 0.832 to 0.922 (P<.05), but not in a reduced interval. Conclusions: The proposed model significantly improved the F1-score but did not decrease the time consumed in coding by disease coders. ", doi="10.2196/23230", url="https://medinform.jmir.org/2021/8/e23230", url="http://www.ncbi.nlm.nih.gov/pubmed/34463639" } @Article{info:doi/10.2196/29807, author="Lee, Eunsaem and Jung, Young Se and Hwang, Ju Hyung and Jung, Jaewoo", title="Patient-Level Cancer Prediction Models From a Nationwide Patient Cohort: Model Development and Validation", journal="JMIR Med Inform", year="2021", month="Aug", day="30", volume="9", number="8", pages="e29807", keywords="prediction", keywords="model", keywords="claim data", keywords="cancer", keywords="machine learning", keywords="development", keywords="cohort", keywords="validation", keywords="database", keywords="algorithm", abstract="Background: Nationwide population-based cohorts provide a new opportunity to build automated risk prediction models at the patient level, and claim data are one of the more useful resources to this end. To avoid unnecessary diagnostic intervention after cancer screening tests, patient-level prediction models should be developed. Objective: We aimed to develop cancer prediction models using nationwide claim databases with machine learning algorithms, which are explainable and easily applicable in real-world environments. Methods: As source data, we used the Korean National Insurance System Database. Every Korean aged 40 years or older undergoes a national health checkup every 2 years. We gathered all variables from the database including demographic information, basic laboratory values, anthropometric values, and previous medical history. We applied conventional logistic regression methods, light gradient boosting methods, neural networks, survival analysis, and one-class embedding classifier methods to effectively analyze high-dimensional data based on deep learning--based anomaly detection. Performance was measured with area under the curve and area under precision recall curve. We validated our models externally with a health checkup database from a tertiary hospital. 
Results: The one-class embedding classifier model received the highest area under the curve scores with values of 0.868, 0.849, 0.798, 0.746, 0.800, 0.749, and 0.790 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively. For area under precision recall curve, the light gradient boosting models had the highest score with values of 0.383, 0.401, 0.387, 0.300, 0.385, 0.357, and 0.296 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively. Conclusions: Our results show that it is possible to easily develop applicable cancer prediction models with nationwide claim data using machine learning. The 7 models showed acceptable performances and explainability, and thus can be distributed easily in real-world environments. ", doi="10.2196/29807", url="https://medinform.jmir.org/2021/8/e29807", url="http://www.ncbi.nlm.nih.gov/pubmed/34459743" } @Article{info:doi/10.2196/27247, author="Shaballout, Nour and Aloumar, Anas and Manuel, Jorge and May, Marcus and Beissner, Florian", title="Lateralization and Bodily Patterns of Segmental Signs and Spontaneous Pain in Acute Visceral Disease: Observational Study", journal="J Med Internet Res", year="2021", month="Aug", day="27", volume="23", number="8", pages="e27247", keywords="digital pain drawings", keywords="visceral referred pain", keywords="referred pain", keywords="head zones", keywords="mydriasis", keywords="chest pain", keywords="clinical examination", keywords="differential diagnosis", keywords="digital health", keywords="digital drawings", keywords="pain", keywords="health technology", keywords="image analysis", abstract="Background: The differential diagnosis of acute visceral diseases is a challenging clinical problem. Older literature suggests that patients with acute visceral problems show segmental signs such as hyperalgesia, skin resistance, or muscular defense as manifestations of referred visceral pain in somatic or visceral tissues with overlapping segmental innervation. According to these sources, the lateralization and segmental distribution of such signs may be used for differential diagnosis. Segmental signs and symptoms may be accompanied by spontaneous (visceral) pain, which, however, shows a nonsegmental distribution. Objective: This study aimed to investigate the lateralization (ie, localization on one side of the body, in preference to the other) and segmental distribution (ie, surface ratio of the affected segments) of spontaneous pain and (referred) segmental signs in acute visceral diseases using digital pain drawing technology. Methods: We recruited 208 emergency room patients that were presenting for acute medical problems considered by triage as related to internal organ disease. All patients underwent a structured 10-minute bodily examination to test for various segmental signs and spontaneous visceral pain. They were further asked their segmental symptoms such as nausea, meteorism, and urinary retention. We collected spontaneous pain and segmental signs as digital drawings and segmental symptoms as binary values on a tablet PC. After the final diagnosis, patients were divided into groups according to the organ affected. Using statistical image analysis, we calculated mean distributions of pain and segmental signs for the heart, lungs, stomach, liver/gallbladder, and kidneys/ureters, analyzing the segmental distribution of these signs and the lateralization. 
Results: Of the 208 recruited patients, 110 (52.9\%) were later diagnosed with a single-organ problem. These recruited patients had a mean age of 57.3 (SD 17.2) years, and 40.9\% (85/208) were female. Of these 110 patients, 85 (77.3\%) reported spontaneous visceral pain. Of the 110, 81 (73.6\%) had at least 1 segmental sign, and the most frequent signs were hyperalgesia (46/81, 57\%), and muscle resistance (39/81, 48\%). While pain was distributed along the body midline, segmental signs for the heart, stomach, and liver/gallbladder appeared mostly ipsilateral to the affected organ. An unexpectedly high number of patients (37/110, 33.6\%) further showed ipsilateral mydriasis. Conclusions: This study underlines the usefulness of including digitally recorded segmental signs in bodily examinations of patients with acute medical problems. ", doi="10.2196/27247", url="https://www.jmir.org/2021/8/e27247", url="http://www.ncbi.nlm.nih.gov/pubmed/34448718" } @Article{info:doi/10.2196/27235, author="Chang, Panchun and Dang, Jun and Dai, Jianrong and Sun, Wenzheng", title="Real-Time Respiratory Tumor Motion Prediction Based on a Temporal Convolutional Neural Network: Prediction Model Development Study", journal="J Med Internet Res", year="2021", month="Aug", day="27", volume="23", number="8", pages="e27235", keywords="radiation therapy", keywords="temporal convolutional neural network", keywords="respiratory signal prediction", keywords="neural network", keywords="deep learning model", keywords="dynamic tracking", abstract="Background: The dynamic tracking of tumors with radiation beams in radiation therapy requires the prediction of real-time target locations prior to beam delivery, as treatment involving radiation beams and gating tracking results in time latency. Objective: In this study, a deep learning model that was based on a temporal convolutional neural network was developed to predict internal target locations by using multiple external markers. Methods: Respiratory signals from 69 treatment fractions of 21 patients with cancer who were treated with the CyberKnife Synchrony device (Accuray Incorporated) were used to train and test the model. The reported model's performance was evaluated by comparing the model to a long short-term memory model in terms of the root mean square errors (RMSEs) of real and predicted respiratory signals. The effect of the number of external markers was also investigated. Results: The average RMSEs of predicted (ahead time=400 ms) respiratory motion in the superior-inferior, anterior-posterior, and left-right directions and in 3D space were 0.49 mm, 0.28 mm, 0.25 mm, and 0.67 mm, respectively. Conclusions: The experiment results demonstrated that the temporal convolutional neural network--based respiratory prediction model could predict respiratory signals with submillimeter accuracy. ", doi="10.2196/27235", url="https://www.jmir.org/2021/8/e27235", url="http://www.ncbi.nlm.nih.gov/pubmed/34236336" } @Article{info:doi/10.2196/26843, author="Naqvi, Ali Syed Asil and Tennankore, Karthik and Vinson, Amanda and Roy, C. 
Patrice and Abidi, Raza Syed Sibte", title="Predicting Kidney Graft Survival Using Machine Learning Methods: Prediction Model Development and Feature Significance Analysis Study", journal="J Med Internet Res", year="2021", month="Aug", day="27", volume="23", number="8", pages="e26843", keywords="kidney transplantation", keywords="machine learning", keywords="predictive modeling", keywords="survival prediction", keywords="dimensionality reduction", keywords="feature sensitivity analysis", abstract="Background: Kidney transplantation is the optimal treatment for patients with end-stage renal disease. Short- and long-term kidney graft survival is influenced by a number of donor and recipient factors. Predicting the success of kidney transplantation is important for optimizing kidney allocation. Objective: The aim of this study was to predict the risk of kidney graft failure across three temporal cohorts (within 1 year, within 5 years, and after 5 years following a transplant) based on donor and recipient characteristics. We analyzed a large data set comprising over 50,000 kidney transplants covering an approximate 20-year period. Methods: We applied machine learning--based classification algorithms to develop prediction models for the risk of graft failure for three different temporal cohorts. Deep learning--based autoencoders were applied for data dimensionality reduction, which improved the prediction performance. The influence of features on graft survival for each cohort was studied by investigating a new nonoverlapping patient stratification approach. Results: Our models predicted graft survival with area under the curve scores of 82\% within 1 year, 69\% within 5 years, and 81\% within 17 years. The feature importance analysis elucidated the varying influence of clinical features on graft survival across the three different temporal cohorts. Conclusions: In this study, we applied machine learning to develop risk prediction models for graft failure that demonstrated a high level of prediction performance. Acknowledging that these models performed better than those reported in the literature for existing risk prediction tools, future studies will focus on how best to incorporate these prediction models into clinical care algorithms to optimize the long-term health of kidney recipients. ", doi="10.2196/26843", url="https://www.jmir.org/2021/8/e26843", url="http://www.ncbi.nlm.nih.gov/pubmed/34448704" } @Article{info:doi/10.2196/27709, author="Yu, Jessica and Chiu, Carter and Wang, Yajuan and Dzubur, Eldin and Lu, Wei and Hoffman, Julia", title="A Machine Learning Approach to Passively Informed Prediction of Mental Health Risk in People with Diabetes: Retrospective Case-Control Analysis", journal="J Med Internet Res", year="2021", month="Aug", day="27", volume="23", number="8", pages="e27709", keywords="diabetes mellitus", keywords="mental health", keywords="risk detection", keywords="passive sensing", keywords="ecological momentary assessment", keywords="machine learning", abstract="Background: Proactive detection of mental health needs among people with diabetes mellitus could facilitate early intervention, improve overall health and quality of life, and reduce individual and societal health and economic burdens. Passive sensing and ecological momentary assessment are relatively newer methods that may be leveraged for such proactive detection. 
Objective: The primary aim of this study was to conceptualize, develop, and evaluate a novel machine learning approach for predicting mental health risk in people with diabetes mellitus. Methods: A retrospective study was designed to develop and evaluate a machine learning model, utilizing data collected from 142,432 individuals with diabetes enrolled in the Livongo for Diabetes program. First, participants' mental health statuses were verified using prescription and medical and pharmacy claims data. Next, four categories of passive sensing signals were extracted from the participants' behavior in the program, including demographics and glucometer, coaching, and event data. Data sets were then assembled to create participant-period instances, and descriptive analyses were conducted to understand the correlation between mental health status and passive sensing signals. Passive sensing signals were then entered into the model to train and test its performance. The model was evaluated based on seven measures: sensitivity, specificity, precision, area under the curve, F1 score, accuracy, and confusion matrix. SHapley Additive exPlanations (SHAP) values were computed to determine the importance of individual signals. Results: In the training (and validation) and three subsequent test sets, the model achieved a confidence score greater than 0.5 for sensitivity, specificity, area under the curve, and accuracy. Signals identified as important by SHAP values included demographics such as race and gender, participant's emotional state during blood glucose checks, time of day of blood glucose checks, blood glucose values, and interaction with the Livongo mobile app and web platform. Conclusions: Results of this study demonstrate the utility of a passively informed mental health risk algorithm and invite further exploration to identify additional signals and determine when and where such algorithms should be deployed. ", doi="10.2196/27709", url="https://www.jmir.org/2021/8/e27709", url="http://www.ncbi.nlm.nih.gov/pubmed/34448707" } @Article{info:doi/10.2196/16293, author="Gonzales, Aldren and Smith, R. 
Scott and Dullabh, Prashila and Hovey, Lauren and Heaney-Huls, Krysta and Robichaud, Meagan and Boodoo, Roger", title="Potential Uses of Blockchain Technology for Outcomes Research on Opioids", journal="JMIR Med Inform", year="2021", month="Aug", day="27", volume="9", number="8", pages="e16293", keywords="blockchain", keywords="distributed ledger", keywords="opioid crisis", keywords="outcomes research", keywords="patient-centered outcomes research", keywords="mobile phone", doi="10.2196/16293", url="https://medinform.jmir.org/2021/8/e16293", url="http://www.ncbi.nlm.nih.gov/pubmed/34448721" } @Article{info:doi/10.2196/27669, author="Teni, Sebsibe Fitsum and Rolfson, Ola and Devlin, Nancy and Parkin, David and Naucl{\'e}r, Emma and Burstr{\"o}m, Kristina and ", title="Variations in Patients' Overall Assessment of Their Health Across and Within Disease Groups Using the EQ-5D Questionnaire: Protocol for a Longitudinal Study in the Swedish National Quality Registers", journal="JMIR Res Protoc", year="2021", month="Aug", day="27", volume="10", number="8", pages="e27669", keywords="EQ-5D", keywords="EQ VAS", keywords="experience-based values", keywords="health-related quality of life (HRQoL)", keywords="hypothetical values", keywords="patient values", keywords="Swedish National Quality Registers", keywords="health state valuation", abstract="Background: EQ-5D is one of the most commonly used questionnaires to measure health-related quality of life. It is included in many of the Swedish National Quality Registers (NQRs). EQ-5D health states are usually summarized using ``values'' obtained from members of the general public, a majority of whom are healthy. However, an alternative, which remains to be studied in detail, is the potential to use patients' self-reported overall health on the visual analog scale (VAS) as a means of capturing experience-based perspective. Objective: The aim of this study is to assess EQ VAS as a valuation method with an experience-based perspective through comparison of its performance across and within patient groups, and with that of the general population in Sweden. Methods: Data on nearly 700,000 patients from 12 NQRs covering a variety of diseases/conditions and nearly 50,000 individuals from the general population will be analyzed. The EQ-5D-3L data from the 12 registers and EQ-5D-5L data from 2 registers will be used in the analyses. Longitudinal studies of patient-reported outcomes among different patient groups will be conducted in the period from baseline to 1-year follow-up. Descriptive statistics and analyses comparing EQ-5D dimensions and observed self-assessed EQ VAS values across and within patient groups will be performed. Comparisons of the change in health state and observed EQ VAS values at 1-year follow-up will also be undertaken. Regression models will be used to assess whether EQ-5D dimensions predict observed EQ VAS values to investigate patient value sets in each patient group. These will be compared across the patient groups and with the existing Swedish experience-based VAS and time trade-off value sets obtained from the general population. Results: Data retrieval started in May 2019 and data of patients in the 12 NQRs and from the survey conducted among the general population have been retrieved. Data analysis is ongoing on the retrieved data. 
Conclusions: This research project will provide information on the differences across and within patient groups in terms of self-reported health status through EQ VAS and comparison with the general population. The findings of the study will contribute to the literature by exploring the potential of self-assessed EQ VAS values to develop value sets using an experience-based perspective. Trial Registration: ClinicalTrials.gov NCT04359628; https://clinicaltrials.gov/ct2/show/NCT04359628. International Registered Report Identifier (IRRID): DERR1-10.2196/27669 ", doi="10.2196/27669", url="https://www.researchprotocols.org/2021/8/e27669", url="http://www.ncbi.nlm.nih.gov/pubmed/34448726" } @Article{info:doi/10.2196/24890, author="Lyles, Rees Courtney and Adler-Milstein, Julia and Thao, Crishyashi and Lisker, Sarah and Nouri, Sarah and Sarkar, Urmimala", title="Alignment of Key Stakeholders' Priorities for Patient-Facing Tools in Digital Health: Mixed Methods Study", journal="J Med Internet Res", year="2021", month="Aug", day="26", volume="23", number="8", pages="e24890", keywords="medical informatics", keywords="medical informatics apps", keywords="information technology", keywords="implementation science", keywords="mixed methods", abstract="Background: There is widespread agreement on the promise of patient-facing digital health tools to transform health care. Yet, few tools are in widespread use or have documented clinical effectiveness. Objective: The aim of this study was to gain insight into the gap between the potential of patient-facing digital health tools and real-world uptake. Methods: We interviewed and surveyed experts (in total, n=24) across key digital health stakeholder groups---venture capitalists, digital health companies, payers, and health care system providers or leaders---guided by the Consolidated Framework for Implementation Research. Results: Our findings revealed that external policy, regulatory demands, internal organizational workflow, and integration needs often take priority over patient needs and patient preferences for digital health tools, which lowers patient acceptance rates. We discovered alignment, across all 4 stakeholder groups, in the desire to engage both patients and frontline health care providers in broader dissemination and evaluation of digital health tools. However, major areas of misalignment between stakeholder groups have stymied the progress of digital health tool uptake---venture capitalists and companies focused on external policy and regulatory demands, while payers and providers focused on internal organizational workflow and integration needs. Conclusions: Misalignment of the priorities of digital health companies and their funders with those of providers and payers requires direct attention to improve uptake of patient-facing digital health tools and platforms. 
", doi="10.2196/24890", url="https://www.jmir.org/2021/8/e24890", url="http://www.ncbi.nlm.nih.gov/pubmed/34435966" } @Article{info:doi/10.2196/27571, author="Vassolo, Santiago Roberto and Mac Cawley, Francisco Alejandro and Tortorella, Luz Guilherme and Fogliatto, Sanson Flavio and Tlapa, Diego and Narayanamurthy, Gopalakrishnan", title="Hospital Investment Decisions in Healthcare 4.0 Technologies: Scoping Review and Framework for Exploring Challenges, Trends, and Research Directions", journal="J Med Internet Res", year="2021", month="Aug", day="26", volume="23", number="8", pages="e27571", keywords="healthcare 4.0", keywords="scoping review", keywords="investments", keywords="real options", keywords="health technology assessment", keywords="technological bundles", keywords="decision-makers", keywords="hospital", keywords="public health", keywords="technology", keywords="health technology", keywords="smart technology", keywords="hospital management", keywords="health care investment", keywords="decision making", keywords="new technologies", abstract="Background: Alternative approaches to analyzing and evaluating health care investments in state-of-the-art technologies are being increasingly discussed in the literature, especially with the advent of Healthcare 4.0 (H4.0) technologies or eHealth. Such investments generally involve computer hardware and software that deal with the storage, retrieval, sharing, and use of health care information, data, and knowledge for communication and decision-making. Besides, the use of these technologies significantly increases when addressed in bundles. However, a structured and holistic approach to analyzing investments in H4.0 technologies is not available in the literature. Objective: This study aims to analyze previous research related to the evaluation of H4.0 technologies in hospitals and characterize the most common investment approaches used. We propose a framework that organizes the research associated with hospitals' H4.0 technology investment decisions and suggest five main research directions on the topic. Methods: To achieve our goal, we followed the standard procedure for scoping reviews. We performed a search in the Crossref, PubMed, Scopus, and Web of Science databases with the keywords investment, health, industry 4.0, investment, health technology assessment, healthcare 4.0, and smart in the title, abstract, and keywords of research papers. We retrieved 5701 publications from all the databases. After removing papers published before 2011 as well as duplicates and performing further screening, we were left with 244 articles, from which 33 were selected after in-depth analysis to compose the final publication portfolio. Results: Our findings show the multidisciplinary nature of the research related to evaluating hospital investments in H4.0 technologies. We found that the most common investment approaches focused on cost analysis, single technology, and single decision-maker involvement, which dominate bundle analysis, H4.0 technology value considerations, and multiple decision-maker involvement. Conclusions: Some of our findings were unexpected, given the interrelated nature of H4.0 technologies and their multidimensional impact. 
Owing to the absence of a more holistic approach to H4.0 technology investment decisions, we identified five promising research directions for the topic: development of economic valuation methodologies tailored for H4.0 technologies; accounting for technology interrelations in the form of bundles; accounting for uncertainties in the process of evaluating such technologies; integration of administrative, medical, and patient perspectives into the evaluation process; and balancing and handling complexity in the decision-making process. ", doi="10.2196/27571", url="https://www.jmir.org/2021/8/e27571", url="http://www.ncbi.nlm.nih.gov/pubmed/34435967" } @Article{info:doi/10.2196/25090, author="Wang, Xueping and Zhong, Jie and Lei, Ting and Chen, Deng and Wang, Haijiao and Zhu, Lina and Chu, Shanshan and Liu, Ling", title="An Artificial Neural Network Prediction Model for Posttraumatic Epilepsy: Retrospective Cohort Study", journal="J Med Internet Res", year="2021", month="Aug", day="19", volume="23", number="8", pages="e25090", keywords="artificial neural network", keywords="posttraumatic epilepsy", keywords="traumatic brain injury", abstract="Background: Posttraumatic epilepsy (PTE) is a common sequela after traumatic brain injury (TBI), and identifying high-risk patients with PTE is necessary for their better treatment. Although artificial neural network (ANN) prediction models have been reported and are superior to traditional models, the ANN prediction model for PTE is lacking. Objective: We aim to train and validate an ANN model to anticipate the risks of PTE. Methods: The training cohort was TBI patients registered at West China Hospital. We used a 5-fold cross-validation approach to train and test the ANN model to avoid overfitting; 21 independent variables were used as input neurons in the ANN models, using a back-propagation algorithm to minimize the loss function. Finally, we obtained sensitivity, specificity, and accuracy of each ANN model from the 5 rounds of cross-validation and compared the accuracy with a nomogram prediction model built in our previous work based on the same population. In addition, we evaluated the performance of the model using patients registered at Chengdu Shang Jin Nan Fu Hospital (testing cohort 1) and Sichuan Provincial People's Hospital (testing cohort 2) between January 1, 2013, and March 1, 2015. Results: For the training cohort, we enrolled 1301 TBI patients from January 1, 2011, to December 31, 2017. The prevalence of PTE was 12.8\% (166/1301, 95\% CI 10.9\%-14.6\%). Of the TBI patients registered in testing cohort 1, PTE prevalence was 10.5\% (44/421, 95\% CI 7.5\%-13.4\%). Of the TBI patients registered in testing cohort 2, PTE prevalence was 6.1\% (25/413, 95\% CI 3.7\%-8.4\%). The results of the ANN model show that the area under the receiver operating characteristic curve was 0.907 (95\% CI 0.889-0.924) in the training cohort, 0.867 (95\% CI 0.842-0.893) in testing cohort 1, and 0.859 (95\% CI 0.826-0.890) in testing cohort 2. The average accuracy of the training cohort was 0.557 (95\% CI 0.510-0.620), with 0.470 (95\% CI 0.414-0.526) in testing cohort 1 and 0.344 (95\% CI 0.287-0.401) in testing cohort 2. In addition, the sensitivity, specificity, positive predictive value, and negative predictive value in the training cohort (testing cohort 1 and testing cohort 2) were 0.80 (0.83 and 0.80), 0.86 (0.80 and 0.84), 91\% (85\% and 78\%), and 86\% (80\% and 83\%), respectively. 
When calibrating this ANN model, Brier scored 0.121 in testing cohort 1 and 0.127 in testing cohort 2. Compared with the nomogram model, the ANN prediction model had a higher accuracy (P=.01). Conclusions: This study shows that the ANN model can predict the risk of PTE and is superior to the risk estimated based on traditional statistical methods. However, the calibration of the model is a bit poor, and we need to calibrate it on a large sample size set and further improve the model. ", doi="10.2196/25090", url="https://www.jmir.org/2021/8/e25090", url="http://www.ncbi.nlm.nih.gov/pubmed/34420931" } @Article{info:doi/10.2196/25612, author="Cao, Yang and N{\"a}slund, Ingmar and N{\"a}slund, Erik and Ottosson, Johan and Montgomery, Scott and Stenberg, Erik", title="Using a Convolutional Neural Network to Predict Remission of Diabetes After Gastric Bypass Surgery: Machine Learning Study From the Scandinavian Obesity Surgery Register", journal="JMIR Med Inform", year="2021", month="Aug", day="19", volume="9", number="8", pages="e25612", keywords="forecasting", keywords="clinical decision rules", keywords="remission induction", keywords="type 2 diabetes mellitus", keywords="gastric bypass", keywords="morbid obesity", abstract="Background: Prediction of diabetes remission is an important topic in the evaluation of patients with type 2 diabetes (T2D) before bariatric surgery. Several high-quality predictive indices are available, but artificial intelligence algorithms offer the potential for higher predictive capability. Objective: This study aimed to construct and validate an artificial intelligence prediction model for diabetes remission after Roux-en-Y gastric bypass surgery. Methods: Patients who underwent surgery from 2007 to 2017 were included in the study, with collection of individual data from the Scandinavian Obesity Surgery Registry (SOReg), the Swedish National Patients Register, the Swedish Prescribed Drugs Register, and Statistics Sweden. A 7-layer convolution neural network (CNN) model was developed using 80\% (6446/8057) of patients randomly selected from SOReg and 20\% (1611/8057) of patients for external testing. The predictive capability of the CNN model and currently used scores (DiaRem, Ad-DiaRem, DiaBetter, and individualized metabolic surgery) were compared. Results: In total, 8057 patients with T2D were included in the study. At 2 years after surgery, 77.09\% achieved pharmacological remission (n=6211), while 63.07\% (4004/6348) achieved complete remission. The CNN model showed high accuracy for cessation of antidiabetic drugs and complete remission of T2D after gastric bypass surgery. The area under the receiver operating characteristic curve (AUC) for the CNN model for pharmacological remission was 0.85 (95\% CI 0.83-0.86) during validation and 0.83 for the final test, which was 9\%-12\% better than the traditional predictive indices. The AUC for complete remission was 0.83 (95\% CI 0.81-0.85) during validation and 0.82 for the final test, which was 9\%-11\% better than the traditional predictive indices. Conclusions: The CNN method had better predictive capability compared to traditional indices for diabetes remission. However, further validation is needed in other countries to evaluate its external generalizability. 
", doi="10.2196/25612", url="https://medinform.jmir.org/2021/8/e25612", url="http://www.ncbi.nlm.nih.gov/pubmed/34420921" } @Article{info:doi/10.2196/26398, author="Wu, Cheng-Tse and Chu, Ta-Wei and Jang, Roger Jyh-Shing", title="Current-Visit and Next-Visit Prediction for Fatty Liver Disease With a Large-Scale Dataset: Model Development and Performance Comparison", journal="JMIR Med Inform", year="2021", month="Aug", day="12", volume="9", number="8", pages="e26398", keywords="machine learning", keywords="sequence forward selection", keywords="one-pass ranking", keywords="fatty liver diseases", keywords="alcohol fatty liver disease", keywords="nonalcoholic fatty liver disease", keywords="long short-term memory", keywords="current-visit prediction", keywords="next-visit prediction", abstract="Background: Fatty liver disease (FLD) arises from the accumulation of fat in the liver and may cause liver inflammation, which, if not well controlled, may develop into liver fibrosis, cirrhosis, or even hepatocellular carcinoma. Objective: We describe the construction of machine-learning models for current-visit prediction (CVP), which can help physicians obtain more information for accurate diagnosis, and next-visit prediction (NVP), which can help physicians provide potential high-risk patients with advice to effectively prevent FLD. Methods: The large-scale and high-dimensional dataset used in this study comes from Taipei MJ Health Research Foundation in Taiwan. We used one-pass ranking and sequential forward selection (SFS) for feature selection in FLD prediction. For CVP, we explored multiple models, including k-nearest-neighbor classifier (KNNC), Adaboost, support vector machine (SVM), logistic regression (LR), random forest (RF), Gaussian na{\"i}ve Bayes (GNB), decision trees C4.5 (C4.5), and classification and regression trees (CART). For NVP, we used long short-term memory (LSTM) and several of its variants as sequence classifiers that use various input sets for prediction. Model performance was evaluated based on two criteria: the accuracy of the test set and the intersection over union/coverage between the features selected by one-pass ranking/SFS and by domain experts. The accuracy, precision, recall, F-measure, and area under the receiver operating characteristic curve were calculated for both CVP and NVP for males and females, respectively. Results: After data cleaning, the dataset included 34,856 and 31,394 unique visits respectively for males and females for the period 2009-2016. The test accuracy of CVP using KNNC, Adaboost, SVM, LR, RF, GNB, C4.5, and CART was respectively 84.28\%, 83.84\%, 82.22\%, 82.21\%, 76.03\%, 75.78\%, and 75.53\%. The test accuracy of NVP using LSTM, bidirectional LSTM (biLSTM), Stack-LSTM, Stack-biLSTM, and Attention-LSTM was respectively 76.54\%, 76.66\%, 77.23\%, 76.84\%, and 77.31\% for fixed-interval features, and was 79.29\%, 79.12\%, 79.32\%, 79.29\%, and 78.36\%, respectively, for variable-interval features. Conclusions: This study explored a large-scale FLD dataset with high dimensionality. We developed FLD prediction models for CVP and NVP. We also implemented efficient feature selection schemes for current- and next-visit prediction to compare the automatically selected features with expert-selected features. In particular, NVP emerged as more valuable from the viewpoint of preventive medicine. For NVP, we propose use of feature set 2 (with variable intervals), which is more compact and flexible. 
We have also tested several variants of LSTM in combination with two feature sets to identify the best match for male and female FLD prediction. More specifically, the best model for males was Stack-LSTM using feature set 2 (with 79.32\% accuracy), whereas the best model for females was LSTM using feature set 1 (with 81.90\% accuracy). ", doi="10.2196/26398", url="https://medinform.jmir.org/2021/8/e26398", url="http://www.ncbi.nlm.nih.gov/pubmed/34387552" } @Article{info:doi/10.2196/28287, author="Zhang, Xiaoyi and Luo, Gang", title="Ranking Rule-Based Automatic Explanations for Machine Learning Predictions on Asthma Hospital Encounters in Patients With Asthma: Retrospective Cohort Study", journal="JMIR Med Inform", year="2021", month="Aug", day="11", volume="9", number="8", pages="e28287", keywords="asthma", keywords="clinical decision support", keywords="machine learning", keywords="patient care management", keywords="forecasting", abstract="Background: Asthma hospital encounters impose a heavy burden on the health care system. To improve preventive care and outcomes for patients with asthma, we recently developed a black-box machine learning model to predict whether a patient with asthma will have one or more asthma hospital encounters in the succeeding 12 months. Our model is more accurate than previous models. However, black-box machine learning models do not explain their predictions, which forms a barrier to widespread clinical adoption. To solve this issue, we previously developed a method to automatically provide rule-based explanations for the model's predictions and to suggest tailored interventions without sacrificing model performance. For an average patient correctly predicted by our model to have future asthma hospital encounters, our explanation method generated over 5000 rule-based explanations, if any. However, the user of the automated explanation function, often a busy clinician, will want to quickly obtain the most useful information for a patient by viewing only the top few explanations. Therefore, a methodology is required to appropriately rank the explanations generated for a patient. However, this is currently an open problem. Objective: The aim of this study is to develop a method to appropriately rank the rule-based explanations that our automated explanation method generates for a patient. Methods: We developed a ranking method that struck a balance among multiple factors. Through a secondary analysis of 82,888 data instances of adults with asthma from the University of Washington Medicine between 2011 and 2018, we demonstrated our ranking method on the test case of predicting asthma hospital encounters in patients with asthma. Results: For each patient predicted to have asthma hospital encounters in the succeeding 12 months, the top few explanations returned by our ranking method typically have high quality and low redundancy. Many top-ranked explanations provide useful insights on the various aspects of the patient's situation, which cannot be easily obtained by viewing the patient's data in the current electronic health record system. Conclusions: The explanation ranking module is an essential component of the automated explanation function, and it addresses the interpretability issue that deters the widespread adoption of machine learning predictive models in clinical practice. 
In the next few years, we plan to test our explanation ranking method on predictive modeling problems addressing other diseases as well as on data from other health care systems. International Registered Report Identifier (IRRID): RR2-10.2196/5039 ", doi="10.2196/28287", url="https://medinform.jmir.org/2021/8/e28287", url="http://www.ncbi.nlm.nih.gov/pubmed/34383673" } @Article{info:doi/10.2196/23508, author="Hur, Sujeong and Min, Young Ji and Yoo, Junsang and Kim, Kyunga and Chung, Ryang Chi and Dykes, C. Patricia and Cha, Chul Won", title="Development and Validation of Unplanned Extubation Prediction Models Using Intensive Care Unit Data: Retrospective, Comparative, Machine Learning Study", journal="J Med Internet Res", year="2021", month="Aug", day="11", volume="23", number="8", pages="e23508", keywords="intensive care unit", keywords="machine learning", keywords="mechanical ventilator", keywords="patient safety", keywords="unplanned extubation", abstract="Background: Patient safety in the intensive care unit (ICU) is one of the most critical issues, and unplanned extubation (UE) is considered the most adverse event for patient safety. Prevention and early detection of such an event is an essential but difficult component of quality care. Objective: This study aimed to develop and validate prediction models for UE in ICU patients using machine learning. Methods: This study was conducted in an academic tertiary hospital in Seoul, Republic of Korea. The hospital had approximately 2000 inpatient beds and 120 ICU beds. As of January 2019, the hospital had approximately 9000 outpatients on a daily basis. The number of annual ICU admissions was approximately 10,000. We conducted a retrospective study between January 1, 2010, and December 31, 2018. A total of 6914 extubation cases were included. We developed a UE prediction model using machine learning algorithms, which included random forest (RF), logistic regression (LR), artificial neural network (ANN), and support vector machine (SVM). For evaluating the model's performance, we used the area under the receiver operating characteristic curve (AUROC). The sensitivity, specificity, positive predictive value, negative predictive value, and F1 score were also determined for each model. For performance evaluation, we also used a calibration curve, the Brier score, and the integrated calibration index (ICI) to compare different models. The potential clinical usefulness of the best model at the best threshold was assessed through a net benefit approach using a decision curve. Results: Among the 6914 extubation cases, 248 underwent UE. In the UE group, there were more males than females, higher use of physical restraints, and fewer surgeries. The incidence of UE was higher during the night shift as compared to the planned extubation group. The rate of reintubation within 24 hours and hospital mortality were higher in the UE group. The UE prediction algorithm was developed, and the AUROC for RF was 0.787, for LR was 0.762, for ANN was 0.763, and for SVM was 0.740. Conclusions: We successfully developed and validated machine learning--based prediction models to predict UE in ICU patients using electronic health record data. The best AUROC was 0.787 and the sensitivity was 0.949, which was obtained using the RF algorithm. The RF model was well-calibrated, and the Brier score and ICI were 0.129 and 0.048, respectively. The proposed prediction model uses widely available variables to limit the additional workload on the clinician. 
Further, this evaluation suggests that the model holds potential for clinical usefulness. ", doi="10.2196/23508", url="https://www.jmir.org/2021/8/e23508", url="http://www.ncbi.nlm.nih.gov/pubmed/34382940" } @Article{info:doi/10.2196/27017, author="Bright, A. Roselie and Rankin, K. Summer and Dowdy, Katherine and Blok, V. Sergey and Bright, J. Susan and Palmer, M. Lee Anne", title="Finding Potential Adverse Events in the Unstructured Text of Electronic Health Care Records: Development of the Shakespeare Method", journal="JMIRx Med", year="2021", month="Aug", day="11", volume="2", number="3", pages="e27017", keywords="epidemiology", keywords="electronic health record", keywords="electronic health care record", keywords="big data", keywords="patient harm", keywords="patient safety", keywords="public health", keywords="product surveillance, postmarketing", keywords="natural language processing", keywords="proof-of-concept study", keywords="critical care", abstract="Background: Big data tools provide opportunities to monitor adverse events (patient harm associated with medical care) (AEs) in the unstructured text of electronic health care records (EHRs). Writers may explicitly state an apparent association between treatment and adverse outcome (``attributed'') or state the simple treatment and outcome without an association (``unattributed''). Many methods for finding AEs in text rely on predefining possible AEs before searching for prespecified words and phrases or manual labeling (standardization) by investigators. We developed a method to identify possible AEs, even if unknown or unattributed, without any prespecifications or standardization of notes. Our method was inspired by word-frequency analysis methods used to uncover the true authorship of disputed works credited to William Shakespeare. We chose two use cases, ``transfusion'' and ``time-based.'' Transfusion was chosen because new transfusion AE types were becoming recognized during the study data period; therefore, we anticipated an opportunity to find unattributed potential AEs (PAEs) in the notes. With the time-based case, we wanted to simulate near real-time surveillance. We chose time periods in the hope of detecting PAEs due to contaminated heparin from mid-2007 to mid-2008 that were announced in early 2008. We hypothesized that the prevalence of contaminated heparin may have been widespread enough to manifest in EHRs through symptoms related to heparin AEs, independent of clinicians' documentation of attributed AEs. Objective: We aimed to develop a new method to identify attributed and unattributed PAEs using the unstructured text of EHRs. Methods: We used EHRs for adult critical care admissions at a major teaching hospital (2001-2012). For each case, we formed a group of interest and a comparison group. We concatenated the text notes for each admission into one document sorted by date, and deleted replicate sentences and lists. We identified statistically significant words in the group of interest versus the comparison group. Documents in the group of interest were filtered to those words, followed by topic modeling on the filtered documents to produce topics. For each topic, the three documents with the maximum topic scores were manually reviewed to identify PAEs. Results: Topics centered around medical conditions that were unique to or more common in the group of interest, including PAEs. In each use case, most PAEs were unattributed in the notes. 
Among the transfusion PAEs was unattributed evidence of transfusion-associated cardiac overload and transfusion-related acute lung injury. Some of the PAEs from mid-2007 to mid-2008 were increased unattributed events consistent with AEs related to heparin contamination. Conclusions: The Shakespeare method could be a useful supplement to AE reporting and surveillance of structured EHR data. Future improvements should include automation of the manual review process. ", doi="10.2196/27017", url="https://med.jmirx.org/2021/3/e27017", url="http://www.ncbi.nlm.nih.gov/pubmed/37725533" } @Article{info:doi/10.2196/20815, author="Hutchinson, Claire and Brereton, Michelle and Adams, Julie and De La Salle, Barbara and Sims, Jon and Hyde, Keith and Chasty, Richard and Brown, Rachel and Rees-Unwin, Karen and Burthem, John", title="The Use and Effectiveness of an Online Diagnostic Support System for Blood Film Interpretation: Comparative Observational Study", journal="J Med Internet Res", year="2021", month="Aug", day="9", volume="23", number="8", pages="e20815", keywords="blood cell morphology", keywords="decision support", keywords="external quality assessment in hematology", keywords="diagnosis", keywords="digital morphology", keywords="morphology education", abstract="Background: The recognition and interpretation of abnormal blood cell morphology is often the first step in diagnosing underlying serious systemic illness or leukemia. Supporting the staff who interpret blood film morphology is therefore essential for a safe laboratory service. This paper describes an open-access, web-based decision support tool, developed by the authors to support morphological diagnosis, arising from earlier studies identifying mechanisms of error in blood film reporting. The effectiveness of this intervention was assessed using the unique resource offered by the online digital morphology Continuing Professional Development scheme (DM scheme) offered by the UK National External Quality Assessment Service for Haematology, with more than 3000 registered users. This allowed the effectiveness of decision support to be tested within a defined user group, each of whom viewed and interpreted the morphology of identical digital blood films. Objective: The primary objective of the study was to test the effectiveness of the decision support system in supporting users to identify and interpret abnormal morphological features. The secondary objective was to determine the pattern and frequency of use of the system for different case types, and to determine how users perceived the support in terms of their confidence in decision-making. Methods: This was a comparative study of identical blood films evaluated either with or without decision support. Selected earlier cases from the DM scheme were rereleased as new cases but with decision support made available; this allowed a comparison of data sets for identical cases with or without decision support. To address the primary objectives, the study used quantitative evaluation and statistical comparisons of the identification and interpretation of morphological features between the two different case releases. To address the secondary objective, the use of decision support was assessed using web analytical tools, while a questionnaire was used to assess user perceptions of the system. 
Results: Cases evaluated with the aid of decision support had significantly improved accuracy of identification for relevant morphological features (mean improvement 9.8\%) and the interpretation of those features (mean improvement 11\%). The improvement was particularly significant for cases with higher complexity or for rarer diagnoses. Analysis of website usage demonstrated a high frequency of access for web pages relevant to each case (mean 9298 for each case, range 2661-24,276). Users reported that the decision support website increased their confidence for feature identification (4.8/5) and interpretation (4.3/5), both within the context of training (4.6/5) and also in their wider laboratory practice (4.4/5). Conclusions: The findings of this study demonstrate that directed online decision support for blood morphology evaluation improves accuracy and confidence in the context of educational evaluation of digital films, with effectiveness potentially extending to wider laboratory use. ", doi="10.2196/20815", url="https://www.jmir.org/2021/8/e20815", url="http://www.ncbi.nlm.nih.gov/pubmed/34383663" } @Article{info:doi/10.2196/17971, author="Oxholm, Christina and Christensen, Soendergaard Anne-Marie and Christiansen, Regina and Wiil, Kock Uffe and Nielsen, S{\o}gaard Anette", title="Attitudes of Patients and Health Professionals Regarding Screening Algorithms: Qualitative Study", journal="JMIR Form Res", year="2021", month="Aug", day="9", volume="5", number="8", pages="e17971", keywords="screening", keywords="algorithms", keywords="alcohol", keywords="qualitative study", keywords="attitudes", keywords="opinions", keywords="patients", keywords="health professionals", abstract="Background: As a preamble to an attempt to develop a tool that can aid health professionals at hospitals in identifying whether the patient may have an alcohol abuse problem, this study investigates opinions and attitudes among both health professionals and patients about using patient data from electronic health records (EHRs) in an algorithm screening for alcohol problems. Objective: The aim of this study was to investigate the attitudes and opinions of patients and health professionals at hospitals regarding the use of previously collected data in developing and implementing an algorithmic helping tool in EHR for screening inexpedient alcohol habits; in addition, the study aims to analyze how patients would feel about asking and being asked about alcohol by staff, based on a notification in the EHR from such a tool. Methods: Using semistructured interviews, we interviewed 9 health professionals and 5 patients to explore their opinions and attitudes about an algorithm-based helping tool and about asking and being asked about alcohol usage when being given a reminder from this type of tool. The data were analyzed using an ad hoc method consistent with a close reading and meaning condensing. Results: The health professionals were both positive and negative about a helping tool grounded in algorithms. They were optimistic about the potential of such a tool to save some time by providing a quick overview if it was easy to use but, on the negative side, noted that this type of helping tool might take away the professionals' instinct. The patients were overall positive about the helping tool, stating that they would find this tool beneficial for preventive care. Some of the patients expressed concerns that the information provided by the tool could be misused. 
Conclusions: When developing and implementing an algorithmic helping tool, the following aspects should be considered: (1) making the helping tool as transparent in its recommendations as possible, avoiding black boxing, and ensuring room for professional discretion in clinical decision making; and (2) including and taking into account the attitudes and opinions of patients and health professionals in the design and development process of such an algorithmic helping tool. ", doi="10.2196/17971", url="https://formative.jmir.org/2021/8/e17971", url="http://www.ncbi.nlm.nih.gov/pubmed/34383666" } @Article{info:doi/10.2196/25046, author="Richardson, Safiya and Dauber-Decker, L. Katherine and McGinn, Thomas and Barnaby, P. Douglas and Cattamanchi, Adithya and Pekmezaris, Renee", title="Barriers to the Use of Clinical Decision Support for the Evaluation of Pulmonary Embolism: Qualitative Interview Study", journal="JMIR Hum Factors", year="2021", month="Aug", day="4", volume="8", number="3", pages="e25046", keywords="medical informatics", keywords="pulmonary embolism", keywords="electronic health records", keywords="quality improvement", keywords="clinical decision support systems", abstract="Background: Clinicians often disregard potentially beneficial clinical decision support (CDS). Objective: In this study, we sought to explore the psychological and behavioral barriers to the use of a CDS tool. Methods: We conducted a qualitative study involving emergency medicine physicians and physician assistants. A semistructured interview guide was created based on the Capability, Opportunity, and Motivation-Behavior model. Interviews focused on the barriers to the use of a CDS tool built based on Wells' criteria for pulmonary embolism to assist clinicians in establishing pretest probability of pulmonary embolism before imaging. Results: Interviews were conducted with 12 clinicians. Six barriers were identified, including (1) Bayesian reasoning, (2) fear of missing a pulmonary embolism, (3) time pressure or cognitive load, (4) gestalt includes Wells' criteria, (5) missed risk factors, and (6) social pressure. Conclusions: Clinicians highlighted several important psychological and behavioral barriers to CDS use. Addressing these barriers will be paramount in developing CDS that can meet its potential to transform clinical care. ", doi="10.2196/25046", url="https://humanfactors.jmir.org/2021/3/e25046", url="http://www.ncbi.nlm.nih.gov/pubmed/34346901" } @Article{info:doi/10.2196/24405, author="Li, Rui and Niu, Yue and Scott, Robbins Sarah and Zhou, Chu and Lan, Lan and Liang, Zhigang and Li, Jia", title="Using Electronic Medical Record Data for Research in a Healthcare Information and Management Systems Society (HIMSS) Analytics Electronic Medical Record Adoption Model (EMRAM) Stage 7 Hospital in Beijing: Cross-sectional Study", journal="JMIR Med Inform", year="2021", month="Aug", day="3", volume="9", number="8", pages="e24405", keywords="electronic medical records", keywords="data utilization", keywords="medical research", keywords="China", abstract="Background: With the proliferation of electronic medical record (EMR) systems, there is an increasing interest in utilizing EMR data for medical research; yet, there is no quantitative research on EMR data utilization for medical research purposes in China. 
Objective: This study aimed to understand how and to what extent EMR data are utilized for medical research purposes in a Healthcare Information and Management Systems Society (HIMSS) Analytics Electronic Medical Record Adoption Model (EMRAM) Stage 7 hospital in Beijing, China. Obstacles and issues in the utilization of EMR data were also explored to provide a foundation for the improved utilization of such data. Methods: For this descriptive cross-sectional study, cluster sampling from Xuanwu Hospital, one of two Stage 7 hospitals in Beijing, was conducted from 2016 to 2019. The utilization of EMR data was described as the number of requests, the proportion of requesters, and the frequency of requests per capita. Comparisons by year, professional title, and age were conducted by double-sided chi-square tests. Results: From 2016 to 2019, EMR data utilization was poor, as the proportion of requesters was 5.8\% and the frequency was 0.1 times per person per year. The frequency per capita gradually slowed and older senior-level staff more frequently used EMR data compared with younger staff. Conclusions: The value of using EMR data for research purposes is not well studied in China. More research is needed to quantify to what extent EMR data are utilized across all hospitals in Beijing and how these systems can enhance future studies. The results of this study also suggest that young doctors may be less exposed or have less reason to access such research methods. ", doi="10.2196/24405", url="https://medinform.jmir.org/2021/8/e24405", url="http://www.ncbi.nlm.nih.gov/pubmed/34342589" } @Article{info:doi/10.2196/29433, author="Du, Yi and Wang, Hanxue and Cui, Wenjuan and Zhu, Hengshu and Guo, Yunchang and Dharejo, Ali Fayaz and Zhou, Yuanchun", title="Foodborne Disease Risk Prediction Using Multigraph Structural Long Short-term Memory Networks: Algorithm Design and Validation Study", journal="JMIR Med Inform", year="2021", month="Aug", day="2", volume="9", number="8", pages="e29433", keywords="foodborne disease", keywords="risk", keywords="prediction", keywords="spatial--temporal data", abstract="Background: Foodborne disease is a common threat to human health worldwide, leading to millions of deaths every year. Thus, the accurate prediction of foodborne disease risk is very urgent and of great importance for public health management. Objective: We aimed to design a spatial--temporal risk prediction model suitable for predicting foodborne disease risks in various regions, to provide guidance for the prevention and control of foodborne diseases. Methods: We designed a novel end-to-end framework to predict foodborne disease risk by using a multigraph structural long short-term memory neural network, which can utilize an encoder--decoder to achieve multistep prediction. In particular, to capture multiple spatial correlations, we divided regions by administrative area and constructed adjacent graphs with metrics that included region proximity, historical data similarity, regional function similarity, and exposure food similarity. We also integrated an attention mechanism in both spatial and temporal dimensions, as well as external factors, to refine prediction accuracy. We validated our model with a long-term real-world foodborne disease data set, comprising data from 2015 to 2019 from multiple provinces in China. 
Results: Our model can achieve F1 scores of 0.822, 0.679, 0.709, and 0.720 for single-month forecasts for the provinces of Beijing, Zhejiang, Shanxi, and Hebei, respectively, and the highest F1 score was 20\% higher than the best results of the other models. The experimental results clearly demonstrated that our approach can outperform other state-of-the-art models by a clear margin. Conclusions: The spatial--temporal risk prediction model can take into account the spatial--temporal characteristics of foodborne disease data and accurately determine future disease spatial--temporal risks, thereby providing support for the prevention and risk assessment of foodborne disease. ", doi="10.2196/29433", url="https://medinform.jmir.org/2021/8/e29433", url="http://www.ncbi.nlm.nih.gov/pubmed/34338648" } @Article{info:doi/10.2196/28266, author="Kummer, Benjamin and Shakir, Lubaina and Kwon, Rachel and Habboushe, Joseph and Jett{\'e}, Nathalie", title="Usage Patterns of Web-Based Stroke Calculators in Clinical Decision Support: Retrospective Analysis", journal="JMIR Med Inform", year="2021", month="Aug", day="2", volume="9", number="8", pages="e28266", keywords="medical informatics", keywords="clinical informatics", keywords="mhealth", keywords="digital health", keywords="cerebrovascular disease", keywords="medical calculators", keywords="health information", keywords="health information technology", keywords="information technology", keywords="economic health", keywords="clinical health", keywords="electronic health records", abstract="Background: Clinical scores are frequently used in the diagnosis and management of stroke. While medical calculators are increasingly important support tools for clinical decisions, the uptake and use of common medical calculators for stroke remain poorly characterized. Objective: We aimed to describe use patterns in frequently used stroke-related medical calculators for clinical decisions from a web-based support system. Methods: We conducted a retrospective study of calculators from MDCalc, a web-based and mobile app--based medical calculator platform based in the United States. We analyzed metadata tags from MDCalc's calculator use data to identify all calculators related to stroke. Using relative page views as a measure of calculator use, we determined the 5 most frequently used stroke-related calculators between January 2016 and December 2018. For all 5 calculators, we determined cumulative and quarterly use, mode of access (eg, app or web browser), and both US and international distributions of use. We compared cumulative use in the 2016-2018 period with use from January 2011 to December 2015. Results: Over the study period, we identified 454 MDCalc calculators, of which 48 (10.6\%) were related to stroke. Of these, the 5 most frequently used calculators were the CHA2DS2-VASc score for atrial fibrillation stroke risk calculator (5.5\% of total and 32\% of stroke-related page views), the Mean Arterial Pressure calculator (2.4\% of total and 14.0\% of stroke-related page views), the HAS-BLED score for major bleeding risk (1.9\% of total and 11.4\% of stroke-related page views), the National Institutes of Health Stroke Scale (NIHSS) score calculator (1.7\% of total and 10.1\% of stroke-related page views), and the CHADS2 score for atrial fibrillation stroke risk calculator (1.4\% of total and 8.1\% of stroke-related page views). Web browser was the most common mode of access, accounting for 82.7\%-91.2\% of individual stroke calculator page views. 
Access originated most frequently from the most populated regions within the United States. Internationally, use originated mostly from English-language countries. The NIHSS score calculator demonstrated the greatest increase in page views (238.1\% increase) between the first and last quarters of the study period. Conclusions: The most frequently used stroke calculators were the CHA2DS2-VASc, Mean Arterial Pressure, HAS-BLED, NIHSS, and CHADS2. These were mainly accessed by web browser, from English-speaking countries, and from highly populated areas. Further studies should investigate barriers to stroke calculator adoption and the effect of calculator use on the application of best practices in cerebrovascular disease. ", doi="10.2196/28266", url="https://medinform.jmir.org/2021/8/e28266", url="http://www.ncbi.nlm.nih.gov/pubmed/34338647" } @Article{info:doi/10.2196/26770, author="De, Amrita and Huang, Ming and Feng, Tinghao and Yue, Xiaomeng and Yao, Lixia", title="Analyzing Patient Secure Messages Using a Fast Health Care Interoperability Resources (FHIR)--Based Data Model: Development and Topic Modeling Study", journal="J Med Internet Res", year="2021", month="Jul", day="30", volume="23", number="7", pages="e26770", keywords="patient secure messages", keywords="patient portal", keywords="data model", keywords="FHIR", keywords="annotated corpus", keywords="topic modeling", abstract="Background: Patient portals tethered to electronic health records systems have become attractive web platforms since the enacting of the Medicare Access and Children's Health Insurance Program Reauthorization Act and the introduction of the Meaningful Use program in the United States. Patients can conveniently access their health records and seek consultation from providers through secure web portals. With increasing adoption and patient engagement, the volume of patient secure messages has risen substantially, which opens up new research and development opportunities for patient-centered care. Objective: This study aims to develop a data model for patient secure messages based on the Fast Healthcare Interoperability Resources (FHIR) standard to identify and extract significant information. Methods: We initiated the first draft of the data model by analyzing FHIR and manually reviewing 100 sentences randomly sampled from more than 2 million patient-generated secure messages obtained from the online patient portal at the Mayo Clinic Rochester between February 18, 2010, and December 31, 2017. We then annotated additional sets of 100 randomly selected sentences using the Multi-purpose Annotation Environment tool and updated the data model and annotation guideline iteratively until the interannotator agreement was satisfactory. We then created a larger corpus by annotating 1200 randomly selected sentences and calculated the frequency of the identified medical concepts in these sentences. Finally, we performed topic modeling analysis to learn the hidden topics of patient secure messages related to 3 highly mentioned microconcepts, namely, fatigue, prednisone, and patient visit, and to evaluate the proposed data model independently. Results: The proposed data model has a 3-level hierarchical structure of health system concepts, including 3 macroconcepts, 28 mesoconcepts, and 85 microconcepts. Foundation and base macroconcepts comprise 33.99\% (841/2474), clinical macroconcepts comprise 64.38\% (1593/2474), and financial macroconcepts comprise 1.61\% (40/2474) of the annotated corpus. 
The top 3 mesoconcepts among the 28 mesoconcepts are condition (505/2474, 20.41\%), medication (424/2474, 17.13\%), and practitioner (243/2474, 9.82\%). Topic modeling identified hidden topics of patient secure messages related to fatigue, prednisone, and patient visit. A total of 89.2\% (107/120) of the top-ranked topic keywords are actually the health concepts of the data model. Conclusions: Our data model and annotated corpus enable us to identify and understand important medical concepts in patient secure messages and prepare us for further natural language processing analysis of such free texts. The data model could be potentially used to automatically identify other types of patient narratives, such as those in various social media and patient forums. In the future, we plan to develop a machine learning and natural language processing solution to enable automatic triaging solutions to reduce the workload of clinicians and perform more granular content analysis to understand patients' needs and improve patient-centered care. ", doi="10.2196/26770", url="https://www.jmir.org/2021/7/e26770", url="http://www.ncbi.nlm.nih.gov/pubmed/34328444" } @Article{info:doi/10.2196/26823, author="Barata, Carolina and Rodrigues, Maria Ana and Canh{\~a}o, Helena and Vinga, Susana and Carvalho, M. Alexandra", title="Predicting Biologic Therapy Outcome of Patients With Spondyloarthritis: Joint Models for Longitudinal and Survival Analysis", journal="JMIR Med Inform", year="2021", month="Jul", day="30", volume="9", number="7", pages="e26823", keywords="data mining", keywords="survival analysis", keywords="joint models", keywords="spondyloarthritis", keywords="drug survival", keywords="rheumatic disease", keywords="electronic medical records", keywords="medical records", abstract="Background: Rheumatic diseases are one of the most common chronic diseases worldwide. Among them, spondyloarthritis (SpA) is a group of highly debilitating diseases, with an early onset age, which significantly impacts patients' quality of life, health care systems, and society in general. Recent treatment options consist of using biologic therapies, and establishing the most beneficial option according to the patients' characteristics is a challenge that needs to be overcome. Meanwhile, the emerging availability of electronic medical records has made necessary the development of methods that can extract insightful information while handling all the challenges of dealing with complex, real-world data. Objective: The aim of this study was to achieve a better understanding of SpA patients' therapy responses and identify the predictors that affect them, thereby enabling the prognosis of therapy success or failure. Methods: A data mining approach based on joint models for the survival analysis of the biologic therapy failure is proposed, which considers the information of both baseline and time-varying variables extracted from the electronic medical records of SpA patients from the database, Reuma.pt. 
Results: Our results show that being a male, starting biologic therapy at an older age, having a larger time interval between disease start and initiation of the first biologic drug, and being human leukocyte antigen (HLA)--B27 positive are indicators of a good prognosis for the biological drug survival; meanwhile, having disease onset or biologic therapy initiation occur in more recent years, a larger number of education years, and higher values of C-reactive protein or Bath Ankylosing Spondylitis Functional Index (BASFI) at baseline are all predictors of a greater risk of failure of the first biologic therapy. Conclusions: Among this Portuguese subpopulation of SpA patients, those who were male, HLA-B27 positive, and with a later biologic therapy starting date or a larger time interval between disease start and initiation of the first biologic therapy showed longer therapy adherence. Joint models proved to be a valuable tool for the analysis of electronic medical records in the field of rheumatic diseases and may allow for the identification of potential predictors of biologic therapy failure. ", doi="10.2196/26823", url="https://medinform.jmir.org/2021/7/e26823", url="http://www.ncbi.nlm.nih.gov/pubmed/34328435" } @Article{info:doi/10.2196/27992, author="Abdulaal, Ahmed and Patel, Aatish and Al-Hindawi, Ahmed and Charani, Esmita and Alqahtani, A. Saleh and Davies, W. Gary and Mughal, Nabeela and Moore, Prockter Luke Stephen", title="Clinical Utility and Functionality of an Artificial Intelligence--Based App to Predict Mortality in COVID-19: Mixed Methods Analysis", journal="JMIR Form Res", year="2021", month="Jul", day="28", volume="5", number="7", pages="e27992", keywords="app", keywords="artificial intelligence", keywords="coronavirus", keywords="COVID-19", keywords="development", keywords="function", keywords="graphical user interface", keywords="machine learning", keywords="model", keywords="mortality", keywords="neural network", keywords="prediction", keywords="usability", keywords="utility", abstract="Background: The artificial neural network (ANN) is an increasingly important tool in the context of solving complex medical classification problems. However, one of the principal challenges in leveraging artificial intelligence technology in the health care setting has been the relative inability to translate models into clinician workflow. Objective: Here we demonstrate the development of a COVID-19 outcome prediction app that utilizes an ANN and assesses its usability in the clinical setting. Methods: Usability assessment was conducted using the app, followed by a semistructured end-user interview. Usability was specified by effectiveness, efficiency, and satisfaction measures. These data were reported with descriptive statistics. The end-user interview data were analyzed using the thematic framework method, which allowed for the development of themes from the interview narratives. In total, 31 National Health Service physicians at a West London teaching hospital, including foundation physicians, senior house officers, registrars, and consultants, were included in this study. Results: All participants were able to complete the assessment, with a mean time to complete separate patient vignettes of 59.35 (SD 10.35) seconds. 
The mean system usability scale score was 91.94 (SD 8.54), which corresponds to a qualitative rating of ``excellent.'' The clinicians found the app intuitive and easy to use, with the majority describing its predictions as a useful adjunct to their clinical practice. The main concern was related to the use of the app in isolation rather than in conjunction with other clinical parameters. However, most clinicians speculated that the app could positively reinforce or validate their clinical decision-making. Conclusions: Translating artificial intelligence technologies into the clinical setting remains an important but challenging task. We demonstrate the effectiveness, efficiency, and system usability of a web-based app designed to predict the outcomes of patients with COVID-19 from an ANN. ", doi="10.2196/27992", url="https://formative.jmir.org/2021/7/e27992", url="http://www.ncbi.nlm.nih.gov/pubmed/34115603" } @Article{info:doi/10.2196/22491, author="Chen, Rai-Fu and Cheng, Kuei-Chen and Lin, Yu-Yin and Chang, I-Chiu and Tsai, Cheng-Han", title="Predicting Unscheduled Emergency Department Return Visits Among Older Adults: Population-Based Retrospective Study", journal="JMIR Med Inform", year="2021", month="Jul", day="28", volume="9", number="7", pages="e22491", keywords="classification model", keywords="decision tree", keywords="emergency department", keywords="older adult patients", keywords="unscheduled return visits", abstract="Background: Unscheduled emergency department return visits (EDRVs) are key indicators for monitoring the quality of emergency medical care. A high return rate implies that the medical services provided by the emergency department (ED) failed to achieve the expected results of accurate diagnosis and effective treatment. Older adults are more susceptible to diseases and comorbidities than younger adults, and they exhibit unique and complex clinical characteristics that increase the difficulty of clinical diagnosis and treatment. Older adults also use more emergency medical resources than people in other age groups. Many studies have reviewed the causes of EDRVs among general ED patients; however, few have focused on older adults, although this is the age group with the highest rate of EDRVs. Objective: The aim of this study is to establish a model for predicting unscheduled EDRVs within a 72-hour period among patients aged 65 years and older. In addition, we aim to investigate the effects of the influencing factors on their unscheduled EDRVs. Methods: We used stratified and randomized data from Taiwan's National Health Insurance Research Database and applied data mining techniques to construct a prediction model consisting of patient, disease, hospital, and physician characteristics. Records of ED visits by patients aged 65 years and older from 1996 to 2010 in the National Health Insurance Research Database were selected, and the final sample size was 49,252 records. Results: The decision tree of the prediction model achieved an acceptable overall accuracy of 76.80\%. Economic status, chronic illness, and length of stay in the ED were the top three variables influencing unscheduled EDRVs. Those who stayed in the ED overnight or longer on their first visit were less likely to return. This study confirms the results of prior studies, which found that economically underprivileged older adults with chronic illness and comorbidities were more likely to return to the ED. 
Conclusions: Medical institutions can use our prediction model as a reference to improve medical management and clinical services by understanding the reasons for 72-hour unscheduled EDRVs in older adult patients. A possible solution is to create mechanisms that incorporate our prediction model and develop a support system with customized medical education for older patients and their family members before discharge. Meanwhile, a reasonably longer length of stay in the ED may help evaluate treatments and guide prognosis for older adult patients, and it may further reduce the rate of their unscheduled EDRVs. ", doi="10.2196/22491", url="https://medinform.jmir.org/2021/7/e22491", url="http://www.ncbi.nlm.nih.gov/pubmed/34319244" } @Article{info:doi/10.2196/27110, author="Zhao, Ziran and Cheng, Xi and Sun, Xiao and Ma, Shanrui and Feng, Hao and Zhao, Liang", title="Prediction Model of Anastomotic Leakage Among Esophageal Cancer Patients After Receiving an Esophagectomy: Machine Learning Approach", journal="JMIR Med Inform", year="2021", month="Jul", day="27", volume="9", number="7", pages="e27110", keywords="anastomotic leakage", keywords="esophageal cancer", keywords="esophagectomy", keywords="machine learning", keywords="risk factors", abstract="Background: Anastomotic leakage (AL) is one of the severe postoperative adverse events (5\%-30\%), and it is related to increased medical costs in cancer patients who undergo esophagectomies. Machine learning (ML) methods show good performance at predicting risk for AL. However, AL risk prediction based on ML models among the Chinese population is unavailable. Objective: This study uses ML techniques to develop and validate a risk prediction model to screen patients with emerging AL risk factors. Methods: Analyses were performed using medical records from 710 patients who underwent esophagectomies at the National Clinical Research Center for Cancer between January 2010 and May 2015. We randomly split (9:1) the data set into a training data set of 639 patients and a testing data set of 71 patients using a computer algorithm. We assessed multiple classification tools to create a multivariate risk prediction model. Our ML algorithms contained decision tree, random forest, naive Bayes, and logistic regression with least absolute shrinkage and selection operator. The optimal AL prediction model was selected based on model evaluation metrics. Results: The final risk panel included 36 independent risk features. Of those, 10 features were significantly identified by the logistic model, including aortic calcification (OR 2.77, 95\% CI 1.32-5.81), celiac trunk calcification (OR 2.79, 95\% CI 1.20-6.48), forced expiratory volume 1\% (OR 0.51, 95\% CI 0.30-0.89); TLco (OR 0.56, 95\% CI 0.27-1.18), peripheral vascular disease (OR 4.97, 95\% CI 1.44-17.07), laparoscope (OR 3.92, 95\% CI 1.23-12.51), postoperative length of hospital stay (OR 1.17, 95\% CI 1.13-1.21), vascular permeability activity (OR 0.46, 95\% CI 0.14-1.48), and fat liquefaction of incisions (OR 4.36, 95\% CI 1.86-10.21). Logistic regression with least absolute shrinkage and selection operator offered the highest prediction quality with an area under the receiver operator characteristic of 72\% in the training data set. The testing model also achieved similar high performance. Conclusions: Our model offered a prediction of AL with high accuracy, assisting in AL prevention and treatment. 
A personalized ML prediction model with a purely data-driven selection of features is feasible and effective in predicting AL in patients who underwent esophagectomy. ", doi="10.2196/27110", url="https://medinform.jmir.org/2021/7/e27110", url="http://www.ncbi.nlm.nih.gov/pubmed/34313597" } @Article{info:doi/10.2196/24651, author="Yoo, Junsang and Kim, Si-Ho and Hur, Sujeong and Ha, Juhyung and Huh, Kyungmin and Cha, Chul Won", title="Candidemia Risk Prediction (CanDETEC) Model for Patients With Malignancy: Model Development and Validation in a Single-Center Retrospective Study", journal="JMIR Med Inform", year="2021", month="Jul", day="26", volume="9", number="7", pages="e24651", keywords="candidemia", keywords="precision medicine", keywords="supervised machine learning", keywords="decision support systems, clinical", keywords="infection control", keywords="decision support", keywords="machine learning", keywords="development", keywords="validation", keywords="prediction", keywords="risk", keywords="model", abstract="Background: Appropriate empirical treatment for candidemia is associated with reduced mortality; however, the timely diagnosis of candidemia in patients with sepsis remains poor. Objective: We aimed to use machine learning algorithms to develop and validate a candidemia prediction model for patients with cancer. Methods: We conducted a single-center retrospective study using the cancer registry of a tertiary academic hospital. Adult patients diagnosed with malignancies between January 2010 and December 2018 were included. Our study outcome was the prediction of candidemia events. A stratified undersampling method was used to extract control data for algorithm learning. Multiple models were developed---a combination of 4 variable groups and 5 algorithms (auto-machine learning, deep neural network, gradient boosting, logistic regression, and random forest). The model with the largest area under the receiver operating characteristic curve (AUROC) was selected as the Candida detection (CanDETEC) model after comparing its performance indexes with those of the Candida Score Model. Results: From a total of 273,380 blood cultures from 186,404 registered patients with cancer, we extracted 501 records of candidemia events and 2000 records as control data. Performance among the different models varied (AUROC 0.771- 0.889), with all models demonstrating superior performance to that of the Candida Score (AUROC 0.677). The random forest model performed the best (AUROC 0.889, 95\% CI 0.888-0.889); therefore, it was selected as the CanDETEC model. Conclusions: The CanDETEC model predicted candidemia in patients with cancer with high discriminative power. This algorithm could be used for the timely diagnosis and appropriate empirical treatment of candidemia. 
", doi="10.2196/24651", url="https://medinform.jmir.org/2021/7/e24651", url="http://www.ncbi.nlm.nih.gov/pubmed/34309570" } @Article{info:doi/10.2196/23401, author="Hur, Sujeong and Ko, Ryoung-Eun and Yoo, Junsang and Ha, Juhyung and Cha, Chul Won and Chung, Ryang Chi", title="A Machine Learning--Based Algorithm for the Prediction of Intensive Care Unit Delirium (PRIDE): Retrospective Study", journal="JMIR Med Inform", year="2021", month="Jul", day="26", volume="9", number="7", pages="e23401", keywords="clinical prediction", keywords="delirium", keywords="electronic health record", keywords="intensive care unit", keywords="machine learning", abstract="Background: Delirium frequently occurs among patients admitted to the intensive care unit (ICU). There is limited evidence to support interventions to treat or resolve delirium in patients who have already developed delirium. Therefore, the early recognition and prevention of delirium are important in the management of critically ill patients. Objective: This study aims to develop and validate a delirium prediction model within 24 hours of admission to the ICU using electronic health record data. The algorithm was named the Prediction of ICU Delirium (PRIDE). Methods: This is a retrospective cohort study performed at a tertiary referral hospital with 120 ICU beds. We only included patients who were 18 years or older at the time of admission and who stayed in the medical or surgical ICU. Patients were excluded if they lacked a Confusion Assessment Method for the ICU record from the day of ICU admission or if they had a positive Confusion Assessment Method for the ICU record at the time of ICU admission. The algorithm to predict delirium was developed using patient data from the first 2 years of the study period and validated using patient data from the last 6 months. Random forest (RF), Extreme Gradient Boosting (XGBoost), deep neural network (DNN), and logistic regression (LR) were used. The algorithms were externally validated using MIMIC-III data, and the algorithm with the largest area under the receiver operating characteristics (AUROC) curve in the external data set was named the PRIDE algorithm. Results: A total of 37,543 cases were collected. After patient exclusion, 12,409 remained as our study population, of which 3816 (30.8\%) patients experienced delirium incidents during the study period. Based on the exclusion criteria, out of the 96,016 ICU admission cases in the MIMIC-III data set, 2061 cases were included, and 272 (13.2\%) delirium incidents occurred. The average AUROCs and 95\% CIs for internal validation were 0.916 (95\% CI 0.916-0.916) for RF, 0.919 (95\% CI 0.919-0.919) for XGBoost, 0.881 (95\% CI 0.878-0.884) for DNN, and 0.875 (95\% CI 0.875-0.875) for LR. Regarding the external validation, the best AUROC were 0.721 (95\% CI 0.72-0.721) for RF, 0.697 (95\% CI 0.695-0.699) for XGBoost, 0.655 (95\% CI 0.654-0.657) for DNN, and 0.631 (95\% CI 0.631-0.631) for LR. The Brier score of the RF model is 0.168, indicating that it is well-calibrated. Conclusions: A machine learning approach based on electronic health record data can be used to predict delirium within 24 hours of ICU admission. RF, XGBoost, DNN, and LR models were used, and they effectively predicted delirium. However, with the potential to advise ICU physicians and prevent ICU delirium, prospective studies are required to verify the algorithm's performance. 
", doi="10.2196/23401", url="https://medinform.jmir.org/2021/7/e23401", url="http://www.ncbi.nlm.nih.gov/pubmed/34309567" } @Article{info:doi/10.2196/27858, author="Sun, Xingzhi and Bee, Mong Yong and Lam, Wei Shao and Liu, Zhuo and Zhao, Wei and Chia, Yi Sing and Abdul Kadir, Hanis and Wu, Tian Jun and Ang, Yew Boon and Liu, Nan and Lei, Zuo and Xu, Zhuoyang and Zhao, Tingting and Hu, Gang and Xie, Guotong", title="Effective Treatment Recommendations for Type 2 Diabetes Management Using Reinforcement Learning: Treatment Recommendation Model Development and Validation", journal="J Med Internet Res", year="2021", month="Jul", day="22", volume="23", number="7", pages="e27858", keywords="type 2 diabetes", keywords="reinforcement learning", keywords="model concordance", keywords="short-term outcome", keywords="long-term outcome", abstract="Background: Type 2 diabetes mellitus (T2DM) and its related complications represent a growing economic burden for many countries and health systems. Diabetes complications can be prevented through better disease control, but there is a large gap between the recommended treatment and the treatment that patients actually receive. The treatment of T2DM can be challenging because of different comprehensive therapeutic targets and individual variability of the patients, leading to the need for precise, personalized treatment. Objective: The aim of this study was to develop treatment recommendation models for T2DM based on deep reinforcement learning. A retrospective analysis was then performed to evaluate the reliability and effectiveness of the models. Methods: The data used in our study were collected from the Singapore Health Services Diabetes Registry, encompassing 189,520 patients with T2DM, including 6,407,958 outpatient visits from 2013 to 2018. The treatment recommendation model was built based on 80\% of the dataset and its effectiveness was evaluated with the remaining 20\% of data. Three treatment recommendation models were developed for antiglycemic, antihypertensive, and lipid-lowering treatments by combining a knowledge-driven model and a data-driven model. The knowledge-driven model, based on clinical guidelines and expert experiences, was first applied to select the candidate medications. The data-driven model, based on deep reinforcement learning, was used to rank the candidates according to the expected clinical outcomes. To evaluate the models, short-term outcomes were compared between the model-concordant treatments and the model-nonconcordant treatments with confounder adjustment by stratification, propensity score weighting, and multivariate regression. For long-term outcomes, model-concordant rates were included as independent variables to evaluate if the combined antiglycemic, antihypertensive, and lipid-lowering treatments had a positive impact on reduction of long-term complication occurrence or death at the patient level via multivariate logistic regression. Results: The test data consisted of 36,993 patients for evaluating the effectiveness of the three treatment recommendation models. In 43.3\% of patient visits, the antiglycemic medications recommended by the model were concordant with the actual prescriptions of the physicians. The concordant rates for antihypertensive medications and lipid-lowering medications were 51.3\% and 58.9\%, respectively. 
The evaluation results also showed that model-concordant treatments were associated with better glycemic control (odds ratio [OR] 1.73, 95\% CI 1.69-1.76), blood pressure control (OR 1.26, 95\% CI, 1.23-1.29), and blood lipids control (OR 1.28, 95\% CI 1.22-1.35). We also found that patients with more model-concordant treatments were associated with a lower risk of diabetes complications (including 3 macrovascular and 2 microvascular complications) and death, suggesting that the models have the potential of achieving better outcomes in the long term. Conclusions: Comprehensive management by combining knowledge-driven and data-driven models has good potential to help physicians improve the clinical outcomes of patients with T2DM; achieving good control on blood glucose, blood pressure, and blood lipids; and reducing the risk of diabetes complications in the long term. ", doi="10.2196/27858", url="https://www.jmir.org/2021/7/e27858", url="http://www.ncbi.nlm.nih.gov/pubmed/34292166" } @Article{info:doi/10.2196/29226, author="Zhong, Tao and Zhuang, Zian and Dong, Xiaoli and Wong, Hing Ka and Wong, Tak Wing and Wang, Jian and He, Daihai and Liu, Shengyuan", title="Predicting Antituberculosis Drug--Induced Liver Injury Using an Interpretable Machine Learning Method: Model Development and Validation Study", journal="JMIR Med Inform", year="2021", month="Jul", day="20", volume="9", number="7", pages="e29226", keywords="accuracy", keywords="drug", keywords="drug-induced liver injury", keywords="high accuracy", keywords="injury", keywords="interpretability", keywords="interpretation", keywords="liver", keywords="machine learning", keywords="model", keywords="prediction", keywords="treatment", keywords="tuberculosis", keywords="XGBoost algorithm", abstract="Background: Tuberculosis (TB) is a pandemic, being one of the top 10 causes of death and the main cause of death from a single source of infection. Drug-induced liver injury (DILI) is the most common and serious side effect during the treatment of TB. Objective: We aim to predict the status of liver injury in patients with TB at the clinical treatment stage. Methods: We designed an interpretable prediction model based on the XGBoost algorithm and identified the most robust and meaningful predictors of the risk of TB-DILI on the basis of clinical data extracted from the Hospital Information System of Shenzhen Nanshan Center for Chronic Disease Control from 2014 to 2019. Results: In total, 757 patients were included, and 287 (38\%) had developed TB-DILI. Based on values of relative importance and area under the receiver operating characteristic curve, machine learning tools selected patients' most recent alanine transaminase levels, average rate of change of patients' last 2 measures of alanine transaminase levels, cumulative dose of pyrazinamide, and cumulative dose of ethambutol as the best predictors for assessing the risk of TB-DILI. In the validation data set, the model had a precision of 90\%, recall of 74\%, classification accuracy of 76\%, and balanced error rate of 77\% in predicting cases of TB-DILI. The area under the receiver operating characteristic curve score upon 10-fold cross-validation was 0.912 (95\% CI 0.890-0.935). In addition, the model provided warnings of high risk for patients in advance of DILI onset for a median of 15 (IQR 7.3-27.5) days. 
Conclusions: Our model shows high accuracy and interpretability in predicting cases of TB-DILI, which can provide useful information to clinicians to adjust the medication regimen and avoid more serious liver injury in patients. ", doi="10.2196/29226", url="https://medinform.jmir.org/2021/7/e29226", url="http://www.ncbi.nlm.nih.gov/pubmed/34283036" } @Article{info:doi/10.2196/27484, author="Shojaie, Danielle and Hoffman, S. Aubri and Amaku, Ruth and Cabanillas, E. Maria and Sosa, Ann Julie and Waguespack, G. Steven and Zafereo, E. Mark and Hu, I. Mimi and Grubbs, E. Elizabeth", title="Decision Making When Cancer Becomes Chronic: Needs Assessment for a Web-Based Medullary Thyroid Carcinoma Patient Decision Aid", journal="JMIR Form Res", year="2021", month="Jul", day="16", volume="5", number="7", pages="e27484", keywords="patient decision aids", keywords="decision support techniques", keywords="oncology", keywords="medullary thyroid cancer", keywords="targeted therapy", keywords="clinical trial", keywords="mobile phone", abstract="Background: In cancers with a chronic phase, patients and family caregivers face difficult decisions such as whether to start a novel therapy, whether to enroll in a clinical trial, and when to stop treatment. These decisions are complex, require an understanding of uncertainty, and necessitate the consideration of patients' informed preferences. For some cancers, such as medullary thyroid carcinoma, these decisions may also involve significant out-of-pocket costs and effects on family members. Providers have expressed a need for web-based interventions that can be delivered between consultations to provide education and prepare patients and families to discuss these decisions. To ensure that these tools are effective, usable, and understandable, studies are needed to identify patients', families', and providers' decision-making needs and optimal design strategies for a web-based patient decision aid. Objective: Following the international guidelines for the development of a web-based patient decision aid, the objectives of this study are to engage potential users to guide development; review the existing literature and available tools; assess users' decision-making experiences, needs, and design recommendations; and identify shared decision-making approaches to address each need. Methods: This study used the decisional needs assessment approach, which included creating a stakeholder advisory panel, mapping decision pathways, conducting an environmental scan of existing materials, and administering a decisional needs assessment questionnaire. Thematic analyses identified current decision-making pathways, unmet decision-making needs, and decision support strategies for meeting each need. Results: The stakeholders reported wide heterogeneity in decision timing and pathways. Relevant existing materials included 2 systematic reviews, 9 additional papers, and multiple educational websites, but none of these met the criteria for a patient decision aid. Patients and family members (n=54) emphasized the need for plain language (46/54, 85\%), shared decision making (45/54, 83\%), and help with family discussions (39/54, 72\%). Additional needs included information about uncertainty, lived experience, and costs. Providers (n=10) reported needing interventions that address misinformation (9/10, 90\%), foster realistic expectations (9/10, 90\%), and address mistrust in clinical trials (5/10, 50\%). 
Additional needs included provider tools that support shared decision making. Both groups recommended designing a web-based patient decision aid that can be tailored to (64/64, 100\%) and delivered on a hospital website (53/64, 83\%), focuses on quality of life (45/64, 70\%), and provides step-by-step guidance (43/64, 67\%). The study team identified best practices to meet each need, which are presented in the proposed decision support design guide. Conclusions: Patients, families, and providers report multifaceted decision support needs during the chronic phase of cancer. Web-based patient decision aids that provide tailored support over time and explicitly address uncertainty, quality of life, realistic expectations, and effects on families are needed. ", doi="10.2196/27484", url="https://formative.jmir.org/2021/7/e27484", url="http://www.ncbi.nlm.nih.gov/pubmed/34269691" } @Article{info:doi/10.2196/22021, author="Froud, Robert and Hansen, Hakestad Solveig and Ruud, Kristian Hans and Foss, Jonathan and Ferguson, Leila and Fredriksen, Morten Per", title="Relative Performance of Machine Learning and Linear Regression in Predicting Quality of Life and Academic Performance of School Children in Norway: Data Analysis of a Quasi-Experimental Study", journal="J Med Internet Res", year="2021", month="Jul", day="16", volume="23", number="7", pages="e22021", keywords="modelling", keywords="linear regression", keywords="machine learning", keywords="artificial intelligence", keywords="quality of life", keywords="academic performance", keywords="continuous/quasi-continuous health outcomes", abstract="Background: Machine learning techniques are increasingly being applied in health research. It is not clear how useful these approaches are for modeling continuous outcomes. Child quality of life is associated with parental socioeconomic status and physical activity and may be associated with aerobic fitness and strength. It is unclear whether diet or academic performance is associated with quality of life. Objective: The purpose of this study was to compare the predictive performance of machine learning techniques with that of linear regression in examining the extent to which continuous outcomes (physical activity, aerobic fitness, muscular strength, diet, and parental education) are predictive of academic performance and quality of life and whether academic performance and quality of life are associated. Methods: We modeled data from children attending 9 schools in a quasi-experimental study. We split data randomly into training and validation sets. Curvilinear, nonlinear, and heteroscedastic variables were simulated to examine the performance of machine learning techniques compared to that of linear models, with and without imputation. Results: We included data for 1711 children. Regression models explained 24\% of academic performance variance in the real complete-case validation set, and up to 15\% in quality of life. While machine learning techniques explained high proportions of variance in training sets, in validation, machine learning techniques explained approximately 0\% of academic performance and 3\% to 8\% of quality of life. With imputation, machine learning techniques improved to 15\% for academic performance. Machine learning outperformed regression for simulated nonlinear and heteroscedastic variables. 
The best predictors of academic performance in adjusted models were the child's mother having a master-level education (P<.001; $\beta$=1.98, 95\% CI 0.25 to 3.71), increased television and computer use (P=.03; $\beta$=1.19, 95\% CI 0.25 to 3.71), and dichotomized self-reported exercise (P=.001; $\beta$=2.47, 95\% CI 1.08 to 3.87). For quality of life, self-reported exercise (P<.001; $\beta$=1.09, 95\% CI 0.53 to 1.66) and increased television and computer use (P=.002; $\beta$=-0.95, 95\% CI -1.55 to -0.36) were the best predictors. Adjusted academic performance was associated with quality of life (P=.02; $\beta$=0.12, 95\% CI 0.02 to 0.22). Conclusions: Linear regression was less prone to overfitting and outperformed commonly used machine learning techniques. Imputation improved the performance of machine learning, but not sufficiently to outperform regression. Machine learning techniques outperformed linear regression for modeling nonlinear and heteroscedastic relationships and may be of use in such cases. Regression with splines performed almost as well in nonlinear modeling. Lifestyle variables, including physical exercise, television and computer use, and parental education are predictive of academic performance or quality of life. Academic performance is associated with quality of life after adjusting for lifestyle variables and may offer another promising intervention target to improve quality of life in children. ", doi="10.2196/22021", url="https://www.jmir.org/2021/7/e22021", url="http://www.ncbi.nlm.nih.gov/pubmed/34009128" } @Article{info:doi/10.2196/22110, author="Beaubien, Louis and Conrad, Colin and Music, Janet and Toze, Sandra", title="Evaluating Simplified Web Interfaces of Risk Models for Clinical Use: Pilot Survey Study", journal="JMIR Form Res", year="2021", month="Jul", day="16", volume="5", number="7", pages="e22110", keywords="risk model", keywords="electronic records", keywords="user interface", keywords="technology acceptance", abstract="Background: In this pilot study, we investigated sociotechnical factors that affect intention to use a simplified web model to support clinical decision making. Objective: We investigated factors that are known to affect technology adoption using the unified theory of acceptance and use of technology (UTAUT2) model. The goal was to pilot and test a tool to better support complex clinical assessments. Methods: Based on the results of a previously published work, we developed a web-based mobile user interface, WebModel, to allow users to work with regression equations and their predictions to evaluate the impact of various characteristics or treatments on key outcomes (eg, survival time) for chronic obstructive pulmonary disease. The WebModel provides a way to combat information overload and more easily compare treatment options. It limits the number of web forms presented to a user to between 1 and 20, rather than the dozens of detailed calculations typically required. The WebModel uses responsive design and can be used on multiple devices. To test the WebModel, we designed a questionnaire to probe the efficacy of the WebModel and assess the usability and usefulness of the system. The study was live for one month, and participants had access to it over that time. The questionnaire was administered online, and data from 674 clinical users who had access to the WebModel were captured. SPSS and R were used for statistical analysis. Results: The regression model developed from UTAUT2 constructs was a fit. 
Specifically, five of the seven factors were significant positive coefficients in the regression: performance expectancy ($\beta$=.2730; t=7.994; P<.001), effort expectancy ($\beta$=.1473; t=3.870; P=.001), facilitating conditions ($\beta$=.1644; t=3.849; P<.001), hedonic motivation ($\beta$=.2321; t=3.991; P<.001), and habit ($\beta$=.2943; t=12.732). Social influence was not a significant factor, while price value had a significant negative influence on intention to use the WebModel. Conclusions: Our results indicate that multiple influences impact positive response to the system, many of which relate to the efficiency of the interface to provide clear information. Although we found that the price value was a negative factor, it is possible this was due to the removal of health workers from purchasing decisions. Given that this was a pilot test, and that the system was not used in a clinical setting, we could not examine factors related to actual workflow, patient safety, or social influence. This study shows that the concept of a simplified WebModel could be effective and efficient in reducing information overload in complex clinical decision making. We recommend further study to test this in a clinical setting and gather qualitative data from users regarding the value of the tool in practice. ", doi="10.2196/22110", url="https://formative.jmir.org/2021/7/e22110", url="http://www.ncbi.nlm.nih.gov/pubmed/34269692" } @Article{info:doi/10.2196/24645, author="Abbey, Joseph Enoch and Mammen, R. Jennifer S. and Soghoian, E. Samara and Cadorette, F. Maureen A. and Ariyo, Promise", title="In-hospital Mortality and the Predictive Ability of the Modified Early Warning Score in Ghana: Single-Center, Retrospective Study", journal="JMIRx Med", year="2021", month="Jul", day="12", volume="2", number="3", pages="e24645", keywords="modified early warning score", keywords="MEWS", keywords="AVPU scale", keywords="Korle-Bu Teaching Hospital", keywords="KBTH", keywords="Ghana", keywords="critical care", keywords="vital signs", keywords="global health", abstract="Background: The modified early warning score (MEWS) is an objective measure of illness severity that promotes early recognition of clinical deterioration in critically ill patients. Its primary use is to facilitate faster intervention or increase the level of care. Despite its adoption in some African countries, MEWS is not standard of care in Ghana. In order to facilitate the use of such a tool, we assessed whether MEWS, or a combination of the more limited data that are routinely collected in current clinical practice, can be used to predict mortality among critically ill inpatients at the Korle-Bu Teaching Hospital in Accra, Ghana. Objective: The aim of this study was to identify the predictive ability of MEWS for medical inpatients at risk of mortality and its comparability to a measure combining routinely measured physiologic parameters (limited MEWS [LMEWS]). Methods: We conducted a retrospective study of medical inpatients, aged ≥13 years and admitted to the Korle-Bu Teaching Hospital from January 2017 to March 2019. Routine vital signs at 48 hours post admission were coded to obtain LMEWS values. The level of consciousness was imputed from medical records and combined with LMEWS to obtain the full MEWS value. 
A predictive model comparing mortality among patients with a significant MEWS value or LMEWS ≥4 versus a nonsignificant MEWS value or LMEWS <4 was designed using multiple logistic regression and internally validated for predictive accuracy, using the receiver operating characteristic (ROC) curve. Results: A total of 112 patients were included in the study. The adjusted odds of death comparing patients with a significant MEWS to patients with a nonsignificant MEWS was 6.33 (95\% CI 1.96-20.48). Similarly, the adjusted odds of death comparing patients with a significant versus nonsignificant LMEWS value was 8.22 (95\% CI 2.45-27.56). The ROC curve for each analysis had a C-statistic of 0.83 and 0.84, respectively. Conclusions: LMEWS is a good predictor of mortality and comparable to MEWS. Adoption of LMEWS can be implemented now using currently available data to identify medical inpatients at risk of death in order to improve care. ", doi="10.2196/24645", url="https://xmed.jmir.org/2021/3/e24645", url="http://www.ncbi.nlm.nih.gov/pubmed/37725551" } @Article{info:doi/10.2196/26151, author="Nikolov, Stanislav and Blackwell, Sam and Zverovitch, Alexei and Mendes, Ruheena and Livne, Michelle and De Fauw, Jeffrey and Patel, Yojan and Meyer, Clemens and Askham, Harry and Romera-Paredes, Bernadino and Kelly, Christopher and Karthikesalingam, Alan and Chu, Carlton and Carnell, Dawn and Boon, Cheng and D'Souza, Derek and Moinuddin, Ali Syed and Garie, Bethany and McQuinlan, Yasmin and Ireland, Sarah and Hampton, Kiarna and Fuller, Krystle and Montgomery, Hugh and Rees, Geraint and Suleyman, Mustafa and Back, Trevor and Hughes, Owen C{\'i}an and Ledsam, R. Joseph and Ronneberger, Olaf", title="Clinically Applicable Segmentation of Head and Neck Anatomy for Radiotherapy: Deep Learning Algorithm Development and Validation Study", journal="J Med Internet Res", year="2021", month="Jul", day="12", volume="23", number="7", pages="e26151", keywords="radiotherapy", keywords="segmentation", keywords="contouring", keywords="machine learning", keywords="artificial intelligence", keywords="UNet", keywords="convolutional neural networks", keywords="surface DSC", abstract="Background: Over half a million individuals are diagnosed with head and neck cancer each year globally. Radiotherapy is an important curative treatment for this disease, but it requires manual time to delineate radiosensitive organs at risk. This planning process can delay treatment while also introducing interoperator variability, resulting in downstream radiation dose differences. Although auto-segmentation algorithms offer a potentially time-saving solution, the challenges in defining, quantifying, and achieving expert performance remain. Objective: Adopting a deep learning approach, we aim to demonstrate a 3D U-Net architecture that achieves expert-level performance in delineating 21 distinct head and neck organs at risk commonly segmented in clinical practice. Methods: The model was trained on a data set of 663 deidentified computed tomography scans acquired in routine clinical practice and with both segmentations taken from clinical practice and segmentations created by experienced radiographers as part of this research, all in accordance with consensus organ at risk definitions. Results: We demonstrated the model's clinical applicability by assessing its performance on a test set of 21 computed tomography scans from clinical practice, each with 21 organs at risk segmented by 2 independent experts. 
We also introduced surface Dice similarity coefficient, a new metric for the comparison of organ delineation, to quantify the deviation between organ at risk surface contours rather than volumes, better reflecting the clinical task of correcting errors in automated organ segmentations. The model's generalizability was then demonstrated on 2 distinct open-source data sets, reflecting different centers and countries to model training. Conclusions: Deep learning is an effective and clinically applicable technique for the segmentation of the head and neck anatomy for radiotherapy. With appropriate validation studies and regulatory approvals, this system could improve the efficiency, consistency, and safety of radiotherapy pathways. ", doi="10.2196/26151", url="https://www.jmir.org/2021/7/e26151", url="http://www.ncbi.nlm.nih.gov/pubmed/34255661" } @Article{info:doi/10.2196/29514, author="Wang, M. Joshua and Liu, Wenke and Chen, Xiaoshan and McRae, P. Michael and McDevitt, T. John and Feny{\"o}, David", title="Predictive Modeling of Morbidity and Mortality in Patients Hospitalized With COVID-19 and its Clinical Implications: Algorithm Development and Interpretation", journal="J Med Internet Res", year="2021", month="Jul", day="9", volume="23", number="7", pages="e29514", keywords="COVID-19", keywords="coronavirus", keywords="SARS-CoV-2", keywords="predictive modeling", keywords="New York City", keywords="prediction", keywords="model", keywords="machine learning", keywords="morbidity", keywords="mortality", keywords="hospital", keywords="marker", keywords="severity", keywords="symptom", keywords="decision making", keywords="outcome", abstract="Background: The COVID-19 pandemic began in early 2020 and placed significant strains on health care systems worldwide. There remains a compelling need to analyze factors that are predictive for patients at elevated risk of morbidity and mortality. Objective: The goal of this retrospective study of patients who tested positive with COVID-19 and were treated at NYU (New York University) Langone Health was to identify clinical markers predictive of disease severity in order to assist in clinical decision triage and to provide additional biological insights into disease progression. Methods: The clinical activity of 3740 patients at NYU Langone Hospital was obtained between January and August 2020; patient data were deidentified. Models were trained on clinical data during different parts of their hospital stay to predict three clinical outcomes: deceased, ventilated, or admitted to the intensive care unit (ICU). Results: The XGBoost (eXtreme Gradient Boosting) model that was trained on clinical data from the final 24 hours excelled at predicting mortality (area under the curve [AUC]=0.92; specificity=86\%; and sensitivity=85\%). Respiration rate was the most important feature, followed by SpO2 (peripheral oxygen saturation) and being aged 75 years and over. Performance of this model to predict the deceased outcome extended 5 days prior, with AUC=0.81, specificity=70\%, and sensitivity=75\%. When only using clinical data from the first 24 hours, AUCs of 0.79, 0.80, and 0.77 were obtained for deceased, ventilated, or ICU-admitted outcomes, respectively. Although respiration rate and SpO2 levels offered the highest feature importance, other canonical markers, including diabetic history, age, and temperature, offered minimal gain. 
When lab values were incorporated, prediction of mortality benefited the most from blood urea nitrogen and lactate dehydrogenase (LDH). Features that were predictive of morbidity included LDH, calcium, glucose, and C-reactive protein. Conclusions: Together, this work summarizes efforts to systematically examine the importance of a wide range of features across different endpoint outcomes and at different hospitalization time points. ", doi="10.2196/29514", url="https://www.jmir.org/2021/7/e29514", url="http://www.ncbi.nlm.nih.gov/pubmed/34081611" } @Article{info:doi/10.2196/29986, author="Lee, Y. Jewel and Molani, Sevda and Fang, Chen and Jade, Kathleen and O'Mahony, Shane D. and Kornilov, A. Sergey and Mico, T. Lindsay and Hadlock, J. Jennifer", title="Ambulatory Risk Models for the Long-Term Prevention of Sepsis: Retrospective Study", journal="JMIR Med Inform", year="2021", month="Jul", day="8", volume="9", number="7", pages="e29986", keywords="sepsis", keywords="machine learning", keywords="electronic health records", keywords="risk prediction", keywords="clinical decision making", keywords="prevention", keywords="risk factors", abstract="Background: Sepsis is a life-threatening condition that can rapidly lead to organ damage and death. Existing risk scores predict outcomes for patients who have already become acutely ill. Objective: We aimed to develop a model for identifying patients at risk of getting sepsis within 2 years in order to support the reduction of sepsis morbidity and mortality. Methods: Machine learning was applied to 2,683,049 electronic health records (EHRs) with over 64 million encounters across five states to develop models for predicting a patient's risk of getting sepsis within 2 years. Features were selected to be easily obtainable from a patient's chart in real time during ambulatory encounters. Results: The models showed consistent prediction scores, with the highest area under the receiver operating characteristic curve of 0.82 and a positive likelihood ratio of 2.9 achieved with gradient boosting on all features combined. Predictive features included age, sex, ethnicity, average ambulatory heart rate, standard deviation of BMI, and the number of prior medical conditions and procedures. The findings identified both known and potential new risk factors for long-term sepsis. Model variations also illustrated trade-offs between incrementally higher accuracy, implementability, and interpretability. Conclusions: Accurate implementable models were developed to predict the 2-year risk of sepsis, using EHR data that is easy to obtain from ambulatory encounters. These results help advance the understanding of sepsis and provide a foundation for future trials of risk-informed preventive care. ", doi="10.2196/29986", url="https://medinform.jmir.org/2021/7/e29986", url="http://www.ncbi.nlm.nih.gov/pubmed/34086596" } @Article{info:doi/10.2196/27532, author="Holdsworth, M. Laura and Kling, R. Samantha M. and Smith, Margaret and Safaeinili, Nadia and Shieh, Lisa and Vilendrer, Stacie and Garvert, W. Donn and Winget, Marcy and Asch, M. Steven and Li, C. 
Ron", title="Predicting and Responding to Clinical Deterioration in Hospitalized Patients by Using Artificial Intelligence: Protocol for a Mixed Methods, Stepped Wedge Study", journal="JMIR Res Protoc", year="2021", month="Jul", day="7", volume="10", number="7", pages="e27532", keywords="artificial intelligence", keywords="clinical deterioration", keywords="rapid response team", keywords="mixed methods", keywords="workflow", keywords="predictive models, SEIPS 2.0", abstract="Background: The early identification of clinical deterioration in patients in hospital units can decrease mortality rates and improve other patient outcomes; yet, this remains a challenge in busy hospital settings. Artificial intelligence (AI), in the form of predictive models, is increasingly being explored for its potential to assist clinicians in predicting clinical deterioration. Objective: Using the Systems Engineering Initiative for Patient Safety (SEIPS) 2.0 model, this study aims to assess whether an AI-enabled work system improves clinical outcomes, describe how the clinical deterioration index (CDI) predictive model and associated work processes are implemented, and define the emergent properties of the AI-enabled work system that mediate the observed clinical outcomes. Methods: This study will use a mixed methods approach that is informed by the SEIPS 2.0 model to assess both processes and outcomes and focus on how physician-nurse clinical teams are affected by the presence of AI. The intervention will be implemented in hospital medicine units based on a modified stepped wedge design featuring three stages over 11 months---stage 0 represents a baseline period 10 months before the implementation of the intervention; stage 1 introduces the CDI predictions to physicians only and triggers a physician-driven workflow; and stage 2 introduces the CDI predictions to the multidisciplinary team, which includes physicians and nurses, and triggers a nurse-driven workflow. Quantitative data will be collected from the electronic health record for the clinical processes and outcomes. Interviews will be conducted with members of the multidisciplinary team to understand how the intervention changes the existing work system and processes. The SEIPS 2.0 model will provide an analytic framework for a mixed methods analysis. Results: A pilot period for the study began in December 2020, and the results are expected in mid-2022. Conclusions: This protocol paper proposes an approach to evaluation that recognizes the importance of assessing both processes and outcomes to understand how a multifaceted AI-enabled intervention affects the complex team-based work of identifying and managing clinical deterioration. 
International Registered Report Identifier (IRRID): PRR1-10.2196/27532 ", doi="10.2196/27532", url="https://www.researchprotocols.org/2021/7/e27532", url="http://www.ncbi.nlm.nih.gov/pubmed/34255728" } @Article{info:doi/10.2196/26393, author="Du, Jian and Chen, Ting and Zhang, Luxia", title="Measuring the Interactions Between Health Demand, Informatics Supply, and Technological Applications in Digital Medical Innovation for China: Content Mapping and Analysis", journal="JMIR Med Inform", year="2021", month="Jul", day="6", volume="9", number="7", pages="e26393", keywords="medical informatics", keywords="Medical Subject Headings (MeSH)", keywords="health demand", keywords="informatics supply", keywords="technological applications", abstract="Background: There were 2 major incentives introduced by the Chinese government to promote medical informatics in 2009 and 2016. As new drugs are the major source of medical innovation, informatics-related concepts and techniques are a major source of digital medical innovation. However, it is unclear whether the research efforts of medical informatics in China have met the health needs, such as disease management and population health. Objective: We proposed an approach to mapping the interplay between different knowledge entities by using the tree structure of Medical Subject Headings (MeSH) to gain insights into the interactions between informatics supply, health demand, and technological applications in digital medical innovation in China. Methods: All terms under the MeSH tree parent node ``Diseases [C]'' or node ``Health [N01.400]'' or ``Public Health [N06.850]'' were labelled as H. All terms under the node ``Information Science [L]'' were labelled as I, and all terms under node ``Analytical, Diagnostic and Therapeutic Techniques, and Equipment [E]'' were labelled as T. The H-I-T interactions can be measured by using their co-occurrences in a given publication. Results: The H-I-T interactions in China are showing significant growth, and a more concentrated interplay was observed. Computing methodologies, informatics, and communications media (such as social media and the internet) constitute the majority of I-related concepts and techniques used for resolving the health promotion and diseases management problems in China. Generally there is a positive correlation between the burden and informatics research efforts for diseases in China. We think it is not contradictory that informatics research should be focused on the greatest burden of diseases or where it can have the most impact. Artificial intelligence is a competing field of medical informatics research in China, with a notable focus on diagnostic deep learning algorithms for medical imaging. Conclusions: It is suggested that technological transfers, namely the functionality to be realized by medical/health informatics (eg, diagnosis, therapeutics, surgical procedures, laboratory testing techniques, and equipment and supplies) should be strengthened. Research on natural language processing and electronic health records should also be strengthened to improve the real-world applications of health information technologies and big data in the future. 
", doi="10.2196/26393", url="https://medinform.jmir.org/2021/7/e26393", url="http://www.ncbi.nlm.nih.gov/pubmed/34255693" } @Article{info:doi/10.2196/24796, author="Muro, Shigeo and Ishida, Masato and Horie, Yoshiharu and Takeuchi, Wataru and Nakagawa, Shunki and Ban, Hideyuki and Nakagawa, Tohru and Kitamura, Tetsuhisa", title="Machine Learning Methods for the Diagnosis of Chronic Obstructive Pulmonary Disease in Healthy Subjects: Retrospective Observational Cohort Study", journal="JMIR Med Inform", year="2021", month="Jul", day="6", volume="9", number="7", pages="e24796", keywords="chronic obstructive pulmonary disease", keywords="airflow limitation", keywords="medical check-up", keywords="Gradient Boosting Decision Tree", keywords="logistic regression", abstract="Background: Airflow limitation is a critical physiological feature in chronic obstructive pulmonary disease (COPD), for which long-term exposure to noxious substances, including tobacco smoke, is an established risk. However, not all long-term smokers develop COPD, meaning that other risk factors exist. Objective: This study aimed to predict the risk factors for COPD diagnosis using machine learning in an annual medical check-up database. Methods: In this retrospective observational cohort study (ARTDECO [Analysis of Risk Factors to Detect COPD]), annual medical check-up records for all Hitachi Ltd employees in Japan collected from April 1998 to March 2019 were analyzed. Employees who provided informed consent via an opt-out model were screened and those aged 30 to 75 years without a prior diagnosis of COPD/asthma or a history of cancer were included. The database included clinical measurements (eg, pulmonary function tests) and questionnaire responses. To predict the risk factors for COPD diagnosis within a 3-year period, the Gradient Boosting Decision Tree machine learning (XGBoost) method was applied as a primary approach, with logistic regression as a secondary method. A diagnosis of COPD was made when the ratio of the prebronchodilator forced expiratory volume in 1 second (FEV1) to prebronchodilator forced vital capacity (FVC) was <0.7 during two consecutive examinations. Results: Of the 26,101 individuals screened, 1213 met the exclusion criteria, and thus, 24,815 individuals were included in the analysis. The top 10 predictors for COPD diagnosis were FEV1/FVC, smoking status, allergic symptoms, cough, pack years, hemoglobin A1c, serum albumin, mean corpuscular volume, percent predicted vital capacity, and percent predicted value of FEV1. The areas under the receiver operating characteristic curves of the XGBoost model and the logistic regression model were 0.956 and 0.943, respectively. Conclusions: Using a machine learning model in this longitudinal database, we identified a number of parameters as risk factors other than smoking exposure or lung function to support general practitioners and occupational health physicians to predict the development of COPD. Further research to confirm our results is warranted, as our analysis involved a database used only in Japan. 
", doi="10.2196/24796", url="https://medinform.jmir.org/2021/7/e24796", url="http://www.ncbi.nlm.nih.gov/pubmed/34255684" } @Article{info:doi/10.2196/28361, author="Kim, Woong Ji and Ha, Juhyung and Kim, Taerim and Yoon, Hee and Hwang, Yeon Sung and Jo, Joon Ik and Shin, Gun Tae and Sim, Seob Min and Kim, Kyunga and Cha, Chul Won", title="Developing a Time-Adaptive Prediction Model for Out-of-Hospital Cardiac Arrest: Nationwide Cohort Study in Korea", journal="J Med Internet Res", year="2021", month="Jul", day="5", volume="23", number="7", pages="e28361", keywords="out-of-hospital cardiac arrest", keywords="Republic of Korea", keywords="machine learning", keywords="artificial intelligence", keywords="prognosis", keywords="cardiology", keywords="prediction model", abstract="Background: Out-of-hospital cardiac arrest (OHCA) is a serious public health issue, and predicting the prognosis of OHCA patients can assist clinicians in making decisions about the treatment of patients, use of hospital resources, or termination of resuscitation. Objective: This study aimed to develop a time-adaptive conditional prediction model (TACOM) to predict clinical outcomes every minute. Methods: We performed a retrospective observational study using data from the Korea OHCA Registry in South Korea. In this study, we excluded patients with trauma, those who experienced return of spontaneous circulation before arriving in the emergency department (ED), and those who did not receive cardiopulmonary resuscitation (CPR) in the ED. We selected patients who received CPR in the ED. To develop the time-adaptive prediction model, we organized the training data set as ongoing CPR patients by the minute. A total of 49,669 patients were divided into 39,602 subjects for training and 10,067 subjects for validation. We compared random forest, LightGBM, and artificial neural networks as the prediction model methods. Model performance was quantified using the prediction probability of the model, area under the receiver operating characteristic curve (AUROC), and area under the precision recall curve. Results: Among the three algorithms, LightGBM showed the best performance. From 0 to 30 min, the AUROC of the TACOM for predicting good neurological outcomes ranged from 0.910 (95\% CI 0.910-0.911) to 0.869 (95\% CI 0.865-0.871), whereas that for survival to hospital discharge ranged from 0.800 (95\% CI 0.797-0.800) to 0.734 (95\% CI 0.736-0.740). The prediction probability of the TACOM showed similar flow with cohort data based on a comparison with the conventional model's prediction probability. Conclusions: The TACOM predicted the clinical outcome of OHCA patients per minute. This model for predicting patient outcomes by the minute can assist clinicians in making rational decisions for OHCA patients. ", doi="10.2196/28361", url="https://www.jmir.org/2021/7/e28361/", url="http://www.ncbi.nlm.nih.gov/pubmed/36260382" } @Article{info:doi/10.2196/23863, author="Wu, Jo-Hsuan and Liu, Alvin T. Y. 
and Hsu, Wan-Ting and Ho, Hui-Chun Jennifer and Lee, Chien-Chang", title="Performance and Limitation of Machine Learning Algorithms for Diabetic Retinopathy Screening: Meta-analysis", journal="J Med Internet Res", year="2021", month="Jul", day="5", volume="23", number="7", pages="e23863", keywords="machine learning", keywords="diabetic retinopathy", keywords="diabetes", keywords="deep learning", keywords="neural network", keywords="diagnostic accuracy", abstract="Background: Diabetic retinopathy (DR), whose standard diagnosis is performed by human experts, has high prevalence and requires a more efficient screening method. Although machine learning (ML)--based automated DR diagnosis has gained attention due to recent approval of IDx-DR, performance of this tool has not been examined systematically, and the best ML technique for use in a real-world setting has not been discussed. Objective: The aim of this study was to systematically examine the overall diagnostic accuracy of ML in diagnosing DR of different categories based on color fundus photographs and to determine the state-of-the-art ML approach. Methods: Published studies in PubMed and EMBASE were searched from inception to June 2020. Studies were screened for relevant outcomes, publication types, and data sufficiency, and a total of 60 out of 2128 (2.82\%) studies were retrieved after study selection. Extraction of data was performed by 2 authors according to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses), and the quality assessment was performed according to the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2). Meta-analysis of diagnostic accuracy was pooled using a bivariate random effects model. The main outcomes included diagnostic accuracy, sensitivity, and specificity of ML in diagnosing DR based on color fundus photographs, as well as the performances of different major types of ML algorithms. Results: The primary meta-analysis included 60 color fundus photograph studies (445,175 interpretations). Overall, ML demonstrated high accuracy in diagnosing DR of various categories, with a pooled area under the receiver operating characteristic (AUROC) ranging from 0.97 (95\% CI 0.96-0.99) to 0.99 (95\% CI 0.98-1.00). The performance of ML in detecting more-than-mild DR was robust (sensitivity 0.95; AUROC 0.97), and by subgroup analyses, we observed that robust performance of ML was not limited to benchmark data sets (sensitivity 0.92; AUROC 0.96) but could be generalized to images collected in clinical practice (sensitivity 0.97; AUROC 0.97). Neural network was the most widely used method, and the subgroup analysis revealed a pooled AUROC of 0.98 (95\% CI 0.96-0.99) for studies that used neural networks to diagnose more-than-mild DR. Conclusions: This meta-analysis demonstrated high diagnostic accuracy of ML algorithms in detecting DR on color fundus photographs, suggesting that state-of-the-art, ML-based DR screening algorithms are likely ready for clinical applications. However, a significant portion of the earlier published studies had methodology flaws, such as the lack of external validation and presence of spectrum bias. The results of these studies should be interpreted with caution. ", doi="10.2196/23863", url="https://www.jmir.org/2021/7/e23863", url="http://www.ncbi.nlm.nih.gov/pubmed/34407500" } @Article{info:doi/10.2196/29631, author="Keim-Malpass, Jessica and Ratcliffe, J. Sarah and Moorman, P. Liza and Clark, T. Matthew and Krahn, N. Katy and Monfredi, J. 
Oliver and Hamil, Susan and Yousefvand, Gholamreza and Moorman, Randall J. and Bourque, M. Jamieson", title="Predictive Monitoring--Impact in Acute Care Cardiology Trial (PM-IMPACCT): Protocol for a Randomized Controlled Trial", journal="JMIR Res Protoc", year="2021", month="Jul", day="2", volume="10", number="7", pages="e29631", keywords="predictive analytics monitoring", keywords="AI", keywords="randomized controlled trial", keywords="risk estimation", keywords="clinical deterioration", keywords="visual analytics", keywords="artificial intelligence", keywords="monitoring", keywords="risk", keywords="prediction", keywords="impact", keywords="cardiology", keywords="acute care", abstract="Background: Patients in acute care wards who deteriorate and are emergently transferred to intensive care units (ICUs) have poor outcomes. Early identification of patients who are decompensating might allow for earlier clinical intervention and reduced morbidity and mortality. Advances in bedside continuous predictive analytics monitoring (ie, artificial intelligence [AI]--based risk prediction) have made complex data easily available to health care providers and have provided early warning of potentially catastrophic clinical events. We present a dynamic, visual, predictive analytics monitoring tool that integrates real-time bedside telemetric physiologic data into robust clinical models to estimate and communicate risk of imminent events. This tool, Continuous Monitoring of Event Trajectories (CoMET), has been shown in retrospective observational studies to predict clinical decompensation on the acute care ward. There is a need to more definitively study this advanced predictive analytics or AI monitoring system in a prospective, randomized controlled, clinical trial. Objective: The goal of this trial is to determine the impact of an AI-based visual risk analytic, CoMET, on improving patient outcomes related to clinical deterioration, response time to proactive clinical action, and costs to the health care system. Methods: We propose a cluster randomized controlled trial to test the impact of using the CoMET display in an acute care cardiology and cardiothoracic surgery hospital floor. The number of admissions to a room undergoing cluster randomization was estimated to be 10,424 over the 20-month study period. Cluster randomization based on bed number will occur every 2 months. The intervention cluster will have the CoMET score displayed (along with standard of care), while the usual care group will receive standard of care only. Results: The primary outcome will be hours free from events of clinical deterioration. Hours of acute clinical events are defined as time when one or more of the following occur: emergent ICU transfer, emergent surgery prior to ICU transfer, cardiac arrest prior to ICU transfer, emergent intubation, or death. The clinical trial began randomization in January 2021. Conclusions: Very few AI-based health analytics have been translated from algorithm to real-world use. This study will use robust, prospective, randomized controlled, clinical trial methodology to assess the effectiveness of an advanced AI predictive analytics monitoring system in incorporating real-time telemetric data for identifying clinical deterioration on acute care wards. 
This analysis will strengthen the ability of health care organizations to evolve as learning health systems, in which bioinformatics data are applied to improve patient outcomes by incorporating AI into knowledge tools that are successfully integrated into clinical practice by health care providers. Trial Registration: ClinicalTrials.gov NCT04359641; https://clinicaltrials.gov/ct2/show/NCT04359641 International Registered Report Identifier (IRRID): DERR1-10.2196/29631 ", doi="10.2196/29631", url="https://www.researchprotocols.org/2021/7/e29631", url="http://www.ncbi.nlm.nih.gov/pubmed/34043525" } @Article{info:doi/10.2196/26946, author="Lee, Hung-Yi Andy and Aaronson, Emily and Hibbert, A. Kathryn and Flynn, H. Micah and Rutkey, Hayley and Mort, Elizabeth and Sonis, D. Jonathan and Safavi, C. Kyan", title="Design and Implementation of a Real-time Monitoring Platform for Optimal Sepsis Care in an Emergency Department: Observational Cohort Study", journal="J Med Internet Res", year="2021", month="Jun", day="24", volume="23", number="6", pages="e26946", keywords="electronic monitoring platform", keywords="sepsis", keywords="quality improvement", abstract="Background: Sepsis is the leading cause of death in US hospitals. Compliance with bundled care, specifically serial lactates, blood cultures, and antibiotics, improves outcomes but is often delayed or missed altogether in a busy practice environment. Objective: This study aims to design, implement, and validate a novel monitoring and alerting platform that provides real-time feedback to frontline emergency department (ED) providers regarding adherence to bundled care. Methods: This single-center, prospective, observational study was conducted in three phases: the design and technical development phase to build an initial version of the platform; the pilot phase to test and refine the platform in the clinical setting; and the postpilot rollout phase to fully implement the study intervention. Results: During the design and technical development, study team members and stakeholders identified the criteria for patient inclusion, selected bundle measures from the Center for Medicare and Medicaid Sepsis Core Measure for alerting, and defined alert thresholds, message content, delivery mechanisms, and recipients. Additional refinements were made based on 70 provider survey results during the pilot phase, including removing alerts for vasopressor initiation and modifying text in the pages to facilitate patient identification. During the 48 days of the postpilot rollout phase, 15,770 ED encounters were tracked and 711 patient encounters were included in the active monitoring cohort. In total, 634 pages were sent at a rate of 0.98 per attending physician shift. Overall, 38.3\% (272/711) of patients had at least one page. The missing bundle elements that triggered alerts included: antibiotics 41.6\% (136/327), repeat lactate 32.4\% (106/327), blood cultures 20.8\% (68/327), and initial lactate 5.2\% (17/327). Of the missing Sepsis Core Measures elements for which a page was sent, 38.2\% (125/327) were successfully completed on time. Conclusions: A real-time sepsis care monitoring and alerting platform was created for the ED environment. The high proportion of patients with at least one alert suggested the significant potential for such a platform to improve care, whereas the overall number of alerts per clinician suggested a low risk of alarm fatigue. 
The study intervention warrants a more rigorous evaluation to ensure that the added alerts lead to better outcomes for patients with sepsis. ", doi="10.2196/26946", url="https://www.jmir.org/2021/6/e26946/", url="http://www.ncbi.nlm.nih.gov/pubmed/34185009" } @Article{info:doi/10.2196/25800, author="Plasek, Joseph and Weissert, John and Downs, Tracy and Richards, Kyle and Ravvaz, Kourosh", title="Clinicopathological Criteria Predictive of Recurrence Following Bacillus Calmette-Gu{\'e}rin Therapy Initiation in Non--Muscle-Invasive Bladder Cancer: Retrospective Cohort Study", journal="JMIR Cancer", year="2021", month="Jun", day="22", volume="7", number="2", pages="e25800", keywords="urinary bladder neoplasms", keywords="risk factor", keywords="bacillus Calmette-Gu{\'e}rin", keywords="recurrence", abstract="Background: Bacillus Calmette-Gu{\'e}rin (BCG) is currently the most clinically effective intravesical treatment for non--muscle-invasive bladder cancer (NMIBC), particularly for patients with high-risk NMIBC such as those with carcinoma in situ. BCG treatments could be optimized to improve patient safety and conserve supply by predicting BCG efficacy based on tumor characteristics or clinicopathological criteria. Objective: The aim of this study is to assess the ability of specific clinicopathological criteria to predict tumor recurrence in patients with NMIBC who received BCG therapy along various treatment timelines. Methods: A total of 1331 patients (stage Ta, T1, or carcinoma in situ) who underwent transurethral resection of a bladder tumor between 2006 and 2017 were included. Univariate analysis, including laboratory tests (eg, complete blood panels, creatinine levels, and hemoglobin A1c levels) within 180 days of BCG therapy initiation, medications, and clinical and demographic variables to assess their ability to predict NMIBC recurrence, was completed. This was followed by multivariate regression that included the elements of the Club Urol{\'o}gico Espa{\~n}ol de Tratamiento Oncol{\'o}gico (CUETO) scoring model and variables that were significant predictors of recurrence in univariate analysis. Results: BCG was administered to 183 patients classified as intermediate or high risk, and 76 (41.5\%) experienced disease recurrence. An abnormal neutrophil-to-lymphocyte ratio measured within 180 days of induction BCG therapy was a significant predictor (P=.047) of future cancer recurrence and was a stronger predictor than the CUETO score or the individual variables included in the CUETO scoring model through multivariate analysis. Conclusions: An abnormal neutrophil-to-lymphocyte ratio within 180 days of BCG therapy initiation is predictive of recurrence and could be suggestive of additional or alternative interventions. 
", doi="10.2196/25800", url="https://cancer.jmir.org/2021/2/e25800", url="http://www.ncbi.nlm.nih.gov/pubmed/34156341" } @Article{info:doi/10.2196/25124, author="Ferreira-Santos, Daniela and Rodrigues, Pereira Pedro", title="Enhancing Obstructive Sleep Apnea Diagnosis With Screening Through Disease Phenotypes: Algorithm Development and Validation", journal="JMIR Med Inform", year="2021", month="Jun", day="22", volume="9", number="6", pages="e25124", keywords="obstructive sleep apnea", keywords="screening", keywords="risk factors", keywords="phenotypes", keywords="Bayesian network classifiers", abstract="Background: The American Academy of Sleep Medicine guidelines suggest that clinical prediction algorithms can be used in patients with obstructive sleep apnea (OSA) without replacing polysomnography, which is the gold standard. Objective: This study aims to develop a clinical decision support system for OSA diagnosis according to its standard definition (apnea-hypopnea index plus symptoms), identifying individuals with high pretest probability based on risk and diagnostic factors. Methods: A total of 47 predictive variables were extracted from a cohort of patients who underwent polysomnography. A total of 14 variables that were univariately significant were then used to compute the distance between patients with OSA, defining a hierarchical clustering structure from which patient phenotypes were derived and described. Affinity from individuals at risk of OSA phenotypes was later computed, and cluster membership was used as an additional predictor in a Bayesian network classifier (model B). Results: A total of 318 patients at risk were included, of whom 207 (65.1\%) individuals were diagnosed with OSA (111, 53.6\% with mild; 50, 24.2\% with moderate; and 46, 22.2\% with severe). On the basis of predictive variables, 3 phenotypes were defined (74/207, 35.7\% low; 104/207, 50.2\% medium; and 29/207, 14.1\% high), with an increasing prevalence of symptoms and comorbidities, the latter describing older and obese patients, and a substantial increase in some comorbidities, suggesting their beneficial use as combined predictors (median apnea-hypopnea indices of 10, 14, and 31, respectively). Cross-validation results demonstrated that the inclusion of OSA phenotypes as an adjusting predictor in a Bayesian classifier improved screening specificity (26\%, 95\% CI 24-29, to 38\%, 95\% CI 35-40) while maintaining a high sensitivity (93\%, 95\% CI 91-95), with model B doubling the diagnostic model effectiveness (diagnostic odds ratio of 8.14). Conclusions: Defined OSA phenotypes are a sensitive tool that enhances our understanding of the disease and allows the derivation of a predictive algorithm that can clearly outperform symptom-based guideline recommendations as a rule-out approach for screening. ", doi="10.2196/25124", url="https://medinform.jmir.org/2021/6/e25124", url="http://www.ncbi.nlm.nih.gov/pubmed/34156340" } @Article{info:doi/10.2196/26391, author="Nichol, A. Ariadne and Batten, N. Jason and Halley, C. Meghan and Axelrod, K. Julia and Sankar, L. Pamela and Cho, K. 
Mildred", title="A Typology of Existing Machine Learning--Based Predictive Analytic Tools Focused on Reducing Costs and Improving Quality in Health Care: Systematic Search and Content Analysis", journal="J Med Internet Res", year="2021", month="Jun", day="22", volume="23", number="6", pages="e26391", keywords="machine learning", keywords="artificial intelligence", keywords="ethics", keywords="regulation", keywords="health care quality", keywords="costs", abstract="Background: Considerable effort has been devoted to the development of artificial intelligence, including machine learning--based predictive analytics (MLPA) for use in health care settings. The growth of MLPA could be fueled by payment reforms that hold health care organizations responsible for providing high-quality, cost-effective care. Policy analysts, ethicists, and computer scientists have identified unique ethical and regulatory challenges from the use of MLPA in health care. However, little is known about the types of MLPA health care products available on the market today or their stated goals. Objective: This study aims to better characterize available MLPA health care products, identifying and characterizing claims about products recently or currently in use in US health care settings that are marketed as tools to improve health care efficiency by improving quality of care while reducing costs. Methods: We conducted systematic database searches of relevant business news and academic research to identify MLPA products for health care efficiency meeting our inclusion and exclusion criteria. We used content analysis to generate MLPA product categories and characterize the organizations marketing the products. Results: We identified 106 products and characterized them based on publicly available information in terms of the types of predictions made and the size, type, and clinical training of the leadership of the companies marketing them. We identified 5 categories of predictions made by MLPA products based on publicly available product marketing materials: disease onset and progression, treatment, cost and utilization, admissions and readmissions, and decompensation and adverse events. Conclusions: Our findings provide a foundational reference to inform the analysis of specific ethical and regulatory challenges arising from the use of MLPA to improve health care efficiency. ", doi="10.2196/26391", url="https://www.jmir.org/2021/6/e26391", url="http://www.ncbi.nlm.nih.gov/pubmed/34156338" } @Article{info:doi/10.2196/26139, author="Woodman, John Richard and Bryant, Kimberley and Sorich, J. Michael and Pilotto, Alberto and Mangoni, Aleksander Arduino", title="Use of Multiprognostic Index Domain Scores, Clinical Data, and Machine Learning to Improve 12-Month Mortality Risk Prediction in Older Hospitalized Patients: Prospective Cohort Study", journal="J Med Internet Res", year="2021", month="Jun", day="21", volume="23", number="6", pages="e26139", keywords="machine learning", keywords="Multidimensional Prognostic Index", keywords="mortality", keywords="diagnostic accuracy", keywords="XGBoost", abstract="Background: The Multidimensional Prognostic Index (MPI) is an aggregate, comprehensive, geriatric assessment scoring system derived from eight domains that predict adverse outcomes, including 12-month mortality. However, the prediction accuracy of using the three MPI categories (mild, moderate, and severe risk) was relatively poor in a study of older hospitalized Australian patients. 
Prediction modeling using the component domains of the MPI together with additional clinical features and machine learning (ML) algorithms might improve prediction accuracy. Objective: This study aims to assess whether the accuracy of prediction for 12-month mortality using logistic regression with maximum likelihood estimation (LR-MLE) with the 3-category MPI together with age and gender (feature set 1) can be improved with the addition of 10 clinical features (sodium, hemoglobin, albumin, creatinine, urea, urea-to-creatinine ratio, estimated glomerular filtration rate, C-reactive protein, BMI, and anticholinergic risk score; feature set 2) and the replacement of the 3-category MPI in feature sets 1 and 2 with the eight separate MPI domains (feature sets 3 and 4, respectively), and to assess the prediction accuracy of the ML algorithms using the same feature sets. Methods: MPI and clinical features were collected from patients aged 65 years and above who were admitted to either the general medical or acute care of the elderly wards of a South Australian hospital between September 2015 and February 2017. The diagnostic accuracy of LR-MLE was assessed together with nine ML algorithms: decision trees, random forests, extreme gradient boosting (XGBoost), support-vector machines, na{\"i}ve Bayes, K-nearest neighbors, ridge regression, logistic regression without regularization, and neural networks. A 70:30 training set:test set split of the data and a grid search of hyper-parameters with 10-fold cross-validation were used during model training. The area under the curve was used as the primary measure of accuracy. Results: A total of 737 patients (female: 370/737, 50.2\%; male: 367/737, 49.8\%) with a median age of 80 (IQR 72-86) years had complete MPI data recorded on admission and had completed the 12-month follow-up. The area under the receiver operating characteristic curve for LR-MLE was 0.632, 0.688, 0.738, and 0.757 for feature sets 1 to 4, respectively. The best overall accuracy for the nine ML algorithms was obtained using the XGBoost algorithm (0.635, 0.706, 0.756, and 0.757 for feature sets 1 to 4, respectively). Conclusions: The use of MPI domains with LR-MLE considerably improved the prediction accuracy compared with that obtained using the traditional 3-category MPI. The XGBoost ML algorithm slightly improved accuracy compared with LR-MLE, and adding clinical data improved accuracy. These results build on previous work on the MPI and suggest that implementing risk scores based on MPI domains and clinical data by using ML prediction models can support clinical decision-making with respect to risk stratification for the follow-up care of older hospitalized patients. ", doi="10.2196/26139", url="https://www.jmir.org/2021/6/e26139", url="http://www.ncbi.nlm.nih.gov/pubmed/34152274" } @Article{info:doi/10.2196/26601, author="Enriquez, S. Jos{\'e} and Chu, Yan and Pudakalakatti, Shivanand and Hsieh, Lin Kang and Salmon, Duncan and Dutta, Prasanta and Millward, Zacharias Niki and Lurie, Eugene and Millward, Steven and McAllister, Florencia and Maitra, Anirban and Sen, Subrata and Killary, Ann and Zhang, Jian and Jiang, Xiaoqian and Bhattacharya, K. 
Pratip and Shams, Shayan", title="Hyperpolarized Magnetic Resonance and Artificial Intelligence: Frontiers of Imaging in Pancreatic Cancer", journal="JMIR Med Inform", year="2021", month="Jun", day="17", volume="9", number="6", pages="e26601", keywords="artificial intelligence", keywords="deep learning", keywords="hyperpolarization", keywords="metabolic imaging", keywords="MRI", keywords="13C", keywords="HP-MR", keywords="pancreatic ductal adenocarcinoma", keywords="pancreatic cancer", keywords="early detection", keywords="assessment of treatment response", keywords="probes", keywords="cancer", keywords="marker", keywords="imaging", keywords="treatment", keywords="review", keywords="detection", keywords="efficacy", abstract="Background: There is an unmet need for noninvasive imaging markers that can help identify the aggressive subtype(s) of pancreatic ductal adenocarcinoma (PDAC) at diagnosis and at an earlier time point, and evaluate the efficacy of therapy prior to tumor reduction. In the past few years, there have been two major developments with potential for a significant impact in establishing imaging biomarkers for PDAC and pancreatic cancer premalignancy: (1) hyperpolarized metabolic (HP)-magnetic resonance (MR), which increases the sensitivity of conventional MR by over 10,000-fold, enabling real-time metabolic measurements; and (2) applications of artificial intelligence (AI). Objective: Our objective of this review was to discuss these two exciting but independent developments (HP-MR and AI) in the realm of PDAC imaging and detection from the available literature to date. Methods: A systematic review following the PRISMA extension for Scoping Reviews (PRISMA-ScR) guidelines was performed. Studies addressing the utilization of HP-MR and/or AI for early detection, assessment of aggressiveness, and interrogating the early efficacy of therapy in patients with PDAC cited in recent clinical guidelines were extracted from the PubMed and Google Scholar databases. The studies were reviewed following predefined exclusion and inclusion criteria, and grouped based on the utilization of HP-MR and/or AI in PDAC diagnosis. Results: Part of the goal of this review was to highlight the knowledge gap of early detection in pancreatic cancer by any imaging modality, and to emphasize how AI and HP-MR can address this critical gap. We reviewed every paper published on HP-MR applications in PDAC, including six preclinical studies and one clinical trial. We also reviewed several HP-MR--related articles describing new probes with many functional applications in PDAC. On the AI side, we reviewed all existing papers that met our inclusion criteria on AI applications for evaluating computed tomography (CT) and MR images in PDAC. With the emergence of AI and its unique capability to learn across multimodal data, along with sensitive metabolic imaging using HP-MR, this knowledge gap in PDAC can be adequately addressed. CT is an accessible and widespread imaging modality worldwide as it is affordable; because of this reason alone, most of the data discussed are based on CT imaging datasets. Although there were relatively few MR-related papers included in this review, we believe that with rapid adoption of MR imaging and HP-MR, more clinical data on pancreatic cancer imaging will be available in the near future. 
Conclusions: Integration of AI, HP-MR, and multimodal imaging information in pancreatic cancer may lead to the development of real-time biomarkers of early detection, assessing aggressiveness, and interrogating early efficacy of therapy in PDAC. ", doi="10.2196/26601", url="https://medinform.jmir.org/2021/6/e26601", url="http://www.ncbi.nlm.nih.gov/pubmed/34137725" } @Article{info:doi/10.2196/29336, author="{\'C}irkovi{\'c}, Aleksandar", title="Author's Reply to: Periodic Manual Algorithm Updates and Generalizability: A Developer's Response. Comment on ``Evaluation of Four Artificial Intelligence--Assisted Self-Diagnosis Apps on Three Diagnoses: Two-Year Follow-Up Study''", journal="J Med Internet Res", year="2021", month="Jun", day="16", volume="23", number="6", pages="e29336", keywords="artificial intelligence", keywords="machine learning", keywords="mobile apps", keywords="medical diagnosis", keywords="mHealth", keywords="symptom assessment", doi="10.2196/29336", url="https://www.jmir.org/2021/6/e29336", url="http://www.ncbi.nlm.nih.gov/pubmed/34132643" } @Article{info:doi/10.2196/26514, author="Gilbert, Stephen and Fenech, Matthew and Idris, Anisa and T{\"u}rk, Ewelina", title="Periodic Manual Algorithm Updates and Generalizability: A Developer's Response. Comment on ``Evaluation of Four Artificial Intelligence--Assisted Self-Diagnosis Apps on Three Diagnoses: Two-Year Follow-Up Study''", journal="J Med Internet Res", year="2021", month="Jun", day="16", volume="23", number="6", pages="e26514", keywords="artificial intelligence", keywords="machine learning", keywords="mobile apps", keywords="medical diagnosis", keywords="mHealth", keywords="symptom assessment", doi="10.2196/26514", url="https://www.jmir.org/2021/6/e26514", url="http://www.ncbi.nlm.nih.gov/pubmed/34132641" } @Article{info:doi/10.2196/26892, author="Deng, Lizong and Chen, Luming and Yang, Tao and Liu, Mi and Li, Shicheng and Jiang, Taijiao", title="Constructing High-Fidelity Phenotype Knowledge Graphs for Infectious Diseases With a Fine-Grained Semantic Information Model: Development and Usability Study", journal="J Med Internet Res", year="2021", month="Jun", day="15", volume="23", number="6", pages="e26892", keywords="knowledge graph", keywords="knowledge granularity", keywords="machine learning", keywords="high-fidelity phenotyping", keywords="phenotyping", keywords="phenotype", keywords="semantic", abstract="Background: Phenotypes characterize the clinical manifestations of diseases and provide important information for diagnosis. Therefore, the construction of phenotype knowledge graphs for diseases is valuable to the development of artificial intelligence in medicine. However, phenotype knowledge graphs in current knowledge bases such as WikiData and DBpedia are coarse-grained knowledge graphs because they only consider the core concepts of phenotypes while neglecting the details (attributes) associated with these phenotypes. Objective: To characterize the details of disease phenotypes for clinical guidelines, we proposed a fine-grained semantic information model named PhenoSSU (semantic structured unit of phenotypes). Methods: PhenoSSU is an ``entity-attribute-value'' model by its very nature, and it aims to capture the full semantic information underlying phenotype descriptions with a series of attributes and values. 
A total of 193 clinical guidelines for infectious diseases from Wikipedia were selected as the study corpus, and 12 attributes from SNOMED-CT were introduced into the PhenoSSU model based on the co-occurrences of phenotype concepts and attribute values. The expressive power of the PhenoSSU model was evaluated by analyzing whether PhenoSSU instances could capture the full semantics underlying the descriptions of the corresponding phenotypes. To automatically construct fine-grained phenotype knowledge graphs, a hybrid strategy that first recognized phenotype concepts with the MetaMap tool and then predicted the attribute values of phenotypes with machine learning classifiers was developed. Results: Fine-grained phenotype knowledge graphs of 193 infectious diseases were manually constructed with the BRAT annotation tool. A total of 4020 PhenoSSU instances were annotated in these knowledge graphs, and 3757 of them (89.5\%) were found to be able to capture the full semantics underlying the descriptions of the corresponding phenotypes listed in clinical guidelines. By comparison, other information models, such as the clinical element model and the HL7 fast health care interoperability resource model, could only capture the full semantics underlying 48.4\% (2034/4020) and 21.8\% (914/4020) of the descriptions of phenotypes listed in clinical guidelines, respectively. The hybrid strategy achieved an F1-score of 0.732 for the subtask of phenotype concept recognition and an average weighted accuracy of 0.776 for the subtask of attribute value prediction. Conclusions: PhenoSSU is an effective information model for the precise representation of phenotype knowledge for clinical guidelines, and machine learning can be used to improve the efficiency of constructing PhenoSSU-based knowledge graphs. Our work will potentially shift the focus of medical knowledge engineering from a coarse-grained level to a more fine-grained level. ", doi="10.2196/26892", url="https://www.jmir.org/2021/6/e26892", url="http://www.ncbi.nlm.nih.gov/pubmed/34128811" } @Article{info:doi/10.2196/24642, author="Enayati, Moein and Sir, Mustafa and Zhang, Xingyu and Parker, J. Sarah and Duffy, Elizabeth and Singh, Hardeep and Mahajan, Prashant and Pasupathy, S. Kalyan", title="Monitoring Diagnostic Safety Risks in Emergency Departments: Protocol for a Machine Learning Study", journal="JMIR Res Protoc", year="2021", month="Jun", day="14", volume="10", number="6", pages="e24642", keywords="diagnostic error", keywords="emergency department", keywords="machine learning", keywords="electronic health records", keywords="electronic triggers", abstract="Background: Diagnostic decision making, especially in emergency departments, is a highly complex cognitive process that involves uncertainty and susceptibility to errors. A combination of factors, including patient factors (eg, history, behaviors, complexity, and comorbidity), provider-care team factors (eg, cognitive load and information gathering and synthesis), and system factors (eg, health information technology, crowding, shift-based work, and interruptions) may contribute to diagnostic errors. Using electronic triggers to identify records of patients with certain patterns of care, such as escalation of care, has been useful to screen for diagnostic errors. 
Once errors are identified, sophisticated data analytics and machine learning techniques can be applied to existing electronic health record (EHR) data sets to shed light on potential risk factors influencing diagnostic decision making. Objective: This study aims to identify variables associated with diagnostic errors in emergency departments using large-scale EHR data and machine learning techniques. Methods: This study plans to use trigger algorithms within EHR data repositories to generate a large data set of records that are labeled trigger-positive or trigger-negative, depending on whether they meet certain criteria. Samples from both data sets will be validated using medical record reviews, upon which we expect to find a higher number of diagnostic safety events in the trigger-positive subset. Machine learning will be used to evaluate relationships between certain patient factors, provider-care team factors, and system-level risk factors and diagnostic safety signals in the statistically matched groups of trigger-positive and trigger-negative charts. Results: This federally funded study was approved by the institutional review board of 2 academic medical centers with affiliated community hospitals. Trigger queries are being developed at both organizations, and sample cohorts will be labeled using the triggers. Machine learning techniques such as association rule mining, chi-square automated interaction detection, and classification and regression trees will be used to discover important variables that could be incorporated within future clinical decision support systems to help identify and reduce risks that contribute to diagnostic errors. Conclusions: The use of large EHR data sets and machine learning to investigate risk factors (related to the patient, provider-care team, and system-level) in the diagnostic process may help create future mechanisms for monitoring diagnostic safety. International Registered Report Identifier (IRRID): DERR1-10.2196/24642 ", doi="10.2196/24642", url="https://www.researchprotocols.org/2021/6/e24642", url="http://www.ncbi.nlm.nih.gov/pubmed/34125077" } @Article{info:doi/10.2196/28219, author="Jia, Qi and Zhang, Dezheng and Xu, Haifeng and Xie, Yonghong", title="Extraction of Traditional Chinese Medicine Entity: Design of a Novel Span-Level Named Entity Recognition Method With Distant Supervision", journal="JMIR Med Inform", year="2021", month="Jun", day="14", volume="9", number="6", pages="e28219", keywords="traditional Chinese medicine", keywords="named entity recognition", keywords="span level", keywords="distantly supervised", abstract="Background: Traditional Chinese medicine (TCM) clinical records contain the symptoms of patients, their diagnoses, and the subsequent treatments prescribed by doctors. These records are important resources for research and analysis of TCM diagnosis knowledge. However, most TCM clinical records are unstructured text. Therefore, a method to automatically extract medical entities from TCM clinical records is indispensable. Objective: Training a medical entity extraction model requires a large annotated corpus. The cost of building such a corpus is very high, and there is a lack of gold-standard data sets for supervised learning methods. Therefore, we utilized distantly supervised named entity recognition (NER) to address the challenge. Methods: We propose a span-level distantly supervised NER approach to extract TCM medical entities. 
It utilizes a pretrained language model and a simple multilayer neural network as a classifier to detect and classify entities. We also designed a negative sampling strategy for the span-level model. The strategy randomly selects negative samples in every epoch and filters the possible false-negative samples periodically. This reduces the adverse influence of false-negative samples. Results: We compared our method with other baseline methods on a gold-standard data set to illustrate its effectiveness. The F1 score of our method is 77.34, markedly outperforming the other baselines. Conclusions: We developed a distantly supervised NER approach to extract medical entities from TCM clinical records. We evaluated our approach on a TCM clinical record data set. Our experimental results indicate that the proposed approach achieves better performance than the other baselines. ", doi="10.2196/28219", url="https://medinform.jmir.org/2021/6/e28219", url="http://www.ncbi.nlm.nih.gov/pubmed/34125076" } @Article{info:doi/10.2196/25741, author="Madar, Ronni and Ugon, Adrien and Ivankovi{\'c}, Damir and Tsopra, Rosy", title="A Web Interface for Antibiotic Prescription Recommendations in Primary Care: User-Centered Design Approach", journal="J Med Internet Res", year="2021", month="Jun", day="11", volume="23", number="6", pages="e25741", keywords="clinical decision support system", keywords="visualization", keywords="usability", keywords="clinical practice guidelines", keywords="antibiotic", keywords="primary care", abstract="Background: Antibiotic misuse is a serious public health problem worldwide. National health authorities release clinical practice guidelines (CPGs) to guide general practitioners (GPs) in their choice of antibiotics. However, despite the large-scale dissemination of CPGs, GPs continue to prescribe antibiotics that are not recommended as first-line treatments. This nonadherence to recommendations may be due to GPs misunderstanding the CPGs. A web interface displaying antibiotic prescription recommendations and their justifications could help to improve the comprehensibility and readability of CPGs, thereby increasing the adoption of recommendations regarding antibiotic treatment. Objective: This study aims to design and evaluate a web interface for antibiotic prescription displaying both the recommended antibiotics and their justifications in the form of antibiotic properties. Methods: A web interface was designed according to the same principles as e-commerce interfaces and was assessed by 117 GPs. These GPs were asked to answer 17 questions relating to the usefulness, user-friendliness, and comprehensibility and readability of the interface, and their satisfaction with it. Responses were recorded on a 4-point Likert scale (ranging from ``absolutely disagree'' to ``absolutely agree''). At the end of the evaluation, the GPs were allowed to provide optional, additional free comments. Results: The antibiotic prescription web interface consists of three main sections: a clinical summary section, a filter section, and a recommended antibiotics section. The majority of GPs appreciated the clinical summary (90/117, 76.9\%) and filter (98/117, 83.8\%) sections, whereas 48.7\% (57/117) of them reported difficulty reading some of the icons in the recommended antibiotics section. Overall, 82.9\% (97/117) of GPs found the display of drug properties useful, and 65.8\% (77/117) reported that the web interface improved their understanding of CPG recommendations. 
Conclusions: The web interface displaying antibiotic recommendations and their properties can help doctors understand the rationale underlying CPG recommendations regarding antibiotic treatment, but further improvements are required before its implementation into a clinical decision support system. ", doi="10.2196/25741", url="https://www.jmir.org/2021/6/e25741", url="http://www.ncbi.nlm.nih.gov/pubmed/34114958" } @Article{info:doi/10.2196/25560, author="Surodina, Svitlana and Lam, Ching and Grbich, Svetislav and Milne-Ives, Madison and van Velthoven, Michelle and Meinert, Edward", title="Machine Learning for Risk Group Identification and User Data Collection in a Herpes Simplex Virus Patient Registry: Algorithm Development and Validation Study", journal="JMIRx Med", year="2021", month="Jun", day="11", volume="2", number="2", pages="e25560", keywords="data collection", keywords="herpes simplex virus", keywords="registries", keywords="machine learning", keywords="risk assessment", keywords="artificial intelligence", keywords="medical information system", keywords="user-centered design", keywords="predictor", keywords="risk", abstract="Background: Researching people with herpes simplex virus (HSV) is challenging because of poor data quality, low user engagement, and concerns around stigma and anonymity. Objective: This project aimed to improve data collection for a real-world HSV registry by identifying predictors of HSV infection and selecting a limited number of relevant questions to ask new registry users to determine their level of HSV infection risk. Methods: The US National Health and Nutrition Examination Survey (NHANES, 2015-2016) database includes the confirmed HSV type 1 and type 2 (HSV-1 and HSV-2, respectively) status of American participants (14-49 years) and a wealth of demographic and health-related data. The questionnaires and data sets from this survey were used to form two data sets: one for HSV-1 and one for HSV-2. These data sets were used to train and test a model that used a random forest algorithm (devised using Python) to minimize the number of anonymous lifestyle-based questions needed to identify risk groups for HSV. Results: The model selected a reduced number of questions from the NHANES questionnaire that predicted HSV infection risk with high accuracy scores of 0.91 and 0.96 and high recall scores of 0.88 and 0.98 for the HSV-1 and HSV-2 data sets, respectively. The number of questions was reduced from 150 to an average of 40, depending on age and gender. The model, therefore, provided high predictability of risk of infection with minimal required input. Conclusions: This machine learning algorithm can be used in a real-world evidence registry to collect relevant lifestyle data and identify individuals' levels of risk of HSV infection. A limitation is the absence of real user data and integration with electronic medical records, which would enable model learning and improvement. Future work will explore model adjustments, anonymization options, explicit permissions, and a standardized data schema that meet the General Data Protection Regulation, Health Insurance Portability and Accountability Act, and third-party interface connectivity requirements. 
", doi="10.2196/25560", url="https://xmed.jmir.org/2021/2/e25560", url="http://www.ncbi.nlm.nih.gov/pubmed/37725536" } @Article{info:doi/10.2196/25083, author="Hoffman, Aubri and Crocker, Laura and Mathur, Aakrati and Holman, Deborah and Weston, June and Campbell, Sukhkamal and Housten, Ashley and Bradford, Andrea and Agrawala, Shilpi and Woodard, L. Terri", title="Patients' and Providers' Needs and Preferences When Considering Fertility Preservation Before Cancer Treatment: Decision-Making Needs Assessment", journal="JMIR Form Res", year="2021", month="Jun", day="7", volume="5", number="6", pages="e25083", keywords="cancer", keywords="decision support techniques", keywords="fertility preservation", keywords="oncofertility", keywords="oncology", keywords="needs assessment", keywords="patient decision aids", keywords="patient needs", keywords="shared decision making", abstract="Background: As cancer treatments continue to improve, it is increasingly important that women of reproductive age have an opportunity to decide whether they want to undergo fertility preservation treatments to try to protect their ability to have a child after cancer. Clinical practice guidelines recommend that providers offer fertility counseling to all young women with cancer; however, as few as 12\% of women recall discussing fertility preservation. The long-term goal of this program is to develop an interactive web-based patient decision aid to improve awareness, access, knowledge, and decision making for all young women with cancer. The International Patient Decision Aid Standards collaboration recommends a formal decision-making needs assessment to inform and guide the design of understandable, meaningful, and usable patient decision aid interventions. Objective: This study aims to assess providers' and survivors' fertility preservation decision-making experiences, unmet needs, and initial design preferences to inform the development of a web-based patient decision aid. Methods: Semistructured interviews and an ad hoc focus group assessed current decision-making experiences, unmet needs, and recommendations for a patient decision aid. Two researchers coded and analyzed the transcripts using NVivo (QSR International). A stakeholder advisory panel guided the study and interpretation of results. Results: A total of 51 participants participated in 46 interviews (18 providers and 28 survivors) and 1 ad hoc focus group (7 survivors). The primary themes included the importance of fertility decisions for survivorship, the existence of significant but potentially modifiable barriers to optimal decision making, and a strong support for developing a carefully designed patient decision aid website. Providers reported needing an intervention that could quickly raise awareness and facilitate timely referrals. Survivors reported needing understandable information and help with managing uncertainty, costs, and pressures. Design recommendations included providing tailored information (eg, by age and cancer type), optional interactive features, and multimedia delivery at multiple time points, preferably outside the consultation. Conclusions: Decision making about fertility preservation is an important step in providing high-quality comprehensive cancer care and a priority for many survivors' optimal quality of life. Decision support interventions are needed to address gaps in care and help women quickly navigate toward an informed, values-congruent decision. 
Survivors and providers support developing a patient decision aid website to make information directly available to women outside of the consultation and to provide self-tailored content according to women's clinical characteristics and their information-seeking and deliberative styles. ", doi="10.2196/25083", url="https://formative.jmir.org/2021/6/e25083", url="http://www.ncbi.nlm.nih.gov/pubmed/34096871" } @Article{info:doi/10.2196/20407, author="Walter Costa, Beatriz Maria and Wernsdorfer, Mark and Kehrer, Alexander and Voigt, Markus and Cundius, Carina and Federbusch, Martin and Eckelt, Felix and Remmler, Johannes and Schmidt, Maria and Pehnke, Sarah and G{\"a}rtner, Christiane and Wehner, Markus and Isermann, Berend and Richter, Heike and Telle, J{\"o}rg and Kaiser, Thorsten", title="The Clinical Decision Support System AMPEL for Laboratory Diagnostics: Implementation and Technical Evaluation", journal="JMIR Med Inform", year="2021", month="Jun", day="3", volume="9", number="6", pages="e20407", keywords="clinical decision support system (CDSS)", keywords="laboratory medicine", keywords="digital health", keywords="reactive software agent", keywords="computational architecture", abstract="Background: Laboratory results are of central importance for clinical decision making. The time span between availability and review of results by clinicians is crucial to patient care. Clinical decision support systems (CDSS) are computational tools that can identify critical values automatically and help decrease treatment delay. Objective: With this work, we aimed to implement and evaluate a CDSS that supports health care professionals and improves patient safety. In addition to our experiences, we also describe its main components in a general manner to make it applicable to a wide range of medical institutions and to empower colleagues to implement a similar system in their facilities. Methods: Technical requirements must be taken into account before implementing a CDSS that performs laboratory diagnostics (labCDSS). These can be planned within the functional components of a reactive software agent, a computational framework for such a CDSS. Results: We present AMPEL (Analysis and Reporting System for the Improvement of Patient Safety through Real-Time Integration of Laboratory Findings), a labCDSS that notifies health care professionals if a life-threatening medical condition is detected. We developed and implemented AMPEL at a university hospital and regional hospitals in Germany (University of Leipzig Medical Center and the Muldental Clinics in Grimma and Wurzen). It currently runs 5 different algorithms in parallel: hypokalemia, hypercalcemia, hyponatremia, hyperlactatemia, and acute kidney injury. Conclusions: AMPEL enables continuous surveillance of patients. The system is constantly being evaluated and extended and has the capacity for many more algorithms. We hope to encourage colleagues from other institutions to design and implement similar CDSS using the theory, specifications, and experiences described in this work. ", doi="10.2196/20407", url="https://medinform.jmir.org/2021/6/e20407", url="http://www.ncbi.nlm.nih.gov/pubmed/34081013" } @Article{info:doi/10.2196/25929, author="Ji, Mengting and Genchev, Z. 
Georgi and Huang, Hengye and Xu, Ting and Lu, Hui and Yu, Guangjun", title="Evaluation Framework for Successful Artificial Intelligence--Enabled Clinical Decision Support Systems: Mixed Methods Study", journal="J Med Internet Res", year="2021", month="Jun", day="2", volume="23", number="6", pages="e25929", keywords="artificial intelligence", keywords="AI", keywords="clinical decision support systems", keywords="evaluation framework", abstract="Background: Clinical decision support systems are designed to utilize medical data, knowledge, and analysis engines and to generate patient-specific assessments or recommendations to health professionals in order to assist decision making. Artificial intelligence--enabled clinical decision support systems aid the decision-making process through an intelligent component. Well-defined evaluation methods are essential to ensure the seamless integration and contribution of these systems to clinical practice. Objective: The purpose of this study was to develop and validate a measurement instrument and test the interrelationships of evaluation variables for an artificial intelligence--enabled clinical decision support system evaluation framework. Methods: An artificial intelligence--enabled clinical decision support system evaluation framework consisting of 6 variables was developed. A Delphi process was conducted to develop the measurement instrument items. Cognitive interviews and pretesting were performed to refine the questions. Web-based survey response data were analyzed to remove irrelevant questions from the measurement instrument, to test dimensional structure, and to assess reliability and validity. The interrelationships of relevant variables were tested and verified using path analysis, and a 28-item measurement instrument was developed. Measurement instrument survey responses were collected from 156 respondents. Results: The Cronbach $\alpha$ of the measurement instrument was 0.963, and its content validity was 0.943. Values of average variance extracted ranged from 0.582 to 0.756, and values of the heterotrait-monotrait ratio ranged from 0.376 to 0.896. The final model had a good fit ($\chi^2_{62}$=36.984; P=.08; comparative fit index 0.991; goodness-of-fit index 0.957; root mean square error of approximation 0.052; standardized root mean square residual 0.028). Variables in the final model accounted for 89\% of the variance in the user acceptance dimension. Conclusions: User acceptance is the central dimension of artificial intelligence--enabled clinical decision support system success. Acceptance was directly influenced by perceived ease of use, information quality, service quality, and perceived benefit. Acceptance was also indirectly influenced by system quality and information quality through perceived ease of use. User acceptance and perceived benefit were interrelated. 
", doi="10.2196/25929", url="https://www.jmir.org/2021/6/e25929", url="http://www.ncbi.nlm.nih.gov/pubmed/34076581" } @Article{info:doi/10.2196/28868, author="Kang, Yu-Chuan Eugene and Yeung, Ling and Lee, Yi-Lun and Wu, Cheng-Hsiu and Peng, Shu-Yen and Chen, Yueh-Peng and Gao, Quan-Ze and Lin, Chihung and Kuo, Chang-Fu and Lai, Chi-Chun", title="A Multimodal Imaging--Based Deep Learning Model for Detecting Treatment-Requiring Retinal Vascular Diseases: Model Development and Validation Study", journal="JMIR Med Inform", year="2021", month="May", day="31", volume="9", number="5", pages="e28868", keywords="deep learning", keywords="retinal vascular diseases", keywords="multimodal imaging", keywords="treatment requirement", keywords="machine learning", keywords="eye", keywords="retinal", keywords="imaging", keywords="treatment", keywords="model", keywords="detection", keywords="vascular", abstract="Background: Retinal vascular diseases, including diabetic macular edema (DME), neovascular age-related macular degeneration (nAMD), myopic choroidal neovascularization (mCNV), and branch and central retinal vein occlusion (BRVO/CRVO), are considered vision-threatening eye diseases. However, accurate diagnosis depends on multimodal imaging and the expertise of retinal ophthalmologists. Objective: The aim of this study was to develop a deep learning model to detect treatment-requiring retinal vascular diseases using multimodal imaging. Methods: This retrospective study enrolled participants with multimodal ophthalmic imaging data from 3 hospitals in Taiwan from 2013 to 2019. Eye-related images were used, including those obtained through retinal fundus photography, optical coherence tomography (OCT), and fluorescein angiography with or without indocyanine green angiography (FA/ICGA). A deep learning model was constructed for detecting DME, nAMD, mCNV, BRVO, and CRVO and identifying treatment-requiring diseases. Model performance was evaluated and is presented as the area under the curve (AUC) for each receiver operating characteristic curve. Results: A total of 2992 eyes of 2185 patients were studied, with 239, 1209, 1008, 211, 189, and 136 eyes in the control, DME, nAMD, mCNV, BRVO, and CRVO groups, respectively. Among them, 1898 eyes required treatment. The eyes were divided into training, validation, and testing groups in a 5:1:1 ratio. In total, 5117 retinal fundus photos, 9316 OCT images, and 20,922 FA/ICGA images were used. The AUCs for detecting mCNV, DME, nAMD, BRVO, and CRVO were 0.996, 0.995, 0.990, 0.959, and 0.988, respectively. The AUC for detecting treatment-requiring diseases was 0.969. From the heat maps, we observed that the model could identify retinal vascular diseases. Conclusions: Our study developed a deep learning model to detect retinal diseases using multimodal ophthalmic imaging. Furthermore, the model demonstrated good performance in detecting treatment-requiring retinal diseases. 
", doi="10.2196/28868", url="https://medinform.jmir.org/2021/5/e28868", url="http://www.ncbi.nlm.nih.gov/pubmed/34057419" } @Article{info:doi/10.2196/29058, author="Halasz, Geza and Sperti, Michela and Villani, Matteo and Michelucci, Umberto and Agostoni, Piergiuseppe and Biagi, Andrea and Rossi, Luca and Botti, Andrea and Mari, Chiara and Maccarini, Marco and Pura, Filippo and Roveda, Loris and Nardecchia, Alessia and Mottola, Emanuele and Nolli, Massimo and Salvioni, Elisabetta and Mapelli, Massimo and Deriu, Agostino Marco and Piga, Dario and Piepoli, Massimo", title="A Machine Learning Approach for Mortality Prediction in COVID-19 Pneumonia: Development and Evaluation of the Piacenza Score", journal="J Med Internet Res", year="2021", month="May", day="31", volume="23", number="5", pages="e29058", keywords="artificial intelligence", keywords="prognostic score", keywords="COVID-19", keywords="pneumonia", keywords="mortality", keywords="prediction", keywords="machine learning", keywords="modeling", abstract="Background: Several models have been developed to predict mortality in patients with COVID-19 pneumonia, but only a few have demonstrated enough discriminatory capacity. Machine learning algorithms represent a novel approach for the data-driven prediction of clinical outcomes with advantages over statistical modeling. Objective: We aimed to develop a machine learning--based score---the Piacenza score---for 30-day mortality prediction in patients with COVID-19 pneumonia. Methods: The study comprised 852 patients with COVID-19 pneumonia, admitted to the Guglielmo da Saliceto Hospital in Italy from February to November 2020. Patients' medical history, demographics, and clinical data were collected using an electronic health record. The overall patient data set was randomly split into derivation and test cohorts. The score was obtained through the na{\"i}ve Bayes classifier and externally validated on 86 patients admitted to Centro Cardiologico Monzino (Italy) in February 2020. Using a forward-search algorithm, 6 features were identified: age, mean corpuscular hemoglobin concentration, PaO2/FiO2 ratio, temperature, previous stroke, and gender. The Brier index was used to evaluate the ability of the machine learning model to stratify and predict the observed outcomes. A user-friendly website was designed and developed to enable fast and easy use of the tool by physicians. Regarding the customization properties of the Piacenza score, we added a tailored version of the algorithm to the website, which enables an optimized computation of the mortality risk score for a patient when some of the variables used by the Piacenza score are not available. In this case, the na{\"i}ve Bayes classifier is retrained over the same derivation cohort but using a different set of patient characteristics. We also compared the Piacenza score with the 4C score and with a na{\"i}ve Bayes algorithm with 14 features chosen a priori. Results: The Piacenza score exhibited an area under the receiver operating characteristic curve (AUC) of 0.78 (95\% CI 0.74-0.84, Brier score=0.19) in the internal validation cohort and 0.79 (95\% CI 0.68-0.89, Brier score=0.16) in the external validation cohort, showing a comparable accuracy with respect to the 4C score and to the na{\"i}ve Bayes model with a priori chosen features; this achieved an AUC of 0.78 (95\% CI 0.73-0.83, Brier score=0.26) and 0.80 (95\% CI 0.75-0.86, Brier score=0.17), respectively. 
Conclusions: Our findings demonstrated that a customizable machine learning--based score with a purely data-driven selection of features is feasible and effective for the prediction of mortality among patients with COVID-19 pneumonia. ", doi="10.2196/29058", url="https://www.jmir.org/2021/5/e29058", url="http://www.ncbi.nlm.nih.gov/pubmed/33999838" } @Article{info:doi/10.2196/25988, author="Benito-Le{\'o}n, Juli{\'a}n and del Castillo, Dolores M. {\textordfeminine} and Estirado, Alberto and Ghosh, Ritwik and Dubey, Souvik and Serrano, Ignacio J.", title="Using Unsupervised Machine Learning to Identify Age- and Sex-Independent Severity Subgroups Among Patients with COVID-19: Observational Longitudinal Study", journal="J Med Internet Res", year="2021", month="May", day="27", volume="23", number="5", pages="e25988", keywords="COVID-19", keywords="machine learning", keywords="outcome", keywords="severity", keywords="subgroup", keywords="emergency", keywords="detection", keywords="intervention", keywords="testing", keywords="data set", keywords="characterization", abstract="Background: Early detection and intervention are the key factors for improving outcomes in patients with COVID-19. Objective: The objective of this observational longitudinal study was to identify nonoverlapping severity subgroups (ie, clusters) among patients with COVID-19, based exclusively on clinical data and standard laboratory tests obtained during patient assessment in the emergency department. Methods: We applied unsupervised machine learning to a data set of 853 patients with COVID-19 from the HM group of hospitals (HM Hospitales) in Madrid, Spain. Age and sex were not considered while building the clusters, as these variables could introduce biases in machine learning algorithms and raise ethical implications or enable discrimination in triage protocols. Results: From 850 clinical and laboratory variables, four tests---the serum levels of aspartate transaminase (AST), lactate dehydrogenase (LDH), C-reactive protein (CRP), and the number of neutrophils---were enough to segregate the entire patient pool into three separate clusters. Further, the percentage of monocytes and lymphocytes and the levels of alanine transaminase (ALT) distinguished cluster 3 patients from the other two clusters. The highest proportion of deceased patients; the highest levels of AST, ALT, LDH, and CRP; the highest number of neutrophils; and the lowest percentages of monocytes and lymphocytes characterized cluster 1. Cluster 2 included a lower proportion of deceased patients and intermediate levels of the previous laboratory tests. The lowest proportion of deceased patients; the lowest levels of AST, ALT, LDH, and CRP; the lowest number of neutrophils; and the highest percentages of monocytes and lymphocytes characterized cluster 3. Conclusions: A few standard laboratory tests, deemed available in all emergency departments, have shown good discriminative power for the characterization of severity subgroups among patients with COVID-19. 
", doi="10.2196/25988", url="https://www.jmir.org/2021/5/e25988", url="http://www.ncbi.nlm.nih.gov/pubmed/33872186" } @Article{info:doi/10.2196/27778, author="Luo, Gang", title="A Roadmap for Automating Lineage Tracing to Aid Automatically Explaining Machine Learning Predictions for Clinical Decision Support", journal="JMIR Med Inform", year="2021", month="May", day="27", volume="9", number="5", pages="e27778", keywords="clinical decision support", keywords="database management systems", keywords="forecasting", keywords="machine learning", keywords="electronic medical records", doi="10.2196/27778", url="https://medinform.jmir.org/2021/5/e27778", url="http://www.ncbi.nlm.nih.gov/pubmed/34042600" } @Article{info:doi/10.2196/23495, author="Nizami, Shermeen and McGregor AM, Carolyn and Green, Robert James", title="Integrating Physiological Data Artifacts Detection With Clinical Decision Support Systems: Observational Study", journal="JMIR Biomed Eng", year="2021", month="May", day="27", volume="6", number="2", pages="e23495", keywords="patient monitoring", keywords="clinical decision support", keywords="systems architecture", keywords="biomedical data analytics", keywords="alarm fatigue", keywords="physiological data artifacts", abstract="Background: Clinical decision support systems (CDSS) have the potential to lower the patient mortality and morbidity rates. However, signal artifacts present in physiological data affect the reliability and accuracy of the CDSS. Moreover, patient monitors and other medical devices generate false alarms while processing physiological data, further leading to alarm fatigue because of increased noise levels, staff disruption, and staff desensitization in busy critical care environments. This adversely affects the quality of care at the patient bedside. Hence, artifact detection (AD) algorithms play a crucial role in assessing the quality of physiological data and mitigating the impact of these artifacts. Objective: The aim of this study is to evaluate a novel AD framework for integrating AD algorithms with CDSS. We designed the framework with features that support real-time implementation within critical care. In this study, we evaluated the framework and its features in a false alarm reduction study. We developed static framework component models, followed by dynamic framework compositions to formulate four CDSS. We evaluated these formulations using neonatal patient data and validated the six framework features: flexibility, reusability, signal quality indicator standardization, scalability, customizability, and real-time implementation support. Methods: We developed four exemplar static AD components with standardized requirements and provisions interfaces that facilitate the interoperability of framework components. These AD components were mixed and matched into four different AD compositions to mitigate the artifacts' effects. We developed a novel static clinical event detection component that is integrated with each AD composition to formulate and evaluate a dynamic CDSS for peripheral oxygen saturation (SpO2) alarm generation. This study collected data from 11 patients with diverse pathologies in the neonatal intensive care unit. Collected data streams and corresponding alarms include pulse rate and SpO2 measured from a pulse oximeter (Masimo SET SmartPod) integrated with an Infinity Delta monitor and the heart rate derived from electrocardiography leads attached to a second Infinity Delta monitor. 
Results: A total of 119 SpO2 alarms were evaluated. The lowest achievable SpO2 false alarm rate was 39\%, with a sensitivity of 80\%. This demonstrates the framework's utility in identifying the best possible dynamic composition to serve the clinical need for false SpO2 alarm reduction and subsequent alarm fatigue, given the limitations of a small sample size. Conclusions: The framework features, including reusability, signal quality indicator standardization, scalability, and customizability, allow the evaluation and comparison of novel CDSS formulations. The optimal solution for a CDSS can then be hard-coded and integrated within clinical workflows for real-time implementation. The flexibility to serve different clinical needs and standardized component interoperability of the framework supports the potential for a real-time clinical implementation of AD. ", doi="10.2196/23495", url="https://biomedeng.jmir.org/2021/2/e23495" } @Article{info:doi/10.2196/29405, author="Izquierdo, Luis Jose and Soriano, B. Joan", title="Authors' Reply to: Minimizing Selection and Classification Biases Comment on ``Clinical Characteristics and Prognostic Factors for Intensive Care Unit Admission of Patients With COVID-19: Retrospective Study Using Machine Learning and Natural Language Processing''", journal="J Med Internet Res", year="2021", month="May", day="26", volume="23", number="5", pages="e29405", keywords="artificial intelligence", keywords="big data", keywords="COVID-19", keywords="electronic health records", keywords="tachypnea", keywords="SARS-CoV-2", keywords="predictive model", keywords="prognosis", keywords="classification bias", keywords="critical care", doi="10.2196/29405", url="https://www.jmir.org/2021/5/e29405", url="http://www.ncbi.nlm.nih.gov/pubmed/33989164" } @Article{info:doi/10.2196/27142, author="Martos P{\'e}rez, Francisco and Gomez Huelgas, Ricardo and Mart{\'i}n Escalante, Dolores Mar{\'i}a and Casas Rojo, Manuel Jos{\'e}", title="Minimizing Selection and Classification Biases. 
Comment on ``Clinical Characteristics and Prognostic Factors for Intensive Care Unit Admission of Patients With COVID-19: Retrospective Study Using Machine Learning and Natural Language Processing''", journal="J Med Internet Res", year="2021", month="May", day="26", volume="23", number="5", pages="e27142", keywords="artificial intelligence", keywords="big data", keywords="COVID-19", keywords="electronic health records", keywords="tachypnea", keywords="SARS-CoV-2", keywords="predictive model", keywords="prognosis", keywords="classification bias", keywords="critical care", doi="10.2196/27142", url="https://www.jmir.org/2021/5/e27142", url="http://www.ncbi.nlm.nih.gov/pubmed/33989163" } @Article{info:doi/10.2196/14851, author="Rose, Christian and Nichols, Taylor and Hackner, Daniel and Chang, Julia and Straube, Steven and Jooste, Willem and Sawe, Hendry and Tenner, Andrea", title="Utilizing Lean Software Methods To Improve Acceptance of Global eHealth Initiatives: Results From the Implementation of the Basic Emergency Care App", journal="JMIR Form Res", year="2021", month="May", day="26", volume="5", number="5", pages="e14851", keywords="lean", keywords="eHealth", keywords="emergency", keywords="global health", keywords="app development", keywords="decision support", keywords="primary survey", keywords="mHealth", keywords="Africa", keywords="Tanzania", keywords="low- and middle income countries", keywords="LMIC", abstract="Background: Health systems in low- and middle-income countries face considerable challenges in providing high-quality accessible care. eHealth has had mounting interest as a possible solution given the unprecedented growth in mobile phone and internet technologies in these locations; however, few apps or software programs have, as of yet, gone beyond the testing phase, most downloads are never opened, and consistent use is extremely rare. This is believed to be due to a failure to engage and meet local stakeholder needs and the high costs of software development. Objective: World Health Organization Basic Emergency Care course participants requested a mobile point-of-care adjunct to the primary course material. Our team undertook the task of developing this solution through a community-based participatory model in an effort to meet trainees' reported needs and avoid some of the abovementioned failings. We aimed to use the well-described Lean software development strategy---given our familiarity with its elements and its ubiquitous use in medicine, global health, and software development---to complete this task efficiently and with maximal stakeholder involvement. Methods: From September 2016 through January 2017, the Basic Emergency Care app was designed and developed at the University of California San Francisco. When a prototype was complete, it was piloted in Cape Town, South Africa and Dar es Salaam, Tanzania---World Health Organization Basic Emergency Care partner sites. Feedback from this pilot shaped continuous amendments to the app before subsequent user testing and study of the effect of use of the app on trainee retention of Basic Emergency Care course material. Results: Our user-centered mobile app was developed with an iterative participatory approach with its first version available within 6 months and with high acceptance---95\% of Basic Emergency Care Course participants felt that it was useful. 
Our solution had minimal direct costs and resulted in a robust infrastructure for subsequent assessment and maintenance and allows for efficient feedback and expansion. Conclusions: We believe that utilizing Lean software development strategies may help global health advocates and researchers build eHealth solutions with a process that is familiar and with buy-in across stakeholders that is responsive, rapid to deploy, and sustainable. ", doi="10.2196/14851", url="https://formative.jmir.org/2021/5/e14851", url="http://www.ncbi.nlm.nih.gov/pubmed/33882013" } @Article{info:doi/10.2196/23586, author="Zong, Nansu and Ngo, Victoria and Stone, J. Daniel and Wen, Andrew and Zhao, Yiqing and Yu, Yue and Liu, Sijia and Huang, Ming and Wang, Chen and Jiang, Guoqian", title="Leveraging Genetic Reports and Electronic Health Records for the Prediction of Primary Cancers: Algorithm Development and Validation Study", journal="JMIR Med Inform", year="2021", month="May", day="25", volume="9", number="5", pages="e23586", keywords="genetic reports", keywords="electronic health records", keywords="predicting primary cancers", keywords="Fast Healthcare Interoperability Resources", keywords="FHIR", keywords="Resource Description Framework", keywords="RDF", abstract="Background: Precision oncology has the potential to leverage clinical and genomic data in advancing disease prevention, diagnosis, and treatment. A key research area focuses on the early detection of primary cancers and potential prediction of cancers of unknown primary in order to facilitate optimal treatment decisions. Objective: This study presents a methodology to harmonize phenotypic and genetic data features to classify primary cancer types and predict cancers of unknown primaries. Methods: We extracted genetic data elements from oncology genetic reports of 1011 patients with cancer and their corresponding phenotypical data from Mayo Clinic's electronic health records. We modeled both genetic and electronic health record data with HL7 Fast Healthcare Interoperability Resources. The semantic web Resource Description Framework was employed to generate the network-based data representation (ie, patient-phenotypic-genetic network). Based on the Resource Description Framework data graph, Node2vec graph-embedding algorithm was applied to generate features. Multiple machine learning and deep learning backbone models were compared for cancer prediction performance. Results: With 6 machine learning tasks designed in the experiment, we demonstrated the proposed method achieved favorable results in classifying primary cancer types (area under the receiver operating characteristic curve [AUROC] 96.56\% for all 9 cancer predictions on average based on the cross-validation) and predicting unknown primaries (AUROC 80.77\% for all 8 cancer predictions on average for real-patient validation). To demonstrate the interpretability, 17 phenotypic and genetic features that contributed the most to the prediction of each cancer were identified and validated based on a literature review. Conclusions: Accurate prediction of cancer types can be achieved with existing electronic health record data with satisfactory precision. The integration of genetic reports improves prediction, illustrating the translational values of incorporating genetic tests early at the diagnosis stage for patients with cancer. 
", doi="10.2196/23586", url="https://medinform.jmir.org/2021/5/e23586", url="http://www.ncbi.nlm.nih.gov/pubmed/34032581" } @Article{info:doi/10.2196/29072, author="Espinosa-Gonzalez, Belen Ana and Neves, Luisa Ana and Fiorentino, Francesca and Prociuk, Denys and Husain, Laiba and Ramtale, Christian Sonny and Mi, Emma and Mi, Ella and Macartney, Jack and Anand, N. Sneha and Sherlock, Julian and Saravanakumar, Kavitha and Mayer, Erik and de Lusignan, Simon and Greenhalgh, Trisha and Delaney, C. Brendan", title="Predicting Risk of Hospital Admission in Patients With Suspected COVID-19 in a Community Setting: Protocol for Development and Validation of a Multivariate Risk Prediction Tool", journal="JMIR Res Protoc", year="2021", month="May", day="25", volume="10", number="5", pages="e29072", keywords="COVID-19 severity", keywords="risk prediction tool", keywords="early warning score", keywords="hospital admission", keywords="primary care", keywords="electronic health records", abstract="Background: During the pandemic, remote consultations have become the norm for assessing patients with signs and symptoms of COVID-19 to decrease the risk of transmission. This has intensified the clinical uncertainty already experienced by primary care clinicians when assessing patients with suspected COVID-19 and has prompted the use of risk prediction scores, such as the National Early Warning Score (NEWS2), to assess severity and guide treatment. However, the risk prediction tools available have not been validated in a community setting and are not designed to capture the idiosyncrasies of COVID-19 infection. Objective: The objective of this study is to produce a multivariate risk prediction tool, RECAP-V1 (Remote COVID-19 Assessment in Primary Care), to support primary care clinicians in the identification of those patients with COVID-19 that are at higher risk of deterioration and facilitate the early escalation of their treatment with the aim of improving patient outcomes. Methods: The study follows a prospective cohort observational design, whereby patients presenting in primary care with signs and symptoms suggestive of COVID-19 will be followed and their data linked to hospital outcomes (hospital admission and death). Data collection will be carried out by primary care clinicians in four arms: North West London Clinical Commissioning Groups (NWL CCGs), Oxford-Royal College of General Practitioners (RCGP) Research and Surveillance Centre (RSC), Covid Clinical Assessment Service (CCAS), and South East London CCGs (Doctaly platform). The study involves the use of an electronic template that incorporates a list of items (known as RECAP-V0) thought to be associated with disease outcome according to previous qualitative work. Data collected will be linked to patient outcomes in highly secure environments. We will then use multivariate logistic regression analyses for model development and validation. Results: Recruitment of participants started in October 2020. Initially, only the NWL CCGs and RCGP RSC arms were active. As of March 24, 2021, we have recruited a combined sample of 3827 participants in these two arms. CCAS and Doctaly joined the study in February 2021, with CCAS starting the recruitment process on March 15, 2021. The first part of the analysis (RECAP-V1 model development) is planned to start in April 2021 using the first half of the NWL CCGs and RCGP RSC combined data set. 
Subsequently, the model will be validated with the rest of the NWL CCGs and RCGP RSC data as well as the CCAS and Doctaly data sets. The study was approved by the Research Ethics Committee on May 27, 2020 (Integrated Research Application System number: 283024, Research Ethics Committee reference number: 20/NW/0266) and badged as National Institute for Health Research Urgent Public Health Study on October 14, 2020. Conclusions: We believe the validated RECAP-V1 early warning score will be a valuable tool for the assessment of severity in patients with suspected COVID-19 in the community, either in face-to-face or remote consultations, and will facilitate the timely escalation of treatment with the potential to improve patient outcomes. Trial Registration: ISRCTN registry ISRCTN13953727; https://www.isrctn.com/ISRCTN13953727 International Registered Report Identifier (IRRID): DERR1-10.2196/29072 ", doi="10.2196/29072", url="https://www.researchprotocols.org/2021/5/e29072", url="http://www.ncbi.nlm.nih.gov/pubmed/33939619" } @Article{info:doi/10.2196/25237, author="Alhassan, Zakhriya and Watson, Matthew and Budgen, David and Alshammari, Riyad and Alessa, Ali and Al Moubayed, Noura", title="Improving Current Glycated Hemoglobin Prediction in Adults: Use of Machine Learning Algorithms With Electronic Health Records", journal="JMIR Med Inform", year="2021", month="May", day="24", volume="9", number="5", pages="e25237", keywords="glycated hemoglobin HbA1c", keywords="prediction", keywords="machine learning", keywords="deep learning", keywords="neural network", keywords="multilayer perceptron", keywords="electronic health records", keywords="time series data", keywords="longitudinal data", keywords="diabetes", abstract="Background: Predicting the risk of glycated hemoglobin (HbA1c) elevation can help identify patients with the potential for developing serious chronic health problems, such as diabetes. Early preventive interventions based upon advanced predictive models using electronic health records data for identifying such patients can ultimately help provide better health outcomes. Objective: Our study investigated the performance of predictive models to forecast HbA1c elevation levels by employing several machine learning models. We also examined the use of patient electronic health record longitudinal data in the performance of the predictive models. Explainable methods were employed to interpret the decisions made by the black box models. Methods: This study employed multiple logistic regression, random forest, support vector machine, and logistic regression models, as well as a deep learning model (multilayer perceptron) to classify patients with normal (<5.7\%) and elevated (≥5.7\%) levels of HbA1c. We also integrated current visit data with historical (longitudinal) data from previous visits. Explainable machine learning methods were used to interrogate the models and provide an understanding of the reasons behind the decisions made by the models. All models were trained and tested using a large data set from Saudi Arabia with 18,844 unique patient records. Results: The machine learning models achieved promising results for predicting current HbA1c elevation risk. When coupled with longitudinal data, the machine learning models outperformed the multiple logistic regression model used in the comparative study. The multilayer perceptron model achieved an accuracy of 83.22\% for the area under receiver operating characteristic curve when used with historical data. 
All models showed a close level of agreement on the contribution of random blood sugar and age variables with and without longitudinal data. Conclusions: This study shows that machine learning models can provide promising results for the task of predicting current HbA1c levels (≥5.7\% or less). Using patients' longitudinal data improved the performance and affected the relative importance for the predictors used. The models showed results that are consistent with comparable studies. ", doi="10.2196/25237", url="https://medinform.jmir.org/2021/5/e25237", url="http://www.ncbi.nlm.nih.gov/pubmed/34028357" } @Article{info:doi/10.2196/25656, author="Hawley, Steve and Yu, Joanna and Bogetic, Nikola and Potapova, Natalia and Wakefield, Chris and Thompson, Mike and Kloiber, Stefan and Hill, Sean and Jankowicz, Damian and Rotenberg, David", title="Digitization of Measurement-Based Care Pathways in Mental Health Through REDCap and Electronic Health Record Integration: Development and Usability Study", journal="J Med Internet Res", year="2021", month="May", day="20", volume="23", number="5", pages="e25656", keywords="REDCap", keywords="electronic health record", keywords="systems integration", keywords="measurement-based care", keywords="hospital information systems", abstract="Background: The delivery of standardized self-report assessments is essential for measurement-based care in mental health. Paper-based methods of measurement-based care data collection may result in transcription errors, missing data, and other data quality issues when entered into patient electronic health records (EHRs). Objective: This study aims to help address these issues by using a dedicated instance of REDCap (Research Electronic Data Capture; Vanderbilt University)---a free, widely used electronic data capture platform---that was established to enable the deployment of digitized self-assessments in clinical care pathways to inform clinical decision making. Methods: REDCap was integrated with the primary clinical information system to facilitate the real-time transfer of discrete data and PDF reports from REDCap into the EHR. Both technical and administrative components were required for complete implementation. A technology acceptance survey was also administered to capture physicians' and clinicians' attitudes toward the new system. Results: The integration of REDCap with the EHR transitioned clinical workflows from paper-based methods of data collection to electronic data collection. This resulted in significant time savings, improved data quality, and valuable real-time information delivery. The digitization of self-report assessments at each appointment contributed to the clinic-wide implementation of the major depressive disorder integrated care pathway. This digital transformation facilitated a 4-fold increase in the physician adoption of this integrated care pathway workflow and a 3-fold increase in patient enrollment, resulting in an overall significant increase in major depressive disorder integrated care pathway capacity. Physicians' and clinicians' attitudes were overall positive, with almost all respondents agreeing that the system was useful to their work. Conclusions: REDCap provided an intuitive patient interface for collecting self-report measures and accessing results in real time to inform clinical decisions and an extensible backend for system integration. 
The approach scaled effectively and expanded to high-impact clinics throughout the hospital, allowing for the broad deployment of complex workflows and standardized assessments, which led to the accumulation of harmonized data across clinics and care pathways. REDCap is a flexible tool that can be effectively leveraged to facilitate the automatic transfer of self-report data to the EHR; however, thoughtful governance is required to complement the technical implementation to ensure that data standardization, data quality, patient safety, and privacy are maintained. ", doi="10.2196/25656", url="https://www.jmir.org/2021/5/e25656", url="http://www.ncbi.nlm.nih.gov/pubmed/34014169" } @Article{info:doi/10.2196/27118, author="Li, Dongkai and Gao, Jianwei and Hong, Na and Wang, Hao and Su, Longxiang and Liu, Chun and He, Jie and Jiang, Huizhen and Wang, Qiang and Long, Yun and Zhu, Weiguo", title="A Clinical Prediction Model to Predict Heparin Treatment Outcomes and Provide Dosage Recommendations: Development and Validation Study", journal="J Med Internet Res", year="2021", month="May", day="20", volume="23", number="5", pages="e27118", keywords="outcome prediction", keywords="clinical decision support", keywords="dosage recommendation", keywords="machine learning", keywords="intensive care unit", abstract="Background: Unfractionated heparin is widely used in the intensive care unit as an anticoagulant. However, weight-based heparin dosing has been shown to be suboptimal and may place patients at unnecessary risk during their intensive care unit stay. Objective: In this study, we intended to develop and validate a machine learning--based model to predict heparin treatment outcomes and to provide dosage recommendations to clinicians. Methods: A shallow neural network model was adopted in a retrospective cohort of patients from the Multiparameter Intelligent Monitoring in Intensive Care III (MIMIC III) database and patients admitted to the Peking Union Medical College Hospital (PUMCH). We modeled the subtherapeutic, normal, and supratherapeutic activated partial thromboplastin time (aPTT) as the outcomes of heparin treatment and used a group of clinical features for modeling. Our model classifies patients into 3 different therapeutic states. We tested the prediction ability of our model and evaluated its performance by using accuracy, the kappa coefficient, precision, recall, and the F1 score. Furthermore, a dosage recommendation module was designed and evaluated for clinical decision support. Results: A total of 3607 patients selected from MIMIC III and 1549 patients admitted to the PUMCH who met our criteria were included in this study. The shallow neural network model showed results of F1 scores 0.887 (MIMIC III) and 0.925 (PUMCH). When compared with the actual dosage prescribed, our model recommended increasing the dosage for 72.2\% (MIMIC III, 1240/1718) and 64.7\% (PUMCH, 281/434) of the subtherapeutic patients and decreasing the dosage for 80.9\% (MIMIC III, 504/623) and 76.7\% (PUMCH, 277/361) of the supratherapeutic patients, suggesting that the recommendations can contribute to clinical improvements and that they may effectively reduce the time to optimal dosage in the clinical setting. Conclusions: The evaluation of our model for predicting heparin treatment outcomes demonstrated that the developed model is potentially applicable for reducing the misdosage of heparin and for providing appropriate decision recommendations to clinicians. 
", doi="10.2196/27118", url="https://www.jmir.org/2021/5/e27118", url="http://www.ncbi.nlm.nih.gov/pubmed/34014171" } @Article{info:doi/10.2196/17886, author="Kong, Guilan and Wu, Jingyi and Chu, Hong and Yang, Chao and Lin, Yu and Lin, Ke and Shi, Ying and Wang, Haibo and Zhang, Luxia", title="Predicting Prolonged Length of Hospital Stay for Peritoneal Dialysis--Treated Patients Using Stacked Generalization: Model Development and Validation Study", journal="JMIR Med Inform", year="2021", month="May", day="19", volume="9", number="5", pages="e17886", keywords="peritoneal dialysis", keywords="prolonged length of stay", keywords="machine learning", keywords="prediction model", keywords="clinical decision support", abstract="Background: The increasing number of patients treated with peritoneal dialysis (PD) and their consistently high rate of hospital admissions have placed a large burden on the health care system. Early clinical interventions and optimal management of patients at a high risk of prolonged length of stay (pLOS) may help improve the medical efficiency and prognosis of PD-treated patients. If timely clinical interventions are not provided, patients at a high risk of pLOS may face a poor prognosis and high medical expenses, which will also be a burden on hospitals. Therefore, physicians need an effective pLOS prediction model for PD-treated patients. Objective: This study aimed to develop an optimal data-driven model for predicting the pLOS risk of PD-treated patients using basic admission data. Methods: Patient data collected using the Hospital Quality Monitoring System (HQMS) in China were used to develop pLOS prediction models. A stacking model was constructed with support vector machine, random forest (RF), and K-nearest neighbor algorithms as its base models and traditional logistic regression (LR) as its meta-model. The meta-model used the outputs of all 3 base models as input and generated the output of the stacking model. Another LR-based pLOS prediction model was built as the benchmark model. The prediction performance of the stacking model was compared with that of its base models and the benchmark model. Five-fold cross-validation was employed to develop and validate the models. Performance measures included the Brier score, area under the receiver operating characteristic curve (AUROC), estimated calibration index (ECI), accuracy, sensitivity, specificity, and geometric mean (Gm). In addition, a calibration plot was employed to visually demonstrate the calibration power of each model. Results: The final cohort extracted from the HQMS database consisted of 23,992 eligible PD-treated patients, among whom 30.3\% had a pLOS (ie, longer than the average LOS, which was 16 days in our study). Among the models, the stacking model achieved the best calibration (ECI 8.691), balanced accuracy (Gm 0.690), accuracy (0.695), and specificity (0.701). Meanwhile, the stacking and RF models had the best overall performance (Brier score 0.174 for both) and discrimination (AUROC 0.757 for the stacking model and 0.756 for the RF model). Compared with the benchmark LR model, the stacking model was superior in all performance measures except sensitivity, but there was no significant difference in sensitivity between the 2 models. The 2-sided t tests revealed significant performance differences between the stacking and LR models in overall performance, discrimination, calibration, balanced accuracy, and accuracy. 
Conclusions: This study is the first to develop data-driven pLOS prediction models for PD-treated patients using basic admission data from a national database. The results indicate the feasibility of utilizing a stacking-based pLOS prediction model for PD-treated patients. The pLOS prediction tools developed in this study have the potential to assist clinicians in identifying patients at a high risk of pLOS and to allocate resources optimally for PD-treated patients. ", doi="10.2196/17886", url="https://medinform.jmir.org/2021/5/e17886", url="http://www.ncbi.nlm.nih.gov/pubmed/34009135" } @Article{info:doi/10.2196/25869, author="Lee, Haeyun and Chai, Jun Young and Joo, Hyunjin and Lee, Kyungsu and Hwang, Youn Jae and Kim, Seok-Mo and Kim, Kwangsoon and Nam, Inn-Chul and Choi, Young June and Yu, Won Hyeong and Lee, Myung-Chul and Masuoka, Hiroo and Miyauchi, Akira and Lee, Eun Kyu and Kim, Sungwan and Kong, Hyoun-Joong", title="Federated Learning for Thyroid Ultrasound Image Analysis to Protect Personal Information: Validation Study in a Real Health Care Environment", journal="JMIR Med Inform", year="2021", month="May", day="18", volume="9", number="5", pages="e25869", keywords="deep learning", keywords="federated learning", keywords="thyroid nodules", keywords="ultrasound image", abstract="Background: Federated learning is a decentralized approach to machine learning; it is a training strategy that overcomes medical data privacy regulations and generalizes deep learning algorithms. Federated learning mitigates many systemic privacy risks by sharing only the model and parameters for training, without the need to export existing medical data sets. In this study, we performed ultrasound image analysis using federated learning to predict whether thyroid nodules were benign or malignant. Objective: The goal of this study was to evaluate whether the performance of federated learning was comparable with that of conventional deep learning. Methods: A total of 8457 (5375 malignant, 3082 benign) ultrasound images were collected from 6 institutions and used for federated learning and conventional deep learning. Five deep learning networks (VGG19, ResNet50, ResNext50, SE-ResNet50, and SE-ResNext50) were used. Using stratified random sampling, we selected 20\% (1075 malignant, 616 benign) of the total images for internal validation. For external validation, we used 100 ultrasound images (50 malignant, 50 benign) from another institution. Results: For internal validation, the area under the receiver operating characteristic (AUROC) curve for federated learning was between 78.88\% and 87.56\%, and the AUROC for conventional deep learning was between 82.61\% and 91.57\%. For external validation, the AUROC for federated learning was between 75.20\% and 86.72\%, and the AUROC curve for conventional deep learning was between 73.04\% and 91.04\%. Conclusions: We demonstrated that the performance of federated learning using decentralized data was comparable to that of conventional deep learning using pooled data. Federated learning might be potentially useful for analyzing medical images while protecting patients' personal information. 
", doi="10.2196/25869", url="https://medinform.jmir.org/2021/5/e25869", url="http://www.ncbi.nlm.nih.gov/pubmed/33858817" } @Article{info:doi/10.2196/24803, author="Park, Hyung and Song, Min and Lee, Byul Eun and Seo, Kyung Bo and Choi, Min Chang", title="An Attention Model With Transfer Embeddings to Classify Pneumonia-Related Bilingual Imaging Reports: Algorithm Development and Validation", journal="JMIR Med Inform", year="2021", month="May", day="17", volume="9", number="5", pages="e24803", keywords="deep learning", keywords="natural language process", keywords="attention", keywords="clinical data", keywords="pneumonia", keywords="classification", keywords="medical imaging", keywords="electronic health record", keywords="machine learning", keywords="model", abstract="Background: In the analysis of electronic health records, proper labeling of outcomes is mandatory. To obtain proper information from radiologic reports, several studies were conducted to classify radiologic reports using deep learning. However, the classification of pneumonia in bilingual radiologic reports has not been conducted previously. Objective: The aim of this research was to classify radiologic reports into pneumonia or no pneumonia using a deep learning method. Methods: A data set of radiology reports for chest computed tomography and chest x-rays of surgical patients from January 2008 to January 2018 in the Asan Medical Center in Korea was retrospectively analyzed. The classification performance of our long short-term memory (LSTM)--Attention model was compared with various deep learning and machine learning methods. The area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve, sensitivity, specificity, accuracy, and F1 score for the models were compared. Results: A total of 5450 radiologic reports were included that contained at least one pneumonia-related word. In the test set (n=1090), our proposed model showed 91.01\% (992/1090) accuracy (AUROCs for negative, positive, and obscure were 0.98, 0.97, and 0.90, respectively). The top 3 performances of the models were based on FastText or LSTM. The convolutional neural network--based model showed a lower accuracy 73.03\% (796/1090) than the other 2 algorithms. The classification of negative results had an F1 score of 0.96, whereas the classification of positive and uncertain results showed a lower performance (positive F1 score 0.83; uncertain F1 score 0.62). In the extra-validation set, our model showed 80.0\% (642/803) accuracy (AUROCs for negative, positive, and obscure were 0.92, 0.96, and 0.84, respectively). Conclusions: Our method showed excellent performance in classifying pneumonia in bilingual radiologic reports. The method could enrich the research on pneumonia by obtaining exact outcomes from electronic health data. 
", doi="10.2196/24803", url="https://medinform.jmir.org/2021/5/e24803", url="http://www.ncbi.nlm.nih.gov/pubmed/33820755" } @Article{info:doi/10.2196/27614, author="Wu, Chien-Wei and Chen, Hsing-Yu and Yang, Ching-Wei and Chen, Yu-Chun", title="Deciphering the Efficacy and Mechanisms of Chinese Herbal Medicine for Diabetic Kidney Disease by Integrating Web-Based Biochemical Databases and Real-World Clinical Data: Retrospective Cohort Study", journal="JMIR Med Inform", year="2021", month="May", day="11", volume="9", number="5", pages="e27614", keywords="association rule mining", keywords="Chinese medicine network", keywords="social network analysis", keywords="survival", abstract="Background: Diabetic kidney disease (DKD) is one of the most crucial causes of chronic kidney disease (CKD). However, the efficacy and biomedical mechanisms of Chinese herbal medicine (CHM) for DKD in clinical settings remain unclear. Objective: This study aimed to analyze the outcomes of DKD patients with CHM-only management and the possible molecular pathways of CHM by integrating web-based biomedical databases and real-world clinical data. Methods: A total of 152,357 patients with incident DKD from 2004 to 2012 were identified from the National Health Insurance Research Database (NHIRD) in Taiwan. The risk of mortality was estimated with the Kaplan-Meier method and Cox regression considering demographic covariates. The inverse probability of treatment weighting was used for confounding bias between CHM users and nonusers. Furthermore, to decipher the CHM used for DKD, we analyzed all CHM prescriptions using the Chinese Herbal Medicine Network (CMN), which combined association rule mining and social network analysis for all CHM prescriptions. Further, web-based biomedical databases, including STITCH, STRING, BindingDB, TCMSP, TCM@Taiwan, and DisGeNET, were integrated with the CMN and commonly used Western medicine (WM) to explore the differences in possible target proteins and molecular pathways between CHM and WM. An application programming interface was used to assess these online databases to obtain the latest biomedical information. Results: About 13.7\% (20,947/131,410) of patients were classified as CHM users among eligible DKD patients. The median follow-up duration of all patients was 2.49 years. The cumulative mortality rate in the CHM cohort was significantly lower than that in the WM cohort (28\% vs 48\%, P<.001). The risk of mortality was 0.41 in the CHM cohort with covariate adjustment (99\% CI 0.38-0.43; P<.001). A total of 173,525 CHM prescriptions were used to construct the CMN with 11 CHM clusters. CHM covered more DKD-related proteins and pathways than WM; nevertheless, WM aimed at managing DKD more specifically. From the overrepresentation tests carried out by the online website Reactome, the molecular pathways covered by the CHM clusters in the CMN and WM seemed distinctive but complementary. Complementary effects were also found among DKD patients with concurrent WM and CHM use. The risk of mortality for CHM users under renin-angiotensin-aldosterone system (RAAS) inhibition therapy was lower than that for CHM nonusers among DKD patients with hypertension (adjusted hazard ratio [aHR] 0.47, 99\% CI 0.45-0.51; P<.001), chronic heart failure (aHR 0.43, 99\% CI 0.37-0.51; P<.001), and ischemic heart disease (aHR 0.46, 99\% CI 0.41-0.51; P<.001). 
Conclusions: CHM users among DKD patients seemed to have a lower risk of mortality, which may benefit from potentially synergistic renoprotection effects. The framework of integrating real-world clinical databases and web-based biomedical databases could help in exploring the roles of treatments for diseases. ", doi="10.2196/27614", url="https://medinform.jmir.org/2021/5/e27614", url="http://www.ncbi.nlm.nih.gov/pubmed/33973855" } @Article{info:doi/10.2196/24205, author="Taushanov, Zhivko and Verloo, Henk and Wernli, Boris and Di Giovanni, Saviana and von Gunten, Armin and Pereira, Filipa", title="Transforming a Patient Registry Into a Customized Data Set for the Advanced Statistical Analysis of Health Risk Factors and for Medication-Related Hospitalization Research: Retrospective Hospital Patient Registry Study", journal="JMIR Med Inform", year="2021", month="May", day="11", volume="9", number="5", pages="e24205", keywords="cluster analysis", keywords="hierarchical 2-step clustering", keywords="registry", keywords="raw data", keywords="hospital", keywords="retrospective", keywords="population based", keywords="multidimensional", abstract="Background: Hospital patient registries provide substantial longitudinal data sets describing the clinical and medical health statuses of inpatients and their pharmacological prescriptions. Despite the multiple advantages of routinely collecting multidimensional longitudinal data, those data sets are rarely suitable for advanced statistical analysis and they require customization and synthesis. Objective: The aim of this study was to describe the methods used to transform and synthesize a raw, multidimensional, hospital patient registry data set into an exploitable database for the further investigation of risk profiles and predictive and survival health outcomes among polymorbid, polymedicated, older inpatients in relation to their medicine prescriptions at hospital discharge. Methods: A raw, multidimensional data set from a public hospital was extracted from the hospital registry in a CSV (.csv) file and imported into the R statistical package for cleaning, customization, and synthesis. Patients fulfilling the criteria for inclusion were home-dwelling, polymedicated, older adults with multiple chronic conditions aged ≥65 who became hospitalized. The patient data set covered 140 variables from 20,422 hospitalizations of polymedicated, home-dwelling older adults from 2015 to 2018. Each variable, according to type, was explored and computed to describe distributions, missing values, and associations. Different clustering methods, expert opinion, recoding, and missing-value techniques were used to customize and synthesize these multidimensional data sets. Results: Sociodemographic data showed no missing values. Average age, hospital length of stay, and frequency of hospitalization were computed. Discharge details were recoded and summarized. Clinical data were cleaned up and best practices for managing missing values were applied. Seven clusters of medical diagnoses, surgical interventions, somatic, cognitive, and medicines data were extracted using empirical and statistical best practices, with each presenting the health status of the patients included in it as accurately as possible. Medical, comorbidity, and drug data were recoded and summarized. Conclusions: A cleaner, better-structured data set was obtained, combining empirical and best-practice statistical approaches. 
The overall strategy delivered an exploitable, population-based database suitable for an advanced analysis of the descriptive, predictive, and survival statistics relating to polymedicated, home-dwelling older adults admitted as inpatients. More research is needed to develop best practices for customizing and synthesizing large, multidimensional, population-based registries. International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2019-030030 ", doi="10.2196/24205", url="https://medinform.jmir.org/2021/5/e24205", url="http://www.ncbi.nlm.nih.gov/pubmed/33973865" } @Article{info:doi/10.2196/27113, author="Jin, Haomiao and Chien, Sandy and Meijer, Erik and Khobragade, Pranali and Lee, Jinkook", title="Learning From Clinical Consensus Diagnosis in India to Facilitate Automatic Classification of Dementia: Machine Learning Study", journal="JMIR Ment Health", year="2021", month="May", day="10", volume="8", number="5", pages="e27113", keywords="dementia", keywords="Alzheimer disease", keywords="machine learning", keywords="artificial intelligence", keywords="diagnosis", keywords="classification", keywords="India", keywords="model", abstract="Background: The Harmonized Diagnostic Assessment of Dementia for the Longitudinal Aging Study in India (LASI-DAD) is the first and only nationally representative study on late-life cognition and dementia in India (n=4096). LASI-DAD obtained clinical consensus diagnosis of dementia for a subsample of 2528 respondents. Objective: This study develops a machine learning model that uses data from the clinical consensus diagnosis in LASI-DAD to support the classification of dementia status. Methods: Clinicians were presented with the extensive data collected from LASI-DAD, including sociodemographic information and health history of respondents, results from the screening tests of cognitive status, and information obtained from informant interviews. Based on the Clinical Dementia Rating (CDR) and using an online platform, clinicians individually evaluated each case and then reached a consensus diagnosis. A 2-step procedure was implemented to train several candidate machine learning models, which were evaluated using a separate test set for predictive accuracy measurement, including the area under receiver operating curve (AUROC), accuracy, sensitivity, specificity, precision, F1 score, and kappa statistic. The ultimate model was selected based on overall agreement as measured by kappa. We further examined the overall accuracy and agreement with the final consensus diagnoses between the selected machine learning model and individual clinicians who participated in the clinical consensus diagnostic process. Finally, we applied the selected model to a subgroup of LASI-DAD participants for whom the clinical consensus diagnosis was not obtained to predict their dementia status. Results: Among the 2528 individuals who received clinical consensus diagnosis, 192 (6.7\% after adjusting for sampling weight) were diagnosed with dementia. All candidate machine learning models achieved outstanding discriminative ability, as indicated by AUROC >.90, and had similar accuracy and specificity (both around 0.95). The support vector machine model outperformed other models with the highest sensitivity (0.81), F1 score (0.72), and kappa (.70, indicating substantial agreement) and the second highest precision (0.65). As a result, the support vector machine was selected as the ultimate model. 
Further examination revealed that overall accuracy and agreement were similar between the selected model and individual clinicians. Application of the prediction model on 1568 individuals without clinical consensus diagnosis classified 127 individuals as living with dementia. After applying sampling weight, we can estimate the prevalence of dementia in the population as 7.4\%. Conclusions: The selected machine learning model has outstanding discriminative ability and substantial agreement with a clinical consensus diagnosis of dementia. The model can serve as a computer model of the clinical knowledge and experience encoded in the clinical consensus diagnostic process and has many potential applications, including predicting missed dementia diagnoses and serving as a clinical decision support tool or virtual rater to assist diagnosis of dementia. ", doi="10.2196/27113", url="https://mental.jmir.org/2021/5/e27113", url="http://www.ncbi.nlm.nih.gov/pubmed/33970122" } @Article{info:doi/10.2196/25304, author="Zhang, Kunli and Cai, Linkun and Song, Yu and Liu, Tao and Zhao, Yueshu", title="Combining External Medical Knowledge for Improving Obstetric Intelligent Diagnosis: Model Development and Validation", journal="JMIR Med Inform", year="2021", month="May", day="10", volume="9", number="5", pages="e25304", keywords="intelligent diagnosis", keywords="obstetric electronic medical record", keywords="medical knowledge", keywords="attention mechanism", abstract="Background: Data-driven medical health information processing has become a new development trend in obstetrics. Electronic medical records (EMRs) are the basis of evidence-based medicine and an important information source for intelligent diagnosis. To obtain diagnostic results, doctors combine clinical experience and medical knowledge in their diagnosis process. External medical knowledge provides strong support for diagnosis. Therefore, it is worth studying how to make full use of EMRs and medical knowledge in intelligent diagnosis. Objective: This study aims to improve the performance of intelligent diagnosis in EMRs by combining medical knowledge. Methods: As an EMR usually contains multiple types of diagnostic results, the intelligent diagnosis can be treated as a multilabel classification task. We propose a novel neural network knowledge-aware hierarchical diagnosis model (KHDM) in which Chinese obstetric EMRs and external medical knowledge can be synchronously and effectively used for intelligent diagnostics. In KHDM, EMRs and external knowledge documents are integrated by the attention mechanism contained in the hierarchical deep learning framework. In this way, we enrich the language model with curated knowledge documents, combining the advantages of both to make a knowledge-aware diagnosis. Results: We evaluate our model on a real-world Chinese obstetric EMR dataset and showed that KHDM achieves an accuracy of 0.8929, which exceeds that of the most advanced classification benchmark methods. We also verified the model's interpretability advantage. Conclusions: In this paper, an improved model combining medical knowledge and an attention mechanism is proposed, based on the problem of diversity of diagnostic results in Chinese EMRs. KHDM can effectively integrate domain knowledge to greatly improve the accuracy of diagnosis. 
", doi="10.2196/25304", url="https://medinform.jmir.org/2021/5/e25304", url="http://www.ncbi.nlm.nih.gov/pubmed/33970113" } @Article{info:doi/10.2196/20865, author="Manas, Gaur and Aribandi, Vamsi and Kursuncu, Ugur and Alambo, Amanuel and Shalin, L. Valerie and Thirunarayan, Krishnaprasad and Beich, Jonathan and Narasimhan, Meera and Sheth, Amit", title="Knowledge-Infused Abstractive Summarization of Clinical Diagnostic Interviews: Framework Development Study", journal="JMIR Ment Health", year="2021", month="May", day="10", volume="8", number="5", pages="e20865", keywords="knowledge-infusion", keywords="abstractive summarization", keywords="distress clinical diagnostic interviews", keywords="Patient Health Questionnaire-9", keywords="healthcare informatics", keywords="interpretable evaluations", abstract="Background: In clinical diagnostic interviews, mental health professionals (MHPs) implement a care practice that involves asking open questions (eg, ``What do you want from your life?'' ``What have you tried before to bring change in your life?'') while listening empathetically to patients. During these interviews, MHPs attempted to build a trusting human-centered relationship while collecting data necessary for professional medical and psychiatric care. Often, because of the social stigma of mental health disorders, patient discomfort in discussing their presenting problem may add additional complexities and nuances to the language they use, that is, hidden signals among noisy content. Therefore, a focused, well-formed, and elaborative summary of clinical interviews is critical to MHPs in making informed decisions by enabling a more profound exploration of a patient's behavior, especially when it endangers life. Objective: The aim of this study is to propose an unsupervised, knowledge-infused abstractive summarization (KiAS) approach that generates summaries to enable MHPs to perform a well-informed follow-up with patients to improve the existing summarization methods built on frequency heuristics by creating more informative summaries. Methods: Our approach incorporated domain knowledge from the Patient Health Questionnaire-9 lexicon into an integer linear programming framework that optimizes linguistic quality and informativeness. We used 3 baseline approaches: extractive summarization using the SumBasic algorithm, abstractive summarization using integer linear programming without the infusion of knowledge, and abstraction over extractive summarization to evaluate the performance of KiAS. The capability of KiAS on the Distress Analysis Interview Corpus-Wizard of Oz data set was demonstrated through interpretable qualitative and quantitative evaluations. Results: KiAS generates summaries (7 sentences on average) that capture informative questions and responses exchanged during long (58 sentences on average), ambiguous, and sparse clinical diagnostic interviews. The summaries generated using KiAS improved upon the 3 baselines by 23.3\%, 4.4\%, 2.5\%, and 2.2\% for thematic overlap, Flesch Reading Ease, contextual similarity, and Jensen Shannon divergence, respectively. On the Recall-Oriented Understudy for Gisting Evaluation-2 and Recall-Oriented Understudy for Gisting Evaluation-L metrics, KiAS showed an improvement of 61\% and 49\%, respectively. We validated the quality of the generated summaries through visual inspection and substantial interrater agreement from MHPs. 
Conclusions: Our collaborator MHPs observed the potential utility and significant impact of KiAS in leveraging valuable but voluminous communications that take place outside of normally scheduled clinical appointments. This study shows promise in generating semantically relevant summaries that will help MHPs make informed decisions about patient status. ", doi="10.2196/20865", url="https://mental.jmir.org/2021/5/e20865", url="http://www.ncbi.nlm.nih.gov/pubmed/33970116" } @Article{info:doi/10.2196/27172, author="Churov{\'a}, Vendula and Vy{\vs}kovsk{\'y}, Roman and Mar{\vs}{\'a}lov{\'a}, Kate{\v r}ina and Kudl{\'a}{\v c}ek, David and Schwarz, Daniel", title="Anomaly Detection Algorithm for Real-World Data and Evidence in Clinical Research: Implementation, Evaluation, and Validation Study", journal="JMIR Med Inform", year="2021", month="May", day="7", volume="9", number="5", pages="e27172", keywords="clinical research data", keywords="real-world evidence", keywords="registry database", keywords="data quality", keywords="EDC system", keywords="anomaly detection", abstract="Background: Statistical analysis, which has become an integral part of evidence-based medicine, relies heavily on data quality that is of critical importance in modern clinical research. Input data are not only at risk of being falsified or fabricated, but also at risk of being mishandled by investigators. Objective: The urgent need to assure the highest data quality possible has led to the implementation of various auditing strategies designed to monitor clinical trials and detect errors of different origin that frequently occur in the field. The objective of this study was to describe a machine learning--based algorithm to detect anomalous patterns in data created as a consequence of carelessness, systematic error, or intentionally by entering fabricated values. Methods: A particular electronic data capture (EDC) system, which is used for data management in clinical registries, is presented including its architecture and data structure. This EDC system features an algorithm based on machine learning designed to detect anomalous patterns in quantitative data. The detection algorithm combines clustering with a series of 7 distance metrics that serve to determine the strength of an anomaly. For the detection process, the thresholds and combinations of the metrics were used and the detection performance was evaluated and validated in the experiments involving simulated anomalous data and real-world data. Results: Five different clinical registries related to neuroscience were presented---all of them running in the given EDC system. Two of the registries were selected for the evaluation experiments and served also to validate the detection performance on an independent data set. The best performing combination of the distance metrics was that of Canberra, Manhattan, and Mahalanobis, whereas Cosine and Chebyshev metrics had been excluded from further analysis due to the lowest performance when used as single distance metric--based classifiers. Conclusions: The experimental results demonstrate that the algorithm is universal in nature, and as such may be implemented in other EDC systems, and is capable of anomalous data detection with a sensitivity exceeding 85\%. 
", doi="10.2196/27172", url="https://medinform.jmir.org/2021/5/e27172", url="http://www.ncbi.nlm.nih.gov/pubmed/33851576" } @Article{info:doi/10.2196/22591, author="Wu, Chia-Tung and Li, Guo-Hung and Huang, Chun-Ta and Cheng, Yu-Chieh and Chen, Chi-Hsien and Chien, Jung-Yien and Kuo, Ping-Hung and Kuo, Lu-Cheng and Lai, Feipei", title="Acute Exacerbation of a Chronic Obstructive Pulmonary Disease Prediction System Using Wearable Device Data, Machine Learning, and Deep Learning: Development and Cohort Study", journal="JMIR Mhealth Uhealth", year="2021", month="May", day="6", volume="9", number="5", pages="e22591", keywords="chronic obstructive pulmonary disease", keywords="clinical decision support systems", keywords="health risk assessment", keywords="wearable device", abstract="Background: The World Health Organization has projected that by 2030, chronic obstructive pulmonary disease (COPD) will be the third-leading cause of mortality and the seventh-leading cause of morbidity worldwide. Acute exacerbations of chronic obstructive pulmonary disease (AECOPD) are associated with an accelerated decline in lung function, diminished quality of life, and higher mortality. Accurate early detection of acute exacerbations will enable early management and reduce mortality. Objective: The aim of this study was to develop a prediction system using lifestyle data, environmental factors, and patient symptoms for the early detection of AECOPD in the upcoming 7 days. Methods: This prospective study was performed at National Taiwan University Hospital. Patients with COPD that did not have a pacemaker and were not pregnant were invited for enrollment. Data on lifestyle, temperature, humidity, and fine particulate matter were collected using wearable devices (Fitbit Versa), a home air quality--sensing device (EDIMAX Airbox), and a smartphone app. AECOPD episodes were evaluated via standardized questionnaires. With these input features, we evaluated the prediction performance of machine learning models, including random forest, decision trees, k-nearest neighbor, linear discriminant analysis, and adaptive boosting, and a deep neural network model. Results: The continuous real-time monitoring of lifestyle and indoor environment factors was implemented by integrating home air quality--sensing devices, a smartphone app, and wearable devices. All data from 67 COPD patients were collected prospectively during a mean 4-month follow-up period, resulting in the detection of 25 AECOPD episodes. For 7-day AECOPD prediction, the proposed AECOPD predictive model achieved an accuracy of 92.1\%, sensitivity of 94\%, and specificity of 90.4\%. Receiver operating characteristic curve analysis showed that the area under the curve of the model in predicting AECOPD was greater than 0.9. The most important variables in the model were daily steps walked, stairs climbed, and daily distance moved. Conclusions: Using wearable devices, home air quality--sensing devices, a smartphone app, and supervised prediction algorithms, we achieved excellent power to predict whether a patient would experience AECOPD within the upcoming 7 days. The AECOPD prediction system provided an effective way to collect lifestyle and environmental data, and yielded reliable predictions of future AECOPD events. Compared with previous studies, we have comprehensively improved the performance of the AECOPD prediction model by adding objective lifestyle and environmental data. 
This model could yield more accurate prediction results for COPD patients than using only questionnaire data. ", doi="10.2196/22591", url="https://mhealth.jmir.org/2021/5/e22591", url="http://www.ncbi.nlm.nih.gov/pubmed/33955840" } @Article{info:doi/10.2196/21347, author="Alghatani, Khalid and Ammar, Nariman and Rezgui, Abdelmounaam and Shaban-Nejad, Arash", title="Predicting Intensive Care Unit Length of Stay and Mortality Using Patient Vital Signs: Machine Learning Model Development and Validation", journal="JMIR Med Inform", year="2021", month="May", day="5", volume="9", number="5", pages="e21347", keywords="intensive care unit (ICU)", keywords="ICU patient monitoring", keywords="machine learning", keywords="predictive model", keywords="vital signs measurements", keywords="clinical intelligence", abstract="Background: Patient monitoring is vital in all stages of care. In particular, intensive care unit (ICU) patient monitoring has the potential to reduce complications and morbidity, and to increase the quality of care by enabling hospitals to deliver higher-quality, cost-effective patient care, and improve the quality of medical services in the ICU. Objective: We here report the development and validation of ICU length of stay and mortality prediction models. The models will be used in an intelligent ICU patient monitoring module of an Intelligent Remote Patient Monitoring (IRPM) framework that monitors the health status of patients, and generates timely alerts, maneuver guidance, or reports when adverse medical conditions are predicted. Methods: We utilized the publicly available Medical Information Mart for Intensive Care (MIMIC) database to extract ICU stay data for adult patients to build two prediction models: one for mortality prediction and another for ICU length of stay. For the mortality model, we applied six commonly used machine learning (ML) binary classification algorithms for predicting the discharge status (survived or not). For the length of stay model, we applied the same six ML algorithms for binary classification using the median patient population ICU stay of 2.64 days. For the regression-based classification, we used two ML algorithms for predicting the number of days. We built two variations of each prediction model: one using 12 baseline demographic and vital sign features, and the other based on our proposed quantiles approach, in which we use 21 extra features engineered from the baseline vital sign features, including their modified means, standard deviations, and quantile percentages. Results: We could perform predictive modeling with minimal features while maintaining reasonable performance using the quantiles approach. The best accuracy achieved in the mortality model was approximately 89\% using the random forest algorithm. The highest accuracy achieved in the length of stay model, based on the population median ICU stay (2.64 days), was approximately 65\% using the random forest algorithm. Conclusions: The novelty in our approach is that we built models to predict ICU length of stay and mortality with reasonable accuracy based on a combination of ML and the quantiles approach that utilizes only vital signs available from the patient's profile without the need to use any external features. This approach is based on feature engineering of the vital signs by including their modified means, standard deviations, and quantile percentages of the original features, which provided a richer dataset to achieve better predictive power in our models. 
", doi="10.2196/21347", url="https://medinform.jmir.org/2021/5/e21347", url="http://www.ncbi.nlm.nih.gov/pubmed/33949961" } @Article{info:doi/10.2196/15708, author="Le Glaz, Aziliz and Haralambous, Yannis and Kim-Dufor, Deok-Hee and Lenca, Philippe and Billot, Romain and Ryan, C. Taylor and Marsh, Jonathan and DeVylder, Jordan and Walter, Michel and Berrouiguet, Sofian and Lemey, Christophe", title="Machine Learning and Natural Language Processing in Mental Health: Systematic Review", journal="J Med Internet Res", year="2021", month="May", day="4", volume="23", number="5", pages="e15708", keywords="machine learning", keywords="natural language processing", keywords="artificial intelligence", keywords="data mining", keywords="mental health", keywords="psychiatry", abstract="Background: Machine learning systems are part of the field of artificial intelligence that automatically learn models from data to make better decisions. Natural language processing (NLP), by using corpora and learning approaches, provides good performance in statistical tasks, such as text classification or sentiment mining. Objective: The primary aim of this systematic review was to summarize and characterize, in methodological and technical terms, studies that used machine learning and NLP techniques for mental health. The secondary aim was to consider the potential use of these methods in mental health clinical practice Methods: This systematic review follows the PRISMA (Preferred Reporting Items for Systematic Review and Meta-analysis) guidelines and is registered with PROSPERO (Prospective Register of Systematic Reviews; number CRD42019107376). The search was conducted using 4 medical databases (PubMed, Scopus, ScienceDirect, and PsycINFO) with the following keywords: machine learning, data mining, psychiatry, mental health, and mental disorder. The exclusion criteria were as follows: languages other than English, anonymization process, case studies, conference papers, and reviews. No limitations on publication dates were imposed. Results: A total of 327 articles were identified, of which 269 (82.3\%) were excluded and 58 (17.7\%) were included in the review. The results were organized through a qualitative perspective. Although studies had heterogeneous topics and methods, some themes emerged. Population studies could be grouped into 3 categories: patients included in medical databases, patients who came to the emergency room, and social media users. The main objectives were to extract symptoms, classify severity of illness, compare therapy effectiveness, provide psychopathological clues, and challenge the current nosography. Medical records and social media were the 2 major data sources. With regard to the methods used, preprocessing used the standard methods of NLP and unique identifier extraction dedicated to medical texts. Efficient classifiers were preferred rather than transparent functioning classifiers. Python was the most frequently used platform. Conclusions: Machine learning and NLP models have been highly topical issues in medicine in recent years and may be considered a new paradigm in medical research. However, these processes tend to confirm clinical hypotheses rather than developing entirely new information, and only one major category of the population (ie, social media users) is an imprecise cohort. Moreover, some language-specific features can improve the performance of NLP methods, and their extension to other languages should be more closely investigated. 
However, machine learning and NLP techniques provide useful information from unexplored data (ie, patients' daily habits that are usually inaccessible to care providers). Before considering it as an additional tool of mental health care, ethical issues remain and should be discussed in a timely manner. Machine learning and NLP methods may offer multiple perspectives in mental health research but should also be considered as tools to support clinical practice. ", doi="10.2196/15708", url="https://www.jmir.org/2021/5/e15708", url="http://www.ncbi.nlm.nih.gov/pubmed/33944788" } @Article{info:doi/10.2196/22766, author="Scalia, Peter and Ahmad, Farhan and Schubbe, Danielle and Forcino, Rachel and Durand, Marie-Anne and Barr, James Paul and Elwyn, Glyn", title="Integrating Option Grid Patient Decision Aids in the Epic Electronic Health Record: Case Study at 5 Health Systems", journal="J Med Internet Res", year="2021", month="May", day="3", volume="23", number="5", pages="e22766", keywords="shared decision making", keywords="patient decision aids", keywords="electronic health record", keywords="implementation", keywords="HL7 SMART on FHIR", abstract="Background: Some researchers argue that the successful implementation of patient decision aids (PDAs) into clinical workflows depends on their integration into electronic health records (EHRs). Anecdotally, we know that EHR integration is a complex and time-consuming task; yet, the process has not been examined in detail. As part of an implementation project, we examined the work involved in integrating an encounter PDA for symptomatic uterine fibroids into Epic EHR systems. Objective: This study aims to identify the steps and time required to integrate a PDA into the Epic EHR system and examine facilitators and barriers to the integration effort. Methods: We conducted a case study at 5 academic medical centers in the United States. A clinical champion at each institution liaised with their Epic EHR team to initiate the integration of the uterine fibroid Option Grid PDAs into clinician-facing menus. We scheduled regular meetings with the Epic software analysts and an expert Epic technologist to discuss how best to integrate the tools into Epic for use by clinicians with patients. The meetings were then recorded and transcribed. Two researchers independently coded the transcripts and field notes before categorizing the codes and conducting a thematic analysis to identify the facilitators and barriers to EHR integration. The steps were reviewed and edited by an Epic technologist to ensure their accuracy. Results: Integrating the uterine fibroid Option Grid PDA into clinician-facing menus required an 18-month timeline and a 6-step process, as follows: task priority negotiation with Epic software teams, security risk assessment, technical review, Epic configuration, troubleshooting, and launch. The key facilitators of the process were the clinical champions who advocated for integration at the institutional level and the presence of an experienced technologist who guided Epic software analysts during the build. Another facilitator was the use of an emerging industry standard app platform (Health Level 7 Substitutable Medical Applications and Reusable Technologies on Fast Healthcare Interoperability Resources) as a means of integrating the Option Grid into existing systems. This standard platform enabled clinicians to access the tools by using single sign-on credentials and prevented protected health information from leaving the EHR. 
Key barriers were the lack of control over the Option Grid product developed by EBSCO (Elton B Stephens Company) Health; the periodic Epic upgrades that can result in a pause on new software configurations; and the unforeseen software problems with Option Grid (ie, inability to print the PDA), which delayed the launch of the PDA. Conclusions: The integration of PDAs into the Epic EHR system requires a 6-step process and an 18-month timeline. The process required support and prioritization from a clinical champion, guidance from an experienced technologist, and a willing EHR software developer team. ", doi="10.2196/22766", url="https://www.jmir.org/2021/5/e22766", url="http://www.ncbi.nlm.nih.gov/pubmed/33938806" } @Article{info:doi/10.2196/24020, author="Rybinski, Maciej and Dai, Xiang and Singh, Sonit and Karimi, Sarvnaz and Nguyen, Anthony", title="Extracting Family History Information From Electronic Health Records: Natural Language Processing Analysis", journal="JMIR Med Inform", year="2021", month="Apr", day="30", volume="9", number="4", pages="e24020", keywords="information extraction", keywords="natural language processing", keywords="clinical natural language processing", keywords="named entity recognition", keywords="sequence tagging", keywords="neural language modeling", keywords="data augmentation", abstract="Background: The prognosis, diagnosis, and treatment of many genetic disorders and familial diseases significantly improve if the family history (FH) of a patient is known. Such information is often written in the free text of clinical notes. Objective: The aim of this study is to develop automated methods that enable access to FH data through natural language processing. Methods: We performed information extraction by using transformers to extract disease mentions from notes. We also experimented with rule-based methods for extracting family member (FM) information from text and coreference resolution techniques. We evaluated different transfer learning strategies to improve the annotation of diseases. We provided a thorough error analysis of the contributing factors that affect such information extraction systems. Results: Our experiments showed that the combination of domain-adaptive pretraining and intermediate-task pretraining achieved an F1 score of 81.63\% for the extraction of diseases and FMs from notes when it was tested on a public shared task data set from the National Natural Language Processing Clinical Challenges (N2C2), providing a statistically significant improvement over the baseline (P<.001). In comparison, in the 2019 N2C2/Open Health Natural Language Processing Shared Task, the median F1 score of all 17 participating teams was 76.59\%. Conclusions: Our approach, which leverages a state-of-the-art named entity recognition model for disease mention detection coupled with a hybrid method for FM mention detection, achieved an effectiveness that was close to that of the top 3 systems participating in the 2019 N2C2 FH extraction challenge, with only the top system convincingly outperforming our approach in terms of precision. ", doi="10.2196/24020", url="https://medinform.jmir.org/2021/4/e24020", url="http://www.ncbi.nlm.nih.gov/pubmed/33664015" } @Article{info:doi/10.2196/25075, author="O'Keefe, B. James and Tong, J. Elizabeth and Taylor Jr, H. Thomas and O'Keefe, Datoo Ghazala A. and Tong, C. 
David", title="Use of a Telemedicine Risk Assessment Tool to Predict the Risk of Hospitalization of 496 Outpatients With COVID-19: Retrospective Analysis", journal="JMIR Public Health Surveill", year="2021", month="Apr", day="30", volume="7", number="4", pages="e25075", keywords="COVID-19", keywords="SARS-CoV-2", keywords="nonhospitalized", keywords="risk assessment", keywords="outpatient", keywords="outcomes", keywords="telemedicine", abstract="Background: Risk assessment of patients with acute COVID-19 in a telemedicine context is not well described. In settings of large numbers of patients, a risk assessment tool may guide resource allocation not only for patient care but also for maximum health care and public health benefit. Objective: The goal of this study was to determine whether a COVID-19 telemedicine risk assessment tool accurately predicts hospitalizations. Methods: We conducted a retrospective study of a COVID-19 telemedicine home monitoring program serving health care workers and the community in Atlanta, Georgia, with enrollment from March 24 to May 26, 2020; the final call range was from March 27 to June 19, 2020. All patients were assessed by medical providers using an institutional COVID-19 risk assessment tool designating patients as Tier 1 (low risk for hospitalization), Tier 2 (intermediate risk for hospitalization), or Tier 3 (high risk for hospitalization). Patients were followed with regular telephone calls to an endpoint of improvement or hospitalization. Using survival analysis by Cox regression with days to hospitalization as the metric, we analyzed the performance of the risk tiers and explored individual patient factors associated with risk of hospitalization. Results: Providers using the risk assessment rubric assigned 496 outpatients to tiers: Tier 1, 237 out of 496 (47.8\%); Tier 2, 185 out of 496 (37.3\%); and Tier 3, 74 out of 496 (14.9\%). Subsequent hospitalizations numbered 3 out of 237 (1.3\%) for Tier 1, 15 out of 185 (8.1\%) for Tier 2, and 17 out of 74 (23\%) for Tier 3. From a Cox regression model with age of 60 years or older, gender, and reported obesity as covariates, the adjusted hazard ratios for hospitalization using Tier 1 as reference were 3.74 (95\% CI 1.06-13.27; P=.04) for Tier 2 and 10.87 (95\% CI 3.09-38.27; P<.001) for Tier 3. Conclusions: A telemedicine risk assessment tool prospectively applied to an outpatient population with COVID-19 identified populations with low, intermediate, and high risk of hospitalization. ", doi="10.2196/25075", url="https://publichealth.jmir.org/2021/4/e25075", url="http://www.ncbi.nlm.nih.gov/pubmed/33667174" } @Article{info:doi/10.2196/21394, author="Poly, Nasrin Tahmina and Islam, Mohaimenul Md and Li, Jack Yu-Chuan and Alsinglawi, Belal and Hsu, Min-Huei and Jian, Shan Wen and Yang, Hsuan-Chia", title="Application of Artificial Intelligence for Screening COVID-19 Patients Using Digital Images: Meta-analysis", journal="JMIR Med Inform", year="2021", month="Apr", day="29", volume="9", number="4", pages="e21394", keywords="COVID-19", keywords="SARS-CoV-2", keywords="pneumonia", keywords="artificial intelligence", keywords="deep learning", abstract="Background: The COVID-19 outbreak has spread rapidly and hospitals are overwhelmed with COVID-19 patients. While analysis of nasal and throat swabs from patients is the main way to detect COVID-19, analyzing chest images could offer an alternative method to hospitals, where health care personnel and testing kits are scarce. 
Deep learning (DL), in particular, has shown impressive levels of performance when analyzing medical images, including those related to COVID-19 pneumonia. Objective: The goal of this study was to perform a systematic review with a meta-analysis of relevant studies to quantify the performance of DL algorithms in the automatic stratification of COVID-19 patients using chest images. Methods: A search strategy for use in PubMed, Scopus, Google Scholar, and Web of Science was developed, where we searched for articles published between January 1 and April 25, 2020. We used the key terms ``COVID-19,'' or ``coronavirus,'' or ``SARS-CoV-2,'' or ``novel corona,'' or ``2019-ncov,'' and ``deep learning,'' or ``artificial intelligence,'' or ``automatic detection.'' Two authors independently extracted data on study characteristics, methods, risk of bias, and outcomes. Any disagreement between them was resolved by consensus. Results: A total of 16 studies were included in the meta-analysis, which included 5896 chest images from COVID-19 patients. The pooled sensitivity and specificity of the DL models in detecting COVID-19 were 0.95 (95\% CI 0.94-0.95) and 0.96 (95\% CI 0.96-0.97), respectively, with an area under the receiver operating characteristic curve of 0.98. The positive likelihood, negative likelihood, and diagnostic odds ratio were 19.02 (95\% CI 12.83-28.19), 0.06 (95\% CI 0.04-0.10), and 368.07 (95\% CI 162.30-834.75), respectively. The pooled sensitivity and specificity for distinguishing other types of pneumonia from COVID-19 were 0.93 (95\% CI 0.92-0.94) and 0.95 (95\% CI 0.94-0.95), respectively. The performance of radiologists in detecting COVID-19 was lower than that of the DL models; however, the performance of junior radiologists was improved when they used DL-based prediction tools. Conclusions: Our study findings show that DL models have immense potential in accurately stratifying COVID-19 patients and in correctly differentiating them from patients with other types of pneumonia and normal patients. Implementation of DL-based tools can assist radiologists in correctly and quickly detecting COVID-19 and, consequently, in combating the COVID-19 pandemic. ", doi="10.2196/21394", url="https://medinform.jmir.org/2021/4/e21394", url="http://www.ncbi.nlm.nih.gov/pubmed/33764884" } @Article{info:doi/10.2196/26075, author="Patr{\'i}cio, Andr{\'e} and Costa, S. Rafael and Henriques, Rui", title="Predictability of COVID-19 Hospitalizations, Intensive Care Unit Admissions, and Respiratory Assistance in Portugal: Longitudinal Cohort Study", journal="J Med Internet Res", year="2021", month="Apr", day="28", volume="23", number="4", pages="e26075", keywords="COVID-19", keywords="machine learning", keywords="intensive care admissions", keywords="respiratory assistance", keywords="predictive models", keywords="data modeling", keywords="clinical informatics", abstract="Background: In the face of the current COVID-19 pandemic, the timely prediction of upcoming medical needs for infected individuals enables better and quicker care provision when necessary and management decisions within health care systems. Objective: This work aims to predict the medical needs (hospitalizations, intensive care unit admissions, and respiratory assistance) and survivability of individuals testing positive for SARS-CoV-2 infection in Portugal. Methods: A retrospective cohort of 38,545 infected individuals during 2020 was used. 
Predictions of medical needs were performed using state-of-the-art machine learning approaches at various stages of a patient's cycle, namely, at testing (prehospitalization), at posthospitalization, and during postintensive care. A thorough optimization of state-of-the-art predictors was undertaken to assess the ability to anticipate medical needs and infection outcomes using demographic and comorbidity variables, as well as dates associated with symptom onset, testing, and hospitalization. Results: For the target cohort, 75\% of hospitalization needs could be identified at the time of testing for SARS-CoV-2 infection. Over 60\% of respiratory needs could be identified at the time of hospitalization. Both predictions had >50\% precision. Conclusions: The conducted study pinpoints the relevance of the proposed predictive models as good candidates to support medical decisions in the Portuguese population, including both monitoring and in-hospital care decisions. A clinical decision support system is further provided to this end. ", doi="10.2196/26075", url="https://www.jmir.org/2021/4/e26075", url="http://www.ncbi.nlm.nih.gov/pubmed/33835931" } @Article{info:doi/10.2196/25493, author="Brandberg, Helge and Sundberg, Johan Carl and Spaak, Jonas and Koch, Sabine and Zakim, David and Kahan, Thomas", title="Use of Self-Reported Computerized Medical History Taking for Acute Chest Pain in the Emergency Department -- the Clinical Expert Operating System Chest Pain Danderyd Study (CLEOS-CPDS): Prospective Cohort Study", journal="J Med Internet Res", year="2021", month="Apr", day="27", volume="23", number="4", pages="e25493", keywords="chest pain", keywords="computerized history taking", keywords="coronary artery disease", keywords="eHealth", keywords="emergency department", keywords="health informatics", keywords="medical history", keywords="risk management", abstract="Background: Chest pain is one of the most common chief complaints in emergency departments (EDs). Collecting an adequate medical history is challenging but essential in order to use recommended risk scores such as the HEART score (based on history, electrocardiogram, age, risk factors, and troponin). Self-reported computerized history taking (CHT) is a novel method to collect structured medical history data directly from the patient through a digital device. CHT is rarely used in clinical practice, and there is a lack of evidence for utility in an acute setting. Objective: This substudy of the Clinical Expert Operating System Chest Pain Danderyd Study (CLEOS-CPDS) aimed to evaluate whether patients with acute chest pain can interact effectively with CHT in the ED. Methods: Prospective cohort study on self-reported medical histories collected from acute chest pain patients using a CHT program on a tablet. Clinically stable patients aged 18 years and older with a chief complaint of chest pain, fluency in Swedish, and a nondiagnostic electrocardiogram or serum markers for acute coronary syndrome were eligible for inclusion. Patients unable to carry out an interview with CHT (eg, inadequate eyesight, confusion or agitation) were excluded. Effectiveness was assessed as the proportion of patients completing the interview and the time required in order to collect a medical history sufficient for cardiovascular risk stratification according to HEART score. Results: During 2017-2018, 500 participants were consecutively enrolled. 
The age and sex distribution (mean 54.3, SD 17.0 years; 213/500, 42.6\% women) was similar to that of the general chest pain population (mean 57.5, SD 19.2 years; 49.6\% women). Common reasons for noninclusion were language issues (182/1000, 18.2\%), fatigue (158/1000, 15.8\%), and inability to use a tablet (152/1000, 15.2\%). Sufficient data to calculate HEART score were collected in 70.4\% (352/500) of the patients. Key modules for chief complaint, cardiovascular history, and respiratory history were completed by 408 (81.6\%), 339 (67.8\%), and 291 (58.2\%) of the 500 participants, respectively, while 148 (29.6\%) completed the entire interview (in all 14 modules). Factors associated with completeness were age 18-69 years (all key modules: Ps<.001), male sex (cardiovascular: P=.04), active workers (all key modules: Ps<.005), not arriving by ambulance (chief complaint: P=.03; cardiovascular: P=.045), and ongoing chest pain (complete interview: P=.002). The median time to collect HEART score data was 23 (IQR 18-31) minutes and to complete an interview was 64 (IQR 53-77) minutes. The main reasons for discontinuing the interview prior to completion (n=352) were discharge from the ED (101, 28.7\%) and tiredness (95, 27.0\%). Conclusions: A majority of patients with acute chest pain can interact effectively with CHT on a tablet in the ED to provide sufficient data for risk stratification with a well-established risk score. The utility was somewhat lower in patients 70 years and older, in patients arriving by ambulance, and in patients without ongoing chest pain. Further studies are warranted to assess whether CHT can contribute to improved management and prognosis in this large patient group. Trial Registration: ClinicalTrials.gov NCT03439449; https://clinicaltrials.gov/ct2/show/NCT03439449 International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2019-031871 ", doi="10.2196/25493", url="https://www.jmir.org/2021/4/e25493", url="http://www.ncbi.nlm.nih.gov/pubmed/33904821" } @Article{info:doi/10.2196/27468, author="Ghaderzadeh, Mustafa and Asadi, Farkhondeh and Jafari, Ramezan and Bashash, Davood and Abolghasemi, Hassan and Aria, Mehrad", title="Deep Convolutional Neural Network--Based Computer-Aided Detection System for COVID-19 Using Multiple Lung Scans: Design and Implementation Study", journal="J Med Internet Res", year="2021", month="Apr", day="26", volume="23", number="4", pages="e27468", keywords="artificial intelligence", keywords="classification", keywords="computer-aided detection", keywords="computed tomography scan", keywords="convolutional neural network", keywords="coronavirus", keywords="COVID-19", keywords="deep learning", keywords="machine learning", keywords="machine vision", keywords="model", keywords="pandemic", abstract="Background: Owing to the COVID-19 pandemic and the imminent collapse of health care systems following the exhaustion of financial, hospital, and medicinal resources, the World Health Organization changed the alert level of the COVID-19 pandemic from high to very high. Meanwhile, more cost-effective and precise COVID-19 detection methods are being preferred worldwide. Objective: Machine vision--based COVID-19 detection methods, especially deep learning as a diagnostic method in the early stages of the pandemic, have been assigned great importance during the pandemic. 
This study aimed to design a highly efficient computer-aided detection (CAD) system for COVID-19 by using a neural search architecture network (NASNet)--based algorithm. Methods: NASNet, a state-of-the-art pretrained convolutional neural network for image feature extraction, was adopted to identify patients with COVID-19 in their early stages of the disease. A local data set, comprising 10,153 computed tomography scans of 190 patients with and 59 without COVID-19 was used. Results: After fitting on the training data set, hyperparameter tuning, and topological alterations of the classifier block, the proposed NASNet-based model was evaluated on the test data set and yielded remarkable results. The proposed model's performance achieved a detection sensitivity, specificity, and accuracy of 0.999, 0.986, and 0.996, respectively. Conclusions: The proposed model achieved acceptable results in the categorization of 2 data classes. Therefore, a CAD system was designed on the basis of this model for COVID-19 detection using multiple lung computed tomography scans. The system differentiated all COVID-19 cases from non--COVID-19 ones without any error in the application phase. Overall, the proposed deep learning--based CAD system can greatly help radiologists detect COVID-19 in its early stages. During the COVID-19 pandemic, the use of a CAD system as a screening tool would accelerate disease detection and prevent the loss of health care resources. ", doi="10.2196/27468", url="https://www.jmir.org/2021/4/e27468", url="http://www.ncbi.nlm.nih.gov/pubmed/33848973" } @Article{info:doi/10.2196/21459, author="Her, Qoua and Kent, Thomas and Samizo, Yuji and Slavkovic, Aleksandra and Vilk, Yury and Toh, Sengwee", title="Automatable Distributed Regression Analysis of Vertically Partitioned Data Facilitated by PopMedNet: Feasibility and Enhancement Study", journal="JMIR Med Inform", year="2021", month="Apr", day="23", volume="9", number="4", pages="e21459", keywords="distributed regression analysis", keywords="distributed data networks", keywords="privacy-protecting analytics", keywords="vertically partitioned data", keywords="informatics", keywords="data networks", keywords="data", abstract="Background: In clinical research, important variables may be collected from multiple data sources. Physical pooling of patient-level data from multiple sources often raises several challenges, including proper protection of patient privacy and proprietary interests. We previously developed an SAS-based package to perform distributed regression---a suite of privacy-protecting methods that perform multivariable-adjusted regression analysis using only summary-level information---with horizontally partitioned data, a setting where distinct cohorts of patients are available from different data sources. We integrated the package with PopMedNet, an open-source file transfer software, to facilitate secure file transfer between the analysis center and the data-contributing sites. The feasibility of using PopMedNet to facilitate distributed regression analysis (DRA) with vertically partitioned data, a setting where the data attributes from a cohort of patients are available from different data sources, was unknown. Objective: The objective of the study was to describe the feasibility of using PopMedNet and enhancements to PopMedNet to facilitate automatable vertical DRA (vDRA) in real-world settings. 
Methods: We gathered the statistical and informatic requirements of using PopMedNet to facilitate automatable vDRA. We enhanced PopMedNet based on these requirements to improve its technical capability to support vDRA. Results: PopMedNet can enable automatable vDRA. We identified and implemented two enhancements to PopMedNet that improved its technical capability to perform automatable vDRA in real-world settings. The first was the ability to simultaneously upload and download multiple files, and the second was the ability to directly transfer summary-level information between the data-contributing sites without a third-party analysis center. Conclusions: PopMedNet can be used to facilitate automatable vDRA to protect patient privacy and support clinical research in real-world settings. ", doi="10.2196/21459", url="https://medinform.jmir.org/2021/4/e21459", url="http://www.ncbi.nlm.nih.gov/pubmed/33890866" } @Article{info:doi/10.2196/25759, author="Yin, Jiamin and Ngiam, Yuan Kee and Teo, Hai Hock", title="Role of Artificial Intelligence Applications in Real-Life Clinical Practice: Systematic Review", journal="J Med Internet Res", year="2021", month="Apr", day="22", volume="23", number="4", pages="e25759", keywords="artificial intelligence", keywords="machine learning", keywords="deep learning", keywords="system implementation", keywords="clinical practice", keywords="review", abstract="Background: Artificial intelligence (AI) applications are growing at an unprecedented pace in health care, including disease diagnosis, triage or screening, risk analysis, surgical operations, and so forth. Despite a great deal of research in the development and validation of health care AI, only few applications have been actually implemented at the frontlines of clinical practice. Objective: The objective of this study was to systematically review AI applications that have been implemented in real-life clinical practice. Methods: We conducted a literature search in PubMed, Embase, Cochrane Central, and CINAHL to identify relevant articles published between January 2010 and May 2020. We also hand searched premier computer science journals and conferences as well as registered clinical trials. Studies were included if they reported AI applications that had been implemented in real-world clinical settings. Results: We identified 51 relevant studies that reported the implementation and evaluation of AI applications in clinical practice, of which 13 adopted a randomized controlled trial design and eight adopted an experimental design. The AI applications targeted various clinical tasks, such as screening or triage (n=16), disease diagnosis (n=16), risk analysis (n=14), and treatment (n=7). The most commonly addressed diseases and conditions were sepsis (n=6), breast cancer (n=5), diabetic retinopathy (n=4), and polyp and adenoma (n=4). Regarding the evaluation outcomes, we found that 26 studies examined the performance of AI applications in clinical settings, 33 studies examined the effect of AI applications on clinician outcomes, 14 studies examined the effect on patient outcomes, and one study examined the economic impact associated with AI implementation. Conclusions: This review indicates that research on the clinical implementation of AI applications is still at an early stage despite the great potential. More research needs to assess the benefits and challenges associated with clinical AI applications through a more rigorous methodology. 
", doi="10.2196/25759", url="https://www.jmir.org/2021/4/e25759", url="http://www.ncbi.nlm.nih.gov/pubmed/33885365" } @Article{info:doi/10.2196/22797, author="Kim, Youngjun and Heider, M. Paul and Lally, RH Isabel and Meystre, M. St{\'e}phane", title="A Hybrid Model for Family History Information Identification and Relation Extraction: Development and Evaluation of an End-to-End Information Extraction System", journal="JMIR Med Inform", year="2021", month="Apr", day="22", volume="9", number="4", pages="e22797", keywords="natural language processing", keywords="machine learning", keywords="deep learning", keywords="named entity recognition", keywords="clinical entity identification", keywords="relation extraction", abstract="Background: Family history information is important to assess the risk of inherited medical conditions. Natural language processing has the potential to extract this information from unstructured free-text notes to improve patient care and decision making. We describe the end-to-end information extraction system the Medical University of South Carolina team developed when participating in the 2019 National Natural Language Processing Clinical Challenge (n2c2)/Open Health Natural Language Processing (OHNLP) shared task. Objective: This task involves identifying mentions of family members and observations in electronic health record text notes and recognizing the 2 types of relations (family member-living status relations and family member-observation relations). Our system aims to achieve a high level of performance by integrating heuristics and advanced information extraction methods. Our efforts also include improving the performance of 2 subtasks by exploiting additional labeled data and clinical text-based embedding models. Methods: We present a hybrid method that combines machine learning and rule-based approaches. We implemented an end-to-end system with multiple information extraction and attribute classification components. For entity identification, we trained bidirectional long short-term memory deep learning models. These models incorporated static word embeddings and context-dependent embeddings. We created a voting ensemble that combined the predictions of all individual models. For relation extraction, we trained 2 relation extraction models. The first model determined the living status of each family member. The second model identified observations associated with each family member. We implemented online gradient descent models to extract related entity pairs. As part of postchallenge efforts, we used the BioCreative/OHNLP 2018 corpus and trained new models with the union of these 2 datasets. We also pretrained language models using clinical notes from the Medical Information Mart for Intensive Care (MIMIC-III) clinical database. Results: The voting ensemble achieved better performance than individual classifiers. In the entity identification task, our top-performing system reached a precision of 78.90\% and a recall of 83.84\%. Our natural language processing system for entity identification took 3rd place out of 17 teams in the challenge. We ranked 4th out of 9 teams in the relation extraction task. Our system substantially benefited from the combination of the 2 datasets. Compared to our official submission with F1 scores of 81.30\% and 64.94\% for entity identification and relation extraction, respectively, the revised system yielded significantly better performance (P<.05) with F1 scores of 86.02\% and 72.48\%, respectively. 
Conclusions: We demonstrated that a hybrid model could be used to successfully extract family history information recorded in unstructured free-text notes. In this study, our approach to entity identification as a sequence labeling problem produced satisfactory results. Our postchallenge efforts significantly improved performance by leveraging additional labeled data and using word vector representations learned from large collections of clinical notes. ", doi="10.2196/22797", url="https://medinform.jmir.org/2021/4/e22797", url="http://www.ncbi.nlm.nih.gov/pubmed/33885370" } @Article{info:doi/10.2196/25066, author="Cummings, C. Brandon and Ansari, Sardar and Motyka, R. Jonathan and Wang, Guan and Medlin Jr, P. Richard and Kronick, L. Steven and Singh, Karandeep and Park, K. Pauline and Napolitano, M. Lena and Dickson, P. Robert and Mathis, R. Michael and Sjoding, W. Michael and Admon, J. Andrew and Blank, Ross and McSparron, I. Jakob and Ward, R. Kevin and Gillies, E. Christopher", title="Predicting Intensive Care Transfers and Other Unforeseen Events: Analytic Model Validation Study and Comparison to Existing Methods", journal="JMIR Med Inform", year="2021", month="Apr", day="21", volume="9", number="4", pages="e25066", keywords="COVID-19", keywords="biomedical informatics", keywords="critical care", keywords="machine learning", keywords="deterioration", keywords="predictive analytics", keywords="informatics", keywords="prediction", keywords="intensive care unit", keywords="ICU", keywords="mortality", abstract="Background: COVID-19 has led to an unprecedented strain on health care facilities across the United States. Accurately identifying patients at an increased risk of deterioration may help hospitals manage their resources while improving the quality of patient care. Here, we present the results of an analytical model, Predicting Intensive Care Transfers and Other Unforeseen Events (PICTURE), to identify patients at high risk for imminent intensive care unit transfer, respiratory failure, or death, with the intention to improve the prediction of deterioration due to COVID-19. Objective: This study aims to validate the PICTURE model's ability to predict unexpected deterioration in general ward and COVID-19 patients, and to compare its performance with the Epic Deterioration Index (EDI), an existing model that has recently been assessed for use in patients with COVID-19. Methods: The PICTURE model was trained and validated on a cohort of hospitalized non--COVID-19 patients using electronic health record data from 2014 to 2018. It was then applied to two holdout test sets: non--COVID-19 patients from 2019 and patients testing positive for COVID-19 in 2020. PICTURE results were aligned to EDI and NEWS scores for head-to-head comparison via area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve. We compared the models' ability to predict an adverse event (defined as intensive care unit transfer, mechanical ventilation use, or death). Shapley values were used to provide explanations for PICTURE predictions. Results: In non--COVID-19 general ward patients, PICTURE achieved an AUROC of 0.819 (95\% CI 0.805-0.834) per observation, compared to the EDI's AUROC of 0.763 (95\% CI 0.746-0.781; n=21,740; P<.001). In patients testing positive for COVID-19, PICTURE again outperformed the EDI with an AUROC of 0.849 (95\% CI 0.820-0.878) compared to the EDI's AUROC of 0.803 (95\% CI 0.772-0.838; n=607; P<.001). 
The most important variables influencing PICTURE predictions in the COVID-19 cohort were a rapid respiratory rate, a high level of oxygen support, low oxygen saturation, and impaired mental status (Glasgow Coma Scale). Conclusions: The PICTURE model is more accurate in predicting adverse patient outcomes for both general ward patients and COVID-19 positive patients in our cohorts compared to the EDI. The ability to consistently anticipate these events may be especially valuable when considering potential incipient waves of COVID-19 infections. The generalizability of the model will require testing in other health care systems for validation. ", doi="10.2196/25066", url="https://medinform.jmir.org/2021/4/e25066", url="http://www.ncbi.nlm.nih.gov/pubmed/33818393" } @Article{info:doi/10.2196/23587, author="Zhan, Kecheng and Peng, Weihua and Xiong, Ying and Fu, Huhao and Chen, Qingcai and Wang, Xiaolong and Tang, Buzhou", title="Novel Graph-Based Model With Biaffine Attention for Family History Extraction From Clinical Text: Modeling Study", journal="JMIR Med Inform", year="2021", month="Apr", day="21", volume="9", number="4", pages="e23587", keywords="family history information", keywords="named entity recognition", keywords="relation extraction", keywords="deep biaffine attention", abstract="Background: Family history information, including information on family members, side of the family of family members, living status of family members, and observations of family members, plays an important role in disease diagnosis and treatment. Family member information extraction aims to extract family history information from semistructured/unstructured text in electronic health records (EHRs), which is a challenging task regarding named entity recognition (NER) and relation extraction (RE), where named entities refer to family members, living status, and observations, and relations refer to relations between family members and living status, and relations between family members and observations. Objective: This study aimed to introduce the system we developed for the 2019 n2c2/OHNLP track on family history extraction, which can jointly extract entities and relations about family history information from clinical text. Methods: We proposed a novel graph-based model with biaffine attention for family history extraction from clinical text. In this model, we first designed a graph to represent family history information, that is, representing NER and RE regarding family history in a unified way, and then introduced a biaffine attention mechanism to extract family history information in clinical text. Convolution neural network (CNN)-Bidirectional Long Short Term Memory network (BiLSTM) and Bidirectional Encoder Representation from Transformers (BERT) were used to encode the input sentence, and a biaffine classifier was used to extract family history information. In addition, we developed a postprocessing module to adjust the results. A system based on the proposed method was developed for the 2019 n2c2/OHNLP shared task track on family history information extraction. Results: Our system ranked first in the challenge, and the F1 scores of the best system on the NER subtask and RE subtask were 0.8745 and 0.6810, respectively. After the challenge, we further fine tuned the parameters and improved the F1 scores of the two subtasks to 0.8823 and 0.7048, respectively. 
Conclusions: The experimental results showed that the system based on the proposed method can extract family history information from clinical text effectively. ", doi="10.2196/23587", url="https://medinform.jmir.org/2021/4/e23587", url="http://www.ncbi.nlm.nih.gov/pubmed/33881405" } @Article{info:doi/10.2196/27060, author="Chung, Heewon and Ko, Hoon and Kang, Seong Wu and Kim, Won Kyung and Lee, Hooseok and Park, Chul and Song, Hyun-Ok and Choi, Tae-Young and Seo, Ho Jae and Lee, Jinseok", title="Prediction and Feature Importance Analysis for Severity of COVID-19 in South Korea Using Artificial Intelligence: Model Development and Validation", journal="J Med Internet Res", year="2021", month="Apr", day="19", volume="23", number="4", pages="e27060", keywords="COVID-19", keywords="artificial intelligence", keywords="blood samples", keywords="mortality prediction", abstract="Background: The number of deaths from COVID-19 continues to surge worldwide. In particular, if a patient's condition is sufficiently severe to require invasive ventilation, it is more likely to lead to death than to recovery. Objective: The goal of our study was to analyze the factors related to COVID-19 severity in patients and to develop an artificial intelligence (AI) model to predict the severity of COVID-19 at an early stage. Methods: We developed an AI model that predicts severity based on data from 5601 COVID-19 patients from all national and regional hospitals across South Korea as of April 2020. The clinical severity of COVID-19 was divided into two categories: low and high severity. The condition of patients in the low-severity group corresponded to no limit of activity, oxygen support with nasal prong or facial mask, and noninvasive ventilation. The condition of patients in the high-severity group corresponded to invasive ventilation, multi-organ failure with extracorporeal membrane oxygenation required, and death. For the AI model input, we used 37 variables from the medical records, including basic patient information, a physical index, initial examination findings, clinical findings, comorbid diseases, and general blood test results at an early stage. Feature importance analysis was performed with AdaBoost, random forest, and eXtreme Gradient Boosting (XGBoost); the AI model for predicting COVID-19 severity among patients was developed with a 5-layer deep neural network (DNN) with the 20 most important features, which were selected based on ranked feature importance analysis of 37 features from the comprehensive data set. The selection procedure was performed using sensitivity, specificity, accuracy, balanced accuracy, and area under the curve (AUC). Results: We found that age was the most important factor for predicting disease severity, followed by lymphocyte level, platelet count, and shortness of breath or dyspnea. Our proposed 5-layer DNN with the 20 most important features provided high sensitivity (90.2\%), specificity (90.4\%), accuracy (90.4\%), balanced accuracy (90.3\%), and AUC (0.96). Conclusions: Our proposed AI model with the selected features was able to predict the severity of COVID-19 accurately. We also made a web application so that anyone can access the model. We believe that sharing the AI model with the public will be helpful in validating and improving its performance. 
", doi="10.2196/27060", url="https://www.jmir.org/2021/4/e27060", url="http://www.ncbi.nlm.nih.gov/pubmed/33764883" } @Article{info:doi/10.2196/24996, author="Lv, Haichen and Yang, Xiaolei and Wang, Bingyi and Wang, Shaobo and Du, Xiaoyan and Tan, Qian and Hao, Zhujing and Liu, Ying and Yan, Jun and Xia, Yunlong", title="Machine Learning--Driven Models to Predict Prognostic Outcomes in Patients Hospitalized With Heart Failure Using Electronic Health Records: Retrospective Study", journal="J Med Internet Res", year="2021", month="Apr", day="19", volume="23", number="4", pages="e24996", keywords="heart failure", keywords="machine learning", keywords="predictive modeling", keywords="mortality", keywords="positive inotropic agents", keywords="readmission", abstract="Background: With the prevalence of cardiovascular diseases increasing worldwide, early prediction and accurate assessment of heart failure (HF) risk are crucial to meet the clinical demand. Objective: Our study objective was to develop machine learning (ML) models based on real-world electronic health records to predict 1-year in-hospital mortality, use of positive inotropic agents, and 1-year all-cause readmission rate. Methods: For this single-center study, we recruited patients with newly diagnosed HF hospitalized between December 2010 and August 2018 at the First Affiliated Hospital of Dalian Medical University (Liaoning Province, China). The models were constructed for a population set (90:10 split of data set into training and test sets) using 79 variables during the first hospitalization. Logistic regression, support vector machine, artificial neural network, random forest, and extreme gradient boosting models were investigated for outcome predictions. Results: Of the 13,602 patients with HF enrolled in the study, 537 (3.95\%) died within 1 year and 2779 patients (20.43\%) had a history of use of positive inotropic agents. ML algorithms improved the performance of predictive models for 1-year in-hospital mortality (areas under the curve [AUCs] 0.92-1.00), use of positive inotropic medication (AUCs 0.85-0.96), and 1-year readmission rates (AUCs 0.63-0.96). A decision tree of mortality risk was created and stratified by single variables at levels of high-sensitivity cardiac troponin I (<0.068 $\mu$g/L), followed by percentage of lymphocytes (<14.688\%) and neutrophil count (4.870{\texttimes}109/L). Conclusions: ML techniques based on a large scale of clinical variables can improve outcome predictions for patients with HF. The mortality decision tree may contribute to guiding better clinical risk assessment and decision making. 
", doi="10.2196/24996", url="https://www.jmir.org/2021/4/e24996", url="http://www.ncbi.nlm.nih.gov/pubmed/33871375" } @Article{info:doi/10.2196/25852, author="Oh, Bumjo and Hwangbo, Suhyun and Jung, Taeyeong and Min, Kyungha and Lee, Chanhee and Apio, Catherine and Lee, Hyejin and Lee, Seungyeoun and Moon, Kyong Min and Kim, Shin-Woo and Park, Taesung", title="Prediction Models for the Clinical Severity of Patients With COVID-19 in Korea: Retrospective Multicenter Cohort Study", journal="J Med Internet Res", year="2021", month="Apr", day="16", volume="23", number="4", pages="e25852", keywords="clinical decision support system", keywords="clinical characteristics", keywords="COVID-19", keywords="SARS-CoV-2", keywords="prognostic tool", keywords="severity", abstract="Background: Limited information is available about the present characteristics and dynamic clinical changes that occur in patients with COVID-19 during the early phase of the illness. Objective: This study aimed to develop and validate machine learning models based on clinical features to assess the risk of severe disease and triage for COVID-19 patients upon hospital admission. Methods: This retrospective multicenter cohort study included patients with COVID-19 who were released from quarantine until April 30, 2020, in Korea. A total of 5628 patients were included in the training and testing cohorts to train and validate the models that predict clinical severity and the duration of hospitalization, and the clinical severity score was defined at four levels: mild, moderate, severe, and critical. Results: Out of a total of 5601 patients, 4455 (79.5\%), 330 (5.9\%), 512 (9.1\%), and 301 (5.4\%) were included in the mild, moderate, severe, and critical levels, respectively. As risk factors for predicting critical patients, we selected older age, shortness of breath, a high white blood cell count, low hemoglobin levels, a low lymphocyte count, and a low platelet count. We developed 3 prediction models to classify clinical severity levels. For example, the prediction model with 6 variables yielded a predictive power of >0.93 for the area under the receiver operating characteristic curve. We developed a web-based nomogram, using these models. Conclusions: Our prediction models, along with the web-based nomogram, are expected to be useful for the assessment of the onset of severe and critical illness among patients with COVID-19 and triage patients upon hospital admission. ", doi="10.2196/25852", url="https://www.jmir.org/2021/4/e25852", url="http://www.ncbi.nlm.nih.gov/pubmed/33822738" } @Article{info:doi/10.2196/22796, author="Tong, Yao and Messinger, I. Amanda and Wilcox, B. Adam and Mooney, D. Sean and Davidson, H. Giana and Suri, Pradeep and Luo, Gang", title="Forecasting Future Asthma Hospital Encounters of Patients With Asthma in an Academic Health Care System: Predictive Model Development and Secondary Analysis Study", journal="J Med Internet Res", year="2021", month="Apr", day="16", volume="23", number="4", pages="e22796", keywords="asthma", keywords="forecasting", keywords="machine learning", keywords="patient care management", keywords="risk factors", abstract="Background: Asthma affects a large proportion of the population and leads to many hospital encounters involving both hospitalizations and emergency department visits every year. 
To lower the number of such encounters, many health care systems and health plans deploy predictive models to prospectively identify patients at high risk and offer them care management services for preventive care. However, the previous models do not have sufficient accuracy for serving this purpose well. Embracing the modeling strategy of examining many candidate features, we built a new machine learning model to forecast future asthma hospital encounters of patients with asthma at Intermountain Healthcare, a nonacademic health care system. This model is more accurate than the previously published models. However, it is unclear how well our modeling strategy generalizes to academic health care systems, whose patient composition differs from that of Intermountain Healthcare. Objective: This study aims to evaluate the generalizability of our modeling strategy to the University of Washington Medicine (UWM), an academic health care system. Methods: All adult patients with asthma who visited UWM facilities between 2011 and 2018 served as the patient cohort. We considered 234 candidate features. Through a secondary analysis of 82,888 UWM data instances from 2011 to 2018, we built a machine learning model to forecast asthma hospital encounters of patients with asthma in the subsequent 12 months. Results: Our UWM model yielded an area under the receiver operating characteristic curve (AUC) of 0.902. When placing the cutoff point for making binary classification at the top 10\% (1464/14,644) of patients with asthma with the largest forecasted risk, our UWM model yielded an accuracy of 90.6\% (13,268/14,644), a sensitivity of 70.2\% (153/218), and a specificity of 90.91\% (13,115/14,426). Conclusions: Our modeling strategy showed excellent generalizability to the UWM, leading to a model with an AUC that is higher than all of the AUCs previously reported in the literature for forecasting asthma hospital encounters. After further optimization, our model could be used to facilitate the efficient and effective allocation of asthma care management resources to improve outcomes. International Registered Report Identifier (IRRID): RR2-10.2196/resprot.5039 ", doi="10.2196/22796", url="https://www.jmir.org/2021/4/e22796", url="http://www.ncbi.nlm.nih.gov/pubmed/33861206" } @Article{info:doi/10.2196/24120, author="Kim, Kipyo and Yang, Hyeonsik and Yi, Jinyeong and Son, Hyung-Eun and Ryu, Ji-Young and Kim, Chul Yong and Jeong, Cheol Jong and Chin, Jun Ho and Na, Young Ki and Chae, Dong-Wan and Han, Seok Seung and Kim, Sejoong", title="Real-Time Clinical Decision Support Based on Recurrent Neural Networks for In-Hospital Acute Kidney Injury: External Validation and Model Interpretation", journal="J Med Internet Res", year="2021", month="Apr", day="16", volume="23", number="4", pages="e24120", keywords="acute kidney injury", keywords="recurrent neural network", keywords="prediction model", keywords="external validation", keywords="internal validation", keywords="kidney", keywords="neural networks", abstract="Background: Acute kidney injury (AKI) is commonly encountered in clinical practice and is associated with poor patient outcomes and increased health care costs. Despite it posing significant challenges for clinicians, effective measures for AKI prediction and prevention are lacking. Previously published AKI prediction models mostly have a simple design without external validation. 
Furthermore, little is known about the process of linking model output and clinical decisions due to the black-box nature of neural network models. Objective: We aimed to present an externally validated recurrent neural network (RNN)--based continuous prediction model for in-hospital AKI and show applicable model interpretations in relation to clinical decision support. Methods: Study populations were all patients aged 18 years or older who were hospitalized for more than 48 hours between 2013 and 2017 in 2 tertiary hospitals in Korea (Seoul National University Bundang Hospital and Seoul National University Hospital). All demographic data, laboratory values, vital signs, and clinical conditions of patients were obtained from electronic health records of each hospital. We developed 2-stage hierarchical prediction models (model 1 and model 2) using RNN algorithms. The outcome variable for model 1 was the occurrence of AKI within 7 days from the present. Model 2 predicted the future trajectory of creatinine values up to 72 hours. The performance of each developed model was evaluated using the internal and external validation data sets. For the explainability of our models, different model-agnostic interpretation methods were used, including Shapley Additive Explanations, partial dependence plots, individual conditional expectation, and accumulated local effects plots. Results: We included 69,081 patients in the training, 7675 in the internal validation, and 72,352 in the external validation cohorts for model development after excluding cases with missing data and those with an estimated glomerular filtration rate less than 15 mL/min/1.73 m2 or end-stage kidney disease. Model 1 predicted any AKI development with an area under the receiver operating characteristic curve (AUC) of 0.88 (internal validation) and 0.84 (external validation), and stage 2 or higher AKI development with an AUC of 0.93 (internal validation) and 0.90 (external validation). Model 2 predicted the future creatinine values within 3 days with mean-squared errors of 0.04-0.09 for patients with higher risks of AKI and 0.03-0.08 for those with lower risks. Based on the developed models, we showed AKI probability according to feature values in total patients and each individual with partial dependence, accumulated local effects, and individual conditional expectation plots. We also estimated the effects of feature modifications such as nephrotoxic drug discontinuation on future creatinine levels. Conclusions: We developed and externally validated a continuous AKI prediction model using RNN algorithms. Our model could provide real-time assessment of future AKI occurrences and individualized risk factors for AKI in general inpatient cohorts; thus, we suggest approaches to support clinical decisions based on prediction models for in-hospital AKI. ", doi="10.2196/24120", url="https://www.jmir.org/2021/4/e24120", url="http://www.ncbi.nlm.nih.gov/pubmed/33861200" } @Article{info:doi/10.2196/25657, author="Tremoulet, D. Patrice and Shah, D. Priyanka and Acosta, A. Alisha and Grant, W. Christian and Kurtz, T. 
Jon and Mounas, Peter and Kirchhoff, Michael and Wade, Elizabeth", title="Usability of Electronic Health Record--Generated Discharge Summaries: Heuristic Evaluation", journal="J Med Internet Res", year="2021", month="Apr", day="15", volume="23", number="4", pages="e25657", keywords="discharge summary", keywords="usability", keywords="electronic health record (EHR)", keywords="care coordination", keywords="elderly patients", keywords="patient safety", keywords="heuristic evaluation", keywords="human factors", abstract="Background: Obtaining accurate clinical information about recent acute care visits is extremely important for outpatient providers. However, documents used to communicate this information are often difficult to use. This puts patients at risk of adverse events. Elderly patients who are seen by more providers and have more care transitions are especially vulnerable. Objective: This study aimed to (1) identify the information about elderly patients' recent acute care visits needed to coordinate their care, (2) use this information to assess discharge summaries, and (3) provide recommendations to help improve the quality of electronic health record (EHR)--generated discharge summaries, thereby increasing patient safety. Methods: A literature review, clinician interviews, and a survey of outpatient providers were used to identify and categorize data needed to coordinate care for recently discharged elderly patients. Based upon those data, 2 guidelines for creating useful discharge summaries were created. The new guidelines, along with 17 previously developed medical documentation usability heuristics, were applied to assess 4 simulated elderly patient discharge summaries. Results: The initial research effort yielded a list of 29 items that should always be included in elderly patient discharge summaries and a list of 7 ``helpful, but not always necessary'' items. Evaluation of 4 deidentified elderly patient discharge summaries revealed that none of the documents contained all 36 necessary items; between 14 and 18 were missing. The documents each had several other issues, and they differed significantly in organization, layout, and formatting. Conclusions: Variations in content and structure of discharge summaries in the United States make them unnecessarily difficult to use. Standardization would benefit both patients, by lowering the risk of care transition--related adverse events, and outpatient providers, by helping reduce frustration that can contribute to burnout. In the short term, acute care providers can help improve the quality of their discharge summaries by working with EHR vendors to follow recommendations based upon this study. Meanwhile, additional human factors work should determine the most effective way to organize and present information in discharge summaries, to facilitate effective standardization. 
", doi="10.2196/25657", url="https://www.jmir.org/2021/4/e25657", url="http://www.ncbi.nlm.nih.gov/pubmed/33856353" } @Article{info:doi/10.2196/18803, author="Liu, Xiaoli and Liu, Tongbo and Zhang, Zhengbo and Kuo, Po-Chih and Xu, Haoran and Yang, Zhicheng and Lan, Ke and Li, Peiyao and Ouyang, Zhenchao and Ng, Lam Yeuk and Yan, Wei and Li, Deyu", title="TOP-Net Prediction Model Using Bidirectional Long Short-term Memory and Medical-Grade Wearable Multisensor System for Tachycardia Onset: Algorithm Development Study", journal="JMIR Med Inform", year="2021", month="Apr", day="15", volume="9", number="4", pages="e18803", keywords="tachycardia onset", keywords="early prediction", keywords="deep neural network", keywords="wearable monitoring system", keywords="electronic health record", abstract="Background: Without timely diagnosis and treatment, tachycardia, also called tachyarrhythmia, can cause serious complications such as heart failure, cardiac arrest, and even death. The predictive performance of conventional clinical diagnostic procedures needs improvement in order to assist physicians in detecting risk early on. Objective: We aimed to develop a deep tachycardia onset prediction (TOP-Net) model based on deep learning (ie, bidirectional long short-term memory) for early tachycardia diagnosis with easily accessible data. Methods: TOP-Net leverages 2 easily accessible data sources: vital signs, including heart rate, respiratory rate, and blood oxygen saturation (SpO2) acquired continuously by wearable embedded systems, and electronic health records, containing age, gender, admission type, first care unit, and cardiovascular disease history. The model was trained with a large data set from an intensive care unit and then transferred to a real-world scenario in the general ward. In this study, 3 experiments incorporated merging patients' personal information, temporal memory, and different feature combinations. Six metrics (area under the receiver operating characteristic curve [AUROC], sensitivity, specificity, accuracy, F1 score, and precision) were used to evaluate predictive performance. Results: TOP-Net outperformed the baseline models on the large critical care data set (AUROC 0.796, 95\% CI 0.768-0.824; sensitivity 0.753, 95\% CI 0.663-0.793; specificity 0.720, 95\% CI 0.645-0.758; accuracy 0.721; F1 score 0.718; precision 0.686) when predicting tachycardia onset 6 hours in advance. When predicting tachycardia onset 2 hours in advance with data acquired from our hospital using the transferred TOP-Net, the 6 metrics were 0.965, 0.955, 0.881, 0.937, 0.793, and 0.680, respectively. The best performance was achieved using comprehensive vital signs (heart rate, respiratory rate, and SpO2) statistical information. Conclusions: TOP-Net is an early tachycardia prediction model that uses 8 types of data from wearable sensors and electronic health records. When validated in clinical scenarios, the model achieved a prediction performance that outperformed baseline models 0 to 6 hours before tachycardia onset in the intensive care unit and 2 hours before tachycardia onset in the general ward. Because of the model's implementation and use of easily accessible data from wearable sensors, the model can assist physicians with early discovery of patients at risk in general wards and houses. 
", doi="10.2196/18803", url="https://medinform.jmir.org/2021/4/e18803", url="http://www.ncbi.nlm.nih.gov/pubmed/33856350" } @Article{info:doi/10.2196/24153, author="Luo, Gang and Nau, L. Claudia and Crawford, W. William and Schatz, Michael and Zeiger, S. Robert and Koebnick, Corinna", title="Generalizability of an Automatic Explanation Method for Machine Learning Prediction Results on Asthma-Related Hospital Visits in Patients With Asthma: Quantitative Analysis", journal="J Med Internet Res", year="2021", month="Apr", day="15", volume="23", number="4", pages="e24153", keywords="asthma", keywords="forecasting", keywords="patient care management", keywords="machine learning", abstract="Background: Asthma exerts a substantial burden on patients and health care systems. To facilitate preventive care for asthma management and improve patient outcomes, we recently developed two machine learning models, one on Intermountain Healthcare data and the other on Kaiser Permanente Southern California (KPSC) data, to forecast asthma-related hospital visits, including emergency department visits and hospitalizations, in the succeeding 12 months among patients with asthma. As is typical for machine learning approaches, these two models do not explain their forecasting results. To address the interpretability issue of black-box models, we designed an automatic method to offer rule format explanations for the forecasting results of any machine learning model on imbalanced tabular data and to suggest customized interventions with no accuracy loss. Our method worked well for explaining the forecasting results of our Intermountain Healthcare model, but its generalizability to other health care systems remains unknown. Objective: The objective of this study is to evaluate the generalizability of our automatic explanation method to KPSC for forecasting asthma-related hospital visits. Methods: Through a secondary analysis of 987,506 data instances from 2012 to 2017 at KPSC, we used our method to explain the forecasting results of our KPSC model and to suggest customized interventions. The patient cohort covered a random sample of 70\% of patients with asthma who had a KPSC health plan for any period between 2015 and 2018. Results: Our method explained the forecasting results for 97.57\% (2204/2259) of the patients with asthma who were correctly forecasted to undergo asthma-related hospital visits in the succeeding 12 months. Conclusions: For forecasting asthma-related hospital visits, our automatic explanation method exhibited an acceptable generalizability to KPSC. 
International Registered Report Identifier (IRRID): RR2-10.2196/resprot.5039 ", doi="10.2196/24153", url="https://www.jmir.org/2021/4/e24153", url="http://www.ncbi.nlm.nih.gov/pubmed/33856359" } @Article{info:doi/10.2196/25053, author="Bang, Seok Chang and Ahn, Yong Ji and Kim, Jie-Hyun and Kim, Young-Il and Choi, Ju Il and Shin, Geon Woon", title="Establishing Machine Learning Models to Predict Curative Resection in Early Gastric Cancer with Undifferentiated Histology: Development and Usability Study", journal="J Med Internet Res", year="2021", month="Apr", day="15", volume="23", number="4", pages="e25053", keywords="early gastric cancer", keywords="artificial intelligence", keywords="machine learning", keywords="endoscopic submucosal dissection", keywords="undifferentiated", keywords="gastric cancer", keywords="endoscopy", keywords="dissection", abstract="Background: Undifferentiated type of early gastric cancer (U-EGC) is included among the expanded indications of endoscopic submucosal dissection (ESD); however, the rate of curative resection remains unsatisfactory. Endoscopists predict the probability of curative resection by considering the size and shape of the lesion and whether ulcers are present or not. The location of the lesion, indicating the likely technical difficulty, is also considered. Objective: The aim of this study was to establish machine learning (ML) models to better predict the possibility of curative resection in U-EGC prior to ESD. Methods: A nationwide cohort of 2703 U-EGCs treated by ESD or surgery were adopted for the training and internal validation cohorts. Separately, an independent data set of the Korean ESD registry (n=275) and an Asan medical center data set (n=127) treated by ESD were chosen for external validation. Eighteen ML classifiers were selected to establish prediction models of curative resection with the following variables: age; sex; location, size, and shape of the lesion; and whether ulcers were present or not. Results: Among the 18 models, the extreme gradient boosting classifier showed the best performance (internal validation accuracy 93.4\%, 95\% CI 90.4\%-96.4\%; precision 92.6\%, 95\% CI 89.5\%-95.7\%; recall 99.0\%, 95\% CI 97.8\%-99.9\%; and F1 score 95.7\%, 95\% CI 93.3\%-98.1\%). Attempts at external validation showed substantial accuracy (first external validation 81.5\%, 95\% CI 76.9\%-86.1\% and second external validation 89.8\%, 95\% CI 84.5\%-95.1\%). Lesion size was the most important feature in each explainable artificial intelligence analysis. Conclusions: We established an ML model capable of accurately predicting the curative resection of U-EGC before ESD by considering the morphological and ecological characteristics of the lesions. ", doi="10.2196/25053", url="https://www.jmir.org/2021/4/e25053", url="http://www.ncbi.nlm.nih.gov/pubmed/33856358" } @Article{info:doi/10.2196/26390, author="Masya, Lindy and Shepherd, L. Heather and Butow, Phyllis and Geerligs, Liesbeth and Allison, C. 
Karen and Dolan, Colette and Prest, Gabrielle and and Shaw, Joanne", title="Impact of Individual, Organizational, and Technological Factors on the Implementation of an Online Portal to Support a Clinical Pathway Addressing Psycho-Oncology Care: Mixed Methods Study", journal="JMIR Hum Factors", year="2021", month="Apr", day="14", volume="8", number="2", pages="e26390", keywords="decision support systems", keywords="clinical decision making", keywords="psycho-oncology", keywords="health informatics", keywords="clinical pathways", keywords="health services research", abstract="Background: Clinical pathways (CPs) can improve patient outcomes but can be complex to implement. Technologies, such as clinical decision support (CDS) tools, can facilitate their use, but require end-user testing in clinical settings. Objective: This study applied the Technology Acceptance Model to evaluate the individual, organizational, and technological contexts impacting application of a portal to facilitate a CP for anxiety and depression (the ADAPT Portal) in a metropolitan cancer service. The ADAPT Portal triggers patient screening on patient reported outcomes, alerts staff to high scores, recommends evidence-based management, and triggers review and rescreening at set intervals. Methods: Quantitative and qualitative data on portal activity, data accuracy, and health service staff perspectives were collected. Quantitative data were analyzed descriptively, and thematic analysis was applied to qualitative data. Results: Overall, 15 (100\% of those invited) health service staff agreed to be interviewed. During the pilot, 73 users (36 health service staff members and 37 patients) were registered on the ADAPT Portal. Of the 37 patients registered, 16 (43\%) completed screening at least once, with seven screening positive and triaged appropriately. In total, 34 support requests were lodged, resulting in 17 portal enhancements (technical issues). Health service staff considered the ADAPT Portal easy to use and useful; however, some deemed it unnecessary or burdensome (individual issues), particularly in a busy cancer service (organizational issues). Conclusions: User testing of a CDS to facilitate screening and assessment of anxiety and depression in cancer patients highlighted some technological issues in implementing the ADAPT CDS, resulting in 17 enhancements. Our results highlight the importance of obtaining health service staff feedback when piloting specialized CDS tools and addressing contextual factors when implementing them. ", doi="10.2196/26390", url="https://humanfactors.jmir.org/2021/2/e26390", url="http://www.ncbi.nlm.nih.gov/pubmed/33851926" } @Article{info:doi/10.2196/26211, author="Dom{\'i}nguez-Olmedo, L. Juan and Gragera-Mart{\'i}nez, {\'A}lvaro and Mata, Jacinto and Pach{\'o}n {\'A}lvarez, Victoria", title="Machine Learning Applied to Clinical Laboratory Data in Spain for COVID-19 Outcome Prediction: Model Development and Validation", journal="J Med Internet Res", year="2021", month="Apr", day="14", volume="23", number="4", pages="e26211", keywords="COVID-19", keywords="electronic health record", keywords="machine learning", keywords="mortality", keywords="prediction", abstract="Background: The COVID-19 pandemic is probably the greatest health catastrophe of the modern era. Spain's health care system has been exposed to uncontrollable numbers of patients over a short period, causing the system to collapse. 
Given that diagnosis is not immediate, and there is no effective treatment for COVID-19, other tools have had to be developed to identify patients at risk of severe disease complications and thus optimize material and human resources in health care. There are no tools to identify patients who have a worse prognosis than others. Objective: This study aimed to process a sample of electronic health records of patients with COVID-19 in order to develop a machine learning model to predict the severity of infection and mortality from among clinical laboratory parameters. Early patient classification can help optimize material and human resources, and analysis of the most important features of the model could provide more detailed insights into the disease. Methods: After an initial performance evaluation based on a comparison with several other well-known methods, the extreme gradient boosting algorithm was selected as the predictive method for this study. In addition, Shapley Additive Explanations was used to analyze the importance of the features of the resulting model. Results: After data preprocessing, 1823 confirmed patients with COVID-19 and 32 predictor features were selected. On bootstrap validation, the extreme gradient boosting classifier yielded a value of 0.97 (95\% CI 0.96-0.98) for the area under the receiver operating characteristic curve, 0.86 (95\% CI 0.80-0.91) for the area under the precision-recall curve, 0.94 (95\% CI 0.92-0.95) for accuracy, 0.77 (95\% CI 0.72-0.83) for the F-score, 0.93 (95\% CI 0.89-0.98) for sensitivity, and 0.91 (95\% CI 0.86-0.96) for specificity. The 4 most relevant features for model prediction were lactate dehydrogenase activity, C-reactive protein levels, neutrophil counts, and urea levels. Conclusions: Our predictive model yielded excellent results in differentiating patients who died of COVID-19, based primarily on laboratory parameter values. Analysis of the resulting model identified a set of features with the most significant impact on the prediction, thus relating them to a higher risk of mortality. ", doi="10.2196/26211", url="https://www.jmir.org/2021/4/e26211", url="http://www.ncbi.nlm.nih.gov/pubmed/33793407" } @Article{info:doi/10.2196/22397, author="Bittar, Andr{\'e} and Velupillai, Sumithra and Roberts, Angus and Dutta, Rina", title="Using General-purpose Sentiment Lexicons for Suicide Risk Assessment in Electronic Health Records: Corpus-Based Analysis", journal="JMIR Med Inform", year="2021", month="Apr", day="13", volume="9", number="4", pages="e22397", keywords="psychiatry", keywords="suicide", keywords="suicide, attempted", keywords="risk assessment", keywords="electronic health records", keywords="sentiment analysis", keywords="natural language processing", keywords="corpus linguistics", abstract="Background: Suicide is a serious public health issue, accounting for 1.4\% of all deaths worldwide. Current risk assessment tools are reported as performing little better than chance in predicting suicide. New methods for studying dynamic features in electronic health records (EHRs) are being increasingly explored. One avenue of research involves using sentiment analysis to examine clinicians' subjective judgments when reporting on patients.
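For the Domínguez-Olmedo et al abstract above (doi 10.2196/26211), the general pattern of an extreme gradient boosting classifier paired with Shapley Additive Explanations can be sketched as follows; the synthetic data frame and column names only echo the reported top features and are not the study's variables or code.

```python
# Sketch on synthetic data; column names are placeholders, not study variables.
import numpy as np
import pandas as pd
import shap
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 4)),
                 columns=["ldh", "crp", "neutrophils", "urea"])
y = ((X["ldh"] + 0.5 * X["crp"] + rng.normal(scale=0.5, size=500)) > 1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = XGBClassifier(n_estimators=200, max_depth=3)
model.fit(X_tr, y_tr)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)         # per-patient attributions
global_importance = np.abs(shap_values).mean(axis=0)
print(dict(zip(X.columns, global_importance.round(3))))
```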
Several recent studies have used general-purpose sentiment analysis tools to automatically identify negative and positive words within EHRs to test correlations between sentiment extracted from the texts and specific medical outcomes (eg, risk of suicide or in-hospital mortality). However, little attention has been paid to analyzing the specific words identified by general-purpose sentiment lexicons when applied to EHR corpora. Objective: This study aims to quantitatively and qualitatively evaluate the coverage of six general-purpose sentiment lexicons against a corpus of EHR texts to ascertain the extent to which such lexical resources are fit for use in suicide risk assessment. Methods: The data for this study were a corpus of 198,451 EHR texts made up of two subcorpora drawn from a 1:4 case-control study comparing clinical notes written over the period leading up to a suicide attempt (cases, n=2913) with those not preceding such an attempt (controls, n=14,727). We calculated word frequency distributions within each subcorpus to identify representative keywords for both the case and control subcorpora. We quantified the relative coverage of the 6 lexicons with respect to this list of representative keywords in terms of weighted precision, recall, and F score. Results: The six lexicons achieved reasonable precision (0.53-0.68) but very low recall (0.04-0.36). Many of the most representative keywords in the suicide-related (case) subcorpus were not identified by any of the lexicons. The sentiment-bearing status of these keywords for this use case is thus doubtful. Conclusions: Our findings indicate that these 6 sentiment lexicons are not optimal for use in suicide risk assessment. We propose a set of guidelines for the creation of more suitable lexical resources for distinguishing suicide-related from non--suicide-related EHR texts. ", doi="10.2196/22397", url="https://medinform.jmir.org/2021/4/e22397", url="http://www.ncbi.nlm.nih.gov/pubmed/33847595" } @Article{info:doi/10.2196/25884, author="Aktar, Sakifa and Ahamad, Martuza Md and Rashed-Al-Mahfuz, Md and Azad, AKM and Uddin, Shahadat and Kamal, AHM and Alyami, A. Salem and Lin, Ping-I and Islam, Shariful Sheikh Mohammed and Quinn, MW Julian and Eapen, Valsamma and Moni, Ali Mohammad", title="Machine Learning Approach to Predicting COVID-19 Disease Severity Based on Clinical Blood Test Data: Statistical Analysis and Model Development", journal="JMIR Med Inform", year="2021", month="Apr", day="13", volume="9", number="4", pages="e25884", keywords="COVID-19", keywords="blood samples", keywords="machine learning", keywords="statistical analysis", keywords="prediction", keywords="severity", keywords="mortality", keywords="morbidity", keywords="risk", keywords="blood", keywords="testing", keywords="outcome", keywords="data set", abstract="Background: Accurate prediction of the disease severity of patients with COVID-19 would greatly improve care delivery and resource allocation and thereby reduce mortality risks, especially in less developed countries. Many patient-related factors, such as pre-existing comorbidities, affect disease severity and can be used to aid this prediction. Objective: Because rapid automated profiling of peripheral blood samples is widely available, we aimed to investigate how data from the peripheral blood of patients with COVID-19 can be used to predict clinical outcomes. 
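The coverage calculation in the sentiment lexicon abstract above (Bittar et al, doi 10.2196/22397) can be illustrated with a toy, unweighted version (the study reported frequency-weighted precision, recall, and F score); both word lists below are invented:

```python
# Toy, unweighted version of the coverage metrics; word lists are invented.
corpus_keywords = {"overdose", "hopeless", "ligature", "discharged", "supportive"}
lexicon_terms = {"hopeless", "supportive", "happy", "sad", "angry"}

matched = corpus_keywords & lexicon_terms
precision = len(matched) / len(lexicon_terms)   # lexicon terms that are keywords
recall = len(matched) / len(corpus_keywords)    # keywords covered by the lexicon
f_score = (2 * precision * recall / (precision + recall)
           if (precision + recall) else 0.0)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_score:.2f}")
```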
Methods: We investigated clinical data sets of patients with COVID-19 with known outcomes by combining statistical comparison and correlation methods with machine learning algorithms; the latter included decision tree, random forest, variants of gradient boosting machine, support vector machine, k-nearest neighbor, and deep learning methods. Results: Our work revealed that several clinical parameters that are measurable in blood samples are factors that can discriminate between healthy people and COVID-19--positive patients, and we showed the value of these parameters in predicting later severity of COVID-19 symptoms. We developed a number of analytical methods that showed accuracy and precision scores >90\% for disease severity prediction. Conclusions: We developed methodologies to analyze routine patient clinical data that enable more accurate prediction of COVID-19 patient outcomes. With this approach, data from standard hospital laboratory analyses of patient blood could be used to identify patients with COVID-19 who are at high risk of mortality, thus enabling optimization of hospital facilities for COVID-19 treatment. ", doi="10.2196/25884", url="https://medinform.jmir.org/2021/4/e25884", url="http://www.ncbi.nlm.nih.gov/pubmed/33779565" } @Article{info:doi/10.2196/16651, author="Austrian, Jonathan and Mendoza, Felicia and Szerencsy, Adam and Fenelon, Lucille and Horwitz, I. Leora and Jones, Simon and Kuznetsova, Masha and Mann, M. Devin", title="Applying A/B Testing to Clinical Decision Support: Rapid Randomized Controlled Trials", journal="J Med Internet Res", year="2021", month="Apr", day="9", volume="23", number="4", pages="e16651", keywords="AB testing", keywords="randomized controlled trials", keywords="clinical decision support", keywords="clinical informatics", keywords="usability", keywords="alert fatigue", abstract="Background: Clinical decision support (CDS) is a valuable feature of electronic health records (EHRs) designed to improve quality and safety. However, due to the complexities of system design and inconsistent results, CDS tools may inadvertently increase alert fatigue and contribute to physician burnout. A/B testing, or rapid-cycle randomized tests, is a useful method that can be applied to the EHR in order to rapidly understand and iteratively improve design choices embedded within CDS tools. Objective: This paper describes how rapid randomized controlled trials (RCTs) embedded within EHRs can be used to quickly ascertain the superiority of potential CDS design changes to improve their usability, reduce alert fatigue, and promote quality of care. Methods: A multistep process combining tools from user-centered design, A/B testing, and implementation science was used to understand, ideate, prototype, test, analyze, and improve each candidate CDS. CDS engagement metrics (alert views, acceptance rates) were used to evaluate which CDS version is superior. Results: To demonstrate the impact of the process, 2 experiments are highlighted. First, after multiple rounds of usability testing, a revised CDS influenza alert was tested against usual care CDS in a rapid ({\textasciitilde}6 weeks) RCT. The new alert text resulted in minimal impact on reducing firings per patient per day, but this failure triggered another round of review that identified key technical improvements (ie, removal of dismissal button and firings in procedural areas) that led to a dramatic decrease in firings per patient per day (23.1 to 7.3).
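For the blood-test severity abstract above (Aktar et al, doi 10.2196/25884), a generic sketch of benchmarking several of the listed classifier families on tabular data might look like this; it uses synthetic data and default settings, not the clinical data set or the study's tuning:

```python
# Generic benchmark loop over several classifier families on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, n_features=20, random_state=0)
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "support_vector_machine": SVC(random_state=0),
    "k_nearest_neighbors": KNeighborsClassifier(),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC={auc:.3f}")
```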
In the second experiment, the process was used to test 3 versions (financial, quality, regulatory) of text supporting tobacco cessation alerts as well as 3 supporting images. Based on 3 rounds of RCTs, there was no significant difference in acceptance rates based on the framing of the messages or addition of images. Conclusions: These experiments support the potential for this new process to rapidly develop, deploy, and rigorously evaluate CDS within an EHR. We also identified important considerations in applying these methods. This approach may be an important tool for improving the impact of and experience with CDS. Trial Registration: Flu alert trial: ClinicalTrials.gov NCT03415425; https://clinicaltrials.gov/ct2/show/NCT03415425. Tobacco alert trial: ClinicalTrials.gov NCT03714191; https://clinicaltrials.gov/ct2/show/NCT03714191 ", doi="10.2196/16651", url="https://www.jmir.org/2021/4/e16651", url="http://www.ncbi.nlm.nih.gov/pubmed/33835035" } @Article{info:doi/10.2196/23948, author="Chen, Yuanfang and Ouyang, Liu and Bao, S. Forrest and Li, Qian and Han, Lei and Zhang, Hengdong and Zhu, Baoli and Ge, Yaorong and Robinson, Patrick and Xu, Ming and Liu, Jie and Chen, Shi", title="A Multimodality Machine Learning Approach to Differentiate Severe and Nonsevere COVID-19: Model Development and Validation", journal="J Med Internet Res", year="2021", month="Apr", day="7", volume="23", number="4", pages="e23948", keywords="COVID-19", keywords="clinical type", keywords="multimodality", keywords="classification", keywords="machine learning", keywords="diagnosis", keywords="prediction", keywords="reliable", keywords="decision support", abstract="Background: Effectively and efficiently diagnosing patients who have COVID-19 with the accurate clinical type of the disease is essential to achieve optimal outcomes for the patients as well as to reduce the risk of overloading the health care system. Currently, severe and nonsevere COVID-19 types are differentiated by only a few features, which do not comprehensively characterize the complicated pathological, physiological, and immunological responses to SARS-CoV-2 infection in the different disease types. In addition, these type-defining features may not be readily testable at the time of diagnosis. Objective: In this study, we aimed to use a machine learning approach to understand COVID-19 more comprehensively, accurately differentiate severe and nonsevere COVID-19 clinical types based on multiple medical features, and provide reliable predictions of the clinical type of the disease. Methods: For this study, we recruited 214 confirmed patients with nonsevere COVID-19 and 148 patients with severe COVID-19. The clinical characteristics (26 features) and laboratory test results (26 features) upon admission were acquired as two input modalities. Exploratory analyses demonstrated that these features differed substantially between two clinical types. Machine learning random forest models based on all the features in each modality as well as on the top 5 features in each modality combined were developed and validated to differentiate COVID-19 clinical types. Results: Using clinical and laboratory results independently as input, the random forest models achieved >90\% and >95\% predictive accuracy, respectively. 
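Relating to the A/B testing abstract above (Austrian et al, doi 10.2196/16651): comparing acceptance rates between two CDS variants is, at its simplest, a two-proportion comparison. A toy sketch with invented counts, omitting the study's randomization and adjustment machinery:

```python
# Invented counts, not trial data: two-proportion z-test on alert acceptance.
from statsmodels.stats.proportion import proportions_ztest

accepted = [120, 148]     # acceptances for variant A and variant B
fired = [1000, 1005]      # alert firings for each variant
z_stat, p_value = proportions_ztest(count=accepted, nobs=fired)
print(f"rate A={accepted[0]/fired[0]:.3f}, rate B={accepted[1]/fired[1]:.3f}, "
      f"z={z_stat:.2f}, P={p_value:.3f}")
```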
The importance scores of the input features were further evaluated, and the top 5 features from each modality were identified (age, hypertension, cardiovascular disease, gender, and diabetes for the clinical features modality, and dimerized plasmin fragment D, high sensitivity troponin I, absolute neutrophil count, interleukin 6, and lactate dehydrogenase for the laboratory testing modality, in descending order). Using these top 10 multimodal features as the only input instead of all 52 features combined, the random forest model was able to achieve 97\% predictive accuracy. Conclusions: Our findings shed light on how the human body reacts to SARS-CoV-2 infection as a unit and provide insights on effectively evaluating the disease severity of patients with COVID-19 based on more common medical features when gold standard features are not available. We suggest that clinical information can be used as an initial screening tool for self-evaluation and triage, while laboratory test results should be applied when accuracy is the priority. ", doi="10.2196/23948", url="https://www.jmir.org/2021/4/e23948", url="http://www.ncbi.nlm.nih.gov/pubmed/33714935" } @Article{info:doi/10.2196/24754, author="Wang, Haishuai and Avillach, Paul", title="Diagnostic Classification and Prognostic Prediction Using Common Genetic Variants in Autism Spectrum Disorder: Genotype-Based Deep Learning", journal="JMIR Med Inform", year="2021", month="Apr", day="7", volume="9", number="4", pages="e24754", keywords="deep learning", keywords="autism spectrum disorder", keywords="common genetic variants, diagnostic classification", abstract="Background: In the United States, about 3 million people have autism spectrum disorder (ASD), and around 1 out of 59 children are diagnosed with ASD. People with ASD have characteristic social communication deficits and repetitive behaviors. The causes of this disorder remain unknown; however, in up to 25\% of cases, a genetic cause can be identified. Detecting ASD as early as possible is desirable because early detection of ASD enables timely interventions in children with ASD. Identification of ASD based on objective pathogenic mutation screening is the major first step toward early intervention and effective treatment of affected children. Objective: Recent investigation interrogated genomics data for detecting and treating autism disorders, in addition to the conventional clinical interview as a diagnostic test. Since deep neural networks perform better than shallow machine learning models on complex and high-dimensional data, in this study, we sought to apply deep learning to genetic data obtained across thousands of simplex families at risk for ASD to identify contributory mutations and to create an advanced diagnostic classifier for autism screening. Methods: After preprocessing the genomics data from the Simons Simplex Collection, we extracted top ranking common variants that may be protective or pathogenic for autism based on a chi-square test. A convolutional neural network--based diagnostic classifier was then designed using the identified significant common variants to predict autism. The performance was then compared with shallow machine learning--based classifiers and randomly selected common variants. Results: The selected contributory common variants were significantly enriched in chromosome X while chromosome Y was also discriminatory in determining the identification of autistic individuals from nonautistic individuals. 
The ARSD, MAGEB16, and MXRA5 genes had the largest effect in the contributory variants. Thus, screening algorithms were adapted to include these common variants. The deep learning model yielded an area under the receiver operating characteristic curve of 0.955 and an accuracy of 88\% for identifying autistic individuals from nonautistic individuals. Our classifier demonstrated a considerable improvement of {\textasciitilde}13\% in terms of classification accuracy compared to standard autism screening tools. Conclusions: Common variants are informative for autism identification. Our findings also suggest that the deep learning process is a reliable method for distinguishing the diseased group from the control group based on the common variants of autism. ", doi="10.2196/24754", url="https://medinform.jmir.org/2021/4/e24754", url="http://www.ncbi.nlm.nih.gov/pubmed/33714937" } @Article{info:doi/10.2196/21547, author="Reps, M. Jenna and Kim, Chungsoo and Williams, D. Ross and Markus, F. Aniek and Yang, Cynthia and Duarte-Salles, Talita and Falconer, Thomas and Jonnagaddala, Jitendra and Williams, Andrew and Fern{\'a}ndez-Bertol{\'i}n, Sergio and DuVall, L. Scott and Kostka, Kristin and Rao, Gowtham and Shoaibi, Azza and Ostropolets, Anna and Spotnitz, E. Matthew and Zhang, Lin and Casajust, Paula and Steyerberg, W. Ewout and Nyberg, Fredrik and Kaas-Hansen, Skov Benjamin and Choi, Hwa Young and Morales, Daniel and Liaw, Siaw-Teng and Abrah{\~a}o, Fernandes Maria Tereza and Areia, Carlos and Matheny, E. Michael and Lynch, E. Kristine and Arag{\'o}n, Mar{\'i}a and Park, Woong Rae and Hripcsak, George and Reich, G. Christian and Suchard, A. Marc and You, Chan Seng and Ryan, B. Patrick and Prieto-Alhambra, Daniel and Rijnbeek, R. Peter", title="Implementation of the COVID-19 Vulnerability Index Across an International Network of Health Care Data Sets: Collaborative External Validation Study", journal="JMIR Med Inform", year="2021", month="Apr", day="5", volume="9", number="4", pages="e21547", keywords="external validation", keywords="transportability", keywords="COVID-19", keywords="prognostic model", keywords="prediction", keywords="C-19", keywords="modeling", keywords="datasets", keywords="observation", keywords="hospitalization", keywords="bias", keywords="risk", keywords="decision-making", abstract="Background: SARS-CoV-2 is straining health care systems globally. The burden on hospitals during the pandemic could be reduced by implementing prediction models that can discriminate patients who require hospitalization from those who do not. The COVID-19 vulnerability (C-19) index, a model that predicts which patients will be admitted to hospital for treatment of pneumonia or pneumonia proxies, has been developed and proposed as a valuable tool for decision-making during the pandemic. However, the model is at high risk of bias according to the ``prediction model risk of bias assessment'' criteria, and it has not been externally validated. Objective: The aim of this study was to externally validate the C-19 index across a range of health care settings to determine how well it broadly predicts hospitalization due to pneumonia in COVID-19 cases. Methods: We followed the Observational Health Data Sciences and Informatics (OHDSI) framework for external validation to assess the reliability of the C-19 index. 
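The genotype-based pipeline in the Wang and Avillach abstract above (doi 10.2196/24754) ranked common variants with a chi-square test before classification; a simplified sketch of that selection step is shown below, substituting a logistic regression for their convolutional neural network and using random allele counts instead of the Simons Simplex Collection data.

```python
# Simplified stand-in: chi-square ranking of allele-count features followed by
# a logistic regression (the study used a convolutional neural network).
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(1000, 500)).astype(float)  # 0/1/2 allele counts
y = (X[:, :5].sum(axis=1) + rng.normal(size=1000) > 5).astype(int)

pipe = make_pipeline(SelectKBest(chi2, k=50),            # keep top variants
                     LogisticRegression(max_iter=1000))
print(cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean())
```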
We evaluated the model on two different target populations, 41,381 patients who presented with SARS-CoV-2 at an outpatient or emergency department visit and 9,429,285 patients who presented with influenza or related symptoms during an outpatient or emergency department visit, to predict their risk of hospitalization with pneumonia during the following 0-30 days. In total, we validated the model across a network of 14 databases spanning the United States, Europe, Australia, and Asia. Results: The internal validation performance of the C-19 index had a C statistic of 0.73, and the calibration was not reported by the authors. When we externally validated it by transporting it to SARS-CoV-2 data, the model obtained C statistics of 0.36, 0.53 (0.473-0.584) and 0.56 (0.488-0.636) on Spanish, US, and South Korean data sets, respectively. The calibration was poor, with the model underestimating risk. When validated on 12 data sets containing influenza patients across the OHDSI network, the C statistics ranged between 0.40 and 0.68. Conclusions: Our results show that the discriminative performance of the C-19 index model is low for influenza cohorts and even worse among patients with COVID-19 in the United States, Spain, and South Korea. These results suggest that C-19 should not be used to aid decision-making during the COVID-19 pandemic. Our findings highlight the importance of performing external validation across a range of settings, especially when a prediction model is being extrapolated to a different population. In the field of prediction, extensive validation is required to create appropriate trust in a model. ", doi="10.2196/21547", url="https://medinform.jmir.org/2021/4/e21547", url="http://www.ncbi.nlm.nih.gov/pubmed/33661754" } @Article{info:doi/10.2196/21109, author="Dimitrovski, Tomi and Bath, A. Peter and Ketikidis, Panayiotis and Lazuras, Lambros", title="Factors Affecting General Practitioners' Readiness to Accept and Use an Electronic Health Record System in the Republic of North Macedonia: A National Survey of General Practitioners", journal="JMIR Med Inform", year="2021", month="Apr", day="5", volume="9", number="4", pages="e21109", keywords="general practitioner", keywords="eHealth", keywords="technology acceptance", keywords="electronic health record", abstract="Background: Electronic health records (EHRs) represent an important aspect of digital health care, and to promote their use further, we need to better understand the drivers of their acceptance among health care professionals. EHRs are not simple computer applications; they should be considered as a highly integrated set of systems. Technology acceptance theories can be used to better understand users' intentions to use EHRs. It is recommended to assess factors that determine the future acceptance of a system before it is implemented. Objective: This study uses a modified version of the Unified Theory of Acceptance and Use of Technology with the aim of examining the factors associated with intentions to use an EHR application among general practitioners (GPs) in the Republic of North Macedonia, a country that has been underrepresented in extant literature. More specifically, this study aims to assess the role of technology acceptance predictors such as performance expectancy, effort expectancy, social influence, facilitating conditions, job relevance, descriptive norms, and satisfaction with existing eHealth systems already implemented in the country. 
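For the C-19 index validation abstract above (Reps et al, doi 10.2196/21547): externally validating a transported model essentially means scoring its predicted risks against observed outcomes in the new database. A minimal sketch with placeholder predictions, computing the C statistic and a crude calibration-in-the-large check:

```python
# Placeholder predictions and outcomes, not OHDSI data: the C statistic is the
# ROC AUC of the transported model's risks; the mean difference between
# predicted and observed risk is a crude calibration-in-the-large check.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y_observed = rng.integers(0, 2, size=2000)                  # hospitalization
p_predicted = np.clip(rng.beta(2, 8, size=2000) + 0.10 * y_observed, 0, 1)

c_statistic = roc_auc_score(y_observed, p_predicted)
calibration_gap = p_predicted.mean() - y_observed.mean()
print(f"C statistic={c_statistic:.2f}, "
      f"mean predicted - observed={calibration_gap:+.3f}")
```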
Methods: A web-based invitation was sent to 1174 GPs, of whom 458 completed the questionnaire (response rate=40.2\%). The research instrument assessed performance expectancy, effort expectancy, facilitating conditions, and social influence in relation to the GPs' intentions to use future EHR systems. Job relevance, descriptive norms, satisfaction with currently used eHealth systems in the country, and computer/internet use were also measured. Results: Hierarchical linear regression analysis showed that effort expectancy, descriptive norms, social influence, facilitating conditions, and job relevance were significantly associated with intentions to use the future EHR system, but performance expectancy was not. Multiple mediation modeling analyses further showed that social influence (z=2.64; P<.001), facilitating conditions (z=4.54; P<.001), descriptive norms (z=4.91; P<.001), and effort expectancy (z=5.81; P=.008) mediated the association between job relevance and intentions. Finally, moderated regression analysis showed that the association between social influence and usage intention was significantly moderated (P=.02) by experience (Bexperience{\texttimes}social influence=.005; 95\% CI 0.001 to 0.010; $\beta$=.080). In addition, the association between social influence and intentions was significantly moderated (P=.02) by age (Bage{\texttimes}social influence=.005; 95\% CI 0.001 to 0.010; $\beta$=.077). Conclusions: Expectations of less effort in using EHRs and perceptions of supportive infrastructures for enabling EHR use were significantly associated with the greater acceptance of EHRs among GPs. Social norms were also associated with intentions, even more so among older GPs and those with less work experience. The theoretical and practical implications of these findings are also discussed. ", doi="10.2196/21109", url="https://medinform.jmir.org/2021/4/e21109", url="http://www.ncbi.nlm.nih.gov/pubmed/33818399" } @Article{info:doi/10.2196/22394, author="Castaldo, Rossana and Cavaliere, Carlo and Soricelli, Andrea and Salvatore, Marco and Pecchia, Leandro and Franzese, Monica", title="Radiomic and Genomic Machine Learning Method Performance for Prostate Cancer Diagnosis: Systematic Literature Review", journal="J Med Internet Res", year="2021", month="Apr", day="1", volume="23", number="4", pages="e22394", keywords="prostate cancer", keywords="machine learning", keywords="systematic review", keywords="meta-analysis", keywords="diagnosis", keywords="imaging", keywords="radiomics", keywords="genomics", keywords="clinical", keywords="biomarkers", abstract="Background: Machine learning algorithms have been drawing attention at the intersection of pathology and radiology in prostate cancer research. However, due to their algorithmic learning complexity and the variability of their architecture, there is an ongoing need to analyze their performance. Objective: This study assesses the source of heterogeneity and the performance of machine learning applied to radiomic, genomic, and clinical biomarkers for the diagnosis of prostate cancer. One research focus of this study was on clearly identifying problems and issues related to the implementation of machine learning in clinical studies. Methods: Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) protocol, 816 titles were identified from the PubMed, Scopus, and OvidSP databases.
Studies that used machine learning to detect prostate cancer and provided performance measures were included in our analysis. The quality of the eligible studies was assessed using the QUADAS-2 (quality assessment of diagnostic accuracy studies--version 2) tool. The hierarchical multivariate model was applied to the pooled data in a meta-analysis. To investigate the heterogeneity among studies, I2 statistics were performed along with visual evaluation of coupled forest plots. Due to the internal heterogeneity among machine learning algorithms, subgroup analysis was carried out to investigate the diagnostic capability of machine learning systems in clinical practice. Results: In the final analysis, 37 studies were included, of which 29 entered the meta-analysis pooling. The analysis of machine learning methods to detect prostate cancer reveals the limited usage of the methods and the lack of standards that hinder the implementation of machine learning in clinical applications. Conclusions: The performance of machine learning for diagnosis of prostate cancer was considered satisfactory for several studies investigating the multiparametric magnetic resonance imaging and urine biomarkers; however, given the limitations indicated in our study, further studies are warranted to extend the potential use of machine learning to clinical settings. Recommendations on the use of machine learning techniques were also provided to help researchers to design robust studies to facilitate evidence generation from the use of radiomic and genomic biomarkers. ", doi="10.2196/22394", url="https://www.jmir.org/2021/4/e22394", url="http://www.ncbi.nlm.nih.gov/pubmed/33792552" } @Article{info:doi/10.2196/25000, author="Tran, Linh and Chi, Lianhua and Bonti, Alessio and Abdelrazek, Mohamed and Chen, Phoebe Yi-Ping", title="Mortality Prediction of Patients With Cardiovascular Disease Using Medical Claims Data Under Artificial Intelligence Architectures: Validation Study", journal="JMIR Med Inform", year="2021", month="Apr", day="1", volume="9", number="4", pages="e25000", keywords="mortality", keywords="cardiovascular", keywords="medical claims data", keywords="imbalanced data", keywords="machine learning", keywords="deep learning", abstract="Background: Cardiovascular disease (CVD) is the greatest health problem in Australia, which kills more people than any other disease and incurs enormous costs for the health care system. In this study, we present a benchmark comparison of various artificial intelligence (AI) architectures for predicting the mortality rate of patients with CVD using structured medical claims data. Compared with other research in the clinical literature, our models are more efficient because we use a smaller number of features, and this study could help health professionals accurately choose AI models to predict mortality among patients with CVD using only claims data before a clinic visit. Objective: This study aims to support health clinicians in accurately predicting mortality among patients with CVD using only claims data before a clinic visit. Methods: The data set was obtained from the Medicare Benefits Scheme and Pharmaceutical Benefits Scheme service information in the period between 2004 and 2014, released by the Department of Health Australia in 2016. It included 346,201 records, corresponding to 346,201 patients. 
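The heterogeneity assessment in the Castaldo et al review above (doi 10.2196/22394) relies on Cochran's Q and the I2 statistic; a short worked example with invented per-study estimates, not the review's pooled data:

```python
# Worked toy example of Cochran's Q and I2 with invented per-study estimates.
import numpy as np

effects = np.array([0.80, 0.85, 0.74, 0.90, 0.78])      # per-study estimates
variances = np.array([0.002, 0.004, 0.003, 0.005, 0.002])
weights = 1.0 / variances                                # inverse-variance weights

pooled = np.sum(weights * effects) / np.sum(weights)     # fixed-effect pool
q = np.sum(weights * (effects - pooled) ** 2)            # Cochran's Q
df = len(effects) - 1
i2 = max(0.0, (q - df) / q) * 100                        # percent heterogeneity
print(f"pooled={pooled:.3f}, Q={q:.2f}, I2={i2:.1f}%")
```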
A total of five AI algorithms, including four classical machine learning algorithms (logistic regression [LR], random forest [RF], extra trees [ET], and gradient boosting trees [GBT]) and a deep learning algorithm, which is a densely connected neural network (DNN), were developed and compared in this study. In addition, because of the minority of deceased patients in the data set, a separate experiment using the Synthetic Minority Oversampling Technique (SMOTE) was conducted to enrich the data. Results: Regarding model performance, in terms of discrimination, GBT and RF were the models with the highest area under the receiver operating characteristic curve (97.8\% and 97.7\%, respectively), followed by ET (96.8\%) and LR (96.4\%), whereas DNN was the least discriminative (95.3\%). In terms of reliability, LR predictions were the least calibrated compared with the other four algorithms. In this study, despite increasing the training time, SMOTE was proven to further improve the model performance of LR, whereas other algorithms, especially GBT and DNN, worked well with class imbalanced data. Conclusions: Compared with other research in the clinical literature involving AI models using claims data to predict patient health outcomes, our models are more efficient because we use a smaller number of features but still achieve high performance. This study could help health professionals accurately choose AI models to predict mortality among patients with CVD using only claims data before a clinic visit. ", doi="10.2196/25000", url="https://medinform.jmir.org/2021/4/e25000", url="http://www.ncbi.nlm.nih.gov/pubmed/33792549" } @Article{info:doi/10.2196/23983, author="Park, Jimyung and You, Chan Seng and Jeong, Eugene and Weng, Chunhua and Park, Dongsu and Roh, Jin and Lee, Yun Dong and Cheong, Youn Jae and Choi, Wook Jin and Kang, Mira and Park, Woong Rae", title="A Framework (SOCRATex) for Hierarchical Annotation of Unstructured Electronic Health Records and Integration Into a Standardized Medical Database: Development and Usability Study", journal="JMIR Med Inform", year="2021", month="Mar", day="30", volume="9", number="3", pages="e23983", keywords="natural language processing", keywords="search engine", keywords="data curation", keywords="data management", keywords="common data model", abstract="Background: Although electronic health records (EHRs) have been widely used in secondary assessments, clinical documents are relatively less utilized owing to the lack of standardized clinical text frameworks across different institutions. Objective: This study aimed to develop a framework for processing unstructured clinical documents of EHRs and integration with standardized structured data. Methods: We developed a framework known as Staged Optimization of Curation, Regularization, and Annotation of clinical text (SOCRATex). SOCRATex has the following four aspects: (1) extracting clinical notes for the target population and preprocessing the data, (2) defining the annotation schema with a hierarchical structure, (3) performing document-level hierarchical annotation using the annotation schema, and (4) indexing annotations for a search engine system. To test the usability of the proposed framework, proof-of-concept studies were performed on EHRs. We defined three distinctive patient groups and extracted their clinical documents (ie, pathology reports, radiology reports, and admission notes). 
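The class-imbalance experiment in the Tran et al abstract above (doi 10.2196/25000) pairs SMOTE with standard classifiers; a minimal sketch of that pattern on synthetic data, using a gradient boosting classifier as a stand-in for the study's exact models:

```python
# Sketch on synthetic data: SMOTE is placed inside the pipeline so that
# oversampling happens only on the training folds during cross-validation.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
pipeline = Pipeline([("smote", SMOTE(random_state=0)),
                     ("gbt", GradientBoostingClassifier(random_state=0))])
auc_scores = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
print(f"mean AUC with SMOTE: {auc_scores.mean():.3f}")
```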
The documents were annotated and integrated into the Observational Medical Outcomes Partnership (OMOP)-common data model (CDM) database. The annotations were used for creating Cox proportional hazard models with different settings of clinical analyses to measure (1) all-cause mortality, (2) thyroid cancer recurrence, and (3) 30-day hospital readmission. Results: Overall, 1055 clinical documents of 953 patients were extracted and annotated using the defined annotation schemas. The generated annotations were indexed into an unstructured textual data repository. Using the annotations of pathology reports, we identified that node metastasis and lymphovascular tumor invasion were associated with all-cause mortality among colon and rectum cancer patients (both P=.02). The other analyses involving measuring thyroid cancer recurrence using radiology reports and 30-day hospital readmission using admission notes in depressive disorder patients also showed results consistent with previous findings. Conclusions: We propose a framework for hierarchical annotation of textual data and integration into a standardized OMOP-CDM medical database. The proof-of-concept studies demonstrated that our framework can effectively process and integrate diverse clinical documents with standardized structured data for clinical research. ", doi="10.2196/23983", url="https://medinform.jmir.org/2021/3/e23983", url="http://www.ncbi.nlm.nih.gov/pubmed/33783361" } @Article{info:doi/10.2196/27767, author="Haddad, Tufia and Helgeson, M. Jane and Pomerleau, E. Katharine and Preininger, M. Anita and Roebuck, Christopher M. and Dankwa-Mullan, Irene and Jackson, Purcell Gretchen and Goetz, P. Matthew", title="Accuracy of an Artificial Intelligence System for Cancer Clinical Trial Eligibility Screening: Retrospective Pilot Study", journal="JMIR Med Inform", year="2021", month="Mar", day="26", volume="9", number="3", pages="e27767", keywords="clinical trial matching", keywords="clinical decision support system", keywords="machine learning", keywords="artificial intelligence", keywords="screening", keywords="clinical trials", keywords="eligibility", keywords="breast cancer", abstract="Background: Screening patients for eligibility for clinical trials is labor intensive. It requires abstraction of data elements from multiple components of the longitudinal health record and matching them to inclusion and exclusion criteria for each trial. Artificial intelligence (AI) systems have been developed to improve the efficiency and accuracy of this process. Objective: This study aims to evaluate the ability of an AI clinical decision support system (CDSS) to identify eligible patients for a set of clinical trials. Methods: This study included the deidentified data from a cohort of patients with breast cancer seen at the medical oncology clinic of an academic medical center between May and July 2017 and assessed patient eligibility for 4 breast cancer clinical trials. CDSS eligibility screening performance was validated against manual screening. Accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for eligibility determinations were calculated. Disagreements between manual screeners and the CDSS were examined to identify sources of discrepancies. Interrater reliability between manual reviewers was analyzed using Cohen (pairwise) and Fleiss (three-way) $\kappa$, and the significance of differences was determined by Wilcoxon signed-rank test. 
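The survival analyses in the SOCRATex abstract above (Park et al, doi 10.2196/23983) are Cox proportional hazards models fitted on annotation-derived covariates; as a generic illustration of that model family (not the study's data or code), lifelines' bundled example data set can stand in:

```python
# Generic Cox proportional hazards illustration with lifelines' bundled Rossi
# data set standing in for annotation-derived covariates.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

rossi = load_rossi()                       # time-to-event data with covariates
cph = CoxPHFitter()
cph.fit(rossi, duration_col="week", event_col="arrest")
cph.print_summary()                        # hazard ratios and P values
```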
Results: In total, 318 patients with breast cancer were included. Interrater reliability for manual screening ranged from 0.60 to 0.77, indicating substantial agreement. The overall accuracy of breast cancer trial eligibility determinations by the CDSS was 87.6\%. CDSS sensitivity was 81.1\% and specificity was 89\%. Conclusions: The AI CDSS in this study demonstrated accuracy, sensitivity, and specificity of greater than 80\% in determining the eligibility of patients for breast cancer clinical trials. CDSSs can accurately exclude ineligible patients for clinical trials and offer the potential to increase screening efficiency and accuracy. Additional research is needed to explore whether increased efficiency in screening and trial matching translates to improvements in trial enrollment, accruals, feasibility assessments, and cost. ", doi="10.2196/27767", url="https://medinform.jmir.org/2021/3/e27767", url="http://www.ncbi.nlm.nih.gov/pubmed/33769304" } @Article{info:doi/10.2196/25576, author="Feofanova, Valeryevna Elena and Zhang, Guo-Qiang and Lhatoo, Samden and Metcalf, A. Ginger and Boerwinkle, Eric and Venner, Eric", title="The Implementation Science for Genomic Health Translation (INSIGHT) Study in Epilepsy: Protocol for a Learning Health Care System", journal="JMIR Res Protoc", year="2021", month="Mar", day="26", volume="10", number="3", pages="e25576", keywords="genomic medicine", keywords="electronic health record", keywords="implementation", keywords="genetics", keywords="prototype", keywords="decision support", abstract="Background: Genomic medicine is poised to improve care for common complex diseases such as epilepsy, but additional clinical informatics and implementation science research is needed for it to become a part of the standard of care. Epilepsy is an exemplary complex neurological disorder for which DNA diagnostics have been shown to be advantageous for patient care. Objective: We designed the Implementation Science for Genomic Health Translation (INSIGHT) study to leverage the fact that both the clinic and testing laboratory control the development and customization of their respective electronic health records and clinical reporting platforms. Through INSIGHT, we can rapidly prototype and benchmark novel approaches to incorporating clinical genomics into patient care. Of particular interest are clinical decision support tools that take advantage of domain knowledge from clinical genomics and can be rapidly adjusted based on feedback from clinicians. Methods: Building on previously developed evidence and infrastructure components, our model includes the following: establishment of an intervention-ready genomic knowledge base for patient care, creation of a health informatics platform and linking it to a clinical genomics reporting system, and scaling and evaluation of INSIGHT following established implementation science principles. Results: INSIGHT was approved by the Institutional Review Board at the University of Texas Health Science Center at Houston on May 15, 2020, and is designed as a 2-year proof-of-concept study beginning in December 2021. By design, 120 patients from the Texas Comprehensive Epilepsy Program are to be enrolled to test the INSIGHT workflow. Initial results are expected in the first half of 2023. Conclusions: INSIGHT's domain-specific, practical but generalizable approach may help catalyze a pathway to accelerate translation of genomic knowledge into impactful interventions in patient care.
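The validation metrics in the Haddad et al abstract above (doi 10.2196/27767) all derive from a 2x2 confusion matrix of CDSS determinations against manual screening; a toy worked example with invented labels:

```python
# Toy worked example: sensitivity, specificity, PPV, and NPV of CDSS
# eligibility calls against manual screening; the labels are invented.
from sklearn.metrics import confusion_matrix

manual = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]    # reference (manual screening)
cdss   = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]    # CDSS determinations
tn, fp, fn, tp = confusion_matrix(manual, cdss).ravel()

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
accuracy = (tp + tn) / len(manual)
print(sensitivity, specificity, ppv, npv, accuracy)
```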
International Registered Report Identifier (IRRID): PRR1-10.2196/25576 ", doi="10.2196/25576", url="https://www.researchprotocols.org/2021/3/e25576", url="http://www.ncbi.nlm.nih.gov/pubmed/33769305" } @Article{info:doi/10.2196/25696, author="Huang, Yingxiang and Radenkovic, Dina and Perez, Kevin and Nadeau, Kari and Verdin, Eric and Furman, David", title="Modeling Predictive Age-Dependent and Age-Independent Symptoms and Comorbidities of Patients Seeking Treatment for COVID-19: Model Development and Validation Study", journal="J Med Internet Res", year="2021", month="Mar", day="25", volume="23", number="3", pages="e25696", keywords="clinical informatics", keywords="predictive modeling", keywords="COVID-19", keywords="app", keywords="model", keywords="prediction", keywords="symptom", keywords="informatics", keywords="age", keywords="morbidity", keywords="hospital", abstract="Background: The COVID-19 pandemic continues to ravage and burden hospitals around the world. The epidemic started in Wuhan, China, and was subsequently recognized by the World Health Organization as an international public health emergency and declared a pandemic in March 2020. Since then, the disruptions caused by the COVID-19 pandemic have had an unparalleled effect on all aspects of life. Objective: With increasing total hospitalization and intensive care unit admissions, a better understanding of features related to patients with COVID-19 could help health care workers stratify patients based on the risk of developing a more severe case of COVID-19. Using predictive models, we strive to select the features that are most associated with more severe cases of COVID-19. Methods: Over 3 million participants reported their potential symptoms of COVID-19, along with their comorbidities and demographic information, on a smartphone-based app. Using data from the >10,000 individuals who indicated that they had tested positive for COVID-19 in the United Kingdom, we leveraged the Elastic Net regularized binary classifier to derive the predictors that are most correlated with users having a severe enough case of COVID-19 to seek treatment in a hospital setting. We then analyzed such features in relation to age and other demographics and their longitudinal trend. Results: The most predictive features found include fever, use of immunosuppressant medication, use of a mobility aid, shortness of breath, and severe fatigue. Such features are age-related, and some are disproportionally high in minority populations. Conclusions: Predictors selected from the predictive models can be used to stratify patients into groups based on how much medical attention they are expected to require. This could help health care workers devote valuable resources to prevent the escalation of the disease in vulnerable populations. 
", doi="10.2196/25696", url="https://www.jmir.org/2021/3/e25696", url="http://www.ncbi.nlm.nih.gov/pubmed/33621185" } @Article{info:doi/10.2196/23888, author="Jiang, Huizhen and Su, Longxiang and Wang, Hao and Li, Dongkai and Zhao, Congpu and Hong, Na and Long, Yun and Zhu, Weiguo", title="Noninvasive Real-Time Mortality Prediction in Intensive Care Units Based on Gradient Boosting Method: Model Development and Validation Study", journal="JMIR Med Inform", year="2021", month="Mar", day="25", volume="9", number="3", pages="e23888", keywords="real time", keywords="mortality prediction", keywords="intensive care unit", keywords="noninvasive", abstract="Background: Monitoring critically ill patients in intensive care units (ICUs) in real time is vitally important. Although scoring systems are most often used in risk prediction of mortality, they are usually not highly precise, and the clinical data are often simply weighted. This method is inefficient and time-consuming in the clinical setting. Objective: The objective of this study was to integrate all medical data and noninvasively predict the real-time mortality of ICU patients using a gradient boosting method. Specifically, our goal was to predict mortality using a noninvasive method to minimize the discomfort to patients. Methods: In this study, we established five models to predict mortality in real time based on different features. According to the monitoring, laboratory, and scoring data, we constructed the feature engineering. The five real-time mortality prediction models were RMM (based on monitoring features), RMA (based on monitoring features and the Acute Physiology and Chronic Health Evaluation [APACHE]), RMS (based on monitoring features and Sequential Organ Failure Assessment [SOFA]), RMML (based on monitoring and laboratory features), and RM (based on all monitoring, laboratory, and scoring features). All models were built using LightGBM and tested with XGBoost. We then compared the performance of all models, with particular focus on the noninvasive method, the RMM model. Results: After extensive experiments, the area under the curve of the RMM model was 0.8264, which was superior to that of the RMA and RMS models. Therefore, predicting mortality using the noninvasive method was both efficient and practical, as it eliminated the need for extra physical interventions on patients, such as the drawing of blood. In addition, we explored the top nine features relevant to real-time mortality prediction: invasive mean blood pressure, heart rate, invasive systolic blood pressure, oxygen concentration, oxygen saturation, balance of input and output, total input, invasive diastolic blood pressure, and noninvasive mean blood pressure. These nine features should be given more focus in routine clinical practice. Conclusions: The results of this study may be helpful in real-time mortality prediction in patients in the ICU, especially the noninvasive method. It is efficient and favorable to patients, which offers a strong practical significance. ", doi="10.2196/23888", url="https://medinform.jmir.org/2021/3/e23888", url="http://www.ncbi.nlm.nih.gov/pubmed/33764311" } @Article{info:doi/10.2196/16306, author="Zhao, Peng and Yoo, Illhoi and Naqvi, H. 
Syed", title="Early Prediction of Unplanned 30-Day Hospital Readmission: Model Development and Retrospective Data Analysis", journal="JMIR Med Inform", year="2021", month="Mar", day="23", volume="9", number="3", pages="e16306", keywords="patient readmission", keywords="risk factors", keywords="unplanned", keywords="early detection", keywords="all-cause", keywords="predictive model", keywords="30-day", keywords="machine learning", abstract="Background: Existing readmission reduction solutions tend to focus on complementing inpatient care with enhanced care transition and postdischarge interventions. These solutions are initiated near or after discharge, when clinicians' impact on inpatient care is ending. Preventive intervention during hospitalization is an underexplored area that holds potential for reducing readmission risk. However, it is challenging to predict readmission risk at the early stage of hospitalization because few data are available. Objective: The objective of this study was to build an early prediction model of unplanned 30-day hospital readmission using a large and diverse sample. We were also interested in identifying novel readmission risk factors and protective factors. Methods: We extracted the medical records of 96,550 patients in 205 participating Cerner client hospitals across four US census regions in 2016 from the Health Facts database. The model was built with index admission data that can become available within 24 hours and data from previous encounters up to 1 year before the index admission. The candidate models were evaluated for performance, timeliness, and generalizability. Multivariate logistic regression analysis was used to identify readmission risk factors and protective factors. Results: We developed six candidate readmission models with different machine learning algorithms. The best performing model of extreme gradient boosting (XGBoost) achieved an area under the receiver operating characteristic curve of 0.753 on the development data set and 0.742 on the validation data set. By multivariate logistic regression analysis, we identified 14 risk factors and 2 protective factors of readmission that have never been reported. Conclusions: The performance of our model is better than that of the most widely used models in US health care settings. This model can help clinicians identify readmission risk at the early stage of hospitalization so that they can pay extra attention during the care process of high-risk patients. The 14 novel risk factors and 2 novel protective factors can aid understanding of the factors associated with readmission. ", doi="10.2196/16306", url="https://medinform.jmir.org/2021/3/e16306", url="http://www.ncbi.nlm.nih.gov/pubmed/33755027" } @Article{info:doi/10.2196/24359, author="Trinkley, E. Katy and Kroehl, E. Miranda and Kahn, G. Michael and Allen, A. Larry and Bennett, D. Tellen and Hale, Gary and Haugen, Heather and Heckman, Simeon and Kao, P. David and Kim, Janet and Matlock, M. Daniel and Malone, C. Daniel and Page 2nd, L. 
Robert and Stine, Jessica and Suresh, Krithika and Wells, Lauren and Lin, Chen-Tan", title="Applying Clinical Decision Support Design Best Practices With the Practical Robust Implementation and Sustainability Model Versus Reliance on Commercially Available Clinical Decision Support Tools: Randomized Controlled Trial", journal="JMIR Med Inform", year="2021", month="Mar", day="22", volume="9", number="3", pages="e24359", keywords="PRISM", keywords="implementation science", keywords="clinical decision support systems", keywords="RE-AIM", keywords="congestive heart failure", abstract="Background: Limited consideration of clinical decision support (CDS) design best practices, such as a user-centered design, is often cited as a key barrier to CDS adoption and effectiveness. The application of CDS best practices is resource intensive; thus, institutions often rely on commercially available CDS tools that are created to meet the generalized needs of many institutions and are not user centered. Beyond resource availability, insufficient guidance on how to address key aspects of implementation, such as contextual factors, may also limit the application of CDS best practices. An implementation science (IS) framework could provide needed guidance and increase the reproducibility of CDS implementations. Objective: This study aims to compare the effectiveness of an enhanced CDS tool informed by CDS best practices and an IS framework with a generic, commercially available CDS tool. Methods: We conducted an explanatory sequential mixed methods study. An IS-enhanced and commercial CDS alert were compared in a cluster randomized trial across 28 primary care clinics. Both alerts aimed to improve beta-blocker prescribing for heart failure. The enhanced alert was informed by CDS best practices and the Practical, Robust, Implementation, and Sustainability Model (PRISM) IS framework, whereas the commercial alert followed vendor-supplied specifications. Following PRISM, the enhanced alert was informed by iterative, multilevel stakeholder input and the dynamic interactions of the internal and external environment. Outcomes aligned with PRISM's evaluation measures, including patient reach, clinician adoption, and changes in prescribing behavior. Clinicians exposed to each alert were interviewed to identify design features that might influence adoption. The interviews were analyzed using a thematic approach. Results: Between March 15 and August 23, 2019, the enhanced alert fired for 61 patients (106 alerts, 87 clinicians) and the commercial alert fired for 26 patients (59 alerts, 31 clinicians). The adoption and effectiveness of the enhanced alert were significantly higher than those of the commercial alert (62\% vs 29\% alerts adopted, P<.001; 14\% vs 0\% changed prescribing, P=.006). Of the 21 clinicians interviewed, most stated that they preferred the enhanced alert. Conclusions: The results of this study suggest that applying CDS best practices with an IS framework to create CDS tools improves implementation success compared with a commercially available tool. Trial Registration: ClinicalTrials.gov NCT04028557; http://clinicaltrials.gov/ct2/show/NCT04028557 ", doi="10.2196/24359", url="https://medinform.jmir.org/2021/3/e24359", url="http://www.ncbi.nlm.nih.gov/pubmed/33749610" } @Article{info:doi/10.2196/23215, author="Stroh, N. J. and Bennett, D. 
Tellen and Kheyfets, Vitaly and Albers, David", title="Clinical Decision Support for Traumatic Brain Injury: Identifying a Framework for Practical Model-Based Intracranial Pressure Estimation at Multihour Timescales", journal="JMIR Med Inform", year="2021", month="Mar", day="22", volume="9", number="3", pages="e23215", keywords="intracranial pressure", keywords="traumatic brain injury", keywords="intracranial hypertension", keywords="patient-specific modeling", keywords="theoretical models", abstract="Background: The clinical mitigation of intracranial hypertension due to traumatic brain injury requires timely knowledge of intracranial pressure to avoid secondary injury or death. Noninvasive intracranial pressure (nICP) estimation that operates sufficiently fast at multihour timescales and requires only common patient measurements is a desirable tool for clinical decision support and improving traumatic brain injury patient outcomes. However, existing model-based nICP estimation methods may be too slow or require data that are not easily obtained. Objective: This work considers short- and real-time nICP estimation at multihour timescales based on arterial blood pressure (ABP) to better inform the ongoing development of practical models with commonly available data. Methods: We assess and analyze the effects of two distinct pathways of model development, either by increasing physiological integration using a simple pressure estimation model, or by increasing physiological fidelity using a more complex model. Comparison of the model approaches is performed using a set of quantitative model validation criteria over hour-scale times applied to model nICP estimates in relation to observed ICP. Results: The simple fully coupled estimation scheme based on windowed regression outperforms a more complex nICP model with prescribed intracranial inflow when pulsatile ABP inflow conditions are provided. We also show that the simple estimation data requirements can be reduced to 1-minute averaged ABP summary data under generic waveform representation. Conclusions: Stronger performance of the simple bidirectional model indicates that feedback between the systemic vascular network and nICP estimation scheme is crucial for modeling over long intervals. However, simple model reduction to ABP-only dependence limits its utility in cases involving other brain injuries such as ischemic stroke and subarachnoid hemorrhage. Additional methodologies and considerations needed to overcome these limitations are illustrated and discussed. ", doi="10.2196/23215", url="https://medinform.jmir.org/2021/3/e23215", url="http://www.ncbi.nlm.nih.gov/pubmed/33749613" } @Article{info:doi/10.2196/23595, author="Cos, Heidy and Li, Dingwen and Williams, Gregory and Chininis, Jeffrey and Dai, Ruixuan and Zhang, Jingwen and Srivastava, Rohit and Raper, Lacey and Sanford, Dominic and Hawkins, William and Lu, Chenyang and Hammill, W. 
Chet", title="Predicting Outcomes in Patients Undergoing Pancreatectomy Using Wearable Technology and Machine Learning: Prospective Cohort Study", journal="J Med Internet Res", year="2021", month="Mar", day="18", volume="23", number="3", pages="e23595", keywords="pancreatectomy", keywords="pancreatic cancer", keywords="telemonitoring", keywords="remote monitoring", keywords="machine learning", keywords="wearable technology", keywords="activity", abstract="Background: Pancreatic cancer is the third leading cause of cancer-related deaths, and although pancreatectomy is currently the only curative treatment, it is associated with significant morbidity. Objective: The objective of this study was to evaluate the utility of wearable telemonitoring technologies to predict treatment outcomes using patient activity metrics and machine learning. Methods: In this prospective, single-center, single-cohort study, patients scheduled for pancreatectomy were provided with a wearable telemonitoring device to be worn prior to surgery. Patient clinical data were collected and all patients were evaluated using the American College of Surgeons National Surgical Quality Improvement Program surgical risk calculator (ACS-NSQIP SRC). Machine learning models were developed to predict whether patients would have a textbook outcome and compared with the ACS-NSQIP SRC using area under the receiver operating characteristic (AUROC) curves. Results: Between February 2019 and February 2020, 48 patients completed the study. Patient activity metrics were collected over an average of 27.8 days before surgery. Patients took an average of 4162.1 (SD 4052.6) steps per day and had an average heart rate of 75.6 (SD 14.8) beats per minute. Twenty-eight (58\%) patients had a textbook outcome after pancreatectomy. The group of 20 (42\%) patients who did not have a textbook outcome included 14 patients with severe complications and 11 patients requiring readmission. The ACS-NSQIP SRC had an AUROC curve of 0.6333 to predict failure to achieve a textbook outcome, while our model combining patient clinical characteristics and patient activity data achieved the highest performance with an AUROC curve of 0.7875. Conclusions: Machine learning models outperformed ACS-NSQIP SRC estimates in predicting textbook outcomes after pancreatectomy. The highest performance was observed when machine learning models incorporated patient clinical characteristics and activity metrics. ", doi="10.2196/23595", url="https://www.jmir.org/2021/3/e23595", url="http://www.ncbi.nlm.nih.gov/pubmed/33734096" } @Article{info:doi/10.2196/20030, author="Chiu, Yen-Lin and Lee, Yu-Chen and Tsai, Chin-Chung", title="Internet-Specific Epistemic Beliefs in Medicine and Intention to Use Evidence-Based Online Medical Databases Among Health Care Professionals: Cross-sectional Survey", journal="J Med Internet Res", year="2021", month="Mar", day="18", volume="23", number="3", pages="e20030", keywords="evidence-based medicine (EBM)", keywords="health care professionals", keywords="internet-specific epistemic beliefs", keywords="medical informatics", abstract="Background: Evidence-based medicine has been regarded as a prerequisite for ensuring health care quality. The increase in health care professionals' adoption of web-based medical information and the lack of awareness of alternative access to evidence-based online resources suggest the need for an investigation of their information-searching behaviors of using evidence-based online medical databases. 
Objective: The main purposes of this study were to (1) modify and validate the internet-specific epistemic beliefs in medicine (ISEBM) questionnaire and (2) explore the associations between health care professionals' demographics, ISEBM, and intention to use evidence-based online medical databases for clinical practice. Methods: Health care professionals in a university-affiliated teaching hospital were surveyed using the ISEBM questionnaire. The partial least squares-structural equation modeling was conducted to analyze the reliability and validity of ISEBM. Furthermore, the structural model was analyzed to examine the possible linkages between health professionals' demographics, ISEBM, and intention to utilize the evidence-based online medical databases for clinical practice. Results: A total of 273 health care professionals with clinical working experience were surveyed. The results of the measurement model analysis indicated that all items had significant loadings ranging from 0.71 to 0.92 with satisfactory composite reliability values ranging from 0.87 to 0.94 and average variance explained values ranging from 0.70 to 0.84. The results of the structural relationship analysis revealed that the source of internet-based medical knowledge (path coefficient --0.26, P=.01) and justification of internet-based knowing in medicine (path coefficient 0.21, P=.001) were correlated with the intention to use evidence-based online medical databases. However, certainty and simplicity of internet-based medical knowledge were not. In addition, gender (path coefficient 0.12, P=.04) and academic degree (path coefficient 0.15, P=.004) were associated with intention to use evidence-based online medical databases for clinical practice. Conclusions: Advancing health care professionals' ISEBM regarding source and justification may encourage them to retrieve valid medical information through evidence-based medical databases. Moreover, providing support for specific health care professionals (ie, females, without a master's degree) may promote their intention to use certain databases for clinical practice. ", doi="10.2196/20030", url="https://www.jmir.org/2021/3/e20030", url="http://www.ncbi.nlm.nih.gov/pubmed/33734092" } @Article{info:doi/10.2196/25473, author="Entezarjou, Artin and Calling, Susanna and Bhattacharyya, Tapomita and Milos Nymberg, Veronica and Vigren, Lina and Labaf, Ashkan and Jakobsson, Ulf and Midl{\"o}v, Patrik", title="Antibiotic Prescription Rates After eVisits Versus Office Visits in Primary Care: Observational Study", journal="JMIR Med Inform", year="2021", month="Mar", day="15", volume="9", number="3", pages="e25473", keywords="telemedicine", keywords="antibiotics", keywords="streptococcal tonsillitis", keywords="cystitis", keywords="respiratory tract infection", keywords="virtual visit", keywords="virtual", keywords="eVisit", abstract="Background: Direct-to-consumer telemedicine is an increasingly used modality to access primary care. Previous research on assessment using synchronous virtual visits showed mixed results regarding antibiotic prescription rates, and research on assessment using asynchronous chat-based eVisits is lacking. Objective: The goal of the research was to investigate if eVisit management of sore throat, other respiratory symptoms, or dysuria leads to higher rates of antibiotic prescription compared with usual management using physical office visits. 
Methods: Data from 3847 eVisits and 759 office visits for sore throat, dysuria, or respiratory symptoms were acquired from a large private health care provider in Sweden. Data were analyzed to compare antibiotic prescription rates within 3 days, antibiotic type, and diagnoses made. For a subset of sore throat visits (n=160 eVisits, n=125 office visits), Centor criteria data were manually extracted and validated. Results: Antibiotic prescription rates were lower following eVisits compared with office visits for sore throat (169/798, 21.2\%, vs 124/312, 39.7\%; P<.001) and respiratory symptoms (27/1724, 1.6\%, vs 50/251, 19.9\%; P<.001), while no significant differences were noted comparing eVisits to office visits for dysuria (1016/1325, 76.7\%, vs 143/196, 73.0\%; P=.25). Guideline-recommended antibiotics were prescribed similarly following sore throat eVisits and office visits (163/169, 96.4\%, vs 117/124, 94.4\%; P=.39). eVisits for respiratory symptoms and dysuria were more often prescribed guideline-recommended antibiotics (26/27, 96.3\%, vs 37/50, 74.0\%; P=.02 and 1009/1016, 99.3\%, vs 135/143, 94.4\%; P<.001, respectively). Odds ratios of antibiotic prescription following office visits compared with eVisits after adjusting for age and differences in set diagnoses were 2.94 (95\% CI 1.99-4.33), 11.57 (95\% CI 5.50-24.32), 1.01 (95\% CI 0.66-1.53), for sore throat, respiratory symptoms, and dysuria, respectively. Conclusions: The use of asynchronous eVisits for the management of sore throat, dysuria, and respiratory symptoms is not associated with an inherent overprescription of antibiotics compared with office visits. Trial Registration: ClinicalTrials.gov NCT03474887; https://clinicaltrials.gov/ct2/show/NCT03474887 ", doi="10.2196/25473", url="https://medinform.jmir.org/2021/3/e25473", url="http://www.ncbi.nlm.nih.gov/pubmed/33720032" } @Article{info:doi/10.2196/23456, author="Ridgway, P. Jessica and Uvin, Arno and Schmitt, Jessica and Oliwa, Tomasz and Almirol, Ellen and Devlin, Samantha and Schneider, John", title="Natural Language Processing of Clinical Notes to Identify Mental Illness and Substance Use Among People Living with HIV: Retrospective Cohort Study", journal="JMIR Med Inform", year="2021", month="Mar", day="10", volume="9", number="3", pages="e23456", keywords="natural language processing", keywords="HIV", keywords="substance use", keywords="mental illness", keywords="electronic medical records", abstract="Background: Mental illness and substance use are prevalent among people living with HIV and often lead to poor health outcomes. Electronic medical record (EMR) data are increasingly being utilized for HIV-related clinical research and care, but mental illness and substance use are often underdocumented in structured EMR fields. Natural language processing (NLP) of unstructured text of clinical notes in the EMR may more accurately identify mental illness and substance use among people living with HIV than structured EMR fields alone. Objective: The aim of this study was to utilize NLP of clinical notes to detect mental illness and substance use among people living with HIV and to determine how often these factors are documented in structured EMR fields. Methods: We collected both structured EMR data (diagnosis codes, social history, Problem List) as well as the unstructured text of clinical HIV care notes for adults living with HIV. 
We developed NLP algorithms to identify words and phrases associated with mental illness and substance use in the clinical notes. The algorithms were validated based on chart review. We compared numbers of patients with documentation of mental illness or substance use identified by structured EMR fields with those identified by the NLP algorithms. Results: The NLP algorithm for detecting mental illness had a positive predictive value (PPV) of 98\% and a negative predictive value (NPV) of 98\%. The NLP algorithm for detecting substance use had a PPV of 92\% and an NPV of 98\%. The NLP algorithm for mental illness identified 54.0\% (420/778) of patients as having documentation of mental illness in the text of clinical notes. Among the patients with mental illness detected by NLP, 58.6\% (246/420) had documentation of mental illness in at least one structured EMR field. Sixty-three patients had documentation of mental illness in structured EMR fields that was not detected by NLP of clinical notes. The NLP algorithm for substance use detected substance use in the text of clinical notes in 18.1\% (141/778) of patients. Among patients with substance use detected by NLP, 73.8\% (104/141) had documentation of substance use in at least one structured EMR field. Seventy-six patients had documentation of substance use in structured EMR fields that was not detected by NLP of clinical notes. Conclusions: Among patients in an urban HIV care clinic, NLP of clinical notes identified high rates of mental illness and substance use that were often not documented in structured EMR fields. This finding has important implications for epidemiologic research and clinical care for people living with HIV. ", doi="10.2196/23456", url="https://medinform.jmir.org/2021/3/e23456", url="http://www.ncbi.nlm.nih.gov/pubmed/33688848" } @Article{info:doi/10.2196/21435, author="Velardo, Carmelo and Clifton, David and Hamblin, Steven and Khan, Rabia and Tarassenko, Lionel and Mackillop, Lucy", title="Toward a Multivariate Prediction Model of Pharmacological Treatment for Women With Gestational Diabetes Mellitus: Algorithm Development and Validation", journal="J Med Internet Res", year="2021", month="Mar", day="10", volume="23", number="3", pages="e21435", keywords="gestational diabetes mellitus", keywords="mobile health", keywords="machine learning", keywords="algorithms", abstract="Background: Successful management of gestational diabetes mellitus (GDM) reduces the risk of morbidity in women and newborns. A woman's blood glucose readings and risk factors are used by clinical staff to make decisions regarding the initiation of pharmacological treatment in women with GDM. Mobile health (mHealth) solutions allow the real-time follow-up of women with GDM and allow timely treatment and management. Machine learning offers the opportunity to quickly analyze large quantities of data to automatically flag women at risk of requiring pharmacological treatment. Objective: The aim of this study is to assess whether data collected through an mHealth system can be analyzed to automatically evaluate the switch to pharmacological treatment from diet-based management of GDM. Methods: We collected data from 3029 patients to design a machine learning model that can identify when a woman with GDM needs to switch to medications (insulin or metformin) by analyzing the data related to blood glucose and other risk factors. 
Results: Through the analysis of 411,785 blood glucose readings, we designed a machine learning model that can predict the timing of initiation of pharmacological treatment. After 100 experimental repetitions, we obtained an average area under the receiver operating characteristic curve of 0.80 (SD 0.02) and an algorithm that allows the flexibility of setting the operating point rather than relying on a static heuristic method, which is currently used in clinical practice. Conclusions: Using real-time data collected via an mHealth system may further improve the timeliness of the intervention and potentially improve patient care. Further real-time clinical testing will enable the validation of our algorithm using real-world data. ", doi="10.2196/21435", url="https://www.jmir.org/2021/3/e21435", url="http://www.ncbi.nlm.nih.gov/pubmed/33688832" } @Article{info:doi/10.2196/13182, author="Martinez-Garcia, Alicia and Naranjo-Saucedo, Bel{\'e}n Ana and Rivas, Antonio Jose and Romero Tabares, Antonio and Mar{\'i}n Cassinello, Ana and Andr{\'e}s-Mart{\'i}n, Anselmo and S{\'a}nchez Laguna, Jos{\'e} Francisco and Villegas, Roman and P{\'e}rez Le{\'o}n, Paula Francisco De and Moreno Conde, Jes{\'u}s and Parra Calder{\'o}n, Luis Carlos", title="A Clinical Decision Support System (KNOWBED) to Integrate Scientific Knowledge at the Bedside: Development and Evaluation Study", journal="JMIR Med Inform", year="2021", month="Mar", day="10", volume="9", number="3", pages="e13182", keywords="evidence-based medicine", keywords="clinical decision support system", keywords="scientific knowledge integration", abstract="Background: The evidence-based medicine (EBM) paradigm requires the development of health care professionals' skills in the efficient search of evidence in the literature, and in the application of formal rules to evaluate this evidence. Incorporating this methodology into the decision-making routine of clinical practice will improve the patients' health care, increase patient safety, and optimize resources use. Objective: The aim of this study is to develop and evaluate a new tool (KNOWBED system) as a clinical decision support system to support scientific knowledge, enabling health care professionals to quickly carry out decision-making processes based on EBM during their routine clinical practice. Methods: Two components integrate the KNOWBED system: a web-based knowledge station and a mobile app. A use case (bronchiolitis pathology) was selected to validate the KNOWBED system in the context of the Paediatrics Unit of the Virgen Macarena University Hospital (Seville, Spain). The validation was covered in a 3-month pilot using 2 indicators: usability and efficacy. Results: The KNOWBED system has been designed, developed, and validated to support clinical decision making in mobility based on standards that have been incorporated into the routine clinical practice of health care professionals. Using this tool, health care professionals can consult existing scientific knowledge at the bedside, and access recommendations of clinical protocols established based on EBM. During the pilot project, 15 health care professionals participated and accessed the system for a total of 59 times. Conclusions: The KNOWBED system is a useful and innovative tool for health care professionals. The usability surveys filled in by the system users highlight that it is easy to access the knowledge base. This paper also sets out some improvements to be made in the future. 
", doi="10.2196/13182", url="https://medinform.jmir.org/2021/3/e13182", url="http://www.ncbi.nlm.nih.gov/pubmed/33709932" } @Article{info:doi/10.2196/22951, author="Zhao, Yiqing and Fu, Sunyang and Bielinski, J. Suzette and Decker, A. Paul and Chamberlain, M. Alanna and Roger, L. Veronique and Liu, Hongfang and Larson, B. Nicholas", title="Natural Language Processing and Machine Learning for Identifying Incident Stroke From Electronic Health Records: Algorithm Development and Validation", journal="J Med Internet Res", year="2021", month="Mar", day="8", volume="23", number="3", pages="e22951", keywords="stroke", keywords="natural language processing", keywords="electronic health records", keywords="machine learning", abstract="Background: Stroke is an important clinical outcome in cardiovascular research. However, the ascertainment of incident stroke is typically accomplished via time-consuming manual chart abstraction. Current phenotyping efforts using electronic health records for stroke focus on case ascertainment rather than incident disease, which requires knowledge of the temporal sequence of events. Objective: The aim of this study was to develop a machine learning--based phenotyping algorithm for incident stroke ascertainment based on diagnosis codes, procedure codes, and clinical concepts extracted from clinical notes using natural language processing. Methods: The algorithm was trained and validated using an existing epidemiology cohort consisting of 4914 patients with atrial fibrillation (AF) with manually curated incident stroke events. Various combinations of feature sets and machine learning classifiers were compared. Using a heuristic rule based on the composition of concepts and codes, we further detected the stroke subtype (ischemic stroke/transient ischemic attack or hemorrhagic stroke) of each identified stroke. The algorithm was further validated using a cohort (n=150) stratified sampled from a population in Olmsted County, Minnesota (N=74,314). Results: Among the 4914 patients with AF, 740 had validated incident stroke events. The best-performing stroke phenotyping algorithm used clinical concepts, diagnosis codes, and procedure codes as features in a random forest classifier. Among patients with stroke codes in the general population sample, the best-performing model achieved a positive predictive value of 86\% (43/50; 95\% CI 0.74-0.93) and a negative predictive value of 96\% (96/100). For subtype identification, we achieved an accuracy of 83\% in the AF cohort and 80\% in the general population sample. Conclusions: We developed and validated a machine learning--based algorithm that performed well for identifying incident stroke and for determining type of stroke. The algorithm also performed well on a sample from a general population, further demonstrating its generalizability and potential for adoption by other institutions. 
", doi="10.2196/22951", url="https://www.jmir.org/2021/3/e22951", url="http://www.ncbi.nlm.nih.gov/pubmed/33683212" } @Article{info:doi/10.2196/25121, author="op den Buijs, Jorn and Pijl, Marten and Landgraf, Andreas", title="Predictive Modeling of 30-Day Emergency Hospital Transport of German Patients Using a Personal Emergency Response: Retrospective Study and Comparison with the United States", journal="JMIR Med Inform", year="2021", month="Mar", day="8", volume="9", number="3", pages="e25121", keywords="emergency hospital transport", keywords="predictive modeling", keywords="personal emergency response system", keywords="population health management", keywords="emergency transport", keywords="emergency response system", keywords="emergency response", keywords="health management", abstract="Background: Predictive analytics based on data from remote monitoring of elderly via a personal emergency response system (PERS) in the United States can identify subscribers at high risk for emergency hospital transport. These risk predictions can subsequently be used to proactively target interventions and prevent avoidable, costly health care use. It is, however, unknown if PERS-based risk prediction with targeted interventions could also be applied in the German health care setting. Objective: The objectives were to develop and validate a predictive model of 30-day emergency hospital transport based on data from a German PERS provider and compare the model with our previously published predictive model developed on data from a US PERS provider. Methods: Retrospective data of 5805 subscribers to a German PERS service were used to develop and validate an extreme gradient boosting predictive model of 30-day hospital transport, including predictors derived from subscriber demographics, self-reported medical conditions, and a 2-year history of case data. Models were trained on 80\% (4644/5805) of the data, and performance was evaluated on an independent test set of 20\% (1161/5805). Results were compared with our previously published prediction model developed on a data set of PERS users in the United States. Results: German PERS subscribers were on average aged 83.6 years, with 64.0\% (743/1161) females, with 65.4\% (759/1161) reported 3 or more chronic conditions. A total of 1.4\% (350/24,847) of subscribers had one or more emergency transports in 30 days in the test set, which was significantly lower compared with the US data set (2455/109,966, 2.2\%). Performance of the predictive model of emergency hospital transport, as evaluated by area under the receiver operator characteristic curve (AUC), was 0.749 (95\% CI 0.721-0.777), which was similar to the US prediction model (AUC=0.778 [95\% CI 0.769-0.788]). The top 1\% (12/1161) of predicted high-risk patients were 10.7 times more likely to experience an emergency hospital transport in 30 days than the overall German PERS population. This lift was comparable to a model lift of 11.9 obtained by the US predictive model. Conclusions: Despite differences in emergency care use, PERS-based collected subscriber data can be used to predict use outcomes in different international settings. These predictive analytic tools can be used by health care organizations to extend population health management into the home by identifying and delivering timelier targeted interventions to high-risk patients. This could lead to overall improved patient experience, higher quality of care, and more efficient resource use. 
", doi="10.2196/25121", url="https://medinform.jmir.org/2021/3/e25121", url="http://www.ncbi.nlm.nih.gov/pubmed/33682679" } @Article{info:doi/10.2196/26646, author="Maassen, Oliver and Fritsch, Sebastian and Palm, Julia and Deffge, Saskia and Kunze, Julian and Marx, Gernot and Riedel, Morris and Schuppert, Andreas and Bickenbach, Johannes", title="Future Medical Artificial Intelligence Application Requirements and Expectations of Physicians in German University Hospitals: Web-Based Survey", journal="J Med Internet Res", year="2021", month="Mar", day="5", volume="23", number="3", pages="e26646", keywords="artificial intelligence", keywords="AI", keywords="machine learning", keywords="algorithms", keywords="clinical decision support", keywords="physician", keywords="requirement", keywords="expectation", keywords="hospital care", abstract="Background: The increasing development of artificial intelligence (AI) systems in medicine driven by researchers and entrepreneurs goes along with enormous expectations for medical care advancement. AI might change the clinical practice of physicians from almost all medical disciplines and in most areas of health care. While expectations for AI in medicine are high, practical implementations of AI for clinical practice are still scarce in Germany. Moreover, physicians' requirements and expectations of AI in medicine and their opinion on the usage of anonymized patient data for clinical and biomedical research have not been investigated widely in German university hospitals. Objective: This study aimed to evaluate physicians' requirements and expectations of AI in medicine and their opinion on the secondary usage of patient data for (bio)medical research (eg, for the development of machine learning algorithms) in university hospitals in Germany. Methods: A web-based survey was conducted addressing physicians of all medical disciplines in 8 German university hospitals. Answers were given using Likert scales and general demographic responses. Physicians were asked to participate locally via email in the respective hospitals. Results: The online survey was completed by 303 physicians (female: 121/303, 39.9\%; male: 173/303, 57.1\%; no response: 9/303, 3.0\%) from a wide range of medical disciplines and work experience levels. Most respondents either had a positive (130/303, 42.9\%) or a very positive attitude (82/303, 27.1\%) towards AI in medicine. There was a significant association between the personal rating of AI in medicine and the self-reported technical affinity level (H4=48.3, P<.001). A vast majority of physicians expected the future of medicine to be a mix of human and artificial intelligence (273/303, 90.1\%) but also requested a scientific evaluation before the routine implementation of AI-based systems (276/303, 91.1\%). Physicians were most optimistic that AI applications would identify drug interactions (280/303, 92.4\%) to improve patient care substantially but were quite reserved regarding AI-supported diagnosis of psychiatric diseases (62/303, 20.5\%). Of the respondents, 82.5\% (250/303) agreed that there should be open access to anonymized patient databases for medical and biomedical research. Conclusions: Physicians in stationary patient care in German university hospitals show a generally positive attitude towards using most AI applications in medicine. Along with this optimism comes several expectations and hopes that AI will assist physicians in clinical decision making. 
Especially in fields of medicine where huge amounts of data are processed (eg, imaging procedures in radiology and pathology) or data are collected continuously (eg, cardiology and intensive care medicine), physicians' expectations of AI to substantially improve future patient care are high. In the study, the greatest potential was seen in the application of AI for the identification of drug interactions, assumedly due to the rising complexity of drug administration to polymorbid, polypharmacy patients. However, for the practical usage of AI in health care, regulatory and organizational challenges still have to be mastered. ", doi="10.2196/26646", url="https://www.jmir.org/2021/3/e26646", url="http://www.ncbi.nlm.nih.gov/pubmed/33666563" } @Article{info:doi/10.2196/24501, author="Lu{\vs}trek, Mitja and Bohanec, Marko and Cavero Barca, Carlos and Ciancarelli, Costanza Maria and Clays, Els and Dawodu, Adeyemo Amos and Derboven, Jan and De Smedt, Delphine and Dovgan, Erik and Lampe, Jure and Marino, Flavia and Mlakar, Miha and Pioggia, Giovanni and Puddu, Emilio Paolo and Rodr{\'i}guez, Mario Juan and Schiariti, Michele and Slapni{\v c}ar, Ga{\vs}per and Slegers, Karin and Tartarisco, Gennaro and Vali{\v c}, Jakob and Vodopija, Aljo{\vs}a", title="A Personal Health System for Self-Management of Congestive Heart Failure (HeartMan): Development, Technical Evaluation, and Proof-of-Concept Randomized Controlled Trial", journal="JMIR Med Inform", year="2021", month="Mar", day="5", volume="9", number="3", pages="e24501", keywords="congestive heart failure", keywords="personal health system", keywords="mobile application", keywords="mobile phone", keywords="wearable electronic devices", keywords="decision support techniques", keywords="psychological support", keywords="human centered design", abstract="Background: Congestive heart failure (CHF) is a disease that requires complex management involving multiple medications, exercise, and lifestyle changes. It mainly affects older patients with depression and anxiety, who commonly find management difficult. Existing mobile apps supporting the self-management of CHF have limited features and are inadequately validated. Objective: The HeartMan project aims to develop a personal health system that would comprehensively address CHF self-management by using sensing devices and artificial intelligence methods. This paper presents the design of the system and reports on the accuracy of its patient-monitoring methods, overall effectiveness, and patient perceptions. Methods: A mobile app was developed as the core of the HeartMan system, and the app was connected to a custom wristband and cloud services. The system features machine learning methods for patient monitoring: continuous blood pressure (BP) estimation, physical activity monitoring, and psychological profile recognition. These methods feed a decision support system that provides recommendations on physical health and psychological support. The system was designed using a human-centered methodology involving the patients throughout development. It was evaluated in a proof-of-concept trial with 56 patients. Results: Fairly high accuracy of the patient-monitoring methods was observed. The mean absolute error of BP estimation was 9.0 mm Hg for systolic BP and 7.0 mm Hg for diastolic BP. The accuracy of psychological profile detection was 88.6\%. The F-measure for physical activity recognition was 71\%. 
The proof-of-concept clinical trial in 56 patients showed that the HeartMan system significantly improved self-care behavior (P=.02), whereas depression and anxiety rates were significantly reduced (P<.001), as were perceived sexual problems (P=.01). According to the Unified Theory of Acceptance and Use of Technology questionnaire, a positive attitude toward HeartMan was seen among end users, resulting in increased awareness, self-monitoring, and empowerment. Conclusions: The HeartMan project combined a range of advanced technologies with human-centered design to develop a complex system that was shown to help patients with CHF. More psychological than physical benefits were observed. Trial Registration: ClinicalTrials.gov NCT03497871; https://clinicaltrials.gov/ct2/history/NCT03497871. International Registered Report Identifier (IRRID): RR2-10.1186/s12872-018-0921-2 ", doi="10.2196/24501", url="https://medinform.jmir.org/2021/3/e24501", url="http://www.ncbi.nlm.nih.gov/pubmed/33666562" } @Article{info:doi/10.2196/22923, author="Mogharbel, Asra and Dowding, Dawn and Ainsworth, John", title="Physicians' Use of the Computerized Physician Order Entry System for Medication Prescribing: Systematic Review", journal="JMIR Med Inform", year="2021", month="Mar", day="4", volume="9", number="3", pages="e22923", keywords="computerized physician order entry", keywords="CPOE", keywords="e-prescribing", keywords="system use", keywords="actual usage", keywords="systematic review", abstract="Background: Computerized physician order entry (CPOE) systems in health care settings have many benefits for prescribing medication, such as improved quality of patient care and patient safety. However, to achieve their full potential, the factors influencing the usage of CPOE systems by physicians must be identified and understood. Objective: The aim of this study is to identify the factors influencing the usage of CPOE systems by physicians for medication prescribing in their clinical practice. Methods: We conducted a systematic search of the literature on this topic using four databases: PubMed, CINAHL, Ovid MEDLINE, and Embase. Searches were performed from September 2019 to December 2019. The retrieved papers were screened by examining the titles and abstracts of relevant studies; two reviewers screened the full text of potentially relevant papers for inclusion in the review. Qualitative, quantitative, and mixed methods studies with the aim of conducting assessments or investigations of factors influencing the use of CPOE for medication prescribing among physicians were included. The identified factors were grouped based on constructs from two models: the unified theory of acceptance and use of technology model and the Delone and McLean Information System Success Model. We used the Mixed Method Appraisal Tool to assess the quality of the included studies and narrative synthesis to report the results. Results: A total of 11 articles were included in the review, and 37 factors related to the usage of CPOE systems were identified as the factors influencing how physicians used CPOE for medication prescribing. These factors represented three main themes: individual, technological, and organizational. Conclusions: This study identified the common factors that influenced the usage of CPOE systems by physicians for medication prescribing regardless of the type of setting or the duration of the use of a system by participants. 
Our findings can be used to inform implementation and support the usage of the CPOE system by physicians. ", doi="10.2196/22923", url="https://medinform.jmir.org/2021/3/e22923", url="http://www.ncbi.nlm.nih.gov/pubmed/33661126" } @Article{info:doi/10.2196/25505, author="Husain, Amna and Cohen, Eyal and Dubrowski, Raluca and Jamieson, Trevor and Kurahashi, Miyoshi Allison and Lokuge, Bhadra and Rapoport, Adam and Saunders, Stephanie and Stasiulis, Elaine and Stinson, Jennifer and Subramaniam, Saranjah and Wegier, Pete and Barwick, Melanie", title="A Clinical Communication Tool (Loop) for Team-Based Care in Pediatric and Adult Care Settings: Hybrid Mixed Methods Implementation Study", journal="J Med Internet Res", year="2021", month="Mar", day="3", volume="23", number="3", pages="e25505", keywords="coordination of care", keywords="complexity", keywords="internet communication technology", keywords="collaborative care", keywords="implementation science", keywords="theory of behavior", keywords="interprofessional team", keywords="patient engagement", keywords="social networking technology", keywords="user-centered design", keywords="Consolidated Framework for Implementation Research", keywords="Quality Improvement Framework", keywords="Implementation Outcome Taxonomy", abstract="Background: Communication within the circle of care is central to coordinated, safe, and effective care; yet patients, caregivers, and health care providers often experience poor communication and fragmented care. Through a sequential program of research, the Loop Research Collaborative developed a web-based, asynchronous clinical communication system for team-based care. Loop assembles the circle of care centered on a patient, in private networking spaces called Patient Loops. The patient, their caregiver, or both are part of the Patient Loop. The communication is threaded, it can be filtered and sorted in multiple ways, it is securely stored, and can be exported for upload to a medical record. Objective: The objective of this study was to implement and evaluate Loop. The study reporting adheres to the Standards for Reporting Implementation Research. Methods: The study was a hybrid type II mixed methods design to simultaneously evaluate Loop's clinical and implementation effectiveness, and implementation barriers and facilitators in 6 health care sites. Data included monthly user check-in interviews and bimonthly surveys to capture patient or caregiver experience of continuity of care, in-depth interviews to explore barriers and facilitators based on the Consolidated Framework for Implementation Research (CFIR), and Loop usage extracted directly from the Loop system. Results: We recruited 25 initiating health care providers across 6 sites who then identified patients or caregivers for recruitment. Of 147 patient or caregiver participants who were assessed and met screening criteria, 57 consented and 52 were enrolled on Loop, creating 52 Patient Loops. Across all Patient Loops, 96 additional health care providers consented to join the Loop teams. Loop usage was followed for up to 8 months. The median number of messages exchanged per team was 1 (range 0-28). The monthly check-in and CFIR interviews showed that although participants acknowledged that Loop could potentially fill a gap, existing modes of communication, workflows, incentives, and the lack of integration with the hospital electronic medical records and patient portals were barriers to its adoption. 
While participants acknowledged Loop's potential value for engaging the patient and caregiver, and for improving communication within the patient's circle of care, Loop's relative advantage was not realized during the study and there was insufficient tension for change. Missing data limited the analysis of continuity of care. Conclusions: Fundamental structural and implementation challenges persist toward realizing Loop's potential as a shared system of asynchronous communication. Barriers include health information system integration; system, organizational, and individual tension for change; and a fee structure for health care provider compensation for asynchronous communication. ", doi="10.2196/25505", url="https://www.jmir.org/2021/3/e25505", url="http://www.ncbi.nlm.nih.gov/pubmed/33656445" } @Article{info:doi/10.2196/26997, author="Liu, Taoran and Tsang, Winghei and Xie, Yifei and Tian, Kang and Huang, Fengqiu and Chen, Yanhui and Lau, Oiying and Feng, Guanrui and Du, Jianhao and Chu, Bojia and Shi, Tingyu and Zhao, Junjie and Cai, Yiming and Hu, Xueyan and Akinwunmi, Babatunde and Huang, Jian and Zhang, P. Casper J. and Ming, Wai-Kit", title="Preferences for Artificial Intelligence Clinicians Before and During the COVID-19 Pandemic: Discrete Choice Experiment and Propensity Score Matching Study", journal="J Med Internet Res", year="2021", month="Mar", day="2", volume="23", number="3", pages="e26997", keywords="propensity score matching", keywords="discrete latent traits", keywords="patients' preferences", keywords="artificial intelligence", keywords="COVID-19", keywords="preference", keywords="discrete choice", keywords="choice", keywords="traditional medicine", keywords="public health", keywords="resource", keywords="patient", keywords="diagnosis", keywords="accuracy", abstract="Background: Artificial intelligence (AI) methods can potentially be used to relieve the pressure that the COVID-19 pandemic has exerted on public health. In cases of medical resource shortages caused by the pandemic, changes in people's preferences for AI clinicians and traditional clinicians are worth exploring. Objective: We aimed to quantify and compare people's preferences for AI clinicians and traditional clinicians before and during the COVID-19 pandemic, and to assess whether people's preferences were affected by the pressure of pandemic. Methods: We used the propensity score matching method to match two different groups of respondents with similar demographic characteristics. Respondents were recruited in 2017 and 2020. A total of 2048 respondents (2017: n=1520; 2020: n=528) completed the questionnaire and were included in the analysis. Multinomial logit models and latent class models were used to assess people's preferences for different diagnosis methods. Results: In total, 84.7\% (1115/1317) of respondents in the 2017 group and 91.3\% (482/528) of respondents in the 2020 group were confident that AI diagnosis methods would outperform human clinician diagnosis methods in the future. Both groups of matched respondents believed that the most important attribute of diagnosis was accuracy, and they preferred to receive combined diagnoses from both AI and human clinicians (2017: odds ratio [OR] 1.645, 95\% CI 1.535-1.763; P<.001; 2020: OR 1.513, 95\% CI 1.413-1.621; P<.001; reference: clinician diagnoses). The latent class model identified three classes with different attribute priorities. 
In class 1, preferences for combined diagnoses and accuracy remained constant in 2017 and 2020, and high accuracy (eg, 100\% accuracy in 2017: OR 1.357, 95\% CI 1.164-1.581) was preferred. In class 2, the matched data from 2017 were similar to those from 2020; combined diagnoses from both AI and human clinicians (2017: OR 1.204, 95\% CI 1.039-1.394; P=.011; 2020: OR 2.009, 95\% CI 1.826-2.211; P<.001; reference: clinician diagnoses) and an outpatient waiting time of 20 minutes (2017: OR 1.349, 95\% CI 1.065-1.708; P<.001; 2020: OR 1.488, 95\% CI 1.287-1.721; P<.001; reference: 0 minutes) were consistently preferred. In class 3, the respondents in the 2017 and 2020 groups preferred different diagnosis methods; respondents in the 2017 group preferred clinician diagnoses, whereas respondents in the 2020 group preferred AI diagnoses. In the latent class, which was stratified according to sex, all male and female respondents in the 2017 and 2020 groups believed that accuracy was the most important attribute of diagnosis. Conclusions: Individuals' preferences for receiving clinical diagnoses from AI and human clinicians were generally unaffected by the pandemic. Respondents believed that accuracy and expense were the most important attributes of diagnosis. These findings can be used to guide policies that are relevant to the development of AI-based health care. ", doi="10.2196/26997", url="https://www.jmir.org/2021/3/e26997", url="http://www.ncbi.nlm.nih.gov/pubmed/33556034" } @Article{info:doi/10.2196/25235, author="Liverpool, Shaun and Edbrooke-Childs, Julian", title="Feasibility and Acceptability of a Digital Intervention to Support Shared Decision-making in Children's and Young People's Mental Health: Mixed Methods Pilot Randomized Controlled Trial", journal="JMIR Form Res", year="2021", month="Mar", day="2", volume="5", number="3", pages="e25235", keywords="mental health", keywords="pilot projects", keywords="child", keywords="adolescent", keywords="parents", keywords="shared decision making", abstract="Background: Interventions to involve parents in decisions regarding children's and young people's mental health are associated with positive outcomes. However, appropriately planning effectiveness studies is critical to ensure that meaningful evidence is collected. It is important to conduct pilot studies to evaluate the feasibility and acceptability of the intervention itself and the feasibility of the protocol to test effectiveness. Objective: This paper reports the findings from a feasibility and acceptability study of Power Up for Parents, an intervention to promote shared decision-making (SDM) and support parents and caregivers making decisions regarding children's and young people's mental health. Methods: A mixed method study design was adopted. In stage 1, health care professionals and parents provided feedback on acceptability, usefulness, and suggestions for further development. Stage 2 was a multicenter, 3-arm, individual, and cluster randomized controlled pilot feasibility trial with parents accessing services related to children's and young people's mental health. Outcome measures collected data on demographics, participation rates, SDM, satisfaction, and parents' anxiety. Qualitative data were analyzed using thematic analysis. Google Analytics estimates were used to report engagement with the prototype. Outcomes from both stages were tested against a published set of criteria for proceeding to a randomized controlled trial. 
Results: Despite evidence suggesting the acceptability of Power Up for Parents, the findings suggest that recruitment modifications are needed to enhance the feasibility of collecting follow-up data before scaling up to a fully powered randomized controlled trial. On the basis of the Go or No-Go criteria, only 50\% (6/12) of the sites successfully recruited participants, and only 38\% (16/42) of parents completed follow-up measures. Nonetheless, health care practitioners and parents generally accessed and used the intervention. Themes describing appearance and functionality, perceived need and general helpfulness, accessibility and appropriateness, and a wish list for improvement emerged, providing valuable information to inform future development and refinement of the intervention. Conclusions: Owing to the high attrition observed in the trial, proceeding directly to a full randomized controlled trial may not be feasible with this recruitment strategy. Nonetheless, with some minor adjustments and upgrades to the intervention, this pilot study provides a platform for future evaluations of Power Up for Parents. Trial Registration: International Standard Randomized Controlled Trial Number (ISRCTN) 39238984; http://www.isrctn.com/ISRCTN39238984. International Registered Report Identifier (IRRID): RR2-10.2196/14571 ", doi="10.2196/25235", url="https://formative.jmir.org/2021/3/e25235", url="http://www.ncbi.nlm.nih.gov/pubmed/33650973" } @Article{info:doi/10.2196/25635, author="Kim, HyungMin and Lee, Jung Sun and Park, Jin So and Choi, Young In and Hong, Sung-Hoo", title="Machine Learning Approach to Predict the Probability of Recurrence of Renal Cell Carcinoma After Surgery: Prediction Model Development Study", journal="JMIR Med Inform", year="2021", month="Mar", day="1", volume="9", number="3", pages="e25635", keywords="renal cell carcinoma", keywords="recurrence", keywords="machine learning", keywords="na{\"i}ve Bayes", keywords="algorithm", keywords="cancer", keywords="surgery", keywords="web-based", keywords="database", keywords="prediction", keywords="probability", keywords="carcinoma", keywords="kidney", keywords="model", keywords="development", abstract="Background: Renal cell carcinoma (RCC) has a high recurrence rate of 20\% to 30\% after nephrectomy for clinically localized disease, and more than 40\% of patients eventually die of the disease, making regular monitoring and constant management of utmost importance. Objective: The objective of this study was to develop an algorithm that predicts the probability of recurrence of RCC within 5 and 10 years of surgery. Methods: Data from 6849 Korean patients with RCC were collected from eight tertiary care hospitals listed in the KOrean Renal Cell Carcinoma (KORCC) web-based database. To predict RCC recurrence, analytical data from 2814 patients were extracted from the database. Eight machine learning algorithms were used to predict the probability of RCC recurrence, and the results were compared. Results: Within 5 years of surgery, the highest area under the receiver operating characteristic curve (AUROC) was obtained from the na{\"i}ve Bayes (NB) model, with a value of 0.836. Within 10 years of surgery, the highest AUROC was obtained from the NB model, with a value of 0.784. Conclusions: An algorithm was developed that predicts the probability of RCC recurrence within 5 and 10 years using the KORCC database, a large-scale RCC cohort in Korea. 
It is expected that the developed algorithm will help clinicians manage prognosis and establish customized treatment strategies for patients with RCC after surgery. ", doi="10.2196/25635", url="https://medinform.jmir.org/2021/3/e25635", url="http://www.ncbi.nlm.nih.gov/pubmed/33646127" } @Article{info:doi/10.2196/23458, author="Ikemura, Kenji and Bellin, Eran and Yagi, Yukako and Billett, Henny and Saada, Mahmoud and Simone, Katelyn and Stahl, Lindsay and Szymanski, James and Goldstein, Y. D. and Reyes Gil, Morayma", title="Using Automated Machine Learning to Predict the Mortality of Patients With COVID-19: Prediction Model Development Study", journal="J Med Internet Res", year="2021", month="Feb", day="26", volume="23", number="2", pages="e23458", keywords="automated machine learning", keywords="COVID-19", keywords="biomarker", keywords="ranking", keywords="decision support tool", keywords="machine learning", keywords="decision support", keywords="Shapley additive explanation", keywords="partial dependence plot", keywords="dimensionality reduction", abstract="Background: During a pandemic, it is important for clinicians to stratify patients and decide who receives limited medical resources. Machine learning models have been proposed to accurately predict COVID-19 disease severity. Previous studies have typically tested only one machine learning algorithm and limited performance evaluation to area under the curve analysis. To obtain the best results possible, it may be important to test different machine learning algorithms to find the best prediction model. Objective: In this study, we aimed to use automated machine learning (autoML) to train various machine learning algorithms. We selected the model that best predicted patients' chances of surviving a SARS-CoV-2 infection. In addition, we identified which variables (ie, vital signs, biomarkers, comorbidities, etc) were the most influential in generating an accurate model. Methods: Data were retrospectively collected from all patients who tested positive for COVID-19 at our institution between March 1 and July 3, 2020. We collected 48 variables from each patient within 36 hours before or after the index time (ie, real-time polymerase chain reaction positivity). Patients were followed for 30 days or until death. Patients' data were used to build 20 machine learning models with various algorithms via autoML. The performance of machine learning models was measured by analyzing the area under the precision-recall curve (AUPRC). Subsequently, we established model interpretability via Shapley additive explanation and partial dependence plots to identify and rank variables that drove model predictions. Afterward, we conducted dimensionality reduction to extract the 10 most influential variables. AutoML models were retrained by only using these 10 variables, and the output models were evaluated against the model that used 48 variables. Results: Data from 4313 patients were used to develop the models. The best model that was generated by using autoML and 48 variables was the stacked ensemble model (AUPRC=0.807). The two best independent models were the gradient boost machine and extreme gradient boost models, which had an AUPRC of 0.803 and 0.793, respectively. The deep learning model (AUPRC=0.73) was substantially inferior to the other models. 
The 10 most influential variables for generating high-performing models were systolic and diastolic blood pressure, age, pulse oximetry level, blood urea nitrogen level, lactate dehydrogenase level, D-dimer level, troponin level, respiratory rate, and Charlson comorbidity score. After the autoML models were retrained with these 10 variables, the stacked ensemble model still had the best performance (AUPRC=0.791). Conclusions: We used autoML to develop high-performing models that predicted the survival of patients with COVID-19. In addition, we identified important variables that correlated with mortality. This is proof of concept that autoML is an efficient, effective, and informative method for generating machine learning--based clinical decision support tools. ", doi="10.2196/23458", url="https://www.jmir.org/2021/2/e23458", url="http://www.ncbi.nlm.nih.gov/pubmed/33539308" } @Article{info:doi/10.2196/20298, author="Hu, Mingyue and Shu, Xinhui and Yu, Gang and Wu, Xinyin and V{\"a}lim{\"a}ki, Maritta and Feng, Hui", title="A Risk Prediction Model Based on Machine Learning for Cognitive Impairment Among Chinese Community-Dwelling Elderly People With Normal Cognition: Development and Validation Study", journal="J Med Internet Res", year="2021", month="Feb", day="24", volume="23", number="2", pages="e20298", keywords="prediction model", keywords="cognitive impairment", keywords="machine learning", keywords="nomogram", abstract="Background: Identifying cognitive impairment early enough could support timely intervention that may hinder or delay the trajectory of cognitive impairment, thus increasing the chances for successful cognitive aging. Objective: We aimed to build a prediction model based on machine learning for cognitive impairment among Chinese community-dwelling elderly people with normal cognition. Methods: A prospective cohort of 6718 older people from the Chinese Longitudinal Healthy Longevity Survey (CLHLS) register, followed between 2008 and 2011, was used to develop and validate the prediction model. Participants were included if they were aged 60 years or above, were community-dwelling elderly people, and had a cognitive Mini-Mental State Examination (MMSE) score $\geq$18. They were excluded if they were diagnosed with a severe disease (eg, cancer and dementia) or were living in institutions. Cognitive impairment was identified using the Chinese version of the MMSE. Several machine learning algorithms (random forest, XGBoost, na{\"i}ve Bayes, and logistic regression) were used to assess the 3-year risk of developing cognitive impairment. Optimal cutoffs and adjusted parameters were explored in validation data, and the model was further evaluated in test data. A nomogram was established to vividly present the prediction model. Results: The mean age of the participants was 80.4 years (SD 10.3 years), and 50.85\% (3416/6718) were female. During a 3-year follow-up, 991 (14.8\%) participants were identified with cognitive impairment. Among 45 features, the following four features were finally selected to develop the model: age, instrumental activities of daily living, marital status, and baseline cognitive function. The concordance index of the model constructed by logistic regression was 0.814 (95\% CI 0.781-0.846). 
Older people with normal cognitive functioning having a nomogram score of less than 170 were considered to have a low 3-year risk of cognitive impairment, and those with a score of 170 or greater were considered to have a high 3-year risk of cognitive impairment. Conclusions: This simple and feasible cognitive impairment prediction model could identify community-dwelling elderly people at the greatest 3-year risk for cognitive impairment, which could help community nurses in the early identification of dementia. ", doi="10.2196/20298", url="https://www.jmir.org/2021/2/e20298", url="http://www.ncbi.nlm.nih.gov/pubmed/33625369" } @Article{info:doi/10.2196/19306, author="Spinazze, Pier and Aardoom, Jiska and Chavannes, Niels and Kasteleyn, Marise", title="The Computer Will See You Now: Overcoming Barriers to Adoption of Computer-Assisted History Taking (CAHT) in Primary Care", journal="J Med Internet Res", year="2021", month="Feb", day="24", volume="23", number="2", pages="e19306", keywords="computer-assisted history taking", keywords="history taking", keywords="clinical consultation", keywords="digital health", keywords="electronic health record", keywords="patient-provided health information", doi="10.2196/19306", url="https://www.jmir.org/2021/2/e19306", url="http://www.ncbi.nlm.nih.gov/pubmed/33625360" } @Article{info:doi/10.2196/18766, author="Frias, Mario and Moyano, M. Jose and Rivero-Juarez, Antonio and Luna, M. Jose and Camacho, {\'A}ngela and Fardoun, M. Habib and Machuca, Isabel and Al-Twijri, Mohamed and Rivero, Antonio and Ventura, Sebastian", title="Classification Accuracy of Hepatitis C Virus Infection Outcome: Data Mining Approach", journal="J Med Internet Res", year="2021", month="Feb", day="24", volume="23", number="2", pages="e18766", keywords="HIV/HCV", keywords="data mining", keywords="PART", keywords="ensemble", keywords="classification accuracy", abstract="Background: The dataset from genes used to predict hepatitis C virus outcome was evaluated in a previous study using a conventional statistical methodology. Objective: The aim of this study was to reanalyze this same dataset using the data mining approach in order to find models that improve the classification accuracy of the genes studied. Methods: We built predictive models using different subsets of factors, selected according to their importance in predicting patient classification. We then evaluated each independent model and also a combination of them, leading to a better predictive model. Results: Our data mining approach identified genetic patterns that escaped detection using conventional statistics. More specifically, the partial decision trees and ensemble models increased the classification accuracy of hepatitis C virus outcome compared with conventional methods. Conclusions: Data mining can be used more extensively in biomedicine, facilitating knowledge building and management of human diseases. 
", doi="10.2196/18766", url="https://www.jmir.org/2021/2/e18766", url="http://www.ncbi.nlm.nih.gov/pubmed/33624609" } @Article{info:doi/10.2196/22841, author="Liu, Taoran and Tsang, Winghei and Huang, Fengqiu and Lau, Ying Oi and Chen, Yanhui and Sheng, Jie and Guo, Yiwei and Akinwunmi, Babatunde and Zhang, JP Casper and Ming, Wai-Kit", title="Patients' Preferences for Artificial Intelligence Applications Versus Clinicians in Disease Diagnosis During the SARS-CoV-2 Pandemic in China: Discrete Choice Experiment", journal="J Med Internet Res", year="2021", month="Feb", day="23", volume="23", number="2", pages="e22841", keywords="discrete choice experiment", keywords="artificial intelligence", keywords="patient preference", keywords="multinomial logit analysis", keywords="questionnaire", keywords="latent-class conditional logit", keywords="app", keywords="human clinicians", keywords="diagnosis", keywords="COVID-19", keywords="China", abstract="Background: Misdiagnosis, arbitrary charges, annoying queues, and clinic waiting times, among others, are long-standing phenomena in the medical industry across the world. These factors can contribute to patient anxiety about misdiagnosis by clinicians. However, with the increasing growth in use of big data in biomedical and health care communities, the performance of artificial intelligence (AI) techniques of diagnosis is improving and can help avoid medical practice errors, including under the current circumstance of COVID-19. Objective: This study aims to visualize and measure patients' heterogeneous preferences from various angles of AI diagnosis versus clinicians in the context of the COVID-19 epidemic in China. We also aim to illustrate the different decision-making factors of the latent class of a discrete choice experiment (DCE) and prospects for the application of AI techniques in judgment and management during the pandemic of SARS-CoV-2 and in the future. Methods: A DCE approach was the main analysis method applied in this paper. Attributes from different dimensions were hypothesized: diagnostic method, outpatient waiting time, diagnosis time, accuracy, follow-up after diagnosis, and diagnostic expense. A questionnaire was then constructed. With the data collected from the DCE questionnaire, we applied Sawtooth software to construct a generalized multinomial logit (GMNL) model, a mixed logit model, and a latent class model. Moreover, we calculated the variables' coefficients, standard error, P value, and odds ratio (OR) and formed a utility report to present the importance and weighted percentage of attributes. Results: A total of 55.8\% of the respondents (428 out of 767) opted for AI diagnosis regardless of the description of the clinicians. In the GMNL model, we found that people prefer the 100\% accuracy level the most (OR 4.548, 95\% CI 4.048-5.110, P<.001). For the latent class model, the most acceptable model consists of 3 latent classes of respondents. The attributes with the most substantial effects and highest percentage weights are the accuracy (39.29\% in general) and expense of diagnosis (21.69\% in general), especially the preferences for the diagnosis ``accuracy'' attribute, which is constant across classes. For class 1 and class 3, people prefer the AI + clinicians method (class 1: OR 1.247, 95\% CI 1.036-1.463, P<.001; class 3: OR 1.958, 95\% CI 1.769-2.167, P<.001). For class 2, people prefer the AI method (OR 1.546, 95\% CI 0.883-2.707, P=.37). 
The OR of levels of attributes increases with the increase of accuracy across all classes. Conclusions: Latent class analysis was prominent and useful in quantifying preferences for attributes of diagnosis choice. People's preferences for the ``accuracy'' and ``diagnostic expenses'' attributes are palpable. AI will have a potential market. However, accuracy and diagnosis expenses need to be taken into consideration. ", doi="10.2196/22841", url="https://www.jmir.org/2021/2/e22841", url="http://www.ncbi.nlm.nih.gov/pubmed/33493130" } @Article{info:doi/10.2196/23026, author="Sang, Shengtian and Sun, Ran and Coquet, Jean and Carmichael, Harris and Seto, Tina and Hernandez-Boussard, Tina", title="Learning From Past Respiratory Infections to Predict COVID-19 Outcomes: Retrospective Study", journal="J Med Internet Res", year="2021", month="Feb", day="22", volume="23", number="2", pages="e23026", keywords="COVID-19", keywords="invasive mechanical ventilation", keywords="all-cause mortality", keywords="machine learning", keywords="artificial intelligence", keywords="respiratory", keywords="infection", keywords="outcome", keywords="data", keywords="feasibility", keywords="framework", abstract="Background: For the clinical care of patients with well-established diseases, randomized trials, literature, and research are supplemented with clinical judgment to understand disease prognosis and inform treatment choices. In the void created by a lack of clinical experience with COVID-19, artificial intelligence (AI) may be an important tool to bolster clinical judgment and decision making. However, a lack of clinical data restricts the design and development of such AI tools, particularly in preparation for an impending crisis or pandemic. Objective: This study aimed to develop and test the feasibility of a ``patients-like-me'' framework to predict the deterioration of patients with COVID-19 using a retrospective cohort of patients with similar respiratory diseases. Methods: Our framework used COVID-19--like cohorts to design and train AI models that were then validated on the COVID-19 population. The COVID-19--like cohorts included patients diagnosed with bacterial pneumonia, viral pneumonia, unspecified pneumonia, influenza, and acute respiratory distress syndrome (ARDS) at an academic medical center from 2008 to 2019. In total, 15 training cohorts were created using different combinations of the COVID-19--like cohorts with the ARDS cohort for exploratory purposes. In this study, two machine learning models were developed: one to predict invasive mechanical ventilation (IMV) within 48 hours for each hospitalized day, and one to predict all-cause mortality at the time of admission. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value, and negative predictive value. We established model interpretability by calculating SHapley Additive exPlanations (SHAP) scores to identify important features. Results: Compared to the COVID-19--like cohorts (n=16,509), the patients hospitalized with COVID-19 (n=159) were significantly younger, with a higher proportion of patients of Hispanic ethnicity, a lower proportion of patients with smoking history, and fewer patients with comorbidities (P<.001). Patients with COVID-19 had a lower IMV rate (15.1 versus 23.2, P=.02) and shorter time to IMV (2.9 versus 4.1 days, P<.001) compared to the COVID-19--like patients. 
In the COVID-19--like training data, the top models achieved excellent performance (AUROC>0.90). Validating in the COVID-19 cohort, the top-performing model for predicting IMV was the XGBoost model (AUROC=0.826) trained on the viral pneumonia cohort. Similarly, the XGBoost model trained on all 4 COVID-19--like cohorts without ARDS achieved the best performance (AUROC=0.928) in predicting mortality. Important predictors included demographic information (age), vital signs (oxygen saturation), and laboratory values (white blood cell count, cardiac troponin, albumin, etc). Our models had class imbalance, which resulted in high negative predictive values and low positive predictive values. Conclusions: We provided a feasible framework for modeling patient deterioration using existing data and AI technology to address data limitations during the onset of a novel, rapidly changing pandemic. ", doi="10.2196/23026", url="https://www.jmir.org/2021/2/e23026", url="http://www.ncbi.nlm.nih.gov/pubmed/33534724" } @Article{info:doi/10.2196/23147, author="Jo, Yong-Yeon and Han, JaiHong and Park, Woo Hyun and Jung, Hyojung and Lee, Dong Jae and Jung, Jipmin and Cha, Soung Hyo and Sohn, Kyung Dae and Hwangbo, Yul", title="Prediction of Prolonged Length of Hospital Stay After Cancer Surgery Using Machine Learning on Electronic Health Records: Retrospective Cross-sectional Study", journal="JMIR Med Inform", year="2021", month="Feb", day="22", volume="9", number="2", pages="e23147", keywords="postoperative length of stay", keywords="cancer surgery", keywords="machine learning", keywords="electronic health records", abstract="Background: Postoperative length of stay is a key indicator in the management of medical resources and an indirect predictor of the incidence of surgical complications and the degree of recovery of the patient after cancer surgery. Recently, machine learning has been used to predict complex medical outcomes, such as prolonged length of hospital stay, using extensive medical information. Objective: The objective of this study was to develop a prediction model for prolonged length of stay after cancer surgery using a machine learning approach. Methods: In our retrospective study, electronic health records (EHRs) from 42,751 patients who underwent primary surgery for 17 types of cancer between January 1, 2000, and December 31, 2017, were sourced from a single cancer center. The EHRs included numerous variables such as surgical factors, cancer factors, underlying diseases, functional laboratory assessments, general assessments, medications, and social factors. To predict prolonged length of stay after cancer surgery, we employed extreme gradient boosting classifier, multilayer perceptron, and logistic regression models. Prolonged postoperative length of stay for cancer was defined as bed-days of the group of patients who accounted for the top 50\% of the distribution of bed-days by cancer type. Results: In the prediction of prolonged length of stay after cancer surgery, extreme gradient boosting classifier models demonstrated excellent performance for kidney and bladder cancer surgeries (area under the receiver operating characteristic curve [AUC] >0.85). A moderate performance (AUC 0.70-0.85) was observed for stomach, breast, colon, thyroid, prostate, cervix uteri, corpus uteri, and oral cancers. 
For stomach, breast, colon, thyroid, and lung cancers, with more than 4000 cases each, the extreme gradient boosting classifier model showed slightly better performance than the logistic regression model, although the logistic regression model also performed adequately. We identified risk variables for the prediction of prolonged postoperative length of stay for each type of cancer, and the importance of the variables differed depending on the cancer type. After we added operative time to the models trained on preoperative factors, the models generally outperformed the corresponding models using only preoperative variables. Conclusions: A machine learning approach using EHRs may improve the prediction of prolonged length of hospital stay after primary cancer surgery. This algorithm may help to provide a more effective allocation of medical resources in cancer surgery. ", doi="10.2196/23147", url="https://medinform.jmir.org/2021/2/e23147", url="http://www.ncbi.nlm.nih.gov/pubmed/33616544" } @Article{info:doi/10.2196/26257, author="Cho, Sung-Yeon and Park, Sung-Soo and Song, Min-Kyu and Bae, Yi Young and Lee, Dong-Gun and Kim, Dong-Wook", title="Prognosis Score System to Predict Survival for COVID-19 Cases: a Korean Nationwide Cohort Study", journal="J Med Internet Res", year="2021", month="Feb", day="22", volume="23", number="2", pages="e26257", keywords="COVID-19", keywords="length of stay", keywords="mortality", keywords="prognosis", keywords="triage", keywords="digital health", keywords="prediction", keywords="cohort", keywords="risk", keywords="allocation", keywords="disease management", keywords="intensive care", keywords="decision making", abstract="Background: As the COVID-19 pandemic continues, an initial risk-adapted allocation is crucial for managing medical resources and providing intensive care. Objective: In this study, we aimed to identify factors that predict the overall survival rate for COVID-19 cases and develop a COVID-19 prognosis score (COPS) system based on these factors. In addition, disease severity and the length of hospital stay for patients with COVID-19 were analyzed. Methods: We retrospectively analyzed a nationwide cohort of laboratory-confirmed COVID-19 cases between January and April 2020 in Korea. The cohort was split randomly into a development cohort and a validation cohort with a 2:1 ratio. In the development cohort (n=3729), we tried to identify factors associated with overall survival and develop a scoring system to predict the overall survival rate by using parameters identified by the Cox proportional hazard regression model with bootstrapping methods. In the validation cohort (n=1865), we evaluated the prediction accuracy using the area under the receiver operating characteristic curve. The score of each variable in the COPS system was rounded off following the log-scaled conversion of the adjusted hazard ratio. Results: Among the 5594 patients included in this analysis, 234 (4.2\%) died after receiving a COVID-19 diagnosis. In the development cohort, six parameters were significantly related to poor overall survival: older age, dementia, chronic renal failure, dyspnea, mental disturbance, and absolute lymphocyte count <1000/mm3. The following risk groups were formed: low-risk (score 0-2), intermediate-risk (score 3), high-risk (score 4), and very high-risk (score 5-7) groups. 
The COPS system yielded an area under the curve value of 0.918 for predicting the 14-day survival rate and 0.896 for predicting the 28-day survival rate in the validation cohort. Using the COPS system, 28-day survival rates were discriminatively estimated at 99.8\%, 95.4\%, 82.3\%, and 55.1\% in the low-risk, intermediate-risk, high-risk, and very high-risk groups, respectively, of the total cohort (P<.001). The length of hospital stay and disease severity were directly associated with overall survival (P<.001), and the hospital stay duration was significantly longer among survivors (mean 26.1, SD 10.7 days) than among nonsurvivors (mean 15.6, SD 13.3 days). Conclusions: The newly developed predictive COPS system may assist in making risk-adapted decisions for the allocation of medical resources, including intensive care, during the COVID-19 pandemic. ", doi="10.2196/26257", url="https://www.jmir.org/2021/2/e26257", url="http://www.ncbi.nlm.nih.gov/pubmed/33539312" } @Article{info:doi/10.2196/20545, author="Barr, J. Paul and Ryan, James and Jacobson, C. Nicholas", title="Precision Assessment of COVID-19 Phenotypes Using Large-Scale Clinic Visit Audio Recordings: Harnessing the Power of Patient Voice", journal="J Med Internet Res", year="2021", month="Feb", day="19", volume="23", number="2", pages="e20545", keywords="communication", keywords="coronavirus", keywords="COVID-19", keywords="Machine Learning", keywords="natural language processing", keywords="patient-physician communication", keywords="patient records", keywords="recording", doi="10.2196/20545", url="http://www.jmir.org/2021/2/e20545/", url="http://www.ncbi.nlm.nih.gov/pubmed/33556031" } @Article{info:doi/10.2196/23606, author="Zhang, Yaqi and Han, Yongxia and Gao, Peng and Mo, Yifu and Hao, Shiying and Huang, Jia and Ye, Fangfan and Li, Zhen and Zheng, Le and Yao, Xiaoming and Li, Xiaodong and Wang, Xiaofang and Huang, Chao-Jung and Jin, Bo and Zhang, Yani and Yang, Gabriel and Alfreds, T. Shaun and Kanov, Laura and Sylvester, G. Karl and Widen, Eric and Li, Licheng and Ling, Xuefeng", title="Electronic Health Record--Based Prediction of 1-Year Risk of Incident Cardiac Dysrhythmia: Prospective Case-Finding Algorithm Development and Validation Study", journal="JMIR Med Inform", year="2021", month="Feb", day="17", volume="9", number="2", pages="e23606", keywords="cardiac dysrhythmia", keywords="prospective case finding", keywords="risk stratification", keywords="electronic health records", abstract="Background: Cardiac dysrhythmia is currently an extremely common disease. Severe arrhythmias often cause a series of complications, including congestive heart failure, fainting or syncope, stroke, and sudden death. Objective: The aim of this study was to predict incident arrhythmia prospectively within a 1-year period to provide early warning of impending arrhythmia. Methods: Retrospective (1,033,856 individuals enrolled between October 1, 2016, and October 1, 2017) and prospective (1,040,767 individuals enrolled between October 1, 2017, and October 1, 2018) cohorts were constructed from integrated electronic health records in Maine, United States. An ensemble learning workflow was built through multiple machine learning algorithms. Differentiating features, including acute and chronic diseases, procedures, health status, laboratory tests, prescriptions, clinical utilization indicators, and socioeconomic determinants, were compiled for incident arrhythmia assessment. 
The predictive model was retrospectively trained and calibrated using an isotonic regression method and was prospectively validated. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Results: The cardiac dysrhythmia case-finding algorithm (retrospective: AUROC 0.854; prospective: AUROC 0.827) stratified the population into 5 risk groups: 53.35\% (555,233/1,040,767), 44.83\% (466,594/1,040,767), 1.76\% (18,290/1,040,767), 0.06\% (623/1,040,767), and 0.003\% (27/1,040,767) were in the very low-risk, low-risk, medium-risk, high-risk, and very high-risk groups, respectively; 51.85\% (14/27) patients in the very high-risk subgroup were confirmed to have incident cardiac dysrhythmia within the subsequent 1 year. Conclusions: Our case-finding algorithm is promising for prospectively predicting 1-year incident cardiac dysrhythmias in a general population, and we believe that our case-finding algorithm can serve as an early warning system to allow statewide population-level screening and surveillance to improve cardiac dysrhythmia care. ", doi="10.2196/23606", url="http://medinform.jmir.org/2021/2/e23606/", url="http://www.ncbi.nlm.nih.gov/pubmed/33595452" } @Article{info:doi/10.2196/21615, author="Desai, Varma Anjali and Michael, L. Chelsea and Kuperman, J. Gilad and Jordan, Gregory and Mittelstaedt, Haley and Epstein, S. Andrew and Connor, MaryAnn and B Villar, Paula Rika and Bernal, Camila and Kramer, Dana and Davis, Elizabeth Mary and Chen, Yuxiao and Malisse, Catherine and Markose, Gigi and Nelson, E. Judith", title="A Novel Patient Values Tab for the Electronic Health Record: A User-Centered Design Approach", journal="J Med Internet Res", year="2021", month="Feb", day="17", volume="23", number="2", pages="e21615", keywords="electronic health record", keywords="health informatics", keywords="supportive care", keywords="palliative care", keywords="oncology", abstract="Background: The COVID-19 pandemic has shined a harsh light on a critical deficiency in our health care system: our inability to access important information about patients' values, goals, and preferences in the electronic health record (EHR). At Memorial Sloan Kettering Cancer Center (MSK), we have integrated and systematized health-related values discussions led by oncology nurses for newly diagnosed cancer patients as part of routine comprehensive cancer care. Such conversations include not only the patient's wishes for care at the end of life but also more holistic personal values, including sources of strength, concerns, hopes, and their definition of an acceptable quality of life. In addition, health care providers use a structured template to document their discussions of patient goals of care. Objective: To provide ready access to key information about the patient as a person with individual values, goals, and preferences, we undertook the creation of the Patient Values Tab in our center's EHR to display this information in a single, central location. Here, we describe the interprofessional, interdisciplinary, iterative process and user-centered design methodology that we applied to build this novel functionality as well as our initial implementation experience and plans for evaluation. Methods: We first convened a working group of experts from multiple departments, including medical oncology, health informatics, information systems, nursing informatics, nursing education, and supportive care, and a user experience designer. 
We conducted in-depth, semistructured, audiorecorded interviews of over 100 key stakeholders. The working group sought consensus on the tab's main content, homing in on high-priority areas identified by the stakeholders. The core content was mapped to various EHR data sources. We established a set of high-level design principles to guide our process. Our user experience designer then created wireframes of the tab design. The designer conducted usability testing with physicians, nurses, and other health professionals. Data validation testing was conducted. Results: We have already deployed the Patient Values Tab to a pilot sample of users in the MSK Gastrointestinal Medical Oncology Service, including physicians, advanced practice providers, nurses, and administrative staff. We have early evidence of the positive impact of this EHR innovation. Audit logs show increasing use. Many of the initial user comments have been enthusiastically positive, while others have provided constructive suggestions for additional tab refinements with respect to format and content. Conclusions: It is our challenge and obligation to enrich the EHR with information about the patient as a person. Realization of this capability is a pressing public health need requiring the collaboration of technological experts with a broad range of clinical leaders, users, patients, and families to achieve solutions that are both principled and practical. Our new Patient Values Tab represents a step forward in this important direction. ", doi="10.2196/21615", url="http://www.jmir.org/2021/2/e21615/", url="http://www.ncbi.nlm.nih.gov/pubmed/33595448" } @Article{info:doi/10.2196/18840, author="Walkey, J. Allan and Bashar, K. Syed and Hossain, Billal Md and Ding, Eric and Albuquerque, Daniella and Winter, Michael and Chon, H. Ki and McManus, D. David", title="Development and Validation of an Automated Algorithm to Detect Atrial Fibrillation Within Stored Intensive Care Unit Continuous Electrocardiographic Data: Observational Study", journal="JMIR Cardio", year="2021", month="Feb", day="15", volume="5", number="1", pages="e18840", keywords="atrial fibrillation", keywords="sepsis", keywords="intensive care unit", keywords="big data", keywords="data science", abstract="Background: Atrial fibrillation (AF) is the most common arrhythmia during critical illness, representing a sepsis-defining cardiac dysfunction associated with adverse outcomes. Large burdens of premature beats and noisy signal during sepsis may pose unique challenges to automated AF detection. Objective: The objective of this study is to develop and validate an automated algorithm to accurately identify AF within electronic health care data among critically ill patients with sepsis. Methods: This is a retrospective cohort study of patients hospitalized with sepsis identified from Medical Information Mart for Intensive Care (MIMIC III) electronic health data with linked electrocardiographic (ECG) telemetry waveforms. Within 3 separate cohorts of 50 patients, we iteratively developed and validated an automated algorithm that identifies ECG signals, removes noise, and identifies irregular rhythm and premature beats in order to identify AF. We compared the automated algorithm to current methods of AF identification in large databases, including ICD-9 (International Classification of Diseases, 9th edition) codes and hourly nurse annotation of heart rhythm. Methods of AF identification were tested against gold-standard manual ECG review. 
Results: AF detection algorithms that did not differentiate AF from premature atrial and ventricular beats performed modestly, with 76\% (95\% CI 61\%-87\%) accuracy. Performance improved (P=.02) with the addition of premature beat detection (validation set accuracy: 94\% [95\% CI 83\%-99\%]). Median time between automated and manual detection of AF onset was 30 minutes (25th-75th percentile 0-208 minutes). The accuracy of ICD-9 codes (68\%; P=.002 vs automated algorithm) and nurse charting (80\%; P=.02 vs algorithm) was lower than that of the automated algorithm. Conclusions: An automated algorithm using telemetry ECG data can feasibly and accurately detect AF among critically ill patients with sepsis, and represents an improvement in AF detection within large databases. ", doi="10.2196/18840", url="http://cardio.jmir.org/2021/1/e18840/", url="http://www.ncbi.nlm.nih.gov/pubmed/33587041" } @Article{info:doi/10.2196/21401, author="Yang, Hsuan-Chia and Islam, Mohaimenul Md and Nguyen, Alex Phung Anh and Wang, Ching-Huan and Poly, Nasrin Tahmina and Huang, Chih-Wei and Li, Jack Yu-Chuan", title="Development of a Web-Based System for Exploring Cancer Risk With Long-term Use of Drugs: Logistic Regression Approach", journal="JMIR Public Health Surveill", year="2021", month="Feb", day="15", volume="7", number="2", pages="e21401", keywords="cancer", keywords="risk", keywords="prevention", keywords="chemoprevention", keywords="long-term--use drugs", keywords="drug", keywords="epidemiology", keywords="temporal model", keywords="modeling", keywords="web-based system", abstract="Background: Existing epidemiological evidence regarding the association between the long-term use of drugs and cancer risk remains controversial. Objective: We aimed to provide a comprehensive view of the cancer risk associated with the long-term use of drugs. Methods: A nationwide population-based, nested, case-control study was conducted within the National Health Insurance Research Database sample cohort of 1999 to 2013 in Taiwan. We identified cases in adults aged 20 years and older who were receiving treatment for at least two months before the index date. We randomly selected control patients from the patients without a cancer diagnosis during the 15 years (1999-2013) of the study period. Case and control patients were matched 1:4 based on age, sex, and visit date. Conditional logistic regression was used to estimate the association between drug exposure and cancer risk by adjusting for potential confounders such as drugs and comorbidities. Results: There were 79,245 cancer cases and 316,980 matched controls included in this study. Of the 45,368 associations, there were 2419, 1302, 662, and 366 associations found to be statistically significant at a level of P<.05, P<.01, P<.001, and P<.0001, respectively. Benzodiazepine derivatives were associated with an increased risk of brain cancer (adjusted odds ratio [AOR] 1.379, 95\% CI 1.138-1.670; P=.001). Statins were associated with a reduced risk of liver cancer (AOR 0.470, 95\% CI 0.426-0.517; P<.0001) and gastric cancer (AOR 0.781, 95\% CI 0.678-0.900; P<.001). Our web-based system, which collected comprehensive data of associations, contained 2 domains: (1) the drug and cancer association page and (2) the overview page. Conclusions: Our web-based system provides an overview of comprehensive quantified data of drug-cancer associations. 
With all the quantified data visualized, the system is expected to facilitate further research on cancer risk and prevention, potentially serving as a stepping-stone to consulting and exploring associations between the long-term use of drugs and cancer risk. ", doi="10.2196/21401", url="http://publichealth.jmir.org/2021/2/e21401/", url="http://www.ncbi.nlm.nih.gov/pubmed/33587043" } @Article{info:doi/10.2196/18372, author="Sheng, Qiuhua Jessica and Hu, Jen-Hwa Paul and Liu, Xiao and Huang, Ting-Shuo and Chen, Hsien Yu", title="Predictive Analytics for Care and Management of Patients With Acute Diseases: Deep Learning--Based Method to Predict Crucial Complication Phenotypes", journal="J Med Internet Res", year="2021", month="Feb", day="12", volume="23", number="2", pages="e18372", keywords="data analytics", keywords="neural networks", keywords="phenotype", keywords="deep learning", keywords="electronic health records", abstract="Background: Acute diseases present severe complications that develop rapidly, exhibit distinct phenotypes, and have profound effects on patient outcomes. Predictive analytics can enhance physicians' care and management of patients with acute diseases by predicting crucial complication phenotypes for a timely diagnosis and treatment. However, effective phenotype predictions require several challenges to be overcome. First, patient data collected in the early stages of an acute disease (eg, clinical data and laboratory results) are less informative for predicting phenotypic outcomes. Second, patient data are temporal and heterogeneous; for example, patients receive laboratory tests at different time intervals and frequencies. Third, imbalanced distributions of patient outcomes create additional complexity for predicting complication phenotypes. Objective: To predict crucial complication phenotypes among patients with acute diseases, we propose a novel, deep learning--based method that uses recurrent neural network--based sequence embedding to represent disease progression while considering temporal heterogeneities in patient data. Our method incorporates a latent regulator to alleviate data insufficiency constraints by accounting for the underlying mechanisms that are not observed in patient data. The proposed method also includes cost-sensitive learning to address imbalanced outcome distributions in patient data for improved predictions. Methods: From a major health care organization in Taiwan, we obtained a sample of 10,354 electronic health records that pertained to 6545 patients with peritonitis. The proposed method projects these temporal, heterogeneous, and clinical data into a substantially reduced feature space and then incorporates a latent regulator (latent parameter matrix) to obviate data insufficiencies and account for variations in phenotypic expressions. Moreover, our method employs cost-sensitive learning to further increase the predictive performance. Results: We evaluated the efficacy of the proposed method for predicting two hepatic complication phenotypes in patients with peritonitis: acute hepatic encephalopathy and hepatorenal syndrome. The following three benchmark techniques were evaluated: temporal multiple measurement case-based reasoning (MMCBR), temporal short long-term memory (T-SLTM) networks, and time fusion convolutional neural network (CNN). 
For acute hepatic encephalopathy predictions, our method attained an area under the curve (AUC) value of 0.82, which outperforms temporal MMCBR by 64\%, T-SLTM by 26\%, and time fusion CNN by 26\%. For hepatorenal syndrome predictions, our method achieved an AUC value of 0.64, which is 29\% better than that of temporal MMCBR (0.54). Overall, the evaluation results show that the proposed method significantly outperforms all the benchmarks, as measured by recall, F-measure, and AUC while maintaining comparable precision values. Conclusions: The proposed method learns a short-term temporal representation from patient data to predict complication phenotypes and offers greater predictive utilities than prevalent data-driven techniques. This method is generalizable and can be applied to different acute disease (illness) scenarios that are characterized by insufficient patient clinical data availability, temporal heterogeneities, and imbalanced distributions of important patient outcomes. ", doi="10.2196/18372", url="http://www.jmir.org/2021/2/e18372/", url="http://www.ncbi.nlm.nih.gov/pubmed/33576744" } @Article{info:doi/10.2196/24572, author="Quiroz, Carlos Juan and Feng, You-Zhen and Cheng, Zhong-Yuan and Rezazadegan, Dana and Chen, Ping-Kang and Lin, Qi-Ting and Qian, Long and Liu, Xiao-Fang and Berkovsky, Shlomo and Coiera, Enrico and Song, Lei and Qiu, Xiaoming and Liu, Sidong and Cai, Xiang-Ran", title="Development and Validation of a Machine Learning Approach for Automated Severity Assessment of COVID-19 Based on Clinical and Imaging Data: Retrospective Study", journal="JMIR Med Inform", year="2021", month="Feb", day="11", volume="9", number="2", pages="e24572", keywords="algorithm", keywords="clinical data", keywords="clinical features", keywords="COVID-19", keywords="CT scans", keywords="development", keywords="imaging", keywords="imbalanced data", keywords="machine learning", keywords="oversampling", keywords="severity assessment", keywords="validation", abstract="Background: COVID-19 has overwhelmed health systems worldwide. It is important to identify severe cases as early as possible, such that resources can be mobilized and treatment can be escalated. Objective: This study aims to develop a machine learning approach for automated severity assessment of COVID-19 based on clinical and imaging data. Methods: Clinical data---including demographics, signs, symptoms, comorbidities, and blood test results---and chest computed tomography scans of 346 patients from 2 hospitals in the Hubei Province, China, were used to develop machine learning models for automated severity assessment in diagnosed COVID-19 cases. We compared the predictive power of the clinical and imaging data from multiple machine learning models and further explored the use of four oversampling methods to address the imbalanced classification issue. Features with the highest predictive power were identified using the Shapley Additive Explanations framework. Results: Imaging features had the strongest impact on the model output, while a combination of clinical and imaging features yielded the best performance overall. The identified predictive features were consistent with those reported previously. Although oversampling yielded mixed results, it achieved the best model performance in our study. 
Logistic regression models differentiating between mild and severe cases achieved the best performance for clinical features (area under the curve [AUC] 0.848; sensitivity 0.455; specificity 0.906), imaging features (AUC 0.926; sensitivity 0.818; specificity 0.901), and a combination of clinical and imaging features (AUC 0.950; sensitivity 0.764; specificity 0.919). The synthetic minority oversampling method further improved the performance of the model using combined features (AUC 0.960; sensitivity 0.845; specificity 0.929). Conclusions: Clinical and imaging features can be used for automated severity assessment of COVID-19 and can potentially help triage patients with COVID-19 and prioritize care delivery to those at a higher risk of severe disease. ", doi="10.2196/24572", url="http://medinform.jmir.org/2021/2/e24572/", url="http://www.ncbi.nlm.nih.gov/pubmed/33534723" } @Article{info:doi/10.2196/24246, author="Bolourani, Siavash and Brenner, Max and Wang, Ping and McGinn, Thomas and Hirsch, S. Jamie and Barnaby, Douglas and Zanos, P. Theodoros and ", title="A Machine Learning Prediction Model of Respiratory Failure Within 48 Hours of Patient Admission for COVID-19: Model Development and Validation", journal="J Med Internet Res", year="2021", month="Feb", day="10", volume="23", number="2", pages="e24246", keywords="artificial intelligence", keywords="prognostic", keywords="model", keywords="pandemic", keywords="severe acute respiratory syndrome coronavirus 2", keywords="modeling", keywords="development", keywords="validation", keywords="COVID-19", keywords="machine learning", abstract="Background: Predicting early respiratory failure due to COVID-19 can help triage patients to higher levels of care, allocate scarce resources, and reduce morbidity and mortality by appropriately monitoring and treating the patients at greatest risk for deterioration. Given the complexity of COVID-19, machine learning approaches may support clinical decision making for patients with this disease. Objective: Our objective is to derive a machine learning model that predicts respiratory failure within 48 hours of admission based on data from the emergency department. Methods: Data were collected from patients with COVID-19 who were admitted to Northwell Health acute care hospitals and were discharged, died, or spent a minimum of 48 hours in the hospital between March 1 and May 11, 2020. Of 11,525 patients, 933 (8.1\%) were placed on invasive mechanical ventilation within 48 hours of admission. Variables used by the models included clinical and laboratory data commonly collected in the emergency department. We trained and validated three predictive models (two based on XGBoost and one that used logistic regression) using cross-hospital validation. We compared model performance among all three models as well as an established early warning score (Modified Early Warning Score) using receiver operating characteristic curves, precision-recall curves, and other metrics. Results: The XGBoost model had the highest mean accuracy (0.919; area under the curve=0.77), outperforming the other two models as well as the Modified Early Warning Score. Important predictor variables included the type of oxygen delivery used in the emergency department, patient age, Emergency Severity Index level, respiratory rate, serum lactate, and demographic characteristics. Conclusions: The XGBoost model had high predictive accuracy, outperforming other early warning scores. 
The clinical plausibility and predictive ability of XGBoost suggest that the model could be used to predict 48-hour respiratory failure in admitted patients with COVID-19. ", doi="10.2196/24246", url="http://www.jmir.org/2021/2/e24246/", url="http://www.ncbi.nlm.nih.gov/pubmed/33476281" } @Article{info:doi/10.2196/23693, author="Albahli, Saleh and Yar, Hassan Ghulam Nabi Ahmad", title="Fast and Accurate Detection of COVID-19 Along With 14 Other Chest Pathologies Using a Multi-Level Classification: Algorithm Development and Validation Study", journal="J Med Internet Res", year="2021", month="Feb", day="10", volume="23", number="2", pages="e23693", keywords="COVID-19", keywords="chest x-ray", keywords="convolutional neural network", keywords="data augmentation", keywords="biomedical imaging", keywords="automatic detection", abstract="Background: COVID-19 has spread very rapidly, and it is important to build a system that can detect it in order to help an overwhelmed health care system. Many research studies on chest diseases rely on the strengths of deep learning techniques. Although some of these studies used state-of-the-art techniques and were able to deliver promising results, these techniques are not very useful if they can detect only one type of disease without detecting the others. Objective: The main objective of this study was to achieve a fast and more accurate diagnosis of COVID-19. This study proposes a diagnostic technique that classifies COVID-19 x-ray images from normal x-ray images and those specific to 14 other chest diseases. Methods: In this paper, we propose a novel, multilevel pipeline, based on deep learning models, to detect COVID-19 along with other chest diseases based on x-ray images. This pipeline reduces the burden of a single network to classify a large number of classes. The deep learning models used in this study were pretrained on the ImageNet dataset, and transfer learning was used for fast training. The lungs and heart were segmented from the whole x-ray images and passed onto the first classifier that checks whether the x-ray is normal, COVID-19 affected, or characteristic of another chest disease. If it is neither a COVID-19 x-ray image nor a normal one, then the second classifier comes into action and classifies the image as one of the other 14 diseases. Results: We show how our model uses state-of-the-art deep neural networks to achieve classification accuracy for COVID-19 along with 14 other chest diseases and normal cases based on x-ray images, which is competitive with currently used state-of-the-art models. Due to the lack of data in some classes such as COVID-19, we applied 10-fold cross-validation through the ResNet50 model. Our classification technique thus achieved an average training accuracy of 96.04\% and test accuracy of 92.52\% for the first level of classification (ie, 3 classes). For the second level of classification (ie, 14 classes), our technique achieved a maximum training accuracy of 88.52\% and test accuracy of 66.634\% by using ResNet50. We also found that when all the 16 classes were classified at once, the overall accuracy for COVID-19 detection decreased, which in the case of ResNet50 was 88.92\% for training data and 71.905\% for test data. Conclusions: Our proposed pipeline can detect COVID-19 with a higher accuracy along with detecting 14 other chest diseases based on x-ray images. This is achieved by dividing the classification task into multiple steps rather than classifying them collectively. 
", doi="10.2196/23693", url="http://www.jmir.org/2021/2/e23693/", url="http://www.ncbi.nlm.nih.gov/pubmed/33529154" } @Article{info:doi/10.2196/25457, author="Fernandes, Marta and Sun, Haoqi and Jain, Aayushee and Alabsi, S. Haitham and Brenner, N. Laura and Ye, Elissa and Ge, Wendong and Collens, I. Sarah and Leone, J. Michael and Das, Sudeshna and Robbins, K. Gregory and Mukerji, S. Shibani and Westover, Brandon M.", title="Classification of the Disposition of Patients Hospitalized with COVID-19: Reading Discharge Summaries Using Natural Language Processing", journal="JMIR Med Inform", year="2021", month="Feb", day="10", volume="9", number="2", pages="e25457", keywords="ICU", keywords="coronavirus", keywords="electronic health record", keywords="unstructured text", keywords="natural language processing", keywords="BoW", keywords="LASSO", keywords="feature selection", keywords="machine learning", keywords="intensive care unit", keywords="COVID-19", keywords="EHR", abstract="Background: Medical notes are a rich source of patient data; however, the nature of unstructured text has largely precluded the use of these data for large retrospective analyses. Transforming clinical text into structured data can enable large-scale research studies with electronic health records (EHR) data. Natural language processing (NLP) can be used for text information retrieval, reducing the need for labor-intensive chart review. Here we present an application of NLP to large-scale analysis of medical records at 2 large hospitals for patients hospitalized with COVID-19. Objective: Our study goal was to develop an NLP pipeline to classify the discharge disposition (home, inpatient rehabilitation, skilled nursing inpatient facility [SNIF], and death) of patients hospitalized with COVID-19 based on hospital discharge summary notes. Methods: Text mining and feature engineering were applied to unstructured text from hospital discharge summaries. The study included patients with COVID-19 discharged from 2 hospitals in the Boston, Massachusetts area (Massachusetts General Hospital and Brigham and Women's Hospital) between March 10, 2020, and June 30, 2020. The data were divided into a training set (70\%) and hold-out test set (30\%). Discharge summaries were represented as bags-of-words consisting of single words (unigrams), bigrams, and trigrams. The number of features was reduced during training by excluding n-grams that occurred in fewer than 10\% of discharge summaries, and further reduced using least absolute shrinkage and selection operator (LASSO) regularization while training a multiclass logistic regression model. Model performance was evaluated using the hold-out test set. Results: The study cohort included 1737 adult patients (median age 61 [SD 18] years; 55\% men; 45\% White and 16\% Black; 14\% nonsurvivors and 61\% discharged home). The model selected 179 from a vocabulary of 1056 engineered features, consisting of combinations of unigrams, bigrams, and trigrams. The top features contributing most to the classification by the model (for each outcome) were the following: ``appointments specialty,'' ``home health,'' and ``home care'' (home); ``intubate'' and ``ARDS'' (inpatient rehabilitation); ``service'' (SNIF); ``brief assessment'' and ``covid'' (death). The model achieved a micro-average area under the receiver operating characteristic curve value of 0.98 (95\% CI 0.97-0.98) and average precision of 0.81 (95\% CI 0.75-0.84) in the testing set for prediction of discharge disposition. 
Conclusions: A supervised learning--based NLP approach is able to classify the discharge disposition of patients hospitalized with COVID-19. This approach has the potential to accelerate and increase the scale of research on patients' discharge disposition that is possible with EHR data. ", doi="10.2196/25457", url="https://medinform.jmir.org/2021/2/e25457", url="http://www.ncbi.nlm.nih.gov/pubmed/33449908" } @Article{info:doi/10.2196/25187, author="Muralitharan, Sankavi and Nelson, Walter and Di, Shuang and McGillion, Michael and Devereaux, PJ and Barr, Grant Neil and Petch, Jeremy", title="Machine Learning--Based Early Warning Systems for Clinical Deterioration: Systematic Scoping Review", journal="J Med Internet Res", year="2021", month="Feb", day="4", volume="23", number="2", pages="e25187", keywords="machine learning", keywords="early warning systems", keywords="clinical deterioration", keywords="ambulatory care", keywords="acute care", keywords="remote patient monitoring", keywords="vital signs", keywords="sepsis", keywords="cardiorespiratory instability", keywords="risk prediction", abstract="Background: Timely identification of patients at a high risk of clinical deterioration is key to prioritizing care, allocating resources effectively, and preventing adverse outcomes. Vital signs--based, aggregate-weighted early warning systems are commonly used to predict the risk of outcomes related to cardiorespiratory instability and sepsis, which are strong predictors of poor outcomes and mortality. Machine learning models, which can incorporate trends and capture relationships among parameters that aggregate-weighted models cannot, have recently been showing promising results. Objective: This study aimed to identify, summarize, and evaluate the available research, current state of utility, and challenges with machine learning--based early warning systems using vital signs to predict the risk of physiological deterioration in acutely ill patients, across acute and ambulatory care settings. Methods: PubMed, CINAHL, Cochrane Library, Web of Science, Embase, and Google Scholar were searched for peer-reviewed, original studies with keywords related to ``vital signs,'' ``clinical deterioration,'' and ``machine learning.'' Included studies used patient vital signs along with demographics and described a machine learning model for predicting an outcome in acute and ambulatory care settings. Data were extracted following PRISMA, TRIPOD, and Cochrane Collaboration guidelines. Results: We identified 24 peer-reviewed studies from 417 articles for inclusion; 23 studies were retrospective, while 1 was prospective in nature. Care settings included general wards, intensive care units, emergency departments, step-down units, medical assessment units, postanesthetic wards, and home care. Machine learning models including logistic regression, tree-based methods, kernel-based methods, and neural networks were most commonly used to predict the risk of deterioration. The area under the curve for models ranged from 0.57 to 0.97. Conclusions: In studies that compared performance, reported results suggest that machine learning--based early warning systems can achieve greater accuracy than aggregate-weighted early warning systems but several areas for further research were identified. While these models have the potential to provide clinical decision support, there is a need for standardized outcome measures to allow for rigorous evaluation of performance across models. 
Further research needs to address the interpretability of model outputs by clinicians, clinical efficacy of these systems through prospective study design, and their potential impact in different clinical settings. ", doi="10.2196/25187", url="https://www.jmir.org/2021/2/e25187", url="http://www.ncbi.nlm.nih.gov/pubmed/33538696" } @Article{info:doi/10.2196/23934, author="Lee, Seungwon and Doktorchik, Chelsea and Martin, Asher Elliot and D'Souza, Giles Adam and Eastwood, Cathy and Shaheen, Aziz Abdel and Naugler, Christopher and Lee, Joon and Quan, Hude", title="Electronic Medical Record--Based Case Phenotyping for the Charlson Conditions: Scoping Review", journal="JMIR Med Inform", year="2021", month="Feb", day="1", volume="9", number="2", pages="e23934", keywords="electronic medical records", keywords="Charlson comorbidity", keywords="EMR phenotyping", keywords="health services research", abstract="Background: Electronic medical records (EMRs) contain large amounts of rich clinical information. Developing EMR-based case definitions, also known as EMR phenotyping, is an active area of research that has implications for epidemiology, clinical care, and health services research. Objective: This review aims to describe and assess the present landscape of EMR-based case phenotyping for the Charlson conditions. Methods: A scoping review of EMR-based algorithms for defining the Charlson comorbidity index conditions was completed. This study covered articles published between January 2000 and April 2020, both inclusive. Embase (Excerpta Medica database) and MEDLINE (Medical Literature Analysis and Retrieval System Online) were searched using keywords developed in the following 3 domains: terms related to EMR, terms related to case finding, and disease-specific terms. The manuscript follows the Preferred Reporting Items for Systematic reviews and Meta-analyses extension for Scoping Reviews (PRISMA) guidelines. Results: A total of 274 articles representing 299 algorithms were assessed and summarized. Most studies were undertaken in the United States (181/299, 60.5\%), followed by the United Kingdom (42/299, 14.0\%) and Canada (15/299, 5.0\%). These algorithms were mostly developed either in primary care (103/299, 34.4\%) or inpatient (168/299, 56.2\%) settings. Diabetes, congestive heart failure, myocardial infarction, and rheumatology had the highest number of developed algorithms. Data-driven and clinical rule--based approaches have been identified. EMR-based phenotype and algorithm development reflect the data access allowed by respective health systems, and algorithms vary in their performance. Conclusions: Recognizing similarities and differences in health systems, data collection strategies, extraction, data release protocols, and existing clinical pathways is critical to algorithm development strategies. Several strategies to assist with phenotype-based case definitions have been proposed. 
", doi="10.2196/23934", url="https://medinform.jmir.org/2021/2/e23934", url="http://www.ncbi.nlm.nih.gov/pubmed/33522976" } @Article{info:doi/10.2196/24973, author="Ho, Thi Thao and Park, Jongmin and Kim, Taewoo and Park, Byunggeon and Lee, Jaehee and Kim, Young Jin and Kim, Beom Ki and Choi, Sooyoung and Kim, Hwan Young and Lim, Jae-Kwang and Choi, Sanghun", title="Deep Learning Models for Predicting Severe Progression in COVID-19-Infected Patients: Retrospective Study", journal="JMIR Med Inform", year="2021", month="Jan", day="28", volume="9", number="1", pages="e24973", keywords="COVID-19", keywords="deep learning", keywords="artificial neural network", keywords="convolutional neural network", keywords="lung CT", abstract="Background: Many COVID-19 patients rapidly progress to respiratory failure with a broad range of severities. Identification of high-risk cases is critical for early intervention. Objective: The aim of this study is to develop deep learning models that can rapidly identify high-risk COVID-19 patients based on computed tomography (CT) images and clinical data. Methods: We analyzed 297 COVID-19 patients from five hospitals in Daegu, South Korea. A mixed artificial convolutional neural network (ACNN) model, combining an artificial neural network for clinical data and a convolutional neural network for 3D CT imaging data, was developed to classify these cases as either high risk of severe progression (ie, event) or low risk (ie, event-free). Results: Using the mixed ACNN model, we were able to obtain high classification performance using novel coronavirus pneumonia lesion images (ie, 93.9\% accuracy, 80.8\% sensitivity, 96.9\% specificity, and 0.916 area under the curve [AUC] score) and lung segmentation images (ie, 94.3\% accuracy, 74.7\% sensitivity, 95.9\% specificity, and 0.928 AUC score) for event versus event-free groups. Conclusions: Our study successfully differentiated high-risk cases among COVID-19 patients using imaging and clinical features. The developed model can be used as a predictive tool for interventions in aggressive therapies. ", doi="10.2196/24973", url="http://medinform.jmir.org/2021/1/e24973/", url="http://www.ncbi.nlm.nih.gov/pubmed/33455900" } @Article{info:doi/10.2196/21712, author="Feldman, Jonah and Szerencsy, Adam and Mann, Devin and Austrian, Jonathan and Kothari, Ulka and Heo, Hye and Barzideh, Sam and Hickey, Maureen and Snapp, Catherine and Aminian, Rod and Jones, Lauren and Testa, Paul", title="Giving Your Electronic Health Record a Checkup After COVID-19: A Practical Framework for Reviewing Clinical Decision Support in Light of the Telemedicine Expansion", journal="JMIR Med Inform", year="2021", month="Jan", day="27", volume="9", number="1", pages="e21712", keywords="COVID-19", keywords="EHR", keywords="clinical decision support", keywords="telemedicine", keywords="ambulatory care", keywords="electronic health record", keywords="framework", keywords="implementation", abstract="Background: The transformation of health care during COVID-19, with the rapid expansion of telemedicine visits, presents new challenges to chronic care and preventive health providers. Clinical decision support (CDS) is critically important to chronic care providers, and CDS malfunction is common during times of change. It is essential to regularly reassess an organization's ambulatory CDS program to maintain care quality. This is especially true after an immense change, like the COVID-19 telemedicine expansion. 
Objective: Our objective is to reassess the ambulatory CDS program at a large academic medical center in light of telemedicine's expansion in response to the COVID-19 pandemic. Methods: Our clinical informatics team devised a practical framework for an intrapandemic ambulatory CDS assessment focused on the impact of the telemedicine expansion. This assessment began with a quantitative analysis comparing CDS alert performance in the context of in-person and telemedicine visits. Board-certified physician informaticists then completed a formal workflow review of alerts with inferior performance in telemedicine visits. Informaticists then reported on themes and optimization opportunities through the existing CDS governance structure. Results: Our assessment revealed that 10 of our top 40 alerts by volume were not firing as expected in telemedicine visits. In 3 of the top 5 alerts, providers were significantly less likely to take action in telemedicine when compared to office visits. Cumulatively, alerts in telemedicine encounters had an action taken rate of 5.3\% (3257/64,938) compared to 8.3\% (19,427/233,636) for office visits. Observations from a clinical informaticist workflow review included the following: (1) Telemedicine visits have different workflows than office visits. Some alerts developed for the office were not appearing at the optimal time in the telemedicine workflow. (2) Missing clinical data is a common reason for the decreased alert firing seen in telemedicine visits. (3) Remote patient monitoring and patient-reported clinical data entered through the portal could replace data collection usually completed in the office by a medical assistant or registered nurse. Conclusions: In a large academic medical center at the pandemic epicenter, an intrapandemic ambulatory CDS assessment revealed clinically significant CDS malfunctions that highlight the importance of reassessing ambulatory CDS performance after the telemedicine expansion. ", doi="10.2196/21712", url="http://medinform.jmir.org/2021/1/e21712/", url="http://www.ncbi.nlm.nih.gov/pubmed/33400683" } @Article{info:doi/10.2196/25113, author="Chen, Yen-Pin and Lo, Yuan-Hsun and Lai, Feipei and Huang, Chien-Hua", title="Disease Concept-Embedding Based on the Self-Supervised Method for Medical Information Extraction from Electronic Health Records and Disease Retrieval: Algorithm Development and Validation Study", journal="J Med Internet Res", year="2021", month="Jan", day="27", volume="23", number="1", pages="e25113", keywords="electronic health record", keywords="EHR", keywords="disease embedding", keywords="disease retrieval", keywords="emergency department", keywords="concept", keywords="extraction", keywords="deep learning", keywords="machine learning", keywords="natural language processing", keywords="NLP", abstract="Background: The electronic health record (EHR) contains a wealth of medical information. An organized EHR can greatly help doctors treat patients. In some cases, only limited patient information is collected to help doctors make treatment decisions. Because EHRs can serve as a reference for this limited information, doctors' treatment capabilities can be enhanced. Natural language processing and deep learning methods can help organize and translate EHR information into medical knowledge and experience. Objective: In this study, we aimed to create a model to extract concept embeddings from EHRs for disease pattern retrieval and further classification tasks. 
Methods: We collected 1,040,989 emergency department visits from the National Taiwan University Hospital Integrated Medical Database and 305,897 samples from the National Hospital and Ambulatory Medical Care Survey Emergency Department data. After data cleansing and preprocessing, the data sets were divided into training, validation, and test sets. We proposed a Transformer-based model to embed EHRs and used Bidirectional Encoder Representations from Transformers (BERT) to extract features from free text and concatenate features with structural data as input to our proposed model. Then, Deep InfoMax (DIM) and Simple Contrastive Learning of Visual Representations (SimCLR) were used for the unsupervised embedding of the disease concept. The pretrained disease concept-embedding model, named EDisease, was further finetuned to adapt to the critical care outcome prediction task. We evaluated the performance of embedding using t-distributed stochastic neighbor embedding (t-SNE) to perform dimension reduction for visualization. The performance of the finetuned predictive model was evaluated against published models using the area under the receiver operating characteristic (AUROC). Results: The performance of our model on the outcome prediction had the highest AUROC of 0.876. In the ablation study, the use of a smaller data set or fewer unsupervised methods for pretraining deteriorated the prediction performance. The AUROCs were 0.857, 0.870, and 0.868 for the model without pretraining, the model pretrained by only SimCLR, and the model pretrained by only DIM, respectively. On the smaller finetuning set, the AUROC was 0.815 for the proposed model. Conclusions: Through contrastive learning methods, disease concepts can be embedded meaningfully. Moreover, these methods can be used for disease retrieval tasks to enhance clinical practice capabilities. The disease concept model is also suitable as a pretrained model for subsequent prediction tasks. ", doi="10.2196/25113", url="http://www.jmir.org/2021/1/e25113/", url="http://www.ncbi.nlm.nih.gov/pubmed/33502324" } @Article{info:doi/10.2196/22148, author="Fujihara, Kazuya and Matsubayashi, Yasuhiro and Harada Yamada, Mayuko and Yamamoto, Masahiko and Iizuka, Toshihiro and Miyamura, Kosuke and Hasegawa, Yoshinori and Maegawa, Hiroshi and Kodama, Satoru and Yamazaki, Tatsuya and Sone, Hirohito", title="Machine Learning Approach to Decision Making for Insulin Initiation in Japanese Patients With Type 2 Diabetes (JDDM 58): Model Development and Validation Study", journal="JMIR Med Inform", year="2021", month="Jan", day="27", volume="9", number="1", pages="e22148", keywords="hypoglycemic prescription", keywords="diabetes specialists", keywords="initial therapy", keywords="patterns of usage", keywords="machine learning", abstract="Background: Applications of machine learning for the early detection of diseases for which a clear-cut diagnostic gold standard exists have been evaluated. However, little is known about the usefulness of machine learning approaches in the decision-making process for decisions such as insulin initiation by diabetes specialists for which no absolute standards exist in clinical settings. Objective: The objectives of this study were to examine the ability of machine learning models to predict insulin initiation by specialists and whether the machine learning approach could support decision making by general physicians for insulin initiation in patients with type 2 diabetes. 
Methods: Data from patients prescribed hypoglycemic agents from December 2009 to March 2015 were extracted from diabetes specialists' registries, resulting in a sample size of 4860 patients who had received initial monotherapy with either insulin (n=293) or noninsulin (n=4567). Neural network output was insulin initiation ranging from 0 to 1 with a cutoff of >0.5 for the dichotomous classification. Accuracy, recall, and area under the receiver operating characteristic curve (AUC) were calculated to compare the ability of machine learning models to make decisions regarding insulin initiation to the decision-making ability of logistic regression and general physicians. By comparing the decision-making ability of machine learning and logistic regression to that of general physicians, 7 cases were chosen based on patient information as the gold standard based on the agreement of 8 of the 9 specialists. Results: The AUCs, accuracy, and recall of logistic regression were higher than those of machine learning (AUCs of 0.89-0.90 for logistic regression versus 0.67-0.74 for machine learning). When the examination was limited to cases receiving insulin, discrimination by machine learning was similar to that of logistic regression analysis (recall of 0.05-0.68 for logistic regression versus 0.11-0.52 for machine learning). Accuracies of logistic regression, a machine learning model (downsampling ratio of 1:8), and general physicians were 0.80, 0.70, and 0.66, respectively, for 43 randomly selected cases. For the 7 gold standard cases, the accuracies of logistic regression and the machine learning model were 1.00 and 0.86, respectively, with a downsampling ratio of 1:8, which were higher than the accuracy of general physicians (ie, 0.43). Conclusions: Although we found no superior performance of machine learning over logistic regression, machine learning had higher accuracy in prediction of insulin initiation than general physicians, defined by diabetes specialists' choice of the gold standard. Further study is needed before the use of machine learning--based decision support systems for insulin initiation can be incorporated into clinical practice. ", doi="10.2196/22148", url="http://medinform.jmir.org/2021/1/e22148/", url="http://www.ncbi.nlm.nih.gov/pubmed/33502325" } @Article{info:doi/10.2196/19739, author="Diao, Xiaolin and Huo, Yanni and Yan, Zhanzheng and Wang, Haibin and Yuan, Jing and Wang, Yuxin and Cai, Jun and Zhao, Wei", title="An Application of Machine Learning to Etiological Diagnosis of Secondary Hypertension: Retrospective Study Using Electronic Medical Records", journal="JMIR Med Inform", year="2021", month="Jan", day="25", volume="9", number="1", pages="e19739", keywords="secondary hypertension", keywords="etiological diagnosis", keywords="machine learning", keywords="prediction model", abstract="Background: Secondary hypertension is a kind of hypertension with a definite etiology and may be cured. Patients with suspected secondary hypertension can benefit from timely detection and treatment and, conversely, will have a higher risk of morbidity and mortality than those with primary hypertension. Objective: The aim of this study was to develop and validate machine learning (ML) prediction models of common etiologies in patients with suspected secondary hypertension. Methods: The analyzed data set was retrospectively extracted from electronic medical records of patients discharged from Fuwai Hospital between January 1, 2016, and June 30, 2019. 
A total of 7532 unique patients were included and divided into 2 data sets by time: 6302 patients in 2016-2018 as the training data set for model building and 1230 patients in 2019 as the validation data set for further evaluation. Extreme Gradient Boosting (XGBoost) was adopted to develop 5 models to predict 4 etiologies of secondary hypertension and occurrence of any of them (named as composite outcome), including renovascular hypertension (RVH), primary aldosteronism (PA), thyroid dysfunction, and aortic stenosis. Both univariate logistic analysis and Gini Impurity were used for feature selection. Grid search and 10-fold cross-validation were used to select the optimal hyperparameters for each model. Results: Validation of the composite outcome prediction model showed good performance with an area under the receiver-operating characteristic curve (AUC) of 0.924 in the validation data set, while the 4 prediction models of RVH, PA, thyroid dysfunction, and aortic stenosis achieved AUC of 0.938, 0.965, 0.959, and 0.946, respectively, in the validation data set. A total of 79 clinical indicators were identified in all and finally used in our prediction models. The result of subgroup analysis on the composite outcome prediction model demonstrated high discrimination with AUCs all higher than 0.890 among all age groups of adults. Conclusions: The ML prediction models in this study showed good performance in detecting 4 etiologies of patients with suspected secondary hypertension; thus, they may potentially facilitate clinical diagnosis decision making of secondary hypertension in an intelligent way. ", doi="10.2196/19739", url="http://medinform.jmir.org/2021/1/e19739/", url="http://www.ncbi.nlm.nih.gov/pubmed/33492233" } @Article{info:doi/10.2196/20184, author="Zolnoori, Maryam and McDonald, V. Margaret and Barr{\'o}n, Yolanda and Cato, Kenrick and Sockolow, Paulina and Sridharan, Sridevi and Onorato, Nicole and Bowles, Kathryn and Topaz, Maxim", title="Improving Patient Prioritization During Hospital-Homecare Transition: Protocol for a Mixed Methods Study of a Clinical Decision Support Tool Implementation", journal="JMIR Res Protoc", year="2021", month="Jan", day="22", volume="10", number="1", pages="e20184", keywords="clinical decision support system", keywords="homecare agencies", keywords="rehospitalization", keywords="RE-AIM framework", keywords="PREVENT", keywords="effective implementation", abstract="Background: Homecare settings across the United States provide care to more than 5 million patients every year. About one in five homecare patients are rehospitalized during the homecare episode, with up to two-thirds of these rehospitalizations occurring within the first 2 weeks of services. Timely allocation of homecare services might prevent a significant portion of these rehospitalizations. The first homecare nursing visit is one of the most critical steps of the homecare episode. This visit includes an assessment of the patient's capacity for self-care, medication reconciliation, an examination of the home environment, and a discussion regarding whether a caregiver is present. Hence, appropriate timing of the first visit is crucial, especially for patients with urgent health care needs. However, nurses often have limited and inaccurate information about incoming patients, and patient priority decisions vary significantly between nurses. 
We developed an innovative decision support tool called Priority for the First Nursing Visit Tool (PREVENT) to assist nurses in prioritizing patients in need of immediate first homecare nursing visits. Objective: This study aims to evaluate the effectiveness of the PREVENT tool on process and patient outcomes and to examine the reach, adoption, and implementation of PREVENT. Methods: Employing a pre-post design, survival analysis, and logistic regression with propensity score matching analysis, we will test the following hypotheses: compared with not using the tool in the preintervention phase, when homecare clinicians use the PREVENT tool, high-risk patients in the intervention phase will (1) receive more timely first homecare visits and (2) have decreased incidence of rehospitalization and have decreased emergency department use within 60 days. Reach, adoption, and implementation will be assessed using mixed methods including homecare admission staff interviews, think-aloud observations, and analysis of staffing and other relevant data. Results: The study research protocol was approved by the institutional review board in October 2019. PREVENT is currently being integrated into the electronic health records at the participating study sites. Data collection is planned to start in early 2021. Conclusions: Mixed methods will enable us to gain an in-depth understanding of the complex socio-technological aspects of the hospital to homecare transition. The results have the potential to (1) influence the standardization and individualization of nurse decision making through the use of cutting-edge technology and (2) improve patient outcomes in the understudied homecare setting. Trial Registration: ClinicalTrials.gov NCT04136951; https://clinicaltrials.gov/ct2/show/NCT04136951 International Registered Report Identifier (IRRID): PRR1-10.2196/20184 ", doi="10.2196/20184", url="https://www.researchprotocols.org/2021/1/e20184", url="http://www.ncbi.nlm.nih.gov/pubmed/33480855" } @Article{info:doi/10.2196/21804, author="Hill, Adele and Joyner, H. Christopher and Keith-Jopp, Chloe and Yet, Barbaros and Tuncer Sakar, Ceren and Marsh, William and Morrissey, Dylan", title="A Bayesian Network Decision Support Tool for Low Back Pain Using a RAND Appropriateness Procedure: Proposal and Internal Pilot Study", journal="JMIR Res Protoc", year="2021", month="Jan", day="15", volume="10", number="1", pages="e21804", keywords="back pain", keywords="decision making", keywords="Bayesian methods", keywords="consensus", abstract="Background: Low back pain (LBP) is an increasingly burdensome condition for patients and health professionals alike, with consistent demonstration of increasing persistent pain and disability. Previous decision support tools for LBP management have focused on a subset of factors owing to time constraints and ease of use for the clinician. With the explosion of interest in machine learning tools and the commitment from Western governments to introduce this technology, there are opportunities to develop intelligent decision support tools. We will do this for LBP using a Bayesian network, which will entail constructing a clinical reasoning model elicited from experts. Objective: This paper proposes a method for conducting a modified RAND appropriateness procedure to elicit the knowledge required to construct a Bayesian network from a group of domain experts in LBP, and reports the lessons learned from the internal pilot of the procedure. 
Methods: We propose to recruit expert clinicians with a special interest in LBP from across a range of medical specialties, such as orthopedics, rheumatology, and sports medicine. The procedure will consist of four stages. Stage 1 is an online elicitation of variables to be considered by the model, followed by a face-to-face workshop. Stage 2 is an online elicitation of the structure of the model, followed by a face-to-face workshop. Stage 3 consists of an online phase to elicit probabilities to populate the Bayesian network. Stage 4 is a rudimentary validation of the Bayesian network. Results: Ethical approval has been obtained from the Research Ethics Committee at Queen Mary University of London. An internal pilot of the procedure has been run with clinical colleagues from the research team. This showed that an alternating process of three remote activities and two in-person meetings was required to complete the elicitation without overburdening participants. Lessons learned have included the need for a bespoke online elicitation tool to run between face-to-face meetings and for careful operational definition of descriptive terms, even if widely clinically used. Further, tools are required to remotely deliver training about self-identification of various forms of cognitive bias and explain the underlying principles of a Bayesian network. The use of the internal pilot was recognized as being a methodological necessity. Conclusions: We have proposed a method to construct Bayesian networks that are representative of expert clinical reasoning for a musculoskeletal condition in this case. We have tested the method with an internal pilot to refine the process prior to deployment, which indicates the process can be successful. The internal pilot has also revealed the software support requirements for the elicitation process to model clinical reasoning for a range of conditions. International Registered Report Identifier (IRRID): DERR1-10.2196/21804 ", doi="10.2196/21804", url="http://www.researchprotocols.org/2021/1/e21804/", url="http://www.ncbi.nlm.nih.gov/pubmed/33448937" } @Article{info:doi/10.2196/23443, author="Blum, Edna and Abdelwahed, S. Youssef and Spiess, Eileen and Mueller-Werdan, Ursula and Leistner, M. David and Rosada, Adrian", title="COVID-19 \#StayAtHome Restrictions and Deep Vein Thrombosis: Case Report", journal="Interact J Med Res", year="2021", month="Jan", day="14", volume="10", number="1", pages="e23443", keywords="thrombosis", keywords="public health", keywords="social distancing", keywords="physical inactivity", keywords="pandemic management", keywords="COVID-19", keywords="case study", keywords="vein", keywords="adverse effect", keywords="physical activity", abstract="Background: The COVID-19 pandemic triggered countermeasures like \#StayAtHome initiatives, which have changed the whole world. Despite the success of such initiatives in limiting the spread of COVID-19 to \#FlattenTheCurve, physicians are now confronted with the adverse effects of the current restrictive pandemic management strategies and social distancing measures. Objective: We aim to draw attention to the particular importance and magnitude of what may be the adverse effects of COVID-19--related policies. Methods: We herein report a case of an otherwise healthy 84-year-old woman with deep vein thrombosis (DVT) due to COVID-19--related directives. 
\#StayAtHome policies and consequential social isolation have diminished our patient's social life and reduced her healthy movement behaviors. The patient spent long hours in a seated position while focusing on the intensive flow of media information regarding the pandemic. Results: Reduced mobility due to preventive social isolation during the COVID-19 pandemic was the only identified cause of the DVT. Conclusions: While evaluating the effect of the COVID-19 pandemic and governmentally implemented containment measures, including social isolation and mobility reduction, adverse events should be considered. Digital approaches might play a crucial role in supporting public health. ", doi="10.2196/23443", url="http://www.i-jmr.org/2021/1/e23443/", url="http://www.ncbi.nlm.nih.gov/pubmed/33400676" } @Article{info:doi/10.2196/19689, author="Liu, Honglei and Zhang, Zhiqiang and Xu, Yan and Wang, Ni and Huang, Yanqun and Yang, Zhenghan and Jiang, Rui and Chen, Hui", title="Use of BERT (Bidirectional Encoder Representations from Transformers)-Based Deep Learning Method for Extracting Evidences in Chinese Radiology Reports: Development of a Computer-Aided Liver Cancer Diagnosis Framework", journal="J Med Internet Res", year="2021", month="Jan", day="12", volume="23", number="1", pages="e19689", keywords="BiLSTM-CRF", keywords="natural language processing", keywords="radiology reports", keywords="information extraction", keywords="computer-aided diagnosis", keywords="BERT", abstract="Background: Liver cancer is a substantial disease burden in China. As one of the primary diagnostic tools for detecting liver cancer, dynamic contrast-enhanced computed tomography provides detailed evidences for diagnosis that are recorded in free-text radiology reports. Objective: The aim of our study was to apply a deep learning model and rule-based natural language processing (NLP) method to identify evidences for liver cancer diagnosis automatically. Methods: We proposed a pretrained, fine-tuned BERT (Bidirectional Encoder Representations from Transformers)-based BiLSTM-CRF (Bidirectional Long Short-Term Memory-Conditional Random Field) model to recognize the phrases of APHE (hyperintense enhancement in the arterial phase) and PDPH (hypointense in the portal and delayed phases). To identify more essential diagnostic evidences, we used the traditional rule-based NLP methods for the extraction of radiological features. APHE, PDPH, and other extracted radiological features were used to design a computer-aided liver cancer diagnosis framework by random forest. Results: The BERT-BiLSTM-CRF predicted the phrases of APHE and PDPH with an F1 score of 98.40\% and 90.67\%, respectively. The prediction model using combined features had a higher performance (F1 score, 88.55\%) than those using APHE and PDPH (84.88\%) or other extracted radiological features (83.52\%). APHE and PDPH were the top 2 essential features for liver cancer diagnosis. Conclusions: This work was a comprehensive NLP study, wherein we identified evidences for the diagnosis of liver cancer from Chinese radiology reports, considering both clinical knowledge and radiology findings. The BERT-based deep learning method for the extraction of diagnostic evidence achieved state-of-the-art performance. The high performance proves the feasibility of the BERT-BiLSTM-CRF model in information extraction from Chinese radiology reports. 
The findings of our study suggest that the deep learning--based method for automatically identifying evidences for diagnosis can be extended to other types of Chinese clinical texts. ", doi="10.2196/19689", url="http://www.jmir.org/2021/1/e19689/", url="http://www.ncbi.nlm.nih.gov/pubmed/33433395" } @Article{info:doi/10.2196/18872, author="Marcolino, Soriano Milena and Oliveira, Queiroz Jo{\~a}o Antonio and Cimini, Rodrigues Christiane Corr{\^e}a and Maia, Xavier Junia and Pinto, Almeida V{\^a}nia Soares Oliveira and S{\'a}, Vivas Th{\'a}bata Queiroz and Amancio, Kaique and Coelho, Lissandra and Ribeiro, Bonisson Leonardo and Cardoso, Silva Clareci and Ribeiro, Luiz Antonio", title="Development and Implementation of a Decision Support System to Improve Control of Hypertension and Diabetes in a Resource-Constrained Area in Brazil: Mixed Methods Study", journal="J Med Internet Res", year="2021", month="Jan", day="11", volume="23", number="1", pages="e18872", keywords="clinical decision support systems", keywords="primary health care", keywords="hypertension", keywords="diabetes mellitus", keywords="evidence-based practice", keywords="telemedicine", keywords="patient care management", abstract="Background: The low levels of control of hypertension and diabetes mellitus are a challenge that requires innovative strategies to surpass barriers of low sources, distance, and quality of health care. Objective: The aim of this study is to develop a clinical decision support system (CDSS) for diabetes and hypertension management in primary care, to implement it in a resource-constrained region, and to evaluate its usability and health care practitioner satisfaction. Methods: This mixed methods study is a substudy of HealthRise Brazil Project, a multinational study designed to implement pilot programs to improve screening, diagnosis, management, and control of hypertension and diabetes among underserved communities. Following the identification of gaps in usual care, a team of clinicians established the software functional requirements. Recommendations from evidence-based guidelines were reviewed and organized into a decision algorithm, which bases the CDSS reminders and suggestions. Following pretesting and expert panel assessment, pilot testing was conducted in a quasi-experimental study, which included 34 primary care units of 10 municipalities in a resource-constrained area in Brazil. A Likert-scale questionnaire evaluating perceived feasibility, usability, and utility of the application and professionals' satisfaction was applied after 6 months. In the end-line assessment, 2 focus groups with primary care physicians and nurses were performed. Results: A total of 159 reminders and suggestions were created and implemented for the CDSS. At the 6-month assessment, there were 1939 patients registered in the application database and 2160 consultations were performed by primary care teams. Of the 96 health care professionals who were invited for the usability assessment, 26\% (25/96) were physicians, 46\% (44/96) were nurses, and 28\% (27/96) were other health professionals. The questionnaire included 24 items on impressions of feasibility, usability, utility, and satisfaction, and presented global Cronbach $\alpha$ of .93. 
As for feasibility, all professionals agreed (median scores of 4 or 5) that the application could be used in primary care settings and it could be easily incorporated in work routines, but physicians claimed that the application might have caused significant delays in daily routines. As for usability, overall evaluation was good and it was claimed that the application was easy to understand and use. All professionals agreed that the application was useful (score 4 or 5) to promote prevention, assist treatment, and might improve patient care, and they were overall satisfied with the application (median scores between 4 and 5). In the end-line assessment, there were 4211 patients (94.82\% [3993/4211] with hypertension and 24.41\% [1028/4211] with diabetes) registered in the application's database and 7960 consultations were performed by primary health care teams. The 17 participants of the focus groups were consistent to affirm they were very satisfied with the CDSS. Conclusions: The CDSS was applicable in the context of primary health care settings in low-income regions, with good user satisfaction and potential to improve adherence to evidence-based practices. ", doi="10.2196/18872", url="http://www.jmir.org/2021/1/e18872/", url="http://www.ncbi.nlm.nih.gov/pubmed/33427686" } @Article{info:doi/10.2196/25442, author="Ko, Hoon and Chung, Heewon and Kang, Seong Wu and Park, Chul and Kim, Wan Do and Kim, Eun Seong and Chung, Ryang Chi and Ko, Eun Ryoung and Lee, Hooseok and Seo, Ho Jae and Choi, Tae-Young and Jaimes, Rafael and Kim, Won Kyung and Lee, Jinseok", title="An Artificial Intelligence Model to Predict the Mortality of COVID-19 Patients at Hospital Admission Time Using Routine Blood Samples: Development and Validation of an Ensemble Model", journal="J Med Internet Res", year="2020", month="Dec", day="23", volume="22", number="12", pages="e25442", keywords="COVID-19", keywords="artificial intelligence", keywords="blood samples", keywords="mortality prediction", abstract="Background: COVID-19, which is accompanied by acute respiratory distress, multiple organ failure, and death, has spread worldwide much faster than previously thought. However, at present, it has limited treatments. Objective: To overcome this issue, we developed an artificial intelligence (AI) model of COVID-19, named EDRnet (ensemble learning model based on deep neural network and random forest models), to predict in-hospital mortality using a routine blood sample at the time of hospital admission. Methods: We selected 28 blood biomarkers and used the age and gender information of patients as model inputs. To improve the mortality prediction, we adopted an ensemble approach combining deep neural network and random forest models. We trained our model with a database of blood samples from 361 COVID-19 patients in Wuhan, China, and applied it to 106 COVID-19 patients in three Korean medical institutions. Results: In the testing data sets, EDRnet provided high sensitivity (100\%), specificity (91\%), and accuracy (92\%). To extend the number of patient data points, we developed a web application (BeatCOVID19) where anyone can access the model to predict mortality and can register his or her own blood laboratory results. Conclusions: Our new AI model, EDRnet, accurately predicts the mortality rate for COVID-19. It is publicly available and aims to help health care providers fight COVID-19 and improve patients' outcomes. 
", doi="10.2196/25442", url="http://www.jmir.org/2020/12/e25442/", url="http://www.ncbi.nlm.nih.gov/pubmed/33301414" } @Article{info:doi/10.2196/24478, author="D'Ambrosia, Christopher and Christensen, Henrik and Aronoff-Spencer, Eliah", title="Computing SARS-CoV-2 Infection Risk From Symptoms, Imaging, and Test Data: Diagnostic Model Development", journal="J Med Internet Res", year="2020", month="Dec", day="16", volume="22", number="12", pages="e24478", keywords="health", keywords="informatics", keywords="computation", keywords="COVID-19", keywords="infection", keywords="risk", keywords="symptom", keywords="imaging", keywords="diagnostic", keywords="probability", keywords="machine learning", keywords="Bayesian", keywords="model", abstract="Background: Assigning meaningful probabilities of SARS-CoV-2 infection risk presents a diagnostic challenge across the continuum of care. Objective: The aim of this study was to develop and clinically validate an adaptable, personalized diagnostic model to assist clinicians in ruling in and ruling out COVID-19 in potential patients. We compared the diagnostic performance of probabilistic, graphical, and machine learning models against a previously published benchmark model. Methods: We integrated patient symptoms and test data using machine learning and Bayesian inference to quantify individual patient risk of SARS-CoV-2 infection. We trained models with 100,000 simulated patient profiles based on 13 symptoms and estimated local prevalence, imaging, and molecular diagnostic performance from published reports. We tested these models with consecutive patients who presented with a COVID-19--compatible illness at the University of California San Diego Medical Center over the course of 14 days starting in March 2020. Results: We included 55 consecutive patients with fever (n=43, 78\%) or cough (n=42, 77\%) presenting for ambulatory (n=11, 20\%) or hospital care (n=44, 80\%). In total, 51\% (n=28) were female and 49\% (n=27) were aged <60 years. Common comorbidities included diabetes (n=12, 22\%), hypertension (n=15, 27\%), cancer (n=9, 16\%), and cardiovascular disease (n=7, 13\%). Of these, 69\% (n=38) were confirmed via reverse transcription-polymerase chain reaction (RT-PCR) to be positive for SARS-CoV-2 infection, and 20\% (n=11) had repeated negative nucleic acid testing and an alternate diagnosis. Bayesian inference network, distance metric learning, and ensemble models discriminated between patients with SARS-CoV-2 infection and alternate diagnoses with sensitivities of 81.6\%-84.2\%, specificities of 58.8\%-70.6\%, and accuracies of 61.4\%-71.8\%. After integrating imaging and laboratory test statistics with the predictions of the Bayesian inference network, changes in diagnostic uncertainty at each step in the simulated clinical evaluation process were highly sensitive to location, symptom, and diagnostic test choices. Conclusions: Decision support models that incorporate symptoms and available test results can help providers diagnose SARS-CoV-2 infection in real-world settings. ", doi="10.2196/24478", url="http://www.jmir.org/2020/12/e24478/", url="http://www.ncbi.nlm.nih.gov/pubmed/33301417" } @Article{info:doi/10.2196/23530, author="Wolfe, Christopher and Pestian, Teresa and Gecili, Emrah and Su, Weiji and Keogh, H. Ruth and Pestian, P. John and Seid, Michael and Diggle, J. Peter and Ziady, Assem and Clancy, Paul John and Grossoehme, H. Daniel and Szczesniak, D. 
Rhonda and Brokamp, Cole", title="Cystic Fibrosis Point of Personalized Detection (CFPOPD): An Interactive Web Application", journal="JMIR Med Inform", year="2020", month="Dec", day="16", volume="8", number="12", pages="e23530", keywords="application programming interface", keywords="chronic disease", keywords="clinical decision rules", keywords="clinical decision support", keywords="medical monitoring", abstract="Background: Despite steady gains in life expectancy, individuals with cystic fibrosis (CF) lung disease still experience rapid pulmonary decline throughout their clinical course, which can ultimately end in respiratory failure. Point-of-care tools for accurate and timely information regarding the risk of rapid decline is essential for clinical decision support. Objective: This study aims to translate a novel algorithm for earlier, more accurate prediction of rapid lung function decline in patients with CF into an interactive web-based application that can be integrated within electronic health record systems, via collaborative development with clinicians. Methods: Longitudinal clinical history, lung function measurements, and time-invariant characteristics were obtained for 30,879 patients with CF who were followed in the US Cystic Fibrosis Foundation Patient Registry (2003-2015). We iteratively developed the application using the R Shiny framework and by conducting a qualitative study with care provider focus groups (N=17). Results: A clinical conceptual model and 4 themes were identified through coded feedback from application users: (1) ambiguity in rapid decline, (2) clinical utility, (3) clinical significance, and (4) specific suggested revisions. These themes were used to revise our application to the currently released version, available online for exploration. This study has advanced the application's potential prognostic utility for monitoring individuals with CF lung disease. Further application development will incorporate additional clinical characteristics requested by the users and also a more modular layout that can be useful for care provider and family interactions. Conclusions: Our framework for creating an interactive and visual analytics platform enables generalized development of applications to synthesize, model, and translate electronic health data, thereby enhancing clinical decision support and improving care and health outcomes for chronic diseases and disorders. A prospective implementation study is necessary to evaluate this tool's effectiveness regarding increased communication, enhanced shared decision-making, and improved clinical outcomes for patients with CF. ", doi="10.2196/23530", url="https://medinform.jmir.org/2020/12/e23530", url="http://www.ncbi.nlm.nih.gov/pubmed/33325834" } @Article{info:doi/10.2196/18097, author="{\'C}irkovi{\'c}, Aleksandar", title="Evaluation of Four Artificial Intelligence--Assisted Self-Diagnosis Apps on Three Diagnoses: Two-Year Follow-Up Study", journal="J Med Internet Res", year="2020", month="Dec", day="4", volume="22", number="12", pages="e18097", keywords="artificial intelligence", keywords="machine learning", keywords="mobile apps", keywords="medical diagnosis", keywords="mHealth", abstract="Background: Consumer-oriented mobile self-diagnosis apps have been developed using undisclosed algorithms, presumably based on machine learning and other artificial intelligence (AI) technologies. 
The US Food and Drug Administration now discerns apps with learning AI algorithms from those with stable ones and treats the former as medical devices. To the author's knowledge, no self-diagnosis app testing has been performed in the field of ophthalmology so far. Objective: The objective of this study was to test apps that were previously mentioned in the scientific literature on a set of diagnoses in a deliberate time interval, comparing the results and looking for differences that hint at ``nonlocked'' learning algorithms. Methods: Four apps from the literature were chosen (Ada, Babylon, Buoy, and Your.MD). A set of three ophthalmology diagnoses (glaucoma, retinal tear, dry eye syndrome) representing three levels of urgency was used to simultaneously test the apps' diagnostic efficiency and treatment recommendations in this specialty. Two years was the chosen time interval between the tests (2018 and 2020). Scores were awarded by one evaluating physician using a defined scheme. Results: Two apps (Ada and Your.MD) received significantly higher scores than the other two. All apps either worsened in their results between 2018 and 2020 or remained unchanged at a low level. The variation in the results over time indicates ``nonlocked'' learning algorithms using AI technologies. None of the apps provided correct diagnoses and treatment recommendations for all three diagnoses in 2020. Two apps (Babylon and Your.MD) asked significantly fewer questions than the other two (P<.001). Conclusions: ``Nonlocked'' algorithms are used by self-diagnosis apps. The diagnostic efficiency of the tested apps seems to worsen over time, with some apps being more capable than others. Systematic studies on a wider scale are necessary for health care providers and patients to correctly assess the safety and efficacy of such apps and for correct classification by health care regulating authorities. ", doi="10.2196/18097", url="https://www.jmir.org/2020/12/e18097", url="http://www.ncbi.nlm.nih.gov/pubmed/33275113" } @Article{info:doi/10.2196/18352, author="Sakib, Nazmus and Ahamed, Iqbal Sheikh and Khan, Ahmed Rumi and Griffin, M. Paul and Haque, Munirul Md", title="Unpacking Prevalence and Dichotomy in Quick Sequential Organ Failure Assessment and Systemic Inflammatory Response Syndrome Parameters: Observational Data--Driven Approach Backed by Sepsis Pathophysiology", journal="JMIR Med Inform", year="2020", month="Dec", day="3", volume="8", number="12", pages="e18352", keywords="sepsis", keywords="MIMIC-III", keywords="SIRS", keywords="qSOFA", keywords="pathophysiology", keywords="medical internet research", keywords="medical informatics", keywords="critical care", keywords="intensive care unit", keywords="multicollinearity", abstract="Background: Considering morbidity, mortality, and annual treatment costs, the dramatic rise in the incidence of sepsis and septic shock among intensive care unit (ICU) admissions in US hospitals is an increasing concern. Recent changes in the sepsis definition (sepsis-3), based on the quick Sequential Organ Failure Assessment (qSOFA), have motivated the international medical informatics research community to investigate score recalculation and information retrieval, and to study the intersection between sepsis-3 and the previous definition (sepsis-2) based on systemic inflammatory response syndrome (SIRS) parameters. Objective: The objective of this study was three-fold. 
First, we aimed to unpack the most prevalent criterion for sepsis (for both sepsis-3 and sepsis-2 predictors). Second, we intended to determine the most prevalent sepsis scenario in the ICU among 4 possible scenarios for qSOFA and 11 possible scenarios for SIRS. Third, we investigated the multicollinearity or dichotomy among qSOFA and SIRS predictors. Methods: This observational study was conducted according to the most recent update of Medical Information Mart for Intensive Care (MIMIC-III, Version 1.4), the critical care database developed by MIT. The qSOFA (sepsis-3) and SIRS (sepsis-2) parameters were analyzed for patients admitted to critical care units from 2001 to 2012 in Beth Israel Deaconess Medical Center (Boston, MA, USA) to determine the prevalence and underlying relation between these parameters among patients undergoing sepsis screening. We adopted a multiblind Delphi method to seek a rationale for decisions in several stages of the research design regarding handling missing data and outlier values, statistical imputations and biases, and generalizability of the study. Results: Altered mental status in the Glasgow Coma Scale (59.28\%, 38,854/65,545 observations) was the most prevalent sepsis-3 (qSOFA) criterion and the white blood cell count (53.12\%, 17,163/32,311 observations) was the most prevalent sepsis-2 (SIRS) criterion confronted in the ICU. In addition, the two-factored sepsis criterion of high respiratory rate (≥22 breaths/minute) and altered mental status (28.19\%, among four possible qSOFA scenarios besides no sepsis) was the most prevalent sepsis-3 (qSOFA) scenario, and the three-factored sepsis criterion of tachypnea, high heart rate, and high white blood cell count (12.32\%, among 11 possible scenarios besides no sepsis) was the most prevalent sepsis-2 (SIRS) scenario in the ICU. Moreover, the absolute Pearson correlation coefficients were not significant, thereby nullifying the likelihood of any linear correlation among the critical parameters and assuring the lack of multicollinearity between the parameters. Although this further bolsters evidence for their dichotomy, the absence of multicollinearity cannot guarantee that two random variables are statistically independent. Conclusions: Quantifying the prevalence of the qSOFA criteria of sepsis-3 in comparison with the SIRS criteria of sepsis-2, and understanding the underlying dichotomy among these parameters provides significant inferences for sepsis treatment initiatives in the ICU and informing hospital resource allocation. These data-driven results further offer design implications for multiparameter intelligent sepsis prediction in the ICU. ", doi="10.2196/18352", url="https://medinform.jmir.org/2020/12/e18352", url="http://www.ncbi.nlm.nih.gov/pubmed/33270030" } @Article{info:doi/10.2196/24048, author="Plante, B. Timothy and Blau, M. Aaron and Berg, N. Adrian and Weinberg, S. Aaron and Jun, C. Ik and Tapson, F. Victor and Kanigan, S. Tanya and Adib, B. 
Artur", title="Development and External Validation of a Machine Learning Tool to Rule Out COVID-19 Among Adults in the Emergency Department Using Routine Blood Tests: A Large, Multicenter, Real-World Study", journal="J Med Internet Res", year="2020", month="Dec", day="2", volume="22", number="12", pages="e24048", keywords="COVID-19", keywords="SARS-CoV-2", keywords="machine learning", keywords="artificial intelligence", keywords="electronic medical records", keywords="laboratory results", keywords="development", keywords="validation", keywords="testing", keywords="model", keywords="emergency department", abstract="Background: Conventional diagnosis of COVID-19 with reverse transcription polymerase chain reaction (RT-PCR) testing (hereafter, PCR) is associated with prolonged time to diagnosis and significant costs to run the test. The SARS-CoV-2 virus might lead to characteristic patterns in the results of widely available, routine blood tests that could be identified with machine learning methodologies. Machine learning modalities integrating findings from these common laboratory test results might accelerate ruling out COVID-19 in emergency department patients. Objective: We sought to develop (ie, train and internally validate with cross-validation techniques) and externally validate a machine learning model to rule out COVID 19 using only routine blood tests among adults in emergency departments. Methods: Using clinical data from emergency departments (EDs) from 66 US hospitals before the pandemic (before the end of December 2019) or during the pandemic (March-July 2020), we included patients aged ?20 years in the study time frame. We excluded those with missing laboratory results. Model training used 2183 PCR-confirmed cases from 43 hospitals during the pandemic; negative controls were 10,000 prepandemic patients from the same hospitals. External validation used 23 hospitals with 1020 PCR-confirmed cases and 171,734 prepandemic negative controls. The main outcome was COVID 19 status predicted using same-day routine laboratory results. Model performance was assessed with area under the receiver operating characteristic (AUROC) curve as well as sensitivity, specificity, and negative predictive value (NPV). Results: Of 192,779 patients included in the training, external validation, and sensitivity data sets (median age decile 50 [IQR 30-60] years, 40.5\% male [78,249/192,779]), AUROC for training and external validation was 0.91 (95\% CI 0.90-0.92). Using a risk score cutoff of 1.0 (out of 100) in the external validation data set, the model achieved sensitivity of 95.9\% and specificity of 41.7\%; with a cutoff of 2.0, sensitivity was 92.6\% and specificity was 59.9\%. At the cutoff of 2.0, the NPVs at a prevalence of 1\%, 10\%, and 20\% were 99.9\%, 98.6\%, and 97\%, respectively. Conclusions: A machine learning model developed with multicenter clinical data integrating commonly collected ED laboratory data demonstrated high rule-out accuracy for COVID-19 status, and might inform selective use of PCR-based testing. 
", doi="10.2196/24048", url="https://www.jmir.org/2020/12/e24048", url="http://www.ncbi.nlm.nih.gov/pubmed/33226957" } @Article{info:doi/10.2196/17487, author="Jhang, Jing-Yi and Tzeng, I-Shiang and Chou, Hsin-Hua and Jang, Shih-Jung and Hsieh, Chien-An and Ko, Yu-Lin and Huang, Hsuan-Li", title="Association Rule Mining and Prognostic Stratification of 2-Year Longevity in Octogenarians Undergoing Endovascular Therapy for Lower Extremity Arterial Disease: Observational Cohort Study", journal="J Med Internet Res", year="2020", month="Dec", day="1", volume="22", number="12", pages="e17487", keywords="endovascular therapy", keywords="lower extremity arterial disease", keywords="octogenarians", keywords="longevity", keywords="association rules", keywords="older people", keywords="prognosis", keywords="risk", keywords="medical informatics", keywords="clinical informatics", abstract="Background: Two-year longevity is a crucial consideration in revascularization strategies for patients with symptomatic lower extremity arterial disease (LEAD). However, factors associated with 2-year longevity and risk stratification in octogenarians or nonagenarians have been underreported. Objective: This paper aims to investigate the associated variables and stratify the 2-year prognosis in older patients with LEAD. Methods: We performed logistic regression and association rule mining based on the Apriori algorithm to discover independent variables and validate their associations with 2-year longevity. Malnutrition, inflammation, and stroke factors were identified. C statistics and Kaplan-Meier analysis were used to assess the impact of different numbers of malnutrition, inflammation, and stroke factors on 2-year longevity. Results: We recruited a total of 232 octogenarians or nonagenarians (mean age 85 years, SD 4.2 years) treated with endovascular therapy. During the study period, 81 patients died, and 27 of those (33\%) died from a cardiac origin within 2 years. Association rules analysis showed the interrelationships between 2-year longevity and the neutrophil-lymphocyte ratio (NLR) and nutritional status as determined by the Controlling Nutritional Status (CONUT) score or Geriatric Nutritional Risk Index (GNRI). The cut-off values of NLR, GNRI, and CONUT were ?3.89, ?90.3, and >3, respectively. The C statistics for the predictive power for 2-year longevity were similar between the CONUT score and the GNRI-based models (0.773 vs 0.760; P=.57). The Kaplan-Meier analysis showed that 2-year longevity was worse as the number of malnutrition, inflammation, and stroke factors increased from 0 to 3 in both the GNRI-based model (92\% vs 68\% vs 46\% vs 12\%, respectively; P<.001) and the CONUT score model (87\% vs 75\% vs 49\% vs 10\%, respectively; P<.001). The hazard ratio between those with 3 factors and those without was 18.2 (95\% CI 7.0-47.2; P<.001) in the GNRI and 13.6 (95\% CI 5.9-31.5; P<.001) in the CONUT score model. Conclusions: This study demonstrated the association and crucial role of malnutrition, inflammation, and stroke factors in assessing 2-year longevity in older patients with LEAD. Using this simple risk score might assist clinicians in selecting the appropriate treatment. 
", doi="10.2196/17487", url="https://www.jmir.org/2020/12/e17487", url="http://www.ncbi.nlm.nih.gov/pubmed/33177036" } @Article{info:doi/10.2196/24375, author="Jiang, Huizhen and Li, Yuanjie and Zeng, Xuejun and Xu, Na and Zhao, Congpu and Zhang, Jing and Zhu, Weiguo", title="Exploring Fever of Unknown Origin Intelligent Diagnosis Based on Clinical Data: Model Development and Validation", journal="JMIR Med Inform", year="2020", month="Nov", day="30", volume="8", number="11", pages="e24375", keywords="fever of unknown origin", keywords="intelligent diagnosis", keywords="machine learning", keywords="BERT", keywords="fever", keywords="misdiagnosis", abstract="Background: Fever of unknown origin (FUO) is a group of diseases with heterogeneous complex causes that are misdiagnosed or have delayed diagnoses. Previous studies have focused mainly on the statistical analysis and research of the cases. The treatments are very different for the different categories of FUO. Therefore, how to intelligently diagnose FUO into one category is worth studying. Objective: We aimed to fuse all of the medical data together to automatically predict the categories of the causes of FUO among patients using a machine learning method, which could help doctors diagnose FUO more accurately. Methods: In this paper, we innovatively and manually built the FUO intelligent diagnosis (FID) model to help clinicians predict the category of the cause and improve the manual diagnostic precision. First, we classified FUO cases into four categories (infections, immune diseases, tumors, and others) according to the large numbers of different causes and treatment methods. Then, we cleaned the basic information data and clinical laboratory results and structured the electronic medical record (EMR) data using the bidirectional encoder representations from transformers (BERT) model. Next, we extracted the features based on the structured sample data and trained the FID model using LightGBM. Results: Experiments were based on data from 2299 desensitized cases from Peking Union Medical College Hospital. From the extensive experiments, the precision of the FID model was 81.68\% for top 1 classification diagnosis and 96.17\% for top 2 classification diagnosis, which were superior to the precision of the comparative method. Conclusions: The FID model showed excellent performance in FUO diagnosis and thus would be a potentially useful tool for clinicians to enhance the precision of FUO diagnosis and reduce the rate of misdiagnosis. ", doi="10.2196/24375", url="http://medinform.jmir.org/2020/11/e24375/", url="http://www.ncbi.nlm.nih.gov/pubmed/33172835" } @Article{info:doi/10.2196/23930, author="Maarseveen, D. Tjardo and Meinderink, Timo and Reinders, T. Marcel J. and Knitza, Johannes and Huizinga, J. Tom W. and Kleyer, Arnd and Simon, David and van den Akker, B. Erik and Knevel, Rachel", title="Machine Learning Electronic Health Record Identification of Patients with Rheumatoid Arthritis: Algorithm Pipeline Development and Validation Study", journal="JMIR Med Inform", year="2020", month="Nov", day="30", volume="8", number="11", pages="e23930", keywords="Supervised machine learning", keywords="Electronic Health Records", keywords="Natural Language Processing", keywords="Support Vector Machine", keywords="Gradient Boosting", keywords="Rheumatoid Arthritis", abstract="Background: Financial codes are often used to extract diagnoses from electronic health records. This approach is prone to false positives. 
Alternatively, queries are constructed, but these are highly center and language specific. A tantalizing alternative is the automatic identification of patients by employing machine learning on format-free text entries. Objective: The aim of this study was to develop an easily implementable workflow that builds a machine learning algorithm capable of accurately identifying patients with rheumatoid arthritis from format-free text fields in electronic health records. Methods: Two electronic health record data sets were employed: Leiden (n=3000) and Erlangen (n=4771). Using a portion of the Leiden data (n=2000), we compared 6 different machine learning methods and a na{\"i}ve word-matching algorithm using 10-fold cross-validation. Performances were compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision recall curve (AUPRC), and F1 score was used as the primary criterion for selecting the best method to build a classifying algorithm. We selected the optimal threshold of positive predictive value for case identification based on the output of the best method in the training data. This validation workflow was subsequently applied to a portion of the Erlangen data (n=4293). For testing, the best performing methods were applied to remaining data (Leiden n=1000; Erlangen n=478) for an unbiased evaluation. Results: For the Leiden data set, the word-matching algorithm demonstrated mixed performance (AUROC 0.90; AUPRC 0.33; F1 score 0.55), and 4 methods significantly outperformed word-matching, with support vector machines performing best (AUROC 0.98; AUPRC 0.88; F1 score 0.83). Applying this support vector machine classifier to the test data resulted in a similarly high performance (F1 score 0.81; positive predictive value [PPV] 0.94), and with this method, we could identify 2873 patients with rheumatoid arthritis in less than 7 seconds out of the complete collection of 23,300 patients in the Leiden electronic health record system. For the Erlangen data set, gradient boosting performed best (AUROC 0.94; AUPRC 0.85; F1 score 0.82) in the training set, and applied to the test data, resulted once again in good results (F1 score 0.67; PPV 0.97). Conclusions: We demonstrate that machine learning methods can extract the records of patients with rheumatoid arthritis from electronic health record data with high precision, allowing research on very large populations for limited costs. Our approach is language and center independent and could be applied to any type of diagnosis. We have developed our pipeline into a universally applicable and easy-to-implement workflow to equip centers with their own high-performing algorithm. This allows the creation of observational studies of unprecedented size covering different countries for low cost from already available data in electronic health record systems. ", doi="10.2196/23930", url="http://medinform.jmir.org/2020/11/e23930/", url="http://www.ncbi.nlm.nih.gov/pubmed/33252349" } @Article{info:doi/10.2196/18559, author="Getahun, Darios and Shi, M. Jiaxiao and Chandra, Malini and Fassett, J. Michael and Alexeeff, Stacey and Im, M. Theresa and Chiu, Y. Vicki and Armstrong, Anne Mary and Xie, Fagen and Stern, Julie and Takhar, S. 
Harpreet and Asiimwe, Alex and Raine-Bennett, Tina", title="Identifying Ectopic Pregnancy in a Large Integrated Health Care Delivery System: Algorithm Validation", journal="JMIR Med Inform", year="2020", month="Nov", day="30", volume="8", number="11", pages="e18559", keywords="ectopic pregnancy", keywords="pregnancy", keywords="validation", keywords="predictive value", keywords="electronic health records", keywords="electronic database", abstract="Background: Surveillance of ectopic pregnancy (EP) using electronic databases is important. To our knowledge, no published study has assessed the validity of EP case ascertainment using electronic health records. Objective: We aimed to assess the validity of an enhanced version of a previously validated algorithm, which used a combination of encounters with EP-related diagnostic/procedure codes and methotrexate injections. Methods: Medical records of 500 women aged 15-44 years with membership at Kaiser Permanente Southern and Northern California between 2009 and 2018 and a potential EP were randomly selected for chart review, and true cases were identified. The enhanced algorithm included diagnostic/procedure codes from the International Classification of Diseases, Tenth Revision, used telephone appointment visits, and excluded cases with only abdominal EP diagnosis codes. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and overall performance (Youden index and F-score) of the algorithm were evaluated and compared to the validated algorithm. Results: There were 334 true positive and 166 true negative EP cases with available records. True positive and true negative EP cases did not differ significantly according to maternal age, race/ethnicity, and smoking status. EP cases with only one encounter and non-tubal EPs were more likely to be misclassified. The sensitivity, specificity, PPV, and NPV of the enhanced algorithm for EP were 97.6\%, 84.9\%, 92.9\%, and 94.6\%, respectively. The Youden index and F-score were 82.5\% and 95.2\%, respectively. The sensitivity and NPV were lower for the previously published algorithm at 94.3\% and 88.1\%, respectively. The sensitivity of surgical procedure codes from electronic chart abstraction to correctly identify surgical management was 91.9\%. The overall accuracy, defined as the percentage of EP cases with correct management (surgical, medical, and unclassified) identified by electronic chart abstraction, was 92.3\%. Conclusions: The performance of the enhanced algorithm for EP case ascertainment in integrated health care databases is adequate to allow for use in future epidemiological studies. Use of this algorithm will likely result in better capture of true EP cases than the previously validated algorithm. ", doi="10.2196/18559", url="http://medinform.jmir.org/2020/11/e18559/", url="http://www.ncbi.nlm.nih.gov/pubmed/33141678" } @Article{info:doi/10.2196/19761, author="Mohammadi, Ramin and Jain, Sarthak and Namin, T. 
Amir and Scholem Heller, Melissa and Palacholla, Ramya and Kamarthi, Sagar and Wallace, Byron", title="Predicting Unplanned Readmissions Following a Hip or Knee Arthroplasty: Retrospective Observational Study", journal="JMIR Med Inform", year="2020", month="Nov", day="27", volume="8", number="11", pages="e19761", keywords="deep learning", keywords="natural language processing", keywords="electronic health records", keywords="auto ML", keywords="30-days readmission", keywords="hip arthroplasty", keywords="knee arthroplasty", abstract="Background: Total joint replacements are high-volume and high-cost procedures that should be monitored for cost and quality control. Models that can identify patients at high risk of readmission might help reduce costs by suggesting who should be enrolled in preventive care programs. Previous models for risk prediction have relied on structured data of patients rather than clinical notes in electronic health records (EHRs). The former approach requires manual feature extraction by domain experts, which may limit the applicability of these models. Objective: This study aims to develop and evaluate a machine learning model for predicting the risk of 30-day readmission following knee and hip arthroplasty procedures. The input data for these models come from raw EHRs. We empirically demonstrate that unstructured free-text notes contain a reasonably predictive signal for this task. Methods: We performed a retrospective analysis of data from 7174 patients at Partners Healthcare collected between 2006 and 2016. These data were split into train, validation, and test sets. These data sets were used to build, validate, and test models to predict unplanned readmission within 30 days of hospital discharge. The proposed models made predictions on the basis of clinical notes, obviating the need for performing manual feature extraction by domain and machine learning experts. The notes that served as model inputs were written by physicians, nurses, pathologists, and others who diagnose and treat patients and may have their own predictions, even if these are not recorded. Results: The proposed models output readmission risk scores (propensities) for each patient. The best models (as selected on a development set) yielded an area under the receiver operating characteristic curve of 0.846 (95\% CI 82.75-87.11) for hip and 0.822 (95\% CI 80.94-86.22) for knee surgery, indicating reasonable discriminative ability. Conclusions: Machine learning models can predict which patients are at a high risk of readmission within 30 days following hip and knee arthroplasty procedures on the basis of notes in EHRs with reasonable discriminative power. Following further validation and empirical demonstration that the models realize predictive performance above that which clinical judgment may provide, such models may be used to build an automated decision support tool to help caretakers identify at-risk patients. ", doi="10.2196/19761", url="https://medinform.jmir.org/2020/11/e19761", url="http://www.ncbi.nlm.nih.gov/pubmed/33245283" } @Article{info:doi/10.2196/17964, author="Cox, Steven and Ahalt, C. 
Stanley and Balhoff, James and Bizon, Chris and Fecho, Karamarie and Kebede, Yaphet and Morton, Kenneth and Tropsha, Alexander and Wang, Patrick and Xu, Hao", title="Visualization Environment for Federated Knowledge Graphs: Development of an Interactive Biomedical Query Language and Web Application Interface", journal="JMIR Med Inform", year="2020", month="Nov", day="23", volume="8", number="11", pages="e17964", keywords="knowledge graphs", keywords="clinical data", keywords="biomedical data", keywords="federation", keywords="ontologies", keywords="semantic harmonization", keywords="visualization", keywords="application programming interface", keywords="translational science", keywords="clinical practice", abstract="Background: Efforts are underway to semantically integrate large biomedical knowledge graphs using common upper-level ontologies to federate graph-oriented application programming interfaces (APIs) to the data. However, federation poses several challenges, including query routing to appropriate knowledge sources, generation and evaluation of answer subsets, semantic merger of those answer subsets, and visualization and exploration of results. Objective: We aimed to develop an interactive environment for query, visualization, and deep exploration of federated knowledge graphs. Methods: We developed a biomedical query language and web application interface---termed the Translator Query Language (TranQL)---to query semantically federated knowledge graphs and explore query results. TranQL uses the Biolink data model as an upper-level biomedical ontology and an API standard that has been adopted by the Biomedical Data Translator Consortium to specify a protocol for expressing a query as a graph of Biolink data elements compiled from statements in the TranQL query language. Queries are mapped to federated knowledge sources, and answers are merged into a knowledge graph, with mappings between the knowledge graph and specific elements of the query. The TranQL interactive web application includes a user interface to support user exploration of the federated knowledge graph. Results: We developed 2 real-world use cases to validate TranQL and address biomedical questions of relevance to translational science. The use cases posed questions that traversed 2 federated Translator API endpoints: Integrated Clinical and Environmental Exposures Service (ICEES) and Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways (ROBOKOP). ICEES provides open access to observational clinical and environmental data, and ROBOKOP provides access to linked biomedical entities, such as ``gene,'' ``chemical substance,'' and ``disease,'' that are derived largely from curated public data sources. We successfully posed queries to TranQL that traversed these endpoints and retrieved answers that we visualized and evaluated. Conclusions: TranQL can be used to ask questions of relevance to translational science, rapidly obtain answers that require assertions from a federation of knowledge sources, and provide valuable insights for translational research and clinical practice. 
", doi="10.2196/17964", url="http://medinform.jmir.org/2020/11/e17964/", url="http://www.ncbi.nlm.nih.gov/pubmed/33226347" } @Article{info:doi/10.2196/19679, author="Jeong, Seung-Hyun and Lee, Rim Tae and Kang, Bae Jung and Choi, Mun-Taek", title="Analysis of Health Insurance Big Data for Early Detection of Disabilities: Algorithm Development and Validation", journal="JMIR Med Inform", year="2020", month="Nov", day="23", volume="8", number="11", pages="e19679", keywords="early detection of disabilities", keywords="health insurance", keywords="big data", keywords="feature selection", keywords="classification", abstract="Background: Early detection of childhood developmental delays is very important for the treatment of disabilities. Objective: To investigate the possibility of detecting childhood developmental delays leading to disabilities before clinical registration by analyzing big data from a health insurance database. Methods: In this study, the data from children, individuals aged up to 13 years (n=2412), from the Sample Cohort 2.0 DB of the Korea National Health Insurance Service were organized by age range. Using 6 categories (having no disability, having a physical disability, having a brain lesion, having a visual impairment, having a hearing impairment, and having other conditions), features were selected in the order of importance with a tree-based model. We used multiple classification algorithms to find the best model for each age range. The earliest age range with clinically significant performance showed the age at which conditions can be detected early. Results: The disability detection model showed that it was possible to detect disabilities with significant accuracy even at the age of 4 years, about a year earlier than the mean diagnostic age of 4.99 years. Conclusions: Using big data analysis, we discovered the possibility of detecting disabilities earlier than clinical diagnoses, which would allow us to take appropriate action to prevent disabilities. ", doi="10.2196/19679", url="http://medinform.jmir.org/2020/11/e19679/", url="http://www.ncbi.nlm.nih.gov/pubmed/33226352" } @Article{info:doi/10.2196/22421, author="Sandhu, Sahil and Lin, L. Anthony and Brajer, Nathan and Sperling, Jessica and Ratliff, William and Bedoya, D. Armando and Balu, Suresh and O'Brien, Cara and Sendak, P. Mark", title="Integrating a Machine Learning System Into Clinical Workflows: Qualitative Study", journal="J Med Internet Res", year="2020", month="Nov", day="19", volume="22", number="11", pages="e22421", keywords="machine learning", keywords="sepsis", keywords="qualitative research", keywords="hospital rapid response team", keywords="emergency medicine", abstract="Background: Machine learning models have the potential to improve diagnostic accuracy and management of acute conditions. Despite growing efforts to evaluate and validate such models, little is known about how to best translate and implement these products as part of routine clinical care. Objective: This study aims to explore the factors influencing the integration of a machine learning sepsis early warning system (Sepsis Watch) into clinical workflows. Methods: We conducted semistructured interviews with 15 frontline emergency department physicians and rapid response team nurses who participated in the Sepsis Watch quality improvement initiative. Interviews were audio recorded and transcribed. We used a modified grounded theory approach to identify key themes and analyze qualitative data. 
Results: A total of 3 dominant themes emerged: perceived utility and trust, implementation of Sepsis Watch processes, and workforce considerations. Participants described their unfamiliarity with machine learning models. As a result, clinician trust was influenced by the perceived accuracy and utility of the model from personal program experience. Implementation of Sepsis Watch was facilitated by the easy-to-use tablet application and communication strategies that were developed by nurses to share model outputs with physicians. Barriers included the flow of information among clinicians and gaps in knowledge about the model itself and broader workflow processes. Conclusions: This study generated insights into how frontline clinicians perceived machine learning models and the barriers to integrating them into clinical workflows. These findings can inform future efforts to implement machine learning interventions in real-world settings and maximize the adoption of these interventions. ", doi="10.2196/22421", url="http://www.jmir.org/2020/11/e22421/", url="http://www.ncbi.nlm.nih.gov/pubmed/33211015" } @Article{info:doi/10.2196/19489, author="Poly, Nasrin Tahmina and Islam, Md.Mohaimenul and Muhtar, Solihuddin Muhammad and Yang, Hsuan-Chia and Nguyen, (Alex) Phung Anh and Li, (Jack) Yu-Chuan", title="Machine Learning Approach to Reduce Alert Fatigue Using a Disease Medication--Related Clinical Decision Support System: Model Development and Validation", journal="JMIR Med Inform", year="2020", month="Nov", day="19", volume="8", number="11", pages="e19489", keywords="clinical decision support system", keywords="alert fatigue", keywords="machine learning", keywords="artificial neural network", abstract="Background: Computerized physician order entry (CPOE) systems are incorporated into clinical decision support systems (CDSSs) to reduce medication errors and improve patient safety. Automatic alerts generated from CDSSs can directly assist physicians in making useful clinical decisions and can help shape prescribing behavior. Multiple studies reported that approximately 90\%-96\% of alerts are overridden by physicians, which raises questions about the effectiveness of CDSSs. There is intense interest in developing sophisticated methods to combat alert fatigue, but there is no consensus on the optimal approaches so far. Objective: Our objective was to develop machine learning prediction models to predict physicians' responses in order to reduce alert fatigue from disease medication--related CDSSs. Methods: We collected data from a disease medication--related CDSS from a university teaching hospital in Taiwan. We considered prescriptions that triggered alerts in the CDSS between August 2018 and May 2019. Machine learning models, such as artificial neural network (ANN), random forest (RF), na{\"i}ve Bayes (NB), gradient boosting (GB), and support vector machine (SVM), were used to develop prediction models. The data were randomly split into training (80\%) and testing (20\%) datasets. Results: A total of 6453 prescriptions were used in our model. The ANN machine learning prediction model demonstrated excellent discrimination (area under the receiver operating characteristic curve [AUROC] 0.94; accuracy 0.85), whereas the RF, NB, GB, and SVM models had AUROCs of 0.93, 0.91, 0.91, and 0.80, respectively. The sensitivity and specificity of the ANN model were 0.87 and 0.83, respectively. 
Conclusions: In this study, ANN showed substantially better performance in predicting individual physician responses to an alert from a disease medication--related CDSS, as compared to the other models. To our knowledge, this is the first study to use machine learning models to predict physician responses to alerts; furthermore, it can help to develop sophisticated CDSSs in real-world clinical settings. ", doi="10.2196/19489", url="https://medinform.jmir.org/2020/11/e19489", url="http://www.ncbi.nlm.nih.gov/pubmed/33211018" } @Article{info:doi/10.2196/16503, author="Sufriyana, Herdiantri and Husnayain, Atina and Chen, Ya-Lin and Kuo, Chao-Yang and Singh, Onkar and Yeh, Tso-Yang and Wu, Yu-Wei and Su, Chia-Yu Emily", title="Comparison of Multivariable Logistic Regression and Other Machine Learning Algorithms for Prognostic Prediction Studies in Pregnancy Care: Systematic Review and Meta-Analysis", journal="JMIR Med Inform", year="2020", month="Nov", day="17", volume="8", number="11", pages="e16503", keywords="machine learning", keywords="pregnancy complications", keywords="prognosis", keywords="clinical prediction rule", keywords="meta-analysis", keywords="systematic review", abstract="Background: Predictions in pregnancy care are complex because of interactions among multiple factors. Hence, pregnancy outcomes are not easily predicted by a single predictor using only one algorithm or modeling method. Objective: This study aims to review and compare the predictive performances between logistic regression (LR) and other machine learning algorithms for developing or validating a multivariable prognostic prediction model for pregnancy care to inform clinicians' decision making. Methods: Research articles from MEDLINE, Scopus, Web of Science, and Google Scholar were reviewed following several guidelines for a prognostic prediction study, including a risk of bias (ROB) assessment. We report the results based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Studies were primarily framed as PICOTS (population, index, comparator, outcomes, timing, and setting): Population: men or women in procreative management, pregnant women, and fetuses or newborns; Index: multivariable prognostic prediction models using non-LR algorithms for risk classification to inform clinicians' decision making; Comparator: the models applying an LR; Outcomes: pregnancy-related outcomes of procreation or pregnancy outcomes for pregnant women and fetuses or newborns; Timing: pre-, inter-, and peripregnancy periods (predictors), at the pregnancy, delivery, and either puerperal or neonatal period (outcome), and either short- or long-term prognoses (time interval); and Setting: primary care or hospital. The results were synthesized by reporting study characteristics and ROBs and by random effects modeling of the difference of the logit area under the receiver operating characteristic curve of each non-LR model compared with the LR model for the same pregnancy outcomes. We also reported between-study heterogeneity by using $\tau^2$ and $I^2$. Results: Of the 2093 records, we included 142 studies for the systematic review and 62 studies for a meta-analysis. Most prediction models used LR (92/142, 64.8\%) and artificial neural networks (20/142, 14.1\%) among non-LR algorithms. Only 16.9\% (24/142) of studies had a low ROB. A total of 2 non-LR algorithms from low ROB studies significantly outperformed LR. 
The first algorithm was a random forest for preterm delivery (logit AUROC 2.51, 95\% CI 1.49-3.53; $I^2$=86\%; $\tau^2$=0.77) and pre-eclampsia (logit AUROC 1.2, 95\% CI 0.72-1.67; $I^2$=75\%; $\tau^2$=0.09). The second algorithm was gradient boosting for cesarean section (logit AUROC 2.26, 95\% CI 1.39-3.13; $I^2$=75\%; $\tau^2$=0.43) and gestational diabetes (logit AUROC 1.03, 95\% CI 0.69-1.37; $I^2$=83\%; $\tau^2$=0.07). Conclusions: Prediction models with the best performances across studies were not necessarily those that used LR but also used random forest and gradient boosting that also performed well. We recommend a reanalysis of existing LR models for several pregnancy outcomes by comparing them with those algorithms that apply standard guidelines. Trial Registration: PROSPERO (International Prospective Register of Systematic Reviews) CRD42019136106; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=136106 ", doi="10.2196/16503", url="http://medinform.jmir.org/2020/11/e16503/", url="http://www.ncbi.nlm.nih.gov/pubmed/33200995" } @Article{info:doi/10.2196/24225, author="Kim, Hyung-Jun and Han, Deokjae and Kim, Jeong-Han and Kim, Daehyun and Ha, Beomman and Seog, Woong and Lee, Yeon-Kyeng and Lim, Dosang and Hong, Ok Sung and Park, Mi-Jin and Heo, JoonNyung", title="An Easy-to-Use Machine Learning Model to Predict the Prognosis of Patients With COVID-19: Retrospective Cohort Study", journal="J Med Internet Res", year="2020", month="Nov", day="9", volume="22", number="11", pages="e24225", keywords="COVID-19", keywords="machine learning", keywords="prognosis", keywords="SARS-CoV-2", keywords="severe acute respiratory syndrome coronavirus 2", abstract="Background: Prioritizing patients in need of intensive care is necessary to reduce the mortality rate during the COVID-19 pandemic. Although several scoring methods have been introduced, many require laboratory or radiographic findings that are not always easily available. Objective: The purpose of this study was to develop a machine learning model that predicts the need for intensive care for patients with COVID-19 using easily obtainable characteristics---baseline demographics, comorbidities, and symptoms. Methods: A retrospective study was performed using a nationwide cohort in South Korea. Patients admitted to 100 hospitals from January 25, 2020, to June 3, 2020, were included. Patient information was collected retrospectively by the attending physicians in each hospital and uploaded to an online case report form. Variables that could be easily provided were extracted. The variables were age, sex, smoking history, body temperature, comorbidities, activities of daily living, and symptoms. The primary outcome was the need for intensive care, defined as admission to the intensive care unit, use of extracorporeal life support, mechanical ventilation, vasopressors, or death within 30 days of hospitalization. Patients admitted until March 20, 2020, were included in the derivation group to develop prediction models using an automated machine learning technique. The models were externally validated in patients admitted after March 21, 2020. The machine learning model with the best discrimination performance was selected and compared against the CURB-65 (confusion, urea, respiratory rate, blood pressure, and 65 years of age or older) score using the area under the receiver operating characteristic curve (AUC). 
Results: A total of 4787 patients were included in the analysis, of which 3294 were assigned to the derivation group and 1493 to the validation group. Among the 4787 patients, 460 (9.6\%) patients needed intensive care. Of the 55 machine learning models developed, the XGBoost model revealed the highest discrimination performance. The AUC of the XGBoost model was 0.897 (95\% CI 0.877-0.917) for the derivation group and 0.885 (95\% CI 0.855-0.915) for the validation group. Both the AUCs were superior to those of CURB-65, which were 0.836 (95\% CI 0.825-0.847) and 0.843 (95\% CI 0.829-0.857), respectively. Conclusions: We developed a machine learning model comprising simple patient-provided characteristics, which can efficiently predict the need for intensive care among patients with COVID-19. ", doi="10.2196/24225", url="http://www.jmir.org/2020/11/e24225/", url="http://www.ncbi.nlm.nih.gov/pubmed/33108316" } @Article{info:doi/10.2196/22689, author="Luo, Gang and Nau, L. Claudia and Crawford, W. William and Schatz, Michael and Zeiger, S. Robert and Rozema, Emily and Koebnick, Corinna", title="Developing a Predictive Model for Asthma-Related Hospital Encounters in Patients With Asthma in a Large, Integrated Health Care System: Secondary Analysis", journal="JMIR Med Inform", year="2020", month="Nov", day="9", volume="8", number="11", pages="e22689", keywords="asthma", keywords="forecasting", keywords="machine learning", keywords="patient care management", keywords="risk factors", abstract="Background: Asthma causes numerous hospital encounters annually, including emergency department visits and hospitalizations. To improve patient outcomes and reduce the number of these encounters, predictive models are widely used to prospectively pinpoint high-risk patients with asthma for preventive care via care management. However, previous models do not have adequate accuracy to achieve this goal well. Adopting the modeling guideline for checking extensive candidate features, we recently constructed a machine learning model on Intermountain Healthcare data to predict asthma-related hospital encounters in patients with asthma. Although this model is more accurate than the previous models, whether our modeling guideline is generalizable to other health care systems remains unknown. Objective: This study aims to assess the generalizability of our modeling guideline to Kaiser Permanente Southern California (KPSC). Methods: The patient cohort included a random sample of 70.00\% (397,858/568,369) of patients with asthma who were enrolled in a KPSC health plan for any duration between 2015 and 2018. We produced a machine learning model via a secondary analysis of 987,506 KPSC data instances from 2012 to 2017 and by checking 337 candidate features to project asthma-related hospital encounters in the following 12-month period in patients with asthma. Results: Our model reached an area under the receiver operating characteristic curve of 0.820. When the cutoff point for binary classification was placed at the top 10.00\% (20,474/204,744) of patients with asthma having the largest predicted risk, our model achieved an accuracy of 90.08\% (184,435/204,744), a sensitivity of 51.90\% (2259/4353), and a specificity of 90.91\% (182,176/200,391). Conclusions: Our modeling guideline exhibited acceptable generalizability to KPSC and resulted in a model that is more accurate than those formerly built by others. After further enhancement, our model could be used to guide asthma care management. 
International Registered Report Identifier (IRRID): RR2-10.2196/resprot.5039 ", doi="10.2196/22689", url="http://medinform.jmir.org/2020/11/e22689/", url="http://www.ncbi.nlm.nih.gov/pubmed/33164906" } @Article{info:doi/10.2196/19069, author="Zhong, Xiaorong and Luo, Ting and Deng, Ling and Liu, Pei and Hu, Kejia and Lu, Donghao and Zheng, Dan and Luo, Chuanxu and Xie, Yuxin and Li, Jiayuan and He, Ping and Pu, Tianjie and Ye, Feng and Bu, Hong and Fu, Bo and Zheng, Hong", title="Multidimensional Machine Learning Personalized Prognostic Model in an Early Invasive Breast Cancer Population-Based Cohort in China: Algorithm Validation Study", journal="JMIR Med Inform", year="2020", month="Nov", day="9", volume="8", number="11", pages="e19069", keywords="breast cancer", keywords="prognosis", keywords="machine learning", keywords="prediction model", abstract="Background: Current online prognostic prediction models for breast cancer, such as Adjuvant! Online and PREDICT, are based on specific populations. They have been well validated and widely used in the United States and Western Europe; however, several validation attempts in non-European countries have revealed suboptimal predictions. Objective: We aimed to develop an advanced breast cancer prognosis model for disease progression, cancer-specific mortality, and all-cause mortality by integrating tumor, demographic, and treatment characteristics from a large breast cancer cohort in China. Methods: This study was approved by the Clinical Test and Biomedical Ethics Committee of West China Hospital, Sichuan University on May 17, 2012. Data collection for this project was started in May 2017 and ended in March 2019. Data on 5293 women diagnosed with stage I to III invasive breast cancer between 2000 and 2013 were collected. Disease progression, cancer-specific mortality, all-cause mortality, and the likelihood of disease progression or death within a 5-year period were predicted. Extreme gradient boosting was used to develop the prediction model. Model performance was assessed by calculating the area under the receiver operating characteristic curve (AUROC), and the model was calibrated and compared with PREDICT. Results: The training, test, and validation sets comprised 3276 (499 progressions, 202 breast cancer-specific deaths, and 261 all-cause deaths within 5-year follow-up), 1405 (211 progressions, 94 breast cancer-specific deaths, and 129 all-cause deaths), and 612 (109 progressions, 33 breast cancer-specific deaths, and 37 all-cause deaths) women, respectively. The AUROC values for disease progression, cancer-specific mortality, and all-cause mortality were 0.76, 0.88, and 0.82 for training set; 0.79, 0.80, and 0.83 for the test set; and 0.79, 0.84, and 0.88 for the validation set, respectively. Calibration analysis demonstrated good agreement between predicted and observed events within 5 years. Comparable AUROC and calibration results were confirmed in different age, residence status, and receptor status subgroups. Compared with PREDICT, our model showed similar AUROC and improved calibration values. Conclusions: Our prognostic model exhibits high discrimination and good calibration. It may facilitate prognosis prediction and clinical decision making for patients with breast cancer in China. 
", doi="10.2196/19069", url="http://medinform.jmir.org/2020/11/e19069/", url="http://www.ncbi.nlm.nih.gov/pubmed/33164899" } @Article{info:doi/10.2196/21252, author="Spasic, Irena and Button, Kate", title="Patient Triage by Topic Modeling of Referral Letters: Feasibility Study", journal="JMIR Med Inform", year="2020", month="Nov", day="6", volume="8", number="11", pages="e21252", keywords="natural language processing", keywords="machine learning", keywords="data science", keywords="medical informatics", keywords="computer-assisted decision making", abstract="Background: Musculoskeletal conditions are managed within primary care, but patients can be referred to secondary care if a specialist opinion is required. The ever-increasing demand for health care resources emphasizes the need to streamline care pathways with the ultimate aim of ensuring that patients receive timely and optimal care. Information contained in referral letters underpins the referral decision-making process but is yet to be explored systematically for the purposes of treatment prioritization for musculoskeletal conditions. Objective: This study aims to explore the feasibility of using natural language processing and machine learning to automate the triage of patients with musculoskeletal conditions by analyzing information from referral letters. Specifically, we aim to determine whether referral letters can be automatically assorted into latent topics that are clinically relevant, that is, considered relevant when prescribing treatments. Here, clinical relevance is assessed by posing 2 research questions. Can latent topics be used to automatically predict treatment? Can clinicians interpret latent topics as cohorts of patients who share common characteristics or experiences such as medical history, demographics, and possible treatments? Methods: We used latent Dirichlet allocation to model each referral letter as a finite mixture over an underlying set of topics and model each topic as an infinite mixture over an underlying set of topic probabilities. The topic model was evaluated in the context of automating patient triage. Given a set of treatment outcomes, a binary classifier was trained for each outcome using previously extracted topics as the input features of the machine learning algorithm. In addition, a qualitative evaluation was performed to assess the human interpretability of topics. Results: The prediction accuracy of binary classifiers outperformed the stratified random classifier by a large margin, indicating that topic modeling could be used to predict the treatment, thus effectively supporting patient triage. The qualitative evaluation confirmed the high clinical interpretability of the topic model. Conclusions: The results established the feasibility of using natural language processing and machine learning to automate triage of patients with knee or hip pain by analyzing information from their referral letters. ", doi="10.2196/21252", url="https://medinform.jmir.org/2020/11/e21252", url="http://www.ncbi.nlm.nih.gov/pubmed/33155985" } @Article{info:doi/10.2196/23351, author="Yoo, Junsang and Lee, Jeonghoon and Rhee, Poong-Lyul and Chang, Kyung Dong and Kang, Mira and Choi, Soo Jong and Bates, W. 
David and Cha, Chul Won", title="Alert Override Patterns With a Medication Clinical Decision Support System in an Academic Emergency Department: Retrospective Descriptive Study", journal="JMIR Med Inform", year="2020", month="Nov", day="4", volume="8", number="11", pages="e23351", keywords="medical order entry systems", keywords="decision support systems", keywords="clinical", keywords="alert fatigue", keywords="health personnel", keywords="clinical decision support system", keywords="alert", keywords="emergency department", keywords="medication", abstract="Background: Physicians' alert overriding behavior is considered to be the most important factor leading to failure of computerized provider order entry (CPOE) combined with a clinical decision support system (CDSS) in achieving its potential adverse drug events prevention effect. Previous studies on this subject have focused on specific diseases or alert types for well-defined targets and particular settings. The emergency department is an optimal environment to examine physicians' alert overriding behaviors from a broad perspective because patients have a wider range of severity, and many receive interdisciplinary care in this environment. However, less than one-tenth of related studies have targeted this physician behavior in an emergency department setting. Objective: The aim of this study was to describe alert override patterns with a commercial medication CDSS in an academic emergency department. Methods: This study was conducted at a tertiary urban academic hospital in the emergency department with an annual census of 80,000 visits. We analyzed data on the patients who visited the emergency department for 18 months and the medical staff who treated them, including the prescription and CPOE alert log. We also performed descriptive analysis and logistic regression for assessing the risk factors for alert overrides. Results: During the study period, 611 physicians cared for 71,546 patients with 101,186 visits. The emergency department physicians encountered 13.75 alerts during every 100 orders entered. Of the total 102,887 alerts, almost two-thirds (65,616, 63.77\%) were overridden. Univariate and multivariate logistic regression analyses identified 21 statistically significant risk factors for emergency department physicians' alert override behavior. Conclusions: In this retrospective study, we described the alert override patterns with a medication CDSS in an academic emergency department. We found relatively low overrides and assessed their contributing factors, including physicians' designation and specialty, patients' severity and chief complaints, and alert and medication type. ", doi="10.2196/23351", url="https://medinform.jmir.org/2020/11/e23351", url="http://www.ncbi.nlm.nih.gov/pubmed/33146626" } @Article{info:doi/10.2196/18735, author="Yu, William Yun and Weber, M. Griffin", title="Balancing Accuracy and Privacy in Federated Queries of Clinical Data Repositories: Algorithm Development and Validation", journal="J Med Internet Res", year="2020", month="Nov", day="3", volume="22", number="11", pages="e18735", keywords="algorithms", keywords="medical records", keywords="privacy", keywords="information storage and retrieval", keywords="medical record linkage", abstract="Background: Over the past decade, the emergence of several large federated clinical data networks has enabled researchers to access data on millions of patients at dozens of health care organizations. 
Typically, queries are broadcast to each of the sites in the network, which then return aggregate counts of the number of matching patients. However, because patients can receive care from multiple sites in the network, simply adding the numbers frequently double counts patients. Various methods such as the use of trusted third parties or secure multiparty computation have been proposed to link patient records across sites. However, they either have large trade-offs in accuracy and privacy or are not scalable to large networks. Objective: This study aims to enable accurate estimates of the number of patients matching a federated query while providing strong guarantees on the amount of protected medical information revealed. Methods: We introduce a novel probabilistic approach to running federated network queries. It combines an algorithm called HyperLogLog with obfuscation in the form of hashing, masking, and homomorphic encryption. It is tunable, in that it allows networks to balance accuracy versus privacy, and it is computationally efficient even for large networks. We built a user-friendly free open-source benchmarking platform to simulate federated queries in large hospital networks. Using this platform, we compare the accuracy, k-anonymity privacy risk (with k=10), and computational runtime of our algorithm with several existing techniques. Results: In simulated queries matching 1 to 100 million patients in a 100-hospital network, our method was significantly more accurate than adding aggregate counts while maintaining k-anonymity. On average, it required a total of 12 kilobytes of data to be sent to the network hub and added only 5 milliseconds to the overall federated query runtime. This was orders of magnitude better than other approaches, which guaranteed the exact answer. Conclusions: Using our method, it is possible to run highly accurate federated queries of clinical data repositories that both protect patient privacy and scale to large networks. ", doi="10.2196/18735", url="https://www.jmir.org/2020/11/e18735", url="http://www.ncbi.nlm.nih.gov/pubmed/33141090" } @Article{info:doi/10.2196/17050, author="Carter-Harris, Lisa and Comer, Skipworth Robert and Slaven II, E. James and Monahan, O. Patrick and Vode, Emilee and Hanna, H. Nasser and Ceppa, Pham DuyKhanh and Rawl, M. Susan", title="Computer-Tailored Decision Support Tool for Lung Cancer Screening: Community-Based Pilot Randomized Controlled Trial", journal="J Med Internet Res", year="2020", month="Nov", day="3", volume="22", number="11", pages="e17050", keywords="lung cancer screening", keywords="informed decision-making", keywords="shared decision-making", keywords="patient decision aid", keywords="patient education", abstract="Background: Lung cancer screening is a US Preventive Services Task Force Grade B recommendation that has been shown to decrease lung cancer-related mortality by approximately 20\%. However, making the decision to screen, or not, for lung cancer is a complex decision because there are potential risks (eg, false positive results, overdiagnosis). Shared decision making was incorporated into the lung cancer screening guideline and, for the first time, is a requirement for reimbursement of a cancer screening test from Medicare. Awareness of lung cancer screening remains low in both the general and screening-eligible populations. 
When a screening-eligible person visits their clinician never having heard about lung cancer screening, engaging in shared decision making to arrive at an informed decision can be a challenge. Methods to effectively prepare patients for these clinical encounters and support both patients and clinicians to engage in these important discussions are needed. Objective: The aim of the study was to estimate the effects of a computer-tailored decision support tool that meets the certification criteria of the International Patient Decision Aid Standards that will prepare individuals and support shared decision making in lung cancer screening decisions. Methods: A pilot randomized controlled trial with a community-based sample of 60 screening-eligible participants who have never been screened for lung cancer was conducted. Approximately half of the participants (n=31) were randomized to view LungTalk---a web-based tailored computer program---while the other half (n=29) viewed generic information about lung cancer screening from the American Cancer Society. The outcomes that were compared included lung cancer and screening knowledge, lung cancer screening health beliefs (perceived risk, perceived benefits, perceived barriers, and self-efficacy), and perception of being prepared to engage in a discussion about lung cancer screening with their clinician. Results: Knowledge scores increased significantly for both groups with greater improvement noted in the group receiving LungTalk (2.33 vs 1.14 mean change). Perceived self-efficacy and perceived benefits improved in the theoretically expected directions. Conclusions: LungTalk goes beyond other decision tools by addressing lung health broadly, in the context of performing a low-dose computed tomography of the chest that has the potential to uncover other conditions of concern beyond lung cancer, to more comprehensively educate the individual, and extends the work of nontailored decision aids in the field by introducing tailoring algorithms and message framing based upon smoking status in order to determine what components of the intervention drive behavior change when an individual is informed and makes the decision whether to be screened or not to be screened for lung cancer. International Registered Report Identifier (IRRID): RR2-10.2196/resprot.8694 ", doi="10.2196/17050", url="https://www.jmir.org/2020/11/e17050", url="http://www.ncbi.nlm.nih.gov/pubmed/33141096" } @Article{info:doi/10.2196/21222, author="Chou, H. Joseph", title="Predictive Models for Neonatal Follow-Up Serum Bilirubin: Model Development and Validation", journal="JMIR Med Inform", year="2020", month="Oct", day="29", volume="8", number="10", pages="e21222", keywords="infant, newborn", keywords="neonatology", keywords="jaundice, neonatal", keywords="hyperbilirubinemia, neonatal", keywords="machine learning", keywords="supervised machine learning", keywords="data science", keywords="medical informatics", keywords="decision support techniques", keywords="models, statistical", keywords="predictive models", abstract="Background: Hyperbilirubinemia affects many newborn infants and, if not treated appropriately, can lead to irreversible brain injury. Objective: This study aims to develop predictive models of follow-up total serum bilirubin measurement and to compare their accuracy with that of clinician predictions. Methods: Subjects were patients born between June 2015 and June 2019 at 4 hospitals in Massachusetts. 
The prediction target was a follow-up total serum bilirubin measurement obtained <72 hours after a previous measurement. Birth before versus after February 2019 was used to generate a training set (27,428 target measurements) and a held-out test set (3320 measurements), respectively. Multiple supervised learning models were trained. To further assess model performance, predictions on the held-out test set were also compared with corresponding predictions from clinicians. Results: The best predictive accuracy on the held-out test set was obtained with the multilayer perceptron (ie, neural network, mean absolute error [MAE] 1.05 mg/dL) and Xgboost (MAE 1.04 mg/dL) models. A limited number of predictors were sufficient for constructing models with the best performance and avoiding overfitting: current bilirubin measurement, last rate of rise, proportion of time under phototherapy, time to next measurement, gestational age at birth, current age, and fractional weight change from birth. Clinicians made a total of 210 prospective predictions. The neural network model accuracy on this subset of predictions had an MAE of 1.06 mg/dL compared with clinician predictions with an MAE of 1.38 mg/dL (P<.0001). In babies born at 35 weeks of gestation or later, this approach was also applied to predict the binary outcome of subsequently exceeding consensus guidelines for phototherapy initiation and achieved an area under the receiver operator characteristic curve of 0.94 (95\% CI 0.91 to 0.97). Conclusions: This study developed predictive models for neonatal follow-up total serum bilirubin measurements that outperform clinicians. This may be the first report of models that predict specific bilirubin values, are not limited to near-term patients without risk factors, and take into account the effect of phototherapy. ", doi="10.2196/21222", url="http://medinform.jmir.org/2020/10/e21222/", url="http://www.ncbi.nlm.nih.gov/pubmed/33118947" } @Article{info:doi/10.2196/19676, author="Trinkley, E. Katy and Kahn, G. Michael and Bennett, D. Tellen and Glasgow, E. Russell and Haugen, Heather and Kao, P. David and Kroehl, E. Miranda and Lin, Chen-Tan and Malone, C. Daniel and Matlock, D. Daniel", title="Integrating the Practical Robust Implementation and Sustainability Model With Best Practices in Clinical Decision Support Design: Implementation Science Approach", journal="J Med Internet Res", year="2020", month="Oct", day="29", volume="22", number="10", pages="e19676", keywords="clinical decision support", keywords="PRISM", keywords="implementation science", abstract="Background: Clinical decision support (CDS) design best practices are intended to provide a narrative representation of factors that influence the success of CDS tools. However, they provide incomplete direction on evidence-based implementation principles. Objective: This study aims to describe an integrated approach toward applying an existing implementation science (IS) framework with CDS design best practices to improve the effectiveness, sustainability, and reproducibility of CDS implementations. Methods: We selected the Practical Robust Implementation and Sustainability Model (PRISM) IS framework. We identified areas where PRISM and CDS design best practices complemented each other and defined methods to address each. Lessons learned from applying these methods were then used to further refine the integrated approach. 
Results: Our integrated approach to applying PRISM with CDS design best practices consists of 5 key phases that iteratively interact and inform each other: multilevel stakeholder engagement, designing the CDS, design and usability testing, thoughtful deployment, and performance evaluation and maintenance. The approach is led by a dedicated implementation team that includes clinical informatics and analyst builder expertise. Conclusions: Integrating PRISM with CDS design best practices extends user-centered design and accounts for the multilevel, interacting, and dynamic factors that influence CDS implementation in health care. Integrating PRISM with CDS design best practices synthesizes the many known contextual factors that can influence the success of CDS tools, thereby enhancing the reproducibility and sustainability of CDS implementations. Others can adapt this approach to their situation to maximize and sustain CDS implementation success. ", doi="10.2196/19676", url="http://www.jmir.org/2020/10/e19676/", url="http://www.ncbi.nlm.nih.gov/pubmed/33118943" } @Article{info:doi/10.2196/21801, author="Izquierdo, Luis Jose and Ancochea, Julio and Soriano, B. Joan", title="Clinical Characteristics and Prognostic Factors for Intensive Care Unit Admission of Patients With COVID-19: Retrospective Study Using Machine Learning and Natural Language Processing", journal="J Med Internet Res", year="2020", month="Oct", day="28", volume="22", number="10", pages="e21801", keywords="artificial intelligence", keywords="big data", keywords="COVID-19", keywords="electronic health records", keywords="tachypnea", keywords="SARS-CoV-2", keywords="predictive model", abstract="Background: Many factors involved in the onset and clinical course of the ongoing COVID-19 pandemic are still unknown. Although big data analytics and artificial intelligence are widely used in the realms of health and medicine, researchers are only beginning to use these tools to explore the clinical characteristics and predictive factors of patients with COVID-19. Objective: Our primary objectives are to describe the clinical characteristics and determine the factors that predict intensive care unit (ICU) admission of patients with COVID-19. Determining these factors using a well-defined population can increase our understanding of the real-world epidemiology of the disease. Methods: We used a combination of classic epidemiological methods, natural language processing (NLP), and machine learning (for predictive modeling) to analyze the electronic health records (EHRs) of patients with COVID-19. We explored the unstructured free text in the EHRs within the Servicio de Salud de Castilla-La Mancha (SESCAM) Health Care Network (Castilla-La Mancha, Spain) from the entire population with available EHRs (1,364,924 patients) from January 1 to March 29, 2020. We extracted related clinical information regarding diagnosis, progression, and outcome for all COVID-19 cases. Results: A total of 10,504 patients with a clinical or polymerase chain reaction--confirmed diagnosis of COVID-19 were identified; 5519 (52.5\%) were male, with a mean age of 58.2 years (SD 19.7). Upon admission, the most common symptoms were cough, fever, and dyspnea; however, all three symptoms occurred in fewer than half of the cases. Overall, 6.1\% (83/1353) of hospitalized patients required ICU admission. 
Using a machine-learning, data-driven algorithm, we identified that a combination of age, fever, and tachypnea was the most parsimonious predictor of ICU admission; patients younger than 56 years, without tachypnea, and temperature <39 {\textdegree}C (or >39 {\textdegree}C without respiratory crackles) were not admitted to the ICU. In contrast, patients with COVID-19 aged 40 to 79 years were likely to be admitted to the ICU if they had tachypnea and delayed their visit to the emergency department after being seen in primary care. Conclusions: Our results show that a combination of easily obtainable clinical variables (age, fever, and tachypnea with or without respiratory crackles) predicts whether patients with COVID-19 will require ICU admission. ", doi="10.2196/21801", url="http://www.jmir.org/2020/10/e21801/", url="http://www.ncbi.nlm.nih.gov/pubmed/33090964" } @Article{info:doi/10.2196/22013, author="Wan, Kengfai Paul and Satybaldy, Abylay and Huang, Lizhen and Holtskog, Halvor and Nowostawski, Mariusz", title="Reducing Alert Fatigue by Sharing Low-Level Alerts With Patients and Enhancing Collaborative Decision Making Using Blockchain Technology: Scoping Review and Proposed Framework (MedAlert)", journal="J Med Internet Res", year="2020", month="Oct", day="28", volume="22", number="10", pages="e22013", keywords="blockchain", keywords="health care", keywords="alert fatigue", keywords="clinical decision support", keywords="smart contracts", keywords="information sharing", abstract="Background: Clinical decision support (CDS) is a tool that helps clinicians in decision making by generating clinical alerts to supplement their previous knowledge and experience. However, CDS generates a high volume of irrelevant alerts, resulting in alert fatigue among clinicians. Alert fatigue is the mental state of alerts consuming too much time and mental energy, which often results in relevant alerts being overridden unjustifiably, along with clinically irrelevant ones. Consequently, clinicians become less responsive to important alerts, which opens the door to medication errors. Objective: This study aims to explore how a blockchain-based solution can reduce alert fatigue through collaborative alert sharing in the health sector, thus improving overall health care quality for both patients and clinicians. Methods: We have designed a 4-step approach to answer this research question. First, we identified five potential challenges based on the published literature through a scoping review. Second, a framework is designed to reduce alert fatigue by addressing the identified challenges with different digital components. Third, an evaluation is made by comparing MedAlert with other proposed solutions. Finally, the limitations and future work are also discussed. Results: Of the 341 academic papers collected, 8 were selected and analyzed. MedAlert securely distributes low-level (nonlife-threatening) clinical alerts to patients, enabling a collaborative clinical decision. Among the solutions in our framework, Hyperledger (private permissioned blockchain) and BankID (federated digital identity management) have been selected to overcome challenges such as data integrity, user identity, and privacy issues. Conclusions: MedAlert can reduce alert fatigue by attracting the attention of patients and clinicians, instead of solely reducing the total number of alerts. 
MedAlert offers other advantages, such as ensuring a higher degree of patient privacy and faster transaction times compared with other frameworks. This framework may not be suitable for elderly patients who are not technology savvy or in-patients. Future work in validating this framework based on real health care scenarios is needed to provide the performance evaluations of MedAlert and thus gain support for the better development of this idea. ", doi="10.2196/22013", url="http://www.jmir.org/2020/10/e22013/", url="http://www.ncbi.nlm.nih.gov/pubmed/33112253" } @Article{info:doi/10.2196/20324, author="Shirakawa, Toru and Sonoo, Tomohiro and Ogura, Kentaro and Fujimori, Ryo and Hara, Konan and Goto, Tadahiro and Hashimoto, Hideki and Takahashi, Yuji and Naraba, Hiromu and Nakamura, Kensuke", title="Institution-Specific Machine Learning Models for Prehospital Assessment to Predict Hospital Admission: Prediction Model Development Study", journal="JMIR Med Inform", year="2020", month="Oct", day="27", volume="8", number="10", pages="e20324", keywords="prehospital", keywords="prediction", keywords="hospital admission", keywords="emergency medicine", keywords="machine learning", keywords="data science", abstract="Background: Although multiple prediction models have been developed to predict hospital admission to emergency departments (EDs) to address overcrowding and patient safety, only a few studies have examined prediction models for prehospital use. Development of institution-specific prediction models is feasible in this age of data science, provided that predictor-related information is readily collectable. Objective: We aimed to develop a hospital admission prediction model based on patient information that is commonly available during ambulance transport before hospitalization. Methods: Patients transported by ambulance to our ED from April 2018 through March 2019 were enrolled. Candidate predictors were age, sex, chief complaint, vital signs, and patient medical history, all of which were recorded by emergency medical teams during ambulance transport. Patients were divided into two cohorts for derivation (3601/5145, 70.0\%) and validation (1544/5145, 30.0\%). For statistical models, logistic regression, logistic lasso, random forest, and gradient boosting machine were used. Prediction models were developed in the derivation cohort. Model performance was assessed by area under the receiver operating characteristic curve (AUROC) and association measures in the validation cohort. Results: Of 5145 patients transported by ambulance, including deaths in the ED and hospital transfers, 2699 (52.5\%) required hospital admission. Prediction performance was higher with the addition of predictive factors, attaining the best performance with an AUROC of 0.818 (95\% CI 0.792-0.839) with a machine learning model and predictive factors of age, sex, chief complaint, and vital signs. Sensitivity and specificity of this model were 0.744 (95\% CI 0.716-0.773) and 0.745 (95\% CI 0.709-0.776), respectively. Conclusions: For patients transferred to EDs, we developed a well-performing hospital admission prediction model based on routinely collected prehospital information including chief complaints. 
", doi="10.2196/20324", url="http://medinform.jmir.org/2020/10/e20324/", url="http://www.ncbi.nlm.nih.gov/pubmed/33107830" } @Article{info:doi/10.2196/20891, author="Lee, Hyeong Geun and Shin, Soo-Yong", title="Federated Learning on Clinical Benchmark Data: Performance Assessment", journal="J Med Internet Res", year="2020", month="Oct", day="26", volume="22", number="10", pages="e20891", keywords="federated learning", keywords="medical data", keywords="privacy protection", keywords="machine learning", keywords="deep learning", abstract="Background: Federated learning (FL) is a newly proposed machine-learning method that uses a decentralized dataset. Since data transfer is not necessary for the learning process in FL, there is a significant advantage in protecting personal privacy. Therefore, many studies are being actively conducted in the applications of FL for diverse areas. Objective: The aim of this study was to evaluate the reliability and performance of FL using three benchmark datasets, including a clinical benchmark dataset. Methods: To evaluate FL in a realistic setting, we implemented FL using a client-server architecture with Python. The implemented client-server version of the FL software was deployed to Amazon Web Services. Modified National Institute of Standards and Technology (MNIST), Medical Information Mart for Intensive Care-III (MIMIC-III), and electrocardiogram (ECG) datasets were used to evaluate the performance of FL. To test FL in a realistic setting, the MNIST dataset was split into 10 different clients, with one digit for each client. In addition, we conducted four different experiments according to basic, imbalanced, skewed, and a combination of imbalanced and skewed data distributions. We also compared the performance of FL to that of the state-of-the-art method with respect to in-hospital mortality using the MIMIC-III dataset. Likewise, we conducted experiments comparing basic and imbalanced data distributions using MIMIC-III and ECG data. Results: FL on the basic MNIST dataset with 10 clients achieved an area under the receiver operating characteristic curve (AUROC) of 0.997 and an F1-score of 0.946. The experiment with the imbalanced MNIST dataset achieved an AUROC of 0.995 and an F1-score of 0.921. The experiment with the skewed MNIST dataset achieved an AUROC of 0.992 and an F1-score of 0.905. Finally, the combined imbalanced and skewed experiment achieved an AUROC of 0.990 and an F1-score of 0.891. The basic FL on in-hospital mortality using MIMIC-III data achieved an AUROC of 0.850 and an F1-score of 0.944, while the experiment with the imbalanced MIMIC-III dataset achieved an AUROC of 0.850 and an F1-score of 0.943. For ECG classification, the basic FL achieved an AUROC of 0.938 and an F1-score of 0.807, and the imbalanced ECG dataset achieved an AUROC of 0.943 and an F1-score of 0.807. Conclusions: FL demonstrated comparative performance on different benchmark datasets. In addition, FL demonstrated reliable performance in cases where the distribution was imbalanced, skewed, and extreme, reflecting the real-life scenario in which data distributions from various hospitals are different. FL can achieve high performance while maintaining privacy protection because there is no requirement to centralize the data. 
", doi="10.2196/20891", url="http://www.jmir.org/2020/10/e20891/", url="http://www.ncbi.nlm.nih.gov/pubmed/33104011" } @Article{info:doi/10.2196/22400, author="Allen, Angier and Mataraso, Samson and Siefkas, Anna and Burdick, Hoyt and Braden, Gregory and Dellinger, Phillip R. and McCoy, Andrea and Pellegrini, Emily and Hoffman, Jana and Green-Saxena, Abigail and Barnes, Gina and Calvert, Jacob and Das, Ritankar", title="A Racially Unbiased, Machine Learning Approach to Prediction of Mortality: Algorithm Development Study", journal="JMIR Public Health Surveill", year="2020", month="Oct", day="22", volume="6", number="4", pages="e22400", keywords="machine learning", keywords="health disparities", keywords="racial disparities", keywords="mortality", keywords="prediction", abstract="Background: Racial disparities in health care are well documented in the United States. As machine learning methods become more common in health care settings, it is important to ensure that these methods do not contribute to racial disparities through biased predictions or differential accuracy across racial groups. Objective: The goal of the research was to assess a machine learning algorithm intentionally developed to minimize bias in in-hospital mortality predictions between white and nonwhite patient groups. Methods: Bias was minimized through preprocessing of algorithm training data. We performed a retrospective analysis of electronic health record data from patients admitted to the intensive care unit (ICU) at a large academic health center between 2001 and 2012, drawing data from the Medical Information Mart for Intensive Care--III database. Patients were included if they had at least 10 hours of available measurements after ICU admission, had at least one of every measurement used for model prediction, and had recorded race/ethnicity data. Bias was assessed through the equal opportunity difference. Model performance in terms of bias and accuracy was compared with the Modified Early Warning Score (MEWS), the Simplified Acute Physiology Score II (SAPS II), and the Acute Physiologic Assessment and Chronic Health Evaluation (APACHE). Results: The machine learning algorithm was found to be more accurate than all comparators, with a higher sensitivity, specificity, and area under the receiver operating characteristic. The machine learning algorithm was found to be unbiased (equal opportunity difference 0.016, P=.20). APACHE was also found to be unbiased (equal opportunity difference 0.019, P=.11), while SAPS II and MEWS were found to have significant bias (equal opportunity difference 0.038, P=.006 and equal opportunity difference 0.074, P<.001, respectively). Conclusions: This study indicates there may be significant racial bias in commonly used severity scoring systems and that machine learning algorithms may reduce bias while improving on the accuracy of these methods. 
", doi="10.2196/22400", url="http://publichealth.jmir.org/2020/4/e22400/", url="http://www.ncbi.nlm.nih.gov/pubmed/33090117" } @Article{info:doi/10.2196/21798, author="Xie, Feng and Chakraborty, Bibhas and Ong, Hock Marcus Eng and Goldstein, Alan Benjamin and Liu, Nan", title="AutoScore: A Machine Learning--Based Automatic Clinical Score Generator and Its Application to Mortality Prediction Using Electronic Health Records", journal="JMIR Med Inform", year="2020", month="Oct", day="21", volume="8", number="10", pages="e21798", keywords="clinical decision making", keywords="machine learning", keywords="prognosis", keywords="clinical prediction rule", keywords="electronic health records", abstract="Background: Risk scores can be useful in clinical risk stratification and accurate allocations of medical resources, helping health providers improve patient care. Point-based scores are more understandable and explainable than other complex models and are now widely used in clinical decision making. However, the development of the risk scoring model is nontrivial and has not yet been systematically presented, with few studies investigating methods of clinical score generation using electronic health records. Objective: This study aims to propose AutoScore, a machine learning--based automatic clinical score generator consisting of 6 modules for developing interpretable point-based scores. Future users can employ the AutoScore framework to create clinical scores effortlessly in various clinical applications. Methods: We proposed the AutoScore framework comprising 6 modules that included variable ranking, variable transformation, score derivation, model selection, score fine-tuning, and model evaluation. To demonstrate the performance of AutoScore, we used data from the Beth Israel Deaconess Medical Center to build a scoring model for mortality prediction and then compared the data with other baseline models using the receiver operating characteristic analysis. A software package in R 3.5.3 (R Foundation) was also developed to demonstrate the implementation of AutoScore. Results: Implemented on the data set with 44,918 individual admission episodes of intensive care, the AutoScore-created scoring models performed comparably well as other standard methods (ie, logistic regression, stepwise regression, least absolute shrinkage and selection operator, and random forest) in terms of predictive accuracy and model calibration but required fewer predictors and presented high interpretability and accessibility. The nine-variable, AutoScore-created, point-based scoring model achieved an area under the curve (AUC) of 0.780 (95\% CI 0.764-0.798), whereas the model of logistic regression with 24 variables had an AUC of 0.778 (95\% CI 0.760-0.795). Moreover, the AutoScore framework also drives the clinical research continuum and automation with its integration of all necessary modules. Conclusions: We developed an easy-to-use, machine learning--based automatic clinical score generator, AutoScore; systematically presented its structure; and demonstrated its superiority (predictive performance and interpretability) over other conventional methods using a benchmark database. AutoScore will emerge as a potential scoring tool in various medical applications. 
", doi="10.2196/21798", url="http://medinform.jmir.org/2020/10/e21798/", url="http://www.ncbi.nlm.nih.gov/pubmed/33084589" } @Article{info:doi/10.2196/16094, author="Delvaux, Nicolas and Vaes, Bert and Aertgeerts, Bert and Van de Velde, Stijn and Vander Stichele, Robert and Nyberg, Peter and Vermandere, Mieke", title="Coding Systems for Clinical Decision Support: Theoretical and Real-World Comparative Analysis", journal="JMIR Form Res", year="2020", month="Oct", day="21", volume="4", number="10", pages="e16094", keywords="clinical decision support systems", keywords="clinical coding", keywords="medical informatics", keywords="electronic health records", abstract="Background: Effective clinical decision support systems require accurate translation of practice recommendations into machine-readable artifacts; developing code sets that represent clinical concepts are an important step in this process. Many clinical coding systems are currently used in electronic health records, and it is unclear whether all of these systems are capable of efficiently representing the clinical concepts required in executing clinical decision support systems. Objective: The aim of this study was to evaluate which clinical coding systems are capable of efficiently representing clinical concepts that are necessary for translating artifacts into executable code for clinical decision support systems. Methods: Two methods were used to evaluate a set of clinical coding systems. In a theoretical approach, we extracted all the clinical concepts from 3 preventive care recommendations and constructed a series of code sets containing codes from a single clinical coding system. In a practical approach using data from a real-world setting, we studied the content of 1890 code sets used in an internationally available clinical decision support system and compared the usage of various clinical coding systems. Results: SNOMED CT and ICD-10 (International Classification of Diseases, Tenth Revision) proved to be the most accurate clinical coding systems for most concepts in our theoretical evaluation. In our practical evaluation, we found that International Classification of Diseases (Tenth Revision) was most often used to construct code sets. Some coding systems were very accurate in representing specific types of clinical concepts, for example, LOINC (Logical Observation Identifiers Names and Codes) for investigation results and ATC (Anatomical Therapeutic Chemical Classification) for drugs. Conclusions: No single coding system seems to fulfill all the needs for representing clinical concepts for clinical decision support systems. Comprehensiveness of the coding systems seems to be offset by complexity and forms a barrier to usability for code set construction. Clinical vocabularies mapped to multiple clinical coding systems could facilitate clinical code set construction. ", doi="10.2196/16094", url="http://formative.jmir.org/2020/10/e16094/", url="http://www.ncbi.nlm.nih.gov/pubmed/33084593" } @Article{info:doi/10.2196/22550, author="Almog, Adar Yasmeen and Rai, Angshu and Zhang, Patrick and Moulaison, Amanda and Powell, Ross and Mishra, Anirban and Weinberg, Kerry and Hamilton, Celeste and Oates, Mary and McCloskey, Eugene and Cummings, R. 
Steven", title="Deep Learning With Electronic Health Records for Short-Term Fracture Risk Identification: Crystal Bone Algorithm Development and Validation", journal="J Med Internet Res", year="2020", month="Oct", day="16", volume="22", number="10", pages="e22550", keywords="fracture", keywords="bone", keywords="osteoporosis", keywords="low bone mass", keywords="prediction", keywords="natural language processing", keywords="NLP", keywords="machine learning", keywords="deep learning", keywords="artificial intelligence", keywords="AI", keywords="electronic health record", keywords="EHR", abstract="Background: Fractures as a result of osteoporosis and low bone mass are common and give rise to significant clinical, personal, and economic burden. Even after a fracture occurs, high fracture risk remains widely underdiagnosed and undertreated. Common fracture risk assessment tools utilize a subset of clinical risk factors for prediction, and often require manual data entry. Furthermore, these tools predict risk over the long term and do not explicitly provide short-term risk estimates necessary to identify patients likely to experience a fracture in the next 1-2 years. Objective: The goal of this study was to develop and evaluate an algorithm for the identification of patients at risk of fracture in a subsequent 1- to 2-year period. In order to address the aforementioned limitations of current prediction tools, this approach focused on a short-term timeframe, automated data entry, and the use of longitudinal data to inform the predictions. Methods: Using retrospective electronic health record data from over 1,000,000 patients, we developed Crystal Bone, an algorithm that applies machine learning techniques from natural language processing to the temporal nature of patient histories to generate short-term fracture risk predictions. Similar to how language models predict the next word in a given sentence or the topic of a document, Crystal Bone predicts whether a patient's future trajectory might contain a fracture event, or whether the signature of the patient's journey is similar to that of a typical future fracture patient. A holdout set with 192,590 patients was used to validate accuracy. Experimental baseline models and human-level performance were used for comparison. Results: The model accurately predicted 1- to 2-year fracture risk for patients aged over 50 years (area under the receiver operating characteristics curve [AUROC] 0.81). These algorithms outperformed the experimental baselines (AUROC 0.67) and showed meaningful improvements when compared to retrospective approximation of human-level performance by correctly identifying 9649 of 13,765 (70\%) at-risk patients who did not receive any preventative bone-health-related medical interventions from their physicians. Conclusions: These findings indicate that it is possible to use a patient's unique medical history as it changes over time to predict the risk of short-term fracture. Validating and applying such a tool within the health care system could enable automated and widespread prediction of this risk and may help with identification of patients at very high risk of fracture. ", doi="10.2196/22550", url="http://www.jmir.org/2020/10/e22550/", url="http://www.ncbi.nlm.nih.gov/pubmed/32956069" } @Article{info:doi/10.2196/17003, author="MacKenna, Brian and Bacon, Sebastian and Walker, J. Alex and Curtis, J. 
Helen and Croker, Richard and Goldacre, Ben", title="Impact of Electronic Health Record Interface Design on Unsafe Prescribing of Ciclosporin, Tacrolimus, and Diltiazem: Cohort Study in English National Health Service Primary Care", journal="J Med Internet Res", year="2020", month="Oct", day="16", volume="22", number="10", pages="e17003", keywords="prescribing", keywords="primary care", keywords="electronic health records", keywords="clinical software", keywords="branded prescribing", keywords="diltiazem", keywords="tacrolimus", keywords="ciclosporin", abstract="Background: In England, national safety guidance recommends that ciclosporin, tacrolimus, and diltiazem are prescribed by brand name due to their narrow therapeutic windows and, in the case of tacrolimus, to reduce the chance of organ transplantation rejection. Various small studies have shown that changes to electronic health record (EHR) system interfaces can affect prescribing choices. Objective: Our objectives were to assess variation by EHR systems in breach of safety guidance around prescribing of ciclosporin, tacrolimus, and diltiazem, and to conduct user-interface research into the causes of such breaches. Methods: We carried out a retrospective cohort study using prescribing data in English primary care. Participants were English general practices and their respective EHR systems. The main outcome measures were (1) the variation in ratio of safety breaches to adherent prescribing in all practices and (2) the description of observations of EHR system usage. Results: A total of 2,575,411 prescriptions were issued in 2018 for ciclosporin, tacrolimus, and diltiazem (over 60 mg); of these, 316,119 prescriptions breached NHS guidance (12.27\%). Breaches were most common among users of the EMIS EHR system (breaches in 18.81\% of ciclosporin and tacrolimus prescriptions and in 17.99\% of diltiazem prescriptions), but breaches were observed in all EHR systems. Conclusions: Design choices in EHR systems strongly influence safe prescribing of ciclosporin, tacrolimus, and diltiazem, and breaches are prevalent in general practices in England. We recommend that all EHR vendors review their systems to increase safe prescribing of these medicines in line with national guidance. Almost all clinical practice is now mediated through an EHR system; further quantitative research into the effect of EHR system design on clinical practice is long overdue. ", doi="10.2196/17003", url="https://www.jmir.org/2020/10/e17003", url="http://www.ncbi.nlm.nih.gov/pubmed/33064085" } @Article{info:doi/10.2196/21621, author="Araujo, Magalhaes Sabrina and Sousa, Paulino and Dutra, In{\^e}s", title="Clinical Decision Support Systems for Pressure Ulcer Management: Systematic Review", journal="JMIR Med Inform", year="2020", month="Oct", day="16", volume="8", number="10", pages="e21621", keywords="pressure ulcer", keywords="decision support systems, clinical", keywords="systematic review", abstract="Background: The clinical decision-making process in pressure ulcer management is complex, and its quality depends on both the nurse's experience and the availability of scientific knowledge. This process should follow evidence-based practices incorporating health information technologies to assist health care professionals, such as the use of clinical decision support systems. These systems, in addition to increasing the quality of care provided, can reduce errors and costs in health care. 
However, the widespread use of clinical decision support systems still has limited evidence, indicating the need to identify and evaluate its effects on nursing clinical practice. Objective: The goal of the review was to identify the effects of nurses using clinical decision support systems on clinical decision making for pressure ulcer management. Methods: The systematic review was conducted in accordance with PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) recommendations. The search was conducted in April 2019 on 5 electronic databases: MEDLINE, SCOPUS, Web of Science, Cochrane, and CINAHL, without publication date or study design restrictions. Articles that addressed the use of computerized clinical decision support systems in pressure ulcer care applied in clinical practice were included. The reference lists of eligible articles were searched manually. The Mixed Methods Appraisal Tool was used to assess the methodological quality of the studies. Results: The search strategy resulted in 998 articles, 16 of which were included. The year of publication ranged from 1995 to 2017, with 45\% of studies conducted in the United States. Most addressed the use of clinical decision support systems by nurses in pressure ulcers prevention in inpatient units. All studies described knowledge-based systems that assessed the effects on clinical decision making, clinical effects secondary to clinical decision support system use, or factors that influenced the use or intention to use clinical decision support systems by health professionals and the success of their implementation in nursing practice. Conclusions: The evidence in the available literature about the effects of clinical decision support systems (used by nurses) on decision making for pressure ulcer prevention and treatment is still insufficient. No significant effects were found on nurses' knowledge following the integration of clinical decision support systems into the workflow, with assessments made for a brief period of up to 6 months. Clinical effects, such as outcomes in the incidence and prevalence of pressure ulcers, remain limited in the studies, and most found clinically but nonstatistically significant results in decreasing pressure ulcers. It is necessary to carry out studies that prioritize better adoption and interaction of nurses with clinical decision support systems, as well as studies with a representative sample of health care professionals, randomized study designs, and application of assessment instruments appropriate to the professional and institutional profile. In addition, long-term follow-up is necessary to assess the effects of clinical decision support systems that can demonstrate a more real, measurable, and significant effect on clinical decision making. 
Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42019127663; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=127663 ", doi="10.2196/21621", url="http://medinform.jmir.org/2020/10/e21621/", url="http://www.ncbi.nlm.nih.gov/pubmed/33064099" } @Article{info:doi/10.2196/20265, author="Kao, David and Larson, Cynthia and Fletcher, Dana and Stegner, Kris", title="Clinical Decision Support May Link Multiple Domains to Improve Patient Care: Viewpoint", journal="JMIR Med Inform", year="2020", month="Oct", day="16", volume="8", number="10", pages="e20265", keywords="clinical decision support", keywords="population medicine", keywords="evidence-based medicine", keywords="precision medicine", keywords="care management", keywords="electronic health records", doi="10.2196/20265", url="https://medinform.jmir.org/2020/10/e20265", url="http://www.ncbi.nlm.nih.gov/pubmed/33064106" } @Article{info:doi/10.2196/17512, author="Torres Silva, Augusto Ever and Uribe, Sebastian and Smith, Jack and Luna Gomez, Felipe Ivan and Florez-Arango, Fernando Jose", title="XML Data and Knowledge-Encoding Structure for a Web-Based and Mobile Antenatal Clinical Decision Support System: Development Study", journal="JMIR Form Res", year="2020", month="Oct", day="16", volume="4", number="10", pages="e17512", keywords="clinical decision support systems", keywords="computer-interpretable guidelines", keywords="knowledge representation", keywords="state machine", keywords="system design", keywords="XML", abstract="Background: Displeasure with the functionality of clinical decision support systems (CDSSs) is considered the primary challenge in CDSS development. A major difficulty in CDSS design is matching the functionality to the desired and actual clinical workflow. Computer-interpretable guidelines (CIGs) are used to formalize medical knowledge in clinical practice guidelines (CPGs) in a computable language. However, existing CIG frameworks require a specific interpreter for each CIG language, hindering the ease of implementation and interoperability. Objective: This paper aims to describe a different approach to the representation of clinical knowledge and data. We intended to change the clinician's perception of a CDSS with sufficient expressivity of the representation while maintaining a small communication and software footprint for both a web application and a mobile app. This approach was originally intended to create a readable and minimal syntax for a web CDSS and future mobile app for antenatal care guidelines with improved human-computer interaction and enhanced usability by aligning the system behavior with clinical workflow. Methods: We designed and implemented an architecture design for our CDSS, which uses the model-view-controller (MVC) architecture and a knowledge engine in the MVC architecture based on XML. The knowledge engine design also integrated the requirement of matching clinical care workflow that was desired in the CDSS. For this component of the design task, we used a work ontology analysis of the CPGs for antenatal care in our particular target clinical settings. Results: In comparison to other common CIGs used for CDSSs, our XML approach can be used to take advantage of the flexible format of XML to facilitate the electronic sharing of structured data. 
More importantly, we can take advantage of its flexibility to standardize CIG structure design in a low-level specification language that is ubiquitous, universal, computationally efficient, integrable with web technologies, and human readable. Conclusions: Our knowledge representation framework incorporates fundamental elements of other CIGs used in CDSSs in medicine and proved adequate to encode a number of antenatal health care CPGs and their associated clinical workflows. The framework appears general enough to be used with other CPGs in medicine. XML proved to be a language expressive enough to describe planning problems in a computable form and restrictive and expressive enough to implement in a clinical system. It can also be effective for mobile apps, where intermittent communication requires a small footprint and an autonomous app. This approach can be used to incorporate overlapping capabilities of more specialized CIGs in medicine. ", doi="10.2196/17512", url="http://formative.jmir.org/2020/10/e17512/", url="http://www.ncbi.nlm.nih.gov/pubmed/33064087" } @Article{info:doi/10.2196/19878, author="Liu, Ping-Yen and Tsai, Yi-Shan and Chen, Po-Lin and Tsai, Huey-Pin and Hsu, Ling-Wei and Wang, Chi-Shiang and Lee, Nan-Yao and Huang, Mu-Shiang and Wu, Yun-Chiao and Ko, Wen-Chien and Yang, Yi-Ching and Chiang, Jung-Hsien and Shen, Meng-Ru", title="Application of an Artificial Intelligence Trilogy to Accelerate Processing of Suspected Patients With SARS-CoV-2 at a Smart Quarantine Station: Observational Study", journal="J Med Internet Res", year="2020", month="Oct", day="14", volume="22", number="10", pages="e19878", keywords="SARS-CoV-2", keywords="COVID-19", keywords="artificial intelligence", keywords="smart device assisted decision making", keywords="quarantine station", abstract="Background: As the COVID-19 epidemic increases in severity, the burden of quarantine stations outside emergency departments (EDs) at hospitals is increasing daily. To address the high screening workload at quarantine stations, all staff members with medical licenses are required to work shifts in these stations. Therefore, it is necessary to simplify the workflow and decision-making process for physicians and surgeons from all subspecialties. Objective: The aim of this paper is to demonstrate how the National Cheng Kung University Hospital artificial intelligence (AI) trilogy of diversion to a smart quarantine station, AI-assisted image interpretation, and a built-in clinical decision-making algorithm improves medical care and reduces quarantine processing times. Methods: This observational study on the emerging COVID-19 pandemic included 643 patients. An ``AI trilogy'' of diversion to a smart quarantine station, AI-assisted image interpretation, and a built-in clinical decision-making algorithm on a tablet computer was applied to shorten the quarantine survey process and reduce processing time during the COVID-19 pandemic. Results: The use of the AI trilogy facilitated the processing of suspected cases of COVID-19 with or without symptoms; also, travel, occupation, contact, and clustering histories were obtained with the tablet computer device. A separate AI-mode function that could quickly recognize pulmonary infiltrates on chest x-rays was merged into the smart clinical assisting system (SCAS), and this model was subsequently trained with COVID-19 pneumonia cases from the GitHub open source data set. 
The detection rates for posteroanterior and anteroposterior chest x-rays were 55/59 (93\%) and 5/11 (45\%), respectively. The SCAS algorithm was continuously adjusted based on updates to the Taiwan Centers for Disease Control public safety guidelines for faster clinical decision making. Our ex vivo study demonstrated the efficiency of disinfecting the tablet computer surface by wiping it twice with 75\% alcohol sanitizer. To further analyze the impact of the AI application in the quarantine station, we subdivided the station group into groups with or without AI. Compared with the conventional ED (n=281), the survey time at the quarantine station (n=1520) was significantly shortened; the median survey time at the ED was 153 minutes (95\% CI 108.5-205.0), vs 35 minutes at the quarantine station (95\% CI 24-56; P<.001). Furthermore, the use of the AI application in the quarantine station reduced the survey time in the quarantine station; the median survey time without AI was 101 minutes (95\% CI 40-153), vs 34 minutes (95\% CI 24-53) with AI in the quarantine station (P<.001). Conclusions: The AI trilogy improved our medical care workflow by shortening the quarantine survey process and reducing the processing time, which is especially important during an emerging infectious disease epidemic. ", doi="10.2196/19878", url="http://www.jmir.org/2020/10/e19878/", url="http://www.ncbi.nlm.nih.gov/pubmed/33001832" } @Article{info:doi/10.2196/18331, author="Sampa, Begum Masuda and Hossain, Nazmul Md and Hoque, Rakibul Md and Islam, Rafiqul and Yokota, Fumihiko and Nishikitani, Mariko and Ahmed, Ashir", title="Blood Uric Acid Prediction With Machine Learning: Model Development and Performance Comparison", journal="JMIR Med Inform", year="2020", month="Oct", day="8", volume="8", number="10", pages="e18331", keywords="blood uric acid", keywords="urban corporate population", keywords="machine learning", keywords="noncommunicable diseases", keywords="Bangladesh", keywords="boosted decision tree regression model", abstract="Background: Uric acid is associated with noncommunicable diseases such as cardiovascular diseases, chronic kidney disease, coronary artery disease, stroke, diabetes, metabolic syndrome, vascular dementia, and hypertension. Therefore, uric acid is considered to be a risk factor for the development of noncommunicable diseases. Most studies on uric acid have been performed in developed countries, and the application of machine-learning approaches in uric acid prediction in developing countries is rare. Different machine-learning algorithms will work differently on different types of data in various diseases; therefore, a different investigation is needed for different types of data to identify the most accurate algorithms. Specifically, no study has yet focused on the urban corporate population in Bangladesh, despite the high risk of developing noncommunicable diseases for this population. Objective: The aim of this study was to develop a model for predicting blood uric acid values based on basic health checkup test results, dietary information, and sociodemographic characteristics using machine-learning algorithms. The prediction of health checkup test measurements can be very helpful to reduce health management costs. Methods: Various machine-learning approaches were used in this study because clinical input data are not completely independent and exhibit complex interactions. 
Conventional statistical models have limitations to consider these complex interactions, whereas machine learning can consider all possible interactions among input data. We used boosted decision tree regression, decision forest regression, Bayesian linear regression, and linear regression to predict personalized blood uric acid based on basic health checkup test results, dietary information, and sociodemographic characteristics. We evaluated the performance of these widely used machine-learning models using data collected from 271 employees in the Grameen Bank complex of Dhaka, Bangladesh. Results: The mean uric acid level was 6.63 mg/dL, indicating a borderline result for the majority of the sample (normal range <7.0 mg/dL). Therefore, these individuals should be monitoring their uric acid regularly. The boosted decision tree regression model showed the best performance among the models tested based on the root mean squared error of 0.03, which is also better than that of any previously reported model. Conclusions: A uric acid prediction model was developed based on personal characteristics, dietary information, and some basic health checkup measurements. This model will be useful for improving awareness among high-risk individuals and populations, which can help to save medical costs. A future study could include additional features (eg, work stress, daily physical activity, alcohol intake, eating red meat) in improving prediction. ", doi="10.2196/18331", url="https://medinform.jmir.org/2020/10/e18331", url="http://www.ncbi.nlm.nih.gov/pubmed/33030442" } @Article{info:doi/10.2196/21367, author="Calvo, Mireia and Gonz{\'a}lez, Rub{\`e}n and Seijas, N{\'u}ria and Vela, Emili and Hern{\'a}ndez, Carme and Batiste, Guillem and Miralles, Felip and Roca, Josep and Cano, Isaac and Jan{\'e}, Raimon", title="Health Outcomes from Home Hospitalization: Multisource Predictive Modeling", journal="J Med Internet Res", year="2020", month="Oct", day="7", volume="22", number="10", pages="e21367", keywords="home hospitalization", keywords="health risk assessment", keywords="predictive modeling", keywords="chronic care", keywords="integrated care", keywords="modeling", keywords="hospitalization", keywords="health risk", keywords="prediction", keywords="mortality", keywords="clinical decision support", abstract="Background: Home hospitalization is widely accepted as a cost-effective alternative to conventional hospitalization for selected patients. A recent analysis of the home hospitalization and early discharge (HH/ED) program at Hospital Cl{\'i}nic de Barcelona over a 10-year period demonstrated high levels of acceptance by patients and professionals, as well as health value-based generation at the provider and health-system levels. However, health risk assessment was identified as an unmet need with the potential to enhance clinical decision making. Objective: The objective of this study is to generate and assess predictive models of mortality and in-hospital admission at entry and at HH/ED discharge. Methods: Predictive modeling of mortality and in-hospital admission was done in 2 different scenarios: at entry into the HH/ED program and at discharge, from January 2009 to December 2015. Multisource predictive variables, including standard clinical data, patients' functional features, and population health risk assessment, were considered. Results: We studied 1925 HH/ED patients by applying a random forest classifier, as it showed the best performance. 
Average results of the area under the receiver operating characteristic curve (AUROC; sensitivity/specificity) for the prediction of mortality were 0.88 (0.81/0.76) and 0.89 (0.81/0.81) at entry and at home hospitalization discharge, respectively; the AUROC (sensitivity/specificity) values for in-hospital admission were 0.71 (0.67/0.64) and 0.70 (0.71/0.61) at entry and at home hospitalization discharge, respectively. Conclusions: The results showed potential for feeding clinical decision support systems aimed at supporting health professionals for inclusion of candidates into the HH/ED program, and have the capacity to guide transitions toward community-based care at HH discharge. ", doi="10.2196/21367", url="http://www.jmir.org/2020/10/e21367/", url="http://www.ncbi.nlm.nih.gov/pubmed/33026357" } @Article{info:doi/10.2196/19879, author="Gruendner, Julian and Wolf, Nicolas and T{\"o}gel, Lars and Haller, Florian and Prokosch, Hans-Ulrich and Christoph, Jan", title="Integrating Genomics and Clinical Data for Statistical Analysis by Using GEnome MINIng (GEMINI) and Fast Healthcare Interoperability Resources (FHIR): System Design and Implementation", journal="J Med Internet Res", year="2020", month="Oct", day="7", volume="22", number="10", pages="e19879", keywords="next-generation sequencing", keywords="data analysis", keywords="genetic databases", keywords="GEnome MINIng", keywords="Fast Healthcare Interoperability Resources", keywords="data standardization", abstract="Background: The introduction of next-generation sequencing (NGS) into molecular cancer diagnostics has led to an increase in the data available for the identification and evaluation of driver mutations and for defining personalized cancer treatment regimens. The meaningful combination of omics data, ie, pathogenic gene variants and alterations with other patient data, to understand the full picture of malignancy has been challenging. Objective: This study describes the implementation of a system capable of processing, analyzing, and subsequently combining NGS data with other clinical patient data for analysis within and across institutions. Methods: On the basis of the already existing NGS analysis workflows for the identification of malignant gene variants at the Institute of Pathology of the University Hospital Erlangen, we defined basic requirements on an NGS processing and analysis pipeline and implemented a pipeline based on the GEMINI (GEnome MINIng) open source genetic variation database. For the purpose of validation, this pipeline was applied to data from the 1000 Genomes Project and subsequently to NGS data derived from 206 patients of a local hospital. We further integrated the pipeline into existing structures of data integration centers at the University Hospital Erlangen and combined NGS data with local nongenomic patient-derived data available in Fast Healthcare Interoperability Resources format. Results: Using data from the 1000 Genomes Project and from the patient cohort as input, the implemented system produced the same results as already established methodologies. Further, it satisfied all our identified requirements and was successfully integrated into the existing infrastructure. Finally, we showed in an exemplary analysis how the data could be quickly loaded into and analyzed in KETOS, a web-based analysis platform for statistical analysis and clinical decision support. 
Conclusions: This study demonstrates that the GEMINI open source database can be augmented to create an NGS analysis pipeline. The pipeline generates high-quality results consistent with the already established workflows for gene variant annotation and pathological evaluation. We further demonstrate how NGS-derived genomic and other clinical data can be combined for further statistical analysis, thereby providing for data integration using standardized vocabularies and methods. Finally, we demonstrate the feasibility of the pipeline integration into hospital workflows by providing an exemplary integration into the data integration center infrastructure, which is currently being established across Germany. ", doi="10.2196/19879", url="http://www.jmir.org/2020/10/e19879/", url="http://www.ncbi.nlm.nih.gov/pubmed/33026356" } @Article{info:doi/10.2196/18287, author="Xiu, Xiaolei and Qian, Qing and Wu, Sizhu", title="Construction of a Digestive System Tumor Knowledge Graph Based on Chinese Electronic Medical Records: Development and Usability Study", journal="JMIR Med Inform", year="2020", month="Oct", day="7", volume="8", number="10", pages="e18287", keywords="Chinese electronic medical records", keywords="knowledge graph", keywords="digestive system tumor", keywords="graph evaluation", abstract="Background: With the increasing incidences and mortality of digestive system tumor diseases in China, ways to use clinical experience data in Chinese electronic medical records (CEMRs) to determine potentially effective relationships between diagnosis and treatment have become a priority. As an important part of artificial intelligence, a knowledge graph is a powerful tool for information processing and knowledge organization that provides an ideal means to solve this problem. Objective: This study aimed to construct a semantic-driven digestive system tumor knowledge graph (DSTKG) to represent the knowledge in CEMRs with fine granularity and semantics. Methods: This paper focuses on the knowledge graph schema and semantic relationships that were the main challenges for constructing a Chinese tumor knowledge graph. The DSTKG was developed through a multistep procedure. As an initial step, a complete DSTKG construction framework based on CEMRs was proposed. Then, this research built a knowledge graph schema containing 7 classes and 16 kinds of semantic relationships and accomplished the DSTKG by knowledge extraction, named entity linking, and drawing the knowledge graph. Finally, the quality of the DSTKG was evaluated from 3 aspects: data layer, schema layer, and application layer. Results: Experts agreed that the DSTKG was good overall (mean score 4.20). Especially for the aspects of ``rationality of schema structure,'' ``scalability,'' and ``readability of results,'' the DSTKG performed well, with scores of 4.72, 4.67, and 4.69, respectively, which were much higher than the average. However, the small amount of data in the DSTKG negatively affected its ``practicability'' score. Compared with other Chinese tumor knowledge graphs, the DSTKG can represent more granular entities, properties, and semantic relationships. In addition, the DSTKG was flexible, allowing personalized customization to meet the designer's focus on specific interests in the digestive system tumor. Conclusions: We constructed a granular semantic DSTKG. 
It could provide guidance for the construction of a tumor knowledge graph and provide a preliminary step for the intelligent application of knowledge graphs based on CEMRs. Additional data sources and stronger research on assertion classification are needed to gain insight into the DSTKG's potential. ", doi="10.2196/18287", url="http://medinform.jmir.org/2020/10/e18287/", url="http://www.ncbi.nlm.nih.gov/pubmed/33026359" } @Article{info:doi/10.2196/21628, author="Nan, Shan and Tang, Tianhua and Feng, Hongshuo and Wang, Yijie and Li, Mengyang and Lu, Xudong and Duan, Huilong", title="A Computer-Interpretable Guideline for COVID-19: Rapid Development and Dissemination", journal="JMIR Med Inform", year="2020", month="Oct", day="1", volume="8", number="10", pages="e21628", keywords="COVID-19", keywords="guideline", keywords="CDSS", keywords="openEHR", keywords="Guideline Definition Language", keywords="development", keywords="dissemination", keywords="electronic health record", keywords="algorithm", abstract="Background: COVID-19 is a global pandemic that is affecting more than 200 countries worldwide. Efficient diagnosis and treatment are crucial to combat the disease. Computer-interpretable guidelines (CIGs) can aid the broad global adoption of evidence-based diagnosis and treatment knowledge. However, currently, no internationally shareable CIG exists. Objective: The aim of this study was to establish a rapid CIG development and dissemination approach and apply it to develop a shareable CIG for COVID-19. Methods: A 6-step rapid CIG development and dissemination approach was designed and applied. Processes, roles, and deliverable artifacts were specified in this approach to eliminate ambiguities during development of the CIG. The Guideline Definition Language (GDL) was used to capture the clinical rules. A CIG for COVID-19 was developed by translating, interpreting, annotating, extracting, and formalizing the Chinese COVID-19 diagnosis and treatment guideline. A prototype application was implemented to validate the CIG. Results: We used 27 archetypes for the COVID-19 guideline. We developed 18 GDL rules to cover the diagnosis and treatment suggestion algorithms in the narrative guideline. The CIG was further translated to object data model and Drools rules to facilitate its use by people who do not employ the non-openEHR archetype. The prototype application validated the correctness of the CIG with a public data set. Both the GDL rules and Drools rules have been disseminated on GitHub. Conclusions: Our rapid CIG development and dissemination approach accelerated the pace of COVID-19 CIG development. A validated COVID-19 CIG is now available to the public. ", doi="10.2196/21628", url="https://medinform.jmir.org/2020/10/e21628", url="http://www.ncbi.nlm.nih.gov/pubmed/32931443" } @Article{info:doi/10.2196/20645, author="Li, Rui and Yin, Changchang and Yang, Samuel and Qian, Buyue and Zhang, Ping", title="Marrying Medical Domain Knowledge With Deep Learning on Electronic Health Records: A Deep Visual Analytics Approach", journal="J Med Internet Res", year="2020", month="Sep", day="28", volume="22", number="9", pages="e20645", keywords="electronic health records", keywords="interpretable deep learning", keywords="knowledge graph", keywords="visual analytics", abstract="Background: Deep learning models have attracted significant interest from health care researchers during the last few decades. 
There have been many studies that apply deep learning to medical applications and achieve promising results. However, there are three limitations to the existing models: (1) most clinicians are unable to interpret the results from the existing models, (2) existing models cannot incorporate complicated medical domain knowledge (eg, a disease causes another disease), and (3) most existing models lack visual exploration and interaction. Both the electronic health record (EHR) data set and the deep model results are complex and abstract, which impedes clinicians from exploring and communicating with the model directly. Objective: The objective of this study is to develop an interpretable and accurate risk prediction model as well as an interactive clinical prediction system to support EHR data exploration, knowledge graph demonstration, and model interpretation. Methods: A domain-knowledge--guided recurrent neural network (DG-RNN) model is proposed to predict clinical risks. The model takes medical event sequences as input and incorporates medical domain knowledge by attending to a subgraph of the whole medical knowledge graph. A global pooling operation and a fully connected layer are used to output the clinical outcomes. The middle results and the parameters of the fully connected layer are helpful in identifying which medical events cause clinical risks. DG-Viz is also designed to support EHR data exploration, knowledge graph demonstration, and model interpretation. Results: We conducted both risk prediction experiments and a case study on a real-world data set. A total of 554 patients with heart failure and 1662 control patients without heart failure were selected from the data set. The experimental results show that the proposed DG-RNN outperforms the state-of-the-art approaches by approximately 1.5\%. The case study demonstrates how our medical physician collaborator can effectively explore the data and interpret the prediction results using DG-Viz. Conclusions: In this study, we present DG-Viz, an interactive clinical prediction system, which brings together the power of deep learning (ie, a DG-RNN--based model) and visual analytics to predict clinical risks and visually interpret the EHR prediction results. Experimental results and a case study on heart failure risk prediction tasks demonstrate the effectiveness and usefulness of the DG-Viz system. This study will pave the way for interactive, interpretable, and accurate clinical risk predictions. ", doi="10.2196/20645", url="http://www.jmir.org/2020/9/e20645/", url="http://www.ncbi.nlm.nih.gov/pubmed/32985996" } @Article{info:doi/10.2196/23565, author="Jarrett, Mark and Schultz, Susanne and Lyall, Julie and Wang, Jason and Stier, Lori and De Geronimo, Marcella and Nelson, Karen", title="Clinical Mortality in a Large COVID-19 Cohort: Observational Study", journal="J Med Internet Res", year="2020", month="Sep", day="25", volume="22", number="9", pages="e23565", keywords="COVID-19", keywords="mortality", keywords="respiratory failure", keywords="hypoxemia", keywords="observational", keywords="review", keywords="cohort", keywords="ICU", keywords="intensive care unit", abstract="Background: Northwell Health, an integrated health system in New York, has treated more than 15,000 inpatients with COVID-19 at the US epicenter of the SARS-CoV-2 pandemic. 
Objective: We describe the demographic characteristics of patients who died of COVID-19, observation of frequent rapid response team/cardiac arrest (RRT/CA) calls for non--intensive care unit (ICU) patients, and factors that contributed to RRT/CA calls. Methods: A team of registered nurses reviewed the medical records of inpatients who tested positive for SARS-CoV-2 via polymerase chain reaction before or on admission and who died between March 13 (first Northwell Health inpatient expiration) and April 30, 2020, at 15 Northwell Health hospitals. The findings for these patients were abstracted into a database and statistically analyzed. Results: Of 2634 patients who died of COVID-19, 1478 (56.1\%) had oxygen saturation levels $\geq$90\% on presentation and required no respiratory support. At least one RRT/CA was called on 1112/2634 patients (42.2\%) at a non-ICU level of care. Before the RRT/CA call, the most recent oxygen saturation levels for 852/1112 (76.6\%) of these non-ICU patients were at least 90\%. At the time the RRT/CA was called, 479/1112 patients (43.1\%) had an oxygen saturation of <80\%. Conclusions: This study represents one of the largest reviewed cohorts of mortality that also captures data in nonstructured fields. Approximately 50\% of deaths occurred at a non-ICU level of care despite admission to the appropriate care setting with normal staffing. The data imply a sudden, unexpected deterioration in respiratory status requiring RRT/CA in a large number of non-ICU patients. Patients admitted at a non-ICU level of care suffered rapid clinical deterioration, often with a sudden decrease in oxygen saturation. These patients could benefit from additional monitoring (eg, continuous central oxygenation saturation), although this approach warrants further study. ", doi="10.2196/23565", url="http://www.jmir.org/2020/9/e23565/", url="http://www.ncbi.nlm.nih.gov/pubmed/32930099" } @Article{info:doi/10.2196/20268, author="Kline, Adrienne and Kline, Theresa and Shakeri Hossein Abad, Zahra and Lee, Joon", title="Using Item Response Theory for Explainable Machine Learning in Predicting Mortality in the Intensive Care Unit: Case-Based Approach", journal="J Med Internet Res", year="2020", month="Sep", day="25", volume="22", number="9", pages="e20268", keywords="item response theory", keywords="machine learning", keywords="statistical model", keywords="mortality", abstract="Background: Supervised machine learning (ML) is being featured in the health care literature with study results frequently reported using metrics such as accuracy, sensitivity, specificity, recall, or F1 score. Although each metric provides a different perspective on the performance, they remain overall measures for the whole sample, discounting the uniqueness of each case or patient. Intuitively, we know that all cases are not equal, but the present evaluative approaches do not take case difficulty into account. Objective: A more case-based, comprehensive approach is warranted to assess supervised ML outcomes and forms the rationale for this study. This study aims to demonstrate how the item response theory (IRT) can be used to stratify the data based on how difficult each case is to classify, independent of the outcome measure of interest (eg, accuracy). This stratification allows the evaluation of ML classifiers to take the form of a distribution rather than a single scalar value. 
Methods: Two large, public intensive care unit data sets, Medical Information Mart for Intensive Care III and electronic intensive care unit, were used to showcase this method in predicting mortality. For each data set, a balanced sample (n=8078 and n=21,940, respectively) and an imbalanced sample (n=12,117 and n=32,910, respectively) were drawn. A 2-parameter logistic model was used to provide scores for each case. Several ML algorithms were used in the demonstration to classify cases based on their health-related features: logistic regression, linear discriminant analysis, K-nearest neighbors, decision tree, naive Bayes, and a neural network. Generalized linear mixed model analyses were used to assess the effects of case difficulty strata, ML algorithm, and the interaction between them in predicting accuracy. Results: The results showed significant effects (P<.001) for case difficulty strata, ML algorithm, and their interaction in predicting accuracy and illustrated that all classifiers performed better with easier-to-classify cases and that overall the neural network performed best. Significant interactions suggest that cases that fall in the most arduous strata should be handled by logistic regression, linear discriminant analysis, decision tree, or neural network but not by naive Bayes or K-nearest neighbors. Conventional metrics for ML classification have been reported for methodological comparison. Conclusions: This demonstration shows that using the IRT is a viable method for understanding the data that are provided to ML algorithms, independent of outcome measures, and highlights how well classifiers differentiate cases of varying difficulty. This method explains which features are indicative of healthy states and why. It enables end users to tailor the classifier that is appropriate to the difficulty level of the patient for personalized medicine. ", doi="10.2196/20268", url="http://www.jmir.org/2020/9/e20268/", url="http://www.ncbi.nlm.nih.gov/pubmed/32975523" } @Article{info:doi/10.2196/19516, author="Dolci, Elisa and Sch{\"a}rer, Barbara and Grossmann, Nicole and Musy, Naima Sarah and Z{\'u}{\~n}iga, Franziska and Bachnick, Stefanie and Simon, Michael", title="Automated Fall Detection Algorithm With Global Trigger Tool, Incident Reports, Manual Chart Review, and Patient-Reported Falls: Algorithm Development and Validation With a Retrospective Diagnostic Accuracy Study", journal="J Med Internet Res", year="2020", month="Sep", day="21", volume="22", number="9", pages="e19516", keywords="falls", keywords="adverse event", keywords="harm", keywords="algorithm", keywords="natural language processing", abstract="Background: Falls are common adverse events in hospitals, frequently leading to additional health costs due to prolonged stays and extra care. Therefore, reliable fall detection is vital to develop and test fall prevention strategies. However, conventional methods---voluntary incident reports and manual chart reviews---are error-prone and time consuming, respectively. Using a search algorithm to examine patients' electronic health record data and flag fall indicators offers an inexpensive, sensitive, cost-effective alternative. Objective: This study's purpose was to develop a fall detection algorithm for use with electronic health record data, then to evaluate it alongside the Global Trigger Tool, incident reports, a manual chart review, and patient-reported falls. 
Methods: Conducted on 2 campuses of a large hospital system in Switzerland, this retrospective diagnostic accuracy study consisted of 2 substudies: the first, targeting 240 patients, for algorithm development and the second, targeting 298 patients, for validation. In the development study, we compared the new algorithm's in-hospital fall rates with those indicated by the Global Trigger Tool and incident reports; in the validation study, we compared the algorithm's in-hospital fall rates with those from patient-reported falls and manual chart review. We compared the various methods by calculating sensitivity, specificity, and predictive values. Results: Twenty in-hospital falls were discovered in the development study sample. Of these, the algorithm detected 19 (sensitivity 95\%), the Global Trigger Tool detected 18 (90\%), and incident reports detected 14 (67\%). Of the 15 falls found in the validation sample, the algorithm identified all 15 (100\%), the manual chart review identified 14 (93\%), and the patient-reported fall measure identified 5 (33\%). Owing to relatively high numbers of false positives based on falls present on admission, the algorithm's positive predictive values were 50\% (development sample) and 47\% (validation sample). Instead of requiring 10 minutes per case for a full manual review or 20 minutes to apply the Global Trigger Tool, the algorithm requires only a few seconds, after which only the positive results (roughly 11\% of the full case number) require review. Conclusions: The newly developed electronic health record algorithm demonstrated very high sensitivity for fall detection. Applied in near real time, the algorithm can record in-hospital falls events effectively and help to develop and test fall prevention measures. ", doi="10.2196/19516", url="http://www.jmir.org/2020/9/e19516/", url="http://www.ncbi.nlm.nih.gov/pubmed/32955445" } @Article{info:doi/10.2196/19096, author="Xu, Huiyu and Shi, Li and Feng, Guoshuang and Xiao, Zhen and Chen, Lixue and Li, Rong and Qiao, Jie", title="An Ovarian Reserve Assessment Model Based on Anti-M{\"u}llerian Hormone Levels, Follicle-Stimulating Hormone Levels, and Age: Retrospective Cohort Study", journal="J Med Internet Res", year="2020", month="Sep", day="21", volume="22", number="9", pages="e19096", keywords="ovarian reserve", keywords="poor ovarian response", keywords="AMH", keywords="AFC", keywords="FSH", keywords="logistic regression", abstract="Background: Previously, we reported a model for assessing ovarian reserves using 4 predictors: anti-M{\"u}llerian hormone (AMH) level, antral follicle count (AFC), follicle-stimulating hormone (FSH) level, and female age. This model is referred as the AAFA (anti-M{\"u}llerian hormone level--antral follicle count--follicle-stimulating hormone level--age) model. Objective: This study aims to explore the possibility of establishing a model for predicting ovarian reserves using only 3 factors: AMH level, FSH level, and age. The proposed model is referred to as the AFA (anti-M{\"u}llerian hormone level--follicle-stimulating hormone level--age) model. Methods: Oocytes from ovarian cycles stimulated by gonadotropin-releasing hormone antagonist were collected retrospectively at our reproductive center. Poor ovarian response (<5 oocytes retrieved) was defined as an outcome variable. The AFA model was built using a multivariable logistic regression analysis on data from 2017; data from 2018 were used to validate the performance of AFA model. 
Measurements of the area under the curve (AUC), sensitivity, specificity, positive predictive value, and negative predictive value were used to evaluate the performance of the model. To rank the ovarian reserves of the whole population, we ranked the subgroups according to the predicted probability of poor ovarian response and further divided the 60 subgroups into 4 clusters, A-D, according to cut-off values consistent with the AAFA model. Results: The AUCs of the AFA and AAFA models were similar for the same validation set, with values of 0.853 (95\% CI 0.841-0.865) and 0.850 (95\% CI 0.838-0.862), respectively. We further ranked the ovarian reserves according to their predicted probability of poor ovarian response, which was calculated using our AFA model. The actual incidences of poor ovarian response in groups A-D in the AFA model were 0.037 (95\% CI 0.029-0.046), 0.128 (95\% CI 0.099-0.165), 0.294 (95\% CI 0.250-0.341), and 0.624 (95\% CI 0.577-0.669), respectively. The order of ovarian reserve from adequate to poor followed the order from A to D. The clinical pregnancy rate, live-birth rate, and specific differences in groups A-D were similar when predicted using the AFA and AAFA models. Conclusions: This AFA model for assessing the true ovarian reserve was more convenient, cost-effective, and objective than our original AAFA model. ", doi="10.2196/19096", url="https://www.jmir.org/2020/9/e19096", url="http://www.ncbi.nlm.nih.gov/pubmed/32667898" } @Article{info:doi/10.2196/18846, author="Dallora, Luiza Ana and Kvist, Ola and Berglund, Sanmartin Johan and Ruiz, Diaz Sandra and Boldt, Martin and Flodmark, Carl-Erik and Anderberg, Peter", title="Chronological Age Assessment in Young Individuals Using Bone Age Assessment Staging and Nonradiological Aspects: Machine Learning Multifactorial Approach", journal="JMIR Med Inform", year="2020", month="Sep", day="21", volume="8", number="9", pages="e18846", keywords="chronological age assessment", keywords="bone age", keywords="skeletal maturity", keywords="machine learning", keywords="magnetic resonance imaging", keywords="radius", keywords="distal tibia", keywords="proximal tibia", keywords="distal femur", keywords="calcaneus", abstract="Background: Bone age assessment (BAA) is used in numerous pediatric clinical settings as well as in legal settings when entities need an estimate of chronological age (CA) when valid documents are lacking. The latter case presents itself as critical as the law is harsher for adults and granted rights along with imputability changes drastically if the individual is a minor. Traditional BAA methods have drawbacks such as exposure of minors to radiation, they do not consider factors that might affect the bone age, and they mostly focus on a single region. Given the critical scenarios in which BAA can affect the lives of young individuals, it is important to focus on the drawbacks of the traditional methods and investigate the potential of estimating CA through BAA. Objective: This study aims to investigate CA estimation through BAA in young individuals aged 14-21 years with machine learning methods, addressing the drawbacks of research using magnetic resonance imaging (MRI), assessment of multiple regions of interest, and other factors that may affect the bone age. Methods: MRI examinations of the radius, distal tibia, proximal tibia, distal femur, and calcaneus were performed on 465 men and 473 women (aged 14-21 years). 
Measures of weight and height were taken from the subjects, and a questionnaire was given for additional information (self-assessed Tanner Scale, physical activity level, parents' origin, and type of residence during upbringing). Two pediatric radiologists independently assessed the MRI images to evaluate their stage of bone development (blinded to age, gender, and each other). All the gathered information was used in training machine learning models for CA estimation and minor versus adult classification (threshold of 18 years). Different machine learning methods were investigated. Results: The minor versus adult classification produced accuracies of 0.90 and 0.84 for male and female subjects, respectively, with high recalls for the classification of minors. The CA estimation for the 8 age groups (aged 14-21 years) achieved mean absolute errors of 0.95 years and 1.24 years for male and female subjects, respectively. However, for the latter, a lower error occurred only for the ages of 14 and 15 years. Conclusions: This study investigates CA estimation through BAA using machine learning methods in 2 ways: minor versus adult classification and CA estimation in 8 age groups (aged 14-21 years), while addressing the drawbacks in the research on BAA. The first achieved good results; however, for the second case, the BAA was not precise enough for the classification. ", doi="10.2196/18846", url="http://medinform.jmir.org/2020/9/e18846/", url="http://www.ncbi.nlm.nih.gov/pubmed/32955457" } @Article{info:doi/10.2196/18930, author="Entezarjou, Artin and Bonamy, Edstedt Anna-Karin and Benjaminsson, Simon and Herman, Pawel and Midl{\"o}v, Patrik", title="Human- Versus Machine Learning--Based Triage Using Digitalized Patient Histories in Primary Care: Comparative Study", journal="JMIR Med Inform", year="2020", month="Sep", day="3", volume="8", number="9", pages="e18930", keywords="machine learning", keywords="artificial intelligence", keywords="decision support", keywords="primary care", keywords="triage", abstract="Background: Smartphones have made it possible for patients to digitally report symptoms before physical primary care visits. Using machine learning (ML), these data offer an opportunity to support decisions about the appropriate level of care (triage). Objective: The purpose of this study was to explore the interrater reliability between human physicians and an automated ML-based triage method. Methods: After testing several models, a na{\"i}ve Bayes triage model was created using data from digital medical histories, capable of classifying digital medical history reports as either in need of urgent physical examination or not in need of urgent physical examination. The model was tested on 300 digital medical history reports and classification was compared with the majority vote of an expert panel of 5 primary care physicians (PCPs). Reliability between raters was measured using both Cohen $\kappa$ (adjusted for chance agreement) and percentage agreement (not adjusted for chance agreement). Results: Interrater reliability as measured by Cohen $\kappa$ was 0.17 when comparing the majority vote of the reference group with the model. Agreement was 74\% (138/186) for cases judged not in need of urgent physical examination and 42\% (38/90) for cases judged to be in need of urgent physical examination. No specific features linked to the model's triage decision could be identified. Between physicians within the panel, Cohen $\kappa$ was 0.2. 
Intrarater reliability when 1 physician retriaged 50 reports resulted in Cohen $\kappa$ of 0.55. Conclusions: Low interrater and intrarater agreement in triage decisions among PCPs limits the possibility to use human decisions as a reference for ML to automate triage in primary care. ", doi="10.2196/18930", url="https://medinform.jmir.org/2020/9/e18930", url="http://www.ncbi.nlm.nih.gov/pubmed/32880578" } @Article{info:doi/10.2196/19818, author="Savage, Mark and Savage, Clara Lucia", title="Doctors Routinely Share Health Data Electronically Under HIPAA, and Sharing With Patients and Patients' Third-Party Health Apps is Consistent: Interoperability and Privacy Analysis", journal="J Med Internet Res", year="2020", month="Sep", day="2", volume="22", number="9", pages="e19818", keywords="digital health", keywords="privacy", keywords="interoperability", keywords="mobile phone, smartphone", keywords="electronic health records", keywords="EHR", keywords="patient access", keywords="patient engagement", keywords="Health Insurance Portability and Accountability Act", keywords="HIPAA", keywords="Health Information Technology for Economic and Clinical Health Act", keywords="HITECH", keywords="covered entity", keywords="business associate", keywords="protected health information", keywords="PHI", keywords="digital health applications", keywords="apps", doi="10.2196/19818", url="https://www.jmir.org/2020/9/e19818", url="http://www.ncbi.nlm.nih.gov/pubmed/32876582" } @Article{info:doi/10.2196/18109, author="Jimenez, Geronimo and Tyagi, Shilpa and Osman, Tarig and Spinazze, Pier and van der Kleij, Rianne and Chavannes, H. Niels and Car, Josip", title="Improving the Primary Care Consultation for Diabetes and Depression Through Digital Medical Interview Assistant Systems: Narrative Review", journal="J Med Internet Res", year="2020", month="Aug", day="28", volume="22", number="8", pages="e18109", keywords="digital medical interview assistant, computer-assisted history taking", keywords="primary care", keywords="chronic conditions", abstract="Background: Digital medical interview assistant (DMIA) systems, also known as computer-assisted history taking (CAHT) systems, have the potential to improve the quality of care and the medical consultation by exploring more patient-related aspects without time constraints and, therefore, acquiring more and better-quality information prior to the face-to-face consultation. The consultation in primary care is the broadest in terms of the amount of topics to be covered and, at the same time, the shortest in terms of time spent with the patient. Objective: Our aim is to explore how DMIA systems may be used specifically in the context of primary care, to improve the consultations for diabetes and depression, as exemplars of chronic conditions. Methods: A narrative review was conducted focusing on (1) the characteristics of the primary care consultation in general, and for diabetes and depression specifically, and (2) the impact of DMIA and CAHT systems on the medical consultation. Through thematic analysis, we identified the characteristics of the primary care consultation that a DMIA system would be able to improve. Based on the identified primary care consultation tasks and the potential benefits of DMIA systems, we developed a sample questionnaire for diabetes and depression to illustrate how such a system may work. 
Results: A DMIA system, prior to the first consultation, could aid in the essential primary care tasks of case finding and screening, diagnosing, and, if needed, timely referral to specialists or urgent care. Similarly, for follow-up consultations, this system could aid with the control and monitoring of these conditions, help check for additional health issues, and update the primary care provider about visits to other providers or further testing. Successfully implementing a DMIA system for these tasks would improve the quality of the data obtained, which means earlier diagnosis and treatment. Such a system would improve the use of face-to-face consultation time, thereby streamlining the interaction and allowing the focus to be the patient's needs, which ultimately would lead to better health outcomes and patient satisfaction. However, for such a system to be successfully incorporated, there are important considerations to be taken into account, such as the language to be used and the challenges for implementing eHealth innovations in primary care and health care in general. Conclusions: Given the benefits explored here, we foresee that DMIA systems could have an important impact in the primary care consultation for diabetes and depression and, potentially, for other chronic conditions. Earlier case finding and a more accurate diagnosis, due to more and better-quality data, paired with improved monitoring of disease progress should improve the quality of care and keep the management of chronic conditions at the primary care level. A somewhat simple, easily scalable technology could go a long way to improve the health of the millions of people affected with chronic conditions, especially if working in conjunction with already-established health technologies such as electronic medical records and clinical decision support systems. ", doi="10.2196/18109", url="http://www.jmir.org/2020/8/e18109/", url="http://www.ncbi.nlm.nih.gov/pubmed/32663144" } @Article{info:doi/10.2196/22033, author="McRae, P. Michael and Dapkins, P. Isaac and Sharif, Iman and Anderman, Judd and Fenyo, David and Sinokrot, Odai and Kang, K. Stella and Christodoulides, J. Nicolaos and Vurmaz, Deniz and Simmons, W. Glennon and Alcorn, M. Timothy and Daoura, J. Marco and Gisburne, Stu and Zar, David and McDevitt, T. John", title="Managing COVID-19 With a Clinical Decision Support Tool in a Community Health Network: Algorithm Development and Validation", journal="J Med Internet Res", year="2020", month="Aug", day="24", volume="22", number="8", pages="e22033", keywords="COVID-19", keywords="coronavirus", keywords="clinical decision support system", keywords="point of care", keywords="mobile app", keywords="disease severity", keywords="biomarkers", keywords="artificial intelligence", keywords="app", keywords="family health center", abstract="Background: The coronavirus disease (COVID-19) pandemic has resulted in significant morbidity and mortality; large numbers of patients require intensive care, which is placing strain on health care systems worldwide. There is an urgent need for a COVID-19 disease severity assessment that can assist in patient triage and resource allocation for patients at risk for severe disease. Objective: The goal of this study was to develop, validate, and scale a clinical decision support system and mobile app to assist in COVID-19 severity assessment, management, and care. 
Methods: Model training data from 701 patients with COVID-19 were collected across practices within the Family Health Centers network at New York University Langone Health. A two-tiered model was developed. Tier 1 uses easily available, nonlaboratory data to help determine whether biomarker-based testing and/or hospitalization is necessary. Tier 2 predicts the probability of mortality using biomarker measurements (C-reactive protein, procalcitonin, D-dimer) and age. Both the Tier 1 and Tier 2 models were validated using two external datasets from hospitals in Wuhan, China, comprising 160 and 375 patients, respectively. Results: All biomarkers were measured at significantly higher levels in patients who died vs those who were not hospitalized or discharged (P<.001). The Tier 1 and Tier 2 internal validations had areas under the curve (AUCs) of 0.79 (95\% CI 0.74-0.84) and 0.95 (95\% CI 0.92-0.98), respectively. The Tier 1 and Tier 2 external validations had AUCs of 0.79 (95\% CI 0.74-0.84) and 0.97 (95\% CI 0.95-0.99), respectively. Conclusions: Our results demonstrate the validity of the clinical decision support system and mobile app, which are now ready to assist health care providers in making evidence-based decisions when managing COVID-19 patient care. The deployment of these new capabilities has potential for immediate impact in community clinics and sites, where application of these tools could lead to improvements in patient outcomes and cost containment. ", doi="10.2196/22033", url="http://www.jmir.org/2020/8/e22033/", url="http://www.ncbi.nlm.nih.gov/pubmed/32750010" } @Article{info:doi/10.2196/18542, author="Weissler, Hope Elizabeth and Lippmann, J. Steven and Smerek, M. Michelle and Ward, A. Rachael and Kansal, Aman and Brock, Adam and Sullivan, C. Robert and Long, Chandler and Patel, R. Manesh and Greiner, A. Melissa and Hardy, Chantelle N. and Curtis, H. Lesley and Jones, Schuyler W.", title="Model-Based Algorithms for Detecting Peripheral Artery Disease Using Administrative Data From an Electronic Health Record Data System: Algorithm Development Study", journal="JMIR Med Inform", year="2020", month="Aug", day="19", volume="8", number="8", pages="e18542", keywords="peripheral artery disease", keywords="patient selection", keywords="electronic health records", keywords="cardiology", keywords="health data", abstract="Background: Peripheral artery disease (PAD) affects 8 to 10 million Americans, who face significantly elevated risks of both mortality and major limb events such as amputation. Unfortunately, PAD is relatively underdiagnosed, undertreated, and underresearched, leading to wide variations in treatment patterns and outcomes. Efforts to improve PAD care and outcomes have been hampered by persistent difficulties identifying patients with PAD for clinical and investigatory purposes. Objective: The aim of this study is to develop and validate a model-based algorithm to detect patients with peripheral artery disease (PAD) using data from an electronic health record (EHR) system. Methods: An initial query of the EHR in a large health system identified all patients with PAD-related diagnosis codes for any encounter during the study period. Clinical adjudication of PAD diagnosis was performed by chart review on a random subgroup. A binary logistic regression to predict PAD was built and validated using a least absolute shrinkage and selection operator (LASSO) approach in the adjudicated patients. 
The algorithm was then applied to the nonsampled records to further evaluate its performance. Results: The initial EHR data query using 406 diagnostic codes yielded 15,406 patients. Overall, 2500 patients were randomly selected for ground truth PAD status adjudication. In the end, 108 code flags remained after removing rarely- and never-used codes. We entered these code flags plus administrative encounter, imaging, procedure, and specialist flags into a LASSO model. The area under the curve for this model was 0.862. Conclusions: The algorithm we constructed has two main advantages over other approaches to the identification of patients with PAD. First, it was derived from a broad population of patients with many different PAD manifestations and treatment pathways across a large health system. Second, our model does not rely on clinical notes and can be applied in situations in which only administrative billing data (eg, large administrative data sets) are available. A combination of diagnosis codes and administrative flags can accurately identify patients with PAD in large cohorts. ", doi="10.2196/18542", url="http://medinform.jmir.org/2020/8/e18542/", url="http://www.ncbi.nlm.nih.gov/pubmed/32663152" } @Article{info:doi/10.2196/20974, author="Li, Yong", title="Diagnostic Model for In-Hospital Bleeding in Patients with Acute ST-Segment Elevation Myocardial Infarction: Algorithm Development and Validation", journal="JMIR Med Inform", year="2020", month="Aug", day="14", volume="8", number="8", pages="e20974", keywords="coronary disease", keywords="ST-segment elevation myocardial infarction", keywords="hemorrhage", keywords="nomogram", abstract="Background: Bleeding complications in patients with acute ST-segment elevation myocardial infarction (STEMI) have been associated with increased risk of subsequent adverse consequences. Objective: The objective of our study was to develop and externally validate a diagnostic model of in-hospital bleeding. Methods: We performed multivariate logistic regression of a cohort for hospitalized patients with acute STEMI in the emergency department of a university hospital. Participants: The model development data set was obtained from 4262 hospitalized patients with acute STEMI from January 2002 to December 2013. A set of 6015 hospitalized patients with acute STEMI from January 2014 to August 2019 were used for external validation. We used logistic regression analysis to analyze the risk factors of in-hospital bleeding in the development data set. We developed a diagnostic model of in-hospital bleeding and constructed a nomogram. We assessed the predictive performance of the diagnostic model in the validation data sets by examining measures of discrimination, calibration, and decision curve analysis (DCA). Results: In-hospital bleeding occurred in 112 of 4262 participants (2.6\%) in the development data set. The strongest predictors of in-hospital bleeding were advanced age and high Killip classification. Logistic regression analysis showed differences between the groups with and without in-hospital bleeding in age (odds ratio [OR] 1.047, 95\% CI 1.029-1.066; P<.001), Killip III (OR 3.265, 95\% CI 2.008-5.31; P<.001), and Killip IV (OR 5.133, 95\% CI 3.196-8.242; P<.001). We developed a diagnostic model of in-hospital bleeding. The area under the receiver operating characteristic curve (AUC) was 0.777 (SD 0.021, 95\% CI 0.73576-0.81823). We constructed a nomogram based on age and Killip classification. 
In-hospital bleeding occurred in 117 of 6015 participants (1.9\%) in the validation data set. The AUC was 0.7234 (SD 0.0252, 95\% CI 0.67392-0.77289). Conclusions: We developed and externally validated a diagnostic model of in-hospital bleeding in patients with acute STEMI. The discrimination, calibration, and DCA of the model were found to be satisfactory. Trial Registration: ChiCTR.org ChiCTR1900027578; http://www.chictr.org.cn/showprojen.aspx?proj=45926 ", doi="10.2196/20974", url="http://medinform.jmir.org/2020/8/e20974/", url="http://www.ncbi.nlm.nih.gov/pubmed/32795995" } @Article{info:doi/10.2196/20773, author="Neuraz, Antoine and Lerner, Ivan and Digan, William and Paris, Nicolas and Tsopra, Rosy and Rogier, Alice and Baudoin, David and Cohen, Bretonnel Kevin and Burgun, Anita and Garcelon, Nicolas and Rance, Bastien and ", title="Natural Language Processing for Rapid Response to Emergent Diseases: Case Study of Calcium Channel Blockers and Hypertension in the COVID-19 Pandemic", journal="J Med Internet Res", year="2020", month="Aug", day="14", volume="22", number="8", pages="e20773", keywords="medication information", keywords="natural language processing", keywords="electronic health records", keywords="COVID-19", keywords="public health", keywords="response", keywords="emergent disease", keywords="informatics", abstract="Background: A novel disease poses special challenges for informatics solutions. Biomedical informatics relies for the most part on structured data, which require a preexisting data or knowledge model; however, novel diseases do not have preexisting knowledge models. In an emergent epidemic, language processing can enable rapid conversion of unstructured text to a novel knowledge model. However, although this idea has often been suggested, no opportunity has arisen to actually test it in real time. The current coronavirus disease (COVID-19) pandemic presents such an opportunity. Objective: The aim of this study was to evaluate the added value of information from clinical text in response to emergent diseases using natural language processing (NLP). Methods: We explored the effects of long-term treatment by calcium channel blockers on the outcomes of COVID-19 infection in patients with high blood pressure during in-patient hospital stays using two sources of information: data available strictly from structured electronic health records (EHRs) and data available through structured EHRs and text mining. Results: In this multicenter study involving 39 hospitals, text mining increased the statistical power sufficiently to change a negative result for an adjusted hazard ratio to a positive one. Compared to the baseline structured data, the number of patients available for inclusion in the study increased by 2.95 times, the amount of available information on medications increased by 7.2 times, and the amount of additional phenotypic information increased by 11.9 times. Conclusions: In our study, use of calcium channel blockers was associated with decreased in-hospital mortality in patients with COVID-19 infection. This finding was obtained by quickly adapting an NLP pipeline to the domain of the novel disease; the adapted pipeline still performed sufficiently to extract useful information. When that information was used to supplement existing structured data, the sample size could be increased sufficiently to see treatment effects that were not previously statistically detectable. 
", doi="10.2196/20773", url="http://www.jmir.org/2020/8/e20773/", url="http://www.ncbi.nlm.nih.gov/pubmed/32759101" } @Article{info:doi/10.2196/18855, author="Baxter, L. Sally and Klie, R. Adam and Radha Saseendrakumar, Bharanidharan and Ye, Y. Gordon and Hogarth, Michael", title="Text Processing for Detection of Fungal Ocular Involvement in Critical Care Patients: Cross-Sectional Study", journal="J Med Internet Res", year="2020", month="Aug", day="14", volume="22", number="8", pages="e18855", keywords="fungemia", keywords="fungal endophthalmitis", keywords="fungal ocular involvement", keywords="electronic health records", keywords="diagnosis codes", keywords="regular expressions", keywords="natural language processing", keywords="unstructured data", abstract="Background: Fungal ocular involvement can develop in patients with fungal bloodstream infections and can be vision-threatening. Ocular involvement has become less common in the current era of improved antifungal therapies. Retrospectively determining the prevalence of fungal ocular involvement is important for informing clinical guidelines, such as the need for routine ophthalmologic consultations. However, manual retrospective record review to detect cases is time-consuming. Objective: This study aimed to determine the prevalence of fungal ocular involvement in a critical care database using both structured and unstructured electronic health record (EHR) data. Methods: We queried microbiology data from 46,467 critical care patients over 12 years (2000-2012) from the Medical Information Mart for Intensive Care III (MIMIC-III) to identify 265 patients with culture-proven fungemia. For each fungemic patient, demographic data, fungal species present in blood culture, and risk factors for fungemia (eg, presence of indwelling catheters, recent major surgery, diabetes, immunosuppressed status) were ascertained. All structured diagnosis codes and free-text narrative notes associated with each patient's hospitalization were also extracted. Screening for fungal endophthalmitis was performed using two approaches: (1) by querying a wide array of eye- and vision-related diagnosis codes, and (2) by utilizing a custom regular expression pipeline to identify and collate relevant text matches pertaining to fungal ocular involvement. Both approaches were validated using manual record review. The main outcome measure was the documentation of any fungal ocular involvement. Results: In total, 265 patients had culture-proven fungemia, with Candida albicans (n=114, 43\%) and Candida glabrata (n=74, 28\%) being the most common fungal species in blood culture. The in-hospital mortality rate was 121 (46\%). In total, 7 patients were identified as having eye- or vision-related diagnosis codes, none of whom had fungal endophthalmitis based on record review. There were 26,830 free-text narrative notes associated with these 265 patients. A regular expression pipeline based on relevant terms yielded possible matches in 683 notes from 108 patients. Subsequent manual record review again demonstrated that no patients had fungal ocular involvement. Therefore, the prevalence of fungal ocular involvement in this cohort was 0\%. Conclusions: MIMIC-III contained no cases of ocular involvement among fungemic patients, consistent with prior studies reporting low rates of ocular involvement in fungemia. This study demonstrates an application of natural language processing to expedite the review of narrative notes. 
This approach is highly relevant for ophthalmology, where diagnoses are often based on physical examination findings that are documented within clinical notes. ", doi="10.2196/18855", url="https://www.jmir.org/2020/8/e18855", url="http://www.ncbi.nlm.nih.gov/pubmed/32795984" } @Article{info:doi/10.2196/16761, author="Vargas Meza, Xanat and Yamanaka, Toshimasa", title="Food Communication and its Related Sentiment in Local and Organic Food Videos on YouTube", journal="J Med Internet Res", year="2020", month="Aug", day="10", volume="22", number="8", pages="e16761", keywords="social networks", keywords="framing", keywords="semantic analysis", keywords="sentiment analysis", keywords="organic", keywords="local", keywords="food", keywords="YouTube", abstract="Background: Local and organic foods have shown increased importance and market size in recent years. However, attitudes, sentiment, and habits related to such foods in the context of video social networks have not been thoroughly researched. Given that such media have become some of the most important venues of internet traffic, it is relevant to investigate how sustainable food is communicated through such video social networks. Objective: This study aimed to explore the diffusion paths of local and organic foods on YouTube, providing a review of trends, coincidences, and differences among video discourses. Methods: A combined methodology involving webometric, framing, semantic, and sentiment analyses was employed. Results: We reported the results for the following two groups: organic and local organic videos. Although the content of 923 videos mostly included the ``Good Mother'' (organic and local organic: 282/808, 34.9\% and 311/866, 35.9\%, respectively), ``Natural Goodness'' (220/808, 27.2\% and 253/866, 29.2\%), and ``Undermining of Foundations'' (153/808, 18.9\% and 180/866, 20.7\%) frames, organic videos were more framed in terms of ``Frankenstein'' food (organic and local organic: 68/808, 8.4\% and 27/866, 3.1\%, respectively), with genetically modified organisms being a frequent topic among the comments. Organic videos (N=448) were better connected in terms of network metrics than local organic videos (N=475), which were slightly more framed regarding ``Responsibility'' (organic and local organic: 42/808, 5.1\% and 57/866, 6.5\%, respectively) and expressed more positive sentiment (M ranks for organic and local organic were 521.2 and 564.54, respectively, Z=2.15, P=.03). Conclusions: The results suggest that viewers considered sustainable food as part of a complex system and in a positive light and that food framed as artificial and dangerous sometimes functions as a counterpoint to promote organic food. ", doi="10.2196/16761", url="https://www.jmir.org/2020/8/e16761", url="http://www.ncbi.nlm.nih.gov/pubmed/32773370" } @Article{info:doi/10.2196/19892, author="Essay, Patrick and Balkan, Baran and Subbian, Vignesh", title="Decompensation in Critical Care: Early Prediction of Acute Heart Failure Onset", journal="JMIR Med Inform", year="2020", month="Aug", day="7", volume="8", number="8", pages="e19892", keywords="critical care", keywords="heart failure", keywords="intensive care units", keywords="machine learning", keywords="time series", keywords="heart", keywords="cardiology", keywords="prediction", keywords="chronic disease", keywords="ICU", keywords="intensive care unit", abstract="Background: Heart failure is a leading cause of mortality and morbidity worldwide. 
Acute heart failure, broadly defined as rapid onset of new or worsening signs and symptoms of heart failure, often requires hospitalization and admission to the intensive care unit (ICU). This acute condition is highly heterogeneous and less well understood than chronic heart failure. The ICU, through detailed and continuously monitored patient data, provides an opportunity to retrospectively analyze decompensation and heart failure to evaluate physiological states and patient outcomes. Objective: The goal of this study is to examine the prevalence of cardiovascular risk factors among those admitted to ICUs and to evaluate combinations of clinical features that are predictive of decompensation events, such as the onset of acute heart failure, using machine learning techniques. To accomplish this objective, we leveraged tele-ICU data from over 200 hospitals across the United States. Methods: We evaluated the feasibility of predicting decompensation soon after ICU admission for 26,534 patients admitted without a history of heart failure with specific heart failure risk factors (ie, coronary artery disease, hypertension, and myocardial infarction) and 96,350 patients admitted without risk factors using remotely monitored laboratory, vital signs, and discrete physiological measurements. Multivariate logistic regression and random forest models were applied to predict decompensation and highlight important features from combinations of model inputs from dissimilar data. Results: The most prevalent risk factor in our data set was hypertension, although most patients diagnosed with heart failure were admitted to the ICU without a risk factor. The highest heart failure prediction accuracy was 0.951, and the highest area under the receiver operating characteristic curve was 0.9503 with random forest and combined vital signs, laboratory values, and discrete physiological measurements. Random forest feature importance also highlighted combinations of several discrete physiological features and laboratory measures as most indicative of decompensation. Timeline analysis of aggregate vital signs revealed a point of diminishing returns where additional vital signs data did not continue to improve results. Conclusions: Heart failure risk factors are common in tele-ICU data, although most patients who were diagnosed with heart failure later in an ICU stay presented without risk factors, making prediction of decompensation critical. Decompensation was predicted with reasonable accuracy using tele-ICU data, and optimal data extraction for time series vital signs data was identified near a 200-minute window size. Overall, results suggest combinations of laboratory measurements and vital signs are viable for early and continuous prediction of patient decompensation. ", doi="10.2196/19892", url="http://medinform.jmir.org/2020/8/e19892/", url="http://www.ncbi.nlm.nih.gov/pubmed/32663162" } @Article{info:doi/10.2196/18388, author="Camacho, Jhon and Zanoletti-Mannello, Manuela and Landis-Lewis, Zach and Kane-Gill, L. Sandra and Boyce, D. 
Richard", title="A Conceptual Framework to Study the Implementation of Clinical Decision Support Systems (BEAR): Literature Review and Concept Mapping", journal="J Med Internet Res", year="2020", month="Aug", day="6", volume="22", number="8", pages="e18388", keywords="clinical decision support system", keywords="computerized decision support system", keywords="implementation science", keywords="technology acceptance", keywords="barriers", keywords="facilitators", keywords="determinants", keywords="decision support system", abstract="Background: The implementation of clinical decision support systems (CDSSs) as an intervention to foster clinical practice change is affected by many factors. Key factors include those associated with behavioral change and those associated with technology acceptance. However, the literature regarding these subjects is fragmented and originates from two traditionally separate disciplines: implementation science and technology acceptance. Objective: Our objective is to propose an integrated framework that bridges the gap between the behavioral change and technology acceptance aspects of the implementation of CDSSs. Methods: We employed an iterative process to map constructs from four contributing frameworks---the Theoretical Domains Framework (TDF); the Consolidated Framework for Implementation Research (CFIR); the Human, Organization, and Technology-fit framework (HOT-fit); and the Unified Theory of Acceptance and Use of Technology (UTAUT)---and the findings of 10 literature reviews, identified through a systematic review of reviews approach. Results: The resulting framework comprises 22 domains: agreement with the decision algorithm; attitudes; behavioral regulation; beliefs about capabilities; beliefs about consequences; contingencies; demographic characteristics; effort expectancy; emotions; environmental context and resources; goals; intentions; intervention characteristics; knowledge; memory, attention, and decision processes; patient--health professional relationship; patient's preferences; performance expectancy; role and identity; skills, ability, and competence; social influences; and system quality. We demonstrate the use of the framework providing examples from two research projects. Conclusions: We proposed BEAR (BEhavior and Acceptance fRamework), an integrated framework that bridges the gap between behavioral change and technology acceptance, thereby widening the view established by current models. ", doi="10.2196/18388", url="https://www.jmir.org/2020/8/e18388", url="http://www.ncbi.nlm.nih.gov/pubmed/32759098" } @Article{info:doi/10.2196/15394, author="Cheng, Hao-Yuan and Wu, Yu-Chun and Lin, Min-Hau and Liu, Yu-Lun and Tsai, Yue-Yang and Wu, Jo-Hua and Pan, Ke-Han and Ke, Chih-Jung and Chen, Chiu-Mei and Liu, Ding-Ping and Lin, I-Feng and Chuang, Jen-Hsiang", title="Applying Machine Learning Models with An Ensemble Approach for Accurate Real-Time Influenza Forecasting in Taiwan: Development and Validation Study", journal="J Med Internet Res", year="2020", month="Aug", day="5", volume="22", number="8", pages="e15394", keywords="influenza", keywords="Influenza-like illness", keywords="forecasting", keywords="machine learning", keywords="artificial intelligence", keywords="epidemic forecasting", keywords="surveillance", abstract="Background: Changeful seasonal influenza activity in subtropical areas such as Taiwan causes problems in epidemic preparedness. 
The Taiwan Centers for Disease Control has maintained real-time national influenza surveillance systems since 2004. Beyond timely monitoring, epidemic forecasting using the national influenza surveillance data can provide pivotal information for public health response. Objective: We aimed to develop predictive models using machine learning to provide real-time influenza-like illness forecasts. Methods: Using surveillance data of influenza-like illness visits from emergency departments (from the Real-Time Outbreak and Disease Surveillance System), outpatient departments (from the National Health Insurance database), and the records of patients with severe influenza with complications (from the National Notifiable Disease Surveillance System), we developed 4 machine learning models (autoregressive integrated moving average, random forest, support vector regression, and extreme gradient boosting) to produce weekly influenza-like illness predictions for a given week and 3 subsequent weeks. We established a framework of the machine learning models and used an ensemble approach called stacking to integrate these predictions. We trained the models using historical data from 2008-2014. We evaluated their predictive ability during 2015-2017 for each of the 4-week time periods using Pearson correlation, mean absolute percentage error (MAPE), and hit rate of trend prediction. A dashboard website was built to visualize the forecasts, and the results of real-world implementation of this forecasting framework in 2018 were evaluated using the same metrics. Results: All models could accurately predict the timing and magnitudes of the seasonal peaks in the then-current week (nowcast) ($\rho$=0.802-0.965; MAPE: 5.2\%-9.2\%; hit rate: 0.577-0.756), 1-week ($\rho$=0.803-0.918; MAPE: 8.3\%-11.8\%; hit rate: 0.643-0.747), 2-week ($\rho$=0.783-0.867; MAPE: 10.1\%-15.3\%; hit rate: 0.669-0.734), and 3-week forecasts ($\rho$=0.676-0.801; MAPE: 12.0\%-18.9\%; hit rate: 0.643-0.786), especially the ensemble model. In real-world implementation in 2018, the forecasting performance was still accurate in nowcasts ($\rho$=0.875-0.969; MAPE: 5.3\%-8.0\%; hit rate: 0.582-0.782) and remained satisfactory in 3-week forecasts ($\rho$=0.721-0.908; MAPE: 7.6\%-13.5\%; hit rate: 0.596-0.904). Conclusions: This machine learning and ensemble approach can make accurate, real-time influenza-like illness forecasts for a 4-week period, and thus, facilitate decision making. ", doi="10.2196/15394", url="https://www.jmir.org/2020/8/e15394", url="http://www.ncbi.nlm.nih.gov/pubmed/32755888" } @Article{info:doi/10.2196/16709, author="Yu, Kun-Hsing and Lee, Michael Tsung-Lu and Yen, Ming-Hsuan and Kou, C. S. and Rosen, Bruce and Chiang, Jung-Hsien and Kohane, S. Isaac", title="Reproducible Machine Learning Methods for Lung Cancer Detection Using Computed Tomography Images: Algorithm Development and Validation", journal="J Med Internet Res", year="2020", month="Aug", day="5", volume="22", number="8", pages="e16709", keywords="computed tomography, spiral", keywords="lung cancer", keywords="machine learning", keywords="early detection of cancer", keywords="reproducibility of results", abstract="Background: Chest computed tomography (CT) is crucial for the detection of lung cancer, and many automated CT evaluation methods have been proposed. Due to the divergent software dependencies of the reported approaches, the developed methods are rarely compared or reproduced. 
Objective: The goal of the research was to generate reproducible machine learning modules for lung cancer detection and compare the approaches and performances of the award-winning algorithms developed in the Kaggle Data Science Bowl. Methods: We obtained the source codes of all award-winning solutions of the Kaggle Data Science Bowl Challenge, where participants developed automated CT evaluation methods to detect lung cancer (training set n=1397, public test set n=198, final test set n=506). The performance of the algorithms was evaluated by the log-loss function, and the Spearman correlation coefficient of the performance in the public and final test sets was computed. Results: Most solutions implemented distinct image preprocessing, segmentation, and classification modules. Variants of U-Net, VGGNet, and residual net were commonly used in nodule segmentation, and transfer learning was used in most of the classification algorithms. Substantial performance variations in the public and final test sets were observed (Spearman correlation coefficient = .39 among the top 10 teams). To ensure the reproducibility of results, we generated a Docker container for each of the top solutions. Conclusions: We compared the award-winning algorithms for lung cancer detection and generated reproducible Docker images for the top solutions. Although convolutional neural networks achieved decent accuracy, there is plenty of room for improvement regarding model generalizability. ", doi="10.2196/16709", url="https://www.jmir.org/2020/8/e16709", url="http://www.ncbi.nlm.nih.gov/pubmed/32755895" } @Article{info:doi/10.2196/19512, author="Park, Jun Hyung and Jung, Yon Dae and Ji, Wonjun and Choi, Chang-Min", title="Detection of Bacteremia in Surgical In-Patients Using Recurrent Neural Network Based on Time Series Records: Development and Validation Study", journal="J Med Internet Res", year="2020", month="Aug", day="4", volume="22", number="8", pages="e19512", keywords="deep learning", keywords="bacteremia", keywords="early detection", keywords="time series", keywords="recurrent neural network", keywords="neural network", keywords="informatics", keywords="surgery", keywords="sepsis", keywords="modeling", abstract="Background: Detecting bacteremia among surgical in-patients is more obscure than other patients due to the inflammatory condition caused by the surgery. The previous criteria such as systemic inflammatory response syndrome or Sepsis-3 are not available for use in general wards, and thus, many clinicians usually rely on practical senses to diagnose postoperative infection. Objective: This study aims to evaluate the performance of continuous monitoring with a deep learning model for early detection of bacteremia for surgical in-patients in the general ward and the intensive care unit (ICU). Methods: In this retrospective cohort study, we included 36,023 consecutive patients who underwent general surgery between October and December 2017 at a tertiary referral hospital in South Korea. The primary outcome was the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) for detecting bacteremia by the deep learning model, and the secondary outcome was the feature explainability of the model by occlusion analysis. Results: Out of the 36,023 patients in the data set, 720 cases of bacteremia were included. 
Our deep learning--based model showed an AUROC of 0.97 (95\% CI 0.974-0.981) and an AUPRC of 0.17 (95\% CI 0.147-0.203) for detecting bacteremia in surgical in-patients. For predicting bacteremia within the previous 24-hour period, the AUROC and AUPRC values were 0.93 and 0.15, respectively. Occlusion analysis showed that vital signs and laboratory measurements (eg, kidney function test and white blood cell group) were the most important variables for detecting bacteremia. Conclusions: A deep learning model based on time series electronic health records data had a high detective ability for bacteremia for surgical in-patients in the general ward and the ICU. The model may be able to assist clinicians in evaluating infection among in-patients, ordering blood cultures, and prescribing antibiotics with real-time monitoring. ", doi="10.2196/19512", url="https://www.jmir.org/2020/8/e19512", url="http://www.ncbi.nlm.nih.gov/pubmed/32669261" } @Article{info:doi/10.2196/16903, author="Hsu, Chien-Ning and Liu, Chien-Liang and Tain, You-Lin and Kuo, Chin-Yu and Lin, Yun-Chun", title="Machine Learning Model for Risk Prediction of Community-Acquired Acute Kidney Injury Hospitalization From Electronic Health Records: Development and Validation Study", journal="J Med Internet Res", year="2020", month="Aug", day="4", volume="22", number="8", pages="e16903", keywords="community-acquired acute kidney injury (CA-AKI)", keywords="hospitalization", keywords="treatment decision making", keywords="clinical decision support system", keywords="machine learning", keywords="feature selection with extreme gradient boost (XGBoost)", keywords="least absolute shrinkage and selection operator (LASSO)", keywords="risk prediction", abstract="Background: Community-acquired acute kidney injury (CA-AKI)-associated hospitalizations impose significant health care needs and contribute to in-hospital mortality. However, most risk prediction models developed to date have focused on AKI in a specific group of patients during hospitalization, and there is limited knowledge on the baseline risk in the general population for preventing CA-AKI-associated hospitalization. Objective: To gain further insight into risk exploration, the aim of this study was to develop, validate, and establish a scoring system to facilitate health professionals in enabling early recognition and intervention of CA-AKI to prevent permanent kidney damage using different machine-learning techniques. Methods: A nested case-control study design was employed using electronic health records derived from a group of Chang Gung Memorial Hospitals in Taiwan from 2010 to 2017 to identify 234,867 adults with at least two measures of serum creatinine at hospital admission. Patients were classified into a derivation cohort (2010-2016) and a temporal validation cohort (2017). Patients with the first episode of CA-AKI at hospital admission were classified into the case group and those without CA-AKI were classified in the control group. A total of 47 potential candidate variables, including age, gender, prior use of nephrotoxic medications, Charlson comorbid conditions, commonly measured laboratory results, and recent use of health services, were tested to develop a CA-AKI hospitalization risk model. Permutation-based selection with both the extreme gradient boost (XGBoost) and least absolute shrinkage and selection operator (LASSO) algorithms was performed to determine the top 10 important features for scoring function development. 
Results: The discriminative ability of the risk model was assessed by the area under the receiver operating characteristic curve (AUC), and the predictive CA-AKI risk model derived by the logistic regression algorithm achieved an AUC of 0.767 (95\% CI 0.764-0.770) on derivation and 0.761 on validation for any stage of AKI, with positive and negative predictive values of 19.2\% and 96.1\%, respectively. The risk model for prediction of CA-AKI stages 2 and 3 had an AUC value of 0.818 for the validation cohort with positive and negative predictive values of 13.3\% and 98.4\%, respectively. These metrics were evaluated at a cut-off value of 7.993, which was determined as the threshold to discriminate the risk of AKI. Conclusions: A machine learning--generated risk score model can identify patients at risk of developing CA-AKI-related hospitalization through a routine care data-driven approach. The validated multivariate risk assessment tool could help clinicians to stratify patients in primary care, and to provide monitoring and early intervention for preventing AKI while improving the quality of AKI care in the general population. ", doi="10.2196/16903", url="https://www.jmir.org/2020/8/e16903", url="http://www.ncbi.nlm.nih.gov/pubmed/32749223" } @Article{info:doi/10.2196/18389, author="Bhardwaj, Niharika and Cecchetti, A. Alfred and Murughiyan, Usha and Neitch, Shirley", title="Analysis of Benzodiazepine Prescription Practices in Elderly Appalachians with Dementia via the Appalachian Informatics Platform: Longitudinal Study", journal="JMIR Med Inform", year="2020", month="Aug", day="4", volume="8", number="8", pages="e18389", keywords="dementia", keywords="Alzheimer disease", keywords="benzodiazepines", keywords="Appalachia", keywords="geriatrics", keywords="informatics platform", keywords="interactive visualization", keywords="eHealth", keywords="clinical data", abstract="Background: Caring for the growing dementia population with complex health care needs in West Virginia has been challenging due to its large, sizably rural-dwelling geriatric population and limited resource availability. Objective: This paper aims to illustrate the application of an informatics platform to drive dementia research and quality care through a preliminary study of benzodiazepine (BZD) prescription patterns and its effects on health care use by geriatric patients. Methods: The Maier Institute Data Mart, which contains clinical and billing data on patients aged 65 years and older (N=98,970) seen within our clinics and hospital, was created. Relevant variables were analyzed to identify BZD prescription patterns and calculate related charges and emergency department (ED) use. Results: Nearly one-third (4346/13,910, 31.24\%) of patients with dementia received at least one BZD prescription, 20\% more than those without dementia. More women than men received at least one BZD prescription. On average, patients with dementia and at least one BZD prescription sustained higher charges and visited the ED more often than those without one. Conclusions: The Appalachian Informatics Platform has the potential to enhance dementia care and research through a deeper understanding of dementia, data enrichment, risk identification, and care gap analysis. 
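The prescription-pattern analysis summarized above reduces to rate and utilization comparisons across patient groups. The following pandas sketch is purely illustrative; the column names (has_dementia, bzd_rx_count, ed_visits, total_charges) are hypothetical and do not come from the Maier Institute Data Mart.

# Minimal, illustrative sketch of the kind of rate comparison described above.
import pandas as pd

patients = pd.DataFrame({
    "patient_id":    [1, 2, 3, 4, 5, 6],
    "has_dementia":  [True, True, True, False, False, False],
    "bzd_rx_count":  [2, 0, 1, 0, 1, 0],
    "ed_visits":     [3, 1, 2, 0, 1, 0],
    "total_charges": [12000, 4000, 9000, 2000, 5000, 1500],
})

patients["any_bzd"] = patients["bzd_rx_count"] > 0

# Share of patients with at least one BZD prescription, by dementia status.
bzd_rate = patients.groupby("has_dementia")["any_bzd"].mean()

# Mean ED visits and charges among BZD users, by dementia status.
usage = (patients[patients["any_bzd"]]
         .groupby("has_dementia")[["ed_visits", "total_charges"]]
         .mean())

print(bzd_rate)
print(usage)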
", doi="10.2196/18389", url="https://medinform.jmir.org/2020/8/e18389", url="http://www.ncbi.nlm.nih.gov/pubmed/32749226" } @Article{info:doi/10.2196/16981, author="Xiang, Yang and Ji, Hangyu and Zhou, Yujia and Li, Fang and Du, Jingcheng and Rasmy, Laila and Wu, Stephen and Zheng, Jim W. and Xu, Hua and Zhi, Degui and Zhang, Yaoyun and Tao, Cui", title="Asthma Exacerbation Prediction and Risk Factor Analysis Based on a Time-Sensitive, Attentive Neural Network: Retrospective Cohort Study", journal="J Med Internet Res", year="2020", month="Jul", day="31", volume="22", number="7", pages="e16981", keywords="asthma", keywords="deep learning", keywords="electronic health records", keywords="health risk appraisal", abstract="Background: Asthma exacerbation is an acute or subacute episode of progressive worsening of asthma symptoms and can have a significant impact on patients' quality of life. However, efficient methods that can help identify personalized risk factors and make early predictions are lacking. Objective: This study aims to use advanced deep learning models to better predict the risk of asthma exacerbations and to explore potential risk factors involved in progressive asthma. Methods: We proposed a novel time-sensitive, attentive neural network to predict asthma exacerbation using clinical variables from large electronic health records. The clinical variables were collected from the Cerner Health Facts database between 1992 and 2015, including 31,433 adult patients with asthma. Interpretations on both patient and cohort levels were investigated based on the model parameters. Results: The proposed model obtained an area under the curve value of 0.7003 through a five-fold cross-validation, which outperformed the baseline methods. The results also demonstrated that the addition of elapsed time embeddings considerably improved the prediction performance. Further analysis observed diverse distributions of contributing factors across patients as well as some possible cohort-level risk factors, which could be found supporting evidence from peer-reviewed literature such as respiratory diseases and esophageal reflux. Conclusions: The proposed neural network model performed better than previous methods for the prediction of asthma exacerbation. We believe that personalized risk scores and analyses of contributing factors can help clinicians better assess the individual's level of disease progression and afford the opportunity to adjust treatment, prevent exacerbation, and improve outcomes. ", doi="10.2196/16981", url="https://www.jmir.org/2020/7/e16981", url="http://www.ncbi.nlm.nih.gov/pubmed/32735224" } @Article{info:doi/10.2196/18228, author="Guo, Yuqi and Hao, Zhichao and Zhao, Shichong and Gong, Jiaqi and Yang, Fan", title="Artificial Intelligence in Health Care: Bibliometric Analysis", journal="J Med Internet Res", year="2020", month="Jul", day="29", volume="22", number="7", pages="e18228", keywords="health care", keywords="artificial intelligence", keywords="bibliometric analysis", keywords="telehealth", keywords="neural networks", keywords="machine learning", abstract="Background: As a critical driving power to promote health care, the health care--related artificial intelligence (AI) literature is growing rapidly. Objective: The purpose of this analysis is to provide a dynamic and longitudinal bibliometric analysis of health care--related AI publications. 
Methods: The Web of Science (Clarivate PLC) was searched to retrieve all existing and highly cited AI-related health care research papers published in English up to December 2019. Based on bibliometric indicators, a search strategy was developed to screen the title for eligibility, using the abstract and full text where needed. The growth rate of publications, characteristics of research activities, publication patterns, and research hotspot tendencies were computed using the HistCite software. Results: The search identified 5235 hits, of which 1473 publications were included in the analyses. Publication output increased an average of 17.02\% per year since 1995, but the growth rate of research papers significantly increased to 45.15\% from 2014 to 2019. The major health problems studied in AI research are cancer, depression, Alzheimer disease, heart failure, and diabetes. Artificial neural networks, support vector machines, and convolutional neural networks have the highest impact on health care. Nucleosides, convolutional neural networks, and tumor markers have remained research hotspots through 2019. Conclusions: This analysis provides a comprehensive overview of the AI-related research conducted in the field of health care, which helps researchers, policy makers, and practitioners better understand the development of health care--related AI research and possible practice implications. Future AI research should be dedicated to filling in the gaps between AI health care research and clinical applications. ", doi="10.2196/18228", url="http://www.jmir.org/2020/7/e18228/", url="http://www.ncbi.nlm.nih.gov/pubmed/32723713" } @Article{info:doi/10.2196/16850, author="Zhang, Lei and Shang, Xianwen and Sreedharan, Subhashaan and Yan, Xixi and Liu, Jianbin and Keel, Stuart and Wu, Jinrong and Peng, Wei and He, Mingguang", title="Predicting the Development of Type 2 Diabetes in a Large Australian Cohort Using Machine-Learning Techniques: Longitudinal Survey Study", journal="JMIR Med Inform", year="2020", month="Jul", day="28", volume="8", number="7", pages="e16850", keywords="diabetes", keywords="machine learning", keywords="risk prediction", keywords="cohort study", abstract="Background: Previous conventional models for the prediction of diabetes could be updated by incorporating the increasing amount of health data available and new risk prediction methodology. Objective: We aimed to develop a substantially improved diabetes risk prediction model using sophisticated machine-learning algorithms based on a large retrospective population cohort of over 230,000 people who were enrolled in the study during 2006-2017. Methods: We collected demographic, medical, behavioral, and incidence data for type 2 diabetes mellitus (T2DM) in over 236,684 diabetes-free participants recruited from the 45 and Up Study. We predicted and compared the risk of diabetes onset in these participants at 3, 5, 7, and 10 years based on three machine-learning approaches and the conventional regression model. Results: Overall, 6.05\% (14,313/236,684) of the participants developed T2DM during an average 8.8-year follow-up period. The 10-year diabetes incidence in men was 8.30\% (8.08\%-8.49\%), which was significantly higher (odds ratio 1.37, 95\% CI 1.32-1.41) than that in women at 6.20\% (6.00\%-6.40\%). The incidence of T2DM was doubled in individuals with obesity (men: 17.78\% [17.05\%-18.43\%]; women: 14.59\% [13.99\%-15.17\%]) compared with that of nonobese individuals. 
The gradient boosting machine model showed the best performance among the four models (area under the curve of 79\% in 3-year prediction and 75\% in 10-year prediction). All machine-learning models predicted BMI as the most significant factor contributing to diabetes onset, which explained 12\%-50\% of the variance in the prediction of diabetes. The model predicted that if BMI in obese and overweight participants could be hypothetically reduced to a healthy range, the 10-year probability of diabetes onset would be significantly reduced from 8.3\% to 2.8\% (P<.001). Conclusions: A one-time self-reported survey can accurately predict the risk of diabetes using a machine-learning approach. Achieving a healthy BMI can significantly reduce the risk of developing T2DM. ", doi="10.2196/16850", url="https://medinform.jmir.org/2020/7/e16850", url="http://www.ncbi.nlm.nih.gov/pubmed/32720912" } @Article{info:doi/10.2196/19428, author="Gong, Liheng and Zhang, Xiao and Li, Ling", title="An Artificial Intelligence Fusion Model for Cardiac Emergency Decision Making: Application and Robustness Analysis", journal="JMIR Med Inform", year="2020", month="Jul", day="27", volume="8", number="7", pages="e19428", keywords="artificial intelligence", keywords="fusion model", keywords="cardiac emergency", keywords="robustness", abstract="Background: During cardiac emergency medical treatment, reducing the incidence of avoidable adverse events, ensuring the safety of patients, and generally improving the quality and efficiency of medical treatment have been important research topics in theoretical and practical circles. Objective: This paper examines the robustness of the decision-making reasoning process from the overall perspective of the cardiac emergency medical system. Methods: The principle of robustness was introduced into our study on the quality and efficiency of cardiac emergency decision making. We propose the concept of robustness for complex medical decision making by targeting the problem of low reasoning efficiency and accuracy in cardiac emergency decision making. The key bottlenecks such as anti-interference capability, fault tolerance, and redundancy were studied. The rules of knowledge acquisition and transfer in the decision-making process were systematically analyzed to reveal the core role of knowledge reasoning. Results: The robustness threshold method was adopted to construct the robustness criteria group of the system, and the fusion and coordination mechanism was realized through information entropy, information gain, and mutual information methods. Conclusions: A set of fusion models and robust threshold methods such as the R2CMIFS (treatment mode of fibroblastic sarcoma) model and the RTCRF (clinical trial observation mode) model were proposed. Our study enriches the theoretical research on robustness in this field. 
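The fusion and coordination mechanism described above relies on information entropy, information gain, and mutual information. The following generic Python sketch shows how such scores can be computed for candidate decision criteria against an outcome; it is illustrative only and does not reproduce the R2CMIFS or RTCRF models proposed by the authors.

# Generic, illustrative sketch of entropy / information-gain / mutual-information scoring.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)

# Hypothetical binary decision criteria (rows = cases, columns = criteria) and outcomes.
criteria = rng.integers(0, 2, size=(500, 4))
outcome = (criteria[:, 0] ^ (rng.random(500) < 0.1)).astype(int)  # criterion 0 is informative

def entropy(labels):
    # Shannon entropy H(Y) in bits.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    # H(Y) - H(Y|X) for a discrete feature.
    h_y = entropy(labels)
    h_y_given_x = 0.0
    for value in np.unique(feature):
        mask = feature == value
        h_y_given_x += mask.mean() * entropy(labels[mask])
    return h_y - h_y_given_x

gains = [information_gain(criteria[:, j], outcome) for j in range(criteria.shape[1])]
mi = mutual_info_classif(criteria, outcome, discrete_features=True, random_state=0)

print("information gain per criterion:", np.round(gains, 3))
print("mutual information per criterion:", np.round(mi, 3))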
", doi="10.2196/19428", url="https://medinform.jmir.org/2020/7/e19428", url="http://www.ncbi.nlm.nih.gov/pubmed/32716305" } @Article{info:doi/10.2196/18599, author="Choudhury, Avishek and Asan, Onur", title="Role of Artificial Intelligence in Patient Safety Outcomes: Systematic Literature Review", journal="JMIR Med Inform", year="2020", month="Jul", day="24", volume="8", number="7", pages="e18599", keywords="artificial intelligence", keywords="patient safety", keywords="drug safety", keywords="clinical error", keywords="report analysis", keywords="natural language processing", keywords="drug", keywords="review", abstract="Background: Artificial intelligence (AI) provides opportunities to identify the health risks of patients and thus influence patient safety outcomes. Objective: The purpose of this systematic literature review was to identify and analyze quantitative studies utilizing or integrating AI to address and report clinical-level patient safety outcomes. Methods: We restricted our search to the PubMed, PubMed Central, and Web of Science databases to retrieve research articles published in English between January 2009 and August 2019. We focused on quantitative studies that reported positive, negative, or intermediate changes in patient safety outcomes using AI apps, specifically those based on machine-learning algorithms and natural language processing. Quantitative studies reporting only AI performance but not its influence on patient safety outcomes were excluded from further review. Results: We identified 53 eligible studies, which were summarized concerning their patient safety subcategories, the most frequently used AI, and reported performance metrics. Recognized safety subcategories were clinical alarms (n=9; mainly based on decision tree models), clinical reports (n=21; based on support vector machine models), and drug safety (n=23; mainly based on decision tree models). Analysis of these 53 studies also identified two essential findings: (1) the lack of a standardized benchmark and (2) heterogeneity in AI reporting. Conclusions: This systematic review indicates that AI-enabled decision support systems, when implemented correctly, can aid in enhancing patient safety by improving error detection, patient stratification, and drug management. Future work is still needed for robust validation of these systems in prospective and real-world clinical environments to understand how well AI can predict safety outcomes in health care settings. ", doi="10.2196/18599", url="http://medinform.jmir.org/2020/7/e18599/", url="http://www.ncbi.nlm.nih.gov/pubmed/32706688" } @Article{info:doi/10.2196/18758, author="Jung, Young Se and Hwang, Hee and Lee, Keehyuck and Lee, Ho-Young and Kim, Eunhye and Kim, Miyoung and Cho, Young In", title="Barriers and Facilitators to Implementation of Medication Decision Support Systems in Electronic Medical Records: Mixed Methods Approach Based on Structural Equation Modeling and Qualitative Analysis", journal="JMIR Med Inform", year="2020", month="Jul", day="22", volume="8", number="7", pages="e18758", keywords="clinical decision support system", keywords="electronic health record", keywords="medication safety", keywords="Computerized Provider Order Entry (CPOE)", abstract="Background: Adverse drug events (ADEs) resulting from medication error are some of the most common causes of iatrogenic injuries in hospitals. With the appropriate use of medication, ADEs can be prevented and ameliorated. 
Efforts to reduce medication errors and prevent ADEs have been made by implementing a medication decision support system (MDSS) in electronic health records (EHRs). However, physicians tend to override most MDSS alerts. Objective: In order to improve MDSS functionality, we must understand what factors users consider essential for the successful implementation of an MDSS into their clinical setting. This study followed the implementation process for an MDSS within a comprehensive EHR system and analyzed the relevant barriers and facilitators. Methods: A mixed research methodology was adopted. Data from a structured survey and 15 in-depth interviews were integrated. Structural equation modeling was conducted for quantitative analysis of factors related to user adoption of MDSS. Qualitative analysis based on semistructured interviews with physicians was conducted to collect various opinions on MDSS implementation. Results: Quantitative analysis revealed that physicians' expectations regarding ease of use and performance improvement are crucial. Qualitative analysis identified four significant barriers to MDSS implementation: alert fatigue, lack of accuracy, poor user interface design, and lack of customizability. Conclusions: This study revealed barriers and facilitators to the implementation of MDSS. The findings can be applied to upgrade MDSS in the future. ", doi="10.2196/18758", url="https://medinform.jmir.org/2020/7/e18758", url="http://www.ncbi.nlm.nih.gov/pubmed/32706717" } @Article{info:doi/10.2196/17940, author="Peiffer-Smadja, Nathan and Poda, Armel and Ouedraogo, Abdoul-Salam and Guiard-Schmid, Jean-Baptiste and Delory, Tristan and Le Bel, Josselin and Bouvet, Elisabeth and Lariven, Sylvie and Jeanmougin, Pauline and Ahmad, Raheelah and Lescure, Fran{\c{c}}ois-Xavier", title="Paving the Way for the Implementation of a Decision Support System for Antibiotic Prescribing in Primary Care in West Africa: Preimplementation and Co-Design Workshop With Physicians", journal="J Med Internet Res", year="2020", month="Jul", day="20", volume="22", number="7", pages="e17940", keywords="decision support systems, clinical", keywords="antibiotic resistance, microbial", keywords="drug resistance, microbial", keywords="antibiotic stewardship", keywords="implementation science", keywords="Africa, Western", keywords="diffusion of innovation", keywords="medical informatics applications", abstract="Background: Suboptimal use of antibiotics is a driver of antimicrobial resistance (AMR). Clinical decision support systems (CDSS) can assist prescribers with rapid access to up-to-date information. In low- and middle-income countries (LMIC), the introduction of CDSS for antibiotic prescribing could have a measurable impact. However, interventions to implement them are challenging because of cultural and structural constraints, and their adoption and sustainability in routine clinical care are often limited. Preimplementation research is needed to ensure relevant adaptation and fit within the context of primary care in West Africa. Objective: This study examined the requirements for a CDSS adapted to the context of primary care in West Africa, to analyze the barriers and facilitators of its implementation and adaptation, and to ensure co-designed solutions for its adaptation and sustainable use. Methods: We organized a workshop in Burkina Faso in June 2019 with 47 health care professionals representing 9 West African countries and 6 medical specialties. 
The workshop began with a presentation of Antibioclic, a publicly funded CDSS for antibiotic prescribing in primary care that provides personalized antibiotic recommendations for 37 infectious diseases. Antibioclic is freely available on the web and as a smartphone app (iOS, Android). The presentation was followed by a roundtable discussion and completion of a questionnaire with open-ended questions by participants. Qualitative data were analyzed using thematic analysis. Results: Most of the participants had access to a smartphone during their clinical consultations (35/47, 74\%), but only 49\% (23/47) had access to a computer and none used CDSS for antibiotic prescribing. The participants considered that CDSS could have a number of benefits including updating the knowledge of practitioners on antibiotic prescribing, improving clinical care and reducing AMR, encouraging the establishment of national guidelines, and developing surveillance capabilities in primary care. The most frequently mentioned contextual barrier to implementing a CDSS was the potential risk of increasing self-medication in West Africa, where antibiotics can be bought without a prescription. The need for the CDSS to be tailored to the local epidemiology of infectious diseases and AMR was highlighted along with the availability of diagnostic tests and antibiotics using national guidelines where available. Participants endorsed co-design involving all stakeholders, including nurses, midwives, and pharmacists, as central to any introduction of CDSS. A phased approach was suggested by initiating and evaluating CDSS at a pilot site, followed by dissemination using professional networks and social media. The lack of widespread internet access and computers could be circumvented by a mobile app with an offline mode. Conclusions: Our study provides valuable information for the development and implementation of a CDSS for antibiotic prescribing among primary care prescribers in LMICs and may, in turn, contribute to improving antibiotic use, clinical outcomes and decreasing AMR. ", doi="10.2196/17940", url="https://www.jmir.org/2020/7/e17940", url="http://www.ncbi.nlm.nih.gov/pubmed/32442155" } @Article{info:doi/10.2196/18477, author="Liu, Siqi and See, Choong Kay and Ngiam, Yuan Kee and Celi, Anthony Leo and Sun, Xingzhi and Feng, Mengling", title="Reinforcement Learning for Clinical Decision Support in Critical Care: Comprehensive Review", journal="J Med Internet Res", year="2020", month="Jul", day="20", volume="22", number="7", pages="e18477", keywords="artificial intelligence", keywords="reinforcement learning", keywords="critical care", keywords="decision support systems, clinical", keywords="intensive care unit", keywords="machine learning", abstract="Background: Decision support systems based on reinforcement learning (RL) have been implemented to facilitate the delivery of personalized care. This paper aimed to provide a comprehensive review of RL applications in the critical care setting. Objective: This review aimed to survey the literature on RL applications for clinical decision support in critical care and to provide insight into the challenges of applying various RL models. Methods: We performed an extensive search of the following databases: PubMed, Google Scholar, Institute of Electrical and Electronics Engineers (IEEE), ScienceDirect, Web of Science, Medical Literature Analysis and Retrieval System Online (MEDLINE), and Excerpta Medica Database (EMBASE). 
Studies published over the past 10 years (2010-2019) that have applied RL for critical care were included. Results: We included 21 papers and found that RL has been used to optimize the choice of medications, drug dosing, and timing of interventions and to target personalized laboratory values. We further compared and contrasted the design of the RL models and the evaluation metrics for each application. Conclusions: RL has great potential for enhancing decision making in critical care. Challenges regarding RL system design, evaluation metrics, and model choice exist. More importantly, further work is required to validate RL in authentic clinical environments. ", doi="10.2196/18477", url="https://www.jmir.org/2020/7/e18477", url="http://www.ncbi.nlm.nih.gov/pubmed/32706670" } @Article{info:doi/10.2196/18910, author="Rankin, Debbie and Black, Michaela and Bond, Raymond and Wallace, Jonathan and Mulvenna, Maurice and Epelde, Gorka", title="Reliability of Supervised Machine Learning Using Synthetic Data in Health Care: Model to Preserve Privacy for Data Sharing", journal="JMIR Med Inform", year="2020", month="Jul", day="20", volume="8", number="7", pages="e18910", keywords="synthetic data", keywords="supervised machine learning", keywords="data utility", keywords="health care", keywords="decision support", keywords="statistical disclosure control", keywords="privacy", keywords="open data", keywords="stochastic gradient descent", keywords="decision tree", keywords="k-nearest neighbors", keywords="random forest", keywords="support vector machine", abstract="Background: The exploitation of synthetic data in health care is at an early stage. Synthetic data could unlock the potential within health care datasets that are too sensitive for release. Several synthetic data generators have been developed to date; however, studies evaluating their efficacy and generalizability are scarce. Objective: This work sets out to understand the difference in performance of supervised machine learning models trained on synthetic data compared with those trained on real data. Methods: A total of 19 open health datasets were selected for experimental work. Synthetic data were generated using three synthetic data generators that apply classification and regression trees, parametric, and Bayesian network approaches. Real and synthetic data were used (separately) to train five supervised machine learning models: stochastic gradient descent, decision tree, k-nearest neighbors, random forest, and support vector machine. Models were tested only on real data to determine whether a model developed by training on synthetic data can be used to accurately classify new, real examples. The impact of statistical disclosure control on model performance was also assessed. Results: A total of 92\% of models trained on synthetic data have lower accuracy than those trained on real data. Tree-based models trained on synthetic data have deviations in accuracy from models trained on real data of 0.177 (18\%) to 0.193 (19\%), while other models have lower deviations of 0.058 (6\%) to 0.072 (7\%). The winning classifier when trained and tested on real data versus models trained on synthetic data and tested on real data is the same in 26\% (5/19) of cases for classification and regression tree and parametric synthetic data and in 21\% (4/19) of cases for Bayesian network-generated synthetic data. Tree-based models perform best with real data and are the winning classifier in 95\% (18/19) of cases. 
This is not the case for models trained on synthetic data. When tree-based models are not considered, the winning classifier for real and synthetic data is matched in 74\% (14/19), 53\% (10/19), and 68\% (13/19) of cases for classification and regression tree, parametric, and Bayesian network synthetic data, respectively. Statistical disclosure control methods did not have a notable impact on data utility. Conclusions: The results of this study are promising with small decreases in accuracy observed in models trained with synthetic data compared with models trained with real data, where both are tested on real data. Such deviations are expected and manageable. Tree-based classifiers have some sensitivity to synthetic data, and the underlying cause requires further investigation. This study highlights the potential of synthetic data and the need for further evaluation of their robustness. Synthetic data must ensure individual privacy and data utility are preserved in order to instill confidence in health care departments when using such data to inform policy decision-making. ", doi="10.2196/18910", url="http://medinform.jmir.org/2020/7/e18910/", url="http://www.ncbi.nlm.nih.gov/pubmed/32501278" } @Article{info:doi/10.2196/15770, author="Khalifa, Mohamed and Magrabi, Farah and Gallego Luxan, Blanca", title="Evaluating the Impact of the Grading and Assessment of Predictive Tools Framework on Clinicians and Health Care Professionals' Decisions in Selecting Clinical Predictive Tools: Randomized Controlled Trial", journal="J Med Internet Res", year="2020", month="Jul", day="9", volume="22", number="7", pages="e15770", keywords="clinical prediction rule", keywords="clinical decision rules", keywords="evidence-based medicine", keywords="evaluation study", abstract="Background: While selecting predictive tools for implementation in clinical practice or for recommendation in clinical guidelines, clinicians and health care professionals are challenged with an overwhelming number of tools. Many of these tools have never been implemented or evaluated for comparative effectiveness. To overcome this challenge, the authors developed and validated an evidence-based framework for grading and assessment of predictive tools (the GRASP framework). This framework was based on the critical appraisal of the published evidence on such tools. Objective: The aim of the study was to examine the impact of using the GRASP framework on clinicians' and health care professionals' decisions in selecting clinical predictive tools. Methods: A controlled experiment was conducted through a web-based survey. Participants were randomized to either review the derivation publications, such as studies describing the development of the predictive tools, on common traumatic brain injury predictive tools (control group) or to review an evidence-based summary, where each tool had been graded and assessed using the GRASP framework (intervention group). Participants in both groups were asked to select the best tool based on the greatest validation or implementation. A wide group of international clinicians and health care professionals were invited to participate in the survey. Task completion time, rate of correct decisions, rate of objective versus subjective decisions, and level of decisional conflict were measured. Results: We received a total of 194 valid responses. 
In comparison with not using GRASP, using the framework significantly increased correct decisions by 64\%, from 53.7\% to 88.1\% (88.1/53.7=1.64; t193=8.53; P<.001); increased objective decision making by 32\%, from 62\% (3.11/5) to 82\% (4.10/5; t189=9.24; P<.001); decreased subjective decision making based on guessing by 20\%, from 49\% (2.48/5) to 39\% (1.98/5; t188=-5.47; P<.001); and decreased reliance on prior knowledge or experience by 8\%, from 71\% (3.55/5) to 65\% (3.27/5; t187=-2.99; P=.003). Using GRASP significantly decreased decisional conflict and increased the confidence and satisfaction of participants with their decisions by 11\%, from 71\% (3.55/5) to 79\% (3.96/5; t188=4.27; P<.001), and by 13\%, from 70\% (3.54/5) to 79\% (3.99/5; t188=4.89; P<.001), respectively. Using GRASP decreased the task completion time, on the 90th percentile, by 52\%, from 12.4 to 6.4 min (t193=-0.87; P=.38). The average System Usability Scale score of the GRASP framework was very good (72.5\%), and 88\% (108/122) of the participants found GRASP useful. Conclusions: Using GRASP has positively supported and significantly improved evidence-based decision making. It has increased the accuracy and efficiency of selecting predictive tools. GRASP is not meant to be prescriptive; it represents a high-level approach and an effective, evidence-based, and comprehensive yet simple and feasible method to evaluate, compare, and select clinical predictive tools. ", doi="10.2196/15770", url="https://www.jmir.org/2020/7/e15770", url="http://www.ncbi.nlm.nih.gov/pubmed/32673228" } @Article{info:doi/10.2196/18963, author="Alhassan, Zakhriya and Budgen, David and Alshammari, Riyad and Al Moubayed, Noura", title="Predicting Current Glycated Hemoglobin Levels in Adults From Electronic Health Records: Validation of Multiple Logistic Regression Algorithm", journal="JMIR Med Inform", year="2020", month="Jul", day="3", volume="8", number="7", pages="e18963", keywords="glycated hemoglobin", keywords="HbA1c", keywords="prediction", keywords="electronic health records", keywords="diabetes", keywords="differentiated replication", keywords="EHR", keywords="hemoglobin", keywords="logistic regression", keywords="medical informatics", abstract="Background: Electronic health record (EHR) systems generate large datasets that can significantly enrich the development of medical predictive models. Several attempts have been made to investigate the effect of glycated hemoglobin (HbA1c) elevation on the prediction of diabetes onset. However, there is still a need for validation of these models using EHR data collected from different populations. Objective: The aim of this study is to perform a replication study to validate, evaluate, and identify the strengths and weaknesses of replicating a predictive model that employed multiple logistic regression with EHR data to forecast the levels of HbA1c. The original study used data from a population in the United States, and this differentiated replication used a population in Saudi Arabia. Methods: A total of 3 models were developed and compared with the model created in the original study. The models were trained and tested using a larger dataset from Saudi Arabia with 36,378 records. The 10-fold cross-validation approach was used for measuring the performance of the models. 
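The 10-fold cross-validation of a multiple logistic regression model described above can be sketched as follows. This is an illustrative example, not the study's code: synthetic data stand in for the EHR records, and the feature names simply mirror the predictors discussed in the Results below.

# Illustrative sketch of 10-fold cross-validated logistic regression for elevated HbA1c.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical predictors; synthetic data stand in for the real EHR extract.
feature_names = ["age", "random_blood_sugar", "egfr", "total_cholesterol", "non_hdl", "bmi"]
X, y = make_classification(n_samples=5000, n_features=len(feature_names), n_informative=4, random_state=0)
X = pd.DataFrame(X, columns=feature_names)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

acc = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"10-fold accuracy: {acc.mean():.3f} (+/- {acc.std():.3f})")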
Results: Applying the method employed in the original study achieved an accuracy of 74\% to 75\% when using the dataset collected from Saudi Arabia, compared with 77\% obtained from using the population from the United States. The results also show a different ranking of importance for the predictors between the original study and the replication. The order of importance for the predictors with our population, from the most to the least importance, is age, random blood sugar, estimated glomerular filtration rate, total cholesterol, non--high-density lipoprotein, and body mass index. Conclusions: This replication study shows that direct use of the models (calculators) created using multiple logistic regression to predict the level of HbA1c may not be appropriate for all populations. This study reveals that the weighting of the predictors needs to be calibrated to the population used. However, the study does confirm that replicating the original study using a different population can help with predicting the levels of HbA1c by using the predictors that are routinely collected and stored in hospital EHR systems. ", doi="10.2196/18963", url="https://medinform.jmir.org/2020/7/e18963", url="http://www.ncbi.nlm.nih.gov/pubmed/32618575" } @Article{info:doi/10.2196/16922, author="Musacchio, Nicoletta and Giancaterini, Annalisa and Guaita, Giacomo and Ozzello, Alessandro and Pellegrini, A. Maria and Ponzani, Paola and Russo, T. Giuseppina and Zilich, Rita and de Micheli, Alberto", title="Artificial Intelligence and Big Data in Diabetes Care: A Position Statement of the Italian Association of Medical Diabetologists", journal="J Med Internet Res", year="2020", month="Jun", day="22", volume="22", number="6", pages="e16922", keywords="artificial intelligence", keywords="big data analytics", keywords="clinical decision making", keywords="diabetes management", keywords="health care", doi="10.2196/16922", url="http://www.jmir.org/2020/6/e16922/", url="http://www.ncbi.nlm.nih.gov/pubmed/32568088" } @Article{info:doi/10.2196/15154, author="Asan, Onur and Bayrak, Emrah Alparslan and Choudhury, Avishek", title="Artificial Intelligence and Human Trust in Healthcare: Focus on Clinicians", journal="J Med Internet Res", year="2020", month="Jun", day="19", volume="22", number="6", pages="e15154", keywords="human-AI collaboration", keywords="trust", keywords="technology adoption", keywords="FDA policy", keywords="bias", keywords="health care", doi="10.2196/15154", url="http://www.jmir.org/2020/6/e15154/", url="http://www.ncbi.nlm.nih.gov/pubmed/32558657" } @Article{info:doi/10.2196/15431, author="Yang, Tianzhou and Zhang, Li and Yi, Liwei and Feng, Huawei and Li, Shimeng and Chen, Haoyu and Zhu, Junfeng and Zhao, Jian and Zeng, Yingyue and Liu, Hongsheng", title="Ensemble Learning Models Based on Noninvasive Features for Type 2 Diabetes Screening: Model Development and Validation", journal="JMIR Med Inform", year="2020", month="Jun", day="18", volume="8", number="6", pages="e15431", keywords="type 2 diabetes", keywords="screening", keywords="non-invasive attributes", keywords="machine learning", abstract="Background: Early diabetes screening can effectively reduce the burden of disease. However, natural population--based screening projects require a large number of resources. With the emergence and development of machine learning, researchers have started to pursue more flexible and efficient methods to screen or predict type 2 diabetes. 
Objective: The aim of this study was to build prediction models based on the ensemble learning method for diabetes screening to further improve the health status of the population in a noninvasive and inexpensive manner. Methods: The dataset for building and evaluating the diabetes prediction model was extracted from the National Health and Nutrition Examination Survey from 2011-2016. After data cleaning and feature selection, the dataset was split into a training set (80\%, 2011-2014), test set (20\%, 2011-2014) and validation set (2015-2016). Three simple machine learning methods (linear discriminant analysis, support vector machine, and random forest) and easy ensemble methods were used to build diabetes prediction models. The performance of the models was evaluated through 5-fold cross-validation and external validation. The Delong test (2-sided) was used to test the performance differences between the models. Results: We selected 8057 observations and 12 attributes from the database. In the 5-fold cross-validation, the three simple methods yielded highly predictive performance models with areas under the curve (AUCs) over 0.800, wherein the ensemble methods significantly outperformed the simple methods. When we evaluated the models in the test set and validation set, the same trends were observed. The ensemble model of linear discriminant analysis yielded the best performance, with an AUC of 0.849, an accuracy of 0.730, a sensitivity of 0.819, and a specificity of 0.709 in the validation set. Conclusions: This study indicates that efficient screening using machine learning methods with noninvasive tests can be applied to a large population and achieve the objective of secondary prevention. ", doi="10.2196/15431", url="https://medinform.jmir.org/2020/6/e15431", url="http://www.ncbi.nlm.nih.gov/pubmed/32554386" } @Article{info:doi/10.2196/18186, author="Chen, Weijia and Lu, Zhijun and You, Lijue and Zhou, Lingling and Xu, Jie and Chen, Ken", title="Artificial Intelligence--Based Multimodal Risk Assessment Model for Surgical Site Infection (AMRAMS): Development and Validation Study", journal="JMIR Med Inform", year="2020", month="Jun", day="15", volume="8", number="6", pages="e18186", keywords="surgical site infection", keywords="machine learning", keywords="deep learning", keywords="natural language processing", keywords="artificial intelligence", keywords="risk assessment model", keywords="routinely collected data", keywords="electronic medical record", keywords="neural network", keywords="word embedding", abstract="Background: Surgical site infection (SSI) is one of the most common types of health care--associated infections. It increases mortality, prolongs hospital length of stay, and raises health care costs. Many institutions developed risk assessment models for SSI to help surgeons preoperatively identify high-risk patients and guide clinical intervention. However, most of these models had low accuracies. Objective: We aimed to provide a solution in the form of an Artificial intelligence--based Multimodal Risk Assessment Model for Surgical site infection (AMRAMS) for inpatients undergoing operations, using routinely collected clinical data. We internally and externally validated the discriminations of the models, which combined various machine learning and natural language processing techniques, and compared them with the National Nosocomial Infections Surveillance (NNIS) risk index. 
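Relatedly, the easy ensemble approach used in the diabetes-screening study above (doi 10.2196/15431) addresses class imbalance by training many learners on balanced subsamples of the majority class. The following sketch is illustrative only; it assumes the imbalanced-learn package and uses synthetic data rather than the NHANES extract.

# Illustrative sketch of an EasyEnsemble-style classifier for an imbalanced screening task.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from imblearn.ensemble import EasyEnsembleClassifier

# Imbalanced synthetic dataset (roughly 1 case per 9 controls).
X, y = make_classification(n_samples=8000, n_features=12, n_informative=6,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    stratify=y, random_state=0)

# EasyEnsemble trains many boosted learners on balanced resamples of the training data.
clf = EasyEnsembleClassifier(n_estimators=20, random_state=0)
clf.fit(X_train, y_train)

auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"held-out AUC: {auc:.3f}")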
Methods: We retrieved inpatient records between January 1, 2014, and June 30, 2019, from the electronic medical record (EMR) system of Rui Jin Hospital, Luwan Branch, Shanghai, China. We used data from before July 1, 2018, as the development set for internal validation and the remaining data as the test set for external validation. We included patient demographics, preoperative lab results, and free-text preoperative notes as our features. We used word-embedding techniques to encode text information, and we trained the LASSO (least absolute shrinkage and selection operator) model, random forest model, gradient boosting decision tree (GBDT) model, convolutional neural network (CNN) model, and self-attention network model using the combined data. Surgeons manually scored the NNIS risk index values. Results: For internal bootstrapping validation, CNN yielded the highest mean area under the receiver operating characteristic curve (AUROC) of 0.889 (95\% CI 0.886-0.892), and the paired-sample t test revealed statistically significant advantages as compared with other models (P<.001). The self-attention network yielded the second-highest mean AUROC of 0.882 (95\% CI 0.878-0.886), but the AUROC was only numerically higher than the AUROC of the third-best model, GBDT with text embeddings (mean AUROC 0.881, 95\% CI 0.878-0.884, P=.47). The AUROCs of LASSO, random forest, and GBDT models using text embeddings were statistically higher than the AUROCs of models not using text embeddings (P<.001). For external validation, the self-attention network yielded the highest AUROC of 0.879. CNN was the second-best model (AUROC 0.878), and GBDT with text embeddings was the third-best model (AUROC 0.872). The NNIS risk index scored by surgeons had an AUROC of 0.651. Conclusions: Our AMRAMS based on EMR data and deep learning methods---CNN and self-attention network---had significant advantages in terms of accuracy compared with other conventional machine learning methods and the NNIS risk index. Moreover, the semantic embeddings of preoperative notes improved the model performance further. Our models could replace the NNIS risk index to provide personalized guidance for the preoperative intervention of SSIs. Through this case, we offered an easy-to-implement solution for building multimodal RAMs for other similar scenarios. ", doi="10.2196/18186", url="http://medinform.jmir.org/2020/6/e18186/", url="http://www.ncbi.nlm.nih.gov/pubmed/32538798" } @Article{info:doi/10.2196/17608, author="Zhang, Hong and Ni, Wandong and Li, Jing and Zhang, Jiajun", title="Artificial Intelligence--Based Traditional Chinese Medicine Assistive Diagnostic System: Validation Study", journal="JMIR Med Inform", year="2020", month="Jun", day="15", volume="8", number="6", pages="e17608", keywords="traditional Chinese medicine", keywords="TCM", keywords="disease diagnosis", keywords="syndrome prediction", keywords="syndrome differentiation", keywords="natural language processing", keywords="NLP", keywords="artificial intelligence", keywords="AI", keywords="assistive diagnostic system", keywords="convolutional neural network", keywords="CNN", keywords="machine learning", keywords="ML", keywords="BiLSTM-CRF", abstract="Background: Artificial intelligence--based assistive diagnostic systems imitate the deductive reasoning process of a human physician in biomedical disease diagnosis and treatment decision making. 
While impressive progress in this area has been reported, most of the reported successes are applications of artificial intelligence in Western medicine. The application of artificial intelligence in traditional Chinese medicine has lagged mainly because traditional Chinese medicine practitioners need to perform syndrome differentiation as well as biomedical disease diagnosis before a treatment decision can be made. Syndrome, a concept unique to traditional Chinese medicine, is an abstraction of a variety of signs and symptoms. The fact that the relationship between diseases and syndromes is not one-to-one but rather many-to-many makes it very challenging for a machine to perform syndrome predictions. So far, only a handful of artificial intelligence--based assistive traditional Chinese medicine diagnostic models have been reported, and they are limited in application to a single disease-type. Objective: The objective was to develop an artificial intelligence--based assistive diagnostic system capable of diagnosing multiple types of diseases that are common in traditional Chinese medicine, given a patient's electronic health record notes. The system was designed to simultaneously diagnose the disease and produce a list of corresponding syndromes. Methods: Unstructured freestyle electronic health record notes were processed by natural language processing techniques to extract clinical information, such as signs and symptoms, which were represented as named entities. Natural language processing used a recurrent neural network model called the bidirectional long short-term memory network--conditional random field (BiLSTM-CRF). A convolutional neural network was then used to predict the disease-type out of 187 diseases in traditional Chinese medicine. A novel traditional Chinese medicine syndrome prediction method---an integrated learning model---was used to produce a corresponding list of probable syndromes. By following a majority-rule voting method, the integrated learning model for syndrome prediction can take advantage of four existing prediction methods (back propagation, random forest, extreme gradient boosting, and support vector classifier) while avoiding their respective weaknesses, which resulted in a consistently high prediction accuracy. Results: A data set consisting of 22,984 electronic health records from Guanganmen Hospital of the China Academy of Chinese Medical Sciences that were collected between January 1, 2017 and September 7, 2018 was used. The data set contained a total of 187 diseases that are commonly diagnosed in traditional Chinese medicine. The diagnostic system was designed to be able to detect any one of the 187 disease-types. The data set was partitioned into a training set, a validation set, and a testing set in a ratio of 8:1:1. Test results suggested that the proposed system had a good diagnostic accuracy and a strong capability for generalization. The disease-type prediction accuracies for the top one, top three, and top five candidates were 80.5\%, 91.6\%, and 94.2\%, respectively. Conclusions: The main contributions of the artificial intelligence--based traditional Chinese medicine assistive diagnostic system proposed in this paper are that 187 commonly known traditional Chinese medicine diseases can be diagnosed and a novel prediction method called an integrated learning model is demonstrated. This new prediction method outperformed all four existing methods in our preliminary experimental results. 
With further improvement of the algorithms and the availability of additional electronic health record data, it is expected that a wider range of traditional Chinese medicine disease-types could be diagnosed and that better diagnostic accuracies could be achieved. ", doi="10.2196/17608", url="http://medinform.jmir.org/2020/6/e17608/", url="http://www.ncbi.nlm.nih.gov/pubmed/32538797" } @Article{info:doi/10.2196/17364, author="Hou, Can and Zhong, Xiaorong and He, Ping and Xu, Bin and Diao, Sha and Yi, Fang and Zheng, Hong and Li, Jiayuan", title="Predicting Breast Cancer in Chinese Women Using Machine Learning Techniques: Algorithm Development", journal="JMIR Med Inform", year="2020", month="Jun", day="8", volume="8", number="6", pages="e17364", keywords="machine learning", keywords="XGBoost", keywords="random forest", keywords="deep neural network", keywords="breast cancer", abstract="Background: Risk-based breast cancer screening is a cost-effective intervention for controlling breast cancer in China, but the successful implementation of such intervention requires an accurate breast cancer prediction model for Chinese women. Objective: This study aimed to evaluate and compare the performance of four machine learning algorithms on predicting breast cancer among Chinese women using 10 breast cancer risk factors. Methods: A dataset consisting of 7127 breast cancer cases and 7127 matched healthy controls was used for model training and testing. We used repeated 5-fold cross-validation and calculated AUC, sensitivity, specificity, and accuracy as the measures of the model performance. Results: The three novel machine-learning algorithms (XGBoost, Random Forest and Deep Neural Network) all achieved significantly higher area under the receiver operating characteristic curves (AUCs), sensitivity, and accuracy than logistic regression. Among the three novel machine learning algorithms, XGBoost (AUC 0.742) outperformed deep neural network (AUC 0.728) and random forest (AUC 0.728). Main residence, number of live births, menopause status, age, and age at first birth were considered as top-ranked variables in the three novel machine learning algorithms. Conclusions: The novel machine learning algorithms, especially XGBoost, can be used to develop breast cancer prediction models to help identify women at high risk for breast cancer in developing countries. ", doi="10.2196/17364", url="http://medinform.jmir.org/2020/6/e17364/", url="http://www.ncbi.nlm.nih.gov/pubmed/32510459" } @Article{info:doi/10.2196/18585, author="Yu, Cheng-Sheng and Lin, Yu-Jiun and Lin, Chang-Hsien and Lin, Shiyng-Yu and Wu, L. Jenny and Chang, Shy-Shin", title="Development of an Online Health Care Assessment for Preventive Medicine: A Machine Learning Approach", journal="J Med Internet Res", year="2020", month="Jun", day="5", volume="22", number="6", pages="e18585", keywords="machine learning", keywords="online healthcare assessment", keywords="medical informatics", keywords="preventive medicine", abstract="Background: In the era of information explosion, the use of the internet to assist with clinical practice and diagnosis has become a cutting-edge area of research. The application of medical informatics allows patients to be aware of their clinical conditions, which may contribute toward the prevention of several chronic diseases and disorders. 
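A model comparison of the kind reported in the breast cancer study above (doi 10.2196/17364), where XGBoost, random forest, and logistic regression are evaluated with repeated stratified 5-fold cross-validation, can be sketched as follows. The data are synthetic and most of the risk-factor names are hypothetical placeholders, not the study's matched case-control data.

# Illustrative comparison of XGBoost, random forest, and logistic regression by repeated 5-fold CV AUC.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# Hypothetical risk-factor names; only some appear in the study, the rest are placeholders.
risk_factors = ["age", "age_at_first_birth", "live_births", "menopause", "main_residence",
                "bmi", "family_history", "breastfeeding", "oral_contraceptive", "alcohol"]
X, y = make_classification(n_samples=3000, n_features=len(risk_factors), n_informative=6, random_state=0)
X = pd.DataFrame(X, columns=risk_factors)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "xgboost": XGBClassifier(n_estimators=300, eval_metric="logloss", random_state=0),
}

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: mean AUC {auc.mean():.3f}")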
Objective: In this study, we applied machine learning techniques to construct a medical database system from electronic medical records (EMRs) of subjects who have undergone health examination. This system aims to provide online self-health evaluation to clinicians and patients worldwide, enabling personalized health and preventive health. Methods: We built a medical database system based on the literature, and data preprocessing and cleaning were performed for the database. We utilized both supervised and unsupervised machine learning technology to analyze the EMR data to establish prediction models. The models with EMR databases were then applied to the internet platform. Results: The validation data were used to validate the online diagnosis prediction system. The accuracy of the prediction model for metabolic syndrome reached 91\%, and the area under the receiver operating characteristic (ROC) curve was 0.904 in this system. For chronic kidney disease, the prediction accuracy of the model reached 94.7\%, and the area under the ROC curve (AUC) was 0.982. In addition, the system also provided disease diagnosis visualization via clustering, allowing users to check their outcome compared with those in the medical database, enabling increased awareness for a healthier lifestyle. Conclusions: Our web-based health care machine learning system allowed users to access online diagnosis predictions and provided a health examination report. Users could understand and review their health status accordingly. In the future, we aim to connect hospitals worldwide with our platform, so that health care practitioners can make diagnoses or provide patient education to remote patients. This platform can increase the value of preventive medicine and telemedicine. ", doi="10.2196/18585", url="https://www.jmir.org/2020/6/e18585", url="http://www.ncbi.nlm.nih.gov/pubmed/32501272" } @Article{info:doi/10.2196/16678, author="Tarekegn, Adane and Ricceri, Fulvio and Costa, Giuseppe and Ferracin, Elisa and Giacobini, Mario", title="Predictive Modeling for Frailty Conditions in Elderly People: Machine Learning Approaches", journal="JMIR Med Inform", year="2020", month="Jun", day="4", volume="8", number="6", pages="e16678", keywords="predictive modeling", keywords="frailty", keywords="machine learning", keywords="genetic programming", keywords="imbalanced dataset", keywords="elderly people", keywords="classification", abstract="Background: Frailty is one of the most critical age-related conditions in older adults. It is often recognized as a syndrome of physiological decline in late life, characterized by a marked vulnerability to adverse health outcomes. A clear operational definition of frailty, however, has not been agreed so far. There is a wide range of studies on the detection of frailty and their association with mortality. Several of these studies have focused on the possible risk factors associated with frailty in the elderly population while predicting who will be at increased risk of frailty is still overlooked in clinical settings. Objective: The objective of our study was to develop predictive models for frailty conditions in older people using different machine learning methods based on a database of clinical characteristics and socioeconomic factors. Methods: An administrative health database containing 1,095,612 elderly people aged 65 or older with 58 input variables and 6 output variables was used. We first identify and define six problems/outputs as surrogates of frailty. 
We then resolved the imbalanced nature of the data through a resampling process, and a comparative study among the different machine learning (ML) algorithms -- artificial neural network (ANN), genetic programming (GP), support vector machine (SVM), random forest (RF), logistic regression (LR), and decision tree (DT) -- was carried out. The performance of each model was evaluated using a separate unseen dataset. Results: Predicting the mortality outcome showed higher performance with ANN (TPR 0.81, TNR 0.76, accuracy 0.78, F1-score 0.79) and SVM (TPR 0.77, TNR 0.80, accuracy 0.79, F1-score 0.78) than predicting the other outcomes. On average, over the six problems, the DT classifier showed the lowest accuracy, while the other models (GP, LR, RF, ANN, and SVM) performed better. All models showed lower accuracy in predicting an emergency admission with red code than in predicting fracture and disability. In predicting urgent hospitalization, only SVM achieved better performance (TPR 0.75, TNR 0.77, accuracy 0.73, F1-score 0.76) with 10-fold cross-validation compared with the other models in all evaluation metrics. Conclusions: We developed machine learning models for predicting frailty conditions (mortality, urgent hospitalization, disability, fracture, and emergency admission). The results show that the prediction performance of machine learning models significantly varies from problem to problem in terms of different evaluation metrics. Through further improvement, the model that performs better can be used as a base for developing decision-support tools to improve early identification and prediction of frail older adults. ", doi="10.2196/16678", url="http://medinform.jmir.org/2020/6/e16678/", url="http://www.ncbi.nlm.nih.gov/pubmed/32442149" } @Article{info:doi/10.2196/16980, author="Ritchie, Jordon and Welch, Brandon", title="Categorization of Third-Party Apps in Electronic Health Record App Marketplaces: Systematic Search and Analysis", journal="JMIR Med Inform", year="2020", month="May", day="29", volume="8", number="5", pages="e16980", keywords="electronic health records", keywords="medical informatics", keywords="software", keywords="interoperability", keywords="apps", keywords="app marketplace", abstract="Background: Third-party electronic health record (EHR) apps allow health care organizations to extend the capabilities and features of their EHR system. Given the widespread utilization of EHRs and the emergence of third-party apps in EHR marketplaces, it has become necessary to conduct a systematic review and analysis of apps in EHR app marketplaces. Objective: The goal of this review is to organize, categorize, and characterize the availability of third-party apps in EHR marketplaces. Methods: Two informaticists (authors JR and BW) used grounded theory principles to review and categorize EHR apps listed in top EHR vendors' public-facing marketplaces. Results: We categorized a total of 471 EHR apps into a taxonomy consisting of 3 primary categories, 15 secondary categories, and 55 tertiary categories. The three primary categories were administrative (n=203, 43.1\%), provider support (n=159, 33.8\%), and patient care (n=109, 23.1\%). Within administrative apps, we split the apps into four secondary categories: front office (n=77, 37.9\%), financial (n=53, 26.1\%), office administration (n=49, 24.1\%), and office device integration (n=17, 8.4\%). 
Within the provider support primary classification, we split the apps into eight secondary categories: documentation (n=34, 21.3\%), records management (n=27, 17.0\%), care coordination (n=23, 14.4\%), population health (n=18, 11.3\%), EHR efficiency (n=16, 10.1\%), ordering and prescribing (n=15, 9.4\%), medical device integration (n=13, 8.2\%), and specialty EHR (n=12, 7.5\%). Within the patient care primary classification, we split the apps into three secondary categories: patient engagement (n=50, 45.9\%), clinical decision support (n=40, 36.7\%), and remote care (n=18, 16.5\%). Total app counts varied substantially across EHR vendors. Overall, the distribution of apps across primary categories were relatively similar, with a few exceptions. Conclusions: We characterized and organized a diverse and rich set of third-party EHR apps. This work provides an important reference for developers, researchers, and EHR customers to more easily search, review, and compare apps in EHR app marketplaces. ", doi="10.2196/16980", url="http://medinform.jmir.org/2020/5/e16980/", url="http://www.ncbi.nlm.nih.gov/pubmed/32469324" } @Article{info:doi/10.2196/16975, author="Bennasar, Mohamed and Banks, Duncan and Price, A. Blaine and Kardos, Attila", title="Minimal Patient Clinical Variables to Accurately Predict Stress Echocardiography Outcome: Validation Study Using Machine Learning Techniques", journal="JMIR Cardio", year="2020", month="May", day="29", volume="4", number="1", pages="e16975", keywords="stress echocardiography", keywords="coronary heart disease", keywords="risk factors", keywords="machine learning", keywords="feature selection", keywords="risk prediction", abstract="Background: Stress echocardiography is a well-established diagnostic tool for suspected coronary artery disease (CAD). Cardiovascular risk factors are used in the assessment of the probability of CAD. The link between the outcome of stress echocardiography and patients' variables including risk factors, current medication, and anthropometric variables has not been widely investigated. Objective: This study aimed to use machine learning to predict significant CAD defined by positive stress echocardiography results in patients with chest pain based on anthropometrics, cardiovascular risk factors, and medication as variables. This could allow clinical prioritization of patients with likely prediction of CAD, thus saving clinician time and improving outcomes. Methods: A machine learning framework was proposed to automate the prediction of stress echocardiography results. The framework consisted of four stages: feature extraction, preprocessing, feature selection, and classification stage. A mutual information--based feature selection method was used to investigate the amount of information that each feature carried to define the positive outcome of stress echocardiography. Two classification algorithms, support vector machine (SVM) and random forest classifiers, have been deployed. Data from 529 patients were used to train and validate the framework. Patient mean age was 61 (SD 12) years. The data consists of anthropological data and cardiovascular risk factors such as gender, age, weight, family history, diabetes, smoking history, hypertension, hypercholesterolemia, prior diagnosis of CAD, and prescribed medications at the time of the test. There were 82 positive (abnormal) and 447 negative (normal) stress echocardiography results. 
The framework was evaluated using the whole dataset including cases with prior diagnosis of CAD. Five-fold cross-validation was used to validate the performance of the framework. We also investigated the model in the subset of patients with no prior CAD. Results: The feature selection methods showed that prior diagnosis of CAD, sex, and prescribed medications such as angiotensin-converting enzyme inhibitor/angiotensin receptor blocker were the features that shared the most information about the outcome of stress echocardiography. SVM classifiers showed the best trade-off between sensitivity and specificity and was achieved with three features. Using only these three features, we achieved an accuracy of 67.63\% with sensitivity and specificity 72.87\% and 66.67\% respectively. However, for patients with no prior diagnosis of CAD, only two features (sex and angiotensin-converting enzyme inhibitor/angiotensin receptor blocker use) were needed to achieve accuracy of 70.32\% with sensitivity and specificity at 70.24\%. Conclusions: This study shows that machine learning can predict the outcome of stress echocardiography based on only a few features: patient prior cardiac history, gender, and prescribed medication. Further research recruiting higher number of patients who underwent stress echocardiography could further improve the performance of the proposed algorithm with the potential of facilitating patient selection for early treatment/intervention avoiding unnecessary downstream testing. ", doi="10.2196/16975", url="http://cardio.jmir.org/2020/1/e16975/", url="http://www.ncbi.nlm.nih.gov/pubmed/32469316" } @Article{info:doi/10.2196/16452, author="Horne, Elsie and Tibble, Holly and Sheikh, Aziz and Tsanas, Athanasios", title="Challenges of Clustering Multimodal Clinical Data: Review of Applications in Asthma Subtyping", journal="JMIR Med Inform", year="2020", month="May", day="28", volume="8", number="5", pages="e16452", keywords="asthma", keywords="cluster analysis", keywords="data mining", keywords="machine learning", keywords="unsupervised machine learning", abstract="Background: In the current era of personalized medicine, there is increasing interest in understanding the heterogeneity in disease populations. Cluster analysis is a method commonly used to identify subtypes in heterogeneous disease populations. The clinical data used in such applications are typically multimodal, which can make the application of traditional cluster analysis methods challenging. Objective: This study aimed to review the research literature on the application of clustering multimodal clinical data to identify asthma subtypes. We assessed common problems and shortcomings in the application of cluster analysis methods in determining asthma subtypes, such that they can be brought to the attention of the research community and avoided in future studies. Methods: We searched PubMed and Scopus bibliographic databases with terms related to cluster analysis and asthma to identify studies that applied dissimilarity-based cluster analysis methods. We recorded the analytic methods used in each study at each step of the cluster analysis process. Results: Our literature search identified 63 studies that applied cluster analysis to multimodal clinical data to identify asthma subtypes. The features fed into the cluster algorithms were of a mixed type in 47 (75\%) studies and continuous in 12 (19\%), and the feature type was unclear in the remaining 4 (6\%) studies. 
A total of 23 (37\%) studies used hierarchical clustering with Ward linkage, and 22 (35\%) studies used k-means clustering. Of these 45 studies, 39 had mixed-type features, but only 5 specified dissimilarity measures that could handle mixed-type features. A further 9 (14\%) studies used a preclustering step to create small clusters to feed on a hierarchical method. The original sample sizes in these 9 studies ranged from 84 to 349. The remaining studies used hierarchical clustering with other linkages (n=3), medoid-based methods (n=3), spectral clustering (n=1), and multiple kernel k-means clustering (n=1), and in 1 study, the methods were unclear. Of 63 studies, 54 (86\%) explained the methods used to determine the number of clusters, 24 (38\%) studies tested the quality of their cluster solution, and 11 (17\%) studies tested the stability of their solution. Reporting of the cluster analysis was generally poor in terms of the methods employed and their justification. Conclusions: This review highlights common issues in the application of cluster analysis to multimodal clinical data to identify asthma subtypes. Some of these issues were related to the multimodal nature of the data, but many were more general issues in the application of cluster analysis. Although cluster analysis may be a useful tool for investigating disease subtypes, we recommend that future studies carefully consider the implications of clustering multimodal data, the cluster analysis process itself, and the reporting of methods to facilitate replication and interpretation of findings. ", doi="10.2196/16452", url="http://medinform.jmir.org/2020/5/e16452/", url="http://www.ncbi.nlm.nih.gov/pubmed/32463370" } @Article{info:doi/10.2196/18682, author="Liu, Yue", title="Artificial Intelligence--Based Neural Network for the Diagnosis of Diabetes: Model Development", journal="JMIR Med Inform", year="2020", month="May", day="27", volume="8", number="5", pages="e18682", keywords="artificial intelligence", keywords="diabetes", keywords="neural network", abstract="Background: The incidence of diabetes is increasing in China, and its impact on national health cannot be ignored. Smart medicine is a medical model that uses technology to assist the diagnosis and treatment of disease. Objective: The aim of this paper was to apply artificial intelligence (AI) in the diagnosis of diabetes. Methods: We established an AI diagnostic model in the MATLAB software platform based on a backpropagation neural network by collecting data for the cases of integration and extraction and selecting an input feature vector. Based on this diagnostic model, using an intelligent combination of the LabVIEW development platform and the MATLAB software-designed diabetes diagnosis system with user data, we called the neural network diagnostic module to correctly diagnose diabetes. Results: Compared to conventional diagnostic procedures, the system can effectively improve diagnostic efficiency and save time for physicians. Conclusions: The development of AI applications has utility to aid diabetes diagnosis. 
", doi="10.2196/18682", url="http://medinform.jmir.org/2020/5/e18682/", url="http://www.ncbi.nlm.nih.gov/pubmed/32459183" } @Article{info:doi/10.2196/17252, author="Akbarian, Sina and Montazeri Ghahjaverestan, Nasim and Yadollahi, Azadeh and Taati, Babak", title="Distinguishing Obstructive Versus Central Apneas in Infrared Video of Sleep Using Deep Learning: Validation Study", journal="J Med Internet Res", year="2020", month="May", day="22", volume="22", number="5", pages="e17252", keywords="noncontact monitoring", keywords="sleep apnea", keywords="motion analysis", keywords="computer vision", keywords="obstructive apnea", keywords="central apnea", keywords="machine learning", keywords="deep learning", abstract="Background: Sleep apnea is a respiratory disorder characterized by an intermittent reduction (hypopnea) or cessation (apnea) of breathing during sleep. Depending on the presence of a breathing effort, sleep apnea is divided into obstructive sleep apnea (OSA) and central sleep apnea (CSA) based on the different pathologies involved. If the majority of apneas in a person are obstructive, they will be diagnosed as OSA or otherwise as CSA. In addition, as it is challenging and highly controversial to divide hypopneas into central or obstructive, the decision about sleep apnea type (OSA vs CSA) is made based on apneas only. Choosing the appropriate treatment relies on distinguishing between obstructive apnea (OA) and central apnea (CA). Objective: The objective of this study was to develop a noncontact method to distinguish between OAs and CAs. Methods: Five different computer vision-based algorithms were used to process infrared (IR) video data to track and analyze body movements to differentiate different types of apnea (OA vs CA). In the first two methods, supervised classifiers were trained to process optical flow information. In the remaining three methods, a convolutional neural network (CNN) was designed to extract distinctive features from optical flow and to distinguish OA from CA. Results: Overnight sleeping data of 42 participants (mean age 53, SD 15 years; mean BMI 30, SD 7 kg/m2; 27 men and 15 women; mean number of OA 16, SD 30; mean number of CA 3, SD 7; mean apnea-hypopnea index 27, SD 31 events/hour; mean sleep duration 5 hours, SD 1 hour) were collected for this study. The test and train data were recorded in two separate laboratory rooms. The best-performing model (3D-CNN) obtained 95\% accuracy and an F1 score of 89\% in differentiating OA vs CA. Conclusions: In this study, the first vision-based method was developed that differentiates apnea types (OA vs CA). The developed algorithm tracks and analyses chest and abdominal movements captured via an IR video camera. Unlike previously developed approaches, this method does not require any attachment to a user that could potentially alter the sleeping condition. 
", doi="10.2196/17252", url="http://www.jmir.org/2020/5/e17252/", url="http://www.ncbi.nlm.nih.gov/pubmed/32441656" } @Article{info:doi/10.2196/15411, author="Sufriyana, Herdiantri and Wu, Yu-Wei and Su, Chia-Yu Emily", title="Prediction of Preeclampsia and Intrauterine Growth Restriction: Development of Machine Learning Models on a Prospective Cohort", journal="JMIR Med Inform", year="2020", month="May", day="18", volume="8", number="5", pages="e15411", keywords="preeclampsia", keywords="intrauterine growth restriction", keywords="machine learning", keywords="uterine artery Doppler", keywords="sFlt-1/PlGF ratio", abstract="Background: Preeclampsia and intrauterine growth restriction are placental dysfunction--related disorders (PDDs) that require a referral decision be made within a certain time period. An appropriate prediction model should be developed for these diseases. However, previous models did not demonstrate robust performances and/or they were developed from datasets with highly imbalanced classes. Objective: In this study, we developed a predictive model of PDDs by machine learning that uses features at 24-37 weeks' gestation, including maternal characteristics, uterine artery (UtA) Doppler measures, soluble fms-like tyrosine kinase receptor-1 (sFlt-1), and placental growth factor (PlGF). Methods: A public dataset was taken from a prospective cohort study that included pregnant women with PDDs (66/95, 69\%) and a control group (29/95, 31\%). Preliminary selection of features was based on a statistical analysis using SAS 9.4 (SAS Institute). We used Weka (Waikato Environment for Knowledge Analysis) 3.8.3 (The University of Waikato, Hamilton, NZ) to automatically select the best model using its optimization algorithm. We also manually selected the best of 23 white-box models. Models, including those from recent studies, were also compared by interval estimation of evaluation metrics. We used the Matthew correlation coefficient (MCC) as the main metric. It is not overoptimistic to evaluate the performance of a prediction model developed from a dataset with a class imbalance. Repeated 10-fold cross-validation was applied. Results: The classification via regression model was chosen as the best model. Our model had a robust MCC (.93, 95\% CI .87-1.00, vs .64, 95\% CI .57-.71) and specificity (100\%, 95\% CI 100-100, vs 90\%, 95\% CI 90-90) compared to each metric of the best models from recent studies. The sensitivity of this model was not inferior (95\%, 95\% CI 91-100, vs 100\%, 95\% CI 92-100). The area under the receiver operating characteristic curve was also competitive (0.970, 95\% CI 0.966-0.974, vs 0.987, 95\% CI 0.980-0.994). Features in the best model were maternal weight, BMI, pulsatility index of the UtA, sFlt-1, and PlGF. The most important feature was the sFlt-1/PlGF ratio. This model used an M5P algorithm consisting of a decision tree and four linear models with different thresholds. Our study was also better than the best ones among recent studies in terms of the class balance and the size of the case class (66/95, 69\%, vs 27/239, 11.3\%). Conclusions: Our model had a robust predictive performance. It was also developed to deal with the problem of a class imbalance. In the context of clinical management, this model may improve maternal mortality and neonatal morbidity and reduce health care costs. 
", doi="10.2196/15411", url="http://medinform.jmir.org/2020/5/e15411/", url="http://www.ncbi.nlm.nih.gov/pubmed/32348266" } @Article{info:doi/10.2196/14693, author="Mohammed, Akram and Podila, B. Pradeep S. and Davis, L. Robert and Ataga, I. Kenneth and Hankins, S. Jane and Kamaleswaran, Rishikesan", title="Using Machine Learning to Predict Early Onset Acute Organ Failure in Critically Ill Intensive Care Unit Patients With Sickle Cell Disease: Retrospective Study", journal="J Med Internet Res", year="2020", month="May", day="13", volume="22", number="5", pages="e14693", keywords="multiple organ failure", keywords="sickle cell disease", keywords="machine learning", keywords="electronic medical record", keywords="hematology", abstract="Background: Sickle cell disease (SCD) is a genetic disorder of the red blood cells, resulting in multiple acute and chronic complications, including pain episodes, stroke, and kidney disease. Patients with SCD develop chronic organ dysfunction, which may progress to organ failure during disease exacerbations. Early detection of acute physiological deterioration leading to organ failure is not always attainable. Machine learning techniques that allow for prediction of organ failure may enable early identification and treatment and potentially reduce mortality. Objective: The aim of this study was to test the hypothesis that machine learning physiomarkers can predict the development of organ dysfunction in a sample of adult patients with SCD admitted to intensive care units (ICUs). Methods: We applied diverse machine learning methods, statistical methods, and data visualization techniques to develop classification models to distinguish SCD from controls. Results: We studied 63 sequential SCD patients admitted to ICUs with 163 patient encounters (mean age 30.7 years, SD 9.8 years). A subset of these patient encounters, 22.7\% (37/163), met the sequential organ failure assessment criteria. The other 126 SCD patient encounters served as controls. A set of signal processing features (such as fast Fourier transform, energy, and continuous wavelet transform) derived from heart rate, blood pressure, and respiratory rate was identified to distinguish patients with SCD who developed acute physiological deterioration leading to organ failure from patients with SCD who did not meet the criteria. A multilayer perceptron model accurately predicted organ failure up to 6 hours before onset, with an average sensitivity and specificity of 96\% and 98\%, respectively. Conclusions: This retrospective study demonstrated the viability of using machine learning to predict acute organ failure among hospitalized adults with SCD. The discovery of salient physiomarkers through machine learning techniques has the potential to further accelerate the development and implementation of innovative care delivery protocols and strategies for medically vulnerable patients. ", doi="10.2196/14693", url="https://www.jmir.org/2020/5/e14693", url="http://www.ncbi.nlm.nih.gov/pubmed/32401216" } @Article{info:doi/10.2196/17203, author="Downie, Simon Aron and Hancock, Mark and Abdel Shaheed, Christina and McLachlan, J. Andrew and Kocaballi, Baki Ahmet and Williams, M. Christopher and Michaleff, A. Zoe and Maher, G. 
Chris", title="An Electronic Clinical Decision Support System for the Management of Low Back Pain in Community Pharmacy: Development and Mixed Methods Feasibility Study", journal="JMIR Med Inform", year="2020", month="May", day="11", volume="8", number="5", pages="e17203", keywords="low back pain", keywords="community pharmacy", keywords="decision support systems, clinical", abstract="Background: People with low back pain (LBP) in the community often do not receive evidence-based advice and management. Community pharmacists can play an important role in supporting people with LBP as pharmacists are easily accessible to provide first-line care. However, previous research suggests that pharmacists may not consistently deliver advice that is concordant with guideline recommendations and may demonstrate difficulty determining which patients require prompt medical review. A clinical decision support system (CDSS) may enhance first-line care of LBP, but none exists to support the community pharmacist--client consultation. Objective: This study aimed to develop a CDSS to guide first-line care of LBP in the community pharmacy setting and to evaluate the pharmacist-reported usability and acceptance of the prototype system. Methods: A cross-platform Web app for the Apple iPad was developed in conjunction with academic and clinical experts using an iterative user-centered design process during interface design, clinical reasoning, program development, and evaluation. The CDSS was evaluated via one-to-one user-testing with 5 community pharmacists (5 case vignettes each). Data were collected via video recording, screen capture, survey instrument (system usability scale), and direct observation. Results: Pharmacists' agreement with CDSS-generated self-care recommendations was 90\% (18/20), with medicines recommendations was 100\% (25/25), and with referral advice was 88\% (22/25; total 70 recommendations). Pharmacists expressed uncertainty when screening for serious pathology in 40\% (10/25) of cases. Pharmacists requested more direction from the CDSS in relation to automated prompts for user input and page navigation. Overall system usability was rated as excellent (mean score 92/100, SD 6.5; 90th percentile compared with similar systems), with acceptance rated as good to excellent. Conclusions: A novel CDSS (high-fidelity prototype) to enhance pharmacist care of LBP was developed, underpinned by clinical practice guidelines and informed by a multidisciplinary team of experts. User-testing revealed a high level of usability and acceptance of the prototype system, with suggestions to improve interface prompts and information delivery. The small study sample limits the generalizability of the findings but offers important insights to inform the next stage of system development. 
", doi="10.2196/17203", url="https://medinform.jmir.org/2020/5/e17203", url="http://www.ncbi.nlm.nih.gov/pubmed/32390593" } @Article{info:doi/10.2196/15992, author="Cao, Yang and Montgomery, Scott and Ottosson, Johan and N{\"a}slund, Erik and Stenberg, Erik", title="Deep Learning Neural Networks to Predict Serious Complications After Bariatric Surgery: Analysis of Scandinavian Obesity Surgery Registry Data", journal="JMIR Med Inform", year="2020", month="May", day="8", volume="8", number="5", pages="e15992", keywords="projections and predictions", keywords="deep learning", keywords="computational neural networks", keywords="bariatric surgery", keywords="postoperative complications", abstract="Background: Obesity is one of today's most visible public health problems worldwide. Although modern bariatric surgery is ostensibly considered safe, serious complications and mortality still occur in some patients. Objective: This study aimed to explore whether serious postoperative complications of bariatric surgery recorded in a national quality registry can be predicted preoperatively using deep learning methods. Methods: Patients who were registered in the Scandinavian Obesity Surgery Registry (SOReg) between 2010 and 2015 were included in this study. The patients who underwent a bariatric procedure between 2010 and 2014 were used as training data, and those who underwent a bariatric procedure in 2015 were used as test data. Postoperative complications were graded according to the Clavien-Dindo classification, and complications requiring intervention under general anesthesia or resulting in organ failure or death were considered serious. Three supervised deep learning neural networks were applied and compared in our study: multilayer perceptron (MLP), convolutional neural network (CNN), and recurrent neural network (RNN). The synthetic minority oversampling technique (SMOTE) was used to artificially augment the patients with serious complications. The performances of the neural networks were evaluated using accuracy, sensitivity, specificity, Matthews correlation coefficient, and area under the receiver operating characteristic curve. Results: In total, 37,811 and 6250 patients were used as the training data and test data, with incidence rates of serious complication of 3.2\% (1220/37,811) and 3.0\% (188/6250), respectively. When trained using the SMOTE data, the MLP appeared to have a desirable performance, with an area under curve (AUC) of 0.84 (95\% CI 0.83-0.85). However, its performance was low for the test data, with an AUC of 0.54 (95\% CI 0.53-0.55). The performance of CNN was similar to that of MLP. It generated AUCs of 0.79 (95\% CI 0.78-0.80) and 0.57 (95\% CI 0.59-0.61) for the SMOTE data and test data, respectively. Compared with the MLP and CNN, the RNN showed worse performance, with AUCs of 0.65 (95\% CI 0.64-0.66) and 0.55 (95\% CI 0.53-0.57) for the SMOTE data and test data, respectively. Conclusions: MLP and CNN showed improved, but limited, ability for predicting the postoperative serious complications after bariatric surgery in the Scandinavian Obesity Surgery Registry data. However, the overfitting issue is still apparent and needs to be overcome by incorporating intra- and perioperative information. 
", doi="10.2196/15992", url="https://medinform.jmir.org/2020/5/e15992", url="http://www.ncbi.nlm.nih.gov/pubmed/32383681" } @Article{info:doi/10.2196/16225, author="Chun, Jaehyeong and Kim, Youngjun and Shin, Yoon Kyoung and Han, Hyup Sun and Oh, Yeul Sei and Chung, Tae-Young and Park, Kyung-Ah and Lim, Hui Dong", title="Deep Learning--Based Prediction of Refractive Error Using Photorefraction Images Captured by a Smartphone: Model Development and Validation Study", journal="JMIR Med Inform", year="2020", month="May", day="5", volume="8", number="5", pages="e16225", keywords="amblyopia", keywords="cycloplegic refraction", keywords="deep learning", keywords="deep convolutional neural network", keywords="mobile phone", keywords="photorefraction", keywords="refractive error", keywords="screening", abstract="Background: Accurately predicting refractive error in children is crucial for detecting amblyopia, which can lead to permanent visual impairment, but is potentially curable if detected early. Various tools have been adopted to more easily screen a large number of patients for amblyopia risk. Objective: For efficient screening, easy access to screening tools and an accurate prediction algorithm are the most important factors. In this study, we developed an automated deep learning--based system to predict the range of refractive error in children (mean age 4.32 years, SD 1.87 years) using 305 eccentric photorefraction images captured with a smartphone. Methods: Photorefraction images were divided into seven classes according to their spherical values as measured by cycloplegic refraction. Results: The trained deep learning model had an overall accuracy of 81.6\%, with the following accuracies for each refractive error class: 80.0\% for ??5.0 diopters (D), 77.8\% for >?5.0 D and ??3.0 D, 82.0\% for >?3.0 D and ??0.5 D, 83.3\% for >?0.5 D and <+0.5 D, 82.8\% for ?+0.5 D and <+3.0 D, 79.3\% for ?+3.0 D and <+5.0 D, and 75.0\% for ?+5.0 D. These results indicate that our deep learning--based system performed sufficiently accurately. Conclusions: This study demonstrated the potential of precise smartphone-based prediction systems for refractive error using deep learning and further yielded a robust collection of pediatric photorefraction images. ", doi="10.2196/16225", url="https://medinform.jmir.org/2020/5/e16225", url="http://www.ncbi.nlm.nih.gov/pubmed/32369035" } @Article{info:doi/10.2196/14330, author="Lanera, Corrado and Berchialla, Paola and Baldi, Ileana and Lorenzoni, Giulia and Tramontan, Lara and Scamarcia, Antonio and Cantarutti, Luigi and Giaquinto, Carlo and Gregori, Dario", title="Use of Machine Learning Techniques for Case-Detection of Varicella Zoster Using Routinely Collected Textual Ambulatory Records: Pilot Observational Study", journal="JMIR Med Inform", year="2020", month="May", day="5", volume="8", number="5", pages="e14330", keywords="machine learning technique", keywords="text mining", keywords="electronic health report", keywords="varicella zoster", keywords="pediatric infectious disease", abstract="Background: The detection of infectious diseases through the analysis of free text on electronic health reports (EHRs) can provide prompt and accurate background information for the implementation of preventative measures, such as advertising and monitoring the effectiveness of vaccination campaigns. Objective: The purpose of this paper is to compare machine learning techniques in their application to EHR analysis for disease detection. 
Methods: The Pedianet database was used as a data source for a real-world scenario on the identification of cases of varicella. The models' training and test sets were based on two different Italian regions' (Veneto and Sicilia) data sets of 7631 patients and 1,230,355 records, and 2347 patients and 569,926 records, respectively, for whom a gold standard of varicella diagnosis was available. Elastic-net regularized generalized linear model (GLMNet), maximum entropy (MAXENT), and LogitBoost (boosting) algorithms were implemented in a supervised environment and 5-fold cross-validated. The document-term matrix generated by the training set involves a dictionary of 1,871,532 tokens. The analysis was conducted on a subset of 29,096 tokens, corresponding to a matrix with no more than a 99\% sparsity ratio. Results: The highest predictive values were achieved through boosting (positive predicative value [PPV] 63.1, 95\% CI 42.7-83.5 and negative predicative value [NPV] 98.8, 95\% CI 98.3-99.3). GLMNet delivered superior predictive capability compared to MAXENT (PPV 24.5\% and NPV 98.3\% vs PPV 11.0\% and NPV 98.0\%). MAXENT and GLMNet predictions weakly agree with each other (agreement coefficient 1 [AC1]=0.60, 95\% CI 0.58-0.62), as well as with LogitBoost (MAXENT: AC1=0.64, 95\% CI 0.63-0.66 and GLMNet: AC1=0.53, 95\% CI 0.51-0.55). Conclusions: Boosting has demonstrated promising performance in large-scale EHR-based infectious disease identification. ", doi="10.2196/14330", url="https://medinform.jmir.org/2020/5/e14330", url="http://www.ncbi.nlm.nih.gov/pubmed/32369038" } @Article{info:doi/10.2196/15516, author="Zhang, Weina and Liu, Han and Silenzio, Bernard Vincent Michael and Qiu, Peiyuan and Gong, Wenjie", title="Machine Learning Models for the Prediction of Postpartum Depression: Application and Comparison Based on a Cohort Study", journal="JMIR Med Inform", year="2020", month="Apr", day="30", volume="8", number="4", pages="e15516", keywords="depression", keywords="postpartum", keywords="machine learning", keywords="support vector machine", keywords="random forest", keywords="prediction model", abstract="Background: Postpartum depression (PPD) is a serious public health problem. Building a predictive model for PPD using data during pregnancy can facilitate earlier identification and intervention. Objective: The aims of this study are to compare the effects of four different machine learning models using data during pregnancy to predict PPD and explore which factors in the model are the most important for PPD prediction. Methods: Information on the pregnancy period from a cohort of 508 women, including demographics, social environmental factors, and mental health, was used as predictors in the models. The Edinburgh Postnatal Depression Scale score within 42 days after delivery was used as the outcome indicator. Using two feature selection methods (expert consultation and random forest-based filter feature selection [FFS-RF]) and two algorithms (support vector machine [SVM] and random forest [RF]), we developed four different machine learning PPD prediction models and compared their prediction effects. Results: There was no significant difference in the effectiveness of the two feature selection methods in terms of model prediction performance, but 10 fewer factors were selected with the FFS-RF than with the expert consultation method. The model based on SVM and FFS-RF had the best prediction effects (sensitivity=0.69, area under the curve=0.78). 
In the feature importance ranking output by the RF algorithm, psychological elasticity, depression during the third trimester, and income level were the most important predictors. Conclusions: In contrast to the expert consultation method, FFS-RF was important in dimension reduction. When the sample size is small, the SVM algorithm is suitable for predicting PPD. In the prevention of PPD, more attention should be paid to the psychological resilience of mothers. ", doi="10.2196/15516", url="http://medinform.jmir.org/2020/4/e15516/", url="http://www.ncbi.nlm.nih.gov/pubmed/32352387" } @Article{info:doi/10.2196/16848, author="Chen, Ji and Chokshi, Sara and Hegde, Roshini and Gonzalez, Javier and Iturrate, Eduardo and Aphinyanaphongs, Yin and Mann, Devin", title="Development, Implementation, and Evaluation of a Personalized Machine Learning Algorithm for Clinical Decision Support: Case Study With Shingles Vaccination", journal="J Med Internet Res", year="2020", month="Apr", day="29", volume="22", number="4", pages="e16848", keywords="clinical decision support", keywords="machine learning", keywords="alert fatigue", keywords="implementation science", abstract="Background: Although clinical decision support (CDS) alerts are effective reminders of best practices, their effectiveness is blunted by clinicians who fail to respond to an overabundance of inappropriate alerts. An electronic health record (EHR)--integrated machine learning (ML) algorithm is a potentially powerful tool to increase the signal-to-noise ratio of CDS alerts and positively impact the clinician's interaction with these alerts in general. Objective: This study aimed to describe the development and implementation of an ML-based signal-to-noise optimization system (SmartCDS) to increase the signal of alerts by decreasing the volume of low-value herpes zoster (shingles) vaccination alerts. Methods: We built and deployed SmartCDS, which builds personalized user activity profiles to suppress shingles vaccination alerts unlikely to yield a clinician's interaction. We extracted all records of shingles alerts from January 2017 to March 2019 from our EHR system, including 327,737 encounters, 780 providers, and 144,438 patients. Results: During the 6 weeks of pilot deployment, the SmartCDS system suppressed an average of 43.67\% (15,425/35,315) potential shingles alerts (appointments) and maintained stable counts of weekly shingles vaccination orders (326.3 with system active vs 331.3 in the control group; P=.38) and weekly user-alert interactions (1118.3 with system active vs 1166.3 in the control group; P=.20). Conclusions: All key statistics remained stable while the system was turned on. Although the results are promising, the characteristics of the system can be subject to future data shifts, which require automated logging and monitoring. We demonstrated that an automated, ML-based method and data architecture to suppress alerts are feasible without detriment to overall order rates. This work is the first alert suppression ML-based model deployed in practice and serves as foundational work in encounter-level customization of alert display to maximize effectiveness. 
", doi="10.2196/16848", url="http://www.jmir.org/2020/4/e16848/", url="http://www.ncbi.nlm.nih.gov/pubmed/32347813" } @Article{info:doi/10.2196/17550, author="Chishti, Shahrukh and Jaggi, Raj Karan and Saini, Anuj and Agarwal, Gaurav and Ranjan, Ashish", title="Artificial Intelligence-Based Differential Diagnosis: Development and Validation of a Probabilistic Model to Address Lack of Large-Scale Clinical Datasets", journal="J Med Internet Res", year="2020", month="Apr", day="28", volume="22", number="4", pages="e17550", keywords="artificial intelligence", keywords="medical diagnosis", keywords="probabilistic modeling", keywords="Bayesian model", keywords="machine learning", abstract="Background: Machine-learning or deep-learning algorithms for clinical diagnosis are inherently dependent on the availability of large-scale clinical datasets. Lack of such datasets and inherent problems such as overfitting often necessitate the development of innovative solutions. Probabilistic modeling closely mimics the rationale behind clinical diagnosis and represents a unique solution. Objective: The aim of this study was to develop and validate a probabilistic model for differential diagnosis in different medical domains. Methods: Numerical values of symptom-disease associations were utilized to mathematically represent medical domain knowledge. These values served as the core engine for the probabilistic model. For the given set of symptoms, the model was utilized to produce a ranked list of differential diagnoses, which was compared to the differential diagnosis constructed by a physician in a consult. Practicing medical specialists were integral in the development and validation of this model. Clinical vignettes (patient case studies) were utilized to compare the accuracy of doctors and the model against the assumed gold standard. The accuracy analysis was carried out over the following metrics: top 3 accuracy, precision, and recall. Results: The model demonstrated a statistically significant improvement (P=.002) in diagnostic accuracy (85\%) as compared to the doctors' performance (67\%). This advantage was retained across all three categories of clinical vignettes: 100\% vs 82\% (P<.001) for highly specific disease presentation, 83\% vs 65\% for moderately specific disease presentation (P=.005), and 72\% vs 49\% (P<.001) for nonspecific disease presentation. The model performed slightly better than the doctors' average in precision (62\% vs 60\%, P=.43) but there was no improvement with respect to recall (53\% vs 56\%, P=.27). However, neither difference was statistically significant. Conclusions: The present study demonstrates a drastic improvement over previously reported results that can be attributed to the development of a stable probabilistic framework utilizing symptom-disease associations to mathematically represent medical domain knowledge. The current iteration relies on static, manually curated values for calculating the degree of association. Shifting to real-world data--derived values represents the next step in model development. 
", doi="10.2196/17550", url="https://www.jmir.org/2020/4/e17550", url="http://www.ncbi.nlm.nih.gov/pubmed/32343256" } @Article{info:doi/10.2196/14710, author="Park, Phillip and Shin, Soo-Yong and Park, Yun Seog and Yun, Jeonghee and Shin, Chulmin and Jung, Jipmin and Choi, Son Kui and Cha, Soung Hyo", title="Next-Generation Sequencing--Based Cancer Panel Data Conversion Using International Standards to Implement a Clinical Next-Generation Sequencing Research System: Single-Institution Study", journal="JMIR Med Inform", year="2020", month="Apr", day="24", volume="8", number="4", pages="e14710", keywords="data standardization", keywords="clinical sequencing data", keywords="next-generation sequencing", keywords="translational research information system", abstract="Background: The analytical capacity and speed of next-generation sequencing (NGS) technology have been improved. Many genetic variants associated with various diseases have been discovered using NGS. Therefore, applying NGS to clinical practice results in precision or personalized medicine. However, as clinical sequencing reports in electronic health records (EHRs) are not structured according to recommended standards, clinical decision support systems have not been fully utilized. In addition, integrating genomic data with clinical data for translational research remains a great challenge. Objective: To apply international standards to clinical sequencing reports and to develop a clinical research information system to integrate standardized genomic data with clinical data. Methods: We applied the recently published ISO/TS 20428 standard to 367 clinical sequencing reports generated by panel (91 genes) sequencing in EHRs and implemented a clinical NGS research system by extending the clinical data warehouse to integrate the necessary clinical data for each patient. We also developed a user interface with a clinical research portal and an NGS result viewer. Results: A single clinical sequencing report with 28 items was restructured into four database tables and 49 entities. As a result, 367 patients' clinical sequencing data were connected with clinical data in EHRs, such as diagnosis, surgery, and death information. This system can support the development of cohort or case-control datasets as well. Conclusions: The standardized clinical sequencing data are not only for clinical practice and could be further applied to translational research. ", doi="10.2196/14710", url="http://medinform.jmir.org/2020/4/e14710/", url="http://www.ncbi.nlm.nih.gov/pubmed/32329738" } @Article{info:doi/10.2196/18948, author="Gong, Mengchun and Liu, Li and Sun, Xin and Yang, Yue and Wang, Shuang and Zhu, Hong", title="Cloud-Based System for Effective Surveillance and Control of COVID-19: Useful Experiences From Hubei, China", journal="J Med Internet Res", year="2020", month="Apr", day="22", volume="22", number="4", pages="e18948", keywords="COVID-19", keywords="cloud system", keywords="syndromic surveillance", keywords="clinical decision support", keywords="stakeholders involvement", keywords="pandemic", keywords="medical informatics", abstract="Background: Coronavirus disease (COVID-19) has been an unprecedented challenge to the global health care system. Tools that can improve the focus of surveillance efforts and clinical decision support are of paramount importance. 
Objective: The aim of this study was to illustrate how new medical informatics technologies may enable effective control of the pandemic through the development and successful 72-hour deployment of the Honghu Hybrid System (HHS) for COVID-19 in the city of Honghu in Hubei, China. Methods: The HHS was designed for the collection, integration, standardization, and analysis of COVID-19-related data from multiple sources, which includes a case reporting system, diagnostic labs, electronic medical records, and social media on mobile devices. Results: HHS supports four main features: syndromic surveillance on mobile devices, policy-making decision support, clinical decision support and prioritization of resources, and follow-up of discharged patients. The syndromic surveillance component in HHS covered over 95\% of the population of over 900,000 people and provided near real time evidence for the control of epidemic emergencies. The clinical decision support component in HHS was also provided to improve patient care and prioritize the limited medical resources. However, the statistical methods still require further evaluations to confirm clinical effectiveness and appropriateness of disposition assigned in this study, which warrants further investigation. Conclusions: The facilitating factors and challenges are discussed to provide useful insights to other cities to build suitable solutions based on cloud technologies. The HHS for COVID-19 was shown to be feasible and effective in this real-world field study, and has the potential to be migrated. ", doi="10.2196/18948", url="http://www.jmir.org/2020/4/e18948/", url="http://www.ncbi.nlm.nih.gov/pubmed/32287040" } @Article{info:doi/10.2196/16069, author="Chapman, B. Kenneth and Pas, M. Martijn and Abrar, Diana and Day, Wesley and Vissers, C. Kris and van Helmond, Noud", title="Development and Performance of a Web-Based Tool to Adjust Urine Toxicology Testing Frequency: Retrospective Study", journal="JMIR Med Inform", year="2020", month="Apr", day="22", volume="8", number="4", pages="e16069", keywords="Urine drug testing", keywords="Opioid therapy", keywords="Chronic noncancer pain", abstract="Background: Several pain management guidelines recommend regular urine drug testing (UDT) in patients who are being treated with chronic opioid analgesic therapy (COAT) to monitor compliance and improve safety. Guidelines also recommend more frequent testing in patients who are at high risk of adverse events related to COAT; however, there is no consensus on how to identify high-risk patients or on the testing frequency that should be used. Using previously described clinical risk factors for UDT results that are inconsistent with the prescribed COAT, we developed a web-based tool to adjust drug testing frequency in patients treated with COAT. Objective: The objective of this study was to evaluate a risk stratification tool, the UDT Randomizer, to adjust UDT frequency in patients treated with COAT. Methods: Patients were stratified using an algorithm based on readily available clinical risk factors into categories of presumed low, moderate, high, and high+ risk of presenting with UDT results inconsistent with the prescribed COAT. The algorithm was integrated in a website to facilitate adoption across practice sites. To test the performance of this algorithm, we performed a retrospective analysis of patients treated with COAT between June 2016 and June 2017. 
The primary outcome was compliance with the prescribed COAT as defined by UDT results consistent with the prescribed COAT. Results: 979 drug tests (867 UDT, 88.6\%; 112 oral fluid testing, 11.4\%) were performed in 320 patients. An inconsistent drug test result was registered in 76/979 tests (7.8\%). The incidences of inconsistent test results across the risk tool categories were 7/160 (4.4\%) in the low risk category, 32/349 (9.2\%) in the moderate risk category, 28/338 (8.3\%) in the high risk category, and 9/132 (6.8\%) in the high+ risk category. Generalized estimating equation analysis demonstrated that the moderate risk (odds ratio (OR) 2.1, 95\% CI 0.9-5.0; P=.10), high risk (OR 2.0, 95\% CI 0.8-5.0; P=.14), and high risk+ (OR 2.0, 95\% CI 0.7-5.6; P=.20) categories were associated with a nonsignificantly increased risk of inconsistency vs the low risk category. Conclusions: The developed tool stratified patients during individual visits into risk categories of presenting with drug testing results inconsistent with the prescribed COAT; the higher risk categories showed nonsignificantly higher risk compared to the low risk category. Further development of the tool with additional risk factors in a larger cohort may further clarify and enhance its performance. ", doi="10.2196/16069", url="http://medinform.jmir.org/2020/4/e16069/", url="http://www.ncbi.nlm.nih.gov/pubmed/32319958" } @Article{info:doi/10.2196/13188, author="Janssen, Anna and Donnelly, Candice and Kay, Judy and Thiem, Peter and Saavedra, Aldo and Pathmanathan, Nirmala and Elder, Elisabeth and Dinh, Phuong and Kabir, Masrura and Jackson, Kirsten and Harnett, Paul and Shaw, Tim", title="Developing an Intranet-Based Lymphedema Dashboard for Breast Cancer Multidisciplinary Teams: Design Research Study", journal="J Med Internet Res", year="2020", month="Apr", day="21", volume="22", number="4", pages="e13188", keywords="eHealth", keywords="clinical informatics", keywords="human-centered design", keywords="data visualization", abstract="Background: A large quantity of data is collected during the delivery of cancer care. However, once collected, these data are difficult for health professionals to access to support clinical decision making and performance review. There is a need for innovative tools that make clinical data more accessible to support health professionals in these activities. One approach for providing health professionals with access to clinical data is to create the infrastructure and interface for a clinical dashboard to make data accessible in a timely and relevant manner. Objective: This study aimed to develop and evaluate 2 prototype dashboards for displaying data on the identification and management of lymphedema. Methods: The study used a co-design framework to develop 2 prototype dashboards for use by health professionals delivering breast cancer care. The key feature of these dashboards was an approach for visualizing lymphedema patient cohort and individual patient data. This project began with 2 focus group sessions conducted with members of a breast cancer multidisciplinary team (n=33) and a breast cancer consumer (n=1) to establish clinically relevant and appropriate data for presentation and the visualization requirements for a dashboard. A series of fortnightly meetings over 6 months with an Advisory Committee (n=10) occurred to inform and refine the development of a static mock-up dashboard. 
This mock-up was then presented to representatives of the multidisciplinary team (n=3) to get preliminary feedback about the design and use of such dashboards. Feedback from these presentations was reviewed and used to inform the development of the interactive prototypes. A structured evaluation was conducted on the prototypes, using Think Aloud Protocol and semistructured interviews with representatives of the multidisciplinary team (n=5). Results: Lymphedema was selected as a clinically relevant area for the prototype dashboards. A qualitative evaluation is reported for 5 health professionals. These participants were selected from 3 specialties: surgery (n=1), radiation oncology (n=2), and occupational therapy (n=2). Participants were able to complete the majority of tasks on the dashboard. Semistructured interview themes were categorized into engagement or enthusiasm for the dashboard, user experience, and data quality and completeness. Conclusions: Findings from this study constitute the first report of a co-design process for creating a lymphedema dashboard for breast cancer health professionals. Health professionals are interested in the use of data visualization tools to make routinely collected clinical data more accessible. To be used effectively, dashboards need to be reliable and sourced from accurate and comprehensive data sets. While the co-design process used to develop the visualization tool proved effective for designing an individual patient dashboard, the complexity and accessibility of the data required for a cohort dashboard remained a challenge. ", doi="10.2196/13188", url="https://www.jmir.org/2020/4/e13188", url="http://www.ncbi.nlm.nih.gov/pubmed/32314968" } @Article{info:doi/10.2196/16533, author="Nasseh, Daniel and Schneiderbauer, Sophie and Lange, Michael and Schweizer, Diana and Heinemann, Volker and Belka, Claus and Cadenovic, Ranko and Buysse, Laurence and Erickson, Nicole and Mueller, Michael and Kortuem, Karsten and Niyazi, Maximilian and Marschner, Sebastian and Fey, Theres", title="Optimizing the Analytical Value of Oncology-Related Data Based on an In-Memory Analysis Layer: Development and Assessment of the Munich Online Comprehensive Cancer Analysis Platform", journal="J Med Internet Res", year="2020", month="Apr", day="17", volume="22", number="4", pages="e16533", keywords="oncology", keywords="database management systems", keywords="data visualization", keywords="usability", abstract="Background: Many comprehensive cancer centers incorporate tumor documentation software supplying structured information from the associated centers' oncology patients for internal and external audit purposes. However, much of the documentation data included in these systems often remain unused and unknown by most of the clinicians at the sites. Objective: To improve access to such data for analytical purposes, a prerollout of an analysis layer based on the business intelligence software QlikView was implemented. This software allows for the real-time analysis and inspection of oncology-related data. The system is meant to increase access to the data while simultaneously providing tools for user-friendly real-time analytics. Methods: The system combines in-memory capabilities (based on QlikView software) with innovative techniques that compress the complexity of the data, consequently improving its readability as well as its accessibility for designated end users. 
Aside from the technical and conceptual components, the software's implementation necessitated a complex system of permission and governance. Results: A continuously running system including daily updates with a user-friendly Web interface and real-time usage was established. This paper introduces its main components and major design ideas. A commented video summarizing and presenting the work can be found within the Multimedia Appendix. Conclusions: The system has been well-received by a focus group of physicians within an initial prerollout. Aside from improving data transparency, the system's main benefits are its quality and process control capabilities, knowledge discovery, and hypothesis generation. Limitations such as run time, governance, or misinterpretation of data are considered. ", doi="10.2196/16533", url="https://www.jmir.org/2020/4/e16533", url="http://www.ncbi.nlm.nih.gov/pubmed/32077858" } @Article{info:doi/10.2196/17366, author="Xu, Huiyu and Feng, Guoshuang and Wei, Yuan and Feng, Ying and Yang, Rui and Wang, Liying and Zhang, Hongxia and Li, Rong and Qiao, Jie", title="Predicting Ectopic Pregnancy Using Human Chorionic Gonadotropin (hCG) Levels and Main Cause of Infertility in Women Undergoing Assisted Reproductive Treatment: Retrospective Observational Cohort Study", journal="JMIR Med Inform", year="2020", month="Apr", day="16", volume="8", number="4", pages="e17366", keywords="$\beta$-hCG", keywords="ectopic pregnancy", keywords="intrauterine pregnancy", keywords="biochemical pregnancies", keywords="IVF/ICSI-ET", abstract="Background: Ectopic pregnancy (EP) is a serious complication of assisted reproductive technology (ART). However, there is no acknowledged mathematical model for predicting EP in the ART population. Objective: The goal of the research was to establish a model to tailor treatment for women with a higher risk of EP. Methods: From December 2015 to July 2016, we retrospectively included 1703 women whose serum human chorionic gonadotropin (hCG) levels were positive on day 21 (hCG21) after fresh embryo transfer. Multivariable multinomial logistic regression was used to predict EP, intrauterine pregnancy (IUP), and biochemical pregnancy (BCP). Results: The variables included in the final predicting model were (hCG21, ratio of hCG21/hCG14, and main cause of infertility). During evaluation of the model, the areas under the receiver operating curve for IUP, EP, and BCP were 0.978, 0.962, and 0.999, respectively, in the training set, and 0.963, 0.942, and 0.996, respectively, in the validation set. The misclassification rates were 0.038 and 0.045, respectively, in the training and validation sets. Our model classified the whole in vitro fertilization/intracytoplasmic sperm injection--embryo transfer population into four groups: first, the low-risk EP group, with incidence of EP of 0.52\% (0.23\%-1.03\%); second, a predicted BCP group, with incidence of EP of 5.79\% (1.21\%-15.95\%); third, a predicted undetermined group, with incidence of EP of 28.32\% (21.10\%-35.53\%), and fourth, a predicted high-risk EP group, with incidence of EP of 64.11\% (47.22\%-78.81\%). Conclusions: We have established a model to sort the women undergoing ART into four groups according to their incidence of EP in order to reduce the medical resources spent on women with low-risk EP and provide targeted tailor-made treatment for women with a higher risk of EP. 
", doi="10.2196/17366", url="http://medinform.jmir.org/2020/4/e17366/", url="http://www.ncbi.nlm.nih.gov/pubmed/32297865" } @Article{info:doi/10.2196/16749, author="Hu, Fang and Li, Liuhuan and Huang, Xiaoyu and Yan, Xingyu and Huang, Panpan", title="Symptom Distribution Regularity of Insomnia: Network and Spectral Clustering Analysis", journal="JMIR Med Inform", year="2020", month="Apr", day="16", volume="8", number="4", pages="e16749", keywords="insomnia", keywords="core symptom", keywords="symptom community", keywords="symptom embedding representation", keywords="spectral clustering algorithm", abstract="Background: Recent research in machine-learning techniques has led to signi?cant progress in various research ?elds. In particular, knowledge discovery using this method has become a hot topic in traditional Chinese medicine. As the key clinical manifestations of patients, symptoms play a signi?cant role in clinical diagnosis and treatment, which evidently have their underlying traditional Chinese medicine mechanisms. Objective: We aimed to explore the core symptoms and potential regularity of symptoms for diagnosing insomnia to reveal the key symptoms, hidden relationships underlying the symptoms, and their corresponding syndromes. Methods: An insomnia dataset with 807 samples was extracted from real-world electronic medical records. After cleaning and selecting the theme data referring to the syndromes and symptoms, the symptom network analysis model was constructed using complex network theory. We used four evaluation metrics of node centrality to discover the core symptom nodes from multiple aspects. To explore the hidden relationships among symptoms, we trained each symptom node in the network to obtain the symptom embedding representation using the Skip-Gram model and node embedding theory. After acquiring the symptom vocabulary in a digital vector format, we calculated the similarities between any two symptom embeddings, and clustered these symptom embeddings into five communities using the spectral clustering algorithm. Results: The top five core symptoms of insomnia diagnosis, including difficulty falling asleep, easy to wake up at night, dysphoria and irascibility, forgetful, and spiritlessness and weakness, were identified using evaluation metrics of node centrality. The symptom embeddings with hidden relationships were constructed, which can be considered as the basic dataset for future insomnia research. The symptom network was divided into five communities, and these symptoms were accurately categorized into their corresponding syndromes. Conclusions: These results highlight that network and clustering analyses can objectively and effectively find the key symptoms and relationships among symptoms. Identification of the symptom distribution and symptom clusters of insomnia further provide valuable guidance for clinical diagnosis and treatment. 
", doi="10.2196/16749", url="http://medinform.jmir.org/2020/4/e16749/", url="http://www.ncbi.nlm.nih.gov/pubmed/32297869" } @Article{info:doi/10.2196/18402, author="Essay, Patrick and Mosier, Jarrod and Subbian, Vignesh", title="Rule-Based Cohort Definitions for Acute Respiratory Failure: Electronic Phenotyping Algorithm", journal="JMIR Med Inform", year="2020", month="Apr", day="15", volume="8", number="4", pages="e18402", keywords="computable phenotype", keywords="electronic health record", keywords="intensive care units", keywords="critical care informatics", keywords="telemedicine", keywords="respiratory", abstract="Background: Acute respiratory failure is generally treated with invasive mechanical ventilation or noninvasive respiratory support strategies. The efficacies of the various strategies are not fully understood. There is a need for accurate therapy-based phenotyping for secondary analyses of electronic health record data to answer research questions regarding respiratory management and outcomes with each strategy. Objective: The objective of this study was to address knowledge gaps related to ventilation therapy strategies across diverse patient populations by developing an algorithm for accurate identification of patients with acute respiratory failure. To accomplish this objective, our goal was to develop rule-based computable phenotypes for patients with acute respiratory failure using remotely monitored intensive care unit (tele-ICU) data. This approach permits analyses by ventilation strategy across broad patient populations of interest with the ability to sub-phenotype as research questions require. Methods: Tele-ICU data from ?200 hospitals were used to create a rule-based algorithm for phenotyping patients with acute respiratory failure, defined as an adult patient requiring invasive mechanical ventilation or a noninvasive strategy. The dataset spans a wide range of hospitals and ICU types across all US regions. Structured clinical data, including ventilation therapy start and stop times, medication records, and nurse and respiratory therapy charts, were used to define clinical phenotypes. All adult patients of any diagnoses with record of ventilation therapy were included. Patients were categorized by ventilation type, and analysis of event sequences using record timestamps defined each phenotype. Manual validation was performed on 5\% of patients in each phenotype. Results: We developed 7 phenotypes: (0) invasive mechanical ventilation, (1) noninvasive positive-pressure ventilation, (2) high-flow nasal insufflation, (3) noninvasive positive-pressure ventilation subsequently requiring intubation, (4) high-flow nasal insufflation subsequently requiring intubation, (5) invasive mechanical ventilation with extubation to noninvasive positive-pressure ventilation, and (6) invasive mechanical ventilation with extubation to high-flow nasal insufflation. A total of 27,734 patients met our phenotype criteria and were categorized into these ventilation subgroups. Manual validation of a random selection of 5\% of records from each phenotype resulted in a total accuracy of 88\% and a precision and recall of 0.8789 and 0.8785, respectively, across all phenotypes. Individual phenotype validation showed that the algorithm categorizes patients particularly well but has challenges with patients that require ?2 management strategies. 
Conclusions: Our proposed computable phenotyping algorithm for patients with acute respiratory failure effectively identifies patients for therapy-focused research regardless of admission diagnosis or comorbidities and allows for management strategy comparisons across populations of interest. ", doi="10.2196/18402", url="http://medinform.jmir.org/2020/4/e18402/", url="http://www.ncbi.nlm.nih.gov/pubmed/32293579" } @Article{info:doi/10.2196/15124, author="Choi, I. Youn and Kim, Jae Yoon and Chung, Jun-Won and Kim, Oh Kyoung and Kim, Hakki and Park, Woong Rae and Park, Kyun Dong", title="Effect of Age on the Initiation of Biologic Agent Therapy in Patients With Inflammatory Bowel Disease: Korean Common Data Model Cohort Study", journal="JMIR Med Inform", year="2020", month="Apr", day="15", volume="8", number="4", pages="e15124", keywords="ulcerative colitis", keywords="Crohn's disease", keywords="early-onset", keywords="late-onset", keywords="common data model", abstract="Background: The Observational Health Data Sciences and Informatics (OHDSI) network is an international collaboration established to apply open-source data analytics to a large network of health databases, including the Korean common data model (K-CDM) network. Objective: The aim of this study is to analyze the effect that age at diagnosis has on the prognosis of inflammatory bowel disease (IBD) in Korea using a CDM network database. Methods: We retrospectively analyzed the K-CDM network database from 2005 to 2015. We transformed the electronic medical record into the CDM version 5.0 used in OHDSI. A worsened IBD prognosis was defined as the initiation of therapy with biologic agents, including infliximab and adalimumab. To evaluate the effect that age at diagnosis had on the prognosis of IBD, we divided the patients into an early-onset (EO) IBD group (age at diagnosis <40 years) and a late-onset (LO) IBD group (age at diagnosis ≥40 years) with the cutoff value of age at diagnosis as 40 years, which was calculated using the Youden index method. We then used the logrank test and Cox proportional hazards model to analyze the effect that age at diagnosis (EO group vs LO group) had on the prognosis in patients with IBD. Results: A total of 3480 patients were enrolled. There were 2017 patients with ulcerative colitis (UC) and 1463 with Crohn's disease (CD). The median follow-up period was 109.5 weeks. The EO UC group showed significantly shorter event-free survival (ie, time to initiation of biologic agents) than the LO UC group (P<.001). In CD, the EO CD group showed shorter event-free survival (ie, time to initiation of biologic agents) than the LO CD group. In the Cox proportional hazards analysis, the odds ratio (OR) for initiation of biologic agents in the EO UC group compared with the LO UC group was 2.3 (95\% CI 1.3-3.8, P=.002). The OR for initiation of biologic agents in the EO CD group compared with the LO CD group was 5.4 (95\% CI 1.9-14.9, P=.001). Conclusions: The EO IBD group showed a worse prognosis than the LO IBD group in Korean patients with IBD. In addition, this study successfully verified the CDM model in gastrointestinal research. 
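The log-rank and Cox proportional hazards analysis described in the IBD abstract above could be reproduced along the following lines with the lifelines package; the file name and column names are hypothetical.

```python
# Hedged sketch: log-rank comparison and a Cox model for early- vs late-onset
# disease, with the event defined as initiation of biologic agent therapy.
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.statistics import logrank_test

df = pd.read_csv("ibd_cohort.csv")                     # hypothetical CDM extract
df["early_onset"] = (df["age_at_diagnosis"] < 40).astype(int)

eo = df[df["early_onset"] == 1]
lo = df[df["early_onset"] == 0]
lr = logrank_test(eo["weeks_to_biologic"], lo["weeks_to_biologic"],
                  event_observed_A=eo["started_biologic"],
                  event_observed_B=lo["started_biologic"])
print("log-rank p =", lr.p_value)

cph = CoxPHFitter()
cph.fit(df[["weeks_to_biologic", "started_biologic", "early_onset"]],
        duration_col="weeks_to_biologic", event_col="started_biologic")
cph.print_summary()   # hazard ratio for early onset vs late onset
```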
", doi="10.2196/15124", url="https://medinform.jmir.org/2020/4/e15124", url="http://www.ncbi.nlm.nih.gov/pubmed/32293578" } @Article{info:doi/10.2196/13075, author="Peng, Junfeng and Chen, Chuan and Zhou, Mi and Xie, Xiaohua and Zhou, Yuqi and Luo, Ching-Hsing", title="Peak Outpatient and Emergency Department Visit Forecasting for Patients With Chronic Respiratory Diseases Using Machine Learning Methods: Retrospective Cohort Study", journal="JMIR Med Inform", year="2020", month="Mar", day="30", volume="8", number="3", pages="e13075", keywords="chronic respiratory diseases", keywords="ensemble machine learning", keywords="health forecasting", keywords="outpatient and emergency departments management", abstract="Background: The overcrowding of hospital outpatient and emergency departments (OEDs) due to chronic respiratory diseases in certain weather or under certain environmental pollution conditions results in the degradation in quality of medical care, and even limits its availability. Objective: To help OED managers to schedule medical resource allocation during times of excessive health care demands after short-term fluctuations in air pollution and weather, we employed machine learning (ML) methods to predict the peak OED arrivals of patients with chronic respiratory diseases. Methods: In this paper, we first identified 13,218 visits from patients with chronic respiratory diseases to OEDs in hospitals from January 1, 2016, to December 31, 2017. Then, we divided the data into three datasets: weather-based visits, air quality-based visits, and weather air quality-based visits. Finally, we developed ML methods to predict the peak event (peak demand days) of patients with chronic respiratory diseases (eg, asthma, respiratory infection, and chronic obstructive pulmonary disease) visiting OEDs on the three weather data and environmental pollution datasets in Guangzhou, China. Results: The adaptive boosting-based neural networks, tree bag, and random forest achieved the biggest receiver operating characteristic area under the curve, 0.698, 0.714, and 0.809, on the air quality dataset, the weather dataset, and weather air quality dataset, respectively. Overall, random forests reached the best classification prediction performance. Conclusions: The proposed ML methods may act as a useful tool to adapt medical services in advance by predicting the peak of OED arrivals. Further, the developed ML methods are generic enough to cope with similar medical scenarios, provided that the data is available. ", doi="10.2196/13075", url="http://medinform.jmir.org/2020/3/e13075/", url="http://www.ncbi.nlm.nih.gov/pubmed/32224488" } @Article{info:doi/10.2196/16117, author="Lee, Sungjoo and Hong, Sungjun and Cha, Chul Won and Kim, Kyunga", title="Predicting Adverse Outcomes for Febrile Patients in the Emergency Department Using Sparse Laboratory Data: Development of a Time Adaptive Model", journal="JMIR Med Inform", year="2020", month="Mar", day="26", volume="8", number="3", pages="e16117", keywords="order status", keywords="sparse laboratory data", keywords="time adaptive model", keywords="emergency department", keywords="adverse outcome", keywords="machine learning", keywords="imbalanced data", abstract="Background: A timely decision in the initial stages for patients with an acute illness is important. However, only a few studies have determined the prognosis of patients based on insufficient laboratory data during the initial stages of treatment. 
Objective: This study aimed to develop and validate time adaptive prediction models to predict the severity of illness in the emergency department (ED) using highly sparse laboratory test data (test order status and test results) and a machine learning approach. Methods: This retrospective study used ED data from a tertiary academic hospital in Seoul, Korea. Two different models were developed based on laboratory test data: order status only (OSO) and order status and results (OSR) models. A binary composite adverse outcome was used, including mortality or hospitalization in the intensive care unit. Both models were evaluated using various performance criteria, including the area under the receiver operating characteristic curve (AUC) and balanced accuracy (BA). Clinical usefulness was examined by determining the positive likelihood ratio (PLR) and negative likelihood ratio (NLR). Results: Of 9491 eligible patients in the ED (mean age, 55.2 years, SD 17.7 years; 4839/9491, 51.0\% women), the model development cohort and validation cohort included 6645 and 2846 patients, respectively. The OSR model generally exhibited better performance (AUC=0.88, BA=0.81) than the OSO model (AUC=0.80, BA=0.74). The OSR model was more informative than the OSO model for predicting patients at low or high risk of adverse outcomes (P<.001 for differences in both PLR and NLR). Conclusions: Early-stage adverse outcomes for febrile patients could be predicted using machine learning models of highly sparse data including test order status and laboratory test results. This prediction tool could help medical professionals who are simultaneously treating the same patient share information, communicate dynamically, and consequently prevent medical errors. ", doi="10.2196/16117", url="http://medinform.jmir.org/2020/3/e16117/", url="http://www.ncbi.nlm.nih.gov/pubmed/32213477" } @Article{info:doi/10.2196/17110, author="Yu, Cheng-Sheng and Lin, Yu-Jiun and Lin, Chang-Hsien and Wang, Sen-Te and Lin, Shiyng-Yu and Lin, H. Sanders and Wu, L. Jenny and Chang, Shy-Shin", title="Predicting Metabolic Syndrome With Machine Learning Models Using a Decision Tree Algorithm: Retrospective Cohort Study", journal="JMIR Med Inform", year="2020", month="Mar", day="23", volume="8", number="3", pages="e17110", keywords="machine learning", keywords="decision tree", keywords="controlled attenuation parameter technology", keywords="metabolic syndrome", abstract="Background: Metabolic syndrome is a cluster of disorders that significantly influence the development and deterioration of numerous diseases. FibroScan is an ultrasound device that was recently shown to predict metabolic syndrome with moderate accuracy. However, previous research regarding prediction of metabolic syndrome in subjects examined with FibroScan has been mainly based on conventional statistical models. Alternatively, machine learning, whereby a computer algorithm learns from prior experience, has better predictive performance than conventional statistical modeling. Objective: We aimed to evaluate the accuracy of different decision tree machine learning algorithms to predict the state of metabolic syndrome in self-paid health examination subjects who were examined with FibroScan. Methods: Multivariate logistic regression was conducted for every known risk factor of metabolic syndrome. Principal components analysis was used to visualize the distribution of metabolic syndrome patients. 
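The positive and negative likelihood ratios used above to judge the clinical usefulness of the OSO and OSR models follow directly from sensitivity and specificity; the helper below shows the arithmetic with a toy confusion matrix, not study data.

```python
# Sketch of the clinical-usefulness metrics: positive and negative likelihood
# ratios derived from the sensitivity and specificity of a binary model.
from typing import Tuple

def likelihood_ratios(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    plr = sensitivity / (1 - specificity)   # how much a positive prediction raises the odds
    nlr = (1 - sensitivity) / specificity   # how much a negative prediction lowers the odds
    return plr, nlr

# Toy confusion matrix for illustration only.
print(likelihood_ratios(tp=120, fp=180, fn=30, tn=1500))
```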
We further applied various statistical machine learning techniques to visualize and investigate the pattern and relationship between metabolic syndrome and several risk variables. Results: Obesity, serum glutamic-oxaloacetic transaminase, serum glutamic pyruvic transaminase, controlled attenuation parameter score, and glycated hemoglobin emerged as significant risk factors in multivariate logistic regression. The area under the receiver operating characteristic curve values for classification and regression trees and for the random forest were 0.831 and 0.904, respectively. Conclusions: Machine learning technology facilitates the identification of metabolic syndrome in self-paid health examination subjects with high accuracy. ", doi="10.2196/17110", url="http://medinform.jmir.org/2020/3/e17110/", url="http://www.ncbi.nlm.nih.gov/pubmed/32202504" } @Article{info:doi/10.2196/16374, author="Rongali, Subendhu and Rose, J. Adam and McManus, D. David and Bajracharya, S. Adarsha and Kapoor, Alok and Granillo, Edgard and Yu, Hong", title="Learning Latent Space Representations to Predict Patient Outcomes: Model Development and Validation", journal="J Med Internet Res", year="2020", month="Mar", day="23", volume="22", number="3", pages="e16374", keywords="predictive modeling", keywords="neural networks", keywords="ablation", keywords="patient mortality", abstract="Background: Scalable and accurate health outcome prediction using electronic health record (EHR) data has gained much attention in research recently. Previous machine learning models mostly ignore relations between different types of clinical data (ie, laboratory components, International Classification of Diseases codes, and medications). Objective: This study aimed to model such relations and build predictive models using the EHR data from intensive care units. We developed innovative neural network models and compared them with the widely used logistic regression model and other state-of-the-art neural network models to predict the patient's mortality using their longitudinal EHR data. Methods: We built a set of neural network models that we collectively call CLOUT (long short-term memory [LSTM] outcome prediction using comprehensive feature relations). Our CLOUT models use a correlational neural network model to identify a latent space representation between different types of discrete clinical features during a patient's encounter and integrate the latent representation into an LSTM-based predictive model framework. In addition, we designed an ablation experiment to identify risk factors from our CLOUT models. Using physicians' input as the gold standard, we compared the risk factors identified by both CLOUT and logistic regression models. Results: Experiments on the Medical Information Mart for Intensive Care-III dataset (selected patient population: 7537) show that CLOUT (area under the receiver operating characteristic curve=0.89) has surpassed logistic regression (0.82) and other baseline neural network models (<0.86). In addition, physicians' agreement with the CLOUT-derived risk factor rankings was statistically significantly higher than the agreement with the logistic regression model. Conclusions: Our results support the applicability of CLOUT for real-world clinical use in identifying patients at high risk of mortality. 
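CLOUT itself couples a correlational neural network with an LSTM; the stripped-down sketch below shows only an LSTM-over-encounter-sequence mortality classifier, with all dimensions as placeholders, and omits the latent-space component entirely.

```python
# Very reduced sketch of an LSTM-based mortality classifier over longitudinal
# encounter features; this is not CLOUT itself, and all sizes are placeholders.
import torch
import torch.nn as nn

class LSTMMortality(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                      # x: (batch, time, n_features)
        _, (h_n, _) = self.lstm(x)             # final hidden state summarizes the stay
        return torch.sigmoid(self.head(h_n[-1])).squeeze(-1)

model = LSTMMortality(n_features=32)
fake_batch = torch.randn(8, 48, 32)            # 8 patients, 48 time steps
print(model(fake_batch).shape)                 # torch.Size([8])
```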
", doi="10.2196/16374", url="http://www.jmir.org/2020/3/e16374/", url="http://www.ncbi.nlm.nih.gov/pubmed/32202503" } @Article{info:doi/10.2196/16349, author="Kim, Junetae and Park, Rang Yu and Lee, Hoon Jeong and Lee, Jae-Ho and Kim, Young-Hak and Huh, Won Jin", title="Development of a Real-Time Risk Prediction Model for In-Hospital Cardiac Arrest in Critically Ill Patients Using Deep Learning: Retrospective Study", journal="JMIR Med Inform", year="2020", month="Mar", day="18", volume="8", number="3", pages="e16349", keywords="deep learning", keywords="cardiac arrest", keywords="Weibull distribution", keywords="forecasting", keywords="intensive care units", keywords="gated recurrent unit", abstract="Background: Cardiac arrest is the most serious death-related event in intensive care units (ICUs), but it is not easily predicted because of the complex and time-dependent data characteristics of intensive care patients. Given the complexity and time dependence of ICU data, deep learning--based methods are expected to provide a good foundation for developing risk prediction models based on large clinical records. Objective: This study aimed to implement a deep learning model that estimates the distribution of cardiac arrest risk probability over time based on clinical data and assesses its potential. Methods: A retrospective study of 759 ICU patients was conducted between January 2013 and July 2015. A character-level gated recurrent unit with a Weibull distribution algorithm was used to develop a real-time prediction model. Fivefold cross-validation testing (training set: 80\% and validation set: 20\%) determined the consistency of model accuracy. The time-dependent area under the curve (TAUC) was analyzed based on the aggregation of 5 validation sets. Results: The TAUCs of the implemented model were 0.963, 0.942, 0.917, 0.875, 0.850, 0.842, and 0.761 before cardiac arrest at 1, 8, 16, 24, 32, 40, and 48 hours, respectively. The sensitivity was between 0.846 and 0.909, and specificity was between 0.923 and 0.946. The distribution of risk between the cardiac arrest group and the non--cardiac arrest group was generally different, and the difference rapidly increased as the time left until cardiac arrest reduced. Conclusions: A deep learning model for forecasting cardiac arrest was implemented and tested by considering the cumulative and fluctuating effects of time-dependent clinical data gathered from a large medical center. This real-time prediction model is expected to improve patient's care by allowing early intervention in patients at high risk of unexpected cardiac arrests. ", doi="10.2196/16349", url="http://medinform.jmir.org/2020/3/e16349/", url="http://www.ncbi.nlm.nih.gov/pubmed/32186517" } @Article{info:doi/10.2196/14272, author="Morid, Amin Mohammad and Sheng, Liu Olivia R. and Del Fiol, Guilherme and Facelli, C. Julio and Bray, E. Bruce and Abdelrahman, Samir", title="Temporal Pattern Detection to Predict Adverse Events in Critical Care: Case Study With Acute Kidney Injury", journal="JMIR Med Inform", year="2020", month="Mar", day="17", volume="8", number="3", pages="e14272", keywords="acute kidney injury", keywords="adverse effects", keywords="supervised machine learning", keywords="automated pattern recognition", abstract="Background: More than 20\% of patients admitted to the intensive care unit (ICU) develop an adverse event (AE). 
No previous study has leveraged patients' data to extract the temporal features using their structural temporal patterns, that is, trends. Objective: This study aimed to improve AE prediction methods by using structural temporal pattern detection that captures global and local temporal trends and to demonstrate these improvements in the detection of acute kidney injury (AKI). Methods: Using the Medical Information Mart for Intensive Care dataset, containing 22,542 patients, we extracted both global and local trends using structural pattern detection methods to predict AKI (ie, binary prediction). Classifiers were built on 17 input features consisting of vital signs and laboratory test results using state-of-the-art models; the optimal classifier was selected for comparisons with previous approaches. The classifier with structural pattern detection features was compared with two baseline classifiers that used different temporal feature extraction approaches commonly used in the literature: (1) symbolic temporal pattern detection, which is the most common approach for multivariate time series classification; and (2) the last recorded value before the prediction point, which is the most common approach to extract temporal data in the AKI prediction literature. Moreover, we assessed the individual contribution of global and local trends. Classifier performance was measured in terms of accuracy (primary outcome), area under the curve, and F-measure. For all experiments, we employed 20-fold cross-validation. Results: Random forest was the best classifier using structural temporal pattern detection. The accuracy of the classifier with local and global trend features was significantly higher than that while using symbolic temporal pattern detection and the last recorded value (81.3\% vs 70.6\% vs 58.1\%; P<.001). Excluding local or global features reduced the accuracy to 74.4\% or 78.1\%, respectively (P<.001). Conclusions: Classifiers using features obtained from structural temporal pattern detection significantly improved the prediction of AKI onset in ICU patients over two baselines based on common previous approaches. The proposed method is a generalizable approach to predict AEs in critical care that may be used to help clinicians intervene in a timely manner to prevent or mitigate AEs. ", doi="10.2196/14272", url="http://medinform.jmir.org/2020/3/e14272/", url="http://www.ncbi.nlm.nih.gov/pubmed/32181753" } @Article{info:doi/10.2196/16073, author="Lester, A. Corey and Tu, Liyun and Ding, Yuting and Flynn, J. Allen", title="Detecting Potential Medication Selection Errors During Outpatient Pharmacy Processing of Electronic Prescriptions With the RxNorm Application Programming Interface: Retrospective Observational Cohort Study", journal="JMIR Med Inform", year="2020", month="Mar", day="11", volume="8", number="3", pages="e16073", keywords="patient safety", keywords="RxNorm", keywords="electronic prescription", keywords="pharmacy", keywords="pharmacists", keywords="automation", abstract="Background: Medication errors are pervasive. Electronic prescriptions (e-prescriptions) convey secure and computer-readable prescriptions from clinics to outpatient pharmacies for dispensing. Once received, pharmacy staff perform a transcription task to select the medications needed to process e-prescriptions within their dispensing software. Later, pharmacists manually double-check medications selected to fulfill e-prescriptions before dispensing to the patient. 
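A simple way to realize the global and local trend features described for the AKI work above is to fit least-squares slopes over the full observation window and over short sub-windows; the window size and the example series below are assumptions for illustration.

```python
# Illustrative sketch of trend-style temporal features: a global trend (slope over
# the whole window) and local trends (slopes over short sub-windows), computed per
# vital sign or laboratory value.
import numpy as np

def trend_features(series: np.ndarray, local_window: int = 6) -> dict:
    t = np.arange(len(series))
    global_slope = np.polyfit(t, series, 1)[0]          # overall direction of the signal
    local_slopes = [
        np.polyfit(np.arange(local_window), series[i:i + local_window], 1)[0]
        for i in range(0, len(series) - local_window + 1, local_window)
    ]
    return {
        "global_slope": float(global_slope),
        "local_slope_mean": float(np.mean(local_slopes)),
        "local_slope_max": float(np.max(local_slopes)),
        "last_value": float(series[-1]),                 # the common baseline feature
    }

creatinine = np.array([0.9, 0.9, 1.0, 1.1, 1.0, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.3])
print(trend_features(creatinine))
```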
Although pharmacist double-checks are mostly effective for catching medication selection mistakes, the cognitive process of medication selection in the computer is still prone to error because of heavy workload, inattention, and fatigue. Leveraging health information technology to identify and recover from medication selection errors can improve patient safety. Objective: This study aimed to determine the performance of an automated double-check of pharmacy prescription records to identify potential medication selection errors made in outpatient pharmacies with the RxNorm application programming interface (API). Methods: We conducted a retrospective observational analysis of 537,710 pairs of e-prescription and dispensing records from a mail-order pharmacy for the period January 2017 to October 2018. National Drug Codes (NDCs) for each pair were obtained from the National Library of Medicine's (NLM's) RxNorm API. The API returned RxNorm concept unique identifier (RxCUI) semantic clinical drug (SCD) identifiers associated with every NDC. The SCD identifiers returned for the e-prescription NDC were matched against the corresponding SCD identifiers from the pharmacy dispensing record NDC. An error matrix was created based on the hand-labeling of mismatched SCD pairs. Performance metrics were calculated for the e-prescription-to-dispensing record matching algorithm for both total pairs and unique pairs of NDCs in these data. Results: We analyzed 527,881 e-prescription and pharmacy dispensing record pairs. Four clinically significant cases of mismatched RxCUI identifiers were detected (ie, three different ingredient selections and one different strength selection). A total of 546 less significant cases of mismatched RxCUIs were found. Nearly all of the NDC pairs had matching RxCUIs (28,787/28,817, 99.90\%-525,270/527,009, 99.67\%). The RxNorm API had a sensitivity of 1, a false-positive rate of 0.00104 to 0.00312, specificity of 0.99896 to 0.99688, precision of 0.00727 to 0.04255, and F1 score of 0.01444 to 0.08163. We found 872 pairs of records without an RxCUI. Conclusions: The NLM's RxNorm API can perform an independent and automatic double-check of correct medication selection to verify e-prescription processing at outpatient pharmacies. RxNorm has near-comprehensive coverage of prescribed medications and can be used to recover from medication selection errors. In the future, tools such as this may be able to perform automated verification of medication selection accurately enough to free pharmacists from having to perform manual double-checks of the medications selected within pharmacy dispensing software to fulfill e-prescriptions. ", doi="10.2196/16073", url="http://medinform.jmir.org/2020/3/e16073/", url="http://www.ncbi.nlm.nih.gov/pubmed/32044760" } @Article{info:doi/10.2196/16866, author="Wolff, Justus and Pauling, Josch and Keck, Andreas and Baumbach, Jan", title="The Economic Impact of Artificial Intelligence in Health Care: Systematic Review", journal="J Med Internet Res", year="2020", month="Feb", day="20", volume="22", number="2", pages="e16866", keywords="telemedicine", keywords="artificial intelligence", keywords="machine learning", keywords="cost-benefit analysis", abstract="Background: Positive economic impact is a key decision factor in making the case for or against investing in an artificial intelligence (AI) solution in the health care industry. 
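The automated double-check described in the RxNorm abstract above maps both NDCs of an e-prescription and dispensing pair to semantic clinical drug RxCUIs and flags pairs whose concept sets do not overlap; the endpoint paths below follow my reading of the public RxNav REST API and should be verified against NLM documentation before use.

```python
# Hedged sketch of the NDC -> SCD RxCUI double-check (endpoint paths assumed).
import requests

BASE = "https://rxnav.nlm.nih.gov/REST"

def scd_rxcuis(ndc: str) -> set:
    """Return the semantic clinical drug (SCD) RxCUIs linked to an NDC."""
    r = requests.get(f"{BASE}/rxcui.json", params={"idtype": "NDC", "id": ndc}, timeout=10)
    rxcuis = r.json().get("idGroup", {}).get("rxnormId", [])
    scds = set()
    for rxcui in rxcuis:
        rel = requests.get(f"{BASE}/rxcui/{rxcui}/related.json",
                           params={"tty": "SCD"}, timeout=10)
        for group in rel.json().get("relatedGroup", {}).get("conceptGroup", []):
            for concept in group.get("conceptProperties", []) or []:
                scds.add(concept["rxcui"])
    return scds

def selection_mismatch(prescribed_ndc: str, dispensed_ndc: str) -> bool:
    """True if the prescribed and dispensed products share no SCD concept."""
    return scd_rxcuis(prescribed_ndc).isdisjoint(scd_rxcuis(dispensed_ndc))
```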
It is most relevant for the care provider and insurer as well as for the pharmaceutical and medical technology sectors. Although the broad economic impact of digital health solutions in general has been assessed many times in literature and the benefit for patients and society has also been analyzed, the specific economic impact of AI in health care has been addressed only sporadically. Objective: This study aimed to systematically review and summarize the cost-effectiveness studies dedicated to AI in health care and to assess whether they meet the established quality criteria. Methods: In a first step, the quality criteria for economic impact studies were defined based on the established and adapted criteria schemes for cost impact assessments. In a second step, a systematic literature review based on qualitative and quantitative inclusion and exclusion criteria was conducted to identify relevant publications for an in-depth analysis of the economic impact assessment. In a final step, the quality of the identified economic impact studies was evaluated based on the defined quality criteria for cost-effectiveness studies. Results: Very few publications have thoroughly addressed the economic impact assessment, and the economic assessment quality of the reviewed publications on AI shows severe methodological deficits. Only 6 out of 66 publications could be included in the second step of the analysis based on the inclusion criteria. Out of these 6 studies, none comprised a methodologically complete cost impact analysis. There are two areas for improvement in future studies. First, the initial investment and operational costs for the AI infrastructure and service need to be included. Second, alternatives to achieve similar impact must be evaluated to provide a comprehensive comparison. Conclusions: This systematic literature analysis proved that the existing impact assessments show methodological deficits and that upcoming evaluations require more comprehensive economic analyses to enable economic decisions for or against implementing AI technology in health care. ", doi="10.2196/16866", url="http://www.jmir.org/2020/2/e16866/", url="http://www.ncbi.nlm.nih.gov/pubmed/32130134" } @Article{info:doi/10.2196/14122, author="Tian, Yulong and Liu, Xiaodong and Wang, Zixuan and Cao, Shougen and Liu, Zimin and Ji, Qinglian and Li, Zequn and Sun, Yuqi and Zhou, Xin and Wang, Daosheng and Zhou, Yanbing", title="Concordance Between Watson for Oncology and a Multidisciplinary Clinical Decision-Making Team for Gastric Cancer and the Prognostic Implications: Retrospective Study", journal="J Med Internet Res", year="2020", month="Feb", day="20", volume="22", number="2", pages="e14122", keywords="Watson for Oncology", keywords="artificial intelligence", keywords="gastric cancer", keywords="concordance", keywords="multidisciplinary team", abstract="Background: With the increasing number of cancer treatments, the emergence of multidisciplinary teams (MDTs) provides patients with personalized treatment options. In recent years, artificial intelligence (AI) has developed rapidly in the medical field. There has been a gradual tendency to replace traditional diagnosis and treatment with AI. IBM Watson for Oncology (WFO) has been proven to be useful for decision-making in breast cancer and lung cancer, but to date, research on gastric cancer is limited. Objective: This study compared the concordance of WFO with MDT and investigated the impact on patient prognosis. 
Methods: This study retrospectively analyzed eligible patients (N=235) with gastric cancer who were evaluated by an MDT, received corresponding recommended treatment, and underwent follow-up. Thereafter, physicians inputted the information of all patients into WFO manually, and the results were compared with the treatment programs recommended by the MDT. If the MDT treatment program was classified as ``recommended'' or ``considered'' by WFO, we considered the results concordant. All patients were divided into a concordant group and a nonconcordant group according to whether the WFO and MDT treatment programs were concordant. The prognoses of the two groups were analyzed. Results: The overall concordance of WFO and the MDT was 54.5\% (128/235) in this study. The subgroup analysis found that concordance was less likely in patients with human epidermal growth factor receptor 2 (HER2)-positive tumors than in patients with HER2-negative tumors (P=.02). Age, Eastern Cooperative Oncology Group performance status, differentiation type, and clinical stage were not found to affect concordance. Among all patients, the survival time was significantly better in concordant patients than in nonconcordant patients (P<.001). Multivariate analysis revealed that concordance was an independent prognostic factor of overall survival in patients with gastric cancer (hazard ratio 0.312 [95\% CI 0.187-0.521]). Conclusions: The treatment recommendations made by WFO and the MDT were mostly concordant in gastric cancer patients. If the WFO options are updated to include local treatment programs, the concordance will greatly improve. The HER2 status of patients with gastric cancer had a strong effect on the likelihood of concordance. Generally, survival was better in concordant patients than in nonconcordant patients. ", doi="10.2196/14122", url="http://www.jmir.org/2020/2/e14122/", url="http://www.ncbi.nlm.nih.gov/pubmed/32130123" } @Article{info:doi/10.2196/15510, author="Song, Xing and Waitman, R. Lemuel and Yu, SL Alan and Robbins, C. David and Hu, Yong and Liu, Mei", title="Longitudinal Risk Prediction of Chronic Kidney Disease in Diabetic Patients Using a Temporal-Enhanced Gradient Boosting Machine: Retrospective Cohort Study", journal="JMIR Med Inform", year="2020", month="Jan", day="31", volume="8", number="1", pages="e15510", keywords="diabetic kidney disease", keywords="diabetic nephropathy", keywords="chronic kidney disease", keywords="machine learning", abstract="Background: Artificial intelligence--enabled electronic health record (EHR) analysis can revolutionize medical practice from the diagnosis and prediction of complex diseases to making recommendations in patient care, especially for chronic conditions such as chronic kidney disease (CKD), which is one of the most frequent complications in patients with diabetes and is associated with substantial morbidity and mortality. Objective: The longitudinal prediction of health outcomes requires effective representation of temporal data in the EHR. In this study, we proposed a novel temporal-enhanced gradient boosting machine (GBM) model that dynamically updates and ensembles learners based on new events in patient timelines to improve the prediction accuracy of CKD among patients with diabetes. 
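A landmark-style rolling prediction in the spirit of the CKD objective above can be sketched with an off-the-shelf gradient boosting classifier refit at each landmark year; this is not the authors' Landmark-Boosting algorithm, and the data layout and column names are assumed.

```python
# Loose sketch: one gradient boosting model per landmark year, each trained only
# on data observed up to that year and evaluated on the 1-year CKD outcome.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

visits = pd.read_csv("t2dm_landmarks.csv")    # hypothetical: one row per patient per landmark
for year in (2, 3, 4):                        # years since diabetes onset
    snap = visits[visits["landmark_year"] == year]
    X = snap.drop(columns=["patient_id", "landmark_year", "ckd_within_1y"])
    y = snap["ckd_within_1y"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    gbm = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
    print(year, roc_auc_score(y_te, gbm.predict_proba(X_te)[:, 1]))
```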
Methods: Using a broad spectrum of deidentified EHR data on a retrospective cohort of 14,039 adult patients with type 2 diabetes and GBM as the base learner, we validated our proposed Landmark-Boosting model against three state-of-the-art temporal models for rolling predictions of 1-year CKD risk. Results: The proposed model uniformly outperformed other models, achieving an area under receiver operating curve of 0.83 (95\% CI 0.76-0.85), 0.78 (95\% CI 0.75-0.82), and 0.82 (95\% CI 0.78-0.86) in predicting CKD risk with automatic accumulation of new data in later years (years 2, 3, and 4 since diabetes mellitus onset, respectively). The Landmark-Boosting model also maintained the best calibration across moderate- and high-risk groups and over time. The experimental results demonstrated that the proposed temporal model can not only accurately predict 1-year CKD risk but also improve performance over time with additionally accumulated data, which is essential for clinical use to improve renal management of patients with diabetes. Conclusions: Incorporation of temporal information in EHR data can significantly improve predictive model performance and will particularly benefit patients who follow-up with their physicians as recommended. ", doi="10.2196/15510", url="http://medinform.jmir.org/2020/1/e15510/", url="http://www.ncbi.nlm.nih.gov/pubmed/32012067" } @Article{info:doi/10.2196/16080, author="Luo, Gang and He, Shan and Stone, L. Bryan and Nkoy, L. Flory and Johnson, D. Michael", title="Developing a Model to Predict Hospital Encounters for Asthma in Asthmatic Patients: Secondary Analysis", journal="JMIR Med Inform", year="2020", month="Jan", day="21", volume="8", number="1", pages="e16080", abstract="Background: As a major chronic disease, asthma causes many emergency department (ED) visits and hospitalizations each year. Predictive modeling is a key technology to prospectively identify high-risk asthmatic patients and enroll them in care management for preventive care to reduce future hospital encounters, including inpatient stays and ED visits. However, existing models for predicting hospital encounters in asthmatic patients are inaccurate. Usually, they miss over half of the patients who will incur future hospital encounters and incorrectly classify many others who will not. This makes it difficult to match the limited resources of care management to the patients who will incur future hospital encounters, increasing health care costs and degrading patient outcomes. Objective: The goal of this study was to develop a more accurate model for predicting hospital encounters in asthmatic patients. Methods: Secondary analysis of 334,564 data instances from Intermountain Healthcare from 2005 to 2018 was conducted to build a machine learning classification model to predict the hospital encounters for asthma in the following year in asthmatic patients. The patient cohort included all asthmatic patients who resided in Utah or Idaho and visited Intermountain Healthcare facilities during 2005 to 2018. A total of 235 candidate features were considered for model building. Results: The model achieved an area under the receiver operating characteristic curve of 0.859 (95\% CI 0.846-0.871). 
When the cutoff threshold for conducting binary classification was set at the top 10.00\% (1926/19,256) of asthmatic patients with the highest predicted risk, the model reached an accuracy of 90.31\% (17,391/19,256; 95\% CI 89.86-90.70), a sensitivity of 53.7\% (436/812; 95\% CI 50.12-57.18), and a specificity of 91.93\% (16,955/18,444; 95\% CI 91.54-92.31). To steer future research on this topic, we pinpointed several potential improvements to our model. Conclusions: Our model improves the state of the art for predicting hospital encounters for asthma in asthmatic patients. After further refinement, the model could be integrated into a decision support tool to guide asthma care management allocation. International Registered Report Identifier (IRRID): RR2-10.2196/resprot.5039 ", doi="10.2196/16080", url="http://medinform.jmir.org/2020/1/e16080/", url="http://www.ncbi.nlm.nih.gov/pubmed/31961332" } @Article{info:doi/10.2196/16912, author="Tao, Liyuan and Zhang, Chen and Zeng, Lin and Zhu, Shengrong and Li, Nan and Li, Wei and Zhang, Hua and Zhao, Yiming and Zhan, Siyan and Ji, Hong", title="Accuracy and Effects of Clinical Decision Support Systems Integrated With BMJ Best Practice--Aided Diagnosis: Interrupted Time Series Study", journal="JMIR Med Inform", year="2020", month="Jan", day="20", volume="8", number="1", pages="e16912", keywords="BMJ Best Practice", keywords="artificial intelligence", keywords="clinical decision support systems", keywords="aided diagnosis", keywords="accuracy and effect", abstract="Background: Clinical decision support systems (CDSS) are an integral component of health information technologies and can assist disease interpretation, diagnosis, treatment, and prognosis. However, the utility of CDSS in the clinic remains controversial. Objective: The aim is to assess the effects of CDSS integrated with British Medical Journal (BMJ) Best Practice--aided diagnosis in real-world research. Methods: This was a retrospective, longitudinal observational study using routinely collected clinical diagnosis data from electronic medical records. A total of 34,113 hospitalized patient records were successively selected from December 2016 to February 2019 in six clinical departments. The diagnostic accuracy of the CDSS was verified before its implementation. A self-controlled comparison was then applied to detect the effects of CDSS implementation. Multivariable logistic regression and single-group interrupted time series analysis were used to explore the effects of CDSS. The sensitivity analysis was conducted using the subgroup data from January 2018 to February 2019. Results: The total accuracy rates of the recommended diagnosis from CDSS were 75.46\% in the first-rank diagnosis, 83.94\% in the top-2 diagnosis, and 87.53\% in the top-3 diagnosis in the data before CDSS implementation. Higher consistency was observed between admission and discharge diagnoses, shorter confirmed diagnosis times, and shorter hospitalization days after the CDSS implementation (all P<.001). Multivariable logistic regression analysis showed that the consistency rates after CDSS implementation (OR 1.078, 95\% CI 1.015-1.144) and the proportion of hospitalization time 7 days or less (OR 1.688, 95\% CI 1.592-1.789) both increased. The interrupted time series analysis showed that the consistency rates significantly increased by 6.722\% (95\% CI 2.433\%-11.012\%, P=.002) after CDSS implementation. 
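The single-group interrupted time series reported above corresponds to a segmented regression with level-change and slope-change terms; a minimal statsmodels sketch, with illustrative variable names rather than the study's data, is shown below.

```python
# Sketch of a single-group interrupted time series (segmented regression) for a
# monthly diagnosis consistency rate before and after CDSS implementation.
import pandas as pd
import statsmodels.formula.api as smf

ts = pd.read_csv("monthly_consistency.csv")   # columns: month_index, post_cdss, consistency_rate
first_post = ts["month_index"][ts["post_cdss"] == 1].min()
ts["months_since_cdss"] = (ts["month_index"] - first_post).clip(lower=0)

model = smf.ols("consistency_rate ~ month_index + post_cdss + months_since_cdss",
                data=ts).fit()
print(model.summary())   # post_cdss = immediate level change; months_since_cdss = slope change
```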
The proportion of hospitalization time 7 days or less significantly increased by 7.837\% (95\% CI 1.798\%-13.876\%, P=.01). Similar results were obtained in the subgroup analysis. Conclusions: The CDSS integrated with BMJ Best Practice improved the accuracy of clinicians' diagnoses. Shorter confirmed diagnosis times and hospitalization days were also found to be associated with CDSS implementation in retrospective real-world studies. These findings highlight the utility of artificial intelligence-based CDSS to improve diagnosis efficiency, but these results require confirmation in future randomized controlled trials. ", doi="10.2196/16912", url="http://medinform.jmir.org/2020/1/e16912/", url="http://www.ncbi.nlm.nih.gov/pubmed/31958069" } @Article{info:doi/10.2196/15415, author="Lenaerts, Gerlinde and Bekkering, E. Geertruida and Goossens, Martine and De Coninck, Leen and Delvaux, Nicolas and Cordyn, Sam and Adriaenssens, Jef and Vankrunkelsven, Patrick", title="Tools to Assess the Trustworthiness of Evidence-Based Point-of-Care Information for Health Care Professionals: Systematic Review", journal="J Med Internet Res", year="2020", month="Jan", day="17", volume="22", number="1", pages="e15415", keywords="evidence-based medicine", keywords="evidence-based practice", keywords="point-of-care systems", keywords="health care quality", keywords="internet information", keywords="information science", keywords="systematic review", abstract="Background: User-friendly information at the point of care should be well structured, rapidly accessible, and comprehensive. Also, this information should be trustworthy, as it will be used by health care practitioners to practice evidence-based medicine. Therefore, a standard, validated tool to evaluate the trustworthiness of such point-of-care information resources is needed. Objective: This systematic review sought to search for tools to assess the trustworthiness of point-of-care resources and to describe and analyze the content of these tools. Methods: A systematic search was performed on three sources: (1) we searched online for initiatives that worked on the trustworthiness of medical information; (2) we searched Medline (PubMed) until June 2019 for relevant literature; and (3) we scanned reference lists and lists of citing papers via Web of Science for each retrieved paper. We included all studies, reports, websites, or methodologies that reported on tools that assessed the trustworthiness of medical information for professionals. From the selected studies, we extracted information on the general characteristics of the tools. As no standard risk-of-bias assessment instruments are available for these types of studies, we described how each tool was developed, including any assessments on reliability and validity. We analyzed the criteria used in the different tools and divided them into five categories: (1) author-related information; (2) evidence-based methodology; (3) website quality; (4) website design and usability; and (5) website interactivity. The percentage of tools in compliance with these categories and the different criteria was calculated. Results: Included in this review was a total of 17 tools, all published between 1997 and 2018. The tools were developed for different purposes, from a general quality assessment of medical information to very detailed analyses, all specifically for point-of-care resources. However, the development process of the tools was poorly described. 
Overall, seven tools had a scoring system implemented, two were assessed for reliability only, and two other tools were assessed for both validity and reliability. The content analysis showed that all the tools assessed criteria related to an evidence-based methodology: 82\% of the tools assessed author-related information, 71\% assessed criteria related to website quality, 71\% assessed criteria related to website design and usability, and 47\% of the tools assessed criteria related to website interactivity. There was significant variability in criteria used, as some were very detailed while others were more broadly defined. Conclusions: The 17 included tools encompass a variety of items important for the assessment of the trustworthiness of point-of-care information. Overall, two tools were assessed for both reliability and validity, but they lacked some essential criteria for the assessment of the trustworthiness of medical information for use at the point-of-care. Currently, a standard, validated tool does not exist. The results of this review may contribute to the development of such an instrument, which may enhance the quality of point-of-care information in the long term. Trial Registration: PROSPERO CRD42019122565; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=122565 ", doi="10.2196/15415", url="https://www.jmir.org/2020/1/e15415", url="http://www.ncbi.nlm.nih.gov/pubmed/31951213" } @Article{info:doi/10.2196/15166, author="Joshi, Meera and Ashrafian, Hutan and Arora, Sonal and Khan, Sadia and Cooke, Graham and Darzi, Ara", title="Digital Alerting and Outcomes in Patients With Sepsis: Systematic Review and Meta-Analysis", journal="J Med Internet Res", year="2019", month="Dec", day="20", volume="21", number="12", pages="e15166", keywords="diagnosis", keywords="electronic health records, sepsis", keywords="medical order entry systems, outcome assessment (health care)", abstract="Background: The diagnosis and management of sepsis remain a global health care challenge. Digital technologies have the potential to improve sepsis care. Objective: The aim of this paper was to systematically review the evidence on the impact of digital alerting systems on sepsis related outcomes. Methods: The following databases were searched for studies published from April 1964 to February 12, 2019, with no language restriction: EMBASE, MEDLINE, HMIC, PsycINFO, and Cochrane. All full-text reports of studies identified as potentially eligible after title and abstract reviews were obtained for further review. The search was limited to adult inpatients. Relevant articles were hand searched for other studies. Only studies with clear pre- and postalerting phases were included. Primary outcomes were hospital length of stay (LOS) and intensive care LOS, whereas secondary outcomes were time to antibiotics and mortality. Studies based solely on intensive care, case reports, narrative reviews, editorials, and commentaries were excluded. All other trial designs were included. A qualitative assessment and meta-analysis were performed. Results: This review identified 72 full-text articles. From these, 16 studies met the inclusion criteria and were included in the final analysis. Of these, 8 studies reviewed hospital LOS, 12 reviewed mortality outcomes, 5 studies explored time to antibiotics, and 5 studies investigated intensive care unit (ICU) LOS. Both quantitative and qualitative assessments of the studies were performed. 
There was evidence of a significant benefit of digital alerting in hospital LOS, which reduced by 1.31 days (P=.014), and ICU LOS, which reduced by 0.766 days (P=.007). There was no significant association between digital alerts and mortality (mean decrease 11.4\%; P=.77) or time to antibiotics (mean decrease 126 min; P=.13). Conclusions: This review highlights that digital alerts can considerably reduce hospital and ICU stay for patients with sepsis. Further studies including randomized controlled trials are necessary to confirm these findings and identify the choice of alerting system according to the patient status and pathological cohort. ", doi="10.2196/15166", url="http://www.jmir.org/2019/12/e15166/", url="http://www.ncbi.nlm.nih.gov/pubmed/31859672" } @Article{info:doi/10.2196/16291, author="Dallora, Luiza Ana and Berglund, Sanmartin Johan and Brogren, Martin and Kvist, Ola and Diaz Ruiz, Sandra and D{\"u}bbel, Andr{\'e} and Anderberg, Peter", title="Age Assessment of Youth and Young Adults Using Magnetic Resonance Imaging of the Knee: A Deep Learning Approach", journal="JMIR Med Inform", year="2019", month="Dec", day="5", volume="7", number="4", pages="e16291", keywords="age assessment", keywords="bone age", keywords="skeletal maturity", keywords="deep learning", keywords="convolutional neural networks", keywords="transfer learning", keywords="machine learning", keywords="magnetic resonance imaging", keywords="medical imaging", keywords="knee", abstract="Background: Bone age assessment (BAA) is an important tool for diagnosis and in determining the time of treatment in a number of pediatric clinical scenarios, as well as in legal settings where it is used to estimate the chronological age of an individual where valid documents are lacking. Traditional methods for BAA suffer from drawbacks, such as exposing juveniles to radiation, intra- and interrater variability, and the time spent on the assessment. The employment of automated methods such as deep learning and the use of magnetic resonance imaging (MRI) can address these drawbacks and improve the assessment of age. Objective: The aim of this paper is to propose an automated approach for age assessment of youth and young adults in the age range when the length growth ceases and growth zones are closed (14-21 years of age) by employing deep learning using MRI of the knee. Methods: This study carried out MRI examinations of the knee of 402 volunteer subjects---221 males (55.0\%) and 181 (45.0\%) females---aged 14-21 years. The method comprised two convolutional neural network (CNN) models: the first one selected the most informative images of an MRI sequence, concerning age-assessment purposes; these were then used in the second module, which was responsible for the age estimation. Different CNN architectures were tested, both training from scratch and employing transfer learning. Results: The CNN architecture that provided the best results was GoogLeNet pretrained on the ImageNet database. The proposed method was able to assess the age of male subjects in the range of 14-20.5 years, with a mean absolute error (MAE) of 0.793 years, and of female subjects in the range of 14-19.5 years, with an MAE of 0.988 years. Regarding the classification of minors---with the threshold of 18 years of age---an accuracy of 98.1\% for male subjects and 95.0\% for female subjects was achieved. 
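Transfer learning with a pretrained GoogLeNet, as used in the second module of the age-assessment pipeline above, can be set up in a few lines; the regression head, input shape, and weight identifier below are assumptions for recent torchvision versions, and the MRI preprocessing and training loop are omitted.

```python
# Minimal transfer-learning sketch: an ImageNet-pretrained GoogLeNet backbone
# with its classifier replaced by a single regression output for age in years.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.googlenet(weights="IMAGENET1K_V1")    # recent torchvision API
backbone.fc = nn.Linear(backbone.fc.in_features, 1)     # regress age instead of 1000 classes

images = torch.randn(2, 3, 224, 224)                    # placeholder MRI slices as 3-channel images
backbone.eval()                                          # auxiliary heads are only active in train mode
with torch.no_grad():
    print(backbone(images).squeeze(-1))                  # predicted ages (untrained, so arbitrary)
```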
Conclusions: The proposed method was able to assess the age of youth and young adults from 14 to 20.5 years of age for male subjects and 14 to 19.5 years of age for female subjects in a fully automated manner, without the use of ionizing radiation, addressing the drawbacks of traditional methods. ", doi="10.2196/16291", url="http://medinform.jmir.org/2019/4/e16291/", url="http://www.ncbi.nlm.nih.gov/pubmed/31804183" } @Article{info:doi/10.2196/16047, author="Roosan, Don and Law, V. Anandi and Karim, Mazharul and Roosan, Moom", title="Improving Team-Based Decision Making Using Data Analytics and Informatics: Protocol for a Collaborative Decision Support Design", journal="JMIR Res Protoc", year="2019", month="Nov", day="27", volume="8", number="11", pages="e16047", keywords="informatics", keywords="health care team", keywords="data science", keywords="decision support techniques", keywords="decision-making, computer-assisted", keywords="data display", keywords="diagnosis, computer-assisted", abstract="Background: According to the September 2015 Institute of Medicine report, Improving Diagnosis in Health Care, each of us is likely to experience one diagnostic error in our lifetime, often with devastating consequences. Traditionally, diagnostic decision making has been the sole responsibility of an individual clinician. However, diagnosis involves an interaction among interprofessional team members with different training, skills, cultures, knowledge, and backgrounds. Moreover, diagnostic error is prevalent in interruption-prone environments, such as the emergency department, where the loss of information may hinder a correct diagnosis. Objective: The overall purpose of this protocol is to improve team-based diagnostic decision making by focusing on data analytics and informatics tools that improve collective information management. Methods: To achieve this goal, we will identify the factors contributing to failures in team-based diagnostic decision making (aim 1), understand the barriers to using current health information technology tools for team collaboration (aim 2), and develop and evaluate a collaborative decision-making prototype that can improve team-based diagnostic decision making (aim 3). Results: Between 2019 and 2020, we are collecting data for this study. The results are anticipated to be published between 2020 and 2021. Conclusions: The results from this study can shed light on improving diagnostic decision making by incorporating the diagnostic rationale of team members. We believe that incorporating all team members and using informatics is a positive way forward in reducing diagnostic errors. 
International Registered Report Identifier (IRRID): DERR1-10.2196/16047 ", doi="10.2196/16047", url="http://www.researchprotocols.org/2019/11/e16047/", url="http://www.ncbi.nlm.nih.gov/pubmed/31774412" } @Article{info:doi/10.2196/16295, author="Mesk{\'o}, Bertalan", title="The Real Era of the Art of Medicine Begins with Artificial Intelligence", journal="J Med Internet Res", year="2019", month="Nov", day="18", volume="21", number="11", pages="e16295", keywords="future", keywords="artificial intelligence", keywords="digital health", keywords="technology", keywords="art of medicine", doi="10.2196/16295", url="http://www.jmir.org/2019/11/e16295/", url="http://www.ncbi.nlm.nih.gov/pubmed/31738169" } @Article{info:doi/10.2196/15385, author="Monteiro, Lu{\'i}s and Maricoto, Tiago and Solha, Isabel and Ribeiro-Vaz, In{\^e}s and Martins, Carlos and Monteiro-Soares, Matilde", title="Reducing Potentially Inappropriate Prescriptions for Older Patients Using Computerized Decision Support Tools: Systematic Review", journal="J Med Internet Res", year="2019", month="Nov", day="14", volume="21", number="11", pages="e15385", keywords="deprescriptions", keywords="medical informatics applications", keywords="potentially inappropriate prescription", keywords="potentially inappropriate medication", keywords="computerized decision support", abstract="Background: Older adults are more vulnerable to polypharmacy and prescriptions of potentially inappropriate medications. There are several ways to address polypharmacy to prevent its occurrence. We focused on computerized decision support tools. Objective: The available literature was reviewed to understand whether computerized decision support tools reduce potentially inappropriate prescriptions or potentially inappropriate medications in older adult patients and affect health outcomes. Methods: Our systematic review was conducted by searching the literature in the MEDLINE, CENTRAL, EMBASE, and Web of Science databases for interventional studies published through February 2018 to assess the impact of computerized decision support tools on potentially inappropriate medications and potentially inappropriate prescriptions in people aged 65 years and older. Results: A total of 3756 articles were identified, and 16 were included. More than half (n=10) of the studies were randomized controlled trials, one was a crossover study, and five were pre-post intervention studies. A total of 266,562 participants were included; of those, 233,144 participants were included and assessed in randomized controlled trials. Intervention designs had several different features. Computerized decision support tools consistently reduced the number of potentially inappropriate prescriptions started and mean number of potentially inappropriate prescriptions per patient. Computerized decision support tools also increased potentially inappropriate prescriptions discontinuation and drug appropriateness. However, in several studies, statistical significance was not achieved. A meta-analysis was not possible due to the significant heterogeneity among the systems used and the definitions of outcomes. Conclusions: Computerized decision support tools may reduce potentially inappropriate prescriptions and potentially inappropriate medications. 
More randomized controlled trials assessing the impact of computerized decision support tools that could be used both in primary and secondary health care are needed to evaluate the use of medication targets defined by the Beers or STOPP (Screening Tool of Older People's Prescriptions) criteria, adverse drug reactions, quality of life measurements, patient satisfaction, and professional satisfaction with a reasonable follow-up, which could clarify the clinical usefulness of these tools. Trial Registration: International Prospective Register of Systematic Reviews (PROSPERO) CRD42017067021; https://www.crd.york.ac.uk/prospero/display\_record.php?ID=CRD42017067021 ", doi="10.2196/15385", url="https://www.jmir.org/2019/11/e15385", url="http://www.ncbi.nlm.nih.gov/pubmed/31724956" } @Article{info:doi/10.2196/14044, author="Schwitzguebel, Jean-Pierre Adrien and Jeckelmann, Clarisse and Gavinio, Roberto and Levallois, C{\'e}cile and Bena{\"i}m, Charles and Spechbach, Herv{\'e}", title="Differential Diagnosis Assessment in Ambulatory Care With an Automated Medical History--Taking Device: Pilot Randomized Controlled Trial", journal="JMIR Med Inform", year="2019", month="Nov", day="4", volume="7", number="4", pages="e14044", keywords="differential diagnosis", keywords="decision making", keywords="computer-assisted", keywords="hospital outpatient clinics", keywords="general practitioners", keywords="clinical applications software", keywords="patient engagement", abstract="Background: Automated medical history--taking devices (AMHTDs) are emerging tools with the potential to increase the quality of medical consultations by providing physicians with an exhaustive, high-quality, standardized anamnesis and differential diagnosis. Objective: This study aimed to assess the effectiveness of an AMHTD to obtain an accurate differential diagnosis in an outpatient service. Methods: We conducted a pilot randomized controlled trial involving 59 patients presenting to an emergency outpatient unit and suffering from various conditions affecting the limbs, the back, and the chest wall. Resident physicians were randomized into 2 groups, one assisted by the AMHTD and one without access to the device. For each patient, physicians were asked to establish an exhaustive differential diagnosis based on the anamnesis and clinical examination. In the intervention group, residents read the AMHTD report before performing the anamnesis. In both the groups, a senior physician had to establish a differential diagnosis, considered as the gold standard, independent of the resident's opinion and AMHTD report. Results: A total of 29 patients were included in the intervention group and 30 in the control group. Differential diagnosis accuracy was higher in the intervention group (mean 75\%, SD 26\%) than in the control group (mean 59\%, SD 31\%; P=.01). Subgroup analysis showed a between-group difference of 3\% (83\% [17/21]-80\% [14/17]) for low complexity cases (1-2 differential diagnoses possible) in favor of the AMHTD (P=.76), 31\% (87\% [13/15]-56\% [18/33]) for intermediate complexity (3 differential diagnoses; P=.02), and 24\% (63\% [34/54]-39\% [14/35]) for high complexity (4-5 differential diagnoses; P=.08). Physicians in the intervention group (mean 4.3, SD 2) had more years of clinical practice compared with the control group (mean 5.5, SD 2; P=.03). Differential diagnosis accuracy was negatively correlated to case complexity (r=0.41; P=.001) and the residents' years of practice (r=0.04; P=.72). 
The AMHTD was able to determine 73\% (SD 30\%) of correct differential diagnoses. Patient satisfaction was good (4.3/5), and 26 of 29 patients (90\%) considered that they were able to accurately describe their symptomatology. In 8 of 29 cases (28\%), residents considered that the AMHTD helped to establish the differential diagnosis. Conclusions: The AMHTD allowed physicians to make more accurate differential diagnoses, particularly in complex cases. This could be explained not only by the ability of the AMHTD to make the right diagnoses, but also by the exhaustive anamnesis provided. ", doi="10.2196/14044", url="http://medinform.jmir.org/2019/4/e14044/", url="http://www.ncbi.nlm.nih.gov/pubmed/31682590" } @Article{info:doi/10.2196/15511, author="Tran, Xuan Bach and Nghiem, Son and Sahin, Oz and Vu, Manh Tuan and Ha, Hai Giang and Vu, Thu Giang and Pham, Quang Hai and Do, Thi Hoa and Latkin, A. Carl and Tam, Wilson and Ho, H. Cyrus S. and Ho, M. Roger C.", title="Modeling Research Topics for Artificial Intelligence Applications in Medicine: Latent Dirichlet Allocation Application Study", journal="J Med Internet Res", year="2019", month="Nov", day="1", volume="21", number="11", pages="e15511", keywords="artificial intelligence", keywords="applications", keywords="medicine", keywords="scientometric", keywords="bibliometric", keywords="latent Dirichlet allocation", abstract="Background: Artificial intelligence (AI)--based technologies develop rapidly and have myriad applications in medicine and health care. However, there is a lack of comprehensive reporting on the productivity, workflow, topics, and research landscape of AI in this field. Objective: This study aimed to evaluate the global development of scientific publications and constructed interdisciplinary research topics on the theory and practice of AI in medicine from 1977 to 2018. Methods: We obtained bibliographic data and abstract contents of publications published between 1977 and 2018 from the Web of Science database. A total of 27,451 eligible articles were analyzed. Research topics were classified by latent Dirichlet allocation, and principal component analysis was used to identify the construct of the research landscape. Results: The applications of AI have mainly impacted clinical settings (enhanced prognosis and diagnosis, robot-assisted surgery, and rehabilitation), data science and precision medicine (collecting individual data for precision medicine), and policy making (raising ethical and legal issues, especially regarding privacy and confidentiality of data). However, AI applications have not been commonly used in resource-poor settings due to the limit in infrastructure and human resources. Conclusions: The application of AI in medicine has grown rapidly and focuses on three leading platforms: clinical practices, clinical material, and policies. AI might be one of the methods to narrow down the inequality in health care and medicine between developing and developed countries. Technology transfer and support from developed countries are essential measures for the advancement of AI application in health care in developing countries. ", doi="10.2196/15511", url="https://www.jmir.org/2019/11/e15511", url="http://www.ncbi.nlm.nih.gov/pubmed/31682577" } @Article{info:doi/10.2196/15794, author="Chartash, David and Paek, Hyung and Dziura, D. James and Ross, K. Bill and Nogee, P. Daniel and Boccio, Eric and Hines, Cory and Schott, M. Aaron and Jeffery, M. Molly and Patel, D. 
Mehul and Platts-Mills, F. Timothy and Ahmed, Osama and Brandt, Cynthia and Couturier, Katherine and Melnick, Edward", title="Identifying Opioid Use Disorder in the Emergency Department: Multi-System Electronic Health Record--Based Computable Phenotype Derivation and Validation Study", journal="JMIR Med Inform", year="2019", month="Oct", day="31", volume="7", number="4", pages="e15794", keywords="electronic health records", keywords="emergency medicine", keywords="algorithms", keywords="phenotype", keywords="opioid-related disorders", abstract="Background: Deploying accurate computable phenotypes in pragmatic trials requires a trade-off between precise and clinically sensical variable selection. In particular, evaluating the medical encounter to assess a pattern leading to clinically significant impairment or distress indicative of disease is a difficult modeling challenge for the emergency department. Objective: This study aimed to derive and validate an electronic health record--based computable phenotype to identify emergency department patients with opioid use disorder using physician chart review as a reference standard. Methods: A two-algorithm computable phenotype was developed and evaluated using structured clinical data across 13 emergency departments in two large health care systems. Algorithm 1 combined clinician and billing codes. Algorithm 2 used chief complaint structured data suggestive of opioid use disorder. To evaluate the algorithms in both internal and external validation phases, two emergency medicine physicians, with a third acting as adjudicator, reviewed a pragmatic sample of 231 charts: 125 internal validation (75 positive and 50 negative), 106 external validation (56 positive and 50 negative). Results: Cohen kappa, measuring agreement between reviewers, for the internal and external validation cohorts was 0.95 and 0.93, respectively. In the internal validation phase, Algorithm 1 had a positive predictive value (PPV) of 0.96 (95\% CI 0.863-0.995) and a negative predictive value (NPV) of 0.98 (95\% CI 0.893-0.999), and Algorithm 2 had a PPV of 0.8 (95\% CI 0.593-0.932) and an NPV of 1.0 (one-sided 97.5\% CI 0.863-1). In the external validation phase, the phenotype had a PPV of 0.95 (95\% CI 0.851-0.989) and an NPV of 0.92 (95\% CI 0.807-0.978). Conclusions: This phenotype detected emergency department patients with opioid use disorder with high predictive values and reliability. Its algorithms were transportable across health care systems and have potential value for both clinical and research purposes. ", doi="10.2196/15794", url="http://medinform.jmir.org/2019/4/e15794/", url="http://www.ncbi.nlm.nih.gov/pubmed/31674913" } @Article{info:doi/10.2196/15358, author="Kaufmann, Thomas and Castela Forte, Jos{\'e} and Hiemstra, Bart and Wiering, A. Marco and Grzegorczyk, Marco and Epema, H. Anne and van der Horst, C. Iwan C. 
and ", title="A Bayesian Network Analysis of the Diagnostic Process and its Accuracy to Determine How Clinicians Estimate Cardiac Function in Critically Ill Patients: Prospective Observational Cohort Study", journal="JMIR Med Inform", year="2019", month="Oct", day="30", volume="7", number="4", pages="e15358", keywords="cardiac function", keywords="physical examination", keywords="Bayesian network", keywords="critical care", keywords="ICU", keywords="medical education", keywords="educated guess", keywords="cognition", keywords="clinical decision-support", keywords="cardiology", abstract="Background: Hemodynamic assessment of critically ill patients is a challenging endeavor, and advanced monitoring techniques are often required to guide treatment choices. Given the technical complexity and occasional unavailability of these techniques, estimation of cardiac function based on clinical examination is valuable for critical care physicians to diagnose circulatory shock. Yet, the lack of knowledge on how to best conduct and teach the clinical examination to estimate cardiac function has reduced its accuracy to almost that of ``flipping a coin.'' Objective: The aim of this study was to investigate the decision-making process underlying estimates of cardiac function of patients acutely admitted to the intensive care unit (ICU) based on current standardized clinical examination using Bayesian methods. Methods: Patient data were collected as part of the Simple Intensive Care Studies-I (SICS-I) prospective cohort study. All adult patients consecutively admitted to the ICU with an expected stay longer than 24 hours were included, for whom clinical examination was conducted and cardiac function was estimated. Using these data, first, the probabilistic dependencies between the examiners' estimates and the set of clinically measured variables upon which these rely were analyzed using a Bayesian network. Second, the accuracy of cardiac function estimates was assessed by comparison to the cardiac index values measured by critical care ultrasonography. Results: A total of 1075 patients were included, of which 783 patients had validated cardiac index measurements. A Bayesian network analysis identified two clinical variables upon which cardiac function estimate is conditionally dependent, namely, noradrenaline administration and presence of delayed capillary refill time or mottling. When the patient received noradrenaline, the probability of cardiac function being estimated as reasonable or good P(ER,G) was lower, irrespective of whether the patient was mechanically ventilated (P[ER,G|ventilation, noradrenaline]=0.63, P[ER,G|ventilation, no noradrenaline]=0.91, P[ER,G|no ventilation, noradrenaline]=0.67, P[ER,G|no ventilation, no noradrenaline]=0.93). The same trend was found for capillary refill time or mottling. Sensitivity of estimating a low cardiac index was 26\% and 39\% and specificity was 83\% and 74\% for students and physicians, respectively. Positive and negative likelihood ratios were 1.53 (95\% CI 1.19-1.97) and 0.87 (95\% CI 0.80-0.95), respectively, overall. Conclusions: The conditional dependencies between clinical variables and the cardiac function estimates resulted in a network consistent with known physiological relations. Conditional probability queries allow for multiple clinical scenarios to be recreated, which provide insight into the possible thought process underlying the examiners' cardiac function estimates. 
This information can help develop interactive digital training tools for students and physicians and contribute toward the goal of further improving the diagnostic accuracy of clinical examination in ICU patients. Trial Registration: ClinicalTrials.gov NCT02912624; https://clinicaltrials.gov/ct2/show/NCT02912624 ", doi="10.2196/15358", url="http://medinform.jmir.org/2019/4/e15358/", url="http://www.ncbi.nlm.nih.gov/pubmed/31670697" } @Article{info:doi/10.2196/13085, author="Zhou, Mi and Chen, Chuan and Peng, Junfeng and Luo, Ching-Hsing and Feng, Yun Ding and Yang, Hailing and Xie, Xiaohua and Zhou, Yuqi", title="Fast Prediction of Deterioration and Death Risk in Patients With Acute Exacerbation of Chronic Obstructive Pulmonary Disease Using Vital Signs and Admission History: Retrospective Cohort Study", journal="JMIR Med Inform", year="2019", month="Oct", day="21", volume="7", number="4", pages="e13085", keywords="chronic obstructive pulmonary disease", keywords="clinical decision support systems", keywords="health risk assessment", abstract="Background: Chronic obstructive pulmonary disease (COPD) has 2 courses with different options for medical treatment: the acute exacerbation phase and the stable phase. Stable patients can use the Global Initiative for Chronic Obstructive Lung Disease (GOLD) to guide treatment strategies. However, GOLD could not classify and guide the treatment of acute exacerbation as acute exacerbation of COPD (AECOPD) is a complex process. Objective: This paper aimed to propose a fast severity assessment and risk prediction approach in order to strengthen monitoring and medical interventions in advance. Methods: The proposed method uses a classification and regression tree (CART) and had been validated using the AECOPD inpatient's medical history and first measured vital signs at admission that can be collected within minutes. We identified 552 inpatients with AECOPD from February 2011 to June 2018 retrospectively and used the classifier to predict the outcome and prognosis of this hospitalization. Results: The overall accuracy of the proposed CART classifier was 76.2\% (83/109 participants) with 95\% CI 0.67-0.84. The precision, recall, and F-measure for the mild AECOPD were 76\% (50/65 participants), 82\% (50/61 participants), and 0.79, respectively, and those with severe AECOPD were 75\% (33/44 participants), 68\% (33/48 participants), and 0.72, respectively. Conclusions: This fast prediction CART classifier for early exacerbation detection could trigger the initiation of timely treatment, thereby potentially reducing exacerbation severity and recovery time and improving the patients' health. 
", doi="10.2196/13085", url="http://medinform.jmir.org/2019/4/e13085/", url="http://www.ncbi.nlm.nih.gov/pubmed/31638595" } @Article{info:doi/10.2196/14149, author="Kim, Heejung and Lee, SungHee and Lee, SangEun and Hong, Soyun and Kang, HeeJae and Kim, Namhee", title="Depression Prediction by Using Ecological Momentary Assessment, Actiwatch Data, and Machine Learning: Observational Study on Older Adults Living Alone", journal="JMIR Mhealth Uhealth", year="2019", month="Oct", day="16", volume="7", number="10", pages="e14149", keywords="elderly", keywords="one-person household", keywords="depression", keywords="ecological momentary assessment", keywords="actigraphy", keywords="machine learning", abstract="Background: Although geriatric depression is prevalent, diagnosis using self-reporting instruments has limitations when measuring the depressed mood of older adults in a community setting. Ecological momentary assessment (EMA) by using wearable devices could be used to collect data to classify older adults into depression groups. Objective: The objective of this study was to develop a machine learning algorithm to predict the classification of depression groups among older adults living alone. We focused on utilizing diverse data collected through a survey, an Actiwatch, and an EMA report related to depression. Methods: The prediction model using machine learning was developed in 4 steps: (1) data collection, (2) data processing and representation, (3) data modeling (feature engineering and selection), and (4) training and validation to test the prediction model. Older adults (N=47), living alone in community settings, completed an EMA to report depressed moods 4 times a day for 2 weeks between May 2017 and January 2018. Participants wore an Actiwatch that measured their activity and ambient light exposure every 30 seconds for 2 weeks. At baseline and the end of the 2-week observation, depressive symptoms were assessed using the Korean versions of the Short Geriatric Depression Scale (SGDS-K) and the Hamilton Depression Rating Scale (K-HDRS). Conventional classification based on binary logistic regression was built and compared with 4 machine learning models (the logit, decision tree, boosted trees, and random forest models). Results: On the basis of the SGDS-K and K-HDRS, 38\% (18/47) of the participants were classified into the probable depression group. They reported significantly lower scores of normal mood and physical activity and higher levels of white and red, green, and blue (RGB) light exposures at different degrees of various 4-hour time frames (all P<.05). Sleep efficiency was chosen for modeling through feature selection. Comparing diverse combinations of the selected variables, daily mean EMA score, daily mean activity level, white and RGB light at 4:00 pm to 8:00 pm exposure, and daily sleep efficiency were selected for modeling. Conventional classification based on binary logistic regression had a good model fit (accuracy: 0.705; precision: 0.770; specificity: 0.859; and area under receiver operating characteristic curve or AUC: 0.754). Among the 4 machine learning models, the logit model had the best fit compared with the others (accuracy: 0.910; precision: 0.929; specificity: 0.940; and AUC: 0.960). Conclusions: This study provides preliminary evidence for developing a machine learning program to predict the classification of depression groups in older adults living alone. 
Clinicians should consider using this method to identify underdiagnosed subgroups and monitor daily progression regarding treatment or therapeutic intervention in the community setting. Furthermore, more efforts are needed for researchers and clinicians to diversify data collection methods by using a survey, EMA, and a sensor. ", doi="10.2196/14149", url="http://mhealth.jmir.org/2019/10/e14149/", url="http://www.ncbi.nlm.nih.gov/pubmed/31621642" } @Article{info:doi/10.2196/14141, author="Guenter, Dale and Abouzahra, Mohamed and Schabort, Inge and Radhakrishnan, Arun and Nair, Kalpana and Orr, Sherrie and Langevin, Jessica and Taenzer, Paul and Moulin, E. Dwight", title="Design Process and Utilization of a Novel Clinical Decision Support System for Neuropathic Pain in Primary Care: Mixed Methods Observational Study", journal="JMIR Med Inform", year="2019", month="Sep", day="30", volume="7", number="3", pages="e14141", keywords="medical records systems", keywords="computerized", keywords="quality of health care", keywords="pain management", keywords="medical informatics", abstract="Background: Computerized clinical decision support systems (CDSSs) have emerged as an approach to improve compliance of clinicians with clinical practice guidelines (CPGs). Research utilizing CDSS has primarily been conducted in clinical contexts with clear diagnostic criteria such as diabetes and cardiovascular diseases. In contrast, research on CDSS for pain management and more specifically neuropathic pain has been limited. A CDSS for neuropathic pain has the potential to enhance patient care as the challenge of diagnosing and treating neuropathic pain often leads to tension in clinician-patient relationships. Objective: The aim of this study was to design and evaluate a CDSS aimed at improving the adherence of interprofessional primary care clinicians to CPG for managing neuropathic pain. Methods: Recommendations from the Canadian CPGs informed the decision pathways. The development of the CDSS format and function involved participation of multiple stakeholders and end users in needs assessment and usability testing. Clinicians, including family medicine physicians, residents, and nurse practitioners, in three academic teaching clinics were trained in the use of the CDSS. Evaluation over one year included the measurement of utilization of the CDSS; change in reported awareness, agreement, and adoption of CPG recommendations; and change in the observed adherence to CPG recommendations. Results: The usability testing of the CDSS was highly successful in the prototype environment. Deployment in the clinical setting was partially complete by the time of the study, with some limitations in the planned functionality. The study population had a high level of awareness, agreement, and adoption of guideline recommendations before implementation of CDSS. Nevertheless, there was a small and statistically significant improvement in the mean awareness and adoption scores over the year of observation (P=.01 for mean awareness scores at 6 and 12 months compared with baseline, for mean adoption scores at 6 months compared with baseline, and for mean adoption scores at 12 months). Documenting significant findings related to diagnosis of neuropathic pain increased significantly. Clinicians accessed CPG information more frequently than they utilized data entry functions. Nurse practitioners and first year family medicine trainees had higher utilization than physicians. 
Conclusions: We observed a small increase in the adherence to CPG recommendations for managing neuropathic pain. Clinicians utilized the CDSS more as a source of knowledge and as a training tool than as an ongoing dynamic decision support. ", doi="10.2196/14141", url="http://medinform.jmir.org/2019/3/e14141/", url="http://www.ncbi.nlm.nih.gov/pubmed/31573946" } @Article{info:doi/10.2196/10010, author="Shen, Jiayi and Zhang, P. Casper J. and Jiang, Bangsheng and Chen, Jiebin and Song, Jian and Liu, Zherui and He, Zonglin and Wong, Yi Sum and Fang, Po-Han and Ming, Wai-Kit", title="Artificial Intelligence Versus Clinicians in Disease Diagnosis: Systematic Review", journal="JMIR Med Inform", year="2019", month="Aug", day="16", volume="7", number="3", pages="e10010", keywords="artificial intelligence", keywords="deep learning", keywords="diagnosis", keywords="diagnostic imaging", keywords="image interpretation, computer-assisted", keywords="patient-centered care", abstract="Background: Artificial intelligence (AI) has been extensively used in a range of medical fields to promote therapeutic development. The development of diverse AI techniques has also contributed to early detections, disease diagnoses, and referral management. However, concerns about the value of advanced AI in disease diagnosis have been raised by health care professionals, medical service providers, and health policy decision makers. Objective: This review aimed to systematically examine the literature, in particular, focusing on the performance comparison between advanced AI and human clinicians to provide an up-to-date summary regarding the extent of the application of AI to disease diagnoses. By doing so, this review discussed the relationship between the current advanced AI development and clinicians with respect to disease diagnosis and thus therapeutic development in the long run. Methods: We systematically searched articles published between January 2000 and March 2019 following the Preferred Reporting Items for Systematic reviews and Meta-Analysis in the following databases: Scopus, PubMed, CINAHL, Web of Science, and the Cochrane Library. According to the preset inclusion and exclusion criteria, only articles comparing the medical performance between advanced AI and human experts were considered. Results: A total of 9 articles were identified. A convolutional neural network was the commonly applied advanced AI technology. Owing to the variation in medical fields, there is a distinction between individual studies in terms of classification, labeling, training process, dataset size, and algorithm validation of AI. Performance indices reported in articles included diagnostic accuracy, weighted errors, false-positive rate, sensitivity, specificity, and the area under the receiver operating characteristic curve. The results showed that the performance of AI was at par with that of clinicians and exceeded that of clinicians with less experience. Conclusions: Current AI development has a diagnostic performance that is comparable with medical experts, especially in image recognition-related fields. Further studies can be extended to other types of medical imaging such as magnetic resonance imaging and other medical practices unrelated to images. 
With the continued development of AI-assisted technologies, the clinical implications underpinned by clinicians' experience and guided by patient-centered health care principle should be constantly considered in future AI-related and other technology-based medical research. ", doi="10.2196/10010", url="http://medinform.jmir.org/2019/3/e10010/", url="http://www.ncbi.nlm.nih.gov/pubmed/31420959" } @Article{info:doi/10.2196/13476, author="Wu, Jiangpeng and Zan, Xiangyi and Gao, Liping and Zhao, Jianhong and Fan, Jing and Shi, Hengxue and Wan, Yixin and Yu, E. and Li, Shuyan and Xie, Xiaodong", title="A Machine Learning Method for Identifying Lung Cancer Based on Routine Blood Indices: Qualitative Feasibility Study", journal="JMIR Med Inform", year="2019", month="Aug", day="15", volume="7", number="3", pages="e13476", keywords="lung cancer identification", keywords="routine blood indices", keywords="Random Forest", abstract="Background: Liquid biopsies based on blood samples have been widely accepted as a diagnostic and monitoring tool for cancers, but extremely high sensitivity is frequently needed due to the very low levels of the specially selected DNA, RNA, or protein biomarkers that are released into blood. However, routine blood indices tests are frequently ordered by physicians, as they are easy to perform and are cost effective. In addition, machine learning is broadly accepted for its ability to decipher complicated connections between multiple sets of test data and diseases. Objective: The aim of this study is to discover the potential association between lung cancer and routine blood indices and thereby help clinicians and patients to identify lung cancer based on these routine tests. Methods: The machine learning method known as Random Forest was adopted to build an identification model between routine blood indices and lung cancer that would determine if they were potentially linked. Ten-fold cross-validation and further tests were utilized to evaluate the reliability of the identification model. Results: In total, 277 patients with 49 types of routine blood indices were included in this study, including 183 patients with lung cancer and 94 patients without lung cancer. Throughout the course of the study, there was correlation found between the combination of 19 types of routine blood indices and lung cancer. Lung cancer patients could be identified from other patients, especially those with tuberculosis (which usually has similar clinical symptoms to lung cancer), with a sensitivity, specificity and total accuracy of 96.3\%, 94.97\% and 95.7\% for the cross-validation results, respectively. This identification method is called the routine blood indices model for lung cancer, and it promises to be of help as a tool for both clinicians and patients for the identification of lung cancer based on routine blood indices. Conclusions: Lung cancer can be identified based on the combination of 19 types of routine blood indices, which implies that artificial intelligence can find the connections between a disease and the fundamental indices of blood, which could reduce the necessity of costly, elaborate blood test techniques for this purpose. It may also be possible that the combination of multiple indices obtained from routine blood tests may be connected to other diseases as well. 
", doi="10.2196/13476", url="http://medinform.jmir.org/2019/3/e13476/", url="http://www.ncbi.nlm.nih.gov/pubmed/31418423" } @Article{info:doi/10.2196/14482, author="Bachmann, F. Kaspar and Vetter, Christian and Wenzel, Lars and Konrad, Christoph and Vogt, P. Andreas", title="Implementation and Evaluation of a Web-Based Distribution System For Anesthesia Department Guidelines and Standard Operating Procedures: Qualitative Study and Content Analysis", journal="J Med Internet Res", year="2019", month="Aug", day="15", volume="21", number="8", pages="e14482", keywords="standards", keywords="computer communication networks", keywords="anesthesiology", keywords="decision making, computer-assisted", abstract="Background: Digitization is spreading exponentially in medical care, with improved availability of electronic devices. Guidelines and standard operating procedures (SOPs) form an important part of daily clinical routine, and adherence is associated with improved outcomes. Objective: This study aimed to evaluate a digital solution for the maintenance and distribution of SOPs and guidelines in 2 different anesthesiology departments in Switzerland. Methods: A content management system (CMS), WordPress, was set up in 2 tertiary-level hospitals within 1 year: the Department of Anesthesiology and Pain Medicine at the Kantonsspital Lucerne in Lucerne, Switzerland, as an open-access system, followed by a similar system for internal usage in the Department of Anaesthesiology and Pain Medicine of the Inselspital, Bern University Hospital, in Bern, Switzerland. We analyzed the requirements and implementation processes needed to successfully set up these systems, and we evaluated the systems' impact by analyzing content and usage. Results: The systems' generated exportable metadata, such as traffic and content. Analysis of the exported metadata showed that the Lucerne website had 269 pages managed by 44 users, with 88,124 visits per month (worldwide access possible), and the Bern website had 341 pages managed by 35 users, with 1765 visits per month (access only possible from within the institution). Creation of an open-access system resulted in third-party interest in the published guidelines and SOPs. The implementation process can be performed over the course of 1 year and setup and maintenance costs are low. Conclusions: A CMS, such as WordPress, is a suitable solution for distributing and managing guidelines and SOPs. Content is easily accessible and is accessed frequently. Metadata from the system allow live monitoring of usage and suggest that the system be accepted and appreciated by the users. In the future, Web-based solutions could be an important tool to handle guidelines and SOPs, but further studies are needed to assess the effect of these systems. 
", doi="10.2196/14482", url="https://www.jmir.org/2019/8/e14482/", url="http://www.ncbi.nlm.nih.gov/pubmed/31418427" } @Article{info:doi/10.2196/11966, author="Tobore, Igbe and Li, Jingzhen and Yuhang, Liu and Al-Handarish, Yousef and Kandwal, Abhishek and Nie, Zedong and Wang, Lei", title="Deep Learning Intervention for Health Care Challenges: Some Biomedical Domain Considerations", journal="JMIR Mhealth Uhealth", year="2019", month="Aug", day="02", volume="7", number="8", pages="e11966", keywords="machine learning", keywords="deep learning", keywords="big data", keywords="mHealth", keywords="medical imaging", keywords="electronic health record", keywords="biologicals", keywords="biomedical", keywords="ECG", keywords="EEG", keywords="artificial intelligence", doi="10.2196/11966", url="https://mhealth.jmir.org/2019/8/e11966/", url="http://www.ncbi.nlm.nih.gov/pubmed/31376272" } @Article{info:doi/10.2196/12660, author="Masud, Rafia and Al-Rei, Mona and Lokker, Cynthia", title="Computer-Aided Detection for Breast Cancer Screening in Clinical Settings: Scoping Review", journal="JMIR Med Inform", year="2019", month="Jul", day="18", volume="7", number="3", pages="e12660", keywords="computer-aided detection", keywords="machine learning", keywords="screening mammography", keywords="breast cancer", keywords="radiology", keywords="implementation", abstract="Background: With the growth of machine learning applications, the practice of medicine is evolving. Computer-aided detection (CAD) is a software technology that has become widespread in radiology practices, particularly in breast cancer screening for improving detection rates at earlier stages. Many studies have investigated the diagnostic accuracy of CAD, but its implementation in clinical settings has been largely overlooked. Objective: The aim of this scoping review was to summarize recent literature on the adoption and implementation of CAD during breast cancer screening by radiologists and to describe barriers and facilitators for CAD use. Methods: The MEDLINE database was searched for English, peer-reviewed articles that described CAD implementation, including barriers or facilitators, in breast cancer screening and were published between January 2010 and March 2018. Articles describing the diagnostic accuracy of CAD for breast cancer detection were excluded. The search returned 526 citations, which were reviewed in duplicate through abstract and full-text screening. Reference lists and cited references in the included studies were reviewed. Results: A total of nine articles met the inclusion criteria. The included articles showed that there is a tradeoff between the facilitators and barriers for CAD use. Facilitators for CAD use were improved breast cancer detection rates, increased profitability of breast imaging, and time saved by replacing double reading. Identified barriers were less favorable perceptions of CAD compared to double reading by radiologists, an increase in recall rates of patients for further testing, increased costs, and unclear effect on patient outcomes. Conclusions: There is a gap in the literature between CAD's well-established diagnostic accuracy and its implementation and use by radiologists. Generally, the perceptions of radiologists have not been considered and details of implementation approaches for adoption of CAD have not been reported. The cost-effectiveness of CAD has not been well established for breast cancer screening in various populations. 
Further research is needed on how to best facilitate CAD in radiology practices in order to optimize patient outcomes, and the views of radiologists need to be better considered when advancing CAD use. ", doi="10.2196/12660", url="http://medinform.jmir.org/2019/3/e12660/", url="http://www.ncbi.nlm.nih.gov/pubmed/31322128" } @Article{info:doi/10.2196/14286, author="Dhombres, Ferdinand and Maurice, Paul and Guilbaud, Lucie and Franchinard, Loriane and Dias, Barbara and Charlet, Jean and Blondiaux, El{\'e}onore and Khoshnood, Babak and Jurkovic, Davor and Jauniaux, Eric and Jouannic, Jean-Marie", title="A Novel Intelligent Scan Assistant System for Early Pregnancy Diagnosis by Ultrasound: Clinical Decision Support System Evaluation Study", journal="J Med Internet Res", year="2019", month="Jul", day="03", volume="21", number="7", pages="e14286", keywords="decision support system", keywords="ontology", keywords="knowledge base", keywords="medical ultrasound", keywords="ectopic pregnancy", abstract="Background: Early pregnancy ultrasound scans are usually performed by nonexpert examiners in obstetrics/gynecology (OB/GYN) emergency departments. Establishing the precise diagnosis of pregnancy location is key for appropriate management of early pregnancies, and experts are usually able to locate a pregnancy in the first scan. A decision-support system based on a semantic, expert-validated knowledge base may improve the diagnostic performance of nonexpert examiners for early pregnancy transvaginal ultrasound. Objective: This study aims to evaluate a novel Intelligent Scan Assistant System for early pregnancy ultrasound to diagnose the pregnancy location and determine the image quality. Methods: Two trainees performed virtual transvaginal ultrasound examinations of early pregnancy cases with and without the system. The ultrasound images and reports were blindly reviewed by two experts using scoring methods. A diagnosis of pregnancy location and ultrasound image quality were compared between scans performed with and without the system. Results: Each trainee performed a virtual vaginal examination for all 32 cases with and without use of the system. The analysis of the 128 resulting scans showed higher quality of the images (quality score: +23\%; P<.001), less images per scan (4.6 vs 6.3 [without the CDSS]; P<.001), and higher confidence in reporting conclusions (trust score: +20\%; P<.001) with use of the system. Further, use of the system cost an additional 8 minutes per scan. We observed a correct diagnosis of pregnancy location in 39 (61\%) and 52 (81\%) of 64 scans in the nonassisted mode and assisted mode, respectively. Additionally, an exact diagnosis (with precise ectopic location) was made in 30 (47\%) and 49 (73\%) of the 64 scans without and with use of the system, respectively. These differences in diagnostic performance (+20\% for correct location diagnosis and +30\% for exact diagnosis) were both statistically significant (P=.002 and P<.001, respectively). Conclusions: The Intelligent Scan Assistant System is based on an expert-validated knowledge base and demonstrates significant improvement in early pregnancy scanning, both in diagnostic performance (pregnancy location and precise diagnosis) and scan quality (selection of images, confidence, and image quality). ", doi="10.2196/14286", url="http://www.jmir.org/2019/7/e14286/", url="http://www.ncbi.nlm.nih.gov/pubmed/31271152" } @Article{info:doi/10.2196/14022, author="Khoong, C. 
Elaine and Karliner, Leah and Lo, Lowell and Stebbins, Marilyn and Robinson, Andrew and Pathak, Sarita and Santoyo-Olsson, Jasmine and Scherzer, Rebecca and Peralta, A. Carmen", title="A Pragmatic Cluster Randomized Trial of an Electronic Clinical Decision Support System to Improve Chronic Kidney Disease Management in Primary Care: Design, Rationale, and Implementation Experience", journal="JMIR Res Protoc", year="2019", month="Jun", day="7", volume="8", number="6", pages="e14022", keywords="chronic kidney disease", keywords="clinical decision support systems", keywords="pragmatic clinical trial", keywords="electronic health records", abstract="Background: The diagnosis of chronic kidney disease (CKD) is based on laboratory results easily extracted from electronic health records; therefore, CKD identification and management is an ideal area for targeted electronic decision support efforts. Early CKD management frequently occurs in primary care settings where primary care providers (PCPs) may not implement all the best practices to prevent CKD-related complications. Few previous studies have employed randomized trials to assess a CKD electronic clinical decision support system (eCDSS) that provided recommendations to PCPs tailored to each patient based on laboratory results. Objective: The aim of this study was to report the trial design and implementation experience of a CKD eCDSS in primary care. Methods: This was a 3-arm pragmatic cluster-randomized trial at an academic general internal medicine practice. Eligible patients had 2 previous estimated-glomerular-filtration-rates by serum creatinine (eGFRCr) <60 mL/min/1.73m2 at least 90 days apart. Randomization occurred at the PCP level. For patients of PCPs in either of the 2 intervention arms, the research team ordered triple-marker testing (serum creatinine, serum cystatin-c, and urine albumin-creatinine-ratio) at the beginning of the study period, to be completed when acquiring labs for regular clinical care. The eCDSS launched for PCPs and patients in the intervention arms during a regular PCP visit subsequent to completing the triple-marker testing. The eCDSS delivered individualized guidance on cardiovascular risk-reduction, potassium and proteinuria management, and patient education. Patients in the eCDSS+ arm also received a pharmacist phone call to reinforce CKD-related education. The primary clinical outcome is blood pressure change from baseline at 6 months after the end of the trial, and the main secondary outcome is provider awareness of CKD diagnosis. We also collected process, patient-centered, and implementation outcomes. Results: A multidisciplinary team (primary care internist, nephrologists, pharmacist, and informaticist) designed the eCDSS to integrate into the current clinical workflow. All 81 PCPs contacted agreed to participate and were randomized. Of 995 patients initially eligible by eGFRCr, 413 were excluded per protocol and 58 opted out or withdrew, resulting in 524 patient participants (188 usual care; 165 eCDSS; and 171 eCDSS+). During the 12-month intervention period, 53.0\% (178/336) of intervention patient participants completed triple-marker labs. Among these, 138/178 (77.5\%) had a PCP appointment after the triple-marker labs resulted; the eCDSS was opened for 73.9\% (102/138), with orders or education signed for 81.4\% (83/102). 
Conclusions: Successful integration of an eCDSS into primary care workflows and high eCDSS utilization rates at eligible visits suggest this tailored electronic approach is feasible and has the potential to improve guideline-concordant CKD care. Trial Registration: ClinicalTrials.gov NCT02925962; https://clinicaltrials.gov/ct2/show/NCT02925962 (Archived by WebCite at http://www.webcitation.org/78qpx1mjR) International Registered Report Identifier (IRRID): DERR1-10.2196/14022 ", doi="10.2196/14022", url="http://www.researchprotocols.org/2019/6/e14022/", url="http://www.ncbi.nlm.nih.gov/pubmed/31199334" } @Article{info:doi/10.2196/13260, author="Wang, Xiaofang and Zhang, Yan and Hao, Shiying and Zheng, Le and Liao, Jiayu and Ye, Chengyin and Xia, Minjie and Wang, Oliver and Liu, Modi and Weng, Ho Ching and Duong, Q. Son and Jin, Bo and Alfreds, T. Shaun and Stearns, Frank and Kanov, Laura and Sylvester, G. Karl and Widen, Eric and McElhinney, B. Doff and Ling, B. Xuefeng", title="Prediction of the 1-Year Risk of Incident Lung Cancer: Prospective Study Using Electronic Health Records from the State of Maine", journal="J Med Internet Res", year="2019", month="May", day="16", volume="21", number="5", pages="e13260", keywords="lung cancer", keywords="risk prediction model", keywords="electronic health records", keywords="prospective study", abstract="Background: Lung cancer is the leading cause of cancer death worldwide. Early detection of individuals at risk of lung cancer is critical to reduce the mortality rate. Objective: The aim of this study was to develop and validate a prospective risk prediction model to identify patients at risk of new incident lung cancer within the next 1 year in the general population. Methods: Data from individual patient electronic health records (EHRs) were extracted from the Maine Health Information Exchange network. The study population consisted of patients with at least one EHR between April 1, 2016, and March 31, 2018, who had no history of lung cancer. A retrospective cohort (N=873,598) and a prospective cohort (N=836,659) were formed for model construction and validation. An Extreme Gradient Boosting (XGBoost) algorithm was adopted to build the model. It assigned a score to each individual to quantify the probability of a new incident lung cancer diagnosis from October 1, 2016, to September 30, 2017. The model was trained with the clinical profile in the retrospective cohort from the preceding 6 months and validated with the prospective cohort to predict the risk of incident lung cancer from April 1, 2017, to March 31, 2018. Results: The model had an area under the curve (AUC) of 0.881 (95\% CI 0.873-0.889) in the prospective cohort. Two thresholds of 0.0045 and 0.01 were applied to the predictive scores to stratify the population into low-, medium-, and high-risk categories. The incidence of lung cancer in the high-risk category (579/53,922, 1.07\%) was 7.7 times higher than that in the overall cohort (1167/836,659, 0.14\%). Age, a history of pulmonary diseases and other chronic diseases, medications for mental disorders, and social disparities were found to be associated with new incident lung cancer. Conclusions: We retrospectively developed and prospectively validated an accurate risk prediction model of new incident lung cancer occurring in the next 1 year. 
Through statistical learning from the statewide EHR data in the preceding 6 months, our model was able to identify statewide high-risk patients, which will benefit the population health through establishment of preventive interventions or more intensive surveillance. ", doi="10.2196/13260", url="http://www.jmir.org/2019/5/e13260/", url="http://www.ncbi.nlm.nih.gov/pubmed/31099339" } @Article{info:doi/10.2196/12239, author="Sheikhalishahi, Seyedmostafa and Miotto, Riccardo and Dudley, T. Joel and Lavelli, Alberto and Rinaldi, Fabio and Osmani, Venet", title="Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review", journal="JMIR Med Inform", year="2019", month="Apr", day="27", volume="7", number="2", pages="e12239", keywords="electronic health records", keywords="clinical notes", keywords="chronic diseases", keywords="natural language processing", keywords="machine learning", keywords="deep learning", keywords="heart disease", keywords="stroke", keywords="cancer", keywords="diabetes", keywords="lung disease", abstract="Background: Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions on the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Methods based on machine learning to process EHRs are resulting in improved understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical histories remains locked behind clinical narratives in free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods to automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset. Objective: The goal of the research was to provide a comprehensive overview of the development and uptake of NLP methods applied to free-text clinical notes related to chronic diseases, including the investigation of challenges faced by NLP methodologies in understanding clinical narratives. Methods: Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed and searches were conducted in 5 databases using ``clinical notes,'' ``natural language processing,'' and ``chronic disease'' and their variations as keywords to maximize coverage of the articles. Results: Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers resulted in identification of 43 chronic diseases, which were then further classified into 10 disease categories using the International Classification of Diseases, 10th Revision. The majority of studies focused on diseases of the circulatory system (n=38) while endocrine and metabolic diseases were fewest (n=14). This was due to the structure of clinical records related to metabolic diseases, which typically contain much more structured data, compared with medical records for diseases of the circulatory system, which focus more on unstructured data and consequently have seen a stronger focus of NLP. 
The review has shown that there is a significant increase in the use of machine learning methods compared to rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, the majority of works focus on classification of disease phenotype with only a handful of papers addressing extraction of comorbidities from the free text or integration of clinical notes with structured data. There is a notable use of relatively simple methods, such as shallow classifiers (or combination with rule-based methods), due to the interpretability of predictions, which still represents a significant issue for more complex methods. Finally, scarcity of publicly available data may also have contributed to insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes. Conclusions: Efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora. ", doi="10.2196/12239", url="http://medinform.jmir.org/2019/2/e12239/", url="http://www.ncbi.nlm.nih.gov/pubmed/31066697" } @Article{info:doi/10.2196/12469, author="Blecker, Saul and Pandya, Rishi and Stork, Susan and Mann, Devin and Kuperman, Gilad and Shelley, Donna and Austrian, S. Jonathan", title="Interruptive Versus Noninterruptive Clinical Decision Support: Usability Study", journal="JMIR Hum Factors", year="2019", month="Apr", day="17", volume="6", number="2", pages="e12469", keywords="clinical decision support", keywords="hospital", keywords="electronic health records", abstract="Background: Clinical decision support (CDS) has been shown to improve compliance with evidence-based care, but its impact is often diminished because of issues such as poor usability, insufficient integration into workflow, and alert fatigue. Noninterruptive CDS may be less subject to alert fatigue, but there has been little assessment of its usability. Objective: This study aimed to study the usability of interruptive and noninterruptive versions of a CDS. Methods: We conducted a usability study of a CDS tool that recommended prescribing an angiotensin-converting enzyme inhibitor for inpatients with heart failure. We developed 2 versions of the CDS: an interruptive alert triggered at order entry and a noninterruptive alert listed in the sidebar of the electronic health record screen. Inpatient providers were recruited and randomly assigned to use the interruptive alert followed by the noninterruptive alert or vice versa in a laboratory setting. We asked providers to ``think aloud'' while using the CDS and then conducted a brief semistructured interview about usability. We used a constant comparative analysis informed by the CDS Five Rights framework to analyze usability testing. Results: A total of 12 providers participated in usability testing. Providers noted that the interruptive alert was readily noticed but generally impeded workflow. The noninterruptive alert was felt to be less annoying but had lower visibility, which might reduce engagement. 
Provider role seemed to influence preferences; for instance, some providers who had more global responsibility for patients seemed to prefer the noninterruptive alert, whereas more task-oriented providers generally preferred the interruptive alert. Conclusions: Providers expressed trade-offs between impeding workflow and improving visibility with interruptive and noninterruptive versions of a CDS. In addition, 2 potential approaches to effective CDS may include targeting alerts by provider role or supplementing a noninterruptive alert with an occasional, well-timed interruptive alert. ", doi="10.2196/12469", url="http://humanfactors.jmir.org/2019/2/e12469/", url="http://www.ncbi.nlm.nih.gov/pubmed/30994460" } @Article{info:doi/10.2196/10773, author="Arostegui, Inmaculada and Legarreta, Jos{\'e} Mar{\'i}a and Barrio, Irantzu and Esteban, Cristobal and Garcia-Gutierrez, Susana and Aguirre, Urko and Quintana, Mar{\'i}a Jos{\'e} and ", title="A Computer Application to Predict Adverse Events in the Short-Term Evolution of Patients With Exacerbation of Chronic Obstructive Pulmonary Disease", journal="JMIR Med Inform", year="2019", month="Apr", day="17", volume="7", number="2", pages="e10773", keywords="COPD", keywords="disease exacerbation", keywords="mortality", keywords="intensive care", keywords="clinical prediction rule", keywords="mobile app", abstract="Background: Chronic obstructive pulmonary disease (COPD) is a common chronic disease. Exacerbations of COPD (eCOPD) contribute to the worsening of the disease and the patient's evolution. There are some clinical prediction rules that may help to stratify patients with eCOPD by their risk of poor evolution or adverse events. The translation of these clinical prediction rules into computer applications would allow their implementation in clinical practice. Objective: The goal of this study was to create a computer application to predict various outcomes related to adverse events of short-term evolution in eCOPD patients attending an emergency department (ED) based on valid and reliable clinical prediction rules. Methods: A computer application, Prediction of Evolution of patients with eCOPD (PrEveCOPD), was created to predict 2 outcomes related to adverse events: (1) mortality during hospital admission or within a week after an ED visit and (2) admission to an intensive care unit (ICU) or an intermediate respiratory care unit (IRCU) during the eCOPD episode. The algorithms included in the computer tool were based on clinical prediction rules previously developed and validated within the Investigaci{\'o}n en Resultados y Servicios de Salud COPD study. The app was developed for Windows and Android systems, using Visual Studio 2008 and Eclipse, respectively. Results: The PrEveCOPD computer application implements the prediction models previously developed and validated for 2 relevant adverse events in the short-term evolution of patients with eCOPD. The application runs under Windows and Android systems and it can be used locally or remotely as a Web application. Full description of the clinical prediction rules as well as the original references is included on the screen. Input of the predictive variables is controlled for out-of-range and missing values. Language can be switched between English and Spanish. The application is available for downloading and installing on a computer, as a mobile app, or to be used remotely via internet. 
Conclusions: The PrEveCOPD app shows how clinical prediction rules can be summarized into simple and easy-to-use tools, which allow for the estimation of the risk of short-term mortality and ICU or IRCU admission for patients with eCOPD. The app can be used on any computer device, including mobile phones or tablets, and it can guide clinicians to a valid stratification of patients attending the ED with eCOPD. Trial Registration: ClinicalTrials.gov NCT00102401; https://clinicaltrials.gov/ct2/show/results/NCT02434536 (Archived by WebCite at http://www.webcitation.org/76iwTxYuA) International Registered Report Identifier (IRRID): RR2-10.1186/1472-6963-11-322 ", doi="10.2196/10773", url="http://medinform.jmir.org/2019/2/e10773/", url="http://www.ncbi.nlm.nih.gov/pubmed/30994471" } @Article{info:doi/10.2196/12471, author="Richardson, Safiya and Feldstein, David and McGinn, Thomas and Park, S. Linda and Khan, Sundas and Hess, Rachel and Smith, D. Paul and Mishuris, Grochow Rebecca and McCullagh, Lauren and Mann, Devin", title="Live Usability Testing of Two Complex Clinical Decision Support Tools: Observational Study", journal="JMIR Hum Factors", year="2019", month="Apr", day="15", volume="6", number="2", pages="e12471", keywords="usability", keywords="usability testing", keywords="user experience", keywords="clinical decision support", keywords="health informatics", keywords="provider adoption", keywords="workflow", keywords="live usability", keywords="clinical prediction rules", abstract="Background: The potential of electronic health records (EHR) and clinical decision support (CDS) systems to improve the practice of medicine has been tempered by poor design and the resulting burden they place on providers. CDS is rarely tested in the real clinical environment. As a result, many tools are hard to use, placing strain on providers and resulting in low adoption rates. The existing CDS usability literature relies primarily on expert opinion and provider feedback via survey. This is the first study to evaluate CDS usability and the provider-computer-patient interaction with complex CDS in the real clinical environment. Objective: This study aimed to further understand the barriers and facilitators of meaningful CDS usage within a real clinical context. Methods: This qualitative observational study was conducted with 3 primary care providers during 6 patient care sessions. In patients with the chief complaint of sore throat, a CDS tool built with the Centor Score was used to stratify the risk of group A Streptococcus pharyngitis. In patients with a chief complaint of cough or upper respiratory tract infection, a CDS tool built with the Heckerling Rule was used to stratify the risk of pneumonia. During usability testing, all human-computer interactions, including audio and continuous screen capture, were recorded using the Camtasia software. Participants' comments and interactions with the tool during clinical sessions and participant comments during a postsession brief interview were placed into coding categories and analyzed for generalizable themes. Results: In the 6 encounters observed, primary care providers toggled between addressing either the computer or the patient during the visit. Minimal time was spent listening to the patient without engaging the EHR. 
Participants mostly used the CDS tool with the patient, asking questions to populate the calculator and discussing the results of the risk assessment; they reported the ability to do this as the major benefit of the tool. All providers were interrupted during their use of the CDS tool by the need to refer to other sections of the chart. In half of the visits, patients' clinical symptoms challenged the applicability of the tool to calculate the risk of bacterial infection. Primary care providers rarely used the incorporated incentives for CDS usage, including progress notes and patient instructions. Conclusions: Live usability testing of these CDS tools generated insights about their role in the patient-provider interaction. CDS may contribute to the interaction by being simultaneously viewed by the provider and patient. CDS can improve usability and lessen the strain it places on providers by being short, flexible, and customizable to unique provider workflow. A useful component of CDS is being as widely applicable as possible and ensuring that its functions represent the fastest way to perform a particular task. ", doi="10.2196/12471", url="http://humanfactors.jmir.org/2019/2/e12471/", url="http://www.ncbi.nlm.nih.gov/pubmed/30985283" } @Article{info:doi/10.2196/10832, author="Morath, Benedict and Wien, Katharina and Hoppe-Tichy, Torsten and Haefeli, Emil Walter and Seidling, Marita Hanna", title="Structure and Content of Drug Monitoring Advices Included in Discharge Letters at Interfaces of Care: Exploratory Analysis Preceding Database Development", journal="JMIR Med Inform", year="2019", month="Apr", day="08", volume="7", number="2", pages="e10832", keywords="drug monitoring", keywords="patient discharge summaries", keywords="transition of care", abstract="Background: Inadequate monitoring of drug therapy after hospital discharge facilitates adverse drug events and preventable hospital readmissions. Objective: This study aimed to analyze the structure and content of drug monitoring advices of a representative sample of discharge letters as a basis for future electronic information systems. Methods: On 2 days in November 2016, all discharge letters of 3 departments of a university hospital were extracted from the hospital information system. The frequency, content, and structure of drug monitoring advices in discharge letters were investigated and compared with the theoretical monitoring requirements expressed in the corresponding summaries of product characteristics (SmPC). The quality of the drug monitoring advices in the discharge letters was rated with the domains of an adapted systematic instructions for monitoring (SIM) score. Results: In total, 154 discharge letters were analyzed containing 1180 brands (240 active pharmaceutical substances), of which 50.42\% (595/1180) could theoretically be amended with a monitoring advice according to the SmPC. In reality, 40 discharge letters (26.0\%, 40/154) contained a total of 66 monitoring advices for 57 brands (4.83\%, 57/1180), comprising 18 different monitoring parameters. Drug monitoring advices addressed only a mean of 1.9 (SD 0.8) of the 7 domains of the SIM score and frequently did not address reasons for monitoring (86\%, 57/66), the timing of monitoring, that is, the start (76\%, 50/66), the frequency (94\%, 63/66), the stop (95\%, 63/66), and how to react (83\%, 55/66). 
Conclusions: Drug monitoring advices were mostly absent in discharge letters and a gold standard for appropriate drug monitoring advices was lacking. Hence, more effort should be put in the development of tools that facilitate easy presentation of clinically meaningful drug monitoring advices at the point of care. ", doi="10.2196/10832", url="https://medinform.jmir.org/2019/2/e10832/", url="http://www.ncbi.nlm.nih.gov/pubmed/30958278" } @Article{info:doi/10.2196/12100, author="van Hartskamp, Michael and Consoli, Sergio and Verhaegh, Wim and Petkovic, Milan and van de Stolpe, Anja", title="Artificial Intelligence in Clinical Health Care Applications: Viewpoint", journal="Interact J Med Res", year="2019", month="Apr", day="05", volume="8", number="2", pages="e12100", keywords="artificial intelligence", keywords="deep learning", keywords="clinical data", keywords="Bayesian modeling", keywords="medical informatics", doi="10.2196/12100", url="https://www.i-jmr.org/2019/2/e12100/", url="http://www.ncbi.nlm.nih.gov/pubmed/30950806" } @Article{info:doi/10.2196/12437, author="Zhang, Yonglai and Zhou, Yaojian and Zhang, Dongsong and Song, Wenai", title="A Stroke Risk Detection: Improving Hybrid Feature Selection Method", journal="J Med Internet Res", year="2019", month="Apr", day="02", volume="21", number="4", pages="e12437", keywords="machine learning", keywords="stroke", keywords="risk", keywords="feature selection", keywords="WRHFS", abstract="Background: Stroke is one of the most common diseases that cause mortality. Detecting the risk of stroke for individuals is critical yet challenging because of a large number of risk factors for stroke. Objective: This study aimed to address the limitation of ineffective feature selection in existing research on stroke risk detection. We have proposed a new feature selection method called weighting- and ranking-based hybrid feature selection (WRHFS) to select important risk factors for detecting ischemic stroke. Methods: WRHFS integrates the strengths of various filter algorithms by following the principle of a wrapper approach. We employed a variety of filter-based feature selection models as the candidate set, including standard deviation, Pearson correlation coefficient, Fisher score, information gain, Relief algorithm, and chi-square test and used sensitivity, specificity, accuracy, and Youden index as performance metrics to evaluate the proposed method. Results: This study chose 792 samples from the electronic records of 13,421 patients in a community hospital. Each sample included 28 features (24 blood test features and 4 demographic features). The results of evaluation showed that the proposed method selected 9 important features out of the original 28 features and significantly outperformed baseline methods. Their cumulative contribution was 0.51. The WRHFS method achieved a sensitivity of 82.7\% (329/398), specificity of 80.4\% (317/394), classification accuracy of 81.5\% (645/792), and Youden index of 0.63 using only the top 9 features. We have also presented a chart for visualizing the risk of having ischemic strokes. Conclusions: This study has proposed, developed, and evaluated a new feature selection method for identifying the most important features for building effective and parsimonious models for stroke risk detection. The findings of this research provide several novel research contributions and practical implications. 
", doi="10.2196/12437", url="https://www.jmir.org/2019/4/e12437/", url="http://www.ncbi.nlm.nih.gov/pubmed/30938684" } @Article{info:doi/10.2196/10769, author="Lander, Bryn and Balka, Ellen", title="Exploring How Evidence is Used in Care Through an Organizational Ethnography of Two Teaching Hospitals", journal="J Med Internet Res", year="2019", month="Mar", day="28", volume="21", number="3", pages="e10769", keywords="clinical practice guidelines", keywords="evidence-based medicine", keywords="mindlines", keywords="ethnography", abstract="Background: Numerous published articles show that clinicians do not follow clinical practice guidelines (CPGs). However, a few studies explore what clinicians consider evidence and how they use different forms of evidence in their care decisions. Many of these existing studies occurred before the advent of smartphones and advanced Web-based information retrieval technologies. It is important to understand how these new technologies influence the ways clinicians use evidence in their clinical practice. Mindlines are a concept that explores how clinicians draw on different sources of information (including context, experience, medical training, and evidence) to develop collectively reinforced, internalized tacit guidelines. Objective: The aim of this paper was to explore how evidence is integrated into mindline development and the everyday use of mindlines and evidence in care. Methods: We draw on ethnographic data collected by shadowing internal medicine teams at 2 teaching hospitals. Fieldnotes were tagged by evidence category, teaching and care, and role of the person referencing evidence. Counts of these tags were integrated with fieldnote vignettes and memos. The findings were verified with an advisory council and through member checks. Results: CPGs represent just one of several sources of evidence used when making care decisions. Some forms of evidence were predominately invoked from mindlines, whereas other forms were read to supplement mindlines. The majority of scientific evidence was accessed on the Web, often through smartphones. How evidence was used varied by role. As team members gained experience, they increasingly incorporated evidence into their mindlines. Evidence was often blended together to arrive at shared understandings and approaches to patient care that included ways to filter evidence. Conclusions: This paper outlines one way through which the ethos of evidence-based medicine has been incorporated into the daily work of care. Here, multiple Web-based forms of evidence were mixed with other information. This is different from the way that is often articulated by health administrators and policy makers whereby clinical practice guideline adherence is equated with practicing evidence-based medicine. 
", doi="10.2196/10769", url="http://www.jmir.org/2019/3/e10769/", url="http://www.ncbi.nlm.nih.gov/pubmed/30920371" } @Article{info:doi/10.2196/12584, author="{\O}stervang, Christina and Vestergaard, Vedel Lene and Dieperink, Brochstedt Karin and Danbj{\o}rg, Boe Dorthe", title="Patient Rounds With Video-Consulted Relatives: Qualitative Study on Possibilities and Barriers From the Perspective of Healthcare Providers", journal="J Med Internet Res", year="2019", month="Mar", day="25", volume="21", number="3", pages="e12584", keywords="telehealth", keywords="family", keywords="relatives", keywords="cancer", keywords="technology", keywords="qualitative research", abstract="Background: In cancer settings, relatives are often seen as a resource as they are able to support the patient and remember information during hospitalization. However, geographic distance to hospitals, work, and family obligations are reasons that may cause difficulties for relatives' physical participation during hospitalization. This provided inspiration to uncover the possibility of telehealth care in connection with enabling participation by relatives during patient rounds. Telehealth is used advantageously in health care systems but is also at risk of failing during the implementation process because of, for instance, health care professionals' resistance to change. Research on the implications for health care professionals in involving relatives' participation through virtual presence during patient rounds is limited. Objective: This study aimed to investigate health care professionals' experiences in using and implementing technology to involve relatives during video-consulted patient rounds. Methods: The design was a qualitative approach. Methods used were focus group interviews, short open interviews, and field observations of health care professionals working at a cancer department. The text material was analyzed using interpretative phenomenological analysis. Results: Field observational studies were conducted for 15 days, yielding 75 hours of observation. A total of 14 sessions of video-consulted patient rounds were observed and 15 pages of field notes written, along with 8 short open interviews with physicians, nurses, and staff from management. Moreover, 2 focus group interviews with 9 health care professionals were conducted. Health care professionals experienced the use of technology as a way to facilitate involvement of the patient's relatives, without them being physically present. Moreover, it raised questions about whether this way of conducting patient rounds could address the needs of both the patients and the relatives. Time, culture, and change of work routines were found to be the major barriers when implementing new technology involving relatives. Conclusions: This study identified a double change by introducing both new technology and virtual participation by relatives at the same time. The change had consequences on health care professionals' work routines with regard to work load, culture, and organization because of the complexity in health care systems. 
", doi="10.2196/12584", url="http://www.jmir.org/2019/3/e12584/", url="http://www.ncbi.nlm.nih.gov/pubmed/30907746" } @Article{info:doi/10.2196/12422, author="Oh, Songhee and Kim, Heon Jae and Choi, Sung-Woo and Lee, Jeong Hee and Hong, Jungrak and Kwon, Hyo Soon", title="Physician Confidence in Artificial Intelligence: An Online Mobile Survey", journal="J Med Internet Res", year="2019", month="Mar", day="25", volume="21", number="3", pages="e12422", keywords="artificial intelligence", keywords="AI", keywords="awareness", keywords="physicians", abstract="Background: It is expected that artificial intelligence (AI) will be used extensively in the medical field in the future. Objective: The purpose of this study is to investigate the awareness of AI among Korean doctors and to assess physicians' attitudes toward the medical application of AI. Methods: We conducted an online survey composed of 11 closed-ended questions using Google Forms. The survey consisted of questions regarding the recognition of and attitudes toward AI, the development direction of AI in medicine, and the possible risks of using AI in the medical field. Results: A total of 669 participants completed the survey. Only 40 (5.9\%) answered that they had good familiarity with AI. However, most participants considered AI useful in the medical field (558/669, 83.4\% agreement). The advantage of using AI was seen as the ability to analyze vast amounts of high-quality, clinically relevant data in real time. Respondents agreed that the area of medicine in which AI would be most useful is disease diagnosis (558/669, 83.4\% agreement). One possible problem cited by the participants was that AI would not be able to assist in unexpected situations owing to inadequate information (196/669, 29.3\%). Less than half of the participants(294/669, 43.9\%) agreed that AI is diagnostically superior to human doctors. Only 237 (35.4\%) answered that they agreed that AI could replace them in their jobs. Conclusions: This study suggests that Korean doctors and medical students have favorable attitudes toward AI in the medical field. The majority of physicians surveyed believed that AI will not replace their roles in the future. ", doi="10.2196/12422", url="http://www.jmir.org/2019/3/e12422/", url="http://www.ncbi.nlm.nih.gov/pubmed/30907742" } @Article{info:doi/10.2196/12577, author="Tang, Chunlei and Sun, Huajun and Xiong, Yun and Yang, Jiahong and Vitale, Christopher and Ruan, Lu and Ai, Angela and Yu, Guangjun and Ma, Jing and Bates, David", title="Medication Use for Childhood Pneumonia at a Children's Hospital in Shanghai, China: Analysis of Pattern Mining Algorithms", journal="JMIR Med Inform", year="2019", month="Mar", day="22", volume="7", number="1", pages="e12577", keywords="drug therapy", keywords="combination", keywords="computer-assisted", keywords="pattern recognition", keywords="data mining", keywords="precision medicine", keywords="childhood pneumonia", keywords="hospital", abstract="Background: Pattern mining utilizes multiple algorithms to explore objective and sometimes unexpected patterns in real-world data. This technique could be applied to electronic medical record data mining; however, it first requires a careful clinical assessment and validation. Objective: The aim of this study was to examine the use of pattern mining techniques on a large clinical dataset to detect treatment and medication use patterns for childhood pneumonia. 
Methods: We applied 3 pattern mining algorithms to 680,138 medication administration records from 30,512 childhood inpatients with diagnosis of pneumonia during a 6-year period at a children's hospital in China. Patients' ages ranged from 0 to 17 years, where 37.53\% (11,453/30,512) were 0 to 3 months old, 86.55\% (26,408/30,512) were under 5 years, 60.37\% (18,419/30,512) were male, and 60.10\% (18,338/30,512) had a hospital stay of 9 to 15 days. We used the FP-Growth, PrefixSpan, and USpan pattern mining algorithms. The first 2 are more traditional methods of pattern mining and mine a complete set of frequent medication use patterns. PrefixSpan also incorporates an administration sequence. The newer USpan method considers medication utility, defined by the dose, frequency, and timing of use of the 652 individual medications in the dataset. Together, these 3 methods identified the top 10 patterns from 6 age groups, forming a total of 180 distinct medication combinations. These medications encompassed the top 40 (73.66\%, 500,982/680,138) most frequently used medications. These patterns were then evaluated by subject matter experts to summarize 5 medication use and 2 treatment patterns. Results: We identified 5 medication use patterns: (1) antiasthmatics and expectorants and corticosteroids, (2) antibiotics and (antiasthmatics or expectorants or corticosteroids), (3) third-generation cephalosporin antibiotics with (or followed by) traditional antibiotics, (4) antibiotics and (medications for enteritis or skin diseases), and (5) (antiasthmatics or expectorants or corticosteroids) and (medications for enteritis or skin diseases). We also identified 2 frequent treatment patterns: (1) 42.89\% (291,701/680,138) of specific medication administration records were of intravenous therapy with antibiotics, diluents, and nutritional supplements and (2) 11.53\% (78,390/680,138) were of various combinations of inhalation of antiasthmatics, expectorants, or corticosteroids. Fleiss kappa for the subject experts' evaluation was 0.693, indicating moderate agreement. Conclusions: Utilizing a pattern mining approach, we summarized 5 medication use patterns and 2 treatment patterns. These warrant further investigation. ", doi="10.2196/12577", url="http://medinform.jmir.org/2019/1/e12577/", url="http://www.ncbi.nlm.nih.gov/pubmed/30900998" } @Article{info:doi/10.2196/12802, author="Blease, Charlotte and Kaptchuk, J. Ted and Bernstein, H. Michael and Mandl, D. Kenneth and Halamka, D. John and DesRoches, M. Catherine", title="Artificial Intelligence and the Future of Primary Care: Exploratory Qualitative Study of UK General Practitioners' Views", journal="J Med Internet Res", year="2019", month="Mar", day="20", volume="21", number="3", pages="e12802", keywords="artificial intelligence", keywords="attitudes", keywords="future", keywords="general practice", keywords="machine learning", keywords="opinions", keywords="primary care", keywords="qualitative research", keywords="technology", abstract="Background: The potential for machine learning to disrupt the medical profession is the subject of ongoing debate within biomedical informatics and related fields. Objective: This study aimed to explore general practitioners' (GPs') opinions about the potential impact of future technology on key tasks in primary care. 
Methods: In June 2018, we conducted a Web-based survey of 720 UK GPs' opinions about the likelihood of future technology to fully replace GPs in performing 6 key primary care tasks, and, if respondents considered replacement for a particular task likely, to estimate how soon the technological capacity might emerge. This study involved qualitative descriptive analysis of written responses (``comments'') to an open-ended question in the survey. Results: Comments were classified into 3 major categories in relation to primary care: (1) limitations of future technology, (2) potential benefits of future technology, and (3) social and ethical concerns. Perceived limitations included the beliefs that communication and empathy are exclusively human competencies; many GPs also considered clinical reasoning and the ability to provide value-based care as necessitating physicians' judgments. Perceived benefits of technology included expectations about improved efficiencies, in particular with respect to the reduction of administrative burdens on physicians. Social and ethical concerns encompassed multiple, divergent themes including the need to train more doctors to overcome workforce shortfalls and misgivings about the acceptability of future technology to patients. However, some GPs believed that the failure to adopt technological innovations could incur harms to both patients and physicians. Conclusions: This study presents timely information on physicians' views about the scope of artificial intelligence (AI) in primary care. Overwhelmingly, GPs considered the potential of AI to be limited. These views differ from the predictions of biomedical informaticians. More extensive, stand-alone qualitative work would provide a more in-depth understanding of GPs' views. ", doi="10.2196/12802", url="http://www.jmir.org/2019/3/e12802/", url="http://www.ncbi.nlm.nih.gov/pubmed/30892270" } @Article{info:doi/10.2196/11732, author="Bezemer, Tim and de Groot, CH Mark and Blasse, Enja and ten Berg, J. Maarten and Kappen, H. Teus and Bredenoord, L. Annelien and van Solinge, W. Wouter and Hoefer, E. Imo and Haitjema, Saskia", title="A Human(e) Factor in Clinical Decision Support Systems", journal="J Med Internet Res", year="2019", month="Mar", day="19", volume="21", number="3", pages="e11732", keywords="clinical decision support", keywords="big data", keywords="artificial intelligence", keywords="machine learning", keywords="deep learning", keywords="precision medicine", keywords="expert systems", keywords="data science", keywords="health care providers", doi="10.2196/11732", url="http://www.jmir.org/2019/3/e11732/", url="http://www.ncbi.nlm.nih.gov/pubmed/30888324" } @Article{info:doi/10.2196/11659, author="Lee, Jaehoon and Hulse, C. Nathan", title="An Analytics Framework for Physician Adherence to Clinical Practice Guidelines: Knowledge-Based Approach", journal="JMIR Biomed Eng", year="2019", month="Feb", day="27", volume="4", number="1", pages="e11659", keywords="clinical practice guidelines", keywords="care process model", keywords="visual analytics", keywords="clinical decision support", abstract="Background: One of the problems in evaluating clinical practice guidelines (CPGs) is the occurrence of knowledge gaps. These gaps may occur when evaluation logics and definitions in analytics pipelines are translated differently. 
Objective: The objective of this paper is to develop a systematic method that will fill in the cognitive and computational gaps of CPG knowledge components in analytics pipelines. Methods: We used locally developed CPGs that resulted in care process models (CPMs). We derived adherence definitions from the CPMs, transformed them into computationally executable queries, and deployed them into an enterprise knowledge base that specializes in managing clinical knowledge content. We developed a visual analytics framework, whose data pipelines are connected to queries in the knowledge base, to automate the extraction of data from clinical databases and calculation of evaluation metrics. Results: In this pilot study, we implemented 21 CPMs within the proposed framework, which is connected to an enterprise data warehouse (EDW) as a data source. We built a Web--based dashboard for monitoring and evaluating adherence to the CPMs. The dashboard ran for 18 months during which CPM adherence definitions were updated a number of times. Conclusions: The proposed framework was demonstrated to accommodate complicated knowledge management for CPM adherence evaluation in analytics pipelines using a knowledge base. At the same time, knowledge consistency and computational efficiency were maintained. ", doi="10.2196/11659", url="http://biomedeng.jmir.org/2019/1/e11659/" } @Article{info:doi/10.2196/10245, author="Khan, Sundas and Richardson, Safiya and Liu, Andrew and Mechery, Vinodh and McCullagh, Lauren and Schachter, Andy and Pardo, Salvatore and McGinn, Thomas", title="Improving Provider Adoption With Adaptive Clinical Decision Support Surveillance: An Observational Study", journal="JMIR Hum Factors", year="2019", month="Feb", day="20", volume="6", number="1", pages="e10245", keywords="pulmonary embolism", keywords="clinical decision support", keywords="evidence-based medicine", abstract="Background: Successful clinical decision support (CDS) tools can help use evidence-based medicine to effectively improve patient outcomes. However, the impact of these tools has been limited by low provider adoption due to overtriggering, leading to alert fatigue. We developed a tracking mechanism for monitoring trigger (percent of total visits for which the tool triggers) and adoption (percent of completed tools) rates of a complex CDS tool based on the Wells criteria for pulmonary embolism (PE). Objective: We aimed to monitor and evaluate the adoption and trigger rates of the tool and assess whether ongoing tool modifications would improve adoption rates. Methods: As part of a larger clinical trial, a CDS tool was developed using the Wells criteria to calculate pretest probability for PE at 2 tertiary centers' emergency departments (EDs). The tool had multiple triggers: any order for D-dimer, computed tomography (CT) of the chest with intravenous contrast, CT pulmonary angiography (CTPA), ventilation-perfusion scan, or lower extremity Doppler ultrasound. A tracking dashboard was developed using Tableau to monitor real-time trigger and adoption rates. Based on initial low provider adoption rates of the tool, we conducted small focus groups with key ED providers to elicit barriers to tool use. We identified overtriggering of the tool for non-PE-related evaluations and inability to order CT testing for intermediate-risk patients. Thus, the tool was modified to allow CT testing for the intermediate-risk group and not to trigger for CT chest with intravenous contrast orders. 
A dialogue box, ``Are you considering PE for this patient?'' was added before the tool triggered to account for CTPAs ordered for aortic dissection evaluation. Results: In the ED of tertiary center 1, 95,295 patients visited during the academic year. The tool triggered for an average of 509 patients per month (average trigger rate 2036/30,234, 6.73\%) before the modifications, reducing to 423 patients per month (average trigger rate 1629/31,361, 5.22\%). In the ED of tertiary center 2, 88,956 patients visited during the academic year, with the tool triggering for about 473 patients per month (average trigger rate 1892/29,706, 6.37\%) before the modifications and for about 400 per month (average trigger rate 1534/30,006, 5.12\%) afterward. The modifications resulted in a significant 4.5- and 3-fold increase in provider adoption rates in tertiary centers 1 and 2, respectively. The modifications increased the average monthly adoption rate from 23.20/360 (6.5\%) tools to 81.60/280.20 (29.3\%) tools and 46.60/318.80 (14.7\%) tools to 111.20/263.40 (42.6\%) tools in centers 1 and 2, respectively. Conclusions: Close postimplementation monitoring of CDS tools may help improve provider adoption. Adaptive modifications based on user feedback may increase targeted CDS with lower trigger rates, reducing alert fatigue and increasing provider adoption. Iterative improvements and a postimplementation monitoring dashboard can significantly improve adoption rates. ", doi="10.2196/10245", url="http://humanfactors.jmir.org/2019/1/e10245/", url="http://www.ncbi.nlm.nih.gov/pubmed/30785410" } @Article{info:doi/10.2196/11505, author="Cho, Insook and Boo, Eun-Hee and Chung, Eunja and Bates, W. David and Dykes, Patricia", title="Novel Approach to Inpatient Fall Risk Prediction and Its Cross-Site Validation Using Time-Variant Data", journal="J Med Internet Res", year="2019", month="Feb", day="19", volume="21", number="2", pages="e11505", keywords="across sites validation", keywords="electronic medical records", keywords="inpatient falls", keywords="nursing dataset", keywords="predictive model", abstract="Background: Electronic medical records (EMRs) contain a considerable amount of information about patients. The rapid adoption of EMRs and the integration of nursing data into clinical repositories have made large quantities of clinical data available for both clinical practice and research. Objective: In this study, we aimed to investigate whether readily available longitudinal EMR data including nursing records could be utilized to compute the risk of inpatient falls and to assess their accuracy compared with existing fall risk assessment tools. Methods: We used 2 study cohorts from 2 tertiary hospitals, located near Seoul, South Korea, with different EMR systems. The modeling cohort included 14,307 admissions (122,179 hospital days), and the validation cohort comprised 21,172 admissions (175,592 hospital days) from each of 6 nursing units. A probabilistic Bayesian network model was used, and patient data were divided into windows with a length of 24 hours. In addition, data on existing fall risk assessment tools, nursing processes, Korean Patient Classification System groups, and medications and administration data were used as model parameters. Model evaluation metrics were averaged using 10-fold cross-validation. 
Results: The initial model showed an error rate of 11.7\% and a spherical payoff of 0.91 with a c-statistic of 0.96, which represent far superior performance compared with that for the existing fall risk assessment tool (c-statistic=0.69). The cross-site validation revealed an error rate of 4.87\% and a spherical payoff of 0.96 with a c-statistic of 0.99 compared with a c-statistic of 0.65 for the existing fall risk assessment tool. The calibration curves for the model displayed more reliable results than those for the fall risk assessment tools alone. In addition, nursing intervention data showed potential contributions to reducing the variance in the fall rate as did the risk factors of individual patients. Conclusions: A risk prediction model that considers longitudinal EMR data including nursing interventions can improve the ability to identify individual patients likely to fall. ", doi="10.2196/11505", url="https://www.jmir.org/2019/2/e11505/", url="http://www.ncbi.nlm.nih.gov/pubmed/30777849" } @Article{info:doi/10.2196/11757, author="Park, Jaram and Kim, Jeong-Whun and Ryu, Borim and Heo, Eunyoung and Jung, Young Se and Yoo, Sooyoung", title="Patient-Level Prediction of Cardio-Cerebrovascular Events in Hypertension Using Nationwide Claims Data", journal="J Med Internet Res", year="2019", month="Feb", day="15", volume="21", number="2", pages="e11757", keywords="health risk appraisal", keywords="risk", keywords="hypertension", keywords="chronic disease", keywords="clustering and classification", keywords="decision support systems", abstract="Background: Prevention and management of chronic diseases are the main goals of national health maintenance programs. Previously widely used screening tools, such as Health Risk Appraisal, are restricted in achieving this goal due to their limitations, such as static characteristics, accessibility, and generalizability. Hypertension is one of the most important chronic diseases requiring management via the nationwide health maintenance program, and health care providers should inform patients about their risks of a complication caused by hypertension. Objective: Our goal was to develop and compare machine learning models predicting high-risk vascular diseases for hypertensive patients so that they can manage their blood pressure based on their risk level. Methods: We used a 12-year longitudinal dataset of the nationwide sample cohort, which contains the data of 514,866 patients and allows tracking of patients' medical history across all health care providers in Korea (N=51,920). To ensure the generalizability of our models, we conducted an external validation using another national sample cohort dataset, comprising one million different patients, published by the National Health Insurance Service. From each dataset, we obtained the data of 74,535 and 59,738 patients with essential hypertension and developed machine learning models for predicting cardiovascular and cerebrovascular events. Six machine learning models were developed and compared for evaluating performances based on validation metrics. Results: Machine learning algorithms enabled us to detect high-risk patients based on their medical history. 
The long short-term memory-based algorithm outperformed in the within test (F1-score=.772, external test F1-score=.613), and the random forest-based algorithm of risk prediction showed better performance over other machine learning algorithms concerning generalization (within test F1-score=.757, external test F1-score=.705). Concerning the number of features, in the within test, the long short-term memory-based algorithms outperformed regardless of the number of features. However, in the external test, the random forest-based algorithm was the best, irrespective of the number of features it encountered. Conclusions: We developed and compared machine learning models predicting high-risk vascular diseases in hypertensive patients so that they may manage their blood pressure based on their risk level. By relying on the prediction model, a government can predict high-risk patients at the nationwide level and establish health care policies in advance. ", doi="10.2196/11757", url="http://www.jmir.org/2019/2/e11757/", url="http://www.ncbi.nlm.nih.gov/pubmed/30767907" } @Article{info:doi/10.2196/12650, author="Amroze, Azraa and Field, S. Terry and Fouayzi, Hassan and Sundaresan, Devi and Burns, Laura and Garber, Lawrence and Sadasivam, S. Rajani and Mazor, M. Kathleen and Gurwitz, H. Jerry and Cutrona, L. Sarah", title="Use of Electronic Health Record Access and Audit Logs to Identify Physician Actions Following Noninterruptive Alert Opening: Descriptive Study", journal="JMIR Med Inform", year="2019", month="Feb", day="07", volume="7", number="1", pages="e12650", keywords="electronic health records", keywords="health services research", keywords="health information technology", keywords="health care communication", abstract="Background: Electronic health record (EHR) access and audit logs record behaviors of providers as they navigate the EHR. These data can be used to better understand provider responses to EHR--based clinical decision support (CDS), shedding light on whether and why CDS is effective. Objective: This study aimed to determine the feasibility of using EHR access and audit logs to track primary care physicians' (PCPs') opening of and response to noninterruptive alerts delivered to EHR InBaskets. Methods: We conducted a descriptive study to assess the use of EHR log data to track provider behavior. We analyzed data recorded following opening of 799 noninterruptive alerts sent to 75 PCPs' InBaskets through a prior randomized controlled trial. Three types of alerts highlighted new medication concerns for older patients' posthospital discharge: information only (n=593), medication recommendations (n=37), and test recommendations (n=169). We sought log data to identify the person opening the alert and the timing and type of PCPs' follow-up EHR actions (immediate vs by the end of the following day). We performed multivariate analyses examining associations between alert type, patient characteristics, provider characteristics, and contextual factors and likelihood of immediate or subsequent PCP action (general, medication-specific, or laboratory-specific actions). We describe challenges and strategies for log data use. Results: We successfully identified the required data in EHR access and audit logs. More than three-quarters of alerts (78.5\%, 627/799) were opened by the PCP to whom they were directed, allowing us to assess immediate PCP action; of these, 208 alerts were followed by immediate action. 
Expanding on our analyses to include alerts opened by staff or covering physicians, we found that an additional 330 of the 799 alerts demonstrated PCP action by the end of the following day. The remaining 261 alerts showed no PCP action. Compared to information-only alerts, the odds ratio (OR) of immediate action was 4.03 (95\% CI 1.67-9.72) for medication-recommendation and 2.14 (95\% CI 1.38-3.32) for test-recommendation alerts. Compared to information-only alerts, ORs of medication-specific action by end of the following day were significantly greater for medication recommendations (5.59; 95\% CI 2.42-12.94) and test recommendations (1.71; 95\% CI 1.09-2.68). We found a similar pattern for OR of laboratory-specific action. We encountered 2 main challenges: (1) Capturing a historical snapshot of EHR status (number of InBasket messages at time of alert delivery) required incorporation of data generated many months prior with longitudinal follow-up. (2) Accurately interpreting data elements required iterative work by a physician/data manager team taking action within the EHR and then examining audit logs to identify corresponding documentation. Conclusions: EHR log data could inform future efforts and provide valuable information during development and refinement of CDS interventions. To address challenges, use of these data should be planned before implementing an EHR--based study. ", doi="10.2196/12650", url="http://medinform.jmir.org/2019/1/e12650/", url="http://www.ncbi.nlm.nih.gov/pubmed/30730293" } @Article{info:doi/10.2196/11016, author="Wang, Chi-Shiang and Lin, Pei-Ju and Cheng, Ching-Lan and Tai, Shu-Hua and Kao Yang, Yea-Huei and Chiang, Jung-Hsien", title="Detecting Potential Adverse Drug Reactions Using a Deep Neural Network Model", journal="J Med Internet Res", year="2019", month="Feb", day="06", volume="21", number="2", pages="e11016", keywords="adverse drug reactions", keywords="deep neural network", keywords="drug representation", keywords="machine learning", keywords="pharmacovigilance", abstract="Background: Adverse drug reactions (ADRs) are common and are the underlying cause of over a million serious injuries and deaths each year. The most familiar method to detect ADRs is relying on spontaneous reports. Unfortunately, the low reporting rate of spontaneous reports is a serious limitation of pharmacovigilance. Objective: The objective of this study was to identify a method to detect potential ADRs of drugs automatically using a deep neural network (DNN). Methods: We designed a DNN model that utilizes the chemical, biological, and biomedical information of drugs to detect ADRs. This model aimed to fulfill two main purposes: identifying the potential ADRs of drugs and predicting the possible ADRs of a new drug. For improving the detection performance, we distributed representations of the target drugs in a vector space to capture the drug relationships using the word-embedding approach to process substantial biomedical literature. Moreover, we built a mapping function to address new drugs that do not appear in the dataset. Results: Using the drug information and the ADRs reported up to 2009, we predicted the ADRs of drugs recorded up to 2012. There were 746 drugs and 232 new drugs, which were only recorded in 2012 with 1325 ADRs. 
The experimental results showed that our model achieved a mean average precision at top-10 of 0.523 and an area under the receiver operating characteristic curve (AUC) of 0.844 for ADR prediction on the dataset. Conclusions: Our model is effective in identifying the potential ADRs of a drug and the possible ADRs of a new drug. Most importantly, it can detect potential ADRs irrespective of whether they have been reported in the past. ", doi="10.2196/11016", url="http://www.jmir.org/2019/2/e11016/", url="http://www.ncbi.nlm.nih.gov/pubmed/30724742" } @Article{info:doi/10.2196/12341, author="Yang, Cheng-Yi and Chen, Ray-Jade and Chou, Wan-Lin and Lee, Yuarn-Jang and Lo, Yu-Sheng", title="An Integrated Influenza Surveillance Framework Based on National Influenza-Like Illness Incidence and Multiple Hospital Electronic Medical Records for Early Prediction of Influenza Epidemics: Design and Evaluation", journal="J Med Internet Res", year="2019", month="Feb", day="01", volume="21", number="2", pages="e12341", keywords="influenza", keywords="epidemics", keywords="influenza surveillance", keywords="electronic disease surveillance", keywords="electronic medical records", keywords="electronic health records", keywords="public health", abstract="Background: Influenza is a leading cause of death worldwide and contributes to heavy economic losses to individuals and communities. Therefore, the early prediction of and interventions against influenza epidemics are crucial to reduce mortality and morbidity from this disease. Similar to other countries, the Taiwan Centers for Disease Control and Prevention (TWCDC) has implemented influenza surveillance and reporting systems, which primarily rely on influenza-like illness (ILI) data reported by health care providers, for the early prediction of influenza epidemics. However, these surveillance and reporting systems show at least a 2-week delay in prediction, indicating the need for improvement. Objective: We aimed to integrate the TWCDC ILI data with electronic medical records (EMRs) of multiple hospitals in Taiwan. Our ultimate goal was to develop a national influenza trend prediction and reporting tool more accurate and efficient than the current influenza surveillance and reporting systems. Methods: First, the influenza expertise team at Taipei Medical University Health Care System (TMUHcS) identified surveillance variables relevant to the prediction of influenza epidemics. Second, we developed a framework for integrating the EMRs of multiple hospitals with the ILI data from the TWCDC website to proactively provide results of influenza epidemic monitoring to hospital infection control practitioners. Third, using the TWCDC ILI data as the gold standard for influenza reporting, we calculated Pearson correlation coefficients to measure the strength of the linear relationship between TMUHcS EMRs and regional and national TWCDC ILI data for 2 weekly time series datasets. Finally, we used Moving Epidemic Method analyses to evaluate each surveillance variable for its predictive power for influenza epidemics. Results: Using this framework, we collected the EMRs and TWCDC ILI data of the past 3 influenza seasons (October 2014 to September 2017). 
On the basis of the EMRs of multiple hospitals, 3 surveillance variables, TMUHcS-ILI, TMUHcS-rapid influenza laboratory tests with positive results (RITP), and TMUHcS-influenza medication use (IMU), which reflected patients with ILI, those with positive results from rapid influenza diagnostic tests, and those treated with antiviral drugs, respectively, showed strong correlations with the TWCDC regional and national ILI data (r=.86-.98). The 2 surveillance variables---TMUHcS-RITP and TMUHcS-IMU---showed predictive power for influenza epidemics 3 to 4 weeks before the increase noted in the TWCDC ILI reports. Conclusions: Our framework periodically integrated and compared surveillance data from multiple hospitals and the TWCDC website to maintain a certain prediction quality and proactively provide monitored results. Our results can be extended to other infectious diseases, mitigating the time and effort required for data collection and analysis. Furthermore, this approach may be developed as a cost-effective electronic surveillance tool for the early and accurate prediction of epidemics of influenza and other infectious diseases in densely populated regions and nations. ", doi="10.2196/12341", url="http://www.jmir.org/2019/2/e12341/", url="http://www.ncbi.nlm.nih.gov/pubmed/30707099" } @Article{info:doi/10.2196/12790, author="Lee, Jen-Kuang and Hung, Chi-Sheng and Huang, Ching-Chang and Chen, Ying-Hsien and Chuang, Pao-Yu and Yu, Jiun-Yu and Ho, Yi-Lwun", title="Use of the CHA2DS2-VASc Score for Risk Stratification of Hospital Admissions Among Patients With Cardiovascular Diseases Receiving a Fourth-Generation Synchronous Telehealth Program: Retrospective Cohort Study", journal="J Med Internet Res", year="2019", month="Jan", day="31", volume="21", number="1", pages="e12790", keywords="CHA2DS2-VASc score", keywords="fourth-generation synchronous telehealth program", keywords="hospitalization", keywords="cardiovascular disease", abstract="Background: Telehealth programs are generally diverse in approaching patients, from traditional telephone calling and texting message and to the latest fourth-generation synchronous program. The predefined outcomes are also different, including hypertension control, lipid lowering, cardiovascular outcomes, and mortality. In previous studies, the telehealth program showed both positive and negative results, providing mixed and confusing clinical outcomes. A comprehensive and integrated approach is needed to determine which patients benefit from the program in order to improve clinical outcomes. Objective: The CHA2DS2-VASc (congestive heart failure, hypertension, age >75 years [doubled], type 2 diabetes mellitus, previous stroke, transient ischemic attack or thromboembolism [doubled], vascular disease, age of 65-75 years, and sex) score has been widely used for the prediction of stroke in patients with atrial fibrillation. This study investigated the CHA2DS2-VASc score to stratify patients with cardiovascular diseases receiving a fourth-generation synchronous telehealth program. Methods: This was a retrospective cohort study. We recruited patients with cardiovascular disease who received the fourth-generation synchronous telehealth program at the National Taiwan University Hospital between October 2012 and June 2015. We enrolled 431 patients who had joined a telehealth program and compared them to 1549 control patients. Risk of cardiovascular hospitalization was estimated with Kaplan-Meier curves. 
The CHA2DS2-VASc score was used as the composite parameter to stratify the severity of patients' conditions. The association between baseline characteristics and clinical outcomes was assessed via the Cox proportional hazard model. Results: The mean follow-up duration was 886.1 (SD 531.0) days in patients receiving the fourth-generation synchronous telehealth program and 707.1 (SD 431.4) days in the control group (P<.001). The telehealth group had more comorbidities at baseline than the control group. Higher CHA2DS2-VASc scores (≥4) were associated with a lower estimated rate of remaining free from cardiovascular hospitalization (46.5\% vs 54.8\%, log-rank P=.003). Patients with CHA2DS2-VASc scores ≥4 receiving the telehealth program were less likely to be admitted for cardiovascular disease than patients not receiving the program (61.5\% vs 41.8\%, log-rank P=.01). The telehealth program remained a significant prognostic factor after multivariable Cox analysis in patients with CHA2DS2-VASc scores ≥4 (hazard ratio=0.36 [CI 0.22-0.62], P<.001). Conclusions: A higher CHA2DS2-VASc score was associated with a higher risk of cardiovascular admissions. Patients accepting the fourth-generation telehealth program with CHA2DS2-VASc scores ≥4 benefit most by remaining free from cardiovascular hospitalization. ", doi="10.2196/12790", url="https://www.jmir.org/2019/1/e12790/", url="http://www.ncbi.nlm.nih.gov/pubmed/30702437" } @Article{info:doi/10.2196/12591, author="Luo, Gang and Stone, L. Bryan and Nkoy, L. Flory and He, Shan and Johnson, D. Michael", title="Predicting Appropriate Hospital Admission of Emergency Department Patients with Bronchiolitis: Secondary Analysis", journal="JMIR Med Inform", year="2019", month="Jan", day="22", volume="7", number="1", pages="e12591", keywords="bronchiolitis", keywords="appropriate hospital admission", keywords="emergency department", keywords="predictive model", keywords="machine learning", abstract="Background: In children below the age of 2 years, bronchiolitis is the most common reason for hospitalization. Each year in the United States, bronchiolitis causes 287,000 emergency department visits, 32\%-40\% of which result in hospitalization. Due to a lack of evidence and objective criteria for managing bronchiolitis, clinicians often make emergency department disposition decisions on hospitalization or discharge to home subjectively, leading to large practice variation. Our recent study provided the first operational definition of appropriate hospital admission for emergency department patients with bronchiolitis and showed that 6.08\% of emergency department disposition decisions for bronchiolitis were inappropriate. An accurate model for predicting appropriate hospital admission can guide emergency department disposition decisions for bronchiolitis and improve outcomes, but has not been developed thus far. Objective: The objective of this study was to develop a reasonably accurate model for predicting appropriate hospital admission. Methods: Using Intermountain Healthcare data from 2011-2014, we developed the first machine learning classification model to predict appropriate hospital admission for emergency department patients with bronchiolitis. 
Results: Our model achieved an accuracy of 90.66\% (3242/3576, 95\% CI: 89.68-91.64), a sensitivity of 92.09\% (1083/1176, 95\% CI: 90.33-93.56), a specificity of 89.96\% (2159/2400, 95\% CI: 88.69-91.17), and an area under the receiver operating characteristic curve of 0.960 (95\% CI: 0.954-0.966). We identified possible improvements to the model to guide future research on this topic. Conclusions: Our model has good accuracy for predicting appropriate hospital admission for emergency department patients with bronchiolitis. With further improvement, our model could serve as a foundation for building decision-support tools to guide disposition decisions for children with bronchiolitis presenting to emergency departments. International Registered Report Identifier (IRRID): RR2-10.2196/resprot.5155 ", doi="10.2196/12591", url="http://medinform.jmir.org/2019/1/e12591/", url="http://www.ncbi.nlm.nih.gov/pubmed/30668518" } @Article{info:doi/10.2196/10008, author="Duckworth, Megan and Adelman, Jason and Belategui, Katherine and Feliciano, Zinnia and Jackson, Emily and Khasnabish, Srijesa and Lehman, Sun I-Fong and Lindros, Ellen Mary and Mortimer, Heather and Ryan, Kasey and Scanlan, Maureen and Berger Spivack, Linda and Yu, Ping Shao and Bates, Westfall David and Dykes, C. Patricia", title="Assessing the Effectiveness of Engaging Patients and Their Families in the Three-Step Fall Prevention Process Across Modalities of an Evidence-Based Fall Prevention Toolkit: An Implementation Science Study", journal="J Med Internet Res", year="2019", month="Jan", day="21", volume="21", number="1", pages="e10008", keywords="clinical decision support", keywords="fall prevention", keywords="fall prevention toolkit", keywords="health information technology", keywords="implementation science", keywords="patient safety", abstract="Background: Patient falls are a major problem in hospitals. The development of a Patient-Centered Fall Prevention Toolkit, Fall TIPS (Tailoring Interventions for Patient Safety), reduced falls by 25\% in acute care hospitals by leveraging health information technology to complete the 3-step fall prevention process---(1) conduct fall risk assessments; (2) develop tailored fall prevention plans with the evidence-based interventions; and (3) consistently implement the plan. We learned that Fall TIPS was most effective when patients and family were engaged in all 3 steps of the fall prevention process. Over the past decade, our team developed 3 Fall TIPS modalities---the original electronic health record (EHR) version, a laminated paper version that uses color to provide clinical decision support linking patient-specific risk factors to the interventions, and a bedside display version that automatically populates the bedside monitor with the patients' fall prevention plan based on the clinical documentation in the EHR. However, the relative effectiveness of each Fall TIPS modality for engaging patients and family in the 3-step fall prevention process remains unknown. Objective: This study aims to examine if the Fall TIPS modality impacts patient engagement in the 3-step fall prevention process and thus Fall TIPS efficacy. Methods: To assess patient engagement in the 3-step fall prevention process, we conducted random audits with the question, ``Does the patient/family member know their fall prevention plan?'' In addition, audits were conducted to measure adherence, defined by the presence of the Fall TIPS poster at the bedside. 
Champions from 3 hospitals reported data from April to June 2017 on 6 neurology and 7 medical units. Peer-to-peer feedback to reiterate the best practice for patient engagement was central to data collection. Results: Overall, 1209 audits were submitted for the patient engagement measure and 1401 for the presence of the Fall TIPS poster at the bedside. All units reached 80\% adherence for both measures. While some units maintained high levels of patient engagement and adherence with the poster protocol, others showed improvement over time, reaching clinically significant adherence (>80\%) by the final month of data collection. Conclusions: Each Fall TIPS modality effectively facilitates patient engagement in the 3-step fall prevention process, suggesting all 3 can be used to integrate evidence-based fall prevention practices into the clinical workflow. The 3 Fall TIPS modalities may prove an effective strategy for the spread, allowing diverse institutions to choose the modality that fits with the organizational culture and health information technology infrastructure. ", doi="10.2196/10008", url="http://www.jmir.org/2019/1/e10008/", url="http://www.ncbi.nlm.nih.gov/pubmed/30664454" } @Article{info:doi/10.2196/10240, author="Bourla, Alexis and Ferreri, Florian and Ogorzelec, Laetitia and Peretti, Charles-Siegfried and Guinchard, Christian and Mouchabac, Stephane", title="Psychiatrists' Attitudes Toward Disruptive New Technologies: Mixed-Methods Study", journal="JMIR Ment Health", year="2018", month="Dec", day="14", volume="5", number="4", pages="e10240", keywords="acceptability", keywords="clinical decision support systems", keywords="computerized adaptive testing", keywords="digital phenotype", keywords="ecological momentary assessment", keywords="machine learning", keywords="mobile phone", keywords="professional culture", abstract="Background: Recent discoveries in the fields of machine learning (ML), Ecological Momentary Assessment (EMA), computerized adaptive testing (CAT), digital phenotype, imaging, and biomarkers have brought about a new paradigm shift in medicine. Objective: The aim of this study was to explore psychiatrists' perspectives on this paradigm through the prism of new clinical decision support systems (CDSSs). Our primary objective was to assess the acceptability of these new technologies. Our secondary objective was to characterize the factors affecting their acceptability. Methods: A sample of psychiatrists was recruited through a mailing list. Respondents completed a Web-based survey. A quantitative study with an original form of assessment involving the screenplay method was implemented involving 3 scenarios, each featuring 1 of the 3 support systems, namely, EMA and CAT, biosensors comprising a connected wristband-based digital phenotype, and an ML-based blood test or magnetic resonance imaging (MRI). We investigated 4 acceptability domains based on International Organization for Standardization and Nielsen models (usefulness, usability, reliability, and risk). Results: We recorded 515 observations. Regarding our primary objective, overall acceptability was moderate. MRI coupled with ML was considered to be the most useful system, and the connected wristband was considered the least. All the systems were described as risky (410/515, 79.6\%). Regarding our secondary objective, acceptability was strongly influenced by socioepidemiological variables (professional culture), such as gender, age, and theoretical approach. 
Conclusions: This is the first study to assess psychiatrists' views on new CDSSs. Data revealed moderate acceptability, but our analysis shows that this is more the result of a lack of knowledge about these new technologies than of a strong rejection. Furthermore, we found strong correspondences between acceptability profiles and professional culture profiles. Many medical, forensic, and ethical issues were raised, including therapeutic relationship, data security, data storage, and privacy risk. It is essential for psychiatrists to receive training and become involved in the development of new technologies. ", doi="10.2196/10240", url="http://mental.jmir.org/2018/4/e10240/", url="http://www.ncbi.nlm.nih.gov/pubmed/30552086" } @Article{info:doi/10.2196/12159, author="Li, Fei and Liu, Weisong and Yu, Hong", title="Extraction of Information Related to Adverse Drug Events from Electronic Health Record Notes: Design of an End-to-End Model Based on Deep Learning", journal="JMIR Med Inform", year="2018", month="Nov", day="26", volume="6", number="4", pages="e12159", keywords="adverse drug event", keywords="deep learning", keywords="multi-task learning", keywords="named entity recognition", keywords="natural language processing", keywords="relation extraction", abstract="Background: Pharmacovigilance and drug-safety surveillance are crucial for monitoring adverse drug events (ADEs), but the main ADE-reporting systems such as Food and Drug Administration Adverse Event Reporting System face challenges such as underreporting. Therefore, as complementary surveillance, data on ADEs are extracted from electronic health record (EHR) notes via natural language processing (NLP). As NLP develops, many up-to-date machine-learning techniques are introduced in this field, such as deep learning and multi-task learning (MTL). However, only a few studies have focused on employing such techniques to extract ADEs. Objective: We aimed to design a deep learning model for extracting ADEs and related information such as medications and indications. Since extraction of ADE-related information includes two steps---named entity recognition and relation extraction---our second objective was to improve the deep learning model using multi-task learning between the two steps. Methods: We employed the dataset from the Medication, Indication and Adverse Drug Events (MADE) 1.0 challenge to train and test our models. This dataset consists of 1089 EHR notes of cancer patients and includes 9 entity types such as Medication, Indication, and ADE and 7 types of relations between these entities. To extract information from the dataset, we proposed a deep-learning model that uses a bidirectional long short-term memory (BiLSTM) conditional random field network to recognize entities and a BiLSTM-Attention network to extract relations. To further improve the deep-learning model, we employed three typical MTL methods, namely, hard parameter sharing, parameter regularization, and task relation learning, to build three MTL models, called HardMTL, RegMTL, and LearnMTL, respectively. Results: Since extraction of ADE-related information is a two-step task, the result of the second step (ie, relation extraction) was used to compare all models. We used microaveraged precision, recall, and F1 as evaluation metrics. Our deep learning model achieved state-of-the-art results (F1=65.9\%), which is significantly higher than that (F1=61.7\%) of the best system in the MADE1.0 challenge. 
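The Li et al entry above describes a shared BiLSTM encoder serving both named entity recognition and relation extraction, with hard parameter sharing as one of the MTL variants. The sketch below is not the authors' model; it is a minimal PyTorch illustration of hard parameter sharing with hypothetical dimensions, a simplified entity-pair representation, and no CRF or attention layers.

```python
# Hedged sketch of hard parameter sharing: one shared BiLSTM encoder feeds an
# NER tagging head and a relation-classification head. Dimensions and pooling
# are assumptions for illustration, not the published architecture.
import torch
import torch.nn as nn

class SharedBiLSTMMTL(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=100, hidden=128,
                 n_entity_tags=19, n_relation_types=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared encoder: both tasks backpropagate into these parameters.
        self.encoder = nn.LSTM(emb_dim, hidden, bidirectional=True,
                               batch_first=True)
        # Task-specific heads.
        self.ner_head = nn.Linear(2 * hidden, n_entity_tags)
        self.rel_head = nn.Linear(4 * hidden, n_relation_types)

    def forward(self, token_ids, head_idx, tail_idx):
        h, _ = self.encoder(self.embed(token_ids))        # (B, T, 2H)
        ner_logits = self.ner_head(h)                      # per-token tag scores
        # Represent a candidate entity pair by the encoder states at the two
        # entity positions (a simplification of span pooling).
        batch = torch.arange(h.size(0))
        pair = torch.cat([h[batch, head_idx], h[batch, tail_idx]], dim=-1)
        rel_logits = self.rel_head(pair)                    # per-pair relation scores
        return ner_logits, rel_logits

# Usage: a joint loss would sum NER and relation cross-entropies over both heads.
model = SharedBiLSTMMTL()
tokens = torch.randint(0, 5000, (2, 40))
ner_logits, rel_logits = model(tokens, head_idx=torch.tensor([3, 5]),
                               tail_idx=torch.tensor([10, 20]))
```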
HardMTL further improved the F1 by 0.8\%, boosting the F1 to 66.7\%, whereas RegMTL and LearnMTL failed to boost the performance. Conclusions: Deep learning models can significantly improve the performance of ADE-related information extraction. MTL may be effective for named entity recognition and relation extraction, but it depends on the methods, data, and other factors. Our results can facilitate research on ADE detection, NLP, and machine learning. ", doi="10.2196/12159", url="http://medinform.jmir.org/2018/4/e12159/", url="http://www.ncbi.nlm.nih.gov/pubmed/30478023" } @Article{info:doi/10.2196/11144, author="Zhang, Kai and Liu, Xiyang and Liu, Fan and He, Lin and Zhang, Lei and Yang, Yahan and Li, Wangting and Wang, Shuai and Liu, Lin and Liu, Zhenzhen and Wu, Xiaohang and Lin, Haotian", title="An Interpretable and Expandable Deep Learning Diagnostic System for Multiple Ocular Diseases: Qualitative Study", journal="J Med Internet Res", year="2018", month="Nov", day="14", volume="20", number="11", pages="e11144", keywords="deep learning", keywords="object localization", keywords="multiple ocular diseases", keywords="interpretable and expandable diagnosis framework", keywords="making medical decisions", abstract="Background: Although artificial intelligence performs promisingly in medicine, few automatic disease diagnosis platforms can clearly explain why a specific medical decision is made. Objective: We aimed to devise and develop an interpretable and expandable diagnosis framework for automatically diagnosing multiple ocular diseases and providing treatment recommendations for the particular illness of a specific patient. Methods: As the diagnosis of ocular diseases highly depends on observing medical images, we chose ophthalmic images as research material. All medical images were labeled to 4 types of diseases or normal (total 5 classes); each image was decomposed into different parts according to the anatomical knowledge and then annotated. This process yields the positions and primary information on different anatomical parts and foci observed in medical images, thereby bridging the gap between medical image and diagnostic process. Next, we applied images and the information produced during the annotation process to implement an interpretable and expandable automatic diagnostic framework with deep learning. Results: This diagnosis framework comprises 4 stages. The first stage identifies the type of disease (identification accuracy, 93\%). The second stage localizes the anatomical parts and foci of the eye (localization accuracy: images under natural light without fluorescein sodium eye drops, 82\%; images under cobalt blue light or natural light with fluorescein sodium eye drops, 90\%). The third stage carefully classifies the specific condition of each anatomical part or focus with the result from the second stage (average accuracy for multiple classification problems, 79\%-98\%). The last stage provides treatment advice according to medical experience and artificial intelligence, which is merely involved with pterygium (accuracy, >95\%). Based on this, we developed a telemedical system that can show detailed reasons for a particular diagnosis to doctors and patients to help doctors with medical decision making. This system can carefully analyze medical images and provide treatment advice according to the analysis results and consultation between a doctor and a patient. 
Conclusions: The interpretable and expandable medical artificial intelligence platform was successfully built; this system can identify the disease, distinguish different anatomical parts and foci, discern the diagnostic information relevant to the diagnosis of diseases, and provide treatment suggestions. During this process, the whole diagnostic flow becomes clear and understandable to both doctors and their patients. Moreover, other diseases can be seamlessly integrated into this system without any influence on existing modules or diseases. Furthermore, this framework can assist in the clinical training of junior doctors. Owing to the scarcity of high-grade medical resources, not everyone can receive high-quality professional diagnosis and treatment services. This framework can not only be applied in hospitals with insufficient medical resources to decrease the pressure on experienced doctors but also be deployed in remote areas to help doctors diagnose common ocular diseases. ", doi="10.2196/11144", url="http://www.jmir.org/2018/11/e11144/", url="http://www.ncbi.nlm.nih.gov/pubmed/30429111" } @Article{info:doi/10.2196/10497, author="Leroy, Gondy and Gu, Yang and Pettygrove, Sydney and Galindo, K. Maureen and Arora, Ananyaa and Kurzius-Spencer, Margaret", title="Automated Extraction of Diagnostic Criteria From Electronic Health Records for Autism Spectrum Disorders: Development, Evaluation, and Application", journal="J Med Internet Res", year="2018", month="Nov", day="07", volume="20", number="11", pages="e10497", keywords="parser", keywords="natural language processing", keywords="complex entity extraction", keywords="Autism Spectrum Disorder", keywords="DSM", keywords="electronic health records", keywords="decision tree", keywords="machine learning", abstract="Background: Electronic health records (EHRs) bring many opportunities for information utilization. One such use is the surveillance conducted by the Centers for Disease Control and Prevention to track cases of autism spectrum disorder (ASD). This process currently comprises manual collection and review of EHRs of 4- and 8-year old children in 11 US states for the presence of ASD criteria. The work is time-consuming and expensive. Objective: Our objective was to automatically extract from EHRs the description of behaviors noted by the clinicians in evidence of the diagnostic criteria in the Diagnostic and Statistical Manual of Mental Disorders (DSM). Previously, we reported on the classification of entire EHRs as ASD or not. In this work, we focus on the extraction of individual expressions of the different ASD criteria in the text. We intend to facilitate large-scale surveillance efforts for ASD and support analysis of changes over time as well as enable integration with other relevant data. Methods: We developed a natural language processing (NLP) parser to extract expressions of 12 DSM criteria using 104 patterns and 92 lexicons (1787 terms). The parser is rule-based to enable precise extraction of the entities from the text. The entities themselves are encompassed in the EHRs as very diverse expressions of the diagnostic criteria written by different people at different times (clinicians, speech pathologists, among others). Due to the sparsity of the data, a rule-based approach is best suited until larger datasets can be generated for machine learning algorithms. Results: We evaluated our rule-based parser and compared it with a machine learning baseline (decision tree). 
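The Leroy et al entry above describes a rule-based parser built from patterns and lexicons. As a toy illustration only, the following sketch shows lexicon-driven criterion matching; the criterion labels and trigger phrases are invented placeholders, not the study's 104 patterns or 92 lexicons.

```python
# Illustrative-only sketch of lexicon- and pattern-based criterion extraction.
import re
from typing import Dict, List

# Hypothetical lexicons: DSM criterion label -> trigger phrases.
LEXICONS: Dict[str, List[str]] = {
    "A1_social": ["poor eye contact", "does not respond to name"],
    "A3_restricted_interests": ["lines up toys", "hand flapping"],
}

# Compile one alternation pattern per criterion, bounded at word edges.
PATTERNS = {
    criterion: re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b",
                          re.IGNORECASE)
    for criterion, terms in LEXICONS.items()
}

def extract_criteria(sentence: str) -> Dict[str, List[str]]:
    """Return, per criterion, the expressions matched in one EHR sentence."""
    hits = {}
    for criterion, pattern in PATTERNS.items():
        matches = pattern.findall(sentence)
        if matches:
            hits[criterion] = matches
    return hits

print(extract_criteria("Child lines up toys for hours and has poor eye contact."))
# {'A1_social': ['poor eye contact'], 'A3_restricted_interests': ['lines up toys']}
```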
Using a test set of 6636 sentences (50 EHRs), we found that our parser achieved 76\% precision, 43\% recall (ie, sensitivity), and >99\% specificity for criterion extraction. The performance was better for the rule-based approach than for the machine learning baseline (60\% precision and 30\% recall). For some individual criteria, precision was as high as 97\% and recall 57\%. Since precision was very high, we were assured that criteria were rarely assigned incorrectly, and our numbers presented a lower bound of their presence in EHRs. We then conducted a case study and parsed 4480 new EHRs covering 10 years of surveillance records from the Arizona Developmental Disabilities Surveillance Program. The social criteria (A1 criteria) showed the biggest change over the years. The communication criteria (A2 criteria) did not distinguish the ASD from the non-ASD records. Among behaviors and interests criteria (A3 criteria), 1 (A3b) was present with much greater frequency in the ASD than in the non-ASD EHRs. Conclusions: Our results demonstrate that NLP can support large-scale analysis useful for ASD surveillance and research. In the future, we intend to facilitate detailed analysis and integration of national datasets. ", doi="10.2196/10497", url="https://www.jmir.org/2018/11/e10497/", url="http://www.ncbi.nlm.nih.gov/pubmed/30404767" } @Article{info:doi/10.2196/10498, author="Luo, Gang and Johnson, D. Michael and Nkoy, L. Flory and He, Shan and Stone, L. Bryan", title="Appropriateness of Hospital Admission for Emergency Department Patients with Bronchiolitis: Secondary Analysis", journal="JMIR Med Inform", year="2018", month="Nov", day="05", volume="6", number="4", pages="e10498", keywords="appropriate hospital admission", keywords="bronchiolitis", keywords="emergency department", keywords="operational definition", abstract="Background: Bronchiolitis is the leading cause of hospitalization in children under 2 years of age. Each year in the United States, bronchiolitis results in 287,000 emergency department visits, 32\%-40\% of which end in hospitalization. Frequently, emergency department disposition decisions (to discharge or hospitalize) are made subjectively because of the lack of evidence and objective criteria for bronchiolitis management, leading to significant practice variation, wasted health care use, and suboptimal outcomes. At present, no operational definition of appropriate hospital admission for emergency department patients with bronchiolitis exists. Yet, such a definition is essential for assessing care quality and building a predictive model to guide and standardize disposition decisions. Our prior work provided a framework of such a definition using 2 concepts, one on safe versus unsafe discharge and another on necessary versus unnecessary hospitalization. Objective: The goal of this study was to determine the 2 threshold values used in the 2 concepts, with 1 value per concept. Methods: Using Intermountain Healthcare data from 2005-2014, we examined distributions of several relevant attributes of emergency department visits by children under 2 years of age for bronchiolitis. Via a data-driven approach, we determined the 2 threshold values. Results: We completed the first operational definition of appropriate hospital admission for emergency department patients with bronchiolitis. 
Appropriate hospital admissions include actual admissions with exposure to major medical interventions for more than 6 hours, as well as actual emergency department discharges, followed by an emergency department return within 12 hours ending in admission for bronchiolitis. Based on the definition, 0.96\% (221/23,125) of the emergency department discharges were deemed unsafe. Moreover, 14.36\% (432/3008) of the hospital admissions from the emergency department were deemed unnecessary. Conclusions: Our operational definition can define the prediction target for building a predictive model to guide and improve emergency department disposition decisions for bronchiolitis in the future. ", doi="10.2196/10498", url="http://medinform.jmir.org/2018/4/e10498/", url="http://www.ncbi.nlm.nih.gov/pubmed/30401659" } @Article{info:doi/10.2196/medinform.9957, author="Richardson, Safiya and Solomon, Philip and O'Connell, Alexander and Khan, Sundas and Gong, Jonathan and Makhnevich, Alex and Qiu, Guang and Zhang, Meng and McGinn, Thomas", title="A Computerized Method for Measuring Computed Tomography Pulmonary Angiography Yield in the Emergency Department: Validation Study", journal="JMIR Med Inform", year="2018", month="Oct", day="25", volume="6", number="4", pages="e44", keywords="health informatics", keywords="pulmonary embolism", keywords="electronic health record", keywords="quality improvement", keywords="clinical decision support systems", abstract="Background: Use of computed tomography pulmonary angiography (CTPA) in the assessment of pulmonary embolism (PE) has markedly increased over the past two decades. While this technology has improved the accuracy of radiological testing for PE, CTPA also carries the risk of substantial iatrogenic harm. Each CTPA carries a 14\% risk of contrast-induced nephropathy and a lifetime malignancy risk that can be as high as 2.76\%. The appropriate use of CTPA can be estimated by monitoring the CTPA yield, the percentage of tests positive for PE. This is the first study to propose and validate a computerized method for measuring the CTPA yield in the emergency department (ED). Objective: The objective of our study was to assess the validity of a novel computerized method of calculating the CTPA yield in the ED. Methods: The electronic health record databases at two tertiary care academic hospitals were queried for CTPA orders completed in the ED over 1-month periods. These visits were linked with an inpatient admission with a discharge diagnosis of PE based on the International Classification of Diseases codes. The computerized CTPA yield was calculated as the number of CTPA orders with an associated inpatient discharge diagnosis of PE divided by the total number of orders for completed CTPA. This computerized method was then validated by 2 independent reviewers performing a manual chart review, which included reading the free-text radiology reports for each CTPA. Results: A total of 349 CTPA orders were completed during the 1-month periods at the two institutions. Of them, acute PE was diagnosed on CTPA in 28 studies, with a CTPA yield of 7.7\%. The computerized method correctly identified 27 of 28 scans positive for PE. The one discordant scan was tied to a patient who was discharged directly from the ED and, as a result, never received an inpatient discharge diagnosis. Conclusions: This is the first successful validation study of a computerized method for calculating the CTPA yield in the ED. 
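The computerized CTPA yield described in the Richardson et al entry above is a simple ratio once completed CTPA orders are linked to inpatient discharge diagnoses of PE. A minimal sketch, assuming hypothetical column names and example ICD-10 codes; this is not the study's actual schema or code list.

```python
# Minimal sketch: CTPA yield = completed CTPA orders linked to an inpatient
# discharge diagnosis of PE, divided by all completed CTPA orders.
import pandas as pd

PE_ICD10_CODES = {"I26.0", "I26.9"}  # assumed example codes for pulmonary embolism

ctpa_orders = pd.DataFrame({
    "encounter_id": [1, 2, 3, 4],
    "ctpa_completed": [True, True, True, True],
})
discharge_dx = pd.DataFrame({
    "encounter_id": [2, 4],
    "icd10": ["I26.9", "J18.9"],
})

def ctpa_yield(orders: pd.DataFrame, dx: pd.DataFrame) -> float:
    completed = orders[orders["ctpa_completed"]]
    pe_encounters = set(dx.loc[dx["icd10"].isin(PE_ICD10_CODES), "encounter_id"])
    positives = completed["encounter_id"].isin(pe_encounters).sum()
    return positives / len(completed)

print(f"CTPA yield: {ctpa_yield(ctpa_orders, discharge_dx):.1%}")  # 25.0% on this toy data
```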
This method for data extraction allows for an accurate determination of the CTPA yield and is more efficient than manual chart review. With this ability, health care systems can monitor the appropriate use of CTPA and the effect of interventions to reduce overuse and decrease preventable iatrogenic harm. ", doi="10.2196/medinform.9957", url="http://medinform.jmir.org/2018/4/e44/", url="http://www.ncbi.nlm.nih.gov/pubmed/30361200" } @Article{info:doi/10.2196/10780, author="Wells, J. Brian and Lenoir, M. Kristin and Diaz-Garelli, Jose-Franck and Futrell, Wendell and Lockerman, Elizabeth and Pantalone, M. Kevin and Kattan, W. Michael", title="Predicting Current Glycated Hemoglobin Values in Adults: Development of an Algorithm From the Electronic Health Record", journal="JMIR Med Inform", year="2018", month="Oct", day="22", volume="6", number="4", pages="e10780", keywords="electronic health records", keywords="risk prediction", keywords="clinical decision support", keywords="hemoglobin A1c", keywords="diabetes", abstract="Background: Electronic, personalized clinical decision support tools to optimize glycated hemoglobin (HbA1c) screening are lacking. Current screening guidelines are based on simple, categorical rules developed for populations of patients. Although personalized diabetes risk calculators have been created, none are designed to predict current glycemic status using structured data commonly available in electronic health records (EHRs). Objective: The goal of this project was to create a mathematical equation for predicting the probability of current elevations in HbA1c (?5.7\%) among patients with no history of hyperglycemia using readily available variables that will allow integration with EHR systems. Methods: The reduced model was compared head-to-head with calculators created by Baan and Griffin. Ten-fold cross-validation was used to calculate the bias-adjusted prediction accuracy of the new model. Statistical analyses were performed in R version 3.2.5 (The R Foundation for Statistical Computing) using the rms (Regression Modeling Strategies) package. Results: The final model to predict an elevated HbA1c based on 22,635 patient records contained the following variables in order from most to least importance according to their impact on the discriminating accuracy of the model: age, body mass index, random glucose, race, serum non--high-density lipoprotein, serum total cholesterol, estimated glomerular filtration rate, and smoking status. The new model achieved a concordance statistic of 0.77 which was statistically significantly better than prior models. The model appeared to be well calibrated according to a plot of the predicted probabilities versus the prevalence of the outcome at different probabilities. Conclusions: The calculator created for predicting the probability of having an elevated HbA1c significantly outperformed the existing calculators. The personalized prediction model presented in this paper could improve the efficiency of HbA1c screening initiatives. ", doi="10.2196/10780", url="http://medinform.jmir.org/2018/4/e10780/", url="http://www.ncbi.nlm.nih.gov/pubmed/30348631" } @Article{info:doi/10.2196/11936, author="Brinker, Josef Titus and Hekler, Achim and Utikal, Sven Jochen and Grabe, Niels and Schadendorf, Dirk and Klode, Joachim and Berking, Carola and Steeb, Theresa and Enk, H. 
Alexander and von Kalle, Christof", title="Skin Cancer Classification Using Convolutional Neural Networks: Systematic Review", journal="J Med Internet Res", year="2018", month="Oct", day="17", volume="20", number="10", pages="e11936", keywords="skin cancer", keywords="convolutional neural networks", keywords="lesion classification", keywords="deep learning", keywords="melanoma classification", keywords="carcinoma classification", abstract="Background: State-of-the-art classifiers based on convolutional neural networks (CNNs) were shown to classify images of skin cancer on par with dermatologists and could enable lifesaving and fast diagnoses, even outside the hospital via installation of apps on mobile devices. To our knowledge, at present there is no review of the current work in this research area. Objective: This study presents the first systematic review of the state-of-the-art research on classifying skin lesions with CNNs. We limit our review to skin lesion classifiers. In particular, methods that apply a CNN only for segmentation or for the classification of dermoscopic patterns are not considered here. Furthermore, this study discusses why the comparability of the presented procedures is very difficult and which challenges must be addressed in the future. Methods: We searched the Google Scholar, PubMed, Medline, ScienceDirect, and Web of Science databases for systematic reviews and original research articles published in English. Only papers that reported sufficient scientific proceedings are included in this review. Results: We found 13 papers that classified skin lesions using CNNs. In principle, classification methods can be differentiated according to three principles. Approaches that use a CNN already trained by means of another large dataset and then optimize its parameters to the classification of skin lesions are the most common ones used and they display the best performance with the currently available limited datasets. Conclusions: CNNs display a high performance as state-of-the-art skin lesion classifiers. Unfortunately, it is difficult to compare different classification methods because some approaches use nonpublic datasets for training and/or testing, thereby making reproducibility difficult. Future publications should use publicly available benchmarks and fully disclose methods used for training to allow comparability. ", doi="10.2196/11936", url="http://www.jmir.org/2018/10/e11936/", url="http://www.ncbi.nlm.nih.gov/pubmed/30333097" } @Article{info:doi/10.2196/11087, author="Liu, Chaoyuan and Liu, Xianling and Wu, Fang and Xie, Mingxuan and Feng, Yeqian and Hu, Chunhong", title="Using Artificial Intelligence (Watson for Oncology) for Treatment Recommendations Amongst Chinese Patients with Lung Cancer: Feasibility Study", journal="J Med Internet Res", year="2018", month="Sep", day="25", volume="20", number="9", pages="e11087", keywords="Watson for Oncology", keywords="artificial intelligence", keywords="lung neoplasms", keywords="comparative study", keywords="interdisciplinary communication", abstract="Background: Artificial intelligence (AI) is developing quickly in the medical field and can benefit both medical staff and patients. The clinical decision support system Watson for Oncology (WFO) is an outstanding representative AI in the medical field, and it can provide to cancer patients prompt treatment recommendations comparable with ones made by expert oncologists. 
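The Brinker et al review above reports that fine-tuning a CNN pretrained on another large dataset is the most common and best-performing approach for skin lesion classification with limited data. A generic transfer-learning sketch follows; the backbone, class count, and training settings are assumptions for illustration, not any reviewed paper's configuration.

```python
# Hedged transfer-learning sketch: reuse a CNN pretrained on a large generic
# image dataset and re-fit only its final layer for skin lesion classes.
import torch
import torch.nn as nn
from torchvision import models

NUM_LESION_CLASSES = 3  # assumed example: melanoma / nevus / seborrheic keratosis

model = models.resnet18(pretrained=True)  # newer torchvision versions use weights=...
for param in model.parameters():
    param.requires_grad = False           # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, NUM_LESION_CLASSES)  # new classification head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of 224x224 RGB images.
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, NUM_LESION_CLASSES, (4,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```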
WFO is increasingly being used in China, but limited reports on whether WFO is suitable for Chinese patients, especially patients with lung cancer, exist. Here, we report a retrospective study based on the consistency between the lung cancer treatment recommendations made for the same patient by WFO and by the multidisciplinary team at our center. Objective: The aim of this study was to explore the feasibility of using WFO for lung cancer cases in China and to ascertain ways to make WFO more suitable for Chinese patients with lung cancer. Methods: We selected all lung cancer patients who were hospitalized and received antitumor treatment for the first time at the Second Xiangya Hospital Cancer Center from September to December 2017 (N=182). WFO made treatment recommendations for all supported cases (n=149). If the actual therapeutic regimen (administered by our multidisciplinary team) was recommended or for consideration according to WFO, we defined the recommendations as consistent; if the actual therapeutic regimen was not recommended by WFO or if WFO did not provide the same treatment option, we defined the recommendations as inconsistent. Blinded second round reviews were performed by our multidisciplinary team to reassess the incongruent cases. Results: WFO did not support 18.1\% (33/182) of recommendations among all cases. Of the 149 supported cases, 65.8\% (98/149) received recommendations that were consistent with the recommendations of our team. Logistic regression analysis showed that pathological type and staging had significant effects on consistency (P=.004, odds ratio [OR] 0.09, 95\% CI 0.02-0.45 and P<.001, OR 9.5, 95\% CI 3.4-26.1, respectively). Age, gender, and presence of epidermal growth factor receptor gene mutations had no effect on consistency. In 82\% (42/51) of the inconsistent cases, our team administered two China-specific treatments, which were different from the recommendations made by WFO but led to excellent outcomes. Conclusions: In China, most of the treatment recommendations of WFO are consistent with the recommendations of the expert group, although a relatively high proportion of cases are still not supported by WFO. Therefore, WFO cannot currently replace oncologists. WFO can improve the efficiency of clinical work by providing assistance to doctors, but it needs to learn the regional characteristics of patients to improve its assistive ability. ", doi="10.2196/11087", url="http://www.jmir.org/2018/9/e11087/", url="http://www.ncbi.nlm.nih.gov/pubmed/30257820" } @Article{info:doi/10.2196/humanfactors.9891, author="Mercer, Kathryn and Burns, Catherine and Guirguis, Lisa and Chin, Jessie and Dogba, Joyce Maman and Dolovich, Lisa and Gu{\'e}nette, Line and Jenkins, Laurie and L{\'e}gar{\'e}, France and McKinnon, Annette and McMurray, Josephine and Waked, Khrystine and Grindrod, A. Kelly", title="Physician and Pharmacist Medication Decision-Making in the Time of Electronic Health Records: Mixed-Methods Study", journal="JMIR Hum Factors", year="2018", month="Sep", day="25", volume="5", number="3", pages="e24", keywords="shared decision-making", keywords="electronic health records", keywords="collaboration", keywords="interprofessional collaboration", keywords="medication management", abstract="Background: Primary care needs to be patient-centered, integrated, and interprofessional to help patients with complex needs manage the burden of medication-related problems. 
Considering the growing problem of polypharmacy, increasing attention has been paid to how and when medication-related decisions should be coordinated across multidisciplinary care teams. Improved knowledge on how integrated electronic health records (EHRs) can support interprofessional shared decision-making for medication therapy management is necessary to continue improving patient care. Objective: The objective of our study was to examine how physicians and pharmacists understand and communicate patient-focused medication information with each other and how this knowledge can influence the design of EHRs. Methods: This study is part of a broader cross-Canada study between patients and health care providers around how medication-related decisions are made and communicated. We visited community pharmacies, team-based primary care clinics, and independent-practice family physician clinics throughout Ontario, Nova Scotia, Alberta, and Quebec. Research assistants conducted semistructured interviews with physicians and pharmacists. A modified version of the Multidisciplinary Framework Method was used to analyze the data. Results: We collected data from 19 pharmacies and 9 medical clinics and identified 6 main themes from 34 health care professionals. First, Interprofessional Shared Decision-Making was not occurring and clinicians made decisions based on their understanding of the patient. Physicians and pharmacists reported indirect Communication, incomplete Information specifically missing insight into indication and adherence, and misaligned Processes of Care that were further compounded by EHRs that are not designed to facilitate collaboration. Scope of Practice examined professional and workplace boundaries for pharmacists and physicians that were internally and externally imposed. Physicians decided on the degree of the Physician-Pharmacist Relationship, often predicated by colocation. Conclusions: We observed limited communication and collaboration between primary care providers and pharmacists when managing medications. Pharmacists were missing key information around reason for use, and physicians required accurate information around adherence. EHRs are a potential tool to help clinicians communicate information to resolve this issue. EHRs need to be designed to facilitate interprofessional medication management so that pharmacists and physicians can move beyond task-based work toward a collaborative approach. ", doi="10.2196/humanfactors.9891", url="http://humanfactors.jmir.org/2018/3/e24/", url="http://www.ncbi.nlm.nih.gov/pubmed/30274959" } @Article{info:doi/10.2196/jmir.9227, author="Orchard, Peter and Agakova, Anna and Pinnock, Hilary and Burton, David Christopher and Sarran, Christophe and Agakov, Felix and McKinstry, Brian", title="Improving Prediction of Risk of Hospital Admission in Chronic Obstructive Pulmonary Disease: Application of Machine Learning to Telemonitoring Data", journal="J Med Internet Res", year="2018", month="Sep", day="21", volume="20", number="9", pages="e263", keywords="machine learning", keywords="telemedicine", keywords="chronic obstructive pulmonary disease", abstract="Background: Telemonitoring of symptoms and physiological signs has been suggested as a means of early detection of chronic obstructive pulmonary disease (COPD) exacerbations, with a view to instituting timely treatment. However, algorithms to identify exacerbations result in frequent false-positive results and increased workload. 
Machine learning, when applied to predictive modelling, can determine patterns of risk factors useful for improving prediction quality. Objective: Our objectives were to (1) establish whether machine learning techniques applied to telemonitoring datasets improve prediction of hospital admissions and decisions to start corticosteroids, and (2) determine whether the addition of weather data further improves such predictions. Methods: We used daily symptoms, physiological measures, and medication data, with baseline demography, COPD severity, quality of life, and hospital admissions from a pilot and large randomized controlled trial of telemonitoring in COPD. We linked weather data from the United Kingdom meteorological service. We used feature selection and extraction techniques for time series to construct up to 153 predictive patterns (features) from symptom, medication, and physiological measurements. We used the resulting variables to construct predictive models fitted to training sets of patients and compared them with common symptom-counting algorithms. Results: We had a mean 363 days of telemonitoring data from 135 patients. The two most practical traditional score-counting algorithms, restricted to cases with complete data, resulted in area under the receiver operating characteristic curve (AUC) estimates of 0.60 (95\% CI 0.51-0.69) and 0.58 (95\% CI 0.50-0.67) for predicting admissions based on a single day's readings. However, in a real-world scenario allowing for missing data, with greater numbers of patient daily data and hospitalizations (N=57,150, N+=55, respectively), the performance of all the traditional algorithms fell, including those based on 2 days' data. One of the most frequently used algorithms performed no better than chance. All considered machine learning models demonstrated significant improvements; the best machine learning algorithm based on 57,150 episodes resulted in an aggregated AUC of 0.74 (95\% CI 0.67-0.80). Adding weather data measurements did not improve the predictive performance of the best model (AUC 0.74, 95\% CI 0.69-0.79). To achieve an 80\% true-positive rate (sensitivity), the traditional algorithms were associated with an 80\% false-positive rate: our algorithm halved this rate to approximately 40\% (specificity approximately 60\%). The machine learning algorithm was moderately superior to the best symptom-counting algorithm (AUC 0.77, 95\% CI 0.74-0.79 vs AUC 0.66, 95\% CI 0.63-0.68) at predicting the need for corticosteroids. Conclusions: Early detection and management of COPD remains an important goal given its huge personal and economic costs. Machine learning approaches, which can be tailored to an individual's baseline profile and can learn from experience of the individual patient, are superior to existing predictive algorithms and show promise in achieving this goal. Trial Registration: International Standard Randomized Controlled Trial Number ISRCTN96634935; http://www.isrctn.com/ISRCTN96634935 (Archived by WebCite at http://www.webcitation.org/722YkuhAz) ", doi="10.2196/jmir.9227", url="http://www.jmir.org/2018/9/e263/", url="http://www.ncbi.nlm.nih.gov/pubmed/30249589" } @Article{info:doi/10.2196/jmir.8206, author="L{\'e}gat, Laura and Van Laere, Sven and Nyssen, Marc and Steurbaut, Stephane and Dupont, G. 
Alain and Cornu, Pieter", title="Clinical Decision Support Systems for Drug Allergy Checking: Systematic Review", journal="J Med Internet Res", year="2018", month="Sep", day="07", volume="20", number="9", pages="e258", keywords="alert", keywords="clinical decision support systems", keywords="computerized physician order entry", keywords="drug allergy", keywords="patient safety", abstract="Background: Worldwide, the burden of allergies---in particular, drug allergies---is growing. In the process of prescribing, dispensing, or administering a drug, a medication error may occur and can have adverse consequences; for example, a drug may be given to a patient with a documented allergy to that particular drug. Computerized physician order entry (CPOE) systems with built-in clinical decision support systems (CDSS) have the potential to prevent such medication errors and adverse events. Objective: The aim of this review is to provide a comprehensive overview regarding all aspects of CDSS for drug allergy, including documenting, coding, rule bases, alerts and alert fatigue, and outcome evaluation. Methods: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed as much as possible and searches were conducted in 5 databases using CPOE, CDSS, alerts, and allergic or allergy as keywords. Bias could not be evaluated according to PRISMA guidelines due to the heterogeneity of study types included in the review. Results: Of the 3160 articles considered, 60 met the inclusion criteria. A further 9 articles were added based on expert opinion, resulting in a total of 69 articles. An interrater agreement of 90.9\% with a reliability $\kappa$=.787 (95\% CI 0.686-0.888) was reached. Large heterogeneity across study objectives, study designs, study populations, and reported results was found. Several key findings were identified. Evidence of the usefulness of clinical decision support for drug allergies has been documented. Nevertheless, there are some important problems associated with their use. Accurate and structured documenting of information on drug allergies in electronic health records (EHRs) is difficult, as it is often not clear to healthcare providers how and where to document drug allergies. Besides the underreporting of drug allergies, outdated or inaccurate drug allergy information in EHRs poses an important problem. Research on the use of coding terminologies for documenting drug allergies is sparse. There is no generally accepted standard terminology for structured documentation of allergy information. The final key finding is the consistently reported low specificity of drug allergy alerts. Current systems have high alert override rates of up to 90\%, leading to alert fatigue. Important challenges remain for increasing the specificity of drug allergy alerts. We found only one study specifically reporting outcomes related to CDSS for drug allergies. It showed that adverse drug events resulting from overridden drug allergy alerts do not occur frequently. Conclusions: Accurate and comprehensive recording of drug allergies is required for good use of CDSS for drug allergy screening. We found considerable variation in the way drug allergies are recorded in EHRs. It remains difficult to reduce drug allergy alert overload while maintaining patient safety as the highest priority. Future research should focus on improving alert specificity, thereby reducing override rates and alert fatigue. 
Also, the effect on patient outcomes and cost-effectiveness should be evaluated. ", doi="10.2196/jmir.8206", url="http://www.jmir.org/2018/9/e258/", url="http://www.ncbi.nlm.nih.gov/pubmed/30194058" } @Article{info:doi/10.2196/jmir.9454, author="Chai, Ray Peter and Zhang, Haipeng and Baugh, W. Christopher and Jambaulikar, D. Guruprasad and McCabe, C. Jonathan and Gorman, M. Janet and Boyer, W. Edward and Landman, Adam", title="Internet of Things Buttons for Real-Time Notifications in Hospital Operations: Proposal for Hospital Implementation", journal="J Med Internet Res", year="2018", month="Aug", day="10", volume="20", number="8", pages="e251", keywords="Internet of Things", keywords="operations", keywords="hospital systems", keywords="health care", abstract="Background: Hospital staff frequently performs the same process hundreds to thousands of times a day. Customizable Internet of Things buttons are small, wirelessly-enabled devices that trigger specific actions with the press of an integrated button and have the potential to automate some of these repetitive tasks. In addition, IoT buttons generate logs of triggered events that can be used for future process improvements. Although Internet of Things buttons have seen some success as consumer products, little has been reported on their application in hospital systems. Objective: We discuss potential hospital applications categorized by the intended user group (patient or hospital staff). In addition, we examine key technological considerations, including network connectivity, security, and button management systems. Methods: In order to meaningfully deploy Internet of Things buttons in a hospital system, we propose an implementation framework grounded in the Plan-Do-Study-Act method. Results: We plan to deploy Internet of Things buttons within our hospital system to deliver real-time notifications in public-facing tasks such as restroom cleanliness and critical supply restocking. We expect results from this pilot in the next year. Conclusions: Overall, Internet of Things buttons have significant promise; future rigorous evaluations are needed to determine the impact of Internet of Things buttons in real-world health care settings. ", doi="10.2196/jmir.9454", url="http://www.jmir.org/2018/8/e251/", url="http://www.ncbi.nlm.nih.gov/pubmed/30097420" } @Article{info:doi/10.2196/10041, author="Gao, Fangjian and Thiebes, Scott and Sunyaev, Ali", title="Rethinking the Meaning of Cloud Computing for Health Care: A Taxonomic Perspective and Future Research Directions", journal="J Med Internet Res", year="2018", month="Jul", day="11", volume="20", number="7", pages="e10041", keywords="cloud computing", keywords="taxonomy", keywords="health IT innovation", abstract="Background: Cloud computing is an innovative paradigm that provides users with on-demand access to a shared pool of configurable computing resources such as servers, storage, and applications. Researchers claim that information technology (IT) services delivered via the cloud computing paradigm (ie, cloud computing services) provide major benefits for health care. However, due to a mismatch between our conceptual understanding of cloud computing for health care and the actual phenomenon in practice, the meaningful use of it for the health care industry cannot always be ensured. 
Although some studies have tried to conceptualize cloud computing or interpret this phenomenon for health care settings, they have mainly relied on its interpretation in a common context or have been heavily based on a general understanding of traditional health IT artifacts, leading to an insufficient or unspecific conceptual understanding of cloud computing for health care. Objective: We aim to generate insights into the concept of cloud computing for health IT research. We propose a taxonomy that can serve as a fundamental mechanism for organizing knowledge about cloud computing services in health care organizations to gain a deepened, specific understanding of cloud computing in health care. With the taxonomy, we focus on conceptualizing the relevant properties of cloud computing for service delivery to health care organizations and highlighting their specific meanings for health care. Methods: We employed a 2-stage approach in developing a taxonomy of cloud computing services for health care organizations. We conducted a structured literature review and 24 semistructured expert interviews in stage 1, drawing on data from theory and practice. In stage 2, we applied a systematic approach and relied on data from stage 1 to develop and evaluate the taxonomy using 14 iterations. Results: Our taxonomy is composed of 8 dimensions and 28 characteristics that are relevant for cloud computing services in health care organizations. By applying the taxonomy to classify existing cloud computing services identified from the literature and expert interviews, which also serves as a part of the taxonomy, we identified 7 specificities of cloud computing in health care. These specificities challenge what we have learned about cloud computing in general contexts or in traditional health IT from the previous literature. The summarized specificities suggest research opportunities and exemplary research questions for future health IT research on cloud computing. Conclusions: By relying on perspectives from a taxonomy for cloud computing services for health care organizations, this study provides a solid conceptual cornerstone for cloud computing in health care. Moreover, the identified specificities of cloud computing and the related future research opportunities will serve as a valuable roadmap to facilitate more research into cloud computing in health care. ", doi="10.2196/10041", url="http://www.jmir.org/2018/7/e10041/", url="http://www.ncbi.nlm.nih.gov/pubmed/29997108" } @Article{info:doi/10.2196/10493, author="Cleret de Langavant, Laurent and Bayen, Eleonore and Yaffe, Kristine", title="Unsupervised Machine Learning to Identify High Likelihood of Dementia in Population-Based Surveys: Development and Validation Study", journal="J Med Internet Res", year="2018", month="Jul", day="09", volume="20", number="7", pages="e10493", keywords="dementia", keywords="cognition disorders", keywords="health surveys", keywords="electronic health records", keywords="diagnosis", keywords="unsupervised machine learning", keywords="cluster analysis", keywords="data mining", abstract="Background: Dementia is increasing in prevalence worldwide, yet frequently remains undiagnosed, especially in low- and middle-income countries. Population-based surveys represent an underinvestigated source to identify individuals at risk of dementia. 
Objective: The aim is to identify participants with high likelihood of dementia in population-based surveys without the need of the clinical diagnosis of dementia in a subsample. Methods: Unsupervised machine learning classification (hierarchical clustering on principal components) was developed in the Health and Retirement Study (HRS; 2002-2003, N=18,165 individuals) and validated in the Survey of Health, Ageing and Retirement in Europe (SHARE; 2010-2012, N=58,202 individuals). Results: Unsupervised machine learning classification identified three clusters in HRS: cluster 1 (n=12,231) without any functional or motor limitations, cluster 2 (N=4841) with walking/climbing limitations, and cluster 3 (N=1093) with both functional and walking/climbing limitations. Comparison of cluster 3 with previously published predicted probabilities of dementia in HRS showed that it identified high likelihood of dementia (probability of dementia >0.95; area under the curve [AUC]=0.91). Removing either cognitive or both cognitive and behavioral measures did not impede accurate classification (AUC=0.91 and AUC=0.90, respectively). Three clusters with similar profiles were identified in SHARE (cluster 1: n=40,223; cluster 2: n=15,644; cluster 3: n=2335). Survival rate of participants from cluster 3 reached 39.2\% (n=665 deceased) in HRS and 62.2\% (n=811 deceased) in SHARE after a 3.9-year follow-up. Surviving participants from cluster 3 in both cohorts worsened their functional and mobility performance over the same period. Conclusions: Unsupervised machine learning identifies high likelihood of dementia in population-based surveys, even without cognitive and behavioral measures and without the need of clinical diagnosis of dementia in a subsample of the population. This method could be used to tackle the global challenge of dementia. ", doi="10.2196/10493", url="http://www.jmir.org/2018/7/e10493/", url="http://www.ncbi.nlm.nih.gov/pubmed/29986849" } @Article{info:doi/10.2196/10507, author="Bian, Jiantao and Weir, Charlene and Unni, Prasad and Borbolla, Damian and Reese, Thomas and Wan, Jacob Yik-Ki and Del Fiol, Guilherme", title="Interactive Visual Displays for Interpreting the Results of Clinical Trials: Formative Evaluation With Case Vignettes", journal="J Med Internet Res", year="2018", month="Jun", day="25", volume="20", number="6", pages="e10507", keywords="clinical decision-making", keywords="clinician information needs", keywords="information display", keywords="information foraging theory", keywords="information seeking behavior", abstract="Background: At the point of care, evidence from randomized controlled trials (RCTs) is underutilized in helping clinicians meet their information needs. Objective: To design interactive visual displays to help clinicians interpret and compare the results of relevant RCTs for the management of a specific patient, and to conduct a formative evaluation with physicians comparing interactive visual versus narrative displays. Methods: We followed a user-centered and iterative design process succeeded by development of information display prototypes as a Web-based application. We then used a within-subjects design with 20 participants (8 attendings and 12 residents) to evaluate the usability and problem-solving impact of the information displays. We compared subjects' perceptions of the interactive visual displays versus narrative abstracts. 
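The Cleret de Langavant et al entry above applies hierarchical clustering on principal components to survey items. The following is a rough analogue, not the authors' pipeline: PCA for dimension reduction followed by Ward agglomerative clustering cut at 3 clusters, run here on random placeholder data with an assumed number of components.

```python
# Rough analogue of "hierarchical clustering on principal components":
# standardize survey items, project onto a few principal components, then
# cluster with Ward linkage and cut the tree at 3 clusters.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 20)).astype(float)   # 20 binary survey items (placeholder)

X_scaled = StandardScaler().fit_transform(X)
components = PCA(n_components=5).fit_transform(X_scaled)  # keep 5 PCs (assumed)

labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(components)
print(np.bincount(labels))  # size of each of the 3 clusters
```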
Results: The resulting interactive visual displays present RCT results side-by-side according to the Population, Intervention, Comparison, and Outcome (PICO) framework. Study participants completed 19 usability tasks in 3 to 11 seconds with a success rate of 78\% to 100\%. Participants favored the interactive visual displays over narrative abstracts according to perceived efficiency, effectiveness, effort, user experience and preference (all P values <.001). Conclusions: When interpreting and applying RCT findings to case vignettes, physicians preferred interactive graphical and PICO-framework-based information displays that enable direct comparison of the results from multiple RCTs compared to the traditional narrative and study-centered format. Future studies should investigate the use of interactive visual displays to support clinical decision making in care settings and their effect on clinician and patient outcomes. ", doi="10.2196/10507", url="http://www.jmir.org/2018/6/e10507/", url="http://www.ncbi.nlm.nih.gov/pubmed/29941416" } @Article{info:doi/10.2196/10144, author="DelPozo-Banos, Marcos and John, Ann and Petkov, Nicolai and Berridge, Mark Damon and Southern, Kate and LLoyd, Keith and Jones, Caroline and Spencer, Sarah and Travieso, Manuel Carlos", title="Using Neural Networks with Routine Health Records to Identify Suicide Risk: Feasibility Study", journal="JMIR Ment Health", year="2018", month="Jun", day="22", volume="5", number="2", pages="e10144", keywords="suicide prevention", keywords="risk assessment", keywords="electronic health records", keywords="routine data", keywords="machine learning", keywords="artificial neural networks", abstract="Background: Each year, approximately 800,000 people die by suicide worldwide, accounting for 1--2 in every 100 deaths. It is always a tragic event with a huge impact on family, friends, the community and health professionals. Unfortunately, suicide prevention and the development of risk assessment tools have been hindered by the complexity of the underlying mechanisms and the dynamic nature of a person's motivation and intent. Many of those who die by suicide had contact with health services in the preceding year but identifying those most at risk remains a challenge. Objective: To explore the feasibility of using artificial neural networks with routinely collected electronic health records to support the identification of those at high risk of suicide when in contact with health services. Methods: Using the Secure Anonymised Information Linkage Databank UK, we extracted the data of those who died by suicide between 2001 and 2015 and paired controls. Looking at primary (general practice) and secondary (hospital admissions) electronic health records, we built a binary feature vector coding the presence of risk factors at different times prior to death. Risk factors included: general practice contact and hospital admission; diagnosis of mental health issues; injury and poisoning; substance misuse; maltreatment; sleep disorders; and the prescription of opiates and psychotropics. Basic artificial neural networks were trained to differentiate between the suicide cases and paired controls. We interpreted the output score as the estimated suicide risk. System performance was assessed with 10x10-fold repeated cross-validation, and its behavior was studied by representing the distribution of estimated risk across the cases and controls, and the distribution of factors across estimated risks. 
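The DelPozo-Banos et al entry above trains basic neural networks on binary risk-factor vectors and evaluates them with 10x10-fold repeated cross-validation. A hedged sketch of that evaluation design follows, on synthetic data and with an assumed network size; it is not the study's architecture or data.

```python
# Hedged sketch: a small neural network over binary risk-factor features,
# scored with 10x10-fold repeated stratified cross-validation.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(1000, 30))   # presence/absence of 30 risk factors (synthetic)
y = rng.integers(0, 2, size=1000)         # case (1) vs paired control (0), synthetic

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print(f"AUC: {scores.mean():.3f} (SD {scores.std():.3f})")
```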
Results: We extracted a total of 2604 suicide cases and 20 paired controls per case. Our best system attained a mean error rate of 26.78\% (SD 1.46; 64.57\% of sensitivity and 81.86\% of specificity). While the distribution of controls was concentrated around estimated risks < 0.5, cases were almost uniformly distributed between 0 and 1. Prescription of psychotropics, depression and anxiety, and self-harm increased the estimated risk by {\textasciitilde}0.4. At least 95\% of those presenting these factors were identified as suicide cases. Conclusions: Despite the simplicity of the implemented system, the proposed methodology obtained an accuracy like other published methods based on specialized questionnaire generated data. Most of the errors came from the heterogeneity of patterns shown by suicide cases, some of which were identical to those of the paired controls. Prescription of psychotropics, depression and anxiety, and self-harm were strongly linked with higher estimated risk scores, followed by hospital admission and long-term drug and alcohol misuse. Other risk factors like sleep disorders and maltreatment had more complex effects. ", doi="10.2196/10144", url="http://mental.jmir.org/2018/2/e10144/", url="http://www.ncbi.nlm.nih.gov/pubmed/29934287" } @Article{info:doi/10.2196/diabetes.8316, author="Schaarup, Clara and Pape-Haugaard, Bilenberg Louise and Hejlesen, Kristian Ole", title="Models Used in Clinical Decision Support Systems Supporting Healthcare Professionals Treating Chronic Wounds: Systematic Literature Review", journal="JMIR Diabetes", year="2018", month="Jun", day="21", volume="3", number="2", pages="e11", keywords="clinical decision support systems", keywords="statistical model", keywords="neural networks", keywords="logistic models", keywords="linear models", keywords="foot ulcer", keywords="diabetes", keywords="health personnel", keywords="systematic review", keywords="chronic wounds", abstract="Background: Chronic wounds such as diabetic foot ulcers, venous leg ulcers, and pressure ulcers are a massive burden to health care facilities. Many randomized controlled trials on different wound care elements have been conducted and published in the Cochrane Library, all of which have only a low evidential basis. Thus, health care professionals are forced to rely on their own experience when making decisions regarding wound care. To progress from experience-based practice to evidence-based wound care practice, clinical decision support systems (CDSS) that help health care providers with decision-making in a clinical workflow have been developed. These systems have proven useful in many areas of the health care sector, partly because they have increased the quality of care, and partially because they have generated a solid basis for evidence-based practice. However, no systematic reviews focus on CDSS within the field of wound care to chronic wounds. Objective: The aims of this systematic literature review are (1) to identify models used in CDSS that support health care professionals treating chronic wounds, and (2) to classify each clinical decision support model according to selected variables and to create an overview. Methods: A systematic review was conducted using 6 databases. This systematic literature review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement for systematic reviews. 
The search strategy consisted of three facets, respectively: Facet 1 (Algorithm), Facet 2 (Wound care) and Facet 3 (Clinical decision support system). Studies based on acute wounds or trauma were excluded. Similarly, studies that presented guidelines, protocols and instructions were excluded, since they do not require progression along an active chain of reasoning from the clinicians, just their focus. Finally, studies were excluded if they had not undergone a peer review process. The following aspects were extracted from each article: authors, year, country, the sample size of data and variables describing the type of clinical decision support models. The decision support models were classified in 2 ways: quantitative decision support models, and qualitative decision support models. Results: The final number of studies included in the systematic literature review was 10. These clinical decision support models included 4/10 (40\%) quantitative decision support models and 6/10 (60\%) qualitative decision support models. The earliest article was published in 2007, and the most recent was from 2015. Conclusions: The clinical decision support models were targeted at a variety of different types of chronic wounds. The degree of accessibility of the inference engines varied. Quantitative models served as the engine and were invisible to the health care professionals, while qualitative models required interaction with the user. ", doi="10.2196/diabetes.8316", url="http://diabetes.jmir.org/2018/2/e11/", url="http://www.ncbi.nlm.nih.gov/pubmed/30291078" } @Article{info:doi/10.2196/10263, author="Paradis, Michelle and Stiell, Ian and Atkinson, M. Katherine and Guerinet, Julien and Sequeira, Yulric and Salter, Laura and Forster, J. Alan and Murphy, SQ Malia and Wilson, Kumanan", title="Acceptability of a Mobile Clinical Decision Tool Among Emergency Department Clinicians: Development and Evaluation of The Ottawa Rules App", journal="JMIR Mhealth Uhealth", year="2018", month="Jun", day="11", volume="6", number="6", pages="e10263", keywords="emergency department medicine", keywords="clinical tools", keywords="mobile apps", keywords="digital health", abstract="Background: The Ottawa Ankle Rules, Ottawa Knee Rule, and Canadian C-Spine Rule---together known as The Ottawa Rules---are a set of internationally validated clinical decision rules developed to decrease unnecessary diagnostic imaging in the emergency department. In this study, we sought to develop and evaluate the use of a mobile app version of The Ottawa Rules. Objective: The primary objective of this study was to determine acceptability of The Ottawa Rules app among emergency department clinicians. The secondary objective was to evaluate the effect of publicity efforts on uptake of The Ottawa Rules app. Methods: The Ottawa Rules app was developed and publicly released for free on iOS and Android operating systems in April 2016. Local and national news and academic media coverage coincided with app release. This study was conducted at a large tertiary trauma care center in Ottawa, Canada. The study was advertised through posters and electronically by email. Emergency department clinicians were approached in person to enroll via in-app consent for a 1-month study during which time they were encouraged to use the app when evaluating patients with suspected knee, foot, or neck injuries. 
A 23-question survey was administered at the end of the study period via email to determine self-reported frequency of app use, perceived ease of use of the app, and participant Technology Readiness Index scores. Results: A total of 108 emergency department clinicians completed the study, including 42 nurses, 33 residents, 20 attending physicians, and 13 medical students completing emergency department rotations. The median Technology Readiness Index for this group was 3.56, indicating a moderate degree of openness to technological adoption. The majority of survey respondents indicated favorable receptivity to the app, including finding it helpful in applying the rules (73/108, 67.6\%), that they would recommend the app to colleagues (81/108, 75.0\%), and that they would continue using the app (73/108, 67.6\%). Feedback from study participants highlighted a desire for access to more clinical decision rules and a higher degree of interactivity of the app. Between April 21, 2016, and June 1, 2017, The Ottawa Rules app was downloaded approximately 4000 times across 89 countries. Conclusions: We have found The Ottawa Rules app to be an effective means to disseminate the Ottawa Ankle Rules, Ottawa Knee Rule, and Canadian C-Spine Rule among all levels of emergency department clinicians. We have been successful in monitoring uptake and access of the rules in the app as a result of our publicity efforts. Mobile technology can be leveraged to improve the accessibility of clinical decision tools to health professionals. ", doi="10.2196/10263", url="http://mhealth.jmir.org/2018/6/e10263/", url="http://www.ncbi.nlm.nih.gov/pubmed/29891469" } @Article{info:doi/10.2196/10311, author="Guo, Yanting and Zheng, Gang and Fu, Tianyun and Hao, Shiying and Ye, Chengyin and Zheng, Le and Liu, Modi and Xia, Minjie and Jin, Bo and Zhu, Chunqing and Wang, Oliver and Wu, Qian and Culver, S. Devore and Alfreds, T. Shaun and Stearns, Frank and Kanov, Laura and Bhatia, Ajay and Sylvester, G. Karl and Widen, Eric and McElhinney, B. Doff and Ling, Bruce Xuefeng", title="Assessing Statewide All-Cause Future One-Year Mortality: Prospective Study With Implications for Quality of Life, Resource Utilization, and Medical Futility", journal="J Med Internet Res", year="2018", month="Jun", day="04", volume="20", number="6", pages="e10311", keywords="One-year mortality risk prediction", keywords="electronic medical records", keywords="quality of life", keywords="healthcare resource utilization", keywords="social determinants", abstract="Background: For many elderly patients, a disproportionate amount of health care resources and expenditures is spent during the last year of life, despite the discomfort and reduced quality of life associated with many aggressive medical approaches. However, few prognostic tools have focused on predicting all-cause 1-year mortality among elderly patients at a statewide level, an issue that has implications for improving quality of life while distributing scarce resources fairly. Objective: Using data from a statewide elderly population (aged ≥65 years), we sought to prospectively validate an algorithm to identify patients at risk for dying in the next year for the purpose of minimizing decision uncertainty, improving quality of life, and reducing futile treatment. Methods: Analysis was performed using electronic medical records from the Health Information Exchange in the state of Maine, which covered records of nearly 95\% of the statewide population.
The model was developed from 125,896 patients aged at least 65 years who were discharged from any care facility in the Health Information Exchange network from September 5, 2013, to September 4, 2015. Validation was conducted using 153,199 patients with the same inclusion and exclusion criteria from September 5, 2014, to September 4, 2016. Patients were stratified into risk groups. The association between all-cause 1-year mortality and risk factors was screened by chi-squared test and manually reviewed by 2 clinicians. We calculated risk scores for individual patients using a gradient tree-based boost algorithm, which measured the probability of mortality within the next year based on the preceding 1-year clinical profile. Results: The development sample included 125,896 patients (72,572 women, 57.64\%; mean 74.2 [SD 7.7] years). The final validation cohort included 153,199 patients (88,177 women, 57.56\%; mean 74.3 [SD 7.8] years). The c-statistic for discrimination was 0.96 (95\% CI 0.93-0.98) in the development group and 0.91 (95\% CI 0.90-0.94) in the validation cohort. Mortality was 0.99\% in the low-risk group, 16.75\% in the intermediate-risk group, and 72.12\% in the high-risk group. A total of 99 independent risk factors for mortality were identified (reported as odds ratios; 95\% CI). Age was at the top of the list (1.41; 1.06-1.48); congestive heart failure (20.90; 15.41-28.08) and different tumor sites were also recognized as driving risk factors, such as cancer of the ovaries (14.42; 2.24-53.04), colon (14.07; 10.08-19.08), and stomach (13.64; 3.26-86.57). Disparities were also found in patients' social determinants, such as respiratory hazard index (1.24; 0.92-1.40) and unemployment rate (1.18; 0.98-1.24). Among high-risk patients who expired in our dataset, cerebrovascular accident, amputation, and type 1 diabetes were the top 3 diseases in terms of average cost in the last year of life. Conclusions: Our study prospectively validated an accurate 1-year risk prediction model and stratification for the elderly population (≥65 years) at risk of mortality with statewide electronic medical record datasets. It should be a valuable adjunct for helping patients to make better quality-of-life choices and alerting caregivers to target high-risk elderly for appropriate care and discussions, thus cutting back on futile treatment. ", doi="10.2196/10311", url="http://www.jmir.org/2018/6/e10311/", url="http://www.ncbi.nlm.nih.gov/pubmed/29866643" } @Article{info:doi/10.2196/10775, author="Contreras, Ivan and Vehi, Josep", title="Artificial Intelligence for Diabetes Management and Decision Support: Literature Review", journal="J Med Internet Res", year="2018", month="May", day="30", volume="20", number="5", pages="e10775", keywords="diabetes management", keywords="artificial intelligence", keywords="machine learning", keywords="mobile computing", keywords="blood glucose", abstract="Background: Artificial intelligence methods in combination with the latest technologies, including medical devices, mobile computing, and sensor technologies, have the potential to enable the creation and delivery of better management services to deal with chronic diseases. One of the most lethal and prevalent chronic diseases is diabetes mellitus, which is characterized by dysfunction of glucose homeostasis. Objective: The objective of this paper is to review recent efforts to use artificial intelligence techniques to assist in the management of diabetes, along with the associated challenges.
Methods: A review of the literature was conducted using PubMed and related bibliographic resources. Analyses of the literature from 2010 to 2018 yielded 1849 pertinent articles, of which we selected 141 for detailed review. Results: We propose a functional taxonomy for diabetes management and artificial intelligence. Additionally, a detailed analysis of each subject category was performed using related key outcomes. This approach revealed that the experiments and studies reviewed yielded encouraging results. Conclusions: We obtained evidence of an acceleration of research activity aimed at developing artificial intelligence-powered tools for prediction and prevention of complications associated with diabetes. Our results indicate that artificial intelligence methods are being progressively established as suitable for use in clinical daily practice, as well as for the self-management of diabetes. Consequently, these methods provide powerful tools for improving patients' quality of life. ", doi="10.2196/10775", url="http://www.jmir.org/2018/5/e10775/" } @Article{info:doi/10.2196/jmir.9901, author="Musy, N. Sarah and Ausserhofer, Dietmar and Schwendimann, Ren{\'e} and Rothen, Ulrich Hans and Jeitziner, Marie-Madlen and Rutjes, WS Anne and Simon, Michael", title="Trigger Tool--Based Automated Adverse Event Detection in Electronic Health Records: Systematic Review", journal="J Med Internet Res", year="2018", month="May", day="30", volume="20", number="5", pages="e198", keywords="patient safety", keywords="electronic health records", keywords="patient harm", keywords="review, systematic", abstract="Background: Adverse events in health care entail substantial burdens to health care systems, institutions, and patients. Retrospective trigger tools are often manually applied to detect AEs, although automated approaches using electronic health records may offer real-time adverse event detection, allowing timely corrective interventions. Objective: The aim of this systematic review was to describe current study methods and challenges regarding the use of automatic trigger tool-based adverse event detection methods in electronic health records. In addition, we aimed to appraise the applied studies' designs and to synthesize estimates of adverse event prevalence and diagnostic test accuracy of automatic detection methods using manual trigger tool as a reference standard. Methods: PubMed, EMBASE, CINAHL, and the Cochrane Library were queried. We included observational studies, applying trigger tools in acute care settings, and excluded studies using nonhospital and outpatient settings. Eligible articles were divided into diagnostic test accuracy studies and prevalence studies. We derived the study prevalence and estimates for the positive predictive value. We assessed bias risks and applicability concerns using Quality Assessment tool for Diagnostic Accuracy Studies-2 (QUADAS-2) for diagnostic test accuracy studies and an in-house developed tool for prevalence studies. Results: A total of 11 studies met all criteria: 2 concerned diagnostic test accuracy and 9 prevalence. We judged several studies to be at high bias risks for their automated detection method, definition of outcomes, and type of statistical analyses. Across all the 11 studies, adverse event prevalence ranged from 0\% to 17.9\%, with a median of 0.8\%. The positive predictive value of all triggers to detect adverse events ranged from 0\% to 100\% across studies, with a median of 40\%. 
Some triggers had wide-ranging positive predictive values: (1) in 6 studies, hypoglycemia had a positive predictive value ranging from 15.8\% to 60\%; (2) in 5 studies, naloxone had a positive predictive value ranging from 20\% to 91\%; (3) in 4 studies, flumazenil had a positive predictive value ranging from 38.9\% to 83.3\%; and (4) in 4 studies, protamine had a positive predictive value ranging from 0\% to 60\%. We were unable to determine the adverse event prevalence, positive predictive value, preventability, and severity in 40.4\%, 10.5\%, 71.1\%, and 68.4\% of the studies, respectively. These studies either did not report the overall number of records analyzed, triggers, or adverse events, or did not conduct the analysis. Conclusions: We observed broad interstudy variation in reported adverse event prevalence and positive predictive value. The lack of sufficiently described methods led to difficulties regarding interpretation. To improve quality, we see the need for a set of recommendations to endorse optimal use of research designs and adequate reporting of future adverse event detection studies. ", doi="10.2196/jmir.9901", url="http://www.jmir.org/2018/5/e198/" } @Article{info:doi/10.2196/jmir.9610, author="Henderson, Jette and Ke, Junyuan and Ho, C. Joyce and Ghosh, Joydeep and Wallace, C. Byron", title="Phenotype Instance Verification and Evaluation Tool (PIVET): A Scaled Phenotype Evidence Generation Framework Using Web-Based Medical Literature", journal="J Med Internet Res", year="2018", month="May", day="04", volume="20", number="5", pages="e164", keywords="medical informatics", keywords="medical subject headings", keywords="algorithms", keywords="clustering analysis", keywords="classification", keywords="databases as topic", keywords="information storage and retrieval", keywords="MEDLINE", abstract="Background: Researchers are developing methods to automatically extract clinically relevant and useful patient characteristics from raw healthcare datasets. These characteristics, often capturing essential properties of patients with common medical conditions, are called computational phenotypes. Because they are generated by automated or semiautomated, data-driven methods, such potential phenotypes need to be validated as clinically meaningful (or not) before they are acceptable for use in decision making. Objective: The objective of this study was to present Phenotype Instance Verification and Evaluation Tool (PIVET), a framework that uses co-occurrence analysis on an online corpus of publicly available medical journal articles to build clinical relevance evidence sets for user-supplied phenotypes. PIVET adopts a conceptual framework similar to the pioneering prototype tool PheKnow-Cloud that was developed for the phenotype validation task. PIVET completely refactors each part of the PheKnow-Cloud pipeline to deliver vast improvements in speed without sacrificing the quality of the insights PheKnow-Cloud achieved. Methods: PIVET leverages indexing in NoSQL databases to efficiently generate evidence sets. Specifically, PIVET uses a succinct representation of the phenotypes that corresponds to the index on the corpus database and an optimized co-occurrence algorithm inspired by the Aho-Corasick algorithm. We compare PIVET's phenotype representation with PheKnow-Cloud's by using PheKnow-Cloud's experimental setup.
In PIVET's framework, we also introduce a statistical model trained on domain expert--verified phenotypes to automatically classify phenotypes as clinically relevant or not. Additionally, we show how the classification model can be used to examine user-supplied phenotypes in an online, rather than batch, manner. Results: PIVET maintains the discriminative power of PheKnow-Cloud in terms of identifying clinically relevant phenotypes for the same corpus with which PheKnow-Cloud was originally developed, but PIVET's analysis is an order of magnitude faster than that of PheKnow-Cloud. Not only is PIVET much faster, but it can also be scaled to a larger corpus and still retain its speed. We evaluated multiple classification models on top of the PIVET framework and found ridge regression to perform best, realizing an average F1 score of 0.91 when predicting clinically relevant phenotypes. Conclusions: Our study shows that PIVET improves on the most notable existing computational tool for phenotype validation in terms of speed and automation and is comparable in terms of accuracy. ", doi="10.2196/jmir.9610", url="http://www.jmir.org/2018/5/e164/", url="http://www.ncbi.nlm.nih.gov/pubmed/29728351" } @Article{info:doi/10.2196/medinform.9171, author="Zarinabad, Niloufar and Meeus, M. Emma and Manias, Karen and Foster, Katharine and Peet, Andrew", title="Automated Modular Magnetic Resonance Imaging Clinical Decision Support System (MIROR): An Application in Pediatric Cancer Diagnosis", journal="JMIR Med Inform", year="2018", month="May", day="02", volume="6", number="2", pages="e30", keywords="clinical decision support", keywords="real-time systems", keywords="magnetic resonance imaging", abstract="Background: Advances in magnetic resonance imaging and the introduction of clinical decision support systems have underlined the need for an analysis tool to extract and analyze relevant information from magnetic resonance imaging data to aid decision making, prevent errors, and enhance health care. Objective: The aim of this study was to design and develop a modular medical image region of interest analysis tool and repository (MIROR) for automatic processing, classification, evaluation, and representation of advanced magnetic resonance imaging data. Methods: The clinical decision support system was developed and evaluated for diffusion-weighted imaging of body tumors in children (cohort of 48 children, with 37 malignant and 11 benign tumors). MeVisLab software and Python were used for the development of MIROR. Regions of interest were drawn around benign and malignant body tumors on different diffusion parametric maps, and extracted information was used to discriminate malignant from benign tumors. Results: Using MIROR, the histogram parameters derived for each tumor case, when compared with the information in the repository, provided additional information for tumor characterization and facilitated the discrimination between benign and malignant tumors. Clinical decision support system cross-validation showed high sensitivity and specificity in discriminating between these tumor groups using histogram parameters. Conclusions: MIROR, as a diagnostic tool and repository, allowed the interpretation and analysis of magnetic resonance images to be more accessible and comprehensive for clinicians.
It aims to increase clinicians' skillset by introducing newer techniques and up-to-date findings to their repertoire and to make information from previous cases available to aid decision making. The modular format of the tool allows integration of analyses that are not readily available clinically and streamlines future development. ", doi="10.2196/medinform.9171", url="http://medinform.jmir.org/2018/2/e30/", url="http://www.ncbi.nlm.nih.gov/pubmed/29720361" } @Article{info:doi/10.2196/jmir.9036, author="Abbasgholizadeh Rahimi, Samira and L{\'e}pine, Johanie and Croteau, Jordie and Robitaille, Hubert and Giguere, MC Anik and Wilson, J. Brenda and Rousseau, Fran{\c{c}}ois and L{\'e}vesque, Isabelle and L{\'e}gar{\'e}, France", title="Psychosocial Factors of Health Professionals' Intention to Use a Decision Aid for Down Syndrome Screening: Cross-Sectional Quantitative Study", journal="J Med Internet Res", year="2018", month="Apr", day="25", volume="20", number="4", pages="e114", keywords="decision support techniques", keywords="Down syndrome", keywords="decision making", keywords="behavior", keywords="intention", keywords="physicians", keywords="midwifery", keywords="surveys", keywords="prenatal diagnosis", abstract="Background: Decisions about prenatal screening for Down syndrome are difficult for women, as they entail risk, potential loss, and regret. Shared decision making increases women's knowledge of their choices and better aligns decisions with their values. Patient decision aids foster shared decision making but are rarely used in this context. Objective: One of the most promising strategies for implementing shared decision making is distribution of decision aids by health professionals. We aimed to identify factors influencing their intention to use a decision aid during a prenatal visit for decisions about Down syndrome screening. Methods: We conducted a cross-sectional quantitative study. Using a Web panel, we conducted a theory-based survey of health professionals in Quebec province (Canada). Eligibility criteria were as follows: (1) family physicians, midwives, obstetrician-gynecologists, or trainees in these professions; (2) involved in prenatal care; and (3) working in Quebec province. Participants watched a video depicting a health professional using a decision aid during a prenatal consultation with a woman and her partner, and then answered a questionnaire based on an extended version of the theory of planned behavior, including some of the constructs of the theoretical domains framework. The questionnaire assessed 8 psychosocial constructs (attitude, anticipated regret, subjective norm, self-identity, moral norm, descriptive norm, self-efficacy, and perceived control), 7 related sets of behavioral beliefs (advantages, disadvantages, emotions, sources of encouragement or discouragement, incentives, facilitators, and barriers), and sociodemographic data. We performed descriptive, bivariate, and multiple linear regression analyses to identify factors influencing health professionals' intention to use a decision aid. Results: Among 330 health professionals who completed the survey, 310 met the inclusion criteria: family physicians, 55.2\% (171/310); obstetrician-gynecologists, 33.8\% (105/310); and midwives, 11.0\% (34/310). Of these, 80.9\% (251/310) were female. Mean age was 39.6 (SD 11.5) years. Fewer than half were aware of any decision aids at all.
In decreasing order of importance, factors influencing their intention to use a decision aid for Down syndrome prenatal screening were as follows: self-identity (beta=.325, P<.001), attitude (beta=.297, P<.001), moral norm (beta=.288, P<.001), descriptive norm (beta=.166, P<.001), and anticipated regret (beta=.099, P=.003). Underlying behavioral beliefs significantly related to intention were that the use of a decision aid would promote decision making (beta=.117, 95\% CI 0.043-0.190), would reassure health professionals (beta=.100, 95\% CI 0.024-0.175), and might require more time than planned for the consultation (beta=-.077, 95\% CI -0.124 to -0.031). Conclusions: We identified psychosocial factors that could influence health professionals' intention to use a decision aid about Down syndrome screening. Strategies should remind them of the following: (1) using a decision aid for this purpose should be a common practice, (2) it would be expected of someone in their societal role, (3) the experience of using it will be satisfying and reassuring, and (4) it is likely to be compatible with their moral values. ", doi="10.2196/jmir.9036", url="http://www.jmir.org/2018/4/e114/", url="http://www.ncbi.nlm.nih.gov/pubmed/29695369" } @Article{info:doi/10.2196/jmir.9477, author="Lin, Fong-Ci and Wang, Chen-Yu and Shang, Ji Rung and Hsiao, Fei-Yuan and Lin, Mei-Shu and Hung, Kuan-Yu and Wang, Jui and Lin, Zhen-Fang and Lai, Feipei and Shen, Li-Jiuan and Huang, Chih-Fen", title="Identifying Unmet Treatment Needs for Patients With Osteoporotic Fracture: Feasibility Study for an Electronic Clinical Surveillance System", journal="J Med Internet Res", year="2018", month="Apr", day="24", volume="20", number="4", pages="e142", keywords="information systems", keywords="public health surveillance", keywords="osteoporotic fractures", keywords="pharmacovigilance", keywords="guideline adherence", abstract="Background: Traditional clinical surveillance has relied on results from clinical trials and observational studies of administrative databases. However, these studies not only require many valuable resources but also face a very long time lag. Objective: This study aimed to illustrate a practical application of the National Taiwan University Hospital Clinical Surveillance System (NCSS) in the identification of patients with an osteoporotic fracture and to provide a highly reusable infrastructure for longitudinal clinical data. Methods: The NCSS integrates electronic medical records in the National Taiwan University Hospital (NTUH) with a data warehouse and is equipped with a user-friendly interface. The NCSS was developed using professional insight from multidisciplinary experts, including clinical practitioners, epidemiologists, and biomedical engineers. The practical example described herein, identifying unmet treatment needs for patients encountering major osteoporotic fractures, was achieved mainly by adopting the computerized workflow in the NCSS. Results: We developed the infrastructure of the NCSS, including an integrated data warehouse and an automatic surveillance workflow. By applying the NCSS, we efficiently identified 2193 patients who were newly diagnosed with a hip or vertebral fracture between 2010 and 2014 at NTUH.
By adopting the filter function, we identified 1808 (1808/2193, 82.44\%) patients who continued their follow-up at NTUH, and 464 (464/2193, 21.16\%) patients who were prescribed anti-osteoporosis medications, within 3 and 12 months after the index date of their fracture, respectively. Conclusions: The NCSS can integrate the workflow of cohort identification to accelerate the survey process of clinically relevant problems and provide decision support in the daily practice of clinical physicians, thereby making the benefit of evidence-based medicine a reality. ", doi="10.2196/jmir.9477", url="http://www.jmir.org/2018/4/e142/", url="http://www.ncbi.nlm.nih.gov/pubmed/29691201" } @Article{info:doi/10.2196/medinform.8912, author="Khairat, Saif and Marc, David and Crosby, William and Al Sanousi, Ali", title="Reasons For Physicians Not Adopting Clinical Decision Support Systems: Critical Analysis", journal="JMIR Med Inform", year="2018", month="Apr", day="18", volume="6", number="2", pages="e24", keywords="decision support systems, clinical", keywords="decision making, computer-assisted", keywords="attitude to computers", abstract="Background: Clinical decision support systems (CDSSs) are an integral component of today's health information technologies. They assist with interpretation, diagnosis, and treatment. A CDSS can be embedded throughout the patient safety continuum, providing reminders, recommendations, and alerts to health care providers. Although CDSSs have been shown to reduce medical errors and improve patient outcomes, they have fallen short of their full potential. User acceptance has been identified as one of the potential reasons for this shortfall. Objective: The purpose of this paper was to conduct a critical review and task analysis of CDSS research and to develop a new framework for CDSS design in order to achieve user acceptance. Methods: A critical review of CDSS papers was conducted with a focus on user acceptance. To gain a greater understanding of the problems associated with CDSS acceptance, we conducted a task analysis to identify and describe the goals, user input, system output, knowledge requirements, and constraints from two different perspectives: the machine (ie, the CDSS engine) and the user (ie, the physician). Results: Favorability of CDSSs was based on user acceptance of clinical guidelines, reminders, alerts, and diagnostic suggestions. We propose two models: (1) the user acceptance and system adaptation design model, which includes optimizing CDSS design based on user needs/expectations, and (2) the input-process-output-engage model, which reveals to users the processes that govern CDSS outputs. Conclusions: This research demonstrates that the incorporation of the proposed models will improve user acceptance to support the beneficial effects of CDSS adoption. Ultimately, if a user does not accept technology, this not only poses a threat to the use of the technology but can also pose a threat to the health and well-being of patients.
", doi="10.2196/medinform.8912", url="http://medinform.jmir.org/2018/2/e24/", url="http://www.ncbi.nlm.nih.gov/pubmed/29669706" } @Article{info:doi/10.2196/resprot.9827, author="Camacho, Jhon and Medina Ch., Mar{\'i}a Ana and Landis-Lewis, Zach and Douglas, Gerald and Boyce, Richard", title="Comparing a Mobile Decision Support System Versus the Use of Printed Materials for the Implementation of an Evidence-Based Recommendation: Protocol for a Qualitative Evaluation", journal="JMIR Res Protoc", year="2018", month="Apr", day="13", volume="7", number="4", pages="e105", keywords="practice guideline", keywords="implementation science", keywords="decision support systems", keywords="mhealth", keywords="technology acceptance", keywords="computer-interpretable clinical guidelines", keywords="Colombia", abstract="Background: The distribution of printed materials is the most frequently used strategy to disseminate and implement clinical practice guidelines, although several studies have shown that the effectiveness of this approach is modest at best. Nevertheless, there is insufficient evidence to support the use of other strategies. Recent research has shown that the use of computerized decision support presents a promising approach to address some aspects of this problem. Objective: The aim of this study is to provide qualitative evidence on the potential effect of mobile decision support systems to facilitate the implementation of evidence-based recommendations included in clinical practice guidelines. Methods: We will conduct a qualitative study with two arms to compare the experience of primary care physicians while they try to implement an evidence-based recommendation in their clinical practice. In the first arm, we will provide participants with a printout of the guideline article containing the recommendation, while in the second arm, we will provide participants with a mobile app developed after formalizing the recommendation text into a clinical algorithm. Data will be collected using semistructured and open interviews to explore aspects of behavioral change and technology acceptance involved in the implementation process. The analysis will be comprised of two phases. During the first phase, we will conduct a template analysis to identify barriers and facilitators in each scenario. Then, during the second phase, we will contrast the findings from each arm to propose hypotheses about the potential impact of the system. Results: We have formalized the narrative in the recommendation into a clinical algorithm and have developed a mobile app. Data collection is expected to occur during 2018, with the first phase of analysis running in parallel. The second phase is scheduled to conclude in July 2019. Conclusions: Our study will further the understanding of the role of mobile decision support systems in the implementation of clinical practice guidelines. Furthermore, we will provide qualitative evidence to aid decisions made by low- and middle-income countries' ministries of health about investments in these technologies. ", doi="10.2196/resprot.9827", url="http://www.researchprotocols.org/2018/4/e105/", url="http://www.ncbi.nlm.nih.gov/pubmed/29653921" } @Article{info:doi/10.2196/jmir.8961, author="He, Zhe and Bian, Jiang and Carretta, J. Henry and Lee, Jiwon and Hogan, R. 
William and Shenkman, Elizabeth and Charness, Neil", title="Prevalence of Multiple Chronic Conditions Among Older Adults in Florida and the United States: Comparative Analysis of the OneFlorida Data Trust and National Inpatient Sample", journal="J Med Internet Res", year="2018", month="Apr", day="12", volume="20", number="4", pages="e137", keywords="medical informatics", keywords="chronic disease", keywords="comorbidity", keywords="geriatrics", abstract="Background: Older patients with multiple chronic conditions are often faced with increased health care needs and subsequent higher medical costs, posing a significant financial burden to patients, their caregivers, and the health care system. The increasing adoption of electronic health record systems and the proliferation of clinical data offer new opportunities for prevalence studies and for population health assessment. The last few years have witnessed an increasing number of clinical research networks focused on building large collections of clinical data from electronic health records and claims to make it easier and less costly to conduct clinical research. Objective: The aim of this study was to compare the prevalence of common chronic conditions and multiple chronic conditions in older adults between Florida and the United States using data from the OneFlorida Clinical Research Consortium and the Healthcare Cost and Utilization Project (HCUP) National Inpatient Sample (NIS). Methods: We first analyzed the basic demographic characteristics of the older adults in 3 datasets---the 2013 OneFlorida data, the 2013 HCUP NIS data, and the combined 2012 to 2016 OneFlorida data. Then we analyzed the prevalence of each of the 25 chronic conditions in each of the 3 datasets. We stratified the analysis of older adults with hypertension, the most prevalent condition. Additionally, we examined trends (ie, overall trends and then by age, race, and gender) in the prevalence of discharge records representing multiple chronic conditions over time for the OneFlorida (2012-2016) and HCUP NIS cohorts (2003-2013). Results: The rankings of the top 10 most prevalent conditions are the same across the OneFlorida and HCUP NIS datasets. The most prevalent 2-condition combinations among the 3 datasets were hyperlipidemia and hypertension; hypertension and ischemic heart disease; diabetes and hypertension; chronic kidney disease and hypertension; anemia and hypertension; and hyperlipidemia and ischemic heart disease. We observed increasing trends in multiple chronic conditions in both data sources. Conclusions: The results showed that chronic conditions and multiple chronic conditions are prevalent in older adults across Florida and the United States. Even though slight differences were observed, the similar estimates of prevalence of chronic conditions and multiple chronic conditions across OneFlorida and HCUP NIS suggested that clinical research data networks such as OneFlorida, built from heterogeneous data sources, can provide rich data resources for conducting large-scale secondary data analyses.
", doi="10.2196/jmir.8961", url="http://www.jmir.org/2018/4/e137/", url="http://www.ncbi.nlm.nih.gov/pubmed/29650502" } @Article{info:doi/10.2196/jmir.8884, author="Conca, Tania and Saint-Pierre, Cecilia and Herskovic, Valeria and Sep{\'u}lveda, Marcos and Capurro, Daniel and Prieto, Florencia and Fernandez-Llatas, Carlos", title="Multidisciplinary Collaboration in the Treatment of Patients With Type 2 Diabetes in Primary Care: Analysis Using Process Mining", journal="J Med Internet Res", year="2018", month="Apr", day="10", volume="20", number="4", pages="e127", keywords="process assessment (health care)", keywords="interprofessional relations", keywords="primary health care", keywords="type 2 diabetes mellitus", keywords="data mining", abstract="Background: Public health in several countries is characterized by a shortage of professionals and a lack of economic resources. Monitoring and redesigning processes can foster the success of health care institutions, enabling them to provide a quality service while simultaneously reducing costs. Process mining, a discipline that extracts knowledge from information system data to analyze operational processes, affords an opportunity to understand health care processes. Objective: Health care processes are highly flexible and multidisciplinary, and health care professionals are able to coordinate in a variety of different ways to treat a diagnosis. The aim of this work was to understand whether the ways in which professionals coordinate their work affect the clinical outcome of patients. Methods: This paper proposes a method based on the use of process mining to identify patterns of collaboration between physician, nurse, and dietitian in the treatment of patients with type 2 diabetes mellitus and to compare these patterns with the clinical evolution of the patients within the context of primary care. Clustering is used as part of the preprocessing of data to manage the variability, and then process mining is used to identify patterns that may arise. Results: The method is applied in three primary health care centers in Santiago, Chile. A total of seven collaboration patterns were identified, which differed primarily in terms of the number of disciplines present, the participation intensity of each discipline, and the referrals between disciplines. The pattern in which the three disciplines participated in the most equitable and comprehensive manner had a lower proportion of highly decompensated patients compared with those patterns in which the three disciplines participated in an unbalanced manner. Conclusions: By discovering which collaboration patterns lead to improved outcomes, health care centers can promote the most successful patterns among their professionals so as to improve the treatment of patients. Process mining techniques are useful for discovering those collaboration patterns in flexible and unstructured health care processes.
", doi="10.2196/jmir.8884", url="http://www.jmir.org/2018/4/e127/", url="http://www.ncbi.nlm.nih.gov/pubmed/29636315" } @Article{info:doi/10.2196/jmir.7541, author="van Kasteren, Yasmin and Freyne, Jill and Hussain, Sazzad M.", title="Total Knee Replacement and the Effect of Technology on Cocreation for Improved Outcomes and Delivery: Qualitative Multi-Stakeholder Study", journal="J Med Internet Res", year="2018", month="Mar", day="20", volume="20", number="3", pages="e95", keywords="arthroplasty", keywords="replacement", keywords="osteoarthritis", keywords="patient participation", keywords="consumer health informatics", keywords="technology", keywords="telemedicine", keywords="rehabilitation", keywords="self-care", keywords="exercise therapy", keywords="human computer interaction", keywords="wearables", abstract="Background: The growth in patient-centered care delivery, combined with the rising costs of health care, has perhaps unsurprisingly been matched by a proliferation of patient-centered technology. This paper takes a multistakeholder approach to explore how digital technology can support the cocreation of value between patients and their care teams in the delivery of total knee replacement (TKR) surgery, an increasingly common procedure to return mobility and relieve pain for people suffering from osteoarthritis. Objective: The aim of this study was to investigate communications and interactions between patients and care teams in the delivery of TKR to identify opportunities for digital technology to add value to the TKR health care service by enhancing the cocreation of value. Methods: A multistakeholder qualitative study of user needs was conducted with Australian stakeholders (N=34): surgeons (n=12), physiotherapists (n=3), patients (n=11), and general practitioners (n=8). Data from focus groups and interviews were recorded, transcribed, and analyzed using thematic analysis. Results: Encounters between patients and their care teams are information-rich but time-poor. Results showed seven different stages of the TKR journey, which starts with referral to a surgeon and ends with a postoperative review at 12 months. Each stage of the journey has different information and communication challenges that can be addressed by digital technology. Opportunities for digital technology include improved waiting list management, supporting and reinforcing patient retention and recall of information, motivating and supporting rehabilitation, improving patient preparation for hospital stay, and reducing risks and anxiety associated with postoperative wound care. Conclusions: Digital technology can add value to patients' care team communications by enhancing information flow, assisting patient recall and retention of information, improving accessibility and portability of information, tailoring information to individual needs, and providing patients with tools to engage in their own health care management. For care teams, digital technology can add value through early detection of postoperative complications, proactive surveillance of health data for postoperative patients and patients on waiting lists, higher compliance with rehabilitation programs, and reduced length of stay. Digital technology has the potential to improve patient satisfaction and outcomes, as well as potentially reduce hospital length of stay and the burden of disease associated with postoperative morbidity.
", doi="10.2196/jmir.7541", url="http://www.jmir.org/2018/3/e95/", url="http://www.ncbi.nlm.nih.gov/pubmed/29559424" } @Article{info:doi/10.2196/humanfactors.8948, author="Thomson, Karen and Brouwers, Corline and Damman, C. Olga and de Bruijne, C. Martine and Timmermans, RM Danielle and Melles, Marijke", title="How Health Care Professionals Evaluate a Digital Intervention to Improve Medication Adherence: Qualitative Exploratory Study", journal="JMIR Hum Factors", year="2018", month="Feb", day="20", volume="5", number="1", pages="e7", keywords="medication adherence", keywords="eHealth", keywords="shared decision making", keywords="self-management", keywords="patient engagement", abstract="Background: Medication nonadherence poses a serious and a hard-to-tackle problem for many chronic diseases. Electronic health (eHealth) apps that foster patient engagement and shared decision making (SDM) may be a novel approach to improve medication adherence. Objective: The aim of this study was to investigate the perspective of health care professionals regarding a newly developed digital app aimed to improve medication adherence. Familial hypercholesterolemia (FH) was chosen as a case example. Methods: A Web-based prototype of the eHealth app---MIK---was codesigned with patients and health care professionals. After user tests with patients, we performed semistructured interviews and user tests with 12 physicians from 6 different hospitals to examine how the functionalities offered by MIK could assist physicians in their consultation and how they could be integrated into daily clinical practice. Qualitative thematic analysis was used to identify themes that covered the physicians' evaluations. Results: On the basis of the interview data, 3 themes were identified, which were (1) perceived impact on patient-physician collaboration; (2) perceived impact on the patient's understanding and self-management regarding medication adherence; and (3) perceived impact on clinical decisions and workflow. Conclusions: The eHealth app MIK seems to have the potential to improve the consultation between the patient and the physician in terms of collaboration and patient engagement. The impact of eHealth apps based on the concept of SDM for improving medication-taking behavior and clinical outcomes is yet to be evaluated. Insights will be useful for further development of eHealth apps aimed at improving self-management by means of patient engagement and SDM. ", doi="10.2196/humanfactors.8948", url="http://humanfactors.jmir.org/2018/1/e7/", url="http://www.ncbi.nlm.nih.gov/pubmed/29463494" } @Article{info:doi/10.2196/jmir.8922, author="Shen, XingRong and Lu, Manman and Feng, Rui and Cheng, Jing and Chai, Jing and Xie, Maomao and Dong, Xuemeng and Jiang, Tao and Wang, Debin", title="Web-Based Just-in-Time Information and Feedback on Antibiotic Use for Village Doctors in Rural Anhui, China: Randomized Controlled Trial", journal="J Med Internet Res", year="2018", month="Feb", day="14", volume="20", number="2", pages="e53", keywords="internet", keywords="drug resistance, bacterial", keywords="primary health care", keywords="randomized controlled trial", keywords="China", abstract="Background: Excessive use of antibiotics is very common worldwide, especially in rural China; various measures that have been used in curbing the problem have shown only marginal effects. 
Objective: The objective of this study was to test an innovative intervention that provided just-in-time information and feedback (JITIF) to village doctors on care of common infectious diseases. Methods: The information component of JITIF consisted of a set of theory- or evidence-based ingredients, including an operation guideline, public commitment, and takeaway information, whereas the feedback component told each participating doctor about his or her performance scores and percentages of antibiotic prescriptions. These ingredients were incorporated together in a synergetic way via a Web-based aid. Evaluation of JITIF adopted a randomized controlled trial design involving 24 village clinics randomized into equal control and intervention arms. Measures used included changes between baseline and endpoint (1 year after baseline) in terms of percentages of patients with symptomatic respiratory or gastrointestinal tract infections (RTIs or GTIs) being prescribed antibiotics, delivery of essential service procedures, and patients' beliefs and knowledge about antibiotics and infection prevention. Two researchers worked as a group in collecting the data at each site clinic. One performed nonparticipative observation of the service process, while the other performed structured exit interviews about patients' beliefs and knowledge. Data analysis consisted mainly of (1) descriptive estimations of beliefs or knowledge, practice of indicative procedures, and use of antibiotics at baseline and endpoint for intervention and control groups and (2) chi-square tests for the differences between these groups. Results: A total of 1048 patients completed the evaluation, including 532 at baseline (intervention=269, control=263) and 516 at endpoint (intervention=262, control=254). Patients diagnosed with RTIs and GTIs accounted for 76.5\% (407/532) and 23.5\% (125/532), respectively, at baseline and 80.8\% (417/516) and 19.2\% (99/516) at endpoint. JITIF resulted in substantial improvement in delivery of essential service procedures (2.6\%-24.8\% at baseline on both arms and at endpoint on the control arm vs 88.5\%-95.0\% at endpoint on the intervention arm, P<.001), beliefs favoring rational antibiotic use (11.5\%-39.8\% at baseline on both arms and at endpoint on the control arm vs 19.8\%-62.6\% at endpoint on the intervention arm, P<.001) and knowledge about side effects of antibiotics (35.7\% on the control arm vs 73.7\% on the intervention arm, P<.001), measures for managing or preventing RTIs (39.1\% vs 66.7\%, P=.02), and measures for managing or preventing GTIs (46.8\% vs 69.2\%, P<.001). It also reduced antibiotic prescription (from 88.8\% to 62.3\%, P<.001), and this decrease was consistent for RTIs (87.1\% vs 64.3\%, P<.001) and GTIs (94.7\% vs 52.4\%, P<.001). Conclusions: JITIF is effective in controlling antibiotic prescription at least in the short term and may provide a low-cost and sustainable solution to the widespread excessive use of antibiotics in rural China. ", doi="10.2196/jmir.8922", url="http://www.jmir.org/2018/2/e53/", url="http://www.ncbi.nlm.nih.gov/pubmed/29444768" } @Article{info:doi/10.2196/medinform.8662, author="Zheng, Shuai and Jabbour, K. Salma and O'Reilly, E. Shannon and Lu, J.
James and Dong, Lihua and Ding, Lijuan and Xiao, Ying and Yue, Ning and Wang, Fusheng and Zou, Wei", title="Automated Information Extraction on Treatment and Prognosis for Non--Small Cell Lung Cancer Radiotherapy Patients: Clinical Study", journal="JMIR Med Inform", year="2018", month="Feb", day="01", volume="6", number="1", pages="e8", keywords="information extraction", keywords="oncology", keywords="chemoradiation treatment", keywords="prognosis", keywords="non--small cell lung", keywords="information storage and retrieval", keywords="natural language processing", abstract="Background: In outcome studies of oncology patients undergoing radiation, researchers extract valuable information from medical records generated before, during, and after radiotherapy visits, such as survival data, toxicities, and complications. Clinical studies rely heavily on these data to correlate the treatment regimen with the prognosis to develop evidence-based radiation therapy paradigms. These data are available mainly in the form of narrative text or tables with heterogeneous vocabularies. Manual extraction of the related information from these data can be time-consuming and labor-intensive, which is not ideal for large studies. Objective: The objective of this study was to adapt the interactive information extraction platform Information and Data Extraction using Adaptive Learning (IDEAL-X) to extract treatment and prognosis data for patients with locally advanced or inoperable non--small cell lung cancer (NSCLC). Methods: We transformed patient treatment and prognosis documents into normalized structured forms using the IDEAL-X system for easy data navigation. The adaptive learning and user-customized controlled toxicity vocabularies were applied to extract categorized treatment and prognosis data, so as to generate structured output. Results: In total, we extracted data from 261 treatment and prognosis documents relating to 50 patients, with overall precision and recall of more than 93\% and 83\%, respectively. For toxicity information extraction, which is important for studying patients' posttreatment side effects and quality of life, the precision and recall reached 95.7\% and 94.5\%, respectively. Conclusions: The IDEAL-X system is capable of extracting study data regarding NSCLC chemoradiation patients with significant accuracy and effectiveness, and therefore can be used in large-scale radiotherapy clinical data studies. ", doi="10.2196/medinform.8662", url="http://medinform.jmir.org/2018/1/e8/", url="http://www.ncbi.nlm.nih.gov/pubmed/29391345" } @Article{info:doi/10.2196/jmir.9268, author="Ye, Chengyin and Fu, Tianyun and Hao, Shiying and Zhang, Yan and Wang, Oliver and Jin, Bo and Xia, Minjie and Liu, Modi and Zhou, Xin and Wu, Qian and Guo, Yanting and Zhu, Chunqing and Li, Yu-Ming and Culver, S. Devore and Alfreds, T. Shaun and Stearns, Frank and Sylvester, G.
Karl and Widen, Eric and McElhinney, Doff and Ling, Xuefeng", title="Prediction of Incident Hypertension Within the Next Year: Prospective Study Using Statewide Electronic Health Records and Machine Learning", journal="J Med Internet Res", year="2018", month="Jan", day="30", volume="20", number="1", pages="e22", keywords="hypertension", keywords="risk assessment", keywords="electronic health records", keywords="multiple chronic conditions", keywords="mental disorders", keywords="social determinants of health", abstract="Background: As a high-prevalence health condition, hypertension is clinically costly, difficult to manage, and often leads to severe and life-threatening diseases such as cardiovascular disease (CVD) and stroke. Objective: The aim of this study was to develop and validate prospectively a risk prediction model of incident essential hypertension within the following year. Methods: Data from individual patient electronic health records (EHRs) were extracted from the Maine Health Information Exchange network. Retrospective (N=823,627, calendar year 2013) and prospective (N=680,810, calendar year 2014) cohorts were formed. A machine learning algorithm, XGBoost, was adopted in the process of feature selection and model building. It generated an ensemble of classification trees and assigned a final predictive risk score to each individual. Results: The 1-year incident hypertension risk model attained areas under the curve (AUCs) of 0.917 and 0.870 in the retrospective and prospective cohorts, respectively. Risk scores were calculated and stratified into five risk categories, with 4526 out of 381,544 patients (1.19\%) in the lowest risk category (score 0-0.05) and 21,050 out of 41,329 patients (50.93\%) in the highest risk category (score 0.4-1) receiving a diagnosis of incident hypertension in the following 1 year. Type 2 diabetes, lipid disorders, CVDs, mental illness, clinical utilization indicators, and socioeconomic determinants were recognized as driving or associated features of incident essential hypertension. The very high risk population mainly comprised elderly (age>50 years) individuals with multiple chronic conditions, especially those receiving medications for mental disorders. Disparities were also found in social determinants, including some community-level factors associated with higher risk and others that were protective against hypertension. Conclusions: With statewide EHR datasets, our study prospectively validated an accurate 1-year risk prediction model for incident essential hypertension. Our real-time predictive analytic model has been deployed in the state of Maine, providing implications in interventions for hypertension and related diseases and hopefully enhancing hypertension care. 
", doi="10.2196/jmir.9268", url="http://www.jmir.org/2018/1/e22/", url="http://www.ncbi.nlm.nih.gov/pubmed/29382633" } @Article{info:doi/10.2196/medinform.7170, author="Carli, Delphine and Fahrni, Guillaume and Bonnabry, Pascal and Lovis, Christian", title="Quality of Decision Support in Computerized Provider Order Entry: Systematic Literature Review", journal="JMIR Med Inform", year="2018", month="Jan", day="24", volume="6", number="1", pages="e3", keywords="decision support systems, clinical", keywords="medical order entry systems", keywords="system, medication alert", keywords="sensitivity", keywords="specificity", keywords="predictive value of tests", abstract="Background: Computerized decision support systems have raised a lot of hopes and expectations in the field of order entry. Although there are numerous studies reporting positive impacts, concerns are increasingly high about alert fatigue and effective impacts of these systems. One of the root causes of fatigue alert reported is the low clinical relevance of these alerts. Objective: The objective of this systematic review was to assess the reported positive predictive value (PPV), as a proxy to clinical relevance, of decision support systems in computerized provider order entry (CPOE). Methods: A systematic search of the scientific literature published between February 2009 and March 2015 on CPOE, clinical decision support systems, and the predictive value associated with alert fatigue was conducted using PubMed database. Inclusion criteria were as follows: English language, full text available (free or pay for access), assessed medication, direct or indirect level of predictive value, sensitivity, or specificity. When possible with the information provided, PPV was calculated or evaluated. Results: Additive queries on PubMed retrieved 928 candidate papers. Of these, 376 were eligible based on abstract. Finally, 26 studies qualified for a full-text review, and 17 provided enough information for the study objectives. An additional 4 papers were added from the references of the reviewed papers. The results demonstrate massive variations in PPVs ranging from 8\% to 83\% according to the object of the decision support, with most results between 20\% and 40\%. The best results were observed when patients' characteristics, such as comorbidity or laboratory test results, were taken into account. There was also an important variation in sensitivity, ranging from 38\% to 91\%. Conclusions: There is increasing reporting of alerts override in CPOE decision support. Several causes are discussed in the literature, the most important one being the clinical relevance of alerts. In this paper, we tried to assess formally the clinical relevance of alerts, using a near-strong proxy, which is the PPV of alerts, or any way to express it such as the rate of true and false positive alerts. In doing this literature review, three inferences were drawn. First, very few papers report direct or enough indirect elements that support the use or the computation of PPV, which is a gold standard for all diagnostic tools in medicine and should be systematically reported for decision support. Second, the PPV varies a lot according to the typology of decision support, so that overall rates are not useful, but must be reported by the type of alert. Finally, in general, the PPVs are below or near 50\%, which can be considered as very low. 
", doi="10.2196/medinform.7170", url="http://medinform.jmir.org/2018/1/e3/", url="http://www.ncbi.nlm.nih.gov/pubmed/29367187" } @Article{info:doi/10.2196/medinform.9064, author="Yang, Cheng-Yi and Lo, Yu-Sheng and Chen, Ray-Jade and Liu, Chien-Tsai", title="A Clinical Decision Support Engine Based on a National Medication Repository for the Detection of Potential Duplicate Medications: Design and Evaluation", journal="JMIR Med Inform", year="2018", month="Jan", day="19", volume="6", number="1", pages="e6", keywords="duplicate medication", keywords="adverse drug reaction", keywords="clinical decision support system", keywords="PharmaCloud", abstract="Background: A computerized physician order entry (CPOE) system combined with a clinical decision support system can reduce duplication of medications and thus adverse drug reactions. However, without infrastructure that supports patients' integrated medication history across health care facilities nationwide, duplication of medication can still occur. In Taiwan, the National Health Insurance Administration has implemented a national medication repository and Web-based query system known as the PharmaCloud, which allows physicians to access their patients' medication records prescribed by different health care facilities across Taiwan. Objective: This study aimed to develop a scalable, flexible, and thematic design-based clinical decision support (CDS) engine, which integrates a national medication repository to support CPOE systems in the detection of potential duplication of medication across health care facilities, as well as to analyze its impact on clinical encounters. Methods: A CDS engine was developed that can download patients' up-to-date medication history from the PharmaCloud and support a CPOE system in the detection of potential duplicate medications. When prescribing a medication order using the CPOE system, a physician receives an alert if there is a potential duplicate medication. To investigate the impact of the CDS engine on clinical encounters in outpatient services, a clinical encounter log was created to collect information about time, prescribed drugs, and physicians' responses to handling the alerts for each encounter. Results: The CDS engine was installed in a teaching affiliate hospital, and the clinical encounter log collected information for 3 months, during which a total of 178,300 prescriptions were prescribed in the outpatient departments. In all, 43,844/178,300 (24.59\%) patients signed the PharmaCloud consent form allowing their physicians to access their medication history in the PharmaCloud. The rate of duplicate medication was 5.83\% (1843/31,614) of prescriptions. When prescribing using the CDS engine, the median encounter time was 4.3 (IQR 2.3-7.3) min, longer than that without using the CDS engine (median 3.6, IQR 2.0-6.3 min). From the physicians' responses, we found that 42.06\% (1908/4536) of the potential duplicate medications were recognized by the physicians and the medication orders were canceled. Conclusions: The CDS engine could easily extend functions for detection of adverse drug reactions when more and more electronic health record systems are adopted. Moreover, the CDS engine can retrieve more updated and completed medication histories in the PharmaCloud, so it can have better performance for detection of duplicate medications. Although our CDS engine approach could enhance medication safety, it would make for a longer encounter time. 
This problem can be mitigated by careful evaluation of adopted solutions for implementation of the CDS engine. The key component of a successful CDS engine is the completeness of the patient's medication history; thus, further research to assess the factors in increasing the PharmaCloud consent rate is required. ", doi="10.2196/medinform.9064", url="http://medinform.jmir.org/2018/1/e6/" } @Article{info:doi/10.2196/medinform.8680, author="Wellner, Ben and Grand, Joan and Canzone, Elizabeth and Coarr, Matt and Brady, W. Patrick and Simmons, Jeffrey and Kirkendall, Eric and Dean, Nathan and Kleinman, Monica and Sylvester, Peter", title="Predicting Unplanned Transfers to the Intensive Care Unit: A Machine Learning Approach Leveraging Diverse Clinical Elements", journal="JMIR Med Inform", year="2017", month="Nov", day="22", volume="5", number="4", pages="e45", keywords="clinical deterioration", keywords="machine learning", keywords="data mining", keywords="electronic health record", keywords="patient acuity", keywords="vital signs", keywords="nursing assessment", keywords="clinical laboratory techniques", abstract="Background: Early warning scores aid in the detection of pediatric clinical deteriorations but include limited data inputs, rarely include data trends over time, and have limited validation. Objective: Machine learning methods that make use of large numbers of predictor variables are now commonplace. This work examines how different types of predictor variables derived from the electronic health record affect the performance of predicting unplanned transfers to the intensive care unit (ICU) at three large children's hospitals. Methods: We trained separate models with data from three different institutions from 2011 through 2013 and evaluated models with 2014 data. Cases consisted of patients who transferred from the floor to the ICU and met one or more of 5 different a priori defined criteria for suspected unplanned transfers. Controls were patients who were never transferred to the ICU. Predictor variables for the models were derived from vitals, labs, acuity scores, and nursing assessments. Classification models consisted of L1 and L2 regularized logistic regression and neural network models. We evaluated model performance over prediction horizons ranging from 1 to 16 hours. Results: Across the three institutions, the c-statistic values for our best models were 0.892 (95\% CI 0.875-0.904), 0.902 (95\% CI 0.880-0.923), and 0.899 (95\% CI 0.879-0.919) for the task of identifying unplanned ICU transfer 6 hours before its occurrence and achieved 0.871 (95\% CI 0.855-0.888), 0.872 (95\% CI 0.850-0.895), and 0.850 (95\% CI 0.825-0.875) for a prediction horizon of 16 hours. For our first model at 80\% sensitivity, this resulted in a specificity of 80.5\% (95\% CI 77.4-83.7) and a positive predictive value of 5.2\% (95\% CI 4.5-6.2). Conclusions: Feature-rich models with many predictor variables allow for patient deterioration to be predicted accurately, even up to 16 hours in advance. 
", doi="10.2196/medinform.8680", url="http://medinform.jmir.org/2017/4/e45/", url="http://www.ncbi.nlm.nih.gov/pubmed/29167089" } @Article{info:doi/10.2196/mhealth.8732, author="Singh, Navdeep and Hess, Erik and Guo, George and Sharp, Adam and Huang, Brian and Breslin, Maggie and Melnick, Edward", title="Tablet-Based Patient-Centered Decision Support for Minor Head Injury in the Emergency Department: Pilot Study", journal="JMIR Mhealth Uhealth", year="2017", month="Sep", day="28", volume="5", number="9", pages="e144", keywords="clinical decision support", keywords="decision aids", keywords="head injury, minor", keywords="medical informatics", keywords="spiral computed tomography", keywords="health services overuse", keywords="patient-centered outcomes research", abstract="Background: The Concussion or Brain Bleed app is a clinician- and patient-facing electronic tool to guide decisions about head computed tomography (CT) use in patients presenting to the emergency department (ED) with minor head injury. This app integrates a patient decision aid and clinical decision support (using the Canadian CT Head Rule, CCHR) at the bedside on a tablet computer to promote conversations around individualized risk and patients' specific concerns within the ED context. Objective: The objective of this study was to describe the use of the Concussion or Brain Bleed app in a high-volume ED and to establish preliminary efficacy estimates on patient experience, clinician experience, health care utilization, and patient safety. These data will guide the planning of a larger multicenter trial testing the effectiveness of the Concussion or Brain Bleed app. Methods: We conducted a prospective pilot study of adult (age 18-65 years) patients presenting to the ED after minor head injury who were identified by participating clinicians as low risk by the CCHR. The primary outcome was patient knowledge regarding the injury, risks, and CT use. Secondary outcomes included patient satisfaction, decisional conflict, trust in physician, clinician acceptability, system usability, Net Promoter scores, head CT rate, and patient safety at 7 days. Results: We enrolled 41 patients cared for by 29 different clinicians. Patient knowledge increased after the use of the app (questions correct out of 9: pre-encounter, 3.3 vs postencounter, 4.7; mean difference 1.4, 95\% CI 0.8-2.0). Patients reported a mean of 11.7 (SD 13.5) on the Decisional Conflict Scale and 92.5 (SD 12.0) in the Trust in Physician Scale (both scales range from 0 to 100). Most patients were satisfied with the app's clarity of information (35, 85\%), helpfulness of information (36, 88\%), and amount of information (36, 88\%). In the 41 encounters, most clinicians thought the information was somewhat or extremely helpful to the patient (35, 85\%), would want to use something similar for other decisions (27, 66\%), and would recommend the app to other providers (28, 68\%). Clinicians reported a mean system usability score of 85.1 (SD 15; scale from 0 to 100 with 85 in the ``excellent'' acceptability range). The total Net Promoter Score was 36.6 (on a scale from --100 to 100). A total of 7 (17\%) patients received a head CT in the ED. No patients had a missed clinically important brain injury at 7 days. Conclusions: An app to help patients assess the utility of CT imaging after head injury in the ED increased patient knowledge. Nearly all clinicians reported the app to be helpful to patients. 
The high degree of patient satisfaction, clinician acceptability, and system usability support rigorous testing of the app in a larger multicenter trial. ", doi="10.2196/mhealth.8732", url="http://mhealth.jmir.org/2017/9/e144/", url="http://www.ncbi.nlm.nih.gov/pubmed/28958987" } @Article{info:doi/10.2196/medinform.7400, author="Reis, Nogueira Zilma Silveira and Maia, Abreu Thais and Marcolino, Soriano Milena and Becerra-Posada, Francisco and Novillo-Ortiz, David and Ribeiro, Pinho Antonio Luiz", title="Is There Evidence of Cost Benefits of Electronic Medical Records, Standards, or Interoperability in Hospital Information Systems? Overview of Systematic Reviews", journal="JMIR Med Inform", year="2017", month="Aug", day="29", volume="5", number="3", pages="e26", keywords="electronic medical records", keywords="standards", keywords="medical information exchange", keywords="health information exchange", keywords="cost", keywords="benefits and costs", abstract="Background: Electronic health (eHealth) interventions may improve the quality of care by providing timely, accessible information about one patient or an entire population. Electronic patient care information forms the nucleus of computerized health information systems. However, interoperability among systems depends on the adoption of information standards. Additionally, investing in technology systems requires cost-effectiveness studies to ensure the sustainability of processes for stakeholders. Objective: The objective of this study was to assess cost-effectiveness of the use of electronically available inpatient data systems, health information exchange, or standards to support interoperability among systems. Methods: An overview of systematic reviews was conducted, assessing the MEDLINE, Cochrane Library, LILACS, and IEEE Library databases to identify relevant studies published through February 2016. The search was supplemented by citations from the selected papers. The primary outcome sought the cost-effectiveness, and the secondary outcome was the impact on quality of care. Independent reviewers selected studies, and disagreement was resolved by consensus. The quality of the included studies was evaluated using a measurement tool to assess systematic reviews (AMSTAR). Results: The primary search identified 286 papers, and two papers were manually included. A total of 211 were systematic reviews. From the 20 studies that were selected after screening the title and abstract, 14 were deemed ineligible, and six met the inclusion criteria. The interventions did not show a measurable effect on cost-effectiveness. Despite the limited number of studies, the heterogeneity of electronic systems reported, and the types of intervention in hospital routines, it was possible to identify some preliminary benefits in quality of care. Hospital information systems, along with information sharing, had the potential to improve clinical practice by reducing staff errors or incidents, improving automated harm detection, monitoring infections more effectively, and enhancing the continuity of care during physician handoffs. Conclusions: This review identified some benefits in the quality of care but did not provide evidence that the implementation of eHealth interventions had a measurable impact on cost-effectiveness in hospital settings. However, further evidence is needed to infer the impact of standards adoption or interoperability in cost benefits of health care; this in turn requires further research. 
", doi="10.2196/medinform.7400", url="http://medinform.jmir.org/2017/3/e26/", url="http://www.ncbi.nlm.nih.gov/pubmed/28851681" } @Article{info:doi/10.2196/jmir.7921, author="Lowenstein, Margaret and Bamgbose, Olusinmi and Gleason, Nathaniel and Feldman, D. Mitchell", title="Psychiatric Consultation at Your Fingertips: Descriptive Analysis of Electronic Consultation From Primary Care to Psychiatry", journal="J Med Internet Res", year="2017", month="Aug", day="04", volume="19", number="8", pages="e279", keywords="mental health", keywords="primary care", keywords="health care delivery", keywords="teleconsultation", keywords="telehealth", keywords="Internet care delivery", abstract="Background: Mental health problems are commonly encountered in primary care, with primary care providers (PCPs) experiencing challenges referring patients to specialty mental health care. Electronic consultation (eConsult) is one model that has been shown to improve timely access to subspecialty care in a number of medical subspecialties. eConsults generally involve a PCP-initiated referral for specialty consultation for a clinical question that is outside their expertise but may not require an in-person evaluation. Objective: Our aim was to describe the implementation of eConsults for psychiatry in a large academic health system. Methods: We performed a content analysis of the first 50 eConsults to psychiatry after program implementation. For each question and response, we coded consults as pertaining to diagnosis and/or management as well as categories of medication choice, drug side effects or interactions, and queries about referrals and navigating the health care system. We also performed a chart review to evaluate the timeliness of psychiatrist responses and PCP implementation of recommendations. Results: Depression was the most common consult template selected by PCPs (20/50, 40\%), followed by the generic template (12/50, 24\%) and anxiety (8/50, 16\%). Most questions (49/50, 98\%) pertained primarily to management, particularly for medications. Psychiatrists commented on both diagnosis (28/50, 56\%) and management (50/50, 100\%), responded in an average of 1.4 days, and recommended in-person consultation for 26\% (13/50) of patients. PCPs implemented psychiatrist recommendations 76\% (38/50) of the time. Conclusions: For the majority of patients, psychiatrists provided strategies for ongoing management in primary care without an in-person evaluation, and PCPs implemented most psychiatrist recommendations. eConsults show promise as one means of supporting PCPs to deliver mental health care to patients with common psychiatric disorders. ", doi="10.2196/jmir.7921", url="http://www.jmir.org/2017/8/e279/", url="http://www.ncbi.nlm.nih.gov/pubmed/28778852" } @Article{info:doi/10.2196/medinform.7627, author="Deliberato, Oct{\'a}vio Rodrigo and Celi, Anthony Leo and Stone, J. 
David", title="Clinical Note Creation, Binning, and Artificial Intelligence", journal="JMIR Med Inform", year="2017", month="Aug", day="03", volume="5", number="3", pages="e24", keywords="electronic health records", keywords="artificial Intelligence", keywords="clinical informatics", doi="10.2196/medinform.7627", url="http://medinform.jmir.org/2017/3/e24/", url="http://www.ncbi.nlm.nih.gov/pubmed/28778845" } @Article{info:doi/10.2196/jmir.8111, author="Hjollund, Ingvar Niels Henrik", title="Individual Prognosis of Symptom Burden and Functioning in Chronic Diseases: A Generic Method Based on Patient-Reported Outcome (PRO) Measures", journal="J Med Internet Res", year="2017", month="Aug", day="01", volume="19", number="8", pages="e278", keywords="chronic disease", keywords="cohort studies", keywords="depression", keywords="longitudinal studies", keywords="patient-reported outcome measures", keywords="prognosis", keywords="recovery of function", keywords="repeated measurements", keywords="stroke", keywords="surveys and questionnaires", keywords="symptom assessment", abstract="Background: Information to the patient about the long-term prognosis of symptom burden and functioning is an integrated part of clinical practice, but relies mostly on the clinician's personal experience. Relevant prognostic models based on patient-reported outcome (PRO) data with repeated measurements are rarely available. Objective: The aim was to describe a generic method for individual long-term prognosis of symptom burden and functioning that implied few statistical presumptions, to evaluate an implementation for prognosis of depressive symptoms in stroke patients and to provide open access to a Web-based prototype of this implementation for individual use. Methods: The method used to describe individual prognosis of a PRO outcome was based on the selection of a specific subcohort of patients who have the same score as the patient in question at the same time (eg, after diagnosis or treatment start), plus or minus one unit of minimal clinically important difference. This subcohort's experienced courses were then used to provide quantitative measures of prognosis over time. A cohort of 1404 stroke patients provided data for a simulation study and a prototype for individual use. Members of the cohort answered questionnaires every 6 months for 3.5 years. Depressive symptoms were assessed by the Hospital Anxiety and Depression Scale (HADS) and a single item from the SF-12 (MH4) health survey. Four approaches were compared in a simulation study in which the prognosis for each member of the cohort was individually assessed. Results: The mean standard deviations were 40\% to 70\% higher in simulated scores. Mean errors were close to zero, and mean absolute errors were between 0.46 and 0.66 SD in the four approaches. An approach in which missing HADS scores were estimated from the single-item SF-12 MH4 performed marginally better than methods restricted to questionnaires with a genuine HADS score, which indicates that data collected with shorter questionnaires (eg, in clinical practice) may be used together with longer versions with the full scale, given that the design includes at least two simultaneous measurements of the full scale and the surrogate measure. Conclusions: This is the first description and implementation of a nonparametric method for individual PRO-based prognosis. 
Given that relevant PRO data have been collected longitudinally, the method may be applied to other patient groups and to any outcome related to symptom burden and functioning. This initial implementation has been deliberately made simple, and further elaborations as well as the usability and clinical validity of the method will be scrutinized in clinical practice. An implementation of the prototype is available online at www.prognosis.dk. ", doi="10.2196/jmir.8111", url="http://www.jmir.org/2017/8/e278/", url="http://www.ncbi.nlm.nih.gov/pubmed/28765099" } @Article{info:doi/10.2196/humanfactors.7153, author="Keyworth, Chris and Hart, Jo and Thoong, Hong and Ferguson, Jane and Tully, Mary", title="A Technological Innovation to Reduce Prescribing Errors Based on Implementation Intentions: The Acceptability and Feasibility of MyPrescribe", journal="JMIR Hum Factors", year="2017", month="Aug", day="01", volume="4", number="3", pages="e17", keywords="drug prescribing", keywords="behavior and behavior mechanisms", keywords="clinical competence", keywords="qualitative research", keywords="mobile applications", keywords="pharmacists", keywords="patient safety", keywords="telemedicine", abstract="Background: Although prescribing of medication in hospitals is rarely an error-free process, prescribers receive little feedback on their mistakes and ways to change future practices. Audit and feedback interventions may be an effective approach to modifying the clinical practice of health professionals, but these may pose logistical challenges when used in hospitals. Moreover, such interventions are often labor intensive. Consequently, there is a need to develop effective and innovative interventions to overcome these challenges and to improve the delivery of feedback on prescribing. Implementation intentions, which have been shown to be effective in changing behavior, link critical situations with an appropriate response; however, these have rarely been used in the context of improving prescribing practices. Objective: Semistructured qualitative interviews were conducted to evaluate the acceptability and feasibility of providing feedback on prescribing errors via MyPrescribe, a mobile-compatible website informed by implementation intentions. Methods: Data relating to 200 prescribing errors made by 52 junior doctors were collected by 11 hospital pharmacists. These errors were populated into MyPrescribe, where prescribers were able to construct their own personalized action plans. Qualitative interviews with a subsample of 15 junior doctors were used to explore issues regarding feasibility and acceptability of MyPrescribe and their experiences of using implementation intentions to construct prescribing action plans. Framework analysis was used to identify prominent themes, with findings mapped to the behavioral components of the COM-B model (capability, opportunity, motivation, and behavior) to inform the development of future interventions. Results: MyPrescribe was perceived to be effective in providing opportunities for critical reflection on prescribing errors and to complement existing training (such as junior doctors' e-portfolio). The participants were able to provide examples of how they would use ``If-Then'' plans for patient management. Technology, as opposed to other methods of learning (eg, traditional ``paper based'' learning), was seen as a positive advancement for continued learning. 
Conclusions: MyPrescribe was perceived as an acceptable and feasible learning tool for changing prescribing practices, with participants suggesting that it would make an important addition to medical prescribers' training in reflective practice. MyPrescribe is a novel theory-based technological innovation that provides the platform for doctors to create personalized implementation intentions. Applying the COM-B model allows for a more detailed understanding of the perceived mechanisms behind prescribing practices and the ways in which interventions aimed at changing professional practice can be implemented. ", doi="10.2196/humanfactors.7153", url="http://humanfactors.jmir.org/2017/3/e17/", url="http://www.ncbi.nlm.nih.gov/pubmed/28765104" } @Article{info:doi/10.2196/medinform.7140, author="Elmessiry, Adel and Cooper, O. William and Catron, F. Thomas and Karrass, Jan and Zhang, Zhe and Singh, P. Munindar", title="Triaging Patient Complaints: Monte Carlo Cross-Validation of Six Machine Learning Classifiers", journal="JMIR Med Inform", year="2017", month="Jul", day="31", volume="5", number="3", pages="e19", keywords="natural language processing", keywords="NLP", keywords="machine learning", keywords="patient complaints", abstract="Background: Unsolicited patient complaints can be a useful service recovery tool for health care organizations. Some patient complaints contain information that may necessitate further action on the part of the health care organization and/or the health care professional. Current approaches depend on the manual processing of patient complaints, which can be costly, slow, and challenging in terms of scalability. Objective: The aim of this study was to evaluate automatic patient triage, which can potentially improve response time and provide much-needed scale, thereby enhancing opportunities to encourage physicians to self-regulate. Methods: We implemented a comparison of several well-known machine learning classifiers to detect whether a complaint was associated with a physician or his/her medical practice. We compared these classifiers using a real-life dataset containing 14,335 patient complaints associated with 768 physicians that was extracted from patient complaints collected by the Patient Advocacy Reporting System developed at Vanderbilt University and associated institutions. We conducted a 10-splits Monte Carlo cross-validation to validate our results. Results: We achieved an accuracy of 82\% and F-score of 81\% in correctly classifying patient complaints with sensitivity and specificity of 0.76 and 0.87, respectively. Conclusions: We demonstrate that natural language processing methods based on modeling patient complaint text can be effective in identifying those patient complaints requiring physician action. ", doi="10.2196/medinform.7140", url="http://medinform.jmir.org/2017/3/e19/", url="http://www.ncbi.nlm.nih.gov/pubmed/28760726" } @Article{info:doi/10.2196/humanfactors.6857, author="St-Maurice, D. Justin and Burns, M. Catherine", title="Modeling Patient Treatment With Medical Records: An Abstraction Hierarchy to Understand User Competencies and Needs", journal="JMIR Hum Factors", year="2017", month="Jul", day="28", volume="4", number="3", pages="e16", keywords="clinical decision-making", keywords="health services research", keywords="qualitative research", keywords="primary health care", keywords="medical records systems, computerized", abstract="Background: Health care is a complex sociotechnical system. 
Patient treatment is evolving and needs to incorporate the use of technology and new patient-centered treatment paradigms. Cognitive work analysis (CWA) is an effective framework for understanding complex systems, and work domain analysis (WDA) is useful for understanding complex ecologies. Although previous applications of CWA have described patient treatment, due to their scope of work, patients were previously characterized as biomedical machines, rather than patient actors involved in their own care. Objective: An abstraction hierarchy that characterizes patients as beings with complex social values and priorities is needed. This can help better understand treatment in a modern approach to care. The purpose of this study was to perform a WDA to represent the treatment of patients with medical records. Methods: The methods to develop this model included the analysis of written texts and collaboration with subject matter experts. Our WDA represents the ecology through its functional purposes, abstract functions, generalized functions, physical functions, and physical forms. Results: Compared with other work domain models, this model is able to articulate the nuanced balance between medical treatment, patient education, and limited health care resources. Concepts in the analysis were similar to the modeling choices of other WDAs but combined them into a comprehensive, systematic, and contextual overview. The model is helpful to understand user competencies and needs. Future models could be developed to model the patient's domain and enable the exploration of the shared decision-making (SDM) paradigm. Conclusion: Our work domain model links treatment goals, decision-making constraints, and task workflows. This model can be used by system developers who would like to use ecological interface design (EID) to improve systems. Our hierarchy is the first in a future set that could explore new treatment paradigms. Future hierarchies could model the patient as a controller and could be useful for mobile app development. ", doi="10.2196/humanfactors.6857", url="http://humanfactors.jmir.org/2017/3/e16/", url="http://www.ncbi.nlm.nih.gov/pubmed/28754650" } @Article{info:doi/10.2196/jmir.7421, author="Voruganti, Teja and Grunfeld, Eva and Jamieson, Trevor and Kurahashi, M. Allison and Lokuge, Bhadra and Krzyzanowska, K. Monika and Mamdani, Muhammad and Moineddin, Rahim and Husain, Amna", title="My Team of Care Study: A Pilot Randomized Controlled Trial of a Web-Based Communication Tool for Collaborative Care in Patients With Advanced Cancer", journal="J Med Internet Res", year="2017", month="Jul", day="18", volume="19", number="7", pages="e219", keywords="MeSH: Internet", keywords="professional-patient relations", keywords="interdisciplinary communication", keywords="neoplasms", keywords="adult", keywords="chronic disease", keywords="continuity of patient care", keywords="patient care team", keywords="communication", keywords="outcome assessment (health care)", abstract="Background: The management of patients with complex care needs requires the expertise of health care providers from multiple settings and specialties. As such, there is a need for cross-setting, cross-disciplinary solutions that address deficits in communication and continuity of care. We have developed a Web-based tool for clinical collaboration, called Loop, which assembles the patient and care team in a virtual space for the purpose of facilitating communication around care management. 
Objective: The objectives of this pilot study were to evaluate the feasibility of integrating a tool like Loop into current care practices and to capture preliminary measures of the effect of Loop on continuity of care, quality of care, symptom distress, and health care utilization. Methods: We conducted an open-label pilot cluster randomized controlled trial allocating patients with advanced cancer (defined as stage III or IV disease) with ≥3 months prognosis, their participating health care team and caregivers to receive either the Loop intervention or usual care. Outcome data were collected from patients on a monthly basis for 3 months. Trial feasibility was measured with rate of uptake, as well as recruitment and system usage. The Picker Continuity of Care subscale, Palliative care Outcomes Scale, Edmonton Symptom Assessment Scale, and Ambulatory and Home Care Record were patient self-reported measures of continuity of care, quality of care, symptom distress, and health services utilization, respectively. We conducted a content analysis of messages posted on Loop to understand how the system was used. Results: Nineteen physicians (oncologists or palliative care physicians) were randomized to the intervention or control arms. One hundred twenty-seven of their patients with advanced cancer were approached and 48 patients enrolled. Of 24 patients in the intervention arm, 20 (83.3\%) registered onto Loop. In the intervention and control arms, 12 and 11 patients completed three months of follow-up, respectively. A mean of 1.2 (range: 0 to 4) additional healthcare providers with an average total of 3 healthcare providers participated per team. An unadjusted between-arm increase of +11.4 was observed on the Picker scale in favor of the intervention arm. Other measures showed negligible changes. Loop was primarily used for medical care management, symptom reporting, and appointment coordination. Conclusions: The results of this study show that implementation of Loop was feasible. It provides useful information for planning future studies further examining effectiveness and team collaboration. Numerically higher scores were observed for the Loop arm relative to the control arm with respect to continuity of care. Future work is required to understand the incentives and barriers to participation so that the implementation of tools like Loop can be optimized. Trial Registration: ClinicalTrials.gov NCT02372994; https://clinicaltrials.gov/ct2/show/NCT02372994 (Archived by WebCite at http://www.webcitation.org/6r00L4Skb). ", doi="10.2196/jmir.7421", url="http://www.jmir.org/2017/7/e219/", url="http://www.ncbi.nlm.nih.gov/pubmed/28720558" } @Article{info:doi/10.2196/medinform.6169, author="Zolhavarieh, Seyedjamal and Parry, David and Bai, Quan", title="Issues Associated With the Use of Semantic Web Technology in Knowledge Acquisition for Clinical Decision Support Systems: Systematic Review of the Literature", journal="JMIR Med Inform", year="2017", month="Jul", day="05", volume="5", number="3", pages="e18", keywords="semantic web technology", keywords="clinical decision support system", keywords="systematic review", keywords="medical informatics", keywords="knowledge", keywords="Internet", abstract="Background: Knowledge-based clinical decision support system (KB-CDSS) can be used to help practitioners make diagnostic decisions. KB-CDSS may use clinical knowledge obtained from a wide variety of sources to make decisions. 
However, knowledge acquisition is one of the well-known bottlenecks in KB-CDSSs, partly because of the enormous growth in health-related knowledge available and the difficulty in assessing the quality of this knowledge as well as identifying the ``best'' knowledge to use. This bottleneck not only means that lower-quality knowledge is being used, but also that KB-CDSSs are difficult to develop for areas where expert knowledge may be limited or unavailable. Recent methods have been developed by utilizing Semantic Web (SW) technologies in order to automatically discover relevant knowledge from knowledge sources. Objective: The two main objectives of this study were to (1) identify and categorize knowledge acquisition issues that have been addressed through using SW technologies and (2) highlight the role of SW for acquiring knowledge used in the KB-CDSS. Methods: We conducted a systematic review of the recent work related to knowledge acquisition methods for clinical decision support systems published in scientific journals. In this regard, we used the keyword search technique to extract relevant papers. Results: The retrieved papers were categorized based on two main issues: (1) format and data heterogeneity and (2) lack of semantic analysis. Most existing approaches will be discussed under these categories. A total of 27 papers were reviewed in this study. Conclusions: The potential for using SW technology in KB-CDSS has only been considered to a minor extent so far despite its promise. This review identifies some questions and issues regarding use of SW technology for extracting relevant knowledge for a KB-CDSS. ", doi="10.2196/medinform.6169", url="http://medinform.jmir.org/2017/3/e18/", url="http://www.ncbi.nlm.nih.gov/pubmed/28679487" } @Article{info:doi/10.2196/medinform.7123, author="Duz, Marco and Marshall, F. John and Parkin, Tim", title="Validation of an Improved Computer-Assisted Technique for Mining Free-Text Electronic Medical Records", journal="JMIR Med Inform", year="2017", month="Jun", day="29", volume="5", number="2", pages="e17", keywords="text mining", keywords="data mining", keywords="electronic medical record", keywords="validation studies", abstract="Background: The use of electronic medical records (EMRs) offers opportunity for clinical epidemiological research. With large EMR databases, automated analysis processes are necessary but require thorough validation before they can be routinely used. Objective: The aim of this study was to validate a computer-assisted technique using commercially available content analysis software (SimStat-WordStat v.6 (SS/WS), Provalis Research) for mining free-text EMRs. Methods: The dataset used for the validation process included life-long EMRs from 335 patients (17,563 rows of data), selected at random from a larger dataset (141,543 patients, {\textasciitilde}2.6 million rows of data) and obtained from 10 equine veterinary practices in the United Kingdom. The ability of the computer-assisted technique to detect rows of data (cases) of colic, renal failure, right dorsal colitis, and non-steroidal anti-inflammatory drug (NSAID) use in the population was compared with manual classification. The first step of the computer-assisted analysis process was the definition of inclusion dictionaries to identify cases, including terms identifying a condition of interest. Words in inclusion dictionaries were selected from the list of all words in the dataset obtained in SS/WS. 
The second step consisted of defining an exclusion dictionary, including combinations of words to remove cases erroneously classified by the inclusion dictionary alone. The third step was the definition of a reinclusion dictionary to reinclude cases that had been erroneously classified by the exclusion dictionary. Finally, cases obtained by the exclusion dictionary were removed from cases obtained by the inclusion dictionary, and cases from the reinclusion dictionary were subsequently reincluded using Rv3.0.2 (R Foundation for Statistical Computing, Vienna, Austria). Manual analysis was performed as a separate process by a single experienced clinician reading through the dataset once and classifying each row of data based on the interpretation of the free-text notes. Validation was performed by comparison of the computer-assisted method with manual analysis, which was used as the gold standard. Sensitivity, specificity, negative predictive values (NPVs), positive predictive values (PPVs), and F values of the computer-assisted process were calculated by comparing them with the manual classification. Results: Lowest sensitivity, specificity, PPVs, NPVs, and F values were 99.82\% (1128/1130), 99.88\% (16410/16429), 94.6\% (223/239), 100.00\% (16410/16412), and 99.0\% (100{\texttimes}2{\texttimes}0.983{\texttimes}0.998/[0.983+0.998]), respectively. The computer-assisted process required few seconds to run, although an estimated 30 h were required for dictionary creation. Manual classification required approximately 80 man-hours. Conclusions: The critical step in this work is the creation of accurate and inclusive dictionaries to ensure that no potential cases are missed. It is significantly easier to remove false positive terms from a SS/WS selected subset of a large database than search that original database for potential false negatives. The benefits of using this method are proportional to the size of the dataset to be analyzed. ", doi="10.2196/medinform.7123", url="http://medinform.jmir.org/2017/2/e17/", url="http://www.ncbi.nlm.nih.gov/pubmed/28663163" } @Article{info:doi/10.2196/jmir.7405, author="Kooij, Laura and Groen, G. Wim and van Harten, H. Wim", title="The Effectiveness of Information Technology-Supported Shared Care for Patients With Chronic Disease: A Systematic Review", journal="J Med Internet Res", year="2017", month="Jun", day="22", volume="19", number="6", pages="e221", keywords="review", keywords="integrated healthcare systems", keywords="health information systems", keywords="chronic disease", abstract="Background: In patients with chronic disease, many health care professionals are involved during treatment and follow-up. This leads to fragmentation that in turn may lead to suboptimal care. Shared care is a means to improve the integration of care delivered by various providers, specifically primary care physicians (PCPs) and specialty care professionals, for patients with chronic disease. The use of information technology (IT) in this field seems promising. Objective: Our aim was to systematically review the literature regarding the effectiveness of IT-supported shared care interventions in chronic disease in terms of provider or professional, process, health or clinical and financial outcomes. Additionally, our aim was to provide an inventory of the IT applications' characteristics that support such interventions. 
Methods: PubMed, Scopus, and EMBASE were searched from 2006 to 2015 to identify relevant studies using search terms related to shared care, chronic disease, and IT. Eligible studies were in the English language and were randomized controlled trials (RCTs), controlled trials, or single-group pre-post studies that reported on the effects of IT-supported shared care in patients with chronic disease and cancer. The interventions had to involve providers from both primary and specialty health care. Intervention and IT characteristics and effectiveness---in terms of provider or professional (proximal), process (intermediate), health or clinical and financial (distal) outcomes---were extracted. Risk of bias of (cluster) RCTs was assessed using the Cochrane tool. Results: The initial search yielded 4167 results. Thirteen publications were used, including 11 (cluster) RCTs, a controlled trial, and a pre-post feasibility study. Four main categories of IT applications were identified: (1) electronic decision support tools, (2) electronic platform with a call-center, (3) electronic health records, and (4) electronic communication applications. Positive effects were found for decision support-based interventions on financial and health outcomes, such as physical activity. Electronic health record use improved PCP visits and reduced rehospitalization. Electronic platform use resulted in fewer readmissions and better clinical outcomes---for example, in terms of body mass index (BMI) and dyspnea. The use of electronic communication applications using text-based information transfer between professionals had a positive effect on the number of PCPs contacting hospitals, PCPs' satisfaction, and confidence. Conclusions: IT-supported shared care can improve proximal outcomes, such as confidence and satisfaction of PCPs, especially in using electronic communication applications. Positive effects on intermediate and distal outcomes were also reported but were mixed. Surprisingly, few studies were found that substantiated these anticipated benefits. Studies showed a large heterogeneity in the included populations, outcome measures, and IT applications used. Therefore, a firm conclusion cannot be drawn. As IT applications are developed and implemented rapidly, evidence is needed to test the specific added value of IT in shared care interventions. This is expected to require innovative research methods. ", doi="10.2196/jmir.7405", url="http://www.jmir.org/2017/6/e221/", url="http://www.ncbi.nlm.nih.gov/pubmed/28642218" } @Article{info:doi/10.2196/medinform.6959, author="Balikuddembe, S. Michael and Tumwesigye, M. Nazarius and Wakholi, K. Peter and Tyllesk{\"a}r, Thorkild", title="Computerized Childbirth Monitoring Tools for Health Care Providers Managing Labor: A Scoping Review", journal="JMIR Med Inform", year="2017", month="Jun", day="15", volume="5", number="2", pages="e14", keywords="childbirth", keywords="obstetric labor", keywords="fetal monitoring", keywords="medical informatics applications", keywords="systematic review", abstract="Background: Proper monitoring of labor and childbirth prevents many pregnancy-related complications. However, monitoring is still poor in many places partly due to the usability concerns of support tools such as the partograph. In 2011, the World Health Organization (WHO) called for the development and evaluation of context-adaptable electronic health solutions to health challenges. 
Computerized tools have penetrated many areas of health care, but their influence in supporting health staff with childbirth seems limited. Objective: The objective of this scoping review was to determine the scope and trends of research on computerized labor monitoring tools that could be used by health care providers in childbirth management. Methods: We used key terms to search the Web for eligible peer-reviewed and gray literature. Eligibility criteria were a computerized labor monitoring tool for maternity service providers and dated 2006 to mid-2016. Retrieved papers were screened to eliminate ineligible papers, and consensus was reached on the papers included in the final analysis. Results: We started with about 380,000 papers, of which 14 papers qualified for the final analysis. Most tools were at the design and implementation stages of development. Three papers addressed post-implementation evaluations of two tools. No documentation on clinical outcome studies was retrieved. The parameters targeted with the tools varied, but they included fetal heart (10 of 11 tools), labor progress (8 of 11), and maternal status (7 of 11). Most tools were designed for use on personal computers in low-resource settings and could be customized for different user needs. Conclusions: Research on computerized labor monitoring tools is inadequate. Compared with other labor parameters, there was a preponderance of fetal heart monitoring and hardly any summative evaluation of the available tools. More research, including clinical outcomes evaluation of computerized childbirth monitoring tools, is needed. ", doi="10.2196/medinform.6959", url="http://medinform.jmir.org/2017/2/e14/", url="http://www.ncbi.nlm.nih.gov/pubmed/28619702" } @Article{info:doi/10.2196/jmir.6887, author="Meystre, Stephane and Gouripeddi, Ramkiran and Tieder, Joel and Simmons, Jeffrey and Srivastava, Rajendu and Shah, Samir", title="Enhancing Comparative Effectiveness Research With Automated Pediatric Pneumonia Detection in a Multi-Institutional Clinical Repository: A PHIS+ Pilot Study", journal="J Med Internet Res", year="2017", month="May", day="15", volume="19", number="5", pages="e162", keywords="natural language processing", keywords="pneumonia, bacterial", keywords="medical informatics", keywords="comparative effectiveness research", abstract="Background: Community-acquired pneumonia is a leading cause of pediatric morbidity. Administrative data are often used to conduct comparative effectiveness research (CER) with sufficient sample sizes to enhance detection of important outcomes. However, such studies are prone to misclassification errors because of the variable accuracy of discharge diagnosis codes. Objective: The aim of this study was to develop an automated, scalable, and accurate method to determine the presence or absence of pneumonia in children using chest imaging reports. Methods: The multi-institutional PHIS+ clinical repository was developed to support pediatric CER by expanding an administrative database of children's hospitals with detailed clinical data. To develop a scalable approach to find patients with bacterial pneumonia more accurately, we developed a Natural Language Processing (NLP) application to extract relevant information from chest diagnostic imaging reports. Domain experts established a reference standard by manually annotating 282 reports to train and then test the NLP application. 
Findings of pleural effusion, pulmonary infiltrate, and pneumonia were automatically extracted from the reports and then used to automatically classify whether a report was consistent with bacterial pneumonia. Results: Compared with the annotated diagnostic imaging reports reference standard, the most accurate implementation of machine learning algorithms in our NLP application allowed extracting relevant findings with a sensitivity of .939 and a positive predictive value of .925. It allowed classifying reports with a sensitivity of .71, a positive predictive value of .86, and a specificity of .962. When compared with each of the domain experts manually annotating these reports, the NLP application allowed for significantly higher sensitivity (.71 vs .527) and similar positive predictive value and specificity. Conclusions: NLP-based pneumonia information extraction of pediatric diagnostic imaging reports performed better than domain experts in this pilot study. NLP is an efficient method to extract information from a large collection of imaging reports to facilitate CER. ", doi="10.2196/jmir.6887", url="http://www.jmir.org/2017/5/e162/", url="http://www.ncbi.nlm.nih.gov/pubmed/28506958" } @Article{info:doi/10.2196/jmir.7092, author="Park, Eunjeong and Chang, Hyuk-Jae and Nam, Suk Hyo", title="Use of Machine Learning Classifiers and Sensor Data to Detect Neurological Deficit in Stroke Patients", journal="J Med Internet Res", year="2017", month="Apr", day="18", volume="19", number="4", pages="e120", keywords="medical informatics", keywords="machine learning", keywords="motor", keywords="neurological examination", keywords="stroke", abstract="Background: The pronator drift test (PDT), a neurological examination, is widely used in clinics to measure motor weakness of stroke patients. Objective: The aim of this study was to develop a PDT tool with machine learning classifiers to detect stroke symptoms based on quantification of proximal arm weakness using inertial sensors and signal processing. Methods: We extracted features of drift and pronation from accelerometer signals of wearable devices on the inner wrists of 16 stroke patients and 10 healthy controls. Signal processing and feature selection approach were applied to discriminate PDT features used to classify stroke patients. A series of machine learning techniques, namely support vector machine (SVM), radial basis function network (RBFN), and random forest (RF), were implemented to discriminate stroke patients from controls with leave-one-out cross-validation. Results: Signal processing by the PDT tool extracted a total of 12 PDT features from sensors. Feature selection abstracted the major attributes from the 12 PDT features to elucidate the dominant characteristics of proximal weakness of stroke patients using machine learning classification. Our proposed PDT classifiers had an area under the receiver operating characteristic curve (AUC) of .806 (SVM), .769 (RBFN), and .900 (RF) without feature selection, and feature selection improves the AUCs to .913 (SVM), .956 (RBFN), and .975 (RF), representing an average performance enhancement of 15.3\%. Conclusions: Sensors and machine learning methods can reliably detect stroke signs and quantify proximal arm weakness. Our proposed solution will facilitate pervasive monitoring of stroke patients. 
", doi="10.2196/jmir.7092", url="http://www.jmir.org/2017/4/e120/", url="http://www.ncbi.nlm.nih.gov/pubmed/28420599" } @Article{info:doi/10.2196/mhealth.7185, author="Adepoju, Omolade Ibukun-Oluwa and Albersen, Antonia Bregje Joanna and De Brouwere, Vincent and van Roosmalen, Jos and Zweekhorst, Marjolein", title="mHealth for Clinical Decision-Making in Sub-Saharan Africa: A Scoping Review", journal="JMIR Mhealth Uhealth", year="2017", month="Mar", day="23", volume="5", number="3", pages="e38", keywords="mHealth", keywords="decision support systems, clinical", keywords="sub-Saharan Africa", keywords="clinical decision-making", abstract="Background: In a bid to deliver quality health services in resource-poor settings, mobile health (mHealth) is increasingly being adopted. The role of mHealth in facilitating evidence-based clinical decision-making through data collection, decision algorithms, and evidence-based guidelines, for example, is established in resource-rich settings. However, the extent to which mobile clinical decision support systems (mCDSS) have been adopted specifically in resource-poor settings such as Africa and the lessons learned about their use in such settings are yet to be established. Objective: The aim of this study was to synthesize evidence on the use of mHealth for point-of-care decision support and improved quality of care by health care workers in Africa. Methods: A scoping review of 4 peer-reviewed and 1 grey literature databases was conducted. No date limits were applied, but only articles in English language were selected. Using pre-established criteria, 2 reviewers screened articles and extracted data. Articles were analyzed using Microsoft Excel and MAXQDA. Results: We retained 22 articles representing 11 different studies in 7 sub-Saharan African countries. Interventions were mainly in the domain of maternal health and ranged from simple text messaging (short message service, SMS) to complex multicomponent interventions. Although health workers are generally supportive of mCDSS and perceive them as useful, concerns about increased workload and altered workflow hinder sustainability. Facilitators and barriers to use of mCDSS include technical and infrastructural support, ownership, health system challenges, and training. Conclusions: The use of mCDSS in sub-Saharan Africa is an indication of progress in mHealth, although their effect on quality of service delivery is yet to be fully explored. Lessons learned are useful for informing future research, policy, and practice for technologically supported health care delivery, especially in resource-poor settings. ", doi="10.2196/mhealth.7185", url="http://mhealth.jmir.org/2017/3/e38/", url="http://www.ncbi.nlm.nih.gov/pubmed/28336504" } @Article{info:doi/10.2196/jmir.6663, author="Cahan, Amos and Cimino, J. 
James", title="A Learning Health Care System Using Computer-Aided Diagnosis", journal="J Med Internet Res", year="2017", month="Mar", day="08", volume="19", number="3", pages="e54", keywords="diagnostic errors", keywords="diagnosis, computer-assisted", keywords="decision support systems, clinical", keywords="pattern recognition, automated", keywords="knowledge bases", keywords="knowledge management", keywords="diagnosis support systems", keywords="crowdsourcing", keywords="structured knowledge representation", doi="10.2196/jmir.6663", url="http://www.jmir.org/2017/3/e54/", url="http://www.ncbi.nlm.nih.gov/pubmed/28274905" } @Article{info:doi/10.2196/mhealth.6291, author="Furberg, D. Robert and Williams, Pamela and Bagwell, Jacqueline and LaBresh, Kenneth", title="A Mobile Clinical Decision Support Tool for Pediatric Cardiovascular Risk-Reduction Clinical Practice Guidelines: Development and Description", journal="JMIR Mhealth Uhealth", year="2017", month="Mar", day="07", volume="5", number="3", pages="e29", keywords="pediatrics", keywords="cardiovascular risk reduction", keywords="mHealth", keywords="clinical decision support", keywords="clinical practice guidelines", abstract="Background: Widespread application of research findings to improve patient outcomes remains inadequate, and failure to routinely translate research findings into daily clinical practice is a major barrier for the implementation of any evidence-based guideline. Strategies to increase guideline uptake in primary care pediatric practices and to facilitate adherence to recommendations are required. Objective: Our objective was to operationalize the US National Heart, Lung, and Blood Institute's Integrated Guidelines for Cardiovascular Health and Risk Reduction in Children and Adolescents into a mobile clinical decision support (CDS) system for healthcare providers, and to describe the process development and outcomes. Methods: To overcome the difficulty of translating clinical practice guidelines into a computable form that can be used by a CDS system, we used a multilayer framework to convert the evidence synthesis into executable knowledge. We used an iterative process of design, testing, and revision through each step in the translation of the guidelines for use in a CDS tool to support the development of 4 validated modules: an integrated risk assessment; a blood pressure calculator; a body mass index calculator; and a lipid management instrument. Results: The iterative revision process identified several opportunities to improve the CDS tool. Operationalizing the integrated guideline identified numerous areas in which the guideline was vague or incorrect and required more explicit operationalization. Iterative revisions led to workable solutions to problems and understanding of the limitations of the tool. Conclusions: The process and experiences described provide a model for other mobile CDS systems that translate written clinical practice guidelines into actionable, real-time clinical recommendations. 
", doi="10.2196/mhealth.6291", url="http://mhealth.jmir.org/2017/3/e29/", url="http://www.ncbi.nlm.nih.gov/pubmed/28270384" } @Article{info:doi/10.2196/jmir.7207, author="Shah, Ahmar Syed and Velardo, Carmelo and Farmer, Andrew and Tarassenko, Lionel", title="Exacerbations in Chronic Obstructive Pulmonary Disease: Identification and Prediction Using a Digital Health System", journal="J Med Internet Res", year="2017", month="Mar", day="07", volume="19", number="3", pages="e69", keywords="COPD", keywords="disease exacerbation", keywords="mobile health", keywords="self-management", keywords="pulse oximetry", keywords="respiratory rate", keywords="clinical prediction rule", keywords="algorithms", abstract="Background: Chronic obstructive pulmonary disease (COPD) is a progressive, chronic respiratory disease with a significant socioeconomic burden. Exacerbations, the sudden and sustained worsening of symptoms, can lead to hospitalization and reduce quality of life. Major limitations of previous telemonitoring interventions for COPD include low compliance, lack of consensus on what constitutes an exacerbation, limited numbers of patients, and short monitoring periods. We developed a telemonitoring system based on a digital health platform that was used to collect data from the 1-year EDGE (Self Management and Support Programme) COPD clinical trial aiming at daily monitoring in a heterogeneous group of patients with moderate to severe COPD. Objective: The objectives of the study were as follows: first, to develop a systematic and reproducible approach to exacerbation identification and to track the progression of patient condition during remote monitoring; and second, to develop a robust algorithm able to predict COPD exacerbation, based on vital signs acquired from a pulse oximeter. Methods: We used data from 110 patients, with a combined monitoring period of more than 35,000 days. We propose a finite-state machine--based approach for modeling COPD exacerbation to gain a deeper insight into COPD patient condition during home monitoring to take account of the time course of symptoms. A robust algorithm based on short-period trend analysis and logistic regression using vital signs derived from a pulse oximeter is also developed to predict exacerbations. Results: On the basis of 27,260 sessions recorded during the clinical trial (average usage of 5.3 times per week for 12 months), there were 361 exacerbation events. There was considerable variation in the length of exacerbation events, with a mean length of 8.8 days. The mean value of oxygen saturation was lower, and both the pulse rate and respiratory rate were higher before an impending exacerbation episode, compared with stable periods. On the basis of the classifier developed in this work, prediction of COPD exacerbation episodes with 60\%-80\% sensitivity will result in 68\%-36\% specificity. Conclusions: All 3 vital signs acquired from a pulse oximeter (pulse rate, oxygen saturation, and respiratory rate) are predictive of COPD exacerbation events, with oxygen saturation being the most predictive, followed by respiratory rate and pulse rate. Combination of these vital signs with a robust algorithm based on machine learning leads to further improvement in positive predictive accuracy. 
Trial Registration: International Standard Randomized Controlled Trial Number (ISRCTN): 40367841; http://www.isrctn.com/ISRCTN40367841 (Archived by WebCite at http://www.webcitation.org/6olpMWNpc) ", doi="10.2196/jmir.7207", url="http://www.jmir.org/2017/3/e69/", url="http://www.ncbi.nlm.nih.gov/pubmed/28270380" } @Article{info:doi/10.2196/medinform.6730, author="Sharafoddini, Anis and Dubin, A. Joel and Lee, Joon", title="Patient Similarity in Prediction Models Based on Health Data: A Scoping Review", journal="JMIR Med Inform", year="2017", month="Mar", day="03", volume="5", number="1", pages="e7", keywords="patient similarity", keywords="predictive modeling", keywords="health data", keywords="medical records", keywords="electronic health records", keywords="personalized medicine", keywords="data-driven prediction", keywords="review", abstract="Background: Physicians and health policy makers are required to make predictions during their decision making in various medical problems. Many advances have been made in predictive modeling toward outcome prediction, but these innovations target an average patient and are insufficiently adjustable for individual patients. One developing idea in this field is individualized predictive analytics based on patient similarity. The goal of this approach is to identify patients who are similar to an index patient and derive insights from the records of similar patients to provide personalized predictions. Objective: The aim is to summarize and review published studies describing computer-based approaches for predicting patients' future health status based on health data and patient similarity, identify gaps, and provide a starting point for related future research. Methods: The method involved (1) conducting the review by performing automated searches in Scopus, PubMed, and ISI Web of Science, selecting relevant studies by first screening titles and abstracts and then analyzing full texts, and (2) documenting the findings by extracting publication details and information on context, predictors, missing data, modeling algorithm, outcome, and evaluation methods into a matrix table, synthesizing the data, and reporting results. Results: After duplicate removal, 1339 articles were screened by title and abstract and 67 were selected for full-text review. In total, 22 articles met the inclusion criteria. Within the included articles, hospitals were the main source of data (n=10). Cardiovascular disease (n=7) and diabetes (n=4) were the dominant patient diseases. Most studies (n=18) used neighborhood-based approaches in devising prediction models. Two studies showed that patient similarity-based modeling outperformed population-based predictive methods. Conclusions: Interest in patient similarity-based predictive modeling for diagnosis and prognosis has been growing. In addition to raw/coded health data, wavelet transform and term frequency-inverse document frequency methods were employed to extract predictors. Selecting predictors with the potential to highlight special cases and defining new patient similarity metrics were among the gaps identified in the existing literature that provide starting points for future work. Patient status prediction models based on patient similarity and health data offer exciting potential for personalizing and ultimately improving health care, leading to better patient outcomes. 
", doi="10.2196/medinform.6730", url="http://medinform.jmir.org/2017/1/e7/", url="http://www.ncbi.nlm.nih.gov/pubmed/28258046" } @Article{info:doi/10.2196/jmir.6962, author="Zheng, Jiaping and Yu, Hong", title="Readability Formulas and User Perceptions of Electronic Health Records Difficulty: A Corpus Study", journal="J Med Internet Res", year="2017", month="Mar", day="02", volume="19", number="3", pages="e59", keywords="electronic health records", keywords="readability", keywords="patients", abstract="Background: Electronic health records (EHRs) are a rich resource for developing applications to engage patients and foster patient activation, thus holding a strong potential to enhance patient-centered care. Studies have shown that providing patients with access to their own EHR notes may improve the understanding of their own clinical conditions and treatments, leading to improved health care outcomes. However, the highly technical language in EHR notes impedes patients' comprehension. Numerous studies have evaluated the difficulty of health-related text using readability formulas such as Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG), and Gunning-Fog Index (GFI). They conclude that the materials are often written at a grade level higher than common recommendations. Objective: The objective of our study was to explore the relationship between the aforementioned readability formulas and the laypeople's perceived difficulty on 2 genres of text: general health information and EHR notes. We also validated the formulas' appropriateness and generalizability on predicting difficulty levels of highly complex technical documents. Methods: We collected 140 Wikipedia articles on diabetes and 242 EHR notes with diabetes International Classification of Diseases, Ninth Revision code. We recruited 15 Amazon Mechanical Turk (AMT) users to rate difficulty levels of the documents. Correlations between laypeople's perceived difficulty levels and readability formula scores were measured, and their difference was tested. We also compared word usage and the impact of medical concepts of the 2 genres of text. Results: The distributions of both readability formulas' scores (P<.001) and laypeople's perceptions (P=.002) on the 2 genres were different. Correlations of readability predictions and laypeople's perceptions were weak. Furthermore, despite being graded at similar levels, documents of different genres were still perceived with different difficulty (P<.001). Word usage in the 2 related genres still differed significantly (P<.001). Conclusions: Our findings suggested that the readability formulas' predictions did not align with perceived difficulty in either text genre. The widely used readability formulas were highly correlated with each other but did not show adequate correlation with readers' perceived difficulty. Therefore, they were not appropriate to assess the readability of EHR notes. ", doi="10.2196/jmir.6962", url="http://www.jmir.org/2017/3/e59/", url="http://www.ncbi.nlm.nih.gov/pubmed/28254738" } @Article{info:doi/10.2196/jmir.5954, author="Berrouiguet, Sofian and Barrig{\'o}n, Luisa Maria and Brandt, A. Sara and Nitzburg, C. 
George and Ovejero, Santiago and Alvarez-Garcia, Raquel and Carballo, Juan and Walter, Michel and Billot, Romain and Lenca, Philippe and Delgado-Gomez, David and Ropars, Juliette and de la Calle Gonzalez, Ivan and Courtet, Philippe and Baca-Garc{\'i}a, Enrique", title="Ecological Assessment of Clinicians' Antipsychotic Prescription Habits in Psychiatric Inpatients: A Novel Web- and Mobile Phone--Based Prototype for a Dynamic Clinical Decision Support System", journal="J Med Internet Res", year="2017", month="Jan", day="26", volume="19", number="1", pages="e25", keywords="clinical decision-making", keywords="antipsychotic agents", keywords="software", keywords="mobile applications", keywords="off-label use", keywords="prescriptions", abstract="Background: Electronic prescribing devices with clinical decision support systems (CDSSs) hold the potential to significantly improve pharmacological treatment management. Objective: The aim of our study was to develop a novel Web- and mobile phone--based application to provide a dynamic CDSS by monitoring and analyzing practitioners' antipsychotic prescription habits and simultaneously linking these data to inpatients' symptom changes. Methods: We recruited 353 psychiatric inpatients whose symptom levels and prescribed medications were inputted into the MEmind application. We standardized all medications in the MEmind database using the Anatomical Therapeutic Chemical (ATC) classification system and the defined daily dose (DDD). For each patient, MEmind calculated an average for the daily dose prescribed for antipsychotics (using the N05A ATC code), prescribed daily dose (PDD), and the PDD to DDD ratio. Results: MEmind results found that antipsychotics were used by 61.5\% (217/353) of inpatients, with the largest proportion being patients with schizophrenia spectrum disorders (33.4\%, 118/353). Of the 217 patients, 137 (63.2\%, 137/217) were administered pharmacological monotherapy and 80 (36.8\%, 80/217) were administered polytherapy. Antipsychotics were used mostly in schizophrenia spectrum and related psychotic disorders, but they were also prescribed in other nonpsychotic diagnoses. Notably, we observed polypharmacy going against current antipsychotics guidelines. Conclusions: MEmind data indicated that antipsychotic polypharmacy and off-label use in inpatient units is commonly practiced. MEmind holds the potential to create a dynamic CDSS that provides real-time tracking of prescription practices and symptom change. Such feedback can help practitioners determine a maximally therapeutic drug treatment while avoiding unproductive overprescription and off-label use. ", doi="10.2196/jmir.5954", url="http://www.jmir.org/2017/1/e25/", url="http://www.ncbi.nlm.nih.gov/pubmed/28126703" } @Article{info:doi/10.2196/medinform.6690, author="Lee, Joon", title="Patient-Specific Predictive Modeling Using Random Forests: An Observational Study for the Critically Ill", journal="JMIR Med Inform", year="2017", month="Jan", day="17", volume="5", number="1", pages="e3", keywords="forecasting", keywords="critical care", keywords="predictive analytics", keywords="patient similarity", keywords="random forest", abstract="Background: With a large-scale electronic health record repository, it is feasible to build a customized patient outcome prediction model specifically for a given patient. This approach involves identifying past patients who are similar to the present patient and using their data to train a personalized predictive model. 
Our previous work investigated a cosine-similarity patient similarity metric (PSM) for such patient-specific predictive modeling. Objective: The objective of the study is to investigate the random forest (RF) proximity measure as a PSM in the context of personalized mortality prediction for intensive care unit (ICU) patients. Methods: A total of 17,152 ICU admissions were extracted from the Multiparameter Intelligent Monitoring in Intensive Care II database. A number of predictor variables were extracted from the first 24 hours in the ICU. The outcome to be predicted was 30-day mortality. A patient-specific predictive model was trained for each ICU admission using an RF PSM inspired by the RF proximity measure. Death counting, logistic regression, decision tree, and RF models were studied with a hard threshold applied to RF PSM values to only include the M most similar patients in model training, where M was varied. In addition, case-specific random forests (CSRFs), which use RF proximity for weighted bootstrapping, were trained. Results: Compared to our previous study that investigated a cosine similarity PSM, the RF PSM resulted in superior or comparable predictive performance. RF and CSRF exhibited the best performances (in terms of mean area under the receiver operating characteristic curve [95\% confidence interval], RF: 0.839 [0.835-0.844]; CSRF: 0.832 [0.821-0.843]). RF and CSRF did not benefit from personalization via the use of the RF PSM, while the other models did. Conclusions: The RF PSM led to good mortality prediction performance for several predictive models, although it failed to induce improved performance in RF and CSRF. The distinction between predictor and similarity variables is an important issue arising from the present study. RFs present a promising method for patient-specific outcome prediction. ", doi="10.2196/medinform.6690", url="http://medinform.jmir.org/2017/1/e3/", url="http://www.ncbi.nlm.nih.gov/pubmed/28096065" } @Article{info:doi/10.2196/humanfactors.6642, author="Abraham, Joanna and Kannampallil, G. Thomas and Patel, L. Vimla and Patel, Bela and Almoosa, F. Khalid", title="Impact of Structured Rounding Tools on Time Allocation During Multidisciplinary Rounds: An Observational Study", journal="JMIR Hum Factors", year="2016", month="Dec", day="09", volume="3", number="2", pages="e29", keywords="teaching rounds", keywords="communication", keywords="intensive care units", abstract="Background: Recent research has shown evidence of disproportionate time allocation for patient communication during multidisciplinary rounds (MDRs). Studies have shown that patients discussed later during rounds receive less time. Objective: The aim of our study was to investigate whether disproportionate time allocation effects persist with the use of structured rounding tools. Methods: Using audio recordings of rounds (N=82 patients), we compared time allocation and communication breakdowns between the problem-based Subjective, Objective, Assessment, and Plan (SOAP) and the system-based Handoff Intervention Tool (HAND-IT) rounding tools. Results: We found no significant linear dependence of the time spent or the number of communication breakdowns on the order of patient presentation for either structured tool. However, for the problem-based tool, there was a significant linear relationship between the time spent on discussing a patient and the number of communication breakdowns (P<.05), with an average of 1.04 additional breakdowns for every 120 seconds of discussion. 
Conclusions: The use of structured rounding tools potentially mitigates disproportionate time allocation and communication breakdowns during rounds, with the more structured HAND-IT almost completely eliminating such effects. These results have potential implications for planning, prioritization, and training for time management during MDRs. ", doi="10.2196/humanfactors.6642", url="http://humanfactors.jmir.org/2016/2/e29/", url="http://www.ncbi.nlm.nih.gov/pubmed/27940423" } @Article{info:doi/10.2196/mededu.6288, author="De Angelis, Gino and Davies, Barbara and King, Judy and McEwan, Jessica and Cavallo, Sabrina and Loew, Laurianne and Wells, A. George and Brosseau, Lucie", title="Information and Communication Technologies for the Dissemination of Clinical Practice Guidelines to Health Professionals: A Systematic Review", journal="JMIR Med Educ", year="2016", month="Nov", day="30", volume="2", number="2", pages="e16", keywords="health information technologies", keywords="electronic mail", keywords="email", keywords="Web 2.0", keywords="practice guidelines", keywords="health professions", keywords="information dissemination", abstract="Background: The transfer of research knowledge into clinical practice can be a continuous challenge for researchers. Information and communication technologies, such as websites and email, have emerged as popular tools for the dissemination of evidence to health professionals. Objective: The objective of this systematic review was to identify research on health professionals' perceived usability and practice behavior change of information and communication technologies for the dissemination of clinical practice guidelines. Methods: We used a systematic approach to retrieve and extract data about relevant studies. We identified 2248 citations, of which 21 studies met the criteria for inclusion; 20 studies were randomized controlled trials, and 1 was a controlled clinical trial. The following information and communication technologies were evaluated: websites (5 studies), computer software (3 studies), Web-based workshops (2 studies), computerized decision support systems (2 studies), an electronic educational game (1 study), email (2 studies), and multifaceted interventions that consisted of at least one information and communication technology component (6 studies). Results: Website studies demonstrated significant improvements in perceived usefulness and perceived ease of use, but not in knowledge, barrier reduction, or intention to use clinical practice guidelines. Computer software studies demonstrated significant improvements in perceived usefulness, but not in knowledge or skills. Web-based workshop and email studies demonstrated significant improvements in knowledge, perceived usefulness, and skills. An electronic educational game intervention demonstrated a significant improvement from baseline in knowledge after 12 and 24 weeks. Computerized decision support system studies demonstrated variable findings for improvement in skills. Multifaceted interventions demonstrated significant improvements in beliefs about capabilities, perceived usefulness, and intention to use clinical practice guidelines, but variable findings for improvements in skills. Most multifaceted studies demonstrated significant improvements in knowledge. Conclusions: The findings suggest that health professionals' perceived usability and practice behavior change vary by type of information and communication technology. 
Heterogeneity and the paucity of properly conducted studies did not allow for a clear comparison between studies and a conclusion on the effectiveness of information and communication technologies as a knowledge translation strategy for the dissemination of clinical practice guidelines. ", doi="10.2196/mededu.6288", url="http://mededu.jmir.org/2016/2/e16/", url="http://www.ncbi.nlm.nih.gov/pubmed/27903488" } @Article{info:doi/10.2196/biomedeng.6401, author="Elgendi, Mohamed and Howard, Newton and Lovell, Nigel and Cichocki, Andrzej and Brearley, Matt and Abbott, Derek and Adatia, Ian", title="A Six-Step Framework on Biomedical Signal Analysis for Tackling Noncommunicable Diseases: Current and Future Perspectives", journal="JMIR Biomed Eng", year="2016", month="Oct", day="17", volume="1", number="1", pages="e1", keywords="mobile health", keywords="smart healthcare", keywords="affordable diagnostics", keywords="wearable devices", keywords="global health", keywords="eHealth", keywords="mHealth", keywords="point-of-care devices", doi="10.2196/biomedeng.6401", url="http://biomedeng.jmir.org/2016/1/e1/" } @Article{info:doi/10.2196/mhealth.6189, author="Johnson, Emily and Emani, K. Vamsi and Ren, Jinma", title="Breadth of Coverage, Ease of Use, and Quality of Mobile Point-of-Care Tool Information Summaries: An Evaluation", journal="JMIR Mhealth Uhealth", year="2016", month="Oct", day="12", volume="4", number="4", pages="e117", keywords="mHealth", keywords="mobile health", keywords="mobile app", keywords="assessment", keywords="internal medicine", keywords="point-of-care tools", abstract="Background: With advances in mobile technology, accessibility of clinical resources at the point of care has increased. Objective: The objective of this research was to identify if six selected mobile point-of-care tools meet the needs of clinicians in internal medicine. Point-of-care tools were evaluated for breadth of coverage, ease of use, and quality. Methods: Six point-of-care tools were evaluated utilizing four different devices (two smartphones and two tablets). Breadth of coverage was measured using select International Classification of Diseases, Ninth Revision, codes if information on summary, etiology, pathophysiology, clinical manifestations, diagnosis, treatment, and prognosis was provided. Quality measures included treatment and diagnostic inline references and individual and application time stamping. Ease of use covered search within topic, table of contents, scrolling, affordance, connectivity, and personal accounts. Analysis of variance based on the rank of score was used. Results: Breadth of coverage was similar among Medscape (mean 6.88), Uptodate (mean 6.51), DynaMedPlus (mean 6.46), and EvidencePlus (mean 6.41) (P>.05) with DynaMed (mean 5.53) and Epocrates (mean 6.12) scoring significantly lower (P<.05). Ease of use had DynaMedPlus with the highest score, and EvidencePlus was lowest (6.0 vs 4.0, respectively, P<.05). For quality, reviewers rated the same score (4.00) for all tools except for Medscape, which was rated lower (P<.05). Conclusions: For breadth of coverage, most point-of-care tools were similar with the exception of DynaMed. For ease of use, only UpToDate and DynaMedPlus allow for search within a topic. All point-of-care tools have remote access with the exception of UpToDate and Essential Evidence Plus. All tools except Medscape covered criteria for quality evaluation. 
Overall, there was no significant difference between the point-of-care tools with regard to coverage on common topics used by internal medicine clinicians. Selection of point-of-care tools is highly dependent on individual preference based on ease of use and cost of the application. ", doi="10.2196/mhealth.6189", url="http://mhealth.jmir.org/2016/4/e117/", url="http://www.ncbi.nlm.nih.gov/pubmed/27733328" } @Article{info:doi/10.2196/publichealth.5810, author="Hoyt, Eugene Robert and Snider, Dallas and Thompson, Carla and Mantravadi, Sarita", title="IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics", journal="JMIR Public Health Surveill", year="2016", month="Oct", day="11", volume="2", number="2", pages="e157", keywords="data analysis", keywords="data mining", keywords="machine learning", keywords="statistical data analysis", keywords="natural language processing", abstract="Background: We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. Objective: To report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. Methods: The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Results: Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix. Conclusions: IBMWA is a new alternative for data analytics software that automates descriptive, predictive, and visual analytics. This program is very user-friendly but requires data preprocessing, statistical conceptual understanding, and domain expertise. 
", doi="10.2196/publichealth.5810", url="http://publichealth.jmir.org/2016/2/e157/", url="http://www.ncbi.nlm.nih.gov/pubmed/27729304" } @Article{info:doi/10.2196/medinform.5984, author="Choi, Iee and Kim, Kyu Jin and Kim, Jun Sun and Cho, Chul Soo and Kim, Nyeo Il", title="Satisfaction Levels and Factors Influencing Satisfaction With Use of a Social App for Neonatal and Pediatric Patient Transfer Information Systems: A Questionnaire Study Among Doctors", journal="JMIR Med Inform", year="2016", month="Aug", day="04", volume="4", number="3", pages="e26", keywords="social media", keywords="personal satisfaction", keywords="information systems", keywords="patient transfer", abstract="Background: The treatment of neonatal and pediatric patients is limited to certain medical institutions depending on treatment difficulty. Effective patient transfers are necessary in situations where there are limited medical resources. In South Korea, the government has made a considerable effort to establish patient transfer systems using various means, such as websites, telephone, and so forth. However, in reality, the effort has not yet been effective. Objective: In this study, we ran a patient transfer information system using a social app for effective patient transfer. We analyzed the results, satisfaction levels, and the factors influencing satisfaction. Methods: Naver Band is a social app and mobile community application which in Korea is more popular than Facebook. It facilitates group communication. Using Naver Band, two systems were created: one by the Neonatal Intensive Care Unit and the other by the Department of Pediatrics at Chonbuk National University Children's Hospital, South Korea. The information necessary for patient transfers was provided to participating obstetricians (n=51) and pediatricians (n=90). We conducted a survey to evaluate the systems and reviewed the results retrospectively. Results: The number of patients transferred was reported to increase by 65\% (26/40) obstetricians and 40\% (23/57) pediatricians. The time taken for transfers was reported to decrease by 72\% (29/40) obstetricians and 59\% (34/57) pediatricians. Satisfaction was indicated by 83\% (33/40) obstetricians and 89\% (51/57) pediatricians. Regarding factors influencing satisfaction, the obstetricians reported communication with doctors in charge (P=.03) and time reduction during transfers (P=.02), whereas the pediatricians indicated review of the diagnosis and treatment of transferred patients (P=.01) and the time reduction during transfers (P=.007). Conclusions: The users were highly satisfied and different users indicated different factors of satisfaction. This finding implies that users' requirements should be accommodated in future developments of patient transfer information systems. 
", doi="10.2196/medinform.5984", url="http://medinform.jmir.org/2016/3/e26/", url="http://www.ncbi.nlm.nih.gov/pubmed/27492978" } @Article{info:doi/10.2196/mental.5475, author="Karmakar, Chandan and Luo, Wei and Tran, Truyen and Berk, Michael and Venkatesh, Svetha", title="Predicting Risk of Suicide Attempt Using History of Physical Illnesses From Electronic Medical Records", journal="JMIR Mental Health", year="2016", month="Jul", day="11", volume="3", number="3", pages="e19", keywords="suicide risk", keywords="electronic medical record", keywords="history of physical illnesses", keywords="ICD-10 codes", keywords="suicide risk prediction model", abstract="Background: Although physical illnesses, routinely documented in electronic medical records (EMR), have been found to be a contributing factor to suicides, no automated systems use this information to predict suicide risk. Objective: The aim of this study is to quantify the impact of physical illnesses on suicide risk, and develop a predictive model that captures this relationship using EMR data. Methods: We used history of physical illnesses (except chapter V: Mental and behavioral disorders) from EMR data over different time-periods to build a lookup table that contains the probability of suicide risk for each chapter of the International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) codes. The lookup table was then used to predict the probability of suicide risk for any new assessment. Based on the different lengths of history of physical illnesses, we developed six different models to predict suicide risk. We tested the performance of developed models to predict 90-day risk using historical data over differing time-periods ranging from 3 to 48 months. A total of 16,858 assessments from 7399 mental health patients with at least one risk assessment was used for the validation of the developed model. The performance was measured using area under the receiver operating characteristic curve (AUC). Results: The best predictive results were derived (AUC=0.71) using combined data across all time-periods, which significantly outperformed the clinical baseline derived from routine risk assessment (AUC=0.56). The proposed approach thus shows potential to be incorporated in the broader risk assessment processes used by clinicians. Conclusions: This study provides a novel approach to exploit the history of physical illnesses extracted from EMR (ICD-10 codes without chapter V-mental and behavioral disorders) to predict suicide risk, and this model outperforms existing clinical assessments of suicide risk. ", doi="10.2196/mental.5475", url="http://mental.jmir.org/2016/3/e19/", url="http://www.ncbi.nlm.nih.gov/pubmed/27400764" } @Article{info:doi/10.2196/jmir.5549, author="Van Poucke, Sven and Thomeer, Michiel and Heath, John and Vukicevic, Milan", title="Are Randomized Controlled Trials the (G)old Standard? From Clinical Intelligence to Prescriptive Analytics", journal="J Med Internet Res", year="2016", month="Jul", day="06", volume="18", number="7", pages="e185", keywords="randomized controlled trials", keywords="data mining", keywords="big data", keywords="predictive analytics", keywords="algorithm", keywords="modeling", keywords="ensemble methods", doi="10.2196/jmir.5549", url="http://www.jmir.org/2016/7/e185/", url="http://www.ncbi.nlm.nih.gov/pubmed/27383622" } @Article{info:doi/10.2196/humanfactors.4996, author="Kurahashi, M. 
Allison and Weinstein, B. Peter and Jamieson, Trevor and Stinson, N. Jennifer and Cafazzo, A. Joseph and Lokuge, Bhadra and Morita, P. Plinio and Cohen, Eyal and Rapoport, Adam and Bezjak, Andrea and Husain, Amna", title="In the Loop: The Organization of Team-Based Communication in a Patient-Centered Clinical Collaboration System", journal="JMIR Human Factors", year="2016", month="Mar", day="24", volume="3", number="1", pages="e12", keywords="collaborative care", keywords="patient-centered care", keywords="patient engagement", keywords="chronic disease", keywords="communication", keywords="Internet communication tools", keywords="Internet communication technologies", abstract="Background: We describe the development and evaluation of a secure Web-based system for the purpose of collaborative care called Loop. Loop assembles the team of care with the patient as an integral member of the team in a secure space. Objective: The objectives of this paper are to present the iterative design of the separate views for health care providers (HCPs) within each patient's secure space and examine patients', caregivers', and HCPs' perspectives on this separate view for HCP-only communication. Methods: The overall research program includes cycles of ethnography, prototyping, usability testing, and pilot testing. This paper describes the usability testing phase that directly informed development. A descriptive qualitative approach was used to analyze participant perspectives that emerged during usability testing. Results: During usability testing, we sampled 89 participants from three user groups: 23 patients, 19 caregivers, and 47 HCPs. Almost all perspectives from the three user groups supported the need for an HCP-only communication view. In an earlier prototype, the visual presentation caused confusion among HCPs when reading and composing messages about whether a message was visible to the patient. Usability testing guided us to design a more deliberate distinction between posting in the Patient and Team view and the Health Care Provider Only view at the time of composing a message, which once posted is distinguished by an icon. Conclusions: The team made a decision to incorporate an HCP-only communication view based on findings during earlier phases of work. During usability testing we tested the separate communication views, and all groups supported this partition. We spent considerable effort designing the partition; however, preliminary findings from the next phase of evaluation, pilot testing, show that the Patient and Team communication is predominantly being used. This demonstrates the importance of a subsequent phase of the clinical trial of Loop to validate the concept and design. ", doi="10.2196/humanfactors.4996", url="http://humanfactors.jmir.org/2016/1/e12/", url="http://www.ncbi.nlm.nih.gov/pubmed/27025912" } @Article{info:doi/10.2196/resprot.5155, author="Luo, Gang and Stone, L. Bryan and Johnson, D. Michael and Nkoy, L. Flory", title="Predicting Appropriate Admission of Bronchiolitis Patients in the Emergency Department: Rationale and Methods", journal="JMIR Res Protoc", year="2016", month="Mar", day="07", volume="5", number="1", pages="e41", keywords="Decision support techniques", keywords="forecasting", keywords="computer simulation", keywords="machine learning", abstract="Background: In young children, bronchiolitis is the most common illness resulting in hospitalization. 
For children less than age 2, bronchiolitis incurs an annual total inpatient cost of \$1.73 billion. Each year in the United States, 287,000 emergency department (ED) visits occur because of bronchiolitis, with a hospital admission rate of 32\%-40\%. Due to a lack of evidence and objective criteria for managing bronchiolitis, ED disposition decisions (hospital admission or discharge to home) are often made subjectively, resulting in significant practice variation. Studies reviewing admission need suggest that up to 29\% of admissions from the ED are unnecessary. About 6\% of ED discharges for bronchiolitis result in ED returns with admission. These inappropriate dispositions waste limited health care resources, increase patient and parental distress, expose patients to iatrogenic risks, and worsen outcomes. Existing clinical guidelines for bronchiolitis offer limited improvement in patient outcomes. Methodological shortcomings include that the guidelines provide no specific thresholds for ED decisions to admit or to discharge, have an insufficient level of detail, and do not account for differences in patient and illness characteristics including co-morbidities. Predictive models are frequently used to complement clinical guidelines, reduce practice variation, and improve clinicians' decision making. Used in real time, predictive models can present objective criteria supported by historical data for an individualized disease management plan and guide admission decisions. However, existing predictive models for ED patients with bronchiolitis have limitations, including low accuracy and the assumption that the actual ED disposition decision was appropriate. To date, no operational definition of appropriate admission exists. No model has been built based on appropriate admissions, which include both actual admissions that were necessary and actual ED discharges that were unsafe. Objective: The goal of this study is to develop a predictive model to guide appropriate hospital admission for ED patients with bronchiolitis. Methods: This study will: (1) develop an operational definition of appropriate hospital admission for ED patients with bronchiolitis, (2) develop and test the accuracy of a new model to predict appropriate hospital admission for an ED patient with bronchiolitis, and (3) conduct simulations to estimate the impact of using the model on bronchiolitis outcomes. Results: We are currently extracting administrative and clinical data from the enterprise data warehouse of an integrated health care system. Our goal is to finish this study by the end of 2019. Conclusions: This study will produce a new predictive model that can be operationalized to guide and improve disposition decisions for ED patients with bronchiolitis. Broad use of the model would reduce iatrogenic risk, patient and parental distress, health care use, and costs and improve outcomes for bronchiolitis patients. 
", doi="10.2196/resprot.5155", url="http://www.researchprotocols.org/2016/1/e41/", url="http://www.ncbi.nlm.nih.gov/pubmed/26952700" } @Article{info:doi/10.2196/jmir.5234, author="Kwag, Hyogene Koren and Gonz{\'a}lez-Lorenzo, Marien and Banzi, Rita and Bonovas, Stefanos and Moja, Lorenzo", title="Providing Doctors With High-Quality Information: An Updated Evaluation of Web-Based Point-of-Care Information Summaries", journal="J Med Internet Res", year="2016", month="Jan", day="19", volume="18", number="1", pages="e15", keywords="point-of-care summaries", keywords="internet information", keywords="evidence-based medicine", keywords="information science", abstract="Background: The complexity of modern practice requires health professionals to be active information-seekers. Objective: Our aim was to review the quality and progress of point-of-care information summaries---Web-based medical compendia that are specifically designed to deliver pre-digested, rapidly accessible, comprehensive, and periodically updated information to health care providers. We aimed to evaluate product claims of being evidence-based. Methods: We updated our previous evaluations by searching Medline, Google, librarian association websites, and conference proceedings from August 2012 to December 2014. We included Web-based, regularly updated point-of-care information summaries with claims of being evidence-based. We extracted data on the general characteristics and content presentation of products, and we quantitatively assessed their breadth of disease coverage, editorial quality, and evidence-based methodology. We assessed potential relationships between these dimensions and compared them with our 2008 assessment. Results: We screened 58 products; 26 met our inclusion criteria. Nearly a quarter (6/26, 23\%) were newly identified in 2014. We accessed and analyzed 23 products for content presentation and quantitative dimensions. Most summaries were developed by major publishers in the United States and the United Kingdom; no products derived from low- and middle-income countries. The main target audience remained physicians, although nurses and physiotherapists were increasingly represented. Best Practice, Dynamed, and UptoDate scored the highest across all dimensions. The majority of products did not excel across all dimensions: we found only a moderate positive correlation between editorial quality and evidence-based methodology (r=.41, P=.0496). However, all dimensions improved from 2008: editorial quality (P=.01), evidence-based methodology (P=.015), and volume of diseases and medical conditions (P<.001). Conclusions: Medical and scientific publishers are investing substantial resources towards the development and maintenance of point-of-care summaries. The number of these products has increased since 2008 along with their quality. Best Practice, Dynamed, and UptoDate scored the highest across all dimensions, while others that were marketed as evidence-based were less reliable. Individuals and institutions should regularly assess the value of point-of-care summaries as their quality changes rapidly over time. ", doi="10.2196/jmir.5234", url="http://www.jmir.org/2016/1/e15/", url="http://www.ncbi.nlm.nih.gov/pubmed/26786976" } @Article{info:doi/10.2196/medinform.4640, author="Khazaei, Hamzeh and McGregor, Carolyn and Eklund, Mikael J. 
and El-Khatib, Khalil", title="Real-Time and Retrospective Health-Analytics-as-a-Service: A Novel Framework", journal="JMIR Med Inform", year="2015", month="Nov", day="18", volume="3", number="4", pages="e36", keywords="premature babies", keywords="physiological data", keywords="decision support system", keywords="analytics-as-a-service", keywords="cloud computing", keywords="big data", keywords="health informatics", keywords="real-time analytics", keywords="retrospective analysis", keywords="performance modeling", abstract="Background: Analytics-as-a-service (AaaS) is one of the latest provisions emerging from the cloud services family. Utilizing this paradigm of computing in health informatics will benefit patients, care providers, and governments significantly. This work is a novel approach to realize health analytics as services in critical care units in particular. Objective: To design, implement, evaluate, and deploy an extendable, big-data compatible framework for health-analytics-as-a-service that offers both real-time and retrospective analysis. Methods: We present a novel framework that can realize health data analytics-as-a-service. The framework is flexible and configurable for different scenarios by utilizing the latest technologies and best practices for data acquisition, transformation, storage, analytics, knowledge extraction, and visualization. We have instantiated the proposed method through the Artemis project, that is, a customization of the framework for live monitoring and retrospective research on premature babies and ill term infants in neonatal intensive care units (NICUs). Results: We demonstrated the proposed framework in this paper for monitoring NICUs and refer to it as the Artemis-In-Cloud (Artemis-IC) project. A pilot of Artemis has been deployed in the SickKids hospital NICU. By feeding the output of this pilot setup into an analytical model, we predict important performance measures for the final deployment of Artemis-IC. This process can be carried out for other hospitals following the same steps with minimal effort. SickKids' NICU has 36 beds, and its patients can generally be classified into 5 different types, including surgical and premature babies. The arrival rate is estimated as 4.5 patients per day, and the average length of stay was calculated as 16 days. The mean number of medical monitoring algorithms per patient is 9, which renders 311 live algorithms for the whole NICU running on the framework. The memory and computation power required for Artemis-IC to handle the SickKids NICU will be 32 GB and 16 CPU cores, respectively. The required amount of storage was estimated as 8.6 TB per year. There will always be 34.9 patients in the SickKids NICU on average. Currently, 46\% of patients cannot be admitted to the SickKids NICU due to a lack of resources. By increasing the capacity to 90 beds, all patients can be accommodated. For such a provisioning, Artemis-IC will need 16 TB of storage per year, 55 GB of memory, and 28 CPU cores. Conclusions: Our contributions in this work relate to a cloud architecture for the analysis of physiological data for clinical decision support for tertiary care use. We demonstrate how to size the equipment needed in the cloud for that architecture based on a very realistic assessment of the patient characteristics and the associated clinical decision support algorithms that would be required to run for those patients. 
We show the principle of how this could be performed and furthermore that it can be replicated for any critical care setting within a tertiary institution. ", doi="10.2196/medinform.4640", url="http://medinform.jmir.org/2015/4/e36/", url="http://www.ncbi.nlm.nih.gov/pubmed/26582268" } @Article{info:doi/10.2196/medinform.4192, author="Celi, Anthony Leo and Marshall, David Jeffrey and Lai, Yuan and Stone, J. David", title="Disrupting Electronic Health Records Systems: The Next Generation", journal="JMIR Med Inform", year="2015", month="Oct", day="23", volume="3", number="4", pages="e34", keywords="clinical decision making", keywords="clinical decision support", keywords="electronic health records", keywords="electronic notes", doi="10.2196/medinform.4192", url="http://medinform.jmir.org/2015/4/e34/", url="http://www.ncbi.nlm.nih.gov/pubmed/26500106" } @Article{info:doi/10.2196/jmir.4976, author="Hu, Zhongkai and Hao, Shiying and Jin, Bo and Shin, Young Andrew and Zhu, Chunqing and Huang, Min and Wang, Yue and Zheng, Le and Dai, Dorothy and Culver, S. Devore and Alfreds, T. Shaun and Rogow, Todd and Stearns, Frank and Sylvester, G. Karl and Widen, Eric and Ling, Xuefeng", title="Online Prediction of Health Care Utilization in the Next Six Months Based on Electronic Health Record Information: A Cohort and Validation Study", journal="J Med Internet Res", year="2015", month="Sep", day="22", volume="17", number="9", pages="e219", keywords="health care costs", keywords="electronic medical record", keywords="prospective studies", keywords="statistical data analysis", keywords="risk assessment", abstract="Background: The increasing rate of health care expenditures in the United States has placed a significant burden on the nation's economy. Predicting future health care utilization of patients can provide useful information to better understand and manage overall health care deliveries and clinical resource allocation. Objective: This study developed an electronic medical record (EMR)-based online risk model predictive of resource utilization for patients in Maine in the next 6 months across all payers, all diseases, and all demographic groups. Methods: In the HealthInfoNet, Maine's health information exchange (HIE), a retrospective cohort of 1,273,114 patients was constructed with the preceding 12-month EMR. Each patient's next 6-month (between January 1, 2013 and June 30, 2013) health care resource utilization was retrospectively scored ranging from 0 to 100 and a decision tree--based predictive model was developed. Our model was later integrated in the Maine HIE population exploration system to allow a prospective validation analysis of 1,358,153 patients by forecasting their next 6-month risk of resource utilization between July 1, 2013 and December 31, 2013. Results: Prospectively predicted risks, on either an individual level or a population (per 1000 patients) level, were consistent with the next 6-month resource utilization distributions and the clinical patterns at the population level. Results demonstrated the strong correlation between its care resource utilization and our risk scores, supporting the effectiveness of our model. With the online population risk monitoring enterprise dashboards, the effectiveness of the predictive algorithm has been validated by clinicians and caregivers in the State of Maine. 
Conclusions: The model and associated online applications were designed for tracking the evolving nature of total population risk, in a longitudinal manner, for health care resource utilization. It will enable more effective care management strategies driving improved patient outcomes. ", doi="10.2196/jmir.4976", url="http://www.jmir.org/2015/9/e219/", url="http://www.ncbi.nlm.nih.gov/pubmed/26395541" } @Article{info:doi/10.2196/medinform.4171, author="Tseng, Yi-Ju and Wu, Jung-Hsuan and Lin, Hui-Chi and Chen, Ming-Yuan and Ping, Xiao-Ou and Sun, Chun-Chuan and Shang, Rung-Ji and Sheng, Wang-Huei and Chen, Yee-Chun and Lai, Feipei and Chang, Shan-Chwen", title="A Web-Based, Hospital-Wide Health Care-Associated Bloodstream Infection Surveillance and Classification System: Development and Evaluation", journal="JMIR Med Inform", year="2015", month="Sep", day="21", volume="3", number="3", pages="e31", keywords="health care-associated infection", keywords="infection control", keywords="information systems", keywords="surveillance", keywords="Web-based services", abstract="Background: Surveillance of health care-associated infections is an essential component of infection prevention programs, but conventional systems are labor intensive and performance dependent. Objective: To develop an automatic surveillance and classification system for health care-associated bloodstream infection (HABSI), and to evaluate its performance by comparing it with a conventional infection control personnel (ICP)-based surveillance system. Methods: We developed a Web-based system that was integrated into the medical information system of a 2200-bed teaching hospital in Taiwan. The system automatically detects and classifies HABSIs. Results: In this study, the number of computer-detected HABSIs correlated closely with the number of HABSIs detected by ICP by department (n=20; r=.999 P<.001) and by time (n=14; r=.941; P<.001). Compared with reference standards, this system performed excellently with regard to sensitivity (98.16\%), specificity (99.96\%), positive predictive value (95.81\%), and negative predictive value (99.98\%). The system enabled decreasing the delay in confirmation of HABSI cases, on average, by 29 days. Conclusions: This system provides reliable and objective HABSI data for quality indicators, improving the delay caused by a conventional surveillance system. ", doi="10.2196/medinform.4171", url="http://medinform.jmir.org/2015/3/e31/", url="http://www.ncbi.nlm.nih.gov/pubmed/26392229" } @Article{info:doi/10.2196/medinform.4457, author="Slight, Patricia Sarah and Berner, S. Eta and Galanter, William and Huff, Stanley and Lambert, L. Bruce and Lannon, Carole and Lehmann, U. Christoph and McCourt, J. Brian and McNamara, Michael and Menachemi, Nir and Payne, H. Thomas and Spooner, Andrew S. and Schiff, D. Gordon and Wang, Y. Tracy and Akincigil, Ayse and Crystal, Stephen and Fortmann, P. Stephen and Vandermeer, L. Meredith and Bates, W. 
David", title="Meaningful Use of Electronic Health Records: Experiences From the Field and Future Opportunities", journal="JMIR Med Inform", year="2015", month="Sep", day="18", volume="3", number="3", pages="e30", keywords="medical informatics", keywords="health policy", keywords="electronic health records", keywords="meaningful use", abstract="Background: With the aim of improving health care processes through health information technology (HIT), the US government has promulgated requirements for ``meaningful use'' (MU) of electronic health records (EHRs) as a condition for providers receiving financial incentives for the adoption and use of these systems. Considerable uncertainty remains about the impact of these requirements on the effective application of EHR systems. Objective: The Agency for Healthcare Research and Quality (AHRQ)-sponsored Centers for Education and Research in Therapeutics (CERTs) critically examined the impact of the MU policy relating to the use of medications and jointly developed recommendations to help inform future HIT policy. Methods: We gathered perspectives from a wide range of stakeholders (N=35) who had experience with MU requirements, including academicians, practitioners, and policy makers from different health care organizations including and beyond the CERTs. Specific issues and recommendations were discussed and agreed on as a group. Results: Stakeholders' knowledge and experiences from implementing MU requirements fell into 6 domains: (1) accuracy of medication lists and medication reconciliation, (2) problem list accuracy and the shift in HIT priorities, (3) accuracy of allergy lists and allergy-related standards development, (4) support of safer and effective prescribing for children, (5) considerations for rural communities, and (6) general issues with achieving MU. Standards are needed to better facilitate the exchange of data elements between health care settings. Several organizations felt that their preoccupation with fulfilling MU requirements stifled innovation. Greater emphasis should be placed on local HIT configurations that better address population health care needs. Conclusions: Although MU has stimulated adoption of EHRs, its effects on quality and safety remain uncertain. Stakeholders felt that MU requirements should be more flexible and recognize that integrated models may achieve information-sharing goals in alternate ways. Future certification rules and requirements should enhance EHR functionalities critical for safer prescribing of medications in children. ", doi="10.2196/medinform.4457", url="http://medinform.jmir.org/2015/3/e30/" } @Article{info:doi/10.2196/jmir.4456, author="Boeldt, L. Debra and Wineinger, E. Nathan and Waalen, Jill and Gollamudi, Shreya and Grossberg, Adam and Steinhubl, R. Steven and McCollister-Slipp, Anna and Rogers, A. Marc and Silvers, Carey and Topol, J. Eric", title="How Consumers and Physicians View New Medical Technology: Comparative Survey", journal="J Med Internet Res", year="2015", month="Sep", day="14", volume="17", number="9", pages="e215", keywords="digital revolution", keywords="healthcare", keywords="medical technology", keywords="physician and consumer attitudes", keywords="electronic health record", keywords="mobile health", abstract="Background: As a result of the digital revolution coming to medicine, a number of new tools are becoming available and are starting to be introduced in clinical practice. 
Objective: We aim to assess health care professional and consumer attitudes toward new medical technology including smartphones, genetic testing, privacy, and patient-accessible electronic health records. Methods: We performed a survey with 1406 health care providers and 1102 consumer responders. Results: Consumers who completed the survey were more likely to prefer new technologies for a medical diagnosis (437/1102, 39.66\%) compared with providers (194/1406, 13.80\%; P<.001), with more providers (393/1406, 27.95\%) than consumers (175/1102, 15.88\%) reporting feeling uneasy about using technology for a diagnosis. Both providers and consumers supported genetic testing for various purposes, with providers (1234/1406, 87.77\%) being significantly more likely than consumers (806/1102, 73.14\%) to support genetic testing when planning to have a baby (P<.001). Similarly, 91.68\% (1289/1406) of providers and 81.22\% (895/1102) of consumers supported diagnosing problems in a fetus (P<.001). Among providers, 90.33\% (1270/1406) were concerned that patients would experience anxiety after accessing health records, and 81.95\% (1149/1406) felt it would lead to requests for unnecessary medical evaluations, but only 34.30\% (378/1102; P<.001) and 24.59\% (271/1102; P<.001) of consumers expressed the same concerns, respectively. Physicians (137/827, 16.6\%) reported less concern about the use of technology for diagnosis compared to medical students (21/235, 8.9\%; P=.03) and also more frequently felt that patients owned their medical record (323/827, 39.1\%; and 30/235, 12.8\%, respectively; P<.001). Conclusions: Consumers and health professionals differ significantly and broadly in their views of emerging medical technology, with more enthusiasm and support expressed by consumers. ", doi="10.2196/jmir.4456", url="http://www.jmir.org/2015/9/e215/", url="http://www.ncbi.nlm.nih.gov/pubmed/26369254" } @Article{info:doi/10.2196/humanfactors.4537, author="Press, Anne and McCullagh, Lauren and Khan, Sundas and Schachter, Andy and Pardo, Salvatore and McGinn, Thomas", title="Usability Testing of a Complex Clinical Decision Support Tool in the Emergency Department: Lessons Learned", journal="JMIR Human Factors", year="2015", month="Sep", day="10", volume="2", number="2", pages="e14", keywords="clinical decision support", keywords="emergency department", keywords="usability testing", keywords="clinical prediction rules", keywords="Wells criteria", keywords="pulmonary embolism", abstract="Background: As the electronic health record (EHR) becomes the preferred documentation tool across medical practices, health care organizations are pushing for clinical decision support systems (CDSS) to help bring clinical decision support (CDS) tools to the forefront of patient-physician interactions. A CDSS is integrated into the EHR and allows physicians to easily utilize CDS tools. However, often CDSS are integrated into the EHR without an initial phase of usability testing, resulting in poor adoption rates. Usability testing is important because it evaluates a CDSS by testing it on actual users. This paper outlines the usability phase of a study, which will test the impact of integration of the Wells CDSS for pulmonary embolism (PE) diagnosis into a large urban emergency department, where workflow is often chaotic and high stakes decisions are frequently made. 
We hypothesize that conducting usability testing prior to integration of the Wells score into an emergency room EHR will result in increased adoption rates by physicians. Objective: The objective of the study was to conduct usability testing for the integration of the Wells clinical prediction rule into a tertiary care center's emergency department EHR. Methods: We conducted usability testing of a CDS tool in the emergency department EHR. The CDS tool consisted of the Wells rule for PE in the form of a calculator and was triggered off computed tomography (CT) orders or patients' chief complaint. The study was conducted at a tertiary hospital in Queens, New York. There were seven residents that were recruited and participated in two phases of usability testing. The usability testing employed a ``think aloud'' method and ``near-live'' clinical simulation, where care providers interacted with standardized patients enacting a clinical scenario. Both phases were audiotaped, video-taped, and had screen-capture software activated for onscreen recordings. Results: Phase I: Data from the ``think-aloud'' phase of the study showed an overall positive outlook on the Wells tool in assessing a patient for a PE diagnosis. Subjects described the tool as ``well-organized'' and ``better than clinical judgment''. Changes were made to improve tool placement into the EHR to make it optimal for decision-making, auto-populating boxes, and minimizing click fatigue. Phase II: After incorporating the changes noted in Phase 1, the participants noted tool improvements. There was less toggling between screens, they had all the clinical information required to complete the tool, and were able to complete the patient visit efficiently. However, an optimal location for triggering the tool remained controversial. Conclusions: This study successfully combined ``think-aloud'' protocol analysis with ``near-live'' clinical simulations in a usability evaluation of a CDS tool that will be implemented into the emergency room environment. Both methods proved useful in the assessment of the CDS tool and allowed us to refine tool usability and workflow. ", doi="10.2196/humanfactors.4537", url="http://humanfactors.jmir.org/2015/2/e14/", url="http://www.ncbi.nlm.nih.gov/pubmed/27025540" } @Article{info:doi/10.2196/mhealth.4388, author="Patel, Rakesh and Green, William and Shahzad, Waseem Muhammad and Larkin, Chris", title="Use of Mobile Clinical Decision Support Software by Junior Doctors at a UK Teaching Hospital: Identification and Evaluation of Barriers to Engagement", journal="JMIR mHealth uHealth", year="2015", month="Aug", day="13", volume="3", number="3", pages="e80", keywords="clinical decision support systems", keywords="health care technology", keywords="human-centered computing", keywords="medical education", keywords="patient safety", keywords="ubiquitous and mobile computing", abstract="Background: Clinical decision support (CDS) tools improve clinical diagnostic decision making and patient safety. The availability of CDS to health care professionals has grown in line with the increased prevalence of apps and smart mobile devices. Despite these benefits, patients may have safety concerns about the use of mobile devices around medical equipment. Objective: This research explored the engagement of junior doctors (JDs) with CDS and the perceptions of patients about their use. 
There were three objectives for this research: (1) to measure the actual usage of CDS tools on mobile devices (mCDS) by JDs, (2) to explore the perceptions of JDs about the drivers and barriers to using mCDS, and (3) to explore the perceptions of patients about the use of mCDS. Methods: This study used a mixed-methods approach to study the engagement of JDs with CDS accessed through mobile devices. Usage data were collected on the number of interactions by JDs with mCDS. The perceived drivers and barriers for JDs to using CDS were then explored by interviews. Finally, these findings were contrasted with the perception of patients about the use of mCDS by JDs. Results: Nine of the 16 JDs made a total of 142 recorded interactions with the mCDS over a 4-month period. Only 27 of the 114 interactions (24\%) that could be categorized as on-shift or off-shift occurred on-shift. Eight individual, institutional, and cultural barriers to engagement emerged from interviews with the user group. In contrast to reported cautions and concerns about the impact of clinicians' use of mobile phone on patient health and safety, patients had positive perceptions about the use of mCDS. Conclusions: Patients reported positive perceptions toward mCDS. The usage of mCDS to support clinical decision making was considered to be positive as part of everyday clinical practice. The degree of engagement was found to be limited due to a number of individual, institutional, and cultural barriers. The majority of mCDS engagement occurred outside of the workplace. Further research is required to verify these findings and assess their implications for future policy and practice. ", doi="10.2196/mhealth.4388", url="http://mhealth.jmir.org/2015/3/e80/", url="http://www.ncbi.nlm.nih.gov/pubmed/26272411" } @Article{info:doi/10.2196/mededu.4267, author="O'Carroll, Marie Aoife and Westby, Patricia Erin and Dooley, Joseph and Gordon, E. Kevin", title="Information-Seeking Behaviors of Medical Students: A Cross-Sectional Web-Based Survey", journal="JMIR Medical Education", year="2015", month="Jun", day="29", volume="1", number="1", pages="e4", keywords="information-seeking behavior", keywords="information retrieval", keywords="Internet", keywords="medical education", keywords="medical students", abstract="Background: Medical students face an information-rich environment in which retrieval and appraisal strategies are increasingly important. Objective: To describe medical students' current pattern of health information resource use and characterize their experience of instruction on information search and appraisal. Methods: We conducted a cross-sectional web-based survey of students registered in the four-year MD Program at Dalhousie University (Halifax, Nova Scotia, and Saint John, New Brunswick, sites), Canada. We collected self-reported data on information-seeking behavior, instruction, and evaluation of resources in the context of their medical education. Data were analyzed using descriptive statistics. Results: Surveys were returned by 213 of 462 eligible students (46.1\%). Most respondents (165/204, 80.9\%) recalled receiving formal instruction regarding information searches, but this seldom included nontraditional tools such as Google (23/107, 11.1\%), Wikipedia, or social media. In their daily practice, however, they reported heavy use of these tools, as well as EBM summaries. Accessibility, understandability, and overall usefulness were common features of highly used resources. 
Students identified challenges managing information and/or resource overload and source accessibility. Conclusions: Medical students receive instruction primarily on searching and assessing primary medical literature. In their daily practice, however, they rely heavily on nontraditional tools as well as EBM summaries. Attention to appropriate use and appraisal of nontraditional sources might enhance the current EBM curriculum. ", doi="10.2196/mededu.4267", url="http://mededu.jmir.org/2015/1/e4/", url="http://www.ncbi.nlm.nih.gov/pubmed/27731842" } @Article{info:doi/10.2196/medinform.3445, author="Tsoukalas, Athanasios and Albertson, Timothy and Tagkopoulos, Ilias", title="From Data to Optimal Decision Making: A Data-Driven, Probabilistic Machine Learning Approach to Decision Support for Patients With Sepsis", journal="JMIR Med Inform", year="2015", month="Feb", day="24", volume="3", number="1", pages="e11", keywords="sepsis", keywords="clinical decision support tool", keywords="probabilistic modeling", keywords="Partially Observable Markov Decision Processes", keywords="POMDP", keywords="CDSS", abstract="Background: A tantalizing question in medical informatics is how to construct knowledge from heterogeneous datasets, and as an extension, inform clinical decisions. The emergence of large-scale data integration in electronic health records (EHR) presents tremendous opportunities. However, our ability to efficiently extract informed decision support is limited due to the complexity of the clinical states and decision process, missing data, and the lack of analytical tools to advise based on statistical relationships. Objective: Development and assessment of a data-driven method that infers the probability distribution of the current state of patients with sepsis, likely trajectories, optimal actions related to antibiotic administration, prediction of mortality and length-of-stay. Methods: We present a data-driven, probabilistic framework for clinical decision support in sepsis-related cases. We first define states, actions, observations and rewards based on clinical practice, expert knowledge and data representations in an EHR dataset of 1492 patients. We then use a Partially Observable Markov Decision Process (POMDP) model to derive the optimal policy based on individual patient trajectories and we evaluate the performance of the model-derived policies in a separate test set. Policy decisions were focused on the type of antibiotic combinations to administer. Multi-class and discriminative classifiers were used to predict mortality and length of stay. Results: Data-derived antibiotic administration policies led to a favorable patient outcome in 49\% of the cases, versus 37\% when the alternative policies were followed (P=1.3e-13). Sensitivity analysis on the model parameters and missing data argues for a highly robust decision support tool that withstands parameter variation and data uncertainty. When the optimal policy was followed, 387 patients (25.9\%) had 90\% of their transitions to better states and 503 patients (33.7\%) had 90\% of their transitions to worse states (P=4.0e-06), while in the non-policy cases, these numbers were 192 (12.9\%) and 764 (51.2\%) patients (P=4.6e-117), respectively. Furthermore, the percentage of transitions within a trajectory that led to a better or better/same state was significantly higher by following the policy than for non-policy cases (605 vs 344 patients, P=8.6e-25).
Mortality was predicted with an AUC of 0.7 and an accuracy of 0.82 in the general case, and similar performance was obtained for the inference of length of stay (AUC of 0.69 to 0.73 with accuracies from 0.69 to 0.82). Conclusions: A data-driven model was able to suggest favorable actions and predict mortality and length of stay with high accuracy. This work provides a solid basis for a scalable probabilistic clinical decision support framework for sepsis treatment that can be expanded to other clinically relevant states and actions, as well as a data-driven model that can be adopted in other clinical areas with sufficient training data. ", doi="10.2196/medinform.3445", url="http://medinform.jmir.org/2015/1/e11/", url="http://www.ncbi.nlm.nih.gov/pubmed/25710907" } @Article{info:doi/10.2196/medinform.3725, author="Rodriguez, L. Keri and Burkitt, H. Kelly and Bayliss, K. Nichole and Skoko, E. Jennifer and Switzer, E. Galen and Zickmund, L. Susan and Fine, J. Michael and Macpherson, S. David", title="Veteran, Primary Care Provider, and Specialist Satisfaction With Electronic Consultation", journal="JMIR Med Inform", year="2015", month="Jan", day="14", volume="3", number="1", pages="e5", keywords="access", keywords="rural health", keywords="referral and consultation", keywords="patient satisfaction", keywords="veterans", abstract="Background: Access to specialty care is challenging for veterans in rural locations. To address this challenge, in December 2009, the Veterans Affairs (VA) Pittsburgh Healthcare System (VAPHS) implemented an electronic consultation (e-consult) program to provide primary care providers (PCPs) and patients with enhanced specialty care access. Objective: The aim of this quality improvement (QI) project evaluation was to: (1) assess satisfaction with the e-consult process, and (2) identify perceived facilitators and barriers to using the e-consult program. Methods: We conducted semistructured telephone interviews with veteran patients (N=15), Community Based Outpatient Clinic (CBOC) PCPs (N=15), and VA Pittsburgh specialty physicians (N=4) who used the e-consult program between December 2009 and August 2010. Participants answered questions regarding satisfaction in eight domains and identified factors contributing to their responses. Results: Most participants were white (patients=87\%; PCPs=80\%; specialists=75\%) and male (patients=93\%; PCPs=67\%; specialists=75\%). On average, patients had one e-consult (SD 0), PCPs initiated 6 e-consults (SD 6), and VAPHS specialists performed 17 e-consults (SD 11). Patients, PCPs, and specialty physicians were satisfied with e-consults, with median (range) ratings of 5.0 (4-5), 4.0 (3-5), and 3.5 (3-5), respectively, on a 1-5 Likert scale. The most common reason why patients and specialists reported increased overall satisfaction with e-consults was improved communication, whereas improved timeliness of care was the most common reason for PCPs. Communication was the most reported perceived barrier and facilitator to e-consult use. Conclusions: Veterans and VA health care providers were satisfied with the e-consult process. Our findings suggest that while the reasons for satisfaction with e-consult differ somewhat for patients and physicians, e-consult may be a useful tool to improve VA health care system access for rural patients.
", doi="10.2196/medinform.3725", url="http://medinform.jmir.org/2015/1/e5/", url="http://www.ncbi.nlm.nih.gov/pubmed/25589233" } @Article{info:doi/10.2196/medinform.3525, author="Williams, Hawys and Spencer, Karen and Sanders, Caroline and Lund, David and Whitley, A. Edgar and Kaye, Jane and Dixon, G. William", title="Dynamic Consent: A Possible Solution to Improve Patient Confidence and Trust in How Electronic Patient Records Are Used in Medical Research", journal="JMIR Med Inform", year="2015", month="Jan", day="13", volume="3", number="1", pages="e3", keywords="dynamic consent", keywords="electronic patient record (EPR)", keywords="medical research", keywords="confidentiality", keywords="privacy", keywords="governance", keywords="NHS", keywords="data linkage", keywords="care.data", doi="10.2196/medinform.3525", url="http://medinform.jmir.org/2015/1/e3/", url="http://www.ncbi.nlm.nih.gov/pubmed/25586934" } @Article{info:doi/10.2196/medinform.3503, author="Fraccaro, Paolo and Arguello Casteleiro, Mercedes and Ainsworth, John and Buchan, Iain", title="Adoption of Clinical Decision Support in Multimorbidity: A Systematic Review", journal="JMIR Med Inform", year="2015", month="Jan", day="07", volume="3", number="1", pages="e4", keywords="decision support systems, management", keywords="systematic review", keywords="multiple chronic diseases", keywords="multiple pathologies", keywords="multiple medications", abstract="Background: Patients with multiple conditions have complex needs and are increasing in number as populations age. This multimorbidity is one of the greatest challenges facing health care. Having more than 1 condition generates (1) interactions between pathologies, (2) duplication of tests, (3) difficulties in adhering to often conflicting clinical practice guidelines, (4) obstacles in the continuity of care, (5) confusing self-management information, and (6) medication errors. In this context, clinical decision support (CDS) systems need to be able to handle realistic complexity and minimize iatrogenic risks. Objective: The aim of this review was to identify to what extent CDS is adopted in multimorbidity. Methods: This review followed PRISMA guidance and adopted a multidisciplinary approach. Scopus and PubMed searches were performed by combining terms from 3 different thesauri containing synonyms for (1) multimorbidity and comorbidity, (2) polypharmacy, and (3) CDS. The relevant articles were identified by examining the titles and abstracts. The full text of selected/relevant articles was analyzed in-depth. For articles appropriate for this review, data were collected on clinical tasks, diseases, decision maker, methods, data input context, user interface considerations, and evaluation of effectiveness. Results: A total of 50 articles were selected for the full in-depth analysis and 20 studies were included in the final review. Medication (n=10) and clinical guidance (n=8) were the predominant clinical tasks. Four studies focused on merging concurrent clinical practice guidelines. A total of 17 articles reported their CDS systems were knowledge-based. Most articles reviewed considered patients' clinical records (n=19), clinical practice guidelines (n=12), and clinicians' knowledge (n=10) as contextual input data. The most frequent diseases mentioned were cardiovascular (n=9) and diabetes mellitus (n=5). In all, 12 articles mentioned generalist doctor(s) as the decision maker(s). 
Among the articles reviewed, none referred to the active involvement of the patient in the decision-making process or to patient self-management. None of the articles reviewed adopted mobile technologies. No rigorous evaluations of the usability or effectiveness of the CDS systems were reported. Conclusions: This review shows that multimorbidity is underinvestigated in the informatics of supporting clinical decisions. CDS interventions that systematize clinical practice guidelines without considering the interactions of different conditions and care processes may lead to unhelpful or harmful clinical actions. To improve patient safety in multimorbidity, there is a need for more evidence about how both conditions and care processes interact. The data needed to build this evidence base exist in many electronic health record systems and are underused. ", doi="10.2196/medinform.3503", url="http://medinform.jmir.org/2015/1/e4/", url="http://www.ncbi.nlm.nih.gov/pubmed/25785897" } @Article{info:doi/10.2196/medinform.3179, author="Burgos, Felip and Melia, Umberto and Vallverd{\'u}, Montserrat and Velickovski, Filip and Lluch-Ariet, Mag{\'i} and Caminal, Pere and Roca, Josep", title="Clinical Decision Support System to Enhance Quality Control of Spirometry Using Information and Communication Technologies", journal="JMIR Med Inform", year="2014", month="Oct", day="21", volume="2", number="2", pages="e29", keywords="spirometry", keywords="telemedicine", keywords="information communication technologies", keywords="primary care", keywords="quality control", abstract="Background: We recently demonstrated that quality of spirometry in primary care could markedly improve with remote offline support from specialized professionals. It is hypothesized that implementation of automatic online assessment of quality of spirometry using information and communication technologies may significantly enhance the potential for extensive deployment of a high quality spirometry program in integrated care settings. Objective: The objective of the study was to elaborate and validate a Clinical Decision Support System (CDSS) for automatic online quality assessment of spirometry. Methods: The CDSS was developed through a three-step process including: (1) identification of the optimal sampling frequency; (2) iterations to build up an initial version using the 24 standard spirometry curves recommended by the American Thoracic Society; and (3) iterations to refine the CDSS using 270 curves from 90 patients. In each of these steps, the results were checked against one expert. Finally, 778 spirometry curves from 291 patients were analyzed for validation purposes. Results: The CDSS generated appropriate online classification and certification in 685/778 (88.1\%) of spirometry tests, with 96\% sensitivity and 95\% specificity. Conclusions: Consequently, only 93/778 (11.9\%) of spirometry tests required offline remote classification by an expert, indicating a potential positive role of the CDSS in the deployment of a high quality spirometry program in an integrated care setting.
", doi="10.2196/medinform.3179", url="http://medinform.jmir.org/2014/2/e29/", url="http://www.ncbi.nlm.nih.gov/pubmed/25600957" } @Article{info:doi/10.2196/medinform.3560, author="Beyan, Timur and Ayd?n Son, Ye?im", title="Incorporation of Personal Single Nucleotide Polymorphism (SNP) Data into a National Level Electronic Health Record for Disease Risk Assessment, Part 3: An Evaluation of SNP Incorporated National Health Information System of Turkey for Prostate Cancer", journal="JMIR Med Inform", year="2014", month="Aug", day="19", volume="2", number="2", pages="e21", keywords="health information systems", keywords="clinical decision support systems", keywords="disease risk model", keywords="electronic health record", keywords="epigenetics", keywords="personalized medicine", keywords="single nucleotide polymorphism", abstract="Background: A personalized medicine approach provides opportunities for predictive and preventive medicine. Using genomic, clinical, environmental, and behavioral data, the tracking and management of individual wellness is possible. A prolific way to carry this personalized approach into routine practices can be accomplished by integrating clinical interpretations of genomic variations into electronic medical records (EMRs)/electronic health records (EHRs). Today, various central EHR infrastructures have been constituted in many countries of the world, including Turkey. Objective: As an initial attempt to develop a sophisticated infrastructure, we have concentrated on incorporating the personal single nucleotide polymorphism (SNP) data into the National Health Information System of Turkey (NHIS-T) for disease risk assessment, and evaluated the performance of various predictive models for prostate cancer cases. We present our work as a three part miniseries: (1) an overview of requirements, (2) the incorporation of SNP data into the NHIS-T, and (3) an evaluation of SNP data incorporated into the NHIS-T for prostate cancer. Methods: In the third article of this miniseries, we have evaluated the proposed complementary capabilities (ie, knowledge base and end-user application) with real data. Before the evaluation phase, clinicogenomic associations about increased prostate cancer risk were extracted from knowledge sources, and published predictive genomic models assessing individual prostate cancer risk were collected. To evaluate complementary capabilities, we also gathered personal SNP data of four prostate cancer cases and fifteen controls. Using these data files, we compared various independent and model-based, prostate cancer risk assessment approaches. Results: Through the extraction and selection processes of SNP-prostate cancer risk associations, we collected 209 independent associations for increased risk of prostate cancer from the studied knowledge sources. Also, we gathered six cumulative models and two probabilistic models. Cumulative models and assessment of independent associations did not have impressive results. There was one of the probabilistic, model-based interpretation that was successful compared to the others. In envirobehavioral and clinical evaluations, we found that some of the comorbidities, especially, would be useful to evaluate disease risk. Even though we had a very limited dataset, a comparison of performances of different disease models and their implementation with real data as use case scenarios helped us to gain deeper insight into the proposed architecture. 
Conclusions: In order to benefit from genomic variation data, existing EHR/EMR systems must be constructed with the capability of tracking and monitoring all aspects of personal health status (genomic, clinical, environmental, etc) on a 24/7 basis, and also with the capability of suggesting evidence-based recommendations. A national-level, accredited knowledge base is a top requirement for improved end-user systems interpreting these parameters. Finally, categorization using similar, individual characteristics (SNP patterns, exposure history, etc) may be an effective way to predict disease risks, but this approach needs to be concretized and supported with new studies. ", doi="10.2196/medinform.3560", url="http://medinform.jmir.org/2014/2/e21/", url="http://www.ncbi.nlm.nih.gov/pubmed/25600087" } @Article{info:doi/10.2196/medinform.3555, author="Beyan, Timur and Ayd{\i}n Son, Ye{\c{s}}im", title="Incorporation of Personal Single Nucleotide Polymorphism (SNP) Data into a National Level Electronic Health Record for Disease Risk Assessment, Part 2: The Incorporation of SNP into the National Health Information System of Turkey", journal="JMIR Med Inform", year="2014", month="Aug", day="11", volume="2", number="2", pages="e17", keywords="health information systems", keywords="clinical decision support systems", keywords="disease risk model", keywords="electronic health record", keywords="epigenetics", keywords="personalized medicine", keywords="single nucleotide polymorphism", abstract="Background: A personalized medicine approach provides opportunities for predictive and preventive medicine. Using genomic, clinical, environmental, and behavioral data, the tracking and management of individual wellness is possible. A prolific way to carry this personalized approach into routine practices can be accomplished by integrating clinical interpretations of genomic variations into electronic medical record (EMR)/electronic health record (EHR) systems. Today, various central EHR infrastructures have been constituted in many countries of the world, including Turkey. Objective: As an initial attempt to develop a sophisticated infrastructure, we have concentrated on incorporating the personal single nucleotide polymorphism (SNP) data into the National Health Information System of Turkey (NHIS-T) for disease risk assessment, and evaluated the performance of various predictive models for prostate cancer cases. We present our work as a miniseries containing three parts: (1) an overview of requirements, (2) the incorporation of SNP into the NHIS-T, and (3) an evaluation of SNP data incorporated into the NHIS-T for prostate cancer. Methods: For the second article of this miniseries, we have analyzed the existing NHIS-T and proposed the possible extensional architectures. In light of the literature survey and characteristics of NHIS-T, we have proposed and argued opportunities and obstacles for an SNP-incorporated NHIS-T. A prototype with complementary capabilities (knowledge base and end-user applications) for these architectures has been designed and developed. Results: In the proposed architectures, the clinically relevant personal SNP (CR-SNP) and clinicogenomic associations are shared between central repositories and end-users via the NHIS-T infrastructure. To produce these files, we need to develop a national-level clinicogenomic knowledge base. Regarding clinicogenomic decision support, we planned to complete interpretation of these associations on the end-user applications.
This approach gives us the flexibility to add/update envirobehavioral parameters and family health history that will be monitored or collected by end users. Conclusions: Our results emphasized that even though the existing NHIS-T messaging infrastructure supports the integration of SNP data and clinicogenomic association, it is critical to develop a national level, accredited knowledge base and better end-user systems for the interpretation of genomic, clinical, and envirobehavioral parameters. ", doi="10.2196/medinform.3555", url="http://medinform.jmir.org/2014/2/e17/", url="http://www.ncbi.nlm.nih.gov/pubmed/25599817" } @Article{info:doi/10.2196/medinform.3023, author="Lozano-Rub{\'i}, Raimundo and Pastor, Xavier and Lozano, Esther", title="OWLing Clinical Data Repositories With the Ontology Web Language", journal="JMIR Med Inform", year="2014", month="Aug", day="01", volume="2", number="2", pages="e14", keywords="biomedical ontologies", keywords="data storage and retrieval", keywords="knowledge management", keywords="data sharing", keywords="electronic health records", abstract="Background: The health sciences are based upon information. Clinical information is usually stored and managed by physicians with precarious tools, such as spreadsheets. The biomedical domain is more complex than other domains that have adopted information and communication technologies as pervasive business tools. Moreover, medicine continuously changes its corpus of knowledge because of new discoveries and the rearrangements in the relationships among concepts. This scenario makes it especially difficult to offer good tools to answer the professional needs of researchers and constitutes a barrier that needs innovation to discover useful solutions. Objective: The objective was to design and implement a framework for the development of clinical data repositories, capable of facing the continuous change in the biomedicine domain and minimizing the technical knowledge required from final users. Methods: We combined knowledge management tools and methodologies with relational technology. We present an ontology-based approach that is flexible and efficient for dealing with complexity and change, integrated with a solid relational storage and a Web graphical user interface. Results: Onto Clinical Research Forms (OntoCRF) is a framework for the definition, modeling, and instantiation of data repositories. It does not need any database design or programming. All required information to define a new project is explicitly stated in ontologies. Moreover, the user interface is built automatically on the fly as Web pages, whereas data are stored in a generic repository. This allows for immediate deployment and population of the database as well as instant online availability of any modification. Conclusions: OntoCRF is a complete framework to build data repositories with a solid relational storage. Driven by ontologies, OntoCRF is more flexible and efficient to deal with complexity and change than traditional systems and does not require very skilled technical people facilitating the engineering of clinical software systems. 
", doi="10.2196/medinform.3023", url="http://medinform.jmir.org/2014/2/e14/", url="http://www.ncbi.nlm.nih.gov/pubmed/25599697" } @Article{info:doi/10.2196/medinform.3169, author="Beyan, Timur and Ayd?n Son, Ye?im", title="Incorporation of Personal Single Nucleotide Polymorphism (SNP) Data into a National Level Electronic Health Record for Disease Risk Assessment, Part 1: An Overview of Requirements", journal="JMIR Med Inform", year="2014", month="Jul", day="24", volume="2", number="2", pages="e15", keywords="health information systems", keywords="clinical decision support systems", keywords="disease risk model", keywords="electronic health record", keywords="epigenetics", keywords="personalized medicine", keywords="single nucleotide polymorphism", abstract="Background: Personalized medicine approaches provide opportunities for predictive and preventive medicine. Using genomic, clinical, environmental, and behavioral data, tracking and management of individual wellness is possible. A prolific way to carry this personalized approach into routine practices can be accomplished by integrating clinical interpretations of genomic variations into electronic medical records (EMRs)/electronic health records (EHRs). Today, various central EHR infrastructures have been constituted in many countries of the world including Turkey. Objective: The objective of this study was to concentrate on incorporating the personal single nucleotide polymorphism (SNP) data into the National Health Information System of Turkey (NHIS-T) for disease risk assessment, and evaluate the performance of various predictive models for prostate cancer cases. We present our work as a miniseries containing three parts: (1) an overview of requirements, (2) the incorporation of SNP into the NHIS-T, and (3) an evaluation of SNP incorporated NHIS-T for prostate cancer. Methods: For the first article of this miniseries, the scientific literature is reviewed and the requirements of SNP data integration into EMRs/EHRs are extracted and presented. Results: In the literature, basic requirements of genomic-enabled EMRs/EHRs are listed as incorporating genotype data and its clinical interpretation into EMRs/EHRs, developing accurate and accessible clinicogenomic interpretation resources (knowledge bases), interpreting and reinterpreting of variant data, and immersing of clinicogenomic information into the medical decision processes. In this section, we have analyzed these requirements under the subtitles of terminology standards, interoperability standards, clinicogenomic knowledge bases, defining clinical significance, and clinicogenomic decision support. Conclusions: In order to integrate structured genotype and phenotype data into any system, there is a need to determine data components, terminology standards, and identifiers of clinicogenomic information. Also, we need to determine interoperability standards to share information between different information systems of stakeholders, and develop decision support capability to interpret genomic variations based on the knowledge bases via different assessment approaches. ", doi="10.2196/medinform.3169", url="http://medinform.jmir.org/2014/2/e15/", url="http://www.ncbi.nlm.nih.gov/pubmed/25599712" } @Article{info:doi/10.2196/medinform.2984, author="Cruz, RR Magda and Martins, Cristina and Dias, Jo{\~a}o and Pinto, S. 
Jos{\'e}", title="A Validation of an Intelligent Decision-Making Support System for the Nutrition Diagnosis of Bariatric Surgery Patients", journal="JMIR Med Inform", year="2014", month="Jul", day="08", volume="2", number="2", pages="e8", keywords="bariatric surgery", keywords="nutrition diagnosis", keywords="artificial intelligence", keywords="Bayesian networks", keywords="decision-making", keywords="support system", abstract="Background: Bariatric surgery is an important method for treatment of morbid obesity. It is known that significant nutritional deficiencies might occur after surgery, such as calorie-protein malnutrition, iron deficiency anemia, and lack of vitamin B12, thiamine, and folic acid. Objective: The objective of our study was to validate a computerized intelligent decision support system that suggests nutritional diagnoses for patients who have undergone bariatric surgery. Methods: Fifteen clinical cases were developed and sent to three dietitians to evaluate and define a nutritional diagnosis. After this step, the cases were sent to four bariatric surgery expert dietitians who collaborated to establish a gold standard. The nutritional diagnosis was defined individually, and any disagreements were resolved by consensus. The final result was used as the gold standard. Bayesian networks were used to implement the system, and database training was done with Shell Netica. For the system validation, a similar answer rate was calculated, as well as the specificity and sensitivity. Receiver operating characteristic (ROC) curves were plotted for each nutritional diagnosis. Results: Among the four experts, the rate of similar answers found was 80\% (48/60) to 93\% (56/60), depending on the nutritional diagnosis. The rate of similar answers of the system, compared to the gold standard, was 100\% (60/60). The system sensitivity and specificity were 95.0\%. The ROC curve analysis showed that the system was able to represent the expert knowledge (gold standard) and to help the experts in their daily tasks. Conclusions: The system that was developed was validated for use by health care professionals for decision-making support in their nutritional diagnosis of patients who have undergone bariatric surgery. ", doi="10.2196/medinform.2984", url="http://medinform.jmir.org/2014/2/e8/", url="http://www.ncbi.nlm.nih.gov/pubmed/25601419" } @Article{info:doi/10.2196/medinform.3022, author="Polepalli Ramesh, Balaji and Belknap, M. Steven and Li, Zuofeng and Frid, Nadya and West, P. Dennis and Yu, Hong", title="Automatically Recognizing Medication and Adverse Event Information From Food and Drug Administration's Adverse Event Reporting System Narratives", journal="JMIR Med Inform", year="2014", month="Jun", day="27", volume="2", number="1", pages="e10", keywords="natural language processing", keywords="pharmacovigilance", keywords="adverse drug events", abstract="Background: The Food and Drug Administration's (FDA) Adverse Event Reporting System (FAERS) is a repository of spontaneously reported adverse drug events (ADEs) for FDA-approved prescription drugs. FAERS reports include both structured reports and unstructured narratives. The narratives often include essential information for evaluation of the severity, causality, and description of ADEs that is not present in the structured data. The timely identification of unknown toxicities of prescription drugs is an important, unsolved problem.
Objective: The objective of this study was to develop an annotated corpus of FAERS narratives and a biomedical named entity tagger to automatically identify ADE-related information in the FAERS narratives. Methods: We developed an annotation guideline and annotated medication information and adverse event-related entities in 122 FAERS narratives comprising approximately 23,000 word tokens. A named entity tagger using supervised machine learning approaches was built for detecting medication information and adverse event entities using various categories of features. Results: The annotated corpus had an agreement of over .9 Cohen's kappa for medication and adverse event entities. The best performing tagger achieved an overall F1 score of 0.73 for detection of medication, adverse event, and other named entities. Conclusions: In this study, we developed an annotated corpus of FAERS narratives and machine learning-based models for automatically extracting medication and adverse event information from the FAERS narratives. Our study is an important step towards enriching the FAERS data for postmarketing pharmacovigilance. ", doi="10.2196/medinform.3022", url="http://medinform.jmir.org/2014/1/e10/", url="http://www.ncbi.nlm.nih.gov/pubmed/25600332" } @Article{info:doi/10.2196/medinform.3110, author="Celi, Anthony Leo and Zimolzak, J. Andrew and Stone, J. David", title="Dynamic Clinical Data Mining: Search Engine-Based Decision Support", journal="JMIR Med Inform", year="2014", month="Jun", day="23", volume="2", number="1", pages="e13", keywords="decision support", keywords="clinical informatics", keywords="big data", doi="10.2196/medinform.3110", url="http://medinform.jmir.org/2014/1/e13/", url="http://www.ncbi.nlm.nih.gov/pubmed/25600664" } @Article{info:doi/10.2196/medinform.3316, author="Elliott, Pamela and Martin, Desmond and Neville, Doreen", title="Electronic Clinical Safety Reporting System: A Benefits Evaluation", journal="JMIR Med Inform", year="2014", month="Jun", day="11", volume="2", number="1", pages="e12", keywords="electronic occurrence reporting", keywords="electronic clinical safety reporting", keywords="adverse event reporting in health care", keywords="evaluating electronic reporting systems in health care", keywords="health information technology evaluations", abstract="Background: Eastern Health, a large health care organization in Newfoundland and Labrador (NL), started a staged implementation of an electronic occurrence reporting system (used interchangeably with ``clinical safety reporting system'') in 2008, completing Phase One in 2009. The electronic clinical safety reporting system (CSRS) was designed to replace a paper-based system. The CSRS involves reporting on occurrences such as falls, safety/security issues, medication errors, treatment and procedural mishaps, medical equipment malfunctions, and close calls. The electronic system was purchased from a vendor in the United Kingdom that had implemented the system in the United Kingdom and other places, such as British Columbia. The main objective of the new system was to improve the reporting process with the goal of improving clinical safety. The project was funded jointly by Eastern Health and Canada Health Infoway.
Objective: The objectives of the evaluation were to: (1) assess the CSRS on achieving its stated objectives (particularly, the benefits realized and lessons learned), and (2) identify contributions, if any, that can be made to the emerging field of electronic clinical safety reporting. Methods: The evaluation involved mixed methods, including extensive stakeholder participation, pre/post comparative study design, and triangulation of data where possible. The data were collected from several sources, such as project documentation, occurrence reporting records, stakeholder workshops, surveys, focus groups, and key informant interviews. Results: The findings provided evidence that frontline staff and managers support the CSRS, identifying both benefits and areas for improvement. Many benefits were realized, such as increases in the number of occurrences reported, in occurrences reported within 48 hours, in occurrences reported by staff other than registered nurses, in close calls reported, and improved timelines for notification. There was also user satisfaction with the tool regarding ease of use, accessibility, and consistency. The implementation process encountered challenges related to customizing the software and the development of the classification system for coding occurrences. This impacted on the ability of the managers to close-out files in a timely fashion. The issues that were identified, and suggestions for improvements to the form itself, were shared with the Project Team as soon as they were noted. Changes were made to the system before the rollout. Conclusions: There were many benefits realized from the new system that can contribute to improved clinical safety. The participants preferred the electronic system over the paper-based system. The lessons learned during the implementation process resulted in recommendations that informed the rollout of the system in Eastern Health, and in other health care organizations in the province of Newfoundland and Labrador. This study also informed the evaluation of other health organizations in the province, which was completed in 2013. ", doi="10.2196/medinform.3316", url="http://medinform.jmir.org/2014/1/e12/", url="http://www.ncbi.nlm.nih.gov/pubmed/25600569" } @Article{info:doi/10.2196/jmir.3263, author="Mart{\'i}n Ruiz, Luisa Mar{\'i}a and Valero Duboy, {\'A}ngel Miguel and Torcal Loriente, Carmen and Pau de la Cruz, Iv{\'a}n", title="Evaluating a Web-Based Clinical Decision Support System for Language Disorders Screening in a Nursery School", journal="J Med Internet Res", year="2014", month="May", day="28", volume="16", number="5", pages="e139", keywords="primary health care", keywords="health information systems", keywords="knowledge management", keywords="evaluation", keywords="early diagnosis", keywords="eHealth", keywords="language disorders", abstract="Background: Early and effective identification of developmental disorders during childhood remains a critical task for the international community. The second highest prevalence of common developmental disorders in children are language delays, which are frequently the first symptoms of a possible disorder. Objective: This paper evaluates a Web-based Clinical Decision Support System (CDSS) whose aim is to enhance the screening of language disorders at a nursery school. The common lack of early diagnosis of language disorders led us to deploy an easy-to-use CDSS in order to evaluate its accuracy in early detection of language pathologies. 
This CDSS can be used by pediatricians to support the screening of language disorders in primary care. Methods: This paper details the evaluation results of the ``Gades'' CDSS at a nursery school with 146 children, 12 educators, and 1 language therapist. The methodology embraces two consecutive phases. The first stage involves the observation of each child's language abilities, carried out by the educators, to facilitate the evaluation of language acquisition level performed by a language therapist. Next, the same language therapist evaluates the reliability of the observed results. Results: The Gades CDSS was integrated to provide the language therapist with the required clinical information. The validation process showed a global 83.6\% (122/146) success rate in language evaluation and a 7\% (7/94) rate of non-accepted system decisions within the range of children from 0 to 3 years old. The system helped language therapists to identify new children with potential disorders who required further evaluation. This process will revalidate the CDSS output and allow the enhancement of early detection of language disorders in children. The system does need minor refinement, since the therapists disagreed with some questions from the CDSS knowledge base (KB) and suggested adding a few questions about speech production and pragmatic abilities. The refinement of the KB will address these issues and include the requested improvements, with the support of the experts who took part in the original KB development. Conclusions: This research demonstrated the benefit of a Web-based CDSS to monitor children's neurodevelopment via the early detection of language delays at a nursery school. Current next steps focus on the design of a model that includes pseudo auto-learning capacity, supervised by experts. ", doi="10.2196/jmir.3263", url="http://www.jmir.org/2014/5/e139/", url="http://www.ncbi.nlm.nih.gov/pubmed/24870413" } @Article{info:doi/10.2196/resprot.3071, author="Peiris, David and Williams, Christopher and Holbrook, Rachel and Lindner, Robyn and Reeve, James and Das, Anurina and Maher, Christopher", title="A Web-Based Clinical Decision Support Tool for Primary Health Care Management of Back Pain: Development and Mixed Methods Evaluation", journal="JMIR Res Protoc", year="2014", month="Apr", day="02", volume="3", number="2", pages="e17", keywords="clinical decision support systems", keywords="back pain", keywords="primary care", abstract="Background: Many patients with back pain do not receive health care in accordance with best practice recommendations. Implementation trials to address this issue have had limited success. Despite the known effectiveness of clinical decision support systems (CDSS), none of these are available for back pain management. Objective: The objective of our study was to develop a Web-based CDSS to support Australian general practitioners (GPs) to diagnose and manage back pain according to guidelines. Methods: Asking a panel of international experts to review recommendations for sixteen clinical vignettes validated the tool. It was then launched nationally as part of National Pain Week and promoted to GPs via a media release and clinic based visits. Following this, a mixed methods evaluation was conducted to determine tool feasibility, acceptability, and utility. The 12 month usage data were analyzed, and in-depth, semistructured interviews with 20 GPs were conducted to identify barriers and enablers to uptake. 
Results: The tool had acceptable face validity when reviewed by experts. Over a 12 month period there were 7125 website visits with 4503 (63.20\%) unique users. Assuming most unique users are GPs, around one quarter of the country's GPs may have used the tool at least once. Although usage was high, GP interviews highlighted the sometimes complex nature of management where the tool may not influence care. Conversely, several ``touch-points'', whereby the tool may exert its influence, were identified, most notably patient engagement. Conclusions: A novel CDSS tool has the potential to assist with evidence-based management of back pain. A clinical trial is required to determine its impact on practitioner and patient outcomes. ", doi="10.2196/resprot.3071", url="http://www.researchprotocols.org/2014/2/e17/", url="http://www.ncbi.nlm.nih.gov/pubmed/24694921" } @Article{info:doi/10.2196/jmir.2758, author="Pimmer, Christoph and Mateescu, Magdalena and Zahn, Carmen and Genewein, Urs", title="Smartphones as Multimodal Communication Devices to Facilitate Clinical Knowledge Processes: Randomized Controlled Trial", journal="J Med Internet Res", year="2013", month="Nov", day="27", volume="15", number="11", pages="e263", keywords="mobile health", keywords="mobile phone", keywords="telemedicine", keywords="educational technology", keywords="learning", keywords="problem solving", keywords="multimedia", keywords="audiovisual aids", abstract="Background: Despite the widespread use and advancements of mobile technology that facilitate rich communication modes, there is little evidence demonstrating the value of smartphones for effective interclinician communication and knowledge processes. Objective: The objective of this study was to determine the effects of different synchronous smartphone-based modes of communication, such as (1) speech only, (2) speech and images, and (3) speech, images, and image annotation (guided noticing) on the recall and transfer of visually and verbally represented medical knowledge. Methods: The experiment was conducted from November 2011 to May 2012 at the University Hospital Basel (Switzerland) with 42 medical students in a master's program. All participants analyzed a standardized case (a patient with a subcapital fracture of the fifth metacarpal bone) based on a radiological image, photographs of the hand, and textual descriptions, and were asked to consult a remote surgical specialist via a smartphone. Participants were randomly assigned to 3 experimental conditions/groups. In group 1, the specialist provided verbal explanations (speech only). In group 2, the specialist provided verbal explanations and displayed the radiological image and the photographs to the participants (speech and images). In group 3, the specialist provided verbal explanations, displayed the radiological image and the photographs, and annotated the radiological image by drawing structures/angle elements (speech, images, and image annotation). To assess knowledge recall, participants were asked to write brief summaries of the case (verbally represented knowledge) after the consultation and to re-analyze the diagnostic images (visually represented knowledge). To assess knowledge transfer, participants analyzed a similar case without specialist support. 
Results: Data analysis by ANOVA found that participants in groups 2 and 3 (images used) evaluated the support provided by the specialist as significantly more positive than group 1, the speech-only group (group 1: mean 4.08, SD 0.90; group 2: mean 4.73, SD 0.59; group 3: mean 4.93, SD 0.25; $F_{2,39}$=6.76, P=.003; partial $\eta^2$=0.26, 1-$\beta$=.90). However, significant positive effects on the recall and transfer of visually represented medical knowledge were only observed when the smartphone-based communication involved the combination of speech, images, and image annotation (group 3). There were no significant positive effects on the recall and transfer of visually represented knowledge between group 1 (speech only) and group 2 (speech and images). No significant differences were observed between the groups regarding verbally represented medical knowledge. Conclusions: The results show (1) the value of annotation functions for digital and mobile technology for interclinician communication and medical informatics, and (2) that the use of guided noticing (the integration of speech, images, and image annotation) leads to significantly improved knowledge gains for visually represented knowledge. This is particularly valuable in situations involving complex visual subject matters, typical in clinical practice. ", doi="10.2196/jmir.2758", url="http://www.jmir.org/2013/11/e263/", url="http://www.ncbi.nlm.nih.gov/pubmed/24284080" } @Article{info:doi/10.2196/jmir.2530, author="Mickan, Sharon and Tilson, K. Julie and Atherton, Helen and Roberts, Wyn Nia and Heneghan, Carl", title="Evidence of Effectiveness of Health Care Professionals Using Handheld Computers: A Scoping Review of Systematic Reviews", journal="J Med Internet Res", year="2013", month="Oct", day="28", volume="15", number="10", pages="e212", keywords="handheld computers", keywords="mobile devices", keywords="mhealth", keywords="PDA", keywords="information seeking behavior", keywords="evidence-based practice", keywords="delivery of health care", keywords="clinical practice", keywords="health technology adoption", keywords="diffusion of innovation", keywords="systematic review", keywords="evidence synthesis", keywords="documentation", abstract="Background: Handheld computers and mobile devices provide instant access to vast amounts and types of useful information for health care professionals. Their reduced size and increased processing speed have led to rapid adoption in health care. Thus, it is important to identify whether handheld computers are actually effective in clinical practice. Objective: A scoping review of systematic reviews was designed to provide a quick overview of the documented evidence of effectiveness for health care professionals using handheld computers in their clinical work. Methods: A detailed search, sensitive for systematic reviews, was applied to the Cochrane, Medline, EMBASE, PsycINFO, Allied and Complementary Medicine Database (AMED), Global Health, and Cumulative Index to Nursing and Allied Health Literature (CINAHL) databases. All outcomes that demonstrated effectiveness in clinical practice were included. Classroom learning and patient use of handheld computers were excluded. Quality was assessed using the Assessment of Multiple Systematic Reviews (AMSTAR) tool. A previously published conceptual framework was used as the basis for dual data extraction. Reported outcomes were summarized according to the primary function of the handheld computer.
Results: Five systematic reviews met the inclusion and quality criteria. Together, they reviewed 138 unique primary studies. Most reviewed descriptive intervention studies, where physicians, pharmacists, or medical students used personal digital assistants. Effectiveness was demonstrated across four distinct functions of handheld computers: patient documentation, patient care, information seeking, and professional work patterns. Within each of these functions, a range of positive outcomes were reported using both objective and self-report measures. The use of handheld computers improved patient documentation through more complete recording, fewer documentation errors, and increased efficiency. Handheld computers provided easy access to clinical decision support systems and patient management systems, which improved decision making for patient care. Handheld computers saved time and gave earlier access to new information. There were also reports that handheld computers enhanced work patterns and efficiency. Conclusions: This scoping review summarizes the secondary evidence for effectiveness of handheld computers and mhealth. It provides a snapshot of effective use by health care professionals across four key functions. We identified evidence to suggest that handheld computers provide easy and timely access to information and enable accurate and complete documentation. Further, they can give health care professionals instant access to evidence-based decision support and patient management systems to improve clinical decision making. Finally, there is evidence that handheld computers allow health professionals to be more efficient in their work practices. It is anticipated that this evidence will guide clinicians and managers in implementing handheld computers in clinical practice and in designing future research. ", doi="10.2196/jmir.2530", url="http://www.jmir.org/2013/10/e212/", url="http://www.ncbi.nlm.nih.gov/pubmed/24165786" } @Article{info:doi/10.2196/jmir.2764, author="Ashley, Laura and Jones, Helen and Thomas, James and Newsham, Alex and Downing, Amy and Morris, Eva and Brown, Julia and Velikova, Galina and Forman, David and Wright, Penny", title="Integrating Patient Reported Outcomes With Clinical Cancer Registry Data: A Feasibility Study of the Electronic Patient-Reported Outcomes From Cancer Survivors (ePOCS) System", journal="J Med Internet Res", year="2013", month="Oct", day="25", volume="15", number="10", pages="e230", keywords="cancer", keywords="oncology", keywords="patient reported outcomes", keywords="patient reported outcome measures", keywords="health-related quality of life", keywords="survivorship", keywords="cancer registry", keywords="electronic data capture", keywords="health information technology", keywords="Internet", abstract="Background: Routine measurement of Patient Reported Outcomes (PROs) linked with clinical data across the patient pathway is increasingly important for informing future care planning. The innovative electronic Patient-reported Outcomes from Cancer Survivors (ePOCS) system was developed to integrate PROs, collected online at specified post-diagnostic time-points, with clinical and treatment data in cancer registries. Objective: This study tested the technical and clinical feasibility of ePOCS by running the system with a sample of potentially curable breast, colorectal, and prostate cancer patients in their first 15 months post diagnosis. 
Methods: Patients completed questionnaires comprising multiple Patient Reported Outcome Measures (PROMs) via ePOCS within 6 months (T1), and at 9 (T2) and 15 (T3) months, post diagnosis. Feasibility outcomes included system informatics performance, patient recruitment, retention, representativeness and questionnaire completion (response rate), patient feedback, and administration burden involved in running the system. Results: ePOCS ran efficiently with few technical problems. Patient participation was 55.21\% (636/1152) overall, although varied by approach mode, and was considerably higher among patients approached face-to-face (61.4\%, 490/798) than by telephone (48.8\%, 21/43) or letter (41.0\%, 125/305). Older and less affluent patients were less likely to join (both P<.001). Most non-consenters (71.1\%, 234/329) cited information technology reasons (ie, difficulty using a computer). Questionnaires were fully or partially completed by 85.1\% (541/636) of invited participants at T1 (80 questions total), 70.0\% (442/631) at T2 (102-108 questions), and 66.3\% (414/624) at T3 (148-154 questions), and fully completed at all three time-points by 57.6\% (344/597) of participants. Reminders (mainly via email) effectively prompted responses. The PROs were successfully linked with cancer registry data for 100\% of patients (N=636). Participant feedback was encouraging and positive, with most patients reporting that they found ePOCS easy to use and that, if asked, they would continue using the system long-term (86.2\%, 361/419). ePOCS was not administratively burdensome to run day-to-day, and patient-initiated inquiries averaged just 11 inquiries per month. Conclusions: The informatics underlying the ePOCS system demonstrated successful proof-of-concept -- the system successfully linked PROs with registry data for 100\% of the patients. The majority of patients were keen to engage. Participation rates are likely to improve as the Internet becomes more universally adopted. ePOCS can help overcome the challenges of routinely collecting PROs and linking with clinical data, which is integral for treatment and supportive care planning and for targeting service provision. ", doi="10.2196/jmir.2764", url="http://www.jmir.org/2013/10/e230/", url="http://www.ncbi.nlm.nih.gov/pubmed/24161667" } @Article{info:doi/10.2196/medinform.2519, author="Ping, Xiao-Ou and Chung, Yufang and Tseng, Yi-Ju and Liang, Ja-Der and Yang, Pei-Ming and Huang, Guan-Tarn and Lai, Feipei", title="A Web-Based Data-Querying Tool Based on Ontology-Driven Methodology and Flowchart-Based Model", journal="JMIR Med Inform", year="2013", month="Oct", day="08", volume="1", number="1", pages="e2", keywords="electronic medical records", keywords="query languages", keywords="information retrieval query processing", keywords="ontology engineering", keywords="clinical practice guideline", abstract="Background: Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. 
Objective: The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. Methods: The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Prot{\'e}g{\'e} environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. Results: In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, ``degree of liver damage,'' ``degree of liver damage when applying a mutually exclusive setting,'' and ``treatments for liver cancer'') was 100\% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. Conclusions: The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks. ", doi="10.2196/medinform.2519", url="http://medinform.jmir.org/2013/1/e2/", url="http://www.ncbi.nlm.nih.gov/pubmed/25600078" } @Article{info:doi/10.2196/medinform.2766, author="Cheung, SK Clement and Tong, LH Ellen and Cheung, Tseung Ngai and Chan, Man Wai and Wang, HX Harry and Kwan, WM Mandy and Fan, KM Carmen and Liu, QL Kirin and Wong, CS Martin", title="Factors Associated With Adoption of the Electronic Health Record System Among Primary Care Physicians", journal="JMIR Med Inform", year="2013", month="Aug", day="26", volume="1", number="1", pages="e1", keywords="electronic medical record", keywords="physicians", keywords="adoption", keywords="associated factors", keywords="medical informatics", abstract="Background: A territory-wide Internet-based electronic patient record allows better patient care in different sectors. The engagement of private physicians is one of the major facilitators for implementation, but there is limited information about the current adoption level of electronic medical record (eMR) among private primary care physicians. Objective: This survey measured the adoption level, enabling factors, and hindering factors of eMR, among private physicians in Hong Kong. 
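The flowchart-based querying approach described in the Ping et al entry above can be sketched in a few lines: a query is written as an ordered list of criteria (a simplified, linear stand-in for a GLIF3.5-style flowchart) and evaluated step by step against patient records. The record fields, criteria, and function names below are hypothetical illustrations, not the actual FBDQM or GLIF3.5 implementation.

# Minimal sketch: evaluate a flowchart-like sequence of criteria against patient records.
# The record fields ("alt", "diagnosis") and the criteria themselves are hypothetical.
patients = [
    {"id": 1, "alt": 120, "diagnosis": "liver cancer"},
    {"id": 2, "alt": 30, "diagnosis": "hepatitis B"},
]

# Each step is (step_name, predicate); together they play the role of a simple flowchart path.
query_steps = [
    ("has_liver_cancer", lambda p: p["diagnosis"] == "liver cancer"),
    ("elevated_alt", lambda p: p["alt"] > 40),
]

def run_query(records, steps):
    """Return the ids of records that satisfy every step of the flowchart, in order."""
    matches = []
    for rec in records:
        if all(pred(rec) for _, pred in steps):
            matches.append(rec["id"])
    return matches

print(run_query(patients, query_steps))  # -> [1]

A real engine of this kind would additionally translate such steps into SQL against the repository and verify the criteria, the two phases the abstract reports as the most time-consuming.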
It also evaluated the key functions and the popularity of electronic systems and vendors used by these private practitioners. Methods: A central registry consisting of 4324 private practitioners was set up. Invitations for self-administered surveys and the completed questionnaires were sent and returned via fax, email, postal mail, and on-site clinic visits. Current users and non-users of the eMR system were compared according to their demographic and practice characteristics. Student's t tests and chi-square tests were used for continuous and categorical variables, respectively. Results: A total of 524 completed surveys (response rate 524/4405, 11.90\%) were collected. The proportion of using eMR in private clinics was 79.6\% (417/524). When compared with non-users, the eMR users were younger (users: 48.4 years SD 10.6 years vs non-users: 61.7 years SD 10.2 years, P<.001); more were female physicians (users: 80/417, 19.2\% vs non-users: 14/107, 13.1\%, P=.013); possessed less clinical experience (with more than 20 years of practice: users: 261/417, 62.6\% vs non-users: 93/107, 86.9\%, P<.001); fewer worked under a Health Maintenance Organization (users: 347/417, 83.2\% vs non-users: 97/107, 90.7\%, P<.001) and more worked with practice partners (users: 126/417, 30.2\% vs non-users: 4/107, 3.7\%, P<.001). Efficiency (379/417, 90.9\%) and reduction of medical errors (229/417, 54.9\%) were the major enabling factors, while patient-unfriendliness (58/107, 54.2\%) and limited consultation time (54/107, 50.5\%) were the most commonly reported hindering factors. The key functions of computer software among eMR users consisted of electronic patient registration system (376/417, 90.2\%), drug dispensing system (328/417, 78.7\%) and electronic drug labels (296/417, 71.0\%). SoftLink Clinic Solution was the most popular vendor (160/417, 38.4\%). Conclusions: These findings identified several physician groups who should be targeted for more assistance on eMR installation and its adoption. Future studies should address the barriers of using Internet-based eMR to enhance its adoption. ", doi="10.2196/medinform.2766", url="http://medinform.jmir.org/2013/1/e1/", url="http://www.ncbi.nlm.nih.gov/pubmed/25599989" } @Article{info:doi/10.2196/resprot.2644, author="Van de Velde, Stijn and Vander Stichele, Robert and Fauquert, Benjamin and Geens, Siegfried and Heselmans, Annemie and Ramaekers, Dirk and Kunnamo, Ilkka and Aertgeerts, Bert", title="EBMPracticeNet: A Bilingual National Electronic Point-Of-Care Project for Retrieval of Evidence-Based Clinical Guideline Information and Decision Support", journal="JMIR Res Protoc", year="2013", month="Jul", day="10", volume="2", number="2", pages="e23", keywords="evidence-based medicine", keywords="practice guidelines as topic", keywords="decision support systems", keywords="clinical", keywords="point-of-care systems", keywords="biomedical technology", keywords="medical informatics", keywords="information storage and retrieval", keywords="information management", keywords="ambulatory care information systems", abstract="Background: In Belgium, the construction of a national electronic point-of-care information service, EBMPracticeNet, was initiated in 2011 to optimize quality of care by promoting evidence-based decision-making. The collaboration of the government, health care providers, evidence-based medicine (EBM) partners, and vendors of electronic health records (EHR) is unique to this project.
All Belgian health care professionals get free access to an up-to-date database of validated Belgian and nearly 1000 international guidelines, incorporated in a portal that also provides EBM information from other sources than guidelines, including computerized clinical decision support that is integrated in the EHRs. Objective: The objective of this paper was to describe the development strategy, the overall content, and the management of EBMPracticeNet which may be of relevance to other health organizations creating national or regional electronic point-of-care information services. Methods: Several candidate providers of comprehensive guideline solutions were evaluated and one database was selected. Translation of the guidelines to Dutch and French was done with translation software, post-editing by translators and medical proofreading. A strategy is determined to adapt the guideline content to the Belgian context. Acceptance of the computerized clinical decision support tool has been tested and a randomized controlled trial is planned to evaluate the effect on process and patient outcomes. Results: Currently, EBMPracticeNet is in ``work in progress'' state. Reference is made to the results of a pilot study and to further planned research including a randomized controlled trial. Conclusions: The collaboration of government, health care providers, EBM partners, and vendors of EHRs is unique. The potential value of the project is great. The link between all the EHRs from different vendors and a national database held on a single platform that is controlled by all EBM organizations in Belgium are the strengths of EBMPracticeNet. ", doi="10.2196/resprot.2644", url="http://www.researchprotocols.org/2013/2/e23/", url="http://www.ncbi.nlm.nih.gov/pubmed/23842038" } @Article{info:doi/10.2196/ijmr.2454, author="Klann, G. Jeffrey and McCoy, B. Allison and Wright, Adam and Wattanasin, Nich and Sittig, F. Dean and Murphy, N. Shawn", title="Health Care Transformation Through Collaboration on Open-Source Informatics Projects: Integrating a Medical Applications Platform, Research Data Repository, and Patient Summarization", journal="Interact J Med Res", year="2013", month="May", day="30", volume="2", number="1", pages="e11", keywords="clinical information systems", keywords="medical informatics", keywords="knowledge bases", keywords="user-computer interface", keywords="data display", keywords="diffusion of innovation", abstract="Background: The Strategic Health IT Advanced Research Projects (SHARP) program seeks to conquer well-understood challenges in medical informatics through breakthrough research. Two SHARP centers have found alignment in their methodological needs: (1) members of the National Center for Cognitive Informatics and Decision-making (NCCD) have developed knowledge bases to support problem-oriented summarizations of patient data, and (2) Substitutable Medical Apps, Reusable Technologies (SMART), which is a platform for reusable medical apps that can run on participating platforms connected to various electronic health records (EHR). Combining the work of these two centers will ensure wide dissemination of new methods for synthesized views of patient data. Informatics for Integrating Biology and the Bedside (i2b2) is an NIH-funded clinical research data repository platform in use at over 100 sites worldwide. By also working with a co-occurring initiative to SMART-enabling i2b2, we can confidently write one app that can be used extremely broadly. 
Objective: Our goal was to facilitate development of intuitive, problem-oriented views of the patient record using NCCD knowledge bases that would run in any EHR. To do this, we developed a collaboration between the two SHARPs and an NIH center, i2b2. Methods: First, we implemented collaborative tools to connect researchers at three institutions. Next, we developed a patient summarization app using the SMART platform and a previously validated NCCD problem-medication linkage knowledge base derived from the National Drug File-Reference Terminology (NDF-RT). Finally, to SMART-enable i2b2, we implemented two new Web service ``cells'' that expose the SMART application programming interface (API), and we made changes to the Web interface of i2b2 to host a ``carousel'' of SMART apps. Results: We deployed our SMART-based, NDF-RT-derived patient summarization app in this SMART-i2b2 container. It displays a problem-oriented view of medications and presents a line-graph display of laboratory results. Conclusions: This summarization app can be run in any EHR environment that either supports SMART or runs SMART-enabled i2b2. This i2b2 ``clinical bridge'' demonstrates a pathway for reusable app development that does not require EHR vendors to immediately adopt the SMART API. Apps can be developed in SMART and run by clinicians in the i2b2 repository, reusing clinical data extracted from EHRs. This may encourage the adoption of SMART by supporting SMART app development until EHRs adopt the platform. It also allows a new variety of clinical SMART apps, fueled by the broad aggregation of data types available in research repositories. The app (including its knowledge base) and SMART-i2b2 are open-source and freely available for download. ", doi="10.2196/ijmr.2454", url="http://www.i-jmr.org/2013/1/e11/", url="http://www.ncbi.nlm.nih.gov/pubmed/23722634" } @Article{info:doi/10.2196/jmir.2495, author="Chen, Wei-Hsin and Hsieh, Sheau-Ling and Hsu, Kai-Ping and Chen, Han-Ping and Su, Xing-Yu and Tseng, Yi-Ju and Chien, Yin-Hsiu and Hwu, Wuh-Liang and Lai, Feipei", title="Web-Based Newborn Screening System for Metabolic Diseases: Machine Learning Versus Clinicians", journal="J Med Internet Res", year="2013", month="May", day="23", volume="15", number="5", pages="e98", keywords="Web-based services", keywords="neonatal screening", keywords="tandem mass spectrometry", keywords="information systems", keywords="metabolism", keywords="inborn errors", abstract="Background: A hospital information system (HIS) that integrates screening data and interpretation of the data is routinely requested by hospitals and parents. However, the accuracy of disease classification may be low because of the disease characteristics and the analytes used for classification. Objective: The objective of this study is to describe a system that enhanced the neonatal screening system of the Newborn Screening Center at the National Taiwan University Hospital. The system was designed and deployed according to a service-oriented architecture (SOA) framework under the Web services .NET environment. The system consists of sample collection, testing, diagnosis, evaluation, treatment, and follow-up services among collaborating hospitals. To improve the accuracy of newborn screening, machine learning and optimal feature selection mechanisms were investigated for screening newborns for inborn errors of metabolism. 
Methods: The framework of the Newborn Screening Hospital Information System (NSHIS) used the embedded Health Level Seven (HL7) standards for data exchanges among heterogeneous platforms integrated by Web services in the C\# language. In this study, machine learning classification was used to predict phenylketonuria (PKU), hypermethioninemia, and 3-methylcrotonyl-CoA-carboxylase (3-MCC) deficiency. The classification methods used 347,312 newborn dried blood samples collected at the Center between 2006 and 2011. Of these, 220 newborns had values over the diagnostic cutoffs (positive cases) and 1557 had values that were over the screening cutoffs but did not meet the diagnostic cutoffs (suspected cases). The original 35 analytes and the manifested features were ranked based on F score, then combinations of the top 20 ranked features were selected as input features to support vector machine (SVM) classifiers to obtain optimal feature sets. These feature sets were tested using 5-fold cross-validation and optimal models were generated. The datasets collected in year 2011 were used as predicting cases. Results: The feature selection strategies were implemented and the optimal markers for PKU, hypermethioninemia, and 3-MCC deficiency were obtained. The results of the machine learning approach were compared with the cutoff scheme. The number of the false positive cases were reduced from 21 to 2 for PKU, from 30 to 10 for hypermethioninemia, and 209 to 46 for 3-MCC deficiency. Conclusions: This SOA Web service--based newborn screening system can accelerate screening procedures effectively and efficiently. An SVM learning methodology for PKU, hypermethioninemia, and 3-MCC deficiency metabolic diseases classification, including optimal feature selection strategies, is presented. By adopting the results of this study, the number of suspected cases could be reduced dramatically. ", doi="10.2196/jmir.2495", url="http://www.jmir.org/2013/5/e98/", url="http://www.ncbi.nlm.nih.gov/pubmed/23702487" } @Article{info:doi/10.2196/ijmr.2468, author="Li, Junhua and Talaei-Khoei, Amir and Seale, Holly and Ray, Pradeep and MacIntyre, Raina C.", title="Health Care Provider Adoption of eHealth: Systematic Literature Review", journal="Interact J Med Res", year="2013", month="Apr", day="16", volume="2", number="1", pages="e7", keywords="technology acceptance", keywords="eHealth", keywords="health care provider", keywords="adoption", abstract="Background: eHealth is an application of information and communication technologies across the whole range of functions that affect health. The benefits of eHealth (eg, improvement of health care operational efficiency and quality of patient care) have previously been documented in the literature. Health care providers (eg, medical doctors) are the key driving force in pushing eHealth initiatives. Without their acceptance and actual use, those eHealth benefits would be unlikely to be reaped. Objective: To identify and synthesize influential factors to health care providers' acceptance of various eHealth systems. Methods: This systematic literature review was conducted in four steps. The first two steps facilitated the location and identification of relevant articles. The third step extracted key information from those articles including the studies' characteristics and results. In the last step, identified factors were analyzed and grouped in accordance with the Unified Theory of Acceptance and Use of Technology (UTAUT). 
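As a rough, hedged illustration of the feature-selection-plus-SVM strategy described in the newborn screening entry above, the sketch below ranks features with a univariate F statistic, keeps the top 20, and scores an SVM with 5-fold cross-validation. scikit-learn's SelectKBest stands in for the study's own F-score ranking and exhaustive combination search, and the data are random placeholders rather than screening analytes.

# Illustrative only: univariate F-score ranking + SVM with 5-fold cross-validation.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 35))       # placeholder for 35 analytes per newborn
y = rng.integers(0, 2, size=500)     # placeholder labels (e.g., suspected vs positive)

model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=20),    # keep the 20 highest-scoring features
    SVC(kernel="rbf"),
)
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())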
Results: This study included 93 papers that have studied health care providers' acceptance of eHealth. From these papers, 40 factors were identified and grouped into 7 clusters: (1) health care provider characteristics, (2) medical practice characteristics, (3) voluntariness of use, (4) performance expectancy, (5) effort expectancy, (6) social influence, and (7) facilitating or inhibiting conditions. Conclusions: The grouping results demonstrated that the UTAUT model is useful for organizing the literature but has its limitations. Due to the complex contextual dynamics of health care settings, our work suggested that there would be potential to extend theories on information technology adoption, which is of great benefit to readers interested in learning more on the topic. Practically, these findings may help health care decision makers proactively introduce interventions to encourage acceptance of eHealth and may also assist health policy makers refine relevant policies to promote the eHealth innovation. ", doi="10.2196/ijmr.2468", url="http://www.i-jmr.org/2013/1/e7/", url="http://www.ncbi.nlm.nih.gov/pubmed/23608679" } @Article{info:doi/10.2196/ijmr.2437, author="Nambisan, Priya and Kreps, L. Gary and Polit, Stan", title="Understanding Electronic Medical Record Adoption in the United States: Communication and Sociocultural Perspectives", journal="Interact J Med Res", year="2013", month="Mar", day="26", volume="2", number="1", pages="e5", keywords="electronic health records adoption", keywords="communication", keywords="systems approach", abstract="Background: This paper adopts a communication and sociocultural perspective to analyze the factors behind the lag in electronic medical record (EMR) adoption in the United States. Much of the extant research on this topic has emphasized economic factors, particularly, lack of economic incentives, as the primary cause of the delay in EMR adoption. This prompted the Health Information Technology on Economic and Clinical Health Act that allow financial incentives through the Centers of Medicare and Medicaid Services for many health care organizations planning to adopt EMR. However, financial incentives alone have not solved the problem; many new innovations do not diffuse even when offered for free. Thus, this paper underlines the need to consider communication and sociocultural factors to develop a better understanding of the impediments of EMR adoption. Objective: The objective of this paper was to develop a holistic understanding of EMR adoption by identifying and analyzing the impact of communication and sociocultural factors that operate at 3 levels: macro (environmental), meso (organizational), and micro (individual). Methods: We use the systems approach to focus on the 3 levels (macro, meso, and micro) and developed propositions at each level drawing on the communication and sociocultural perspectives. Results: Our analysis resulted in 10 propositions that connect communication and sociocultural aspects with EMR adoption. Conclusions: This paper brings perspectives from the social sciences that have largely been missing in the extant literature of health information technology (HIT) adoption. In doing so, it implies how communication and sociocultural factors may complement (and in some instances, reinforce) the impact of economic factors on HIT adoption. 
", doi="10.2196/ijmr.2437", url="http://www.i-jmr.org/2013/1/e5/", url="http://www.ncbi.nlm.nih.gov/pubmed/23612390" } @Article{info:doi/10.5210/ojphi.v5i1.4424, title="Roles of Health Literacy in Relation to Social Determinants of Health and Recommendations for Informatics-Based Interventions: Systematic Review", journal="Online J Public Health Inform", year="2013", volume="5", number="1", pages="e4424", doi="10.5210/ojphi.v5i1.4424", url="" } @Article{info:doi/10.2196/ijmr.2402, author="Yuan, Juntao Michael and Finley, Mike George and Long, Ju and Mills, Christy and Johnson, Kim Ron", title="Evaluation of User Interface and Workflow Design of a Bedside Nursing Clinical Decision Support System", journal="Interact J Med Res", year="2013", month="Jan", day="31", volume="2", number="1", pages="e4", keywords="clinical decision support systems", keywords="user-computer interface", keywords="software design", keywords="human computer interaction", keywords="usability testing", keywords="heuristic evaluations", keywords="software performance", keywords="patient-centered care", abstract="Background: Clinical decision support systems (CDSS) are important tools to improve health care outcomes and reduce preventable medical adverse events. However, the effectiveness and success of CDSS depend on their implementation context and usability in complex health care settings. As a result, usability design and validation, especially in real world clinical settings, are crucial aspects of successful CDSS implementations. Objective: Our objective was to develop a novel CDSS to help frontline nurses better manage critical symptom changes in hospitalized patients, hence reducing preventable failure to rescue cases. A robust user interface and implementation strategy that fit into existing workflows was key for the success of the CDSS. Methods: Guided by a formal usability evaluation framework, UFuRT (user, function, representation, and task analysis), we developed a high-level specification of the product that captures key usability requirements and is flexible to implement. We interviewed users of the proposed CDSS to identify requirements, listed functions, and operations the system must perform. We then designed visual and workflow representations of the product to perform the operations.The user interface and workflow design were evaluated via heuristic and end user performance evaluation. The heuristic evaluation was done after the first prototype, and its results were incorporated into the product before the end user evaluation was conducted. First, we recruited 4 evaluators with strong domain expertise to study the initial prototype. Heuristic violations were coded and rated for severity. Second, after development of the system, we assembled a panel of nurses, consisting of 3 licensed vocational nurses and 7 registered nurses, to evaluate the user interface and workflow via simulated use cases. We recorded whether each session was successfully completed and its completion time. Each nurse was asked to use the National Aeronautics and Space Administration (NASA) Task Load Index to self-evaluate the amount of cognitive and physical burden associated with using the device. Results: A total of 83 heuristic violations were identified in the studies. The distribution of the heuristic violations and their average severity are reported. The nurse evaluators successfully completed all 30 sessions of the performance evaluations. 
All nurses were able to use the device after a single training session. On average, the nurses took 111 seconds (SD 30 seconds) to complete the simulated task. The NASA Task Load Index results indicated that the work overhead on the nurses was low. In fact, most of the burden measures were consistent with zero. The only potentially significant burden was temporal demand, which was consistent with the primary use case of the tool. Conclusions: The evaluation has shown that our design was functional and met the requirements demanded by the nurses' tight schedules and heavy workloads. The user interface embedded in the tool provided compelling utility to the nurse with minimal distraction. ", doi="10.2196/ijmr.2402", url="http://www.i-jmr.org/2013/1/e4/", url="http://www.ncbi.nlm.nih.gov/pubmed/23612350" } @Article{info:doi/10.2196/ijmr.2092, author="Castiglione, Filippo and Diaz, Vanessa and Gaggioli, Andrea and Li{\`o}, Pietro and Mazz{\`a}, Claudia and Merelli, Emanuela and Meskers, G.M Carel and Pappalardo, Francesco and von Ammon, Rainer", title="Physio-Environmental Sensing and Live Modeling", journal="Interact J Med Res", year="2013", month="Jan", day="30", volume="2", number="1", pages="e3", keywords="personalized health care, mobile networks, computer models, telediagnosis", doi="10.2196/ijmr.2092", url="http://www.i-jmr.org/2013/1/e3/", url="http://www.ncbi.nlm.nih.gov/pubmed/23612245" } @Article{info:doi/10.2196/jmir.2105, author="Jeffery, Rebecca and Navarro, Tamara and Lokker, Cynthia and Haynes, Brian R. and Wilczynski, L. Nancy and Farjou, George", title="How Current Are Leading Evidence-Based Medical Textbooks? An Analytic Survey of Four Online Textbooks", journal="J Med Internet Res", year="2012", month="Dec", day="10", volume="14", number="6", pages="e175", keywords="databases, bibliographic", keywords="medical informatics", keywords="evidence-based medicine", abstract="Background: The consistency of treatment recommendations of evidence-based medical textbooks with more recently published evidence has not been investigated to date. Inconsistencies could affect the quality of medical care. Objective: To determine the frequency with which topics in leading online evidence-based medical textbooks report treatment recommendations consistent with more recently published research evidence. Methods: Summarized treatment recommendations in 200 clinical topics (ie, disease states) covered in four evidence-based textbooks--UpToDate, Physicians' Information Education Resource (PIER), DynaMed, and Best Practice--were compared with articles identified in an evidence rating service (McMaster Premium Literature Service, PLUS) since the date of the most recent topic updates in each textbook. Textbook treatment recommendations were compared with article results to determine if the articles provided different, new conclusions. From these findings, the proportion of topics which potentially require updating in each textbook was calculated. Results: 478 clinical topics were assessed for inclusion to find 200 topics that were addressed by all four textbooks. The proportion of topics for which there was 1 or more recently published articles found in PLUS with evidence that differed from the textbooks' treatment recommendations was 23\% (95\% CI 17-29\%) for DynaMed, 52\% (95\% CI 45-59\%) for UpToDate, 55\% (95\% CI 48-61\%) for PIER, and 60\% (95\% CI 53-66\%) for Best Practice ($\chi^2_3$=65.3, P<.001).
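The $\chi^2_3$ statistic reported above can be reconstructed approximately from the published percentages: build the 2 x 4 table of potentially outdated versus current topics (200 topics per textbook) and run a standard chi-square test of homogeneity. Counts below are derived from the rounded percentages, so the result differs slightly from the reported 65.3.

# Approximate reconstruction of the reported chi-square test across the four textbooks.
from scipy.stats import chi2_contingency

topics_total = 200
outdated = {"DynaMed": 46, "UpToDate": 104, "PIER": 110, "Best Practice": 120}  # 23%, 52%, 55%, 60% of 200

table = [
    list(outdated.values()),                        # potentially outdated topics
    [topics_total - n for n in outdated.values()],  # remaining topics
]
chi2, p, dof, _ = chi2_contingency(table)
print(round(chi2, 1), dof, p)   # roughly 66.8 with 3 degrees of freedom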
The time since the last update for each textbook averaged from 170 days (range 131-209) for DynaMed, to 488 days (range 423-554) for PIER (P<.001 across all textbooks). Conclusions: In online evidence-based textbooks, the proportion of topics with potentially outdated treatment recommendations varies substantially. ", doi="10.2196/jmir.2105", url="http://www.jmir.org/2012/6/e175/", url="http://www.ncbi.nlm.nih.gov/pubmed/23220465" } @Article{info:doi/10.2196/ijmr.2089, author="{\"O}hlund, Sten-Erik and {\AA}strand, Bengt and Petersson, G{\"o}ran", title="Improving Interoperability in ePrescribing", journal="Interact J Med Res", year="2012", month="Nov", day="22", volume="1", number="2", pages="e17", keywords="eHealth, Electronic prescribing, Electronic prescription, Information quality, Interoperability", abstract="Background: The increased application of eServices in health care, in general, and ePrescribing (electronic prescribing) in particular, have brought quality and interoperability to the forefront.The application of standards has been put forward as one important factor in improving interoperability. However, less focus has been placed on other factors, such as stakeholders' involvement and the measurement of interoperability.An information system (IS) can be regarded to comprise an instrument for technology-mediated work communication. In this study, interoperability refers to the interoperation in the ePrescribing process, involving people, systems, procedures and organizations. We have focused on the quality of the ePrescription message as one component of the interoperation in the ePrescribing process. Objective: The objective was to analyze how combined efforts in improving interoperability with the introduction of the new national ePrescription format (NEF) have impacted interoperability in the ePrescribing process in Sweden, with the focus on the quality of the ePrescription message. Methods: Consecutive sampling of electronic prescriptions in Sweden before and after the introduction of NEF was undertaken in April 2008 (pre-NEF) and April 2009 (post-NEF).Interoperability problems were identified and classified based on message format specifications and prescription rules. Results: The introduction of NEF improved the interoperability of ePrescriptions substantially. In the pre-NEF sample, a total of 98.6\% of the prescriptions had errors. In the post-NEF sample, only 0.9\% of the prescriptions had errors. The mean number of errors was fewer for the erroneous prescriptions: 4.8 in pre-NEF compared to 1.0 in post-NEF. Conclusions: We conclude that a systematic comprehensive work on interoperability, covering technical, semantical, professional, judicial and process aspects, involving the stakeholders, resulted in an improved interoperability of ePrescriptions. ", doi="10.2196/ijmr.2089", url="http://www.i-jmr.org/2012/2/e17/", url="http://www.ncbi.nlm.nih.gov/pubmed/23612314" } @Article{info:doi/10.2196/ijmr.2163, author="Parra, Carlos and J{\'o}dar-S{\'a}nchez, Francisco and Jim{\'e}nez-Hern{\'a}ndez, Dolores M. 
and Vigil, Eduardo and Palomino-Garc{\'i}a, Alfredo and Moniche-{\'A}lvarez, Francisco and De la Torre-Laviana, Javier Francisco and Bonachela, Patricia and Fern{\'a}ndez, Jos{\'e} Francisco and Cayuela-Dom{\'i}nguez, Aurelio and Leal, Sandra", title="Development, Implementation, and Evaluation of a Telemedicine Service for the Treatment of Acute Stroke Patients: TeleStroke", journal="Interact J Med Res", year="2012", month="Nov", day="15", volume="1", number="2", pages="e15", keywords="Telemedicine", keywords="Standardization", keywords="Stroke", keywords="Fibrinolysis", abstract="Background: Health care service based on telemedicine can reduce both physical and time barriers in stroke treatments. Moreover, this service connects centers specializing in stroke treatment with other centers and practitioners, thereby increasing accessibility to neurological specialist care and fibrinolytic treatment. Objective: Development, implementation, and evaluation of a care service for the treatment of acute stroke patients based on telemedicine (TeleStroke) at Virgen del Roc{\'i}o University Hospital. Methods: The evaluation phase, conducted from October 2008 to January 2011, involved patients who presented acute stroke symptoms confirmed by the emergency physician; they were examined using TeleStroke in two hospitals, at a distance of 16 and 110 kilometers from Virgen del Roc{\'i}o University Hospital. We analyzed the number of interconsultation sheets, the percentage of patients treated with fibrinolysis, and the number of times they were treated. To evaluate medical professionals' acceptance of the TeleStroke system, we developed a web-based questionnaire using a Technology Acceptance Model. Results: A total of 28 patients were evaluated through the interconsultation sheet. Out of 28 patients, 19 (68\%) received fibrinolytic treatment. The most common reasons for not treating with fibrinolysis included: clinical criteria in six out of nine patients (66\%) and beyond the time window in three out of nine patients (33\%). The mean ``onset-to-hospital'' time was 69 minutes, the mean time from admission to CT image was 33 minutes, the mean ``door-to-needle'' time was 82 minutes, and the mean ``onset-to-needle'' time was 150 minutes. Out of 61 medical professionals, 34 (56\%) completed a questionnaire to evaluate the acceptability of the TeleStroke system. The mean values for each item were over 6.50, indicating that respondents positively evaluated each item. This survey was assessed using the Cronbach alpha test to determine the reliability of the questionnaire and the results obtained, giving a value of 0.97. Conclusions: The implementation of TeleStroke has made it possible for patients in the acute phase of stroke to receive effective treatment, something that was previously impossible because of the time required to transfer them to referral hospitals. ", doi="10.2196/ijmr.2163", url="http://www.i-jmr.org/2012/2/e15/", url="http://www.ncbi.nlm.nih.gov/pubmed/23612154" } @Article{info:doi/10.2196/ijmr.2126, author="Carmona-Cejudo, M. 
Jos{\'e} and Hortas, Luisa Maria and Baena-Garc{\'i}a, Manuel and Lana-Linati, Jorge and Gonz{\'a}lez, Carlos and Redondo, Maximino and Morales-Bueno, Rafael", title="DB4US: A Decision Support System for Laboratory Information Management", journal="Interact J Med Res", year="2012", month="Nov", day="14", volume="1", number="2", pages="e16", keywords="Automation, laboratory", keywords="Medical Informatics Applications", keywords="Data Mining", keywords="Quality Indicators, Health Care", abstract="Background: Until recently, laboratory automation has focused primarily on improving hardware. Future advances are concentrated on intelligent software since laboratories performing clinical diagnostic testing require improved information systems to address their data processing needs. In this paper, we propose DB4US, an application that automates information related to laboratory quality indicators. Currently, there is a lack of ready-to-use management quality measures. This application addresses this deficiency through the extraction, consolidation, statistical analysis, and visualization of data related to the use of demographics, reagents, and turn-around times. The design and implementation issues, as well as the technologies used for the implementation of this system, are discussed in this paper. Objective: To develop a general methodology that integrates the computation of ready-to-use management quality measures and a dashboard to easily analyze the overall performance of a laboratory, as well as automatically detect anomalies or errors. The novelty of our approach lies in the application of integrated web-based dashboards as an information management system in hospital laboratories. Methods: We propose a new methodology for laboratory information management based on the extraction, consolidation, statistical analysis, and visualization of data related to demographics, reagents, and turn-around times, offering a dashboard-like user web interface to the laboratory manager. The methodology comprises a unified data warehouse that stores and consolidates multidimensional data from different data sources. The methodology is illustrated through the implementation and validation of DB4US, a novel web application based on this methodology that constructs an interface to obtain ready-to-use indicators, and offers the possibility to drill down from high-level metrics to more detailed summaries. The offered indicators are calculated beforehand so that they are ready to use when the user needs them. The design is based on a set of different parallel processes to precalculate indicators. The application displays information related to tests, requests, samples, and turn-around times. The dashboard is designed to show the set of indicators on a single screen. Results: DB4US was deployed for the first time in the Hospital Costa del Sol in 2008. In our evaluation we show the positive impact of this methodology for laboratory professionals, since the use of our application has reduced the time needed for the elaboration of the different statistical indicators and has also provided information that has been used to optimize the usage of laboratory resources by the discovery of anomalies in the indicators. DB4US users benefit from Internet-based communication of results, since this information is available from any computer without having to install any additional software.
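As one concrete example of the "ready-to-use" indicators described in the DB4US entry above, the sketch below computes turn-around times from request and result timestamps and summarizes them per test type. The column names and data are invented for illustration and are not the DB4US schema.

# Hypothetical example: turn-around-time (TAT) indicators per test type.
import pandas as pd

lab = pd.DataFrame({
    "test": ["glucose", "glucose", "hba1c"],
    "received": pd.to_datetime(["2012-01-02 08:00", "2012-01-02 09:30", "2012-01-02 08:15"]),
    "reported": pd.to_datetime(["2012-01-02 09:10", "2012-01-02 10:05", "2012-01-03 11:00"]),
})

lab["tat_minutes"] = (lab["reported"] - lab["received"]).dt.total_seconds() / 60
indicators = lab.groupby("test")["tat_minutes"].agg(["median", lambda s: s.quantile(0.9)])
indicators.columns = ["median_tat_min", "p90_tat_min"]
print(indicators)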
Conclusions: The proposed methodology and the accompanying web application, DB4US, automates the processing of information related to laboratory quality indicators and offers a novel approach for managing laboratory-related information, benefiting from an Internet-based communication mechanism. The application of this methodology has been shown to improve the usage of time, as well as other laboratory resources. ", doi="10.2196/ijmr.2126", url="http://www.i-jmr.org/2012/2/e16/", url="http://www.ncbi.nlm.nih.gov/pubmed/23608745" } @Article{info:doi/10.2196/ijmr.2064, author="Heyworth, Leonie and Zhang, Fang and Jenter, A. Chelsea and Kell, Rachel and Volk, A. Lynn and Tripathi, Micky and Bates, W. David and Simon, R. Steven", title="Physician Satisfaction Following Electronic Health Record Adoption in Three Massachusetts Communities", journal="Interact J Med Res", year="2012", month="Nov", day="08", volume="1", number="2", pages="e12", keywords="electronic health record", keywords="physician satisfaction", keywords="implementation", keywords="Massachusetts eHealth collaborative", abstract="Background: Despite mandates and incentives for electronic health record (EHR) adoption, little is known about factors predicting physicians' satisfaction following EHR implementation. Objective: To measure predictors of physician satisfaction following EHR adoption. Methods: A total of 163 physicians completed a mailed survey before and after EHR implementation through a statewide pilot project in Massachusetts. Multivariable logistic regression identified predictors of physician satisfaction with their current practice situation in 2009 and generalized estimating equations accounted for clustering. Results: The response rate was 77\% in 2005 and 68\% in 2009. In 2005, prior to EHR adoption, 28\% of physicians were very satisfied with their current practice situation compared to 25\% in 2009, following EHR adoption (P?5 years) natural history studies in various diseases and from several existing registries. Face validity of the questions was determined by review by many experts (both terminology experts at the College of American Pathologists (CAP) and research and informatics experts at the University of South Florida (USF)) for commonality, clarity, and organization. Questions were re-worded slightly, as needed, to make the full semantics of the question clear and to make the questions generalizable to multiple diseases where possible. Questions were indexed with metadata (structured and descriptive information) using a standard metadata framework to record such information as context, format, question asker and responder, and data standards information. Results: At present, PRISM contains over 2,200 questions, with content of PRISM relevant to virtually all rare diseases. While the inclusion of disease-specific questions for thousands of rare disease organizations seeking to develop registries would present a challenge for traditional standards development organizations, the PRISM library could serve as a platform to liaison between rare disease communities and existing standardized controlled terminologies, item banks, and coding systems. Conclusions: If widely used, PRISM will enable the re-use of questions across registries, reduce variation in registry data collection, and facilitate a bottom-up standardization of patient registries. 
Although it was initially developed to fulfill an urgent need in the rare disease community for shared resources, the PRISM library of patient-directed registry questions can be a valuable resource for registries in any disease -- whether common or rare. Trial Registration: N/A ", doi="10.2196/ijmr.2107", url="http://www.i-jmr.org/2012/2/e10/", url="http://www.ncbi.nlm.nih.gov/pubmed/23611924" } @Article{info:doi/10.2196/ijmr.2193, author="Spanakis, G. Emmanouil and Chiarugi, Franco and Kouroubali, Angelina and Spat, Stephan and Beck, Peter and Asanin, Stefan and Rosengren, Peter and Gergely, Tamas and Thestrup, Jesper", title="Diabetes Management Using Modern Information and Communication Technologies and New Care Models", journal="Interact J Med Res", year="2012", month="Oct", day="04", volume="1", number="2", pages="e8", keywords="Medical Information Systems", keywords="Medical Expert Systems", keywords="Biomedical Engineering", keywords="Biomedical Informatics", keywords="Biomedical Computing", keywords="Telemedicine", keywords="Diabetes", abstract="Background: Diabetes, a metabolic disorder, has reached epidemic proportions in developed countries. The disease has two main forms: type 1 and type 2. Disease management entails administration of insulin in combination with careful blood glucose monitoring (type 1) or involves the adjustment of diet and exercise level, the use of oral anti-diabetic drugs, and insulin administration to control blood sugar (type 2). Objective: State-of-the-art technologies have the potential to assist healthcare professionals, patients, and informal carers to better manage diabetes insulin therapy, help patients understand their disease, support self-management, and provide a safe environment by monitoring adverse and potentially life-threatening situations with appropriate crisis management. Methods: New care models incorporating advanced information and communication technologies have the potential to provide service platforms able to improve health care, personalization, inclusion, and empowerment of the patient, and to support diverse user preferences and needs in different countries. The REACTION project proposes to create a service-oriented architectural platform based on numerous individual services and implementing novel care models that can be deployed in different settings to perform patient monitoring, distributed decision support, health care workflow management, and clinical feedback provision. Results: This paper presents the work performed in the context of the REACTION project focusing on the development of a health care service platform able to support diabetes management in different healthcare regimes, through clinical applications, such as monitoring of vital signs, feedback provision to the point of care, integrative risk assessment, and event and alarm handling. While moving towards the full implementation of the platform, three major areas of research and development have been identified and consequently approached: the first one is related to the glucose sensor technology and wearability, the second is related to the platform architecture, and the third to the implementation of the end-user services. 
The Glucose Management System, already developed within the REACTION project, is able to monitor a range of parameters from various sources including glucose levels, nutritional intakes, administered drugs, and patient's insulin sensitivity, offering decision support for insulin dosing to professional caregivers on a mobile tablet platform that fulfills the need of the users and supports medical workflow procedures in compliance with the Medical Device Directive requirements. Conclusions: Good control of diabetes, as well as increased emphasis on control of lifestyle factors, may reduce the risk profile of most complications and contribute to health improvement. The REACTION project aims to respond to these challenges by providing integrated, professional, management, and therapy services to diabetic patients in different health care regimes across Europe in an interoperable communication platform. ", doi="10.2196/ijmr.2193", url="http://www.jmir.org/2012/2/e8/", url="http://www.ncbi.nlm.nih.gov/pubmed/23612026" } @Article{info:doi/10.2196/ijmr.2101, author="Tielemans, M. Merel and Jansen, BMJ Jan and van Oijen, GH Martijn", title="Open Access Capture of Patients With Gastroesophageal Reflux Disease Using an Online Patient-Reported Outcomes Instrument", journal="Interact J Med Res", year="2012", month="Sep", day="26", volume="1", number="2", pages="e7", keywords="Gastroesophageal reflux", keywords="proton pump inhibitor", keywords="Internet", keywords="open access questionnaire", keywords="partial responsiveness", abstract="Background: Persons with gastroesophageal reflux disease (GERD) frequently search online for information about causes and treatment options. The GerdQ self-assessment questionnaire can be used for diagnosis of GERD and follow-up of symptoms. Objectives: To assess whether it is feasible (1) to study the prevalence and impact of GERD in persons visiting a GERD information website, and (2) to identify partial responsiveness to proton pump inhibitor (PPI) therapy using the GerdQ. Methods: All visitors (aged 18--79 years) to a GERD information website between November 2008 and May 2011 were invited to complete the GerdQ online. The GerdQ questionnaire consists of 6 questions (score per question: 0--3). In respondents who did not use PPIs, we used the questionnaire to identify those with GERD (total score $\geq$8) and assess the influence of these symptoms on their daily life, divided into low (total score <3 on impact questions) and high impact (total score $\geq$3 on impact questions). In PPI users, we used the GerdQ to quantify partial responsiveness by any report of heartburn, regurgitation, sleep disturbance, or over-the-counter medication use for more than 1 day in the preceding week. We subsequently asked GerdQ respondents scoring $\geq$8 to complete the disease-specific Quality of Life in Reflux and Dyspepsia (QOLRAD) questionnaire. Results: A total of 131,286 visitors completed the GerdQ, of whom 80.23\% (n = 105,329) did not use a PPI. Of these, we identified 67,379 respondents (63.97\%) to have GERD (n = 32,935; 48.88\% high impact). We invited 14,028 non-PPI users to complete the QOLRAD questionnaire, of whom 1231 (8.78\%) completed the questionnaire. Mean total QOLRAD scores were 5.14 (SEM 0.04) for those with high-impact GERD and 5.77 (SEM 0.04) for those with low-impact GERD (P < .001). In PPI users, 22,826 of 25,957 respondents (87.94\%) reported partial responsiveness.
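The GerdQ thresholds quoted in the entry above (total score $\geq$8 indicating GERD, impact subscore $\geq$3 separating high from low impact) reduce to a few lines of classification logic. The function below is an illustrative rendering of that rule, not the validated GerdQ instrument or its official scoring code.

# Illustrative GerdQ-style classification using the thresholds quoted in the abstract above.
def classify_gerdq(item_scores, impact_scores):
    """item_scores: six 0-3 item scores; impact_scores: the subset of items measuring daily-life impact."""
    total = sum(item_scores)
    if total < 8:
        return "no GERD (by questionnaire)"
    return "GERD, high impact" if sum(impact_scores) >= 3 else "GERD, low impact"

print(classify_gerdq([3, 2, 1, 2, 1, 0], [2, 2]))  # -> GERD, high impact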
We invited 6238 PPI users to complete the QOLRAD questionnaire, of whom 599 (9.60\%) completed the disease-specific quality-of-life questionnaire. Mean total QOLRAD scores were 4.62 (SEM 0.05) for partial responders and 5.88 (SEM 0.14) for adequate responders (P < .001). Conclusions: The GerdQ identified GERD in many website respondents and measured partial responsiveness in the majority of PPI users. Both non-PPI users with GERD and PPI users with partial responsiveness were associated with a decreased health-related quality of life. We have shown the feasibility of GERD patient identification online. ", doi="10.2196/ijmr.2101", url="http://www.jmir.org/2012/2/e7/", url="http://www.ncbi.nlm.nih.gov/pubmed/23611985" } @Article{info:doi/10.2196/ijmr.2022, author="Wu, C. Robert and Lo, Vivian and Rossos, Peter and Kuziemsky, Craig and O'Leary, J. Kevin and Cafazzo, A. Joseph and Reeves, Scott and Wong, M. Brian and Morra, Dante", title="Improving Hospital Care and Collaborative Communications for the 21st Century: Key Recommendations for General Internal Medicine", journal="Interact J Med Res", year="2012", month="Sep", day="24", volume="1", number="2", pages="e9", keywords="hospital care communication", keywords="technology", keywords="knowledge transfer", keywords="interprofessional collaboration", abstract="Background: Communication and collaboration failures can have negative impacts on the efficiency of both individual clinicians and health care system delivery as well as on the quality of patient care. Recognizing the problems associated with clinical and collaboration communication, health care professionals and organizations alike have begun to look at alternative communication technologies to address some of these inefficiencies and to improve interprofessional collaboration. Objective: To develop recommendations that assist health care organizations in improving communication and collaboration in order to develop effective methods for evaluation. Methods: An interprofessional meeting was held in a large urban city in Canada with 19 nationally and internationally renowned experts to discuss suitable recommendations for an ideal communication and collaboration system as well as a research framework for general internal medicine (GIM) environments. Results: In designing an ideal GIM communication and collaboration system, attendees believed that the new system should possess attributes that aim to: a) improve workflow through prioritization of information and detection of individuals' contextual situations; b) promote stronger interprofessional relationships with adequate exchange of information; c) enhance patient-centered care by allowing greater patient autonomy over their health care information; d) enable interoperability and scalability between and within institutions; and e) function across different platforms. In terms of evaluating the effects of technology in GIM settings, participants championed the use of rigorous scientific methods that span multiple perspectives and disciplines. Specifically, participants recommended that consistent measures and definitions need to be established so that these impacts can be examined across individual, group, and organizational levels. Conclusions: Discussions from our meeting demonstrated the complexities of technological implementations in GIM settings. Recommendations on the design principles and research paradigms for an improved communication system are described. 
", doi="10.2196/ijmr.2022", url="http://www.i-jmr.org/2012/2/e9/", url="http://www.ncbi.nlm.nih.gov/pubmed/23612055" } @Article{info:doi/10.2196/ijmr.2131, author="Steele, Scott and Bilchik, Anton and Eberhardt, John and Kalina, Philip and Nissan, Aviram and Johnson, Eric and Avital, Itzhak and Stojadinovic, Alexander", title="Using Machine-Learned Bayesian Belief Networks to Predict Perioperative Risk of Clostridium Difficile Infection Following Colon Surgery", journal="Interact J Med Res", year="2012", month="Sep", day="19", volume="1", number="2", pages="e6", keywords="Clostridium difficile", keywords="Bayesian belief network", keywords="pseudomembranous colitis", keywords="colectomy", keywords="NIS", abstract="Background: Clostridium difficile (C-Diff) infection following colorectal resection is an increasing source of morbidity and mortality. Objective: We sought to determine if machine-learned Bayesian belief networks (ml-BBNs) could preoperatively provide clinicians with postoperative estimates of C-Diff risk. Methods: We performed a retrospective modeling of the Nationwide Inpatient Sample (NIS) national registry dataset with independent set validation. The NIS registries for 2005 and 2006 were used for initial model training, and the data from 2007 were used for testing and validation. International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) codes were used to identify subjects undergoing colon resection and postoperative C-Diff development. The ml-BBNs were trained using a stepwise process. Receiver operating characteristic (ROC) curve analysis was conducted and area under the curve (AUC), positive predictive value (PPV), and negative predictive value (NPV) were calculated. Results: From over 24 million admissions, 170,363 undergoing colon resection met the inclusion criteria. Overall, 1.7\% developed postoperative C-Diff. Using the ml-BBN to estimate C-Diff risk, model AUC is 0.75. Using only known a priori features, AUC is 0.74. The model has two configurations: a high sensitivity and a high specificity configuration. Sensitivity, specificity, PPV, and NPV are 81.0\%, 50.1\%, 2.6\%, and 99.4\% for high sensitivity and 55.4\%, 81.3\%, 3.5\%, and 99.1\% for high specificity. C-Diff has 4 first-degree associates that influence the probability of C-Diff development: weight loss, tumor metastases, inflammation/infections, and disease severity. Conclusions: Machine-learned BBNs can produce robust estimates of postoperative C-Diff infection, allowing clinicians to identify high-risk patients and potentially implement measures to reduce its incidence or morbidity. 
", doi="10.2196/ijmr.2131", url="http://www.jmir.org/2012/2/e6/", url="http://www.ncbi.nlm.nih.gov/pubmed/23611947" } @Article{info:doi/10.2196/ijmr.2150, author="Malo, Christian and Neveu, Xavier and Archambault, Michel Patrick and {\'E}mond, Marcel and Gagnon, Marie-Pierre", title="Exploring Nurses' Intention to Use a Computerized Platform in the Resuscitation Unit: Development and Validation of a Questionnaire Based on the Theory of Planned Behavior", journal="Interact J Med Res", year="2012", month="Sep", day="13", volume="1", number="2", pages="e5", keywords="Primary care nurses", keywords="adoption of new behavior", keywords="intention", keywords="theory of planned behaviour", keywords="emergency department", keywords="trauma care", keywords="electronic health record", keywords="clinical decision support system", abstract="Background: In emergency department resuscitation units, writing down information related to interventions, physical examination, vital signs, investigations, and treatments ordered is a crucial task carried out by nurses. To facilitate this task, a team composed of emergency physicians, nurses, and one computer engineer created a novel electronic platform equipped with a tactile screen that allows systematic collection of critical data. This electronic platform also has medical software (ReaScribe+) that functions as an electronic medical record and a clinical decision support system. Objective: To develop and validate a questionnaire that can help evaluate nurses' intention to use a novel computerized platform in an emergency department resuscitation unit, based on Ajzen's theory of planned behavior (TPB). Methods: The sample for this study was composed of 87 nurses who worked in the resuscitation unit of a tertiary trauma center. We held three focus groups with nurses working in the resuscitation unit to identify the salient modal beliefs regarding their intended use of a new electronic medical charting system for the care of trauma patients. The system included a clinical decision support tool. We developed a questionnaire in which salient modal beliefs were used as items to evaluate the TPB constructs. We also added 13 questions to evaluate nurses' computer literacy. The final questionnaire was composed of 46 questions to be answered on a 7-point Likert scale. All nurses in the resuscitation unit and present during a regular work shift were individually contacted by the principal investigator or a research assistant (phase 1). A subsample of the nurses who completed the questionnaire was invited to complete it a second time 2 weeks later (phase 2). Results: In phase 1, we received 62 of the 70 questionnaires administered (89\% response rate). Of the 27 questionnaires administered in phase 2 (retest phase), 25 were completed (93\% response rate). The questionnaire showed very good internal consistency, as Cronbach alpha was higher than .7 for all constructs. Temporal stability was acceptable with intraclass correlations between .41 and .66. The intention to use the electronic platform to chart the resuscitation of trauma patients was very high among the respondents. In the logistic regression model, the only construct that predicted nurses' intention to adopt the computerized platform was the professional norm (odds ratio 3.31, 95\% confidence interval 1.41--7.78). Conclusions: We developed and validated a questionnaire that can now be used in other emergency departments prior to implementation of the computerized platform. 
The intention to adopt was very high among the respondents, which suggests that the implementation of this innovation could be successful at our institution. ", doi="10.2196/ijmr.2150", url="http://www.jmir.org/2012/2/e5/", url="http://www.ncbi.nlm.nih.gov/pubmed/23611903" } @Article{info:doi/10.2196/ijmr.2113, author="Bassi, Jesdeep and Lau, Francis and Lesperance, Mary", title="Perceived Impact of Electronic Medical Records in Physician Office Practices: A Review of Survey-Based Research", journal="Interact J Med Res", year="2012", month="Jul", day="28", volume="1", number="2", pages="e3", keywords="Health care surveys", keywords="evaluation studies", keywords="ambulatory care information systems", abstract="Background: Physician office practices are increasingly adopting electronic medical records (EMRs). Therefore, the impact of such systems needs to be evaluated to ensure they are helping practices to realize the expected benefits. In addition to experimental and observational studies examining objective impacts, users' subjective views need to be understood, since ultimate acceptance and use of the system depend on them. Surveys are commonly used to elicit these views. Objective: To determine which areas of EMR implementation in office practices have been addressed in survey-based research studies, to compare the perceived impacts between users and nonusers for the most-addressed areas, and to contribute to the knowledge regarding survey-based research for assessing the impact of health information systems (HIS). Methods: We searched databases and systematic review citations for papers published between 2000 and May 2012 that evaluated the perceived impact of using an EMR system in an office-based practice, were based on original data, had providers as the primary end user, and reported outcome measures related to the system's positive or negative impact. We identified all the reported metrics related to EMR use and mapped them to the Clinical Adoption Framework to analyze the gap. We then subjected the impact-specific areas with the most reported results to a meta-analysis, which examined overall positive and negative perceived impacts for users and nonusers. Results: We selected 19 papers for the review. We found that most impact-specific areas corresponded to the micro level of the framework and that appropriateness or effectiveness and efficiency were well addressed through surveys. However, other areas such as access, which includes patient and caregiver participation and their ability to access services, had very few metrics. We selected 7 impact-specific areas for meta-analysis: security and privacy; quality of patient care or clinical outcomes; patient--physician relationship and communication; communication with other providers; accessibility of records and information; business or practice efficiency; and costs or savings. All the results for accessibility of records and information and for communication with other providers indicated a positive view. The area with the most mixed results was security and privacy. Conclusions: Users were sometimes more likely than nonusers to have a positive view of the selected areas. However, when looking at the two groups separately, we often found more positive views for most of the examined areas regardless of use status. Despite the limitations of a small number of papers and their heterogeneity, the results of this review are promising in terms of finding positive perceptions of EMR adoption among both users and nonusers. 
In addition, we identified issues related to survey-based research for HIS evaluation, particularly regarding constructs for evaluation and the quality of study design and reporting. ", doi="10.2196/ijmr.2113", url="http://www.jmir.org/2012/2/e3/", url="http://www.ncbi.nlm.nih.gov/pubmed/23611832" } @Article{info:doi/10.2196/jmir.1539, author="Ketchum, M. Andrea and Saleh, A. Ahlam and Jeong, Kwonho", title="Type of Evidence Behind Point-of-Care Clinical Information Products: A Bibliometric Analysis", journal="J Med Internet Res", year="2011", month="Feb", day="18", volume="13", number="1", pages="e21", keywords="Databases, Factual", keywords="Bibliometrics", keywords="Medical Informatics", keywords="Evidence-based Medicine", abstract="Background: Point-of-care (POC) products are widely used as information reference tools in the clinical setting. Although usability, scope of coverage, ability to answer clinical questions, and impact on health outcomes have been studied, no comparative analysis of the characteristics of the references (the evidence for the content) in POC products is available. Objective: The objective of this study was to compare the type of evidence behind five POC clinical information products. Methods: This study is a comparative bibliometric analysis of references cited in monographs in POC products. Five commonly used products served as subjects for the study: ACP PIER, Clinical Evidence, DynaMed, FirstCONSULT, and UpToDate. The four clinical topics examined to identify content in the products were asthma, hypertension, hyperlipidemia, and carbon monoxide poisoning. Four indicators were measured: distribution of citations, type of evidence, product currency, and citation overlap. The type of evidence was determined based primarily on the publication type found in the MEDLINE bibliographic record, as well as the Medical Subject Headings (MeSH), both assigned by the US National Library of Medicine. MeSH is the controlled vocabulary used for indexing articles in MEDLINE/PubMed. Results: FirstCONSULT had the greatest proportion of references with higher-level evidence publication types, such as systematic reviews and randomized controlled trials (137/153, 89.5\%), although it contained the lowest total number of references (153/2330, 6.6\%). DynaMed had the largest total number of references (1131/2330, 48.5\%) and the largest proportion of current (2007-2009) references (170/1131, 15\%). The distribution of references cited for each topic varied between products. For example, asthma had the most references listed in DynaMed, Clinical Evidence, and FirstCONSULT, while hypertension had the most references in UpToDate and ACP PIER. An unexpected finding was that the rate of citation overlap was less than 1\% for each topic across all five products. Conclusions: Differences between POC products are revealed by examining the references cited in the monographs themselves. Citation analysis extended to include key content indicators can be used to compare the evidence levels of the literature supporting the content found in POC products. ", doi="10.2196/jmir.1539", url="http://www.jmir.org/2011/1/e21/" }
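
The predictive values reported in the Steele et al entry above (doi 10.2196/ijmr.2131) follow from its sensitivity, specificity, and 1.7% prevalence figures via Bayes' theorem. The short Python sketch below reproduces that arithmetic for the high-sensitivity configuration; it is an illustrative calculation based only on the published figures, not code from the cited study, and the function and variable names are ours.

def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary classifier applied at a given prevalence."""
    true_pos = sensitivity * prevalence               # cases correctly flagged as high risk
    false_pos = (1 - specificity) * (1 - prevalence)  # non-cases flagged as high risk
    true_neg = specificity * (1 - prevalence)         # non-cases correctly cleared
    false_neg = (1 - sensitivity) * prevalence        # missed cases
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# High-sensitivity configuration from the abstract: 81.0% sensitivity, 50.1% specificity,
# applied at the reported 1.7% postoperative C-Diff prevalence.
ppv, npv = predictive_values(0.810, 0.501, 0.017)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # about 2.7% and 99.3%, in line with the reported 2.6% and 99.4%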