About
This diagnostic study aims to compare the performance of an artificial intelligence (AI) algorithm designed to assist in the interpretation of traumatic bone radiographs (all anatomical regions excluding the thorax) with that of human readers, including emergency medicine and family medicine residents as well as senior physicians (one emergency medicine specialist and one orthopedic surgeon).
The study follows a paired reader study design: identical anonymized radiographic images are independently interpreted by the AI system and by human readers. The reference standard ("gold standard") will be defined by the consensus reading of the two senior physicians. Inter-observer agreement (kappa statistics) between the AI, residents, and senior reference readings will be estimated, and false negatives and false positives will be analyzed by lesion type and anatomical location.
Full description
This diagnostic accuracy study is designed to evaluate and compare the performance of an artificial intelligence (AI) algorithm with that of human readers in the interpretation of traumatic bone radiographs. The study focuses on radiographs of all skeletal anatomical regions except the thorax (e.g., skull, upper limbs, pelvis, and lower limbs) obtained in the context of acute trauma.
Study Design
The investigation will employ a paired reader design, meaning that each radiographic image will be interpreted independently by both the AI system and multiple human readers. This design allows direct, within-case comparison of diagnostic performance between the AI algorithm and human interpreters, minimizing variability related to case mix.
Study Population and Image Selection
Radiographs will be retrospectively collected from the hospital's Picture Archiving and Communication System (PACS). Eligible cases will include all conventional radiographs performed for suspected bone trauma over a defined inclusion period. Images must meet adequate technical quality standards and contain no patient-identifiable information. Exclusion criteria include incomplete imaging studies, thoracic images, and cases lacking definitive follow-up or diagnostic confirmation.
Image Preparation and Anonymization
All selected radiographs will be anonymized and assigned a random identification code. The dataset will be organized by anatomical region and then randomized to prevent recognition bias among readers.
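The coding and randomization step described above could be sketched as follows. This is a minimal illustration only, not the study's actual procedure: the function name `anonymize_and_shuffle`, the 8-character hexadecimal code format, and the optional `seed` parameter are all assumptions introduced for the example.

```python
import random
import secrets


def anonymize_and_shuffle(image_ids, seed=None):
    """Assign each image a unique random code and return a shuffled reading order.

    Hypothetical helper: the code format (8 hex characters) is an assumption.
    """
    codes = {}
    used = set()
    for image_id in image_ids:
        # Draw random codes until one is unused, so every image gets a unique code.
        code = secrets.token_hex(4)
        while code in used:
            code = secrets.token_hex(4)
        used.add(code)
        codes[image_id] = code

    # The seed only controls the presentation order, not the codes themselves.
    rng = random.Random(seed)
    reading_order = list(codes.values())
    rng.shuffle(reading_order)
    return codes, reading_order


if __name__ == "__main__":
    key, order = anonymize_and_shuffle(["wrist_001", "ankle_002", "pelvis_003"], seed=42)
    print(order)
```

The mapping from original identifier to code (`key` in the usage example) would need to be stored separately from the images handed to readers, so that only the study coordinators can link a reading back to a case.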
Readers and Reading Procedure
Interpretations will be performed by:
Artificial Intelligence (AI) system: A deep learning algorithm trained to detect bone fractures and other traumatic findings on plain radiographs.
Human readers:
Two groups of resident physicians (emergency medicine and family medicine residents) with varying levels of training.
Two senior physicians, one emergency medicine specialist and one orthopedic surgeon, who will serve as expert readers.
Each image will be read independently by the AI algorithm and by each human reader, without access to clinical information or to the interpretations of others.
Reference Standard (Gold Standard)
The reference standard diagnosis for each radiograph will be established through a consensus review by the two senior physicians. In cases of initial disagreement, consensus will be reached through joint discussion, with review of clinical or additional imaging data if necessary.
Outcome Measures
The primary outcome measure will be the diagnostic accuracy of the AI algorithm compared with human readers for detecting bone fractures and other traumatic lesions, using the expert consensus as the gold standard.
Secondary Outcomes
Inter-observer agreement between the AI system and each human reader will be quantified using Cohen's kappa (κ) statistics.
Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) will be calculated for all readers and the AI.
Error analysis will be performed to characterize false negatives and false positives according to lesion type (e.g., cortical break, avulsion, dislocation) and anatomical location (e.g., wrist, ankle, pelvis).
Subgroup analyses will assess differences in performance by anatomical region and by reader experience level.
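As an illustration of the measures listed above, the sketch below computes sensitivity, specificity, PPV, NPV, and Cohen's kappa for one reader against a reference. It assumes binary labels (1 = traumatic lesion present, 0 = absent) and one label per radiograph; the function names and the example data are hypothetical and are not taken from the study protocol.

```python
def confusion_counts(reference, reader):
    """Return (tp, fp, fn, tn) for binary labels, with 1 meaning lesion present."""
    pairs = list(zip(reference, reader))
    tp = sum(1 for ref, pred in pairs if ref == 1 and pred == 1)
    fp = sum(1 for ref, pred in pairs if ref == 0 and pred == 1)
    fn = sum(1 for ref, pred in pairs if ref == 1 and pred == 0)
    tn = sum(1 for ref, pred in pairs if ref == 0 and pred == 0)
    return tp, fp, fn, tn


def diagnostic_metrics(reference, reader):
    """Sensitivity, specificity, PPV and NPV of a reader against the reference."""
    tp, fp, fn, tn = confusion_counts(reference, reader)

    def ratio(num, den):
        # An undefined ratio (empty denominator) is reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    labels = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    if expected == 1:
        return float("nan")  # both raters used a single identical label
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # made-up consensus labels
    ai_reads = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # made-up AI output
    print(diagnostic_metrics(reference, ai_reads))
    print(cohen_kappa(reference, ai_reads))
```

For the subgroup analyses, the same functions could be applied to the subset of cases belonging to each anatomical region or to each reader experience level.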
Statistical Analysis
Continuous variables will be expressed as means ± standard deviation or medians with interquartile ranges. Categorical data will be summarized as frequencies and percentages. Diagnostic performance metrics will be compared using appropriate statistical tests (e.g., McNemar's test for paired proportions). Confidence intervals (95%) will be calculated for all main estimates. A p-value < 0.05 will be considered statistically significant.
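A paired comparison of this kind might look like the sketch below. The description names McNemar's test only as an example and does not state how the 95% confidence intervals will be computed, so two choices here are assumptions: the exact (binomial) form of McNemar's test and the Wilson score interval. The function names are hypothetical.

```python
from math import comb, sqrt


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: cases read correctly by the AI but incorrectly by the human reader.
    c: cases read incorrectly by the AI but correctly by the human reader.
    Only these discordant pairs carry information in a paired design.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    if total == 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(mcnemar_exact(2, 8))
    print(wilson_interval(45, 50))
```

The exact form avoids the chi-square approximation when discordant pairs are few, which may matter in the smaller anatomical subgroups.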
Ethical Considerations
The study will be conducted in accordance with institutional and ethical guidelines for retrospective diagnostic research. Since all radiographs will be anonymized and analyzed retrospectively, informed consent requirements may be waived by the ethics committee.
Enrollment
500 participants in 2 patient groups
Data sourced from clinicaltrials.gov