Performance of an OCR-Prompt-LLM Integrated Workflow for Extracting Multi-dimensional Clinical Data in Ischemic Heart Disease (OPAL-CAD)

National Center for Cardiovascular Diseases

Status

Completed

Conditions

Data Collection

Coronary Artery Disease

Artificial Intelligence (AI)

Treatments

Device: Manual Clinical Data Review

Device: OCR-Prompt-LLM Information Extraction Workflow

Study type

Observational

Funder types

Other

Identifiers

NCT07499830

CAD-LLM-2025-01

Details and patient eligibility

About

This study evaluates an AI-driven workflow for clinical data extraction and diagnostic classification in coronary artery disease (CAD). The workflow combines optical character recognition (OCR) with large language models (LLMs) to extract ten key clinical parameters (such as left ventricular ejection fraction, LVEF, and laboratory results) and to assign a diagnostic subtype (UA, STEMI, NSTEMI, or CCS) directly from unstructured inpatient records. In a human-versus-machine comparison on a test set of 308 patients, the performance of the LLM-based workflow will be benchmarked against the average diagnostic accuracy and processing time of seven clinical physicians. The findings are intended to provide evidence on the feasibility of using LLMs to improve clinical data structuring and diagnostic efficiency in cardiology.
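The benchmark described above compares the workflow's diagnostic accuracy with the average accuracy of the physicians. A minimal sketch of that comparison follows, assuming each reader produces one subtype label per record and that accuracy is the fraction of labels matching the reference standard. The function names and the placeholder labels are hypothetical and are not taken from the study.

```python
from statistics import mean


def accuracy(predicted, reference):
    """Fraction of records whose predicted subtype equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("no records to score")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def compare_to_physicians(workflow_labels, physician_labels, reference):
    """Return (workflow accuracy, mean of the individual physician accuracies)."""
    workflow_acc = accuracy(workflow_labels, reference)
    physician_acc = mean(accuracy(labels, reference) for labels in physician_labels)
    return workflow_acc, physician_acc


# Placeholder labels for four records and two readers; NOT study data.
reference = ["UA", "STEMI", "NSTEMI", "CCS"]
workflow = ["UA", "STEMI", "NSTEMI", "UA"]
physicians = [
    ["UA", "STEMI", "UA", "CCS"],
    ["UA", "NSTEMI", "NSTEMI", "UA"],
]

workflow_acc, physician_acc = compare_to_physicians(workflow, physicians, reference)
print(f"workflow: {workflow_acc:.3f}, physicians (mean): {physician_acc:.3f}")
```

Averaging per-physician accuracies, rather than pooling all physician labels, is one reading of "average diagnostic accuracy"; the registry entry does not specify which is used.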

Enrollment

308 patients

Sex

All

Ages

18+ years old

Volunteers

Accepts Healthy Volunteers

Inclusion criteria

Patients aged 18 years and older.
Clinical records of patients previously enrolled in the AIM-CHD study (pilot/prompt optimization set) or the SMART-CHD study (internal validation cohort).
Patients diagnosed with, or suspected of having, coronary artery disease (CAD), including subtypes: Unstable Angina (UA), STEMI, NSTEMI, and Chronic Coronary Syndrome (CCS).

Exclusion criteria

Clinical records with severe data fragmentation or with more than 50% of the key clinical indicators missing.
Handwritten medical records or low-quality scans that are illegible to Optical Character Recognition (OCR) processing.
Duplicate records or records with conflicting "Gold Standard" labels that cannot be reconciled by the expert committee.
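The missing-data exclusion rule can be sketched as a simple threshold check. This is an illustration only: it assumes a record is a dictionary keyed by indicator name and that `None` or an empty string means "missing". Apart from LVEF, the indicator names below are hypothetical placeholders, since the registry entry does not list all ten parameters.

```python
# Ten key indicators; only "lvef" is named in the listing, the rest are placeholders.
KEY_INDICATORS = ["lvef"] + [f"indicator_{i}" for i in range(2, 11)]


def missing_fraction(record, key_indicators=KEY_INDICATORS):
    """Share of key indicators that are absent or empty in the record."""
    missing = sum(1 for name in key_indicators if record.get(name) in (None, ""))
    return missing / len(key_indicators)


def is_excluded_for_missingness(record, key_indicators=KEY_INDICATORS, threshold=0.5):
    """True when strictly more than `threshold` of the key indicators are missing."""
    return missing_fraction(record, key_indicators) > threshold


# Placeholder records; NOT study data.
record_half = {name: 1.0 for name in KEY_INDICATORS[:5]}    # 5 of 10 missing
record_sparse = {name: 1.0 for name in KEY_INDICATORS[:4]}  # 6 of 10 missing

print(is_excluded_for_missingness(record_half))    # exactly 50% missing: kept
print(is_excluded_for_missingness(record_sparse))  # more than 50% missing: excluded
```

The strict inequality follows the wording "more than 50%", so a record missing exactly half of the indicators is retained under this reading.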

Trial design

308 participants in 3 patient groups

Test Cohort

Description:

This group consists of 50 patient records from the AIM-CHD Study at Fuwai Hospital. These records are used to refine OCR processing and to optimize the prompts for the LLM-based workflow.

Treatment:

Device: OCR-Prompt-LLM Information Extraction Workflow

Device: Manual Clinical Data Review

Internal Validation Cohort

Description:

This cohort includes 188 clinical cases sourced from the SMART-CHD Study at Fuwai Hospital. These records serve as the primary internal benchmark to evaluate the diagnostic and extraction accuracy of the LLM workflow against the established ground truth.

Treatment:

Device: OCR-Prompt-LLM Information Extraction Workflow

Device: Manual Clinical Data Review
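Evaluating extraction accuracy against the ground truth can be sketched as a per-field agreement score. The example below is an assumption-laden illustration: the field names `lvef` and `diagnosis_subtype`, the exact-match rule for numbers, and the case-insensitive match for text are all hypothetical choices, not the study's documented scoring method.

```python
import math


def field_matches(extracted, truth):
    """Compare one extracted value with its ground-truth value."""
    if extracted is None or truth is None:
        return extracted is None and truth is None
    if isinstance(extracted, (int, float)) and isinstance(truth, (int, float)):
        # Numeric fields: treat values as equal up to floating-point noise.
        return math.isclose(extracted, truth, rel_tol=0.0, abs_tol=1e-9)
    # Text fields: ignore case and surrounding whitespace.
    return str(extracted).strip().lower() == str(truth).strip().lower()


def per_field_accuracy(extracted_records, truth_records, fields):
    """Return {field: fraction of records where extraction matches ground truth}."""
    if len(extracted_records) != len(truth_records) or not truth_records:
        raise ValueError("record lists must be non-empty and of equal length")
    scores = {}
    for field in fields:
        hits = sum(
            field_matches(e.get(field), t.get(field))
            for e, t in zip(extracted_records, truth_records)
        )
        scores[field] = hits / len(truth_records)
    return scores


# Placeholder records; NOT study data.
truth = [
    {"lvef": 55, "diagnosis_subtype": "STEMI"},
    {"lvef": 40, "diagnosis_subtype": "UA"},
]
extracted = [
    {"lvef": 55.0, "diagnosis_subtype": "stemi"},
    {"lvef": 45, "diagnosis_subtype": "UA"},
]

scores = per_field_accuracy(extracted, truth, ["lvef", "diagnosis_subtype"])
print(scores)
```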

External Validation Cohort

Description:

This cohort comprises 70 patient records collected from 8 independent sub-centers (excluding Fuwai Hospital) to assess the generalizability and robustness of the model across diverse clinical environments and different medical record formats.

Treatment:

Device: OCR-Prompt-LLM Information Extraction Workflow

Device: Manual Clinical Data Review

Trial contacts and locations

Data sourced from clinicaltrials.gov
