Evaluating AI and Human Expert Decisions in Colorectal Cancer

Peking University Cancer Hospital & Institute

Status

Not yet enrolling

Conditions

Colorectal Cancer

Treatments

Other: LLM-MDT

Study type

Observational

Funder types

Other

Identifiers

NCT07045207

PekingUCHI2025YJZ57

Details and patient eligibility

About

The goal of this observational study is to evaluate the decision-making consistency between large language models (LLMs) and expert multidisciplinary teams (MDTs) in adult patients diagnosed with colorectal cancer who underwent MDT consultation between January 2023 and December 2024.

The main questions it aims to answer are:

How consistent are the treatment decisions generated by LLMs compared to actual MDT decisions?
Do different LLMs (e.g., ChatGPT, DeepSeek) show varying levels of agreement with expert recommendations?
What clinical factors contribute to differences between AI-generated and human expert decisions?

Researchers will compare the AI-generated treatment recommendations with real-world MDT decisions using anonymized patient records to see if LLMs can reliably support clinical decision-making in oncology.

Participants will:

Have their de-identified clinical data (e.g., imaging, pathology, MDT notes) processed through several LLMs
Not be contacted and not receive any intervention, because this is a retrospective study that uses existing clinical records only

Full description

This is a retrospective, non-interventional observational study aiming to evaluate the consistency between treatment decisions made by large language models (LLMs) and multidisciplinary team (MDT) experts in the management of colorectal cancer (CRC).

Colorectal cancer is a highly heterogeneous malignancy requiring personalized treatment strategies, often developed through MDT discussions that integrate input from surgery, oncology, radiology, pathology, and other specialties. While MDTs improve treatment planning and outcomes, they are time- and resource-intensive, and subject to variability in expert judgment. With the rise of artificial intelligence, especially LLMs such as ChatGPT and DeepSeek, there is growing interest in their potential role in assisting or standardizing clinical decision-making.

In this study, researchers will retrospectively analyze de-identified clinical records of approximately 1,500 patients with histologically confirmed colorectal cancer who underwent MDT consultation at a tertiary cancer center between January 2023 and December 2024. Key clinical data, including demographic information, imaging reports (CT, MRI), endoscopy results, pathology findings, and MDT recommendations, will be extracted and anonymized.

These de-identified records will be input into several LLMs (ChatGPT, DeepSeek, Baichuan, and Qwen) running on secure offline servers. The models will be asked to generate treatment recommendations, which will be categorized into predefined decision codes (e.g., surgery, systemic therapy, chemoradiotherapy, further diagnostics). Each case will be input three times to assess the consistency of the model output.
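
The coding and repeat-run step described above can be sketched as follows. This is only an illustration under stated assumptions: the record does not publish the decision-code scheme or how free-text model output is mapped to codes, so the keyword table `DECISION_KEYWORDS` and the helpers `code_recommendation` and `summarize_runs` are hypothetical names invented for this example.

```python
from collections import Counter

# Hypothetical keyword table: the study's real decision codes and mapping
# rules are not given in the record. First matching code wins, so the more
# specific "chemoradiotherapy" is checked before "surgery".
DECISION_KEYWORDS = {
    "chemoradiotherapy": ["chemoradiotherapy", "chemoradiation"],
    "surgery": ["surgery", "resection", "colectomy"],
    "systemic_therapy": ["chemotherapy", "systemic therapy", "immunotherapy"],
    "further_diagnostics": ["further imaging", "biopsy", "additional workup"],
}


def code_recommendation(text):
    """Map one free-text recommendation to a predefined decision code."""
    lowered = text.lower()
    for code, keywords in DECISION_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return code
    return "uncoded"  # flag for manual review rather than guessing


def summarize_runs(run_texts):
    """Code the repeated outputs for one case and report their agreement."""
    codes = [code_recommendation(text) for text in run_texts]
    majority, count = Counter(codes).most_common(1)[0]
    return {
        "codes": codes,
        "majority": majority,
        "unanimous": count == len(codes),
    }


# Three made-up outputs for a single case (not real study data).
example_runs = [
    "Recommend neoadjuvant chemoradiotherapy followed by reassessment.",
    "Suggest chemoradiation before surgery.",
    "Proceed to surgical resection.",
]
summary = summarize_runs(example_runs)
```

In practice, keyword matching is brittle; a real pipeline would more likely rely on structured model output or manual coding by clinicians, which the record does not specify.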

The primary outcome is the agreement between AI-generated recommendations and original MDT decisions, quantified using Cohen's Kappa. Secondary analyses include comparison among LLMs using chi-squared tests, evaluation of output consistency via Fleiss' Kappa, and identification of clinical factors associated with discordant decisions.
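
As a rough illustration of the two agreement statistics named above, the sketch below computes Cohen's Kappa for LLM-versus-MDT agreement and Fleiss' Kappa across repeated runs using only the Python standard library. The function names, the toy decision codes, and the choice to treat the three repeated runs as interchangeable raters for Fleiss' Kappa are assumptions made for this example; the record does not describe the study's actual statistical software or settings, and the chi-squared comparison among LLMs is not shown.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each code the same cases once."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when both raters use one category only
    return (observed - expected) / (1.0 - expected)


def fleiss_kappa(ratings_per_case):
    """Fleiss' kappa; each inner list holds all ratings for one case."""
    n_cases = len(ratings_per_case)
    n_raters = len(ratings_per_case[0]) if n_cases else 0
    if n_raters < 2 or any(len(r) != n_raters for r in ratings_per_case):
        raise ValueError("every case needs the same number (>= 2) of ratings")
    category_totals = Counter()
    agreement_sum = 0.0
    for ratings in ratings_per_case:
        counts = Counter(ratings)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs within this case.
        pair_sum = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += pair_sum / (n_raters * (n_raters - 1))
    p_bar = agreement_sum / n_cases
    total = n_cases * n_raters
    p_e = sum((t / total) ** 2 for t in category_totals.values())
    if p_e == 1.0:
        return float("nan")  # undefined when only one category ever appears
    return (p_bar - p_e) / (1.0 - p_e)


# Toy data (not study results): four cases coded by the MDT and by one LLM.
mdt = ["surgery", "surgery", "systemic", "systemic"]
llm = ["surgery", "surgery", "systemic", "surgery"]
kappa_llm_vs_mdt = cohens_kappa(mdt, llm)  # 0.5 for this toy data

# Toy data: the same LLM queried three times per case.
runs = [
    ["surgery", "surgery", "surgery"],
    ["systemic", "systemic", "systemic"],
    ["surgery", "surgery", "systemic"],
    ["systemic", "systemic", "surgery"],
]
kappa_across_runs = fleiss_kappa(runs)  # 1/3 for this toy data
```

Both statistics correct raw percentage agreement for the agreement expected by chance, which matters here because a few decision codes (for example surgery) are likely to dominate the case mix.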

This study does not involve any direct patient contact, intervention, or new clinical procedures. All data are historical and anonymized in accordance with ethical and legal requirements. The results are expected to inform the potential value, limitations, and appropriate use of AI in supporting multidisciplinary decision-making in oncology.

Enrollment

1,500 estimated patients

Sex

All

Volunteers

No Healthy Volunteers

Inclusion criteria

Patients with a histologically confirmed diagnosis of colorectal cancer
Patients who received multidisciplinary team (MDT) consultation at Peking University Cancer Hospital between January 1, 2023 and December 31, 2024
Availability of complete clinical records, including MDT consultation notes, CT or MRI imaging reports, pathology reports, and outpatient or inpatient medical summaries

Exclusion criteria

Incomplete or missing medical records related to MDT decision-making
MDT consultations conducted for non-oncologic purposes (e.g., hernia evaluation, stoma planning)
Missing critical clinical data such as imaging or pathology reports
Duplicate or conflicting records that prevent reliable data analysis
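
The inclusion and exclusion criteria above could be applied to a record extract along the following lines. This is a minimal sketch, not the investigators' screening procedure: the `CaseRecord` fields and the `exclusion_reasons` and `is_eligible` helpers are hypothetical, and only the study window and the criteria themselves come from the listing.

```python
from dataclasses import dataclass
from datetime import date

# Study window taken from the inclusion criteria.
WINDOW_START = date(2023, 1, 1)
WINDOW_END = date(2024, 12, 31)


@dataclass
class CaseRecord:
    """Hypothetical flattened view of one patient's MDT case."""
    histology_confirmed_crc: bool
    mdt_date: date
    mdt_purpose_oncologic: bool
    has_mdt_notes: bool
    has_imaging_report: bool
    has_pathology_report: bool
    has_medical_summary: bool
    is_duplicate_or_conflicting: bool = False


def exclusion_reasons(rec):
    """Return every reason a case fails screening (empty list if eligible)."""
    reasons = []
    if not rec.histology_confirmed_crc:
        reasons.append("no histologically confirmed colorectal cancer")
    if not (WINDOW_START <= rec.mdt_date <= WINDOW_END):
        reasons.append("MDT consultation outside study window")
    if not rec.mdt_purpose_oncologic:
        reasons.append("MDT consultation for non-oncologic purpose")
    if not (rec.has_mdt_notes and rec.has_imaging_report
            and rec.has_pathology_report and rec.has_medical_summary):
        reasons.append("incomplete clinical records")
    if rec.is_duplicate_or_conflicting:
        reasons.append("duplicate or conflicting record")
    return reasons


def is_eligible(rec):
    return not exclusion_reasons(rec)


# Made-up examples, not study data.
eligible_case = CaseRecord(True, date(2023, 6, 15), True, True, True, True, True)
late_case = CaseRecord(True, date(2025, 1, 5), True, True, True, False, True)
```

Collecting all reasons, rather than stopping at the first failure, makes it easier to report how many cases were excluded under each criterion.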

Trial contacts and locations

Central trial contact

Yongjiu Chen, PhD

Data sourced from clinicaltrials.gov
