Status
Conditions
Treatments
About
The goal of this observational study is to evaluate the decision-making consistency between large language models (LLMs) and expert multidisciplinary teams (MDTs) in adult patients diagnosed with colorectal cancer who underwent MDT consultation between January 2023 and December 2024.
The main questions it aims to answer are:
How consistent are the treatment decisions generated by LLMs compared to actual MDT decisions?
Do different LLMs (e.g., ChatGPT, DeepSeek) show varying levels of agreement with expert recommendations?
What clinical factors contribute to differences between AI-generated and human expert decisions?
Researchers will compare the AI-generated treatment recommendations with real-world MDT decisions using anonymized patient records to see if LLMs can reliably support clinical decision-making in oncology.
Participants will:
Have their de-identified clinical data (e.g., imaging, pathology, MDT notes) processed through several LLMs.
Not be contacted or receive any interventions, as this is a retrospective study using existing clinical records only.
Full description
This is a retrospective, non-interventional observational study aiming to evaluate the consistency between treatment decisions made by large language models (LLMs) and multidisciplinary team (MDT) experts in the management of colorectal cancer (CRC).
Colorectal cancer is a highly heterogeneous malignancy requiring personalized treatment strategies, often developed through MDT discussions that integrate input from surgery, oncology, radiology, pathology, and other specialties. While MDTs improve treatment planning and outcomes, they are time- and resource-intensive, and subject to variability in expert judgment. With the rise of artificial intelligence, especially LLMs such as ChatGPT and DeepSeek, there is growing interest in their potential role in assisting or standardizing clinical decision-making.
In this study, researchers will retrospectively analyze de-identified clinical records of approximately 1,500 patients with histologically confirmed colorectal cancer who underwent MDT consultation at a tertiary cancer center between January 2023 and December 2024. Key clinical data, including demographic information, imaging reports (CT, MRI), endoscopy results, pathology findings, and MDT recommendations, will be extracted and anonymized.
These de-identified records will be input into several LLMs (ChatGPT, DeepSeek, Baichuan, and Qwen) running on secure offline servers. The models will be asked to generate treatment recommendations, which will be categorized into predefined decision codes (e.g., surgery, systemic therapy, chemoradiotherapy, further diagnostics). Each case will be input three times to assess the consistency of the model output.
The primary outcome is the agreement between AI-generated recommendations and original MDT decisions, quantified using Cohen's Kappa. Secondary analyses include comparison among LLMs using chi-squared tests, evaluation of output consistency via Fleiss' Kappa, and identification of clinical factors associated with discordant decisions.
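The two agreement statistics named above can be illustrated with a minimal Python sketch. It is not the study's analysis code: the function names and label strings are hypothetical, and it assumes each recommendation has already been mapped to a single decision code. Cohen's Kappa compares one LLM label per case against the MDT label, and Fleiss' Kappa measures agreement across the repeated runs of one model.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of cases where the two labels match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used the same single label throughout
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of labels per case,
    with the same number of labels (>= 2) for every case."""
    if not ratings:
        raise ValueError("need at least one case")
    n_cases = len(ratings)
    n_runs = len(ratings[0])
    if n_runs < 2 or any(len(case) != n_runs for case in ratings):
        raise ValueError("every case needs the same number (>= 2) of labels")
    totals = Counter()
    agreement_sum = 0.0
    for case in ratings:
        counts = Counter(case)
        totals.update(counts)
        # Share of agreeing label pairs within this case.
        pairs = sum(c * c for c in counts.values()) - n_runs
        agreement_sum += pairs / (n_runs * (n_runs - 1))
    p_bar = agreement_sum / n_cases
    # Chance agreement from the pooled label proportions.
    p_e = sum((t / (n_cases * n_runs)) ** 2 for t in totals.values())
    if p_e == 1.0:
        return 1.0  # only one label was ever used
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: decision codes for six cases (MDT vs. one LLM),
# and three repeated runs of one model on four cases.
mdt = ["surgery", "surgery", "systemic", "chemoradiotherapy", "systemic", "diagnostics"]
llm = ["surgery", "systemic", "systemic", "chemoradiotherapy", "systemic", "surgery"]
runs = [
    ["surgery", "surgery", "surgery"],
    ["systemic", "systemic", "surgery"],
    ["systemic", "systemic", "systemic"],
    ["surgery", "surgery", "surgery"],
]

print(f"Cohen's kappa (MDT vs. LLM): {cohen_kappa(mdt, llm):.2f}")
print(f"Fleiss' kappa (three runs):  {fleiss_kappa(runs):.2f}")
```

How the three runs per case would be reduced to a single label before the MDT comparison is not stated in this record, so the sketch keeps the two calculations separate.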
This study does not involve any direct patient contact, intervention, or new clinical procedures. All data are historical and anonymized in accordance with ethical and legal requirements. The results are expected to inform the potential value, limitations, and appropriate use of AI in supporting multidisciplinary decision-making in oncology.
Enrollment
Sex
Volunteers
Inclusion criteria
Exclusion criteria
Central trial contact
Yongjiu Chen, PhD
Data sourced from clinicaltrials.gov