About
To address the limitations of current AI-based systems that rely on the assumption of a "constant withdrawal speed," this study proposes the integration of the UPD-3 endoscopic positioning system. By using colonoscope withdrawal videos in combination with UPD-3 imaging data as training samples, we aim to develop an AI-powered bowel cleanliness assessment system that incorporates "withdrawal distance" as a weighting factor. This approach is expected to yield a more reliable, objective, and clinically applicable intelligent assessment system that better aligns with real-world clinical practice and endoscopists' operational habits.
Full description
This study developed an intelligent bowel cleanliness assessment system that uses colonoscope withdrawal distance as a weighting factor. The system consists of the following four modules:
Module 1: Exclusion of Unqualified Frames in Colonoscopy Videos
1.1 A total of 20 randomly selected colonoscope withdrawal videos (from 20 different subjects) were retrospectively collected from the Endoscopy Center database of Huadong Hospital between January 2018 and June 2024. Images were extracted at a rate of 5 frames per second (see the extraction sketch after this module). Frames clear enough for Boston Bowel Preparation Scale (BBPS) scoring and unqualified frames (e.g., blurred frames, frames captured during irrigation or instrument manipulation, small-intestine images, images taken outside the patient, and chromoendoscopy images) were manually labeled.
1.2 The labeled images were split into training and validation sets at a 7:3 ratio. A Transformer-based AI classification model was trained on the training set and validated on the validation set.
1.3 An additional 10 independent colonoscope withdrawal videos (from 10 different subjects) were retrospectively collected using the same method for image extraction and manual labeling. These served as an external validation set to assess the accuracy of the AI model in classifying qualified vs. unqualified frames, thereby evaluating its clinical applicability.
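As a concrete illustration of the 5-frames-per-second extraction in step 1.1, here is a minimal sketch using OpenCV; the file paths are hypothetical, and the study's actual preprocessing pipeline is not described beyond the sampling rate.

```python
# Sketch: sample a colonoscope withdrawal video at 5 frames per second.
# The 5-fps rate follows the protocol; paths and naming are illustrative.
import cv2
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, target_fps: float = 5.0) -> int:
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # fall back if metadata is missing
    step = max(1, round(native_fps / target_fps))    # keep every `step`-th frame
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    saved, idx = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            cv2.imwrite(f"{out_dir}/frame_{saved:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# Usage (hypothetical path):
# n = extract_frames("withdrawal_01.mp4", "frames/withdrawal_01")
```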
Module 2: BBPS 0-3 Scoring for Qualified Colonoscopy Images
2.1 Colonoscopy images were randomly collected from the same database between January 2018 and June 2024. Three expert endoscopists (each with over 5 years of experience) independently assigned BBPS scores (0-3) to each image. Images on which at least two of the three raters agreed were included. Data collection concluded once each of the four categories (BBPS 0-3) had at least 500 labeled images, sourced from at least 500 different subjects.
2.2 The labeled images were split into training and validation sets at a 7:3 ratio. A CLIP-based AI classification model was trained and validated accordingly (a minimal CLIP sketch follows this module).
2.3 Ten additional independent withdrawal videos (from 10 different subjects) were retrospectively collected. Images were extracted at 1 frame per second and manually labeled with BBPS 0-3 scores. These were used as an external validation set to evaluate the model's BBPS classification accuracy and clinical relevance.
2.4 Human-Machine Contest: A set of 120 colonoscopy images (30 for each BBPS score 0-3), labeled by three expert endoscopists, was retrospectively selected (from at least 30 different subjects). The AI system, two junior endoscopists (<5 years of experience), and two expert endoscopists (>5 years of experience) independently assigned BBPS scores to all 120 images. The accuracy of the AI system was compared with that of the junior and expert endoscopists.
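The protocol names a CLIP-based classifier for step 2.2 without detailing its form. The sketch below shows zero-shot BBPS scoring with the Hugging Face transformers CLIP API; the openai/clip-vit-base-patch32 checkpoint and the prompt wording are assumptions, and the study's model would in practice be fine-tuned on the labeled training set rather than used zero-shot.

```python
# Sketch: zero-shot BBPS 0-3 scoring with CLIP (transformers API).
# Checkpoint and prompt texts are assumptions, not the study's trained model.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# One prompt per BBPS grade (hypothetical wording).
prompts = [
    "colon segment with solid stool, mucosa not seen (BBPS 0)",
    "colon segment partially obscured by stool or opaque liquid (BBPS 1)",
    "colon mucosa with minor residual staining, well seen (BBPS 2)",
    "entirely clean colon mucosa (BBPS 3)",
]

def bbps_score(image: Image.Image) -> int:
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, 4)
    return int(logits.argmax(dim=-1))

# score = bbps_score(Image.open("frame_000123.jpg"))
```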
Module 3: Prediction of Hepatic and Splenic Flexure Locations
3.1 Forty colonoscope withdrawal videos (from 40 different subjects) containing UPD-3 positioning data were collected. An expert endoscopist used the UPD-3 system and the video to identify and extract 5-15s video clips representing the transition through the hepatic flexure (ascending to transverse colon) and the splenic flexure (transverse to descending colon).
3.2 These 40 videos were split into training and validation sets at a 3:1 ratio. For training, the 5-15s clips representing flexure transitions, along with 5s of video before and after each clip, were used to train a Video-LLaMA-based AI model. On the validation set, the model predicted flexure transition clips, and its consistency with expert annotations was measured (one candidate metric is sketched after this module). The predicted clips were also manually verified to confirm whether they contained the actual transition process, and accuracy was computed.
3.3 Human-Machine Contest: Ten independent colonoscope withdrawal videos (from 10 different subjects) with UPD-3 positioning were collected. The UPD-3 overlay was masked, and the AI system, two junior endoscopists, and two expert endoscopists were each asked to extract 10s clips they believed represented the transitions through the hepatic and splenic flexures. An expert endoscopist then used the UPD-3 data and the videos to judge whether each 10s clip indeed included the respective transition, and accuracy was compared among the AI system, the junior endoscopists, and the expert endoscopists.
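Step 3.2 measures consistency between predicted and expert-annotated transition clips without naming a metric; temporal intersection-over-union (IoU) is one natural choice. A minimal sketch, with the 0.5 match threshold as an assumption:

```python
# Sketch: temporal IoU between a predicted clip and an expert-annotated clip,
# both given as (start_s, end_s) in seconds. The 0.5 match threshold is assumed.
def temporal_iou(pred: tuple[float, float], gt: tuple[float, float]) -> float:
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def clip_accuracy(pairs: list[tuple[tuple[float, float], tuple[float, float]]],
                  thresh: float = 0.5) -> float:
    """Fraction of videos where the predicted clip sufficiently overlaps the annotation."""
    hits = sum(temporal_iou(p, g) >= thresh for p, g in pairs)
    return hits / len(pairs)

# Example: predicted hepatic-flexure clip at 312-322 s vs annotation at 315-325 s:
# temporal_iou((312, 322), (315, 325)) -> 7/13, roughly 0.54, i.e. a match at 0.5.
```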
Module 4: Real-Time Prediction of Withdrawal Distance
4.1 Dataset Construction: Video segments were randomly sampled from full withdrawal videos. OCR was used to extract the real-time insertion-depth values displayed by the UPD-3 system (an OCR sketch follows this module), and the difference in depth between the first and last frames of each segment was recorded as the ground-truth withdrawal distance. Segments were manually screened to exclude invalid clips, yielding a final dataset of over 2,000 valid segments covering a range of durations and withdrawal distances.
4.2 Teacher Model Training: The teacher model consists of a 3D feature extractor (3D-GAN) and a video feature extractor (Transformer). The model estimates withdrawal distance using UPD-3 imaging and video clips. The 3D-GAN extracts features from multi-view UPD-3 images, and the Transformer extracts video features. The 3D features guide the training of the Transformer to predict withdrawal distance.
4.3 Student Model Training via Knowledge Distillation: The student model consists of a single Transformer trained on video-only inputs. It learns from the teacher model through knowledge distillation to improve its prediction accuracy without access to 3D data during inference (a sketch of one common distillation objective follows this module).
4.4 Model Validation: The student model was evaluated on the validation set using video input only (no 3D data), to better reflect real-world deployment. Performance was measured as the mean squared error between predicted and actual withdrawal distances.
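For step 4.1, the UPD-3 insertion-depth readout can be read per frame with OCR. The sketch below uses pytesseract; the crop coordinates and the numeric format of the overlay are assumptions, since the protocol does not describe its layout.

```python
# Sketch: read the UPD-3 insertion-depth readout from a frame with OCR.
# The crop box and numeric "cm" format are assumptions about the overlay layout.
import re
import cv2
import pytesseract

DEPTH_BOX = (20, 20, 220, 70)  # hypothetical (x0, y0, x1, y1) of the readout

def read_depth_cm(frame) -> float | None:
    x0, y0, x1, y1 = DEPTH_BOX
    roi = cv2.cvtColor(frame[y0:y1, x0:x1], cv2.COLOR_BGR2GRAY)
    text = pytesseract.image_to_string(roi, config="--psm 7")  # single text line
    m = re.search(r"(\d+(?:\.\d+)?)", text)
    return float(m.group(1)) if m else None

# Ground-truth label for a segment: depth at first frame minus depth at last frame.
# label = read_depth_cm(first_frame) - read_depth_cm(last_frame)
```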
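Steps 4.2-4.3 describe a teacher-student setup but not the distillation objective. The sketch below assumes a common formulation: the video-only student regresses the OCR-derived label while also matching the teacher's prediction, with a weighting hyperparameter alpha (an assumption, as are the model interfaces).

```python
# Sketch: one distillation step for the video-only student (assumed objective).
# `teacher` sees UPD-3 multi-view images plus video; `student` sees video only.
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, optimizer, video, upd3_views, label,
                      alpha: float = 0.5):
    with torch.no_grad():
        teacher_pred = teacher(video, upd3_views)  # distance from video + 3D cues
    student_pred = student(video)                  # distance from video alone
    loss = (alpha * F.mse_loss(student_pred, label)                   # ground-truth term
            + (1 - alpha) * F.mse_loss(student_pred, teacher_pred))   # distillation term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Validation (step 4.4) then reports F.mse_loss(student(video), label) on held-out
# segments, using video input only.
```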
Enrollment
700 participants in 4 patient groups
Central trial contact
Danian Ji, M.D.; Zhiyu Dong, M.D.
Data sourced from clinicaltrials.gov