About
This is a non-profit, retrospective observational study using real-world data (RWD) retrieved from electronic health records of patients with autosomal dominant polycystic kidney disease (ADPKD) stored at the Mario Negri Institute IRCCS. RWD will be used to generate simulated and synthetic datasets with AI tools. RWD and generated data (GD) will then be used to conduct three virtual RCTs whose main outcome is the change in Total Kidney Volume (TKV). Statistical tests will be performed to assess the quality and privacy preservation of GD compared with RWD. GD will also be evaluated in exploratory sample size estimations.
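The summary does not say which statistical tests will be used for quality and privacy. As a minimal sketch, assuming two common choices, the fidelity of a single variable (for example baseline TKV) can be scored with the two-sample Kolmogorov-Smirnov statistic, and privacy with the distance to closest record (DCR) between each generated record and the real ones. The function names and the record layout (tuples of numeric, ideally standardized, variables) are hypothetical and are not the study's actual tests.

```python
import math

# Hypothetical helpers illustrating one possible fidelity check and one privacy check.

def ks_statistic(real, generated):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of two samples (0 means identical empirical distributions)."""
    a, b = sorted(real), sorted(generated)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def distance_to_closest_record(generated, real):
    """For each generated record, the Euclidean distance to its nearest real
    record. A distance of 0 flags a generated row that copies a real patient."""
    return [min(math.dist(g, r) for r in real) for g in generated]


print(ks_statistic([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))        # → 0.0
print(distance_to_closest_record([(0, 0), (3, 4)], [(0, 0)]))  # → [0.0, 5.0]
```

Small KS values suggest good fidelity, while a DCR distribution concentrated near zero suggests the generator is reproducing real patients; acceptance thresholds for either would have to be set by the study.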
Full description
Randomized clinical trials (RCTs) can be regarded as the least biased source of information for answering questions about interventions. One of the most common problems in clinical trials on rare diseases is the difficulty of recruiting patients, and therefore of building trials on populations large enough to yield robust data with fewer methodological distortions. Several strategies are already used to address this problem, including extended trial duration, repeated outcome measures, patients' genetic profiles, surrogate endpoints, and multicenter studies. Another approach is to consider trial designs other than the parallel-arm design, such as crossover trials, n-of-1 trials, and adaptive trials.
Simulated and synthetic health data may offer valid new approaches to increasing patient representativeness, especially for rare diseases, while reducing costs and time constraints and also addressing the limitations imposed by national and international regulations on privacy and data management. Simulation studies are defined as computer experiments that create data by pseudo-random sampling from known probability distributions, based on the Monte Carlo method. A promising approach now under development is synthetic data, defined as artificially generated data that reproduce the statistical properties of an original dataset, produced by generative large language models (LLMs).
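The Monte Carlo idea can be sketched in a few lines. This is a minimal illustration assuming that baseline TKV follows a log-normal distribution and that annual TKV growth is normally distributed; the function name, the distribution choices, and every parameter value are hypothetical, not estimates from the study. The point is that the analyst fixes the distributions before any data are generated.

```python
import random
import statistics

def simulate_tkv_cohort(n, mean_log_tkv, sd_log_tkv, mean_growth, sd_growth, seed=0):
    """Monte Carlo generation of a simulated cohort: each patient's baseline TKV
    and annual growth rate are drawn from distributions specified in advance.
    Distribution families and parameters are hypothetical assumptions."""
    rng = random.Random(seed)  # seeded pseudo-random generator, reproducible
    cohort = []
    for _ in range(n):
        baseline = rng.lognormvariate(mean_log_tkv, sd_log_tkv)  # mL, log-normal (assumed)
        growth = rng.gauss(mean_growth, sd_growth)               # fraction per year (assumed)
        cohort.append({"baseline_tkv": baseline,
                       "growth": growth,
                       "tkv_at_1y": baseline * (1 + growth)})
    return cohort


cohort = simulate_tkv_cohort(500, mean_log_tkv=7.0, sd_log_tkv=0.5,
                             mean_growth=0.05, sd_growth=0.03, seed=42)
print(statistics.median(p["baseline_tkv"] for p in cohort))
```

Re-running with the same seed reproduces the same cohort, which keeps simulation experiments auditable.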
Thus, while simulated data rely on known distributions that must be specified in advance, synthetic data are generated by LLMs that learn these distributions directly from training data, which gives them greater flexibility and broader applicability.
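To make the contrast concrete without reproducing an LLM or GAN, the sketch below swaps in a much simpler technique that also learns a distribution from data: sampling from a Gaussian kernel density estimate (KDE), which amounts to resampling real values and adding Gaussian noise. It is an illustrative stand-in, not the generator used in the study; the function name, the normal-reference bandwidth rule, and the TKV values are assumptions.

```python
import random
import statistics

def sample_from_kde(values, n, seed=0):
    """Draw n synthetic values from a Gaussian KDE fitted to `values`.

    No distribution family is specified: the shape is learned from the data.
    Bandwidth h follows the normal-reference rule of thumb, 1.06 * sd * N^(-1/5).
    """
    rng = random.Random(seed)
    h = 1.06 * statistics.stdev(values) * len(values) ** (-1 / 5)
    # Sampling a Gaussian KDE = pick a real value uniformly, then add N(0, h^2) noise.
    return [rng.choice(values) + rng.gauss(0.0, h) for _ in range(n)]


real_tkv = [850.0, 920.0, 1010.0, 1180.0, 1400.0, 1650.0, 2100.0]  # hypothetical mL values
synthetic_tkv = sample_from_kde(real_tkv, n=1000, seed=7)
```

Because every draw is a real value plus noise, a small bandwidth produces near-copies of real records; this is one reason privacy preservation has to be measured rather than assumed.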
This study aims to identify the most suitable tools for generating simulated and synthetic data for rare diseases, and to compare the fidelity, quality, and privacy preservation of these datasets, derived from real-world ADPKD clinical trial data. Virtual clinical trials will then be conducted on the three datasets (real, simulated, and synthetic) to assess whether the generated data replicate real trial outcomes.
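Once each dataset (real, simulated, synthetic) contains trial arms with baseline and follow-up TKV, the same analysis can be run on all three and the effect estimates compared. A minimal sketch, assuming the outcome is percent change in TKV and the arms are compared with Welch's t statistic; the record layout and function names are hypothetical, and the study's statistical analysis plan may differ.

```python
import math
import statistics

def percent_tkv_change(baseline, follow_up):
    """Percent change in total kidney volume from baseline to follow-up."""
    return 100.0 * (follow_up - baseline) / baseline


def welch_t(treated, control):
    """Mean difference between arms and Welch's t statistic (unequal variances)."""
    m1, m2 = statistics.mean(treated), statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return m1 - m2, (m1 - m2) / se


def virtual_trial(records):
    """records: iterable of (arm, baseline_tkv, follow_up_tkv), arm in {"treated", "control"}.
    Returns (difference in mean percent TKV change, Welch t)."""
    change = {"treated": [], "control": []}
    for arm, baseline, follow_up in records:
        change[arm].append(percent_tkv_change(baseline, follow_up))
    return welch_t(change["treated"], change["control"])


# Hypothetical toy records; the same call would be applied to RWD, simulated and synthetic data.
toy = [("treated", 1000, 1020), ("treated", 1100, 1144),
       ("control", 1000, 1060), ("control", 1200, 1320)]
diff, t = virtual_trial(toy)
```

Replication could then be judged by how closely the difference estimated on generated data matches the one estimated on RWD.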
Finally, retrieved and generated data will be used for new sample size estimations for future clinical trials to be performed at the Clinical Research Center for Rare Disease "Aldo e Cele Daccò", Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Ranica (BG), Italy.
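One way generated data could feed an exploratory sample size estimation is by supplying the standard deviation of the outcome (for example, percent TKV change) to the usual normal-approximation formula for comparing two means, n per arm = 2 * sd^2 * (z_(1-alpha/2) + z_(1-beta))^2 / delta^2, where delta is the smallest clinically relevant difference. The sketch below assumes that formula, a two-sided test, and equal allocation; the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_arm(sd, delta, alpha=0.05, power=0.80):
    """Patients per arm for a two-sided two-sample comparison of means,
    normal approximation: n = 2 * sd^2 * (z_(1-alpha/2) + z_power)^2 / delta^2."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)  # z_(1-beta), since power = 1 - beta
    return math.ceil(2 * (sd * (z_alpha + z_power) / delta) ** 2)


# A difference of half a standard deviation, 5% two-sided alpha, 80% power:
print(n_per_arm(sd=1.0, delta=0.5))  # → 63
```

Plugging in an SD estimated from RWD and one estimated from each generated dataset shows how sensitive the required enrollment is to the data source. The normal approximation slightly underestimates n for small trials, where a t-based correction is usually added.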
By using generative AI models such as Generative Adversarial Networks (GANs), this study aims to overcome challenges related to data scarcity and trial design. The results could show whether synthetic data are a useful tool for making clinical trials in rare diseases more efficient and cost-effective.
Enrollment
100 participants in 3 patient groups
Central trial contact
Tobia Peracchi; Annalisa Perna, Ph.D.
Data sourced from clinicaltrials.gov