The role of synthetic data in healthcare
Trust is key to unlocking the full potential of effective, secure and fair use of synthetic data in healthcare.
Healthcare systems around the world face increasing patient volumes and a dramatic rise in health challenges.
As healthcare costs continue to rise, governments, regulators, and providers are increasingly challenged to innovate and transform healthcare delivery models. The integration of AI technologies is becoming an ever more attractive solution.
The rapid adoption of AI across society puts pressure on the health sector to implement it as well, in order to improve patient care and reduce costs.
This creates a dual pressure, accelerating technology integration while managing compliance, which adds complexity to the implementation process.
Synthetic data is artificially generated to resemble real-world data. It can be structured (quantitative, tabular) or unstructured (images, text, video), and has emerged as a powerful solution to data access challenges in the healthcare sector.
How can synthetic data improve AI in healthcare?
The use of AI-generated synthetic data supports use cases where data is scarce or costly to access. It can potentially enhance healthcare across various domains, such as clinical trials and drug development, patient care and diagnostics, personalized medicine, medical imaging, public health planning, disease outbreak prediction, resource allocation, policy impact assessment, virtual reality simulations, and academic research.
Potential benefits but huge quality assurance gaps
Despite the numerous potential benefits of synthetic data, ensuring its effective, secure, and fair use requires several critical steps, as outlined further down.
One of the primary challenges in implementing AI in the health sector is the availability of high-quality data. Effective AI systems require vast amounts of accurate, complete, and labelled data, which is often not available due to high costs, privacy concerns, and regulatory constraints.
Synthetic datasets can be generated to represent diverse patient populations and to enrich real-world data with previously unobserved cases at a lower privacy risk, thus addressing the challenge of limited access to high-quality data for validation purposes.
Synthetic data can augment the training of machine learning algorithms and improve public health models by providing a rich source of information that reflects real-world scenarios without compromising patient confidentiality, as it preserves the statistical properties of the real data it replicates. Deep generative models can learn the underlying statistical patterns of a dataset and thereby maintain the interrelationships between data points and capture underlying medical information. Synthetic data can simulate electronic health records in various formats, creating a complete patient journey that helps improve the quality of care and inform the development of new therapeutics.
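The idea of preserving statistical properties can be illustrated with a deliberately simple sketch. It does not use a deep generative model: it swaps in a two-variable Gaussian fit, which is enough to show how a generator can learn a relationship from real records and reproduce it in new, artificial ones. The "real" data here is itself simulated, and the column meanings (age, systolic blood pressure), the function names, and all numeric values are assumptions made only for illustration.

```python
import math
import random
import statistics


def fit_gaussian(records):
    """Estimate means, standard deviations and correlation of two columns."""
    xs = [r[0] for r in records]
    ys = [r[1] for r in records]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in records) / (len(records) - 1)
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "rho": cov / (sx * sy)}


def sample_synthetic(params, n, seed=0):
    """Draw n new records from the fitted two-variable Gaussian."""
    rng = random.Random(seed)
    rho = params["rho"]
    residual = math.sqrt(max(0.0, 1.0 - rho ** 2))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = params["mx"] + params["sx"] * z1
        # Mixing z1 into y is what carries the correlation over.
        y = params["my"] + params["sy"] * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows


# Hypothetical stand-in for real patient data: (age, systolic blood pressure).
rng = random.Random(1)
real = []
for _ in range(2000):
    age = rng.uniform(30, 80)
    real.append((age, 100 + 0.5 * age + rng.gauss(0, 8)))

real_params = fit_gaussian(real)
synthetic = sample_synthetic(real_params, n=2000)
synthetic_params = fit_gaussian(synthetic)

print("real correlation:     ", round(real_params["rho"], 3))
print("synthetic correlation:", round(synthetic_params["rho"], 3))
```

No synthetic record is a copy of a real one, yet the age and blood pressure relationship is retained. Real healthcare data is far higher-dimensional and rarely Gaussian, which is where more expressive generative models come in.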
Using synthetic data rather than real-world data can protect patient confidentiality, as it offers a low-value target, minimizing the risk and impact of breaches. This is particularly useful when sharing data with third parties. This in turn can improve data availability by alleviating concerns about leaking original data, making healthcare providers and health practitioners more willing and able to share their data for research and other purposes.
Creating new datasets that address gaps or biases in existing data can help AI models produce more accurate and reliable analyses.
Using synthetic data with confidence and DNV’s role
Despite its potential, synthetic data in healthcare is not yet mature due to several challenges. The lack of standardized quality assessment metrics, the potential for bias amplification, and the trade-off between data privacy on the one hand and realism and accuracy on the other limit its widespread adoption and reliability for critical healthcare applications.
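The tension between privacy and realism can be made concrete with a small sketch. It computes two illustrative measures, neither of which is a standard: a fidelity gap (the largest difference in column means between real and synthetic data) and the distance from each synthetic record to its closest real record, a heuristic often used as a rough privacy indicator. The function names, data, and noise level are assumptions chosen for illustration.

```python
import math
import random


def column_means(rows):
    """Mean of every column in a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fidelity_gap(real, synth):
    """Largest absolute difference between real and synthetic column means."""
    return max(abs(a - b) for a, b in zip(column_means(real), column_means(synth)))


def distance_to_closest_record(real, synth):
    """For each synthetic row, Euclidean distance to its nearest real row."""
    return [min(math.dist(s, r) for r in real) for s in synth]


rng = random.Random(42)
# Hypothetical "real" records with two numeric attributes.
real = [(rng.uniform(30, 80), rng.uniform(100, 160)) for _ in range(200)]

# Candidate 1: verbatim copies. Perfectly realistic, no privacy protection.
copied = list(real)
# Candidate 2: real records perturbed with noise. Less exact, harder to re-identify.
noisy = [(a + rng.gauss(0, 5), b + rng.gauss(0, 5)) for a, b in real]

copied_dcr = distance_to_closest_record(real, copied)
noisy_dcr = distance_to_closest_record(real, noisy)

print("copied: fidelity gap", fidelity_gap(real, copied), "min distance", min(copied_dcr))
print("noisy:  fidelity gap", fidelity_gap(real, noisy), "min distance", min(noisy_dcr))
```

The copied set scores perfectly on fidelity but sits at distance zero from real patients, while the perturbed set trades some fidelity for distance. Deciding which thresholds are acceptable for a given healthcare application is exactly the kind of question that standardized quality assessment would need to answer.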
This is the datascape DNV invests in, particularly through several research initiatives, including participation in the EU research project consortium SYNTHIA and a PhD project on synthetic data. The research focuses on how synthetic data can play a role in the safe uptake of AI in the healthcare sector through quality assurance and regulatory processes.