One of the key features of real-world data (RWD) is its ability to capture population diversity and variability, providing a more comprehensive and accurate representation of health experiences and outcomes across different segments of the population. Population variance refers to the dispersion or variability within a population in terms of factors such as age, gender, ethnicity, pre-existing medical conditions, access to healthcare, genetic and molecular profiles, and socioeconomic status. This variability is crucial for understanding how different subgroups respond to treatments, how diseases manifest, and which risk factors are relevant in each context.
Data generated by healthcare systems around the world are already yielding significant insights, and the number of studies applying sophisticated statistical methods, machine learning (ML), and deep learning (DL) has grown sharply. During the pandemic, we witnessed the direct application of these data in the evaluation of healthcare interventions across diverse populations, enabling the formulation of equitable health policies. By identifying patterns and subgroups within population data and analyzing their variance, researchers can design personalized treatments that optimize outcomes for different types of patients, a field known as precision medicine.
For example, the monitoring of the safety and effectiveness of drugs and treatments using RWD has become a major area of exploration. The objective is to track side effects and treatment efficacy continuously across populations, identifying risks that clinical trials did not detect, including risks in subpopulations that respond differently and are poorly represented in typical study samples.
Randomized controlled trials (RCTs), which assign participants randomly to different treatment groups, are designed to answer specific health-related questions, particularly within the pharmaceutical industry and academic research. Comparative, systematic testing of treatments dates back to 1747, when the naval surgeon James Lind compared citrus fruit with other remedies for scurvy, a disease that afflicted sailors. Lind's comparison was controlled rather than randomized, and vitamin C itself was not identified until the twentieth century. RCTs remain a traditional and well-established method that generates highly valuable data. In the healthcare industry, they are primarily used to obtain regulatory approval for investigational drugs or medical devices.
Historically, these studies have not incorporated real-world information, which is why data collected after a product reaches the market have been essential to understanding its actual impact on target populations.
To control as many variables as possible, clinical trial eligibility criteria are typically strict. Consequently, individuals with vulnerable characteristics are often excluded, such as those with preexisting health conditions (comorbidities), those concurrently taking other medications, or those at age-related risk. The result is poor external validity: findings may not generalize beyond the trial population, leaving inadequate evidence to guide real-world clinical decision-making.
When vulnerable populations are largely excluded from RCTs, yet are frequently prescribed the same medications in clinical practice, significant information gaps may arise in treatment decision-making. Many RCTs focus on individuals with a single condition, applying stringent eligibility criteria that systematically exclude:
🔹Individuals at higher risk of adverse drug reactions due to vulnerability
🔹Older adults
🔹Adolescents
🔹Individuals with coexisting health conditions
🔹People with multimorbidity
🔹Patients taking multiple medications simultaneously (polypharmacy)
Importantly, multimorbidity and polypharmacy are increasingly common even in individuals under the age of 65.
Overly strict exclusion criteria in RCTs often fail to account for the heterogeneity found in real-world populations, including individuals with high-risk family histories, multimorbidity, polypharmacy, and limited access to healthcare, among other factors. In many cases, RWD are the only available source of information on treatment outcomes for these populations.
Both insurance providers and pharmaceutical companies have increasingly turned to RWD in recent years to monitor the effectiveness of newly marketed drugs and devices, some of which are covered under risk-sharing models. Real-world evidence is also used to support treatment formulary positioning—particularly for second- and third-line therapies—by documenting realistic long-term benefits that are rarely studied in RCTs. By leveraging RWD, researchers can assess the effectiveness of treatments while accounting for additional variables such as comorbidities, demographics, and age groups.
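As a rough illustration of what "accounting for additional variables" can look like, the sketch below compares response rates between treated and untreated patients within each stratum of age group and comorbidity status. The records are invented for the example, and the function name and field layout are hypothetical. It is a crude stratified comparison, not a full confounding adjustment.

```python
from collections import defaultdict

# Made-up illustrative records, not real patient data:
# (age_group, has_comorbidity, treated, responded)
records = (
    [("<65", False, True, True)] * 3 + [("<65", False, True, False)] * 1
    + [("<65", False, False, True)] * 1 + [("<65", False, False, False)] * 3
    + [("65+", True, True, True)] * 1 + [("65+", True, True, False)] * 1
    + [("65+", True, False, True)] * 1 + [("65+", True, False, False)] * 1
)


def stratified_response_difference(rows):
    """Return {stratum: treated response rate minus untreated response rate}.

    A stratum is the pair (age_group, has_comorbidity). Strata that lack
    either a treated or an untreated patient are skipped.
    """
    # For each stratum, keep [responders, total] per treatment arm.
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for age_group, comorbid, treated, responded in rows:
        cell = counts[(age_group, comorbid)][treated]
        cell[0] += int(responded)
        cell[1] += 1

    differences = {}
    for stratum, arms in counts.items():
        (resp_t, n_t), (resp_u, n_u) = arms[True], arms[False]
        if n_t and n_u:
            differences[stratum] = resp_t / n_t - resp_u / n_u
    return differences


differences = stratified_response_difference(records)
for stratum, diff in differences.items():
    print(stratum, round(diff, 2))
```

Reporting one estimate per stratum, rather than a single pooled figure, is what lets a subgroup that responds differently become visible. In practice the strata would come from the variables available in the data source, and small strata would need far more careful statistical handling than this sketch provides.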
In general, exclusion criteria in RCTs are based on three key vulnerabilities: multimorbidity, polypharmacy, and age (adolescents and older adults). A study published in The Lancet Healthy Longevity in 2022 reported exclusion rates above 50% for adolescents using multiple medications, with similar figures observed for adults over 80. Multimorbidity was associated with a median exclusion rate of 91.1% (IQR 88.9–91.8), and concomitant medication use had an exclusion rate of 52.5% (IQR 50.0–53.7). The study also identified a significant information gap for individuals with cardiovascular and psychiatric conditions, which poses a major limitation when trying to generalize findings to these populations.
The exclusion of adolescents from clinical trials, at a rate even higher than that of older adults, has led to widespread off-label prescribing. Physicians, patients, and insurers often assume that treatments will be equally safe and effective for vulnerable groups in real-world settings as they are for healthier patients in controlled trial environments. However, this assumption does not always hold. Consider, for example, the reduced first-pass metabolism observed with aging. Oral drug dosages often require adjustment for older adults because less of the drug is metabolized before it reaches systemic circulation, so a standard dose can produce higher systemic concentrations than expected. Similarly, individuals with coexisting conditions (multimorbidity), those on chronic polypharmacy, and occasional concurrent users of medications are frequently excluded from trials.
These characteristics are not rare: the reported prevalence of multimorbidity is 67.7%, that of polypharmacy is 62.5%, and occasional concurrent medication use reaches 98.5%. Excluding such patients likely contributes to poor external validity in RCTs and to a scarcity of evidence that accurately quantifies treatment risks and benefits in underrepresented populations, particularly in the context of coexisting diseases and drug interactions.
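To make the idea of an exclusion rate concrete, here is a minimal sketch that applies trial-style eligibility criteria to a small real-world cohort and reports the share of patients who would be excluded. The Patient fields, the function names, the default thresholds, and the four patients are all assumptions made for illustration. They are not the criteria or data used in the study cited above.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: int
    n_conditions: int    # number of chronic conditions
    n_medications: int   # number of concurrent medications


def is_excluded(p, min_age=18, max_age=80, max_conditions=1, max_medications=4):
    """Return True if the patient fails any of the (hypothetical) criteria.

    The defaults treat two or more conditions as multimorbidity and five or
    more medications as polypharmacy; these cut-offs are assumptions.
    """
    return (
        p.age < min_age
        or p.age > max_age
        or p.n_conditions > max_conditions
        or p.n_medications > max_medications
    )


def exclusion_proportion(cohort, **criteria):
    """Share of the cohort that the eligibility criteria would exclude."""
    if not cohort:
        raise ValueError("cohort must not be empty")
    return sum(is_excluded(p, **criteria) for p in cohort) / len(cohort)


# Made-up cohort for illustration only.
cohort = [
    Patient(age=45, n_conditions=1, n_medications=2),  # eligible
    Patient(age=72, n_conditions=3, n_medications=6),  # multimorbidity, polypharmacy
    Patient(age=16, n_conditions=0, n_medications=1),  # adolescent
    Patient(age=58, n_conditions=1, n_medications=5),  # polypharmacy
]

print(exclusion_proportion(cohort))
print(exclusion_proportion(cohort, max_medications=10))
```

Running the same function with relaxed thresholds, as in the last line, shows how strongly a single criterion can drive the proportion of real-world patients left out of a trial.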
Certain medical specialties have reported bias in the representativeness of their patient populations. This is especially true for individuals with cardiovascular comorbidities, psychiatric disorders, or otolaryngological conditions, for whom few data are available to guide treatment decisions. For instance, assessing the impact of anticoagulant use in a patient with both atrial fibrillation and liver cirrhosis is particularly challenging, since patients with liver disease are mostly underrepresented in, or excluded from, cardiovascular RCTs.
One of the first large-scale real-world randomized controlled trials was the 1954 Salk field trial of the polio vaccine, whose placebo-controlled arm enrolled about 750,000 children, with participants randomly assigned to receive either the vaccine or a placebo (control group). More than 1 million additional children took part in a non-randomized "observed control" arm, in which second graders were offered the vaccine and first and third graders served as unvaccinated controls.
Real-world evidence also holds tremendous value in observational settings. It is currently used to generate hypotheses for future clinical or observational studies; assess the generalizability of clinical trial findings; monitor the safety of drugs and medical devices; analyze shifts in therapeutic usage patterns; and measure and implement quality in healthcare delivery. Regulatory bodies have accepted prospective single-arm trials with external controls (observational data) and high-quality data collection for device evaluation purposes. One such case is a ventricular assist system that obtained approval using propensity score-matched controls from the Interagency Registry for Mechanically Assisted Circulatory Support (INTERMACS).
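The matching step behind propensity score-matched external controls can be sketched as follows. This is a simplified illustration, not the procedure used in the approval mentioned above: it assumes the propensity scores (each patient's estimated probability of receiving the treatment, given baseline characteristics) have already been estimated elsewhere, for example with a logistic regression. The function name, the caliper value, and the patient identifiers and scores are hypothetical.

```python
def match_on_propensity(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, controls: dicts mapping a patient id to a propensity score.
    A pair is kept only if the two scores differ by at most `caliper`.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # copy, so the caller's dict is untouched
    pairs = []
    # Process treated patients in order of increasing score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Made-up scores for illustration only.
treated_scores = {"T1": 0.30, "T2": 0.62, "T3": 0.90}
registry_scores = {"C1": 0.31, "C2": 0.60, "C3": 0.10, "C4": 0.33}

pairs = match_on_propensity(treated_scores, registry_scores)
print(pairs)
```

In this toy example T3 stays unmatched because no registry patient has a score within the caliper. Dropping such patients trades sample size for comparability, which is the usual design choice when external controls come from observational data.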
Integrating real-world data and evidence complements randomized clinical trials by broadening the populations studied, helping to reduce bias from unrepresentative samples, and lowering costs.
Author: Valeria Analia Dávila.
Literature:
Tan, Y. Y., Papez, V., Chang, W. H., Mueller, S. H., Denaxas, S., & Lai, A. G. (2022). Comparing clinical trial population representativeness to real-world populations: an external validity analysis encompassing 43 895 trials and 5 685 738 individuals across 989 unique drugs and 286 conditions in England. The Lancet Healthy Longevity, 3(10), e674-e689.
Rosa, J. M., & Frutos, E. L. (2022). Ciencia de datos en salud: desafíos y oportunidades en América Latina. Revista Médica Clínica Las Condes, 33(6), 591-597.
Liu, F., & Panagiotakos, D. (2022). Real-world data: a brief review of the methods, applications, challenges and opportunities. BMC Medical Research Methodology, 22(1), 287.
Sharma, R., & Kshetri, N. (2020). Digital healthcare: Historical development, applications, and future research directions. International Journal of Information Management, 53, 102105.