Real World Data - Variability

8th May 2025

Analysis of Real-World Data (RWD) with a Focus on Population Variance

One of the key features of real-world data (RWD) is its ability to capture population diversity and variability, providing a more comprehensive and accurate representation of health experiences and outcomes across different segments of the population. Population variance refers to the dispersion or variability within a population in terms of factors such as age, gender, ethnicity, pre-existing medical conditions, access to healthcare, genetic and molecular profiles, and socioeconomic status. This variability is crucial for understanding how different subgroups respond to treatments, how diseases manifest, and which risk factors are relevant in each context.
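As a rough illustration of what "population variance" means in practice, the sketch below splits the total variance of an outcome into the part found within subgroups and the part explained by differences between subgroups. The function name, the age bands, and the blood-pressure figures are hypothetical and chosen only for illustration; they do not come from any study cited here.

```python
import statistics


def decompose_variance(groups):
    """Split total population variance into within- and between-subgroup parts.

    groups: dict mapping a subgroup label to a list of numeric outcomes.
    Returns (total, within, between); total equals within + between
    up to floating-point rounding (law of total variance).
    """
    all_values = [v for values in groups.values() for v in values]
    n_total = len(all_values)
    grand_mean = statistics.fmean(all_values)
    total = statistics.pvariance(all_values)
    # Size-weighted average of the variance inside each subgroup.
    within = sum(
        len(values) / n_total * statistics.pvariance(values)
        for values in groups.values()
    )
    # Size-weighted spread of the subgroup means around the grand mean.
    between = sum(
        len(values) / n_total * (statistics.fmean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return total, within, between


# Hypothetical reductions in systolic blood pressure (mmHg) by age band.
response_by_age = {
    "18-39": [12.0, 14.0, 13.0, 15.0],
    "40-64": [10.0, 9.0, 11.0, 10.0],
    "65+": [5.0, 7.0, 4.0, 6.0, 5.0],
}
total, within, between = decompose_variance(response_by_age)
print(f"total={total:.2f} within={within:.2f} between={between:.2f}")
```

When the between-subgroup share dominates, as it does in this made-up example, a single pooled average hides most of what is happening in the population.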

Data generated by healthcare systems around the world are already yielding significant insights, and the number of studies applying sophisticated statistical methods, machine learning (ML), and deep learning (DL) to them is growing rapidly. During the pandemic, we witnessed the direct application of these data in the evaluation of healthcare interventions across diverse populations, which helped inform more equitable health policies. By identifying patterns and subgroups within population data and analyzing their variance, researchers can design personalized treatments that optimize outcomes for different types of patients, a field known as precision medicine.

For example, monitoring the safety and effectiveness of drugs and treatments with RWD has become a major area of exploration. The objective is to track side effects and treatment effectiveness continuously across populations, identifying risks that clinical trials did not detect, as well as subpopulations that respond differently but were too poorly represented in typical study samples for the difference to show.

(mental health, wellness, fitness, nutrition and supplements, remote monitoring)
(POC testing, lab testing, diagnostic technologies, decision support, population health)
(telehealth, home care, primary treatments, specialized treatments, hospitals)
(rehabilitation, social care, chronic care, elder care)
(training and certification, health and safety)
(health records, practice management, scheduling and referrals, health analytics)
(wearables, medical devices, medical equipment, medical imaging, medical robotics)
(health benefits, corporate wellness, health insurance, health asset financing, healthcare real estate)
(drug manufacturing, drug commercialization, healthcare logistics, pharmacies)
(discoveries, clinical trials, clinical insights, precision medicine, genomics)
A True Healthcare System Requires Evidence of What Works for Each Patient Population—Not Just Those Represented in Randomized Trials
Clinical trials are extremely valuable and are often necessary, as they are designed to provide a fundamental requirement in the pre-market phase of a medical product or device: robust evidence of its efficacy. However, these types of studies can be very expensive and complex to carry out, and the internal validity achieved in clinical trials is often obtained at the expense of uncertainty regarding generalizability. This is largely because the populations enrolled in such trials may, and often do, differ significantly from those observed in real-world practice.

A major limitation of traditional clinical trials lies in the tendency to select patients based on strict inclusion and exclusion criteria to reduce heterogeneity and control external variables. While this strengthens the internal validity of the study, it simultaneously restricts the generalizability of results to more diverse populations and can even introduce bias.  

Consider the case of abacavir (Ziagen), a drug used to treat HIV. The identification of a genetic biomarker (the HLA-B*57:01 allele) made it possible to screen out, before treatment, patients at high risk of developing hypersensitivity reactions to the drug. Recognizing this small subgroup allowed the majority of patients to continue benefiting from the therapy. Clinical trials are generally not designed to detect such specific subpopulations, but this is possible through the use of real-world data (RWD).

In light of these trends, many researchers involved in clinical trials, as well as medical product developers, have become increasingly interested in expanding and integrating clinical research into more diverse and representative settings by leveraging the exponential growth and accessibility of RWD.

The key to understanding the value of real-world evidence (RWE) lies in recognizing its potential to complement the knowledge derived from traditional clinical trials—trials that, due to well-known limitations, struggle to generalize findings to broader and more inclusive patient populations.

Recently, the United States Food and Drug Administration (FDA) developed a comprehensive framework and established protocols aimed at promoting the use of RWD and RWE in regulatory decision-making, particularly in the approval processes for drugs and medical devices. This development marks the beginning of a promising integration between traditional clinical research and real-world data-driven approaches.
The Diversity of Real-World Data Enhances the Quality of Predictive Models in Healthcare
The high variance found in real-world data (RWD) enables the identification of how treatments and diseases affect specific subgroups within the population. For instance, it can reveal differences in a drug’s efficacy between older and younger patients, or across various ethnic groups. This is essential for advancing personalized medicine and addressing healthcare disparities.

In this context, machine learning algorithms trained on diverse datasets are better equipped to generalize and produce more accurate predictions for broad populations. Because RWD reflects the diversity of real-world conditions, its analysis increases the external validity of studies. In other words, the outcomes derived from such analyses are more applicable to the general population, in contrast to the findings obtained from narrowly controlled clinical trials.
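One simple way to check whether a predictive model really generalizes across a diverse population is to report its accuracy per subgroup rather than as a single pooled figure. The following is a minimal sketch under that assumption; the subgroup labels and the predictions are hypothetical.

```python
from collections import defaultdict


def accuracy_by_subgroup(records):
    """Compute classification accuracy separately for each subgroup.

    records: iterable of (subgroup, true_label, predicted_label) tuples.
    Returns a dict mapping subgroup -> fraction of correct predictions.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for subgroup, truth, predicted in records:
        counts[subgroup] += 1
        hits[subgroup] += int(truth == predicted)
    return {group: hits[group] / counts[group] for group in counts}


# Hypothetical predictions from a model evaluated on two age groups.
records = [
    ("under_65", 1, 1), ("under_65", 0, 0), ("under_65", 1, 1), ("under_65", 0, 0),
    ("65_plus", 1, 0), ("65_plus", 0, 0), ("65_plus", 1, 0), ("65_plus", 1, 1),
]
per_group = accuracy_by_subgroup(records)
overall = sum(truth == predicted for _, truth, predicted in records) / len(records)
print(per_group, overall)
```

In this toy data the pooled accuracy looks acceptable while the older subgroup is served much worse, which is the kind of gap a narrowly sampled training set tends to produce.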
Bias in Clinical Trials

Randomized controlled trials (RCTs), which assign participants randomly to different treatment groups, are designed to answer specific health-related questions, particularly within the pharmaceutical industry and academic research. Comparative, systematic testing of this kind dates back to 1747, when the naval surgeon James Lind tested citrus fruit as a treatment for scurvy, a disease that afflicted sailors; vitamin C itself was not identified until the twentieth century, and Lind's comparison was controlled but not randomized. RCTs remain a traditional and well-established method that generates highly valuable data. In the healthcare industry, they are primarily used to obtain regulatory approval for investigational drugs or medical devices.

Historically, these studies have not incorporated real-world information, which is why data collected after the commercialization of a product has been essential to understanding its actual impact on target populations.

To control as many variables as possible, clinical trial eligibility criteria are typically strict. Consequently, vulnerable individuals, such as those with preexisting health conditions (comorbidities), those concurrently taking other medications, or those at age-related risk, are often excluded. The resulting poor external validity can leave clinicians with inadequate evidence to guide real-world decision-making.

When vulnerable populations are largely excluded from RCTs, yet are frequently prescribed the same medications in clinical practice, significant information gaps may arise in treatment decision-making. Many RCTs focus on individuals with a single condition, applying stringent eligibility criteria that systematically exclude:
🔹Individuals at higher risk of adverse drug reactions due to vulnerability
🔹Older adults
🔹Adolescents
🔹Individuals with coexisting health conditions
🔹People with multimorbidity
🔹Patients taking multiple medications simultaneously (polypharmacy)

Importantly, multimorbidity and polypharmacy are increasingly common even in individuals under the age of 65.

Adolescents and older adults are largely excluded from RCTs, yet together they make up approximately 50% of the real-world population.

Overly strict exclusion criteria in RCTs often fail to account for the heterogeneity found in real-world populations, including individuals with high-risk family histories, multimorbidity, polypharmacy, and limited access to healthcare, among other factors. In many cases, real-world data (RWD) are the only available source of information on treatment outcomes for these populations.

Both insurance providers and pharmaceutical companies have increasingly turned to RWD in recent years to monitor the effectiveness of newly marketed drugs and devices, some of which are covered under risk-sharing models. Real-world evidence is also used to support treatment formulary positioning—particularly for second- and third-line therapies—by documenting realistic long-term benefits that are rarely studied in RCTs. By leveraging RWD, researchers can assess the effectiveness of treatments while accounting for additional variables such as comorbidities, demographics, and age groups.

In general, exclusion criteria in RCTs are based on three key vulnerabilities: multimorbidity, polypharmacy, and age (adolescents and older adults). A study published in The Lancet Healthy Longevity in 2022 reported exclusion rates above 50% for adolescents using multiple medications, with similar figures observed for adults over 80. Multimorbidity was associated with a mean exclusion rate of 91.1% (IQR 88.9–91.8), and coexisting medication use had an exclusion rate of 52.5% (IQR 50.0–53.7). The study also identified a significant information gap for individuals with cardiovascular and psychiatric conditions, which poses a major limitation when trying to generalize findings to these populations.
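The idea of an exclusion rate can be made concrete by applying a set of trial eligibility rules to a real-world cohort and counting who would be left out. The sketch below is illustrative only: the `Patient` fields, the default thresholds, and the five-person cohort are hypothetical and are not taken from the study cited above.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: int
    n_conditions: int   # number of diagnosed long-term conditions
    n_medications: int  # number of concurrently prescribed drugs


def is_eligible(patient, min_age=18, max_age=65, max_conditions=1, max_medications=4):
    """Hypothetical eligibility rule: adults up to 65, single condition, no polypharmacy."""
    return (
        min_age <= patient.age <= max_age
        and patient.n_conditions <= max_conditions
        and patient.n_medications <= max_medications
    )


def exclusion_rate(cohort, **criteria):
    """Fraction of a real-world cohort that the eligibility rule would exclude."""
    excluded = sum(not is_eligible(patient, **criteria) for patient in cohort)
    return excluded / len(cohort)


# Hypothetical real-world cohort.
cohort = [
    Patient(age=34, n_conditions=1, n_medications=1),  # eligible
    Patient(age=71, n_conditions=3, n_medications=6),  # excluded: age, multimorbidity, polypharmacy
    Patient(age=16, n_conditions=1, n_medications=0),  # excluded: adolescent
    Patient(age=58, n_conditions=2, n_medications=5),  # excluded: multimorbidity, polypharmacy
    Patient(age=45, n_conditions=1, n_medications=2),  # eligible
]
rate = exclusion_rate(cohort)
print(f"Excluded under strict criteria: {rate:.0%}")
```

Relaxing the thresholds, for instance with `exclusion_rate(cohort, max_age=120, max_conditions=10, max_medications=20)`, shows how much of the gap comes from the criteria themselves rather than from the cohort.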

The exclusion of adolescents from clinical trials, at a rate even higher than that of older adults, has led to widespread off-label prescribing. Physicians, patients, and insurers often assume that treatments will be equally safe and effective for vulnerable groups in real-world settings as they are for healthier patients in controlled trial environments. However, this assumption does not always hold. Consider, for example, the reduced first-pass metabolism observed with aging: less of an oral dose is broken down before it reaches systemic circulation, so systemic exposure can be higher than in younger adults, and oral dosages often require adjustment for older patients. Similarly, individuals with coexisting conditions (multimorbidity), those on chronic polypharmacy, and occasional concurrent users of medications are frequently excluded from trials.

These groups are far from marginal: the reported prevalence of multimorbidity is 67.7%, that of polypharmacy is 62.5%, and occasional concurrent medication use reaches 98.5%. Excluding them likely contributes to poor external validity in RCTs and to a scarcity of evidence that accurately quantifies treatment risks and benefits in underrepresented populations, particularly in the context of coexisting diseases and drug interactions.

Certain medical specialties have reported bias in the representativeness of their trial populations. This is especially true for individuals with cardiovascular comorbidities, psychiatric disorders, or otolaryngological conditions, for whom little evidence is available to guide treatment decisions. For instance, assessing the impact of anticoagulant use in a patient with both atrial fibrillation and liver cirrhosis is particularly challenging, since patients with liver disease are usually underrepresented in, or excluded from, cardiovascular RCTs.

One of the first large-scale real-world randomized controlled trials was the 1954 "Salk" field trial of the polio vaccine, in which roughly 750,000 children took part in a placebo-controlled component and were randomly assigned to receive either the vaccine or a placebo (control group). More than 1 million additional children took part in a non-randomized, observed-control component, in which vaccinated children were compared with unvaccinated children from neighboring school grades.

Real-world evidence also holds tremendous value in observational settings. It is currently used to generate hypotheses for future clinical or observational studies; assess the generalizability of clinical trial findings; monitor the safety of drugs and medical devices; analyze shifts in therapeutic usage patterns; and measure and implement quality in healthcare delivery. Regulatory bodies have accepted prospective single-arm trials with external controls (observational data) and high-quality data collection for device evaluation purposes. One such case is a ventricular assist system that obtained approval using propensity score-matched controls from the Interagency Registry for Mechanically Assisted Circulatory Support (INTERMACS).
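To give a sense of what propensity score matching involves, the sketch below pairs each treated patient with the closest available external control on a propensity score that is assumed to have been estimated already. It is a generic greedy 1:1 nearest-neighbor match with a caliper, not a description of how the INTERMACS-based analysis was actually carried out; the identifiers, scores, and caliper value are hypothetical.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on a precomputed propensity score.

    treated, controls: dicts mapping a patient id to a propensity score in [0, 1].
    caliper: maximum allowed score difference for a pair to be accepted.
    Returns a list of (treated_id, control_id) pairs; each control is used once.
    """
    available = dict(controls)  # copy, so the caller's dict is not modified
    pairs = []
    # Process treated patients in ascending score order for reproducibility.
    for treated_id, treated_score in sorted(treated.items(), key=lambda item: item[1]):
        if not available:
            break
        control_id = min(available, key=lambda c: abs(available[c] - treated_score))
        if abs(available[control_id] - treated_score) <= caliper:
            pairs.append((treated_id, control_id))
            del available[control_id]  # match without replacement
    return pairs


# Hypothetical propensity scores for a single-arm study and a registry.
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.95}
controls = {"c1": 0.28, "c2": 0.33, "c3": 0.60, "c4": 0.10}
pairs = greedy_match(treated, controls)
print(pairs)
```

Treated patients with no control inside the caliper, such as `t3` here, stay unmatched; how many are lost this way is itself a useful signal of how comparable the registry population is.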

Integrating real-world data and evidence offers a robust solution to complement randomized clinical trials, expanding target populations, minimizing bias, and reducing costs.

Of course, not all that glitters is gold; in cases like this, though, it may just need polishing.
The use of RWD also brings challenges
While population variance adds significant value, it also introduces considerable complexity in the analysis. Factors such as bias in data collection, the unequal quality of sources, and confounding due to uncontrolled variables must be addressed with robust statistical methodologies and carefully designed study frameworks. Data from electronic medical records, administrative records, and insurers are not collected or organized with the intent of supporting research and development, nor have they been optimized for those purposes. Therefore, they must be cleaned and structured.

Additionally, little is usually known about the accuracy and credibility of data obtained from personal devices and health-related apps.
Sample size influences the accuracy of causal inference (cause-and-effect analysis), but study design is crucial
A larger sample size eases certain challenges in causal inference, that is, the analysis of data to establish cause-and-effect relationships. In particular, it improves the precision of estimates and reduces random error. It does not, however, resolve confounding, selection bias, or unobserved variables: these are systematic errors, and they persist no matter how many records are added. Addressing them requires methodological work alongside the larger sample, such as adjusting for confounding variables, choosing an appropriate study design, and applying advanced statistical techniques. Therefore, while the large samples that RWD offers are valuable, the quality of the study design and of the statistical analysis remains critical for obtaining valid and reliable causal inferences.
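A small simulation can make this distinction tangible. In the sketch below, which rests entirely on made-up assumptions, disease severity drives both the chance of being treated and the outcome, and the true treatment effect is fixed at 1.0. The naive comparison of treated and untreated patients stays biased however large the sample becomes, while an estimate stratified on the confounder converges to the true value. All names and parameter values are hypothetical.

```python
import random
import statistics


def simulate(n, seed=0, true_effect=1.0):
    """Return (naive, adjusted) effect estimates from one simulated cohort.

    Assumes n is large enough that every stratum contains both treated and
    untreated patients; otherwise statistics.fmean raises on an empty list.
    """
    rng = random.Random(seed)
    # outcomes[severe][treated] -> list of observed outcomes
    outcomes = {severe: {True: [], False: []} for severe in (True, False)}
    for _ in range(n):
        severity = rng.gauss(0, 1)                         # confounder
        severe = severity > 0
        treated = rng.random() < (0.8 if severe else 0.2)  # sicker patients are treated more often
        outcome = true_effect * treated + 2.0 * severity + rng.gauss(0, 1)
        outcomes[severe][treated].append(outcome)

    # Naive estimate: compare all treated with all untreated patients.
    treated_all = outcomes[True][True] + outcomes[False][True]
    untreated_all = outcomes[True][False] + outcomes[False][False]
    naive = statistics.fmean(treated_all) - statistics.fmean(untreated_all)

    # Adjusted estimate: compare within each severity stratum, then average
    # the stratum differences weighted by stratum size.
    adjusted = 0.0
    for severe in (True, False):
        stratum_size = len(outcomes[severe][True]) + len(outcomes[severe][False])
        difference = (
            statistics.fmean(outcomes[severe][True])
            - statistics.fmean(outcomes[severe][False])
        )
        adjusted += stratum_size / n * difference
    return naive, adjusted


for n in (200, 20_000, 200_000):
    naive, adjusted = simulate(n)
    print(f"n={n:>7}: naive={naive:.2f} adjusted={adjusted:.2f} (true effect 1.00)")
```

Stratification works here only because the simulated treatment decision depends on nothing but the severity stratum; with real data the confounders have to be identified and measured first, which is a study design question rather than a sample size question.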

Whatever the focus of an RWD project (causal inference, prediction, or classification), the data must be representative of the population to which the study’s conclusions will be generalized. Otherwise, the estimate or prediction can be misleading or even harmful. Thus, to provide valid data, the registry from which they are drawn must cover the large majority (at least 95%) of the defined population, report the relevant parameters with good quality, and have close and complete follow-up.
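The coverage requirement can be checked with very little code. The sketch below compares a registry against the population it is meant to describe: overall coverage is tested against the 95% threshold mentioned above, and a per-subgroup ratio flags groups that are under- or over-represented. The function names, subgroup labels, and counts are hypothetical.

```python
def coverage(registry_total, population_total):
    """Fraction of the defined population that appears in the registry."""
    return registry_total / population_total


def representation_ratios(registry_counts, population_counts):
    """Subgroup share in the registry divided by its share in the population.

    A ratio of 1.0 means proportional representation; values below 1.0 mean
    the subgroup is under-represented in the registry.
    """
    registry_total = sum(registry_counts.values())
    population_total = sum(population_counts.values())
    return {
        group: (registry_counts.get(group, 0) / registry_total)
        / (population_counts[group] / population_total)
        for group in population_counts
    }


# Hypothetical counts by age band.
population = {"under_18": 2_000, "18_64": 6_000, "65_plus": 2_000}
registry = {"under_18": 1_700, "18_64": 5_900, "65_plus": 1_950}

registry_coverage = coverage(sum(registry.values()), sum(population.values()))
meets_threshold = registry_coverage >= 0.95
ratios = representation_ratios(registry, population)
print(f"coverage={registry_coverage:.1%} meets 95% threshold: {meets_threshold}")
print({group: round(ratio, 2) for group, ratio in ratios.items()})
```

In this made-up case the registry passes the overall threshold while still under-representing the youngest group, so both checks are worth running.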

Real-world data provides the representativeness that future biotechnology research and development needs.

Author: Valeria Analia Dávila.

Literature:
Tan, Y. Y., Papez, V., Chang, W. H., Mueller, S. H., Denaxas, S., & Lai, A. G. (2022). Comparing clinical trial population representativeness to real-world populations: an external validity analysis encompassing 43 895 trials and 5 685 738 individuals across 989 unique drugs and 286 conditions in England. The Lancet Healthy Longevity, 3(10), e674-e689.
Rosa, J. M., & Frutos, E. L. (2022). Ciencia de datos en salud: desafíos y oportunidades en América Latina. Revista Médica Clínica Las Condes, 33(6), 591-597.
Liu, F., & Panagiotakos, D. (2022). Real-world data: a brief review of the methods, applications, challenges and opportunities. BMC Medical Research Methodology, 22(1), 287.
Sharma, R., & Kshetri, N. (2020). Digital healthcare: Historical development, applications, and future research directions. International Journal of Information Management, 53, 102105.