December 7, 2025

Making the Genetic Models Match the Ancestry of Patient Populations

Author: Mary Caffrey
Fact checked by: Rose McNulty

Key Takeaways

  • Precision medicine models often rely on data from Northern European populations, leading to treatment inequities.
  • Genetic diversity is crucial for understanding disease risk and treatment efficacy, as shown in studies on acute lymphoblastic leukemia.

A panel at the American Society of Hematology highlights the urgent need for genetic diversity in cancer studies, addressing disparities in treatment outcomes across different ancestries.

This year at the American Society of Hematology (ASH) Annual Meeting & Exposition, Shella Saint Fleur-Lominy, MD, PhD, of the University of Maryland School of Medicine, presented a study that leverages ECOG-ACRIN cytogenetic and outcomes data from newly diagnosed patients with acute myeloid leukemia (AML) to better understand ancestry-based effects on the survival benefit of NPM1 mutations.1

Saint Fleur-Lominy and her team pieced together their own data set—made necessary by gaps in population registries—and their efforts were featured Saturday in the ASH press program, highlighting a huge inequity in this era of precision medicine:

If models used to make decisions about targeting therapies don’t represent the patients being treated, what can be done to fix this?

Later Saturday morning, a panel of experts at ASH sought to answer that question in the session, “Leveraging Genetic Diversity in Preclinical Discovery to Guide Precision Medicine.” The session was cochaired by Sophie Zaaijer, PhD, clinical and translational genomics specialist at the University of California, Irvine, and Ann-Kathrin Eisfeld, MD, associate professor at The Ohio State University Comprehensive Cancer Center (The James). It invited 3 researchers to share how basic biomedical research is filling the gaps in the foundational data that drive development, disease modeling, and clinical decision-making.

The problem, Zaaijer explained, is that algorithms used in precision medicine models overwhelmingly are built on DNA from persons of Northern European descent; an eye-popping report in Nature Genetics estimated that 96% of the data in genome-wide association studies (GWAS) are from these populations.

Despite FDA calls for better representation in clinical trials, data sets in oncology specifically are not rapidly diversifying: fewer than 5% of individuals in trials leading to drug approval were Black or African American, although this group represents 14% of the US population.

Zaaijer was clear on one point: “It's not fixable by AI. People sometimes mention that AI may accelerate,” she said.

Because artificial intelligence (AI) can only learn from the data used to feed algorithms, the key is creating better data—and the session speakers described how that’s painstakingly being done.

First came some basic genetics: although only a few alleles are truly unique to specific population groups (so-called population-private alleles), most alleles are actually shared across populations at different frequencies. “So, when you now focus on just one human population group, you basically reduce the chance of seeing certain alleles when you're testing a drug,” Zaaijer explained.
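Zaaijer's point about allele frequencies can be illustrated with a little arithmetic. The sketch below is only an illustration: it assumes a cohort of unrelated people, independent sampling of their 2 chromosomes each, and made-up allele frequencies for 2 hypothetical population groups (none of these numbers come from the session). It computes the chance that a given allele appears at least once in a study cohort.

```python
def prob_allele_observed(allele_freq: float, n_people: int) -> float:
    """Chance that at least one of 2 * n_people chromosomes carries the allele.

    Assumes unrelated participants and independent sampling of chromosomes.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    if n_people < 0:
        raise ValueError("n_people must be non-negative")
    n_chromosomes = 2 * n_people
    # Probability that every sampled chromosome lacks the allele, then complement.
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


# Hypothetical frequencies of the same allele in 2 population groups.
HYPOTHETICAL_FREQS = {"group_a": 0.001, "group_b": 0.05}

for group, freq in HYPOTHETICAL_FREQS.items():
    chance = prob_allele_observed(freq, n_people=100)
    print(f"{group}: chance of seeing the allele in 100 people = {chance:.3f}")
```

Under these assumed numbers, a 100-person cohort drawn only from the group where the allele is rare would usually miss it, while a cohort drawn from the other group would almost always include it.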

Human Cell Models and Bone Marrow Organoids

Kellie Machlus, PhD, principal investigator, Megakaryocytes to Platelets Research Group at Boston Children’s Hospital, started by clarifying how the terms race, ethnicity, and ancestry are used in biomedical research. Race and ethnicity are social, cultural, or geopolitical constructs, she said; only ancestry is a measurable biological parameter derived from DNA. “Race remains reported as a surrogate marker for ancestry,” she said, explaining that “Ancestry is the framework that we use to define genetic differences between human populations.”

Machlus’ lab works with bone marrow organoids as a tool for ancestry-informed precision medicine research, which she said meet criteria to faithfully capture the bone marrow microenvironment and recapitulate the myeloid space. These take 18 days to develop; the process includes the release of proplatelets into the vasculature in ways that closely mirror in vivo conditions. An engraftment process allows addition of cells from patients with myelofibrosis, multiple myeloma, or chronic myeloid leukemia, for example.

Then, Machlus said, “We could then treat the organoids with drugs just like you would use in the clinic. So, this is really moving towards the precision medicine approach.”

Although organoids have a relatively short shelf life, she said, “These cells survive a lot longer than they would in liquid culture.”

Although this technology is promising, Machlus said significant validation work remains. “We obviously have to demonstrate that these are predictive of individual response, treatment, efficacy, and toxicity. That's something that we all have to be sure of or keep working on,” she said.

Genetic Ancestry and Acute Lymphoblastic Leukemia

Adam de Smith, PhD, assistant professor of Population and Public Health Sciences at Keck School of Medicine, University of Southern California, next examined how genetic variation and ancestry contribute to disparities in acute lymphoblastic leukemia (ALL) incidence and outcomes. There are 40% more children of Hispanic/Latino ethnicity diagnosed with ALL than non-Hispanic White children, and de Smith said this elevated risk persists throughout a person’s lifetime.

The increase appears specific to the B-cell immunophenotype, “which points to some underlying biology,” he said. “In addition to the disparities in incidence, we also know that there are racial and ethnic disparities in terms of patient outcomes.”

This disparity persists despite advances in treatment, de Smith said, driven by social determinants of health, environmental factors, tumor biology, and genetics—including genetic ancestry.

One factor showing up in evidence is that Hispanic children with more Native American or Indigenous American ancestry have worse survival. “Their genomes comprise genetic ancestry made up from Indigenous American, European, and African components,” he said.

De Smith noted that Machlus had already explained the differences among race, ethnicity, and genetic ancestry. “We could describe the genetic ancestry proportions of each individual patient, or we can infer that ancestry using sequencing data or SNP genotyping data,” he said, referring to single nucleotide polymorphism, “where each individual's ancestry proportions reflect the fraction of their genetic variation or their SNPs across the genome that have frequencies most similar to those in reference populations.”

An individual might be of African, European, or Indigenous American origin, he said. “These ancestry proportions sum up to 1, or 100%, so genetic ancestry is correlated with race and ethnicity, but it captures more complex population origins and can capture different exposures or factors that may impact disease risk and patient outcomes.”

He explained that those native to Mexico or Peru have higher shares of Indigenous American ancestry, compared with individuals native to Puerto Rico, who have larger contributions of European and African ancestry and much less Indigenous American ancestry in their genomes.
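The idea of ancestry proportions that sum to 1 can be sketched in code. The example below is a toy illustration, not the method de Smith's team used (the session did not name a tool): it assumes a simple supervised admixture model in which each allele copy is drawn from one reference population, and it fits the proportions with an expectation-maximization loop. The function name, reference frequencies, and genotypes are all hypothetical.

```python
def estimate_ancestry(genotypes, ref_freqs, n_iter=200):
    """Estimate ancestry proportions for one person.

    genotypes: list of 0/1/2 counts of the alternate allele at each SNP.
    ref_freqs: ref_freqs[k][j] is the alternate-allele frequency of SNP j
               in reference population k (hypothetical values here).
    Returns a list of proportions, one per reference population, summing to 1.
    """
    if not genotypes:
        raise ValueError("at least one SNP is required")
    n_pops = len(ref_freqs)
    q = [1.0 / n_pops] * n_pops  # start from equal proportions
    for _ in range(n_iter):
        totals = [0.0] * n_pops
        for j, g in enumerate(genotypes):
            # Weight of each population for an alternate or a reference allele copy.
            alt = [q[k] * ref_freqs[k][j] for k in range(n_pops)]
            ref = [q[k] * (1.0 - ref_freqs[k][j]) for k in range(n_pops)]
            alt_sum, ref_sum = sum(alt), sum(ref)
            for k in range(n_pops):
                if g > 0 and alt_sum > 0:
                    totals[k] += g * alt[k] / alt_sum
                if g < 2 and ref_sum > 0:
                    totals[k] += (2 - g) * ref[k] / ref_sum
        norm = sum(totals)
        q = [t / norm for t in totals]  # renormalize so proportions sum to 1
    return q


# Hypothetical reference panel: 2 populations, 4 SNPs.
REF = [
    [0.9, 0.9, 0.1, 0.1],  # population 0
    [0.1, 0.1, 0.9, 0.9],  # population 1
]
proportions = estimate_ancestry([1, 1, 1, 1], REF)
print([round(p, 3) for p in proportions])
```

Real analyses use many thousands of SNPs and established software; this sketch only shows why the output is a set of fractions per reference population rather than a single label.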

Research by de Smith shows many ALL risk allele frequencies are skewed toward higher frequencies in Hispanic/Latino populations compared with European populations. He identified 4 variants that stand out: IKZF1, GATA3, ARID5B, and an ERG SNP on chromosome 21.

“Although the risk allele has relatively similar frequencies across the 2 populations, we found that the effect of the ERG SNP on ALL risk appears to be specific to Hispanic and Latino populations,” he said.

By combining these SNPs into a polygenic risk score (PRS) using known GWAS loci, de Smith's team found a significant shift toward higher PRS in Hispanic/Latinos. Even Hispanic/Latino controls had higher scores compared with non-Hispanic White counterparts.
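For readers unfamiliar with polygenic risk scores, a common formulation is a weighted sum of risk-allele counts, with each weight taken as the per-allele log odds ratio from a GWAS. The sketch below assumes that formulation; whether de Smith's team used exactly this form was not stated, and the SNP labels, odds ratios, and genotypes are hypothetical placeholders rather than the ALL loci or their published effect sizes.

```python
import math


def polygenic_risk_score(risk_allele_counts, effect_sizes):
    """Weighted sum of risk-allele counts (0, 1, or 2 per SNP).

    risk_allele_counts: dict mapping SNP label -> number of risk alleles carried.
    effect_sizes: dict mapping SNP label -> per-allele log odds ratio.
    """
    if set(risk_allele_counts) != set(effect_sizes):
        raise ValueError("genotypes and effect sizes must cover the same SNPs")
    return sum(
        effect_sizes[snp] * count for snp, count in risk_allele_counts.items()
    )


# Hypothetical per-allele odds ratios, converted to log odds ratios.
WEIGHTS = {"snp_a": math.log(1.5), "snp_b": math.log(1.2)}

# Hypothetical genotypes for 2 people.
person_1 = {"snp_a": 2, "snp_b": 1}
person_2 = {"snp_a": 0, "snp_b": 1}

for label, genotype in (("person_1", person_1), ("person_2", person_2)):
    print(label, round(polygenic_risk_score(genotype, WEIGHTS), 3))
```

Because the score depends on how many risk alleles a person carries, a population in which those alleles are more frequent will tend to show a higher score distribution, in cases and controls alike.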

He dug into discoveries around the IKZF1 gene, which has shown 3 independent risk associations in Hispanic/Latino individuals but only 2 among non-Hispanic Whites; a novel risk variant for the Hispanic/Latino population is positively linked to Indigenous American ancestry. He then traced the evolutionary history of this variant, showing how his team had analyzed ancient DNA samples from across the Western Hemisphere, with the oldest sample in heterozygous form dating back almost 13,000 years.

“Why did this risk allele become so common in Indigenous American populations?” de Smith asked. He speculated that “population bottlenecks” that occurred after people from Asia crossed the Bering Strait into what is now North America led the allele to become common, and that its frequency has been tempered in recent centuries by the arrival of people with European and African ancestry.

Such findings have direct impacts on somatic alterations and patient outcomes. Studies have shown that Hispanic/Latino patients with higher Indigenous American ancestry have higher relapse risk, and that certain somatic subtypes like Ph-like (BCR-ABL-like) ALL and CRLF2 rearrangements are more prevalent in Hispanic/Latino patients. Critically, African ancestry and Indigenous American ancestry remained associated with inferior outcomes even after adjusting for molecular subtypes and clinical features. These are the questions that investigators such as Saint Fleur-Lominy seek to answer by going straight to clinical trial data, only to find it is in short supply.

CRISPR Artifacts and New Therapeutic Targets

Jesse Boehm, PhD, associate director of the Broad Lab, examined how gathering more diverse data is relevant to drug discovery. He introduced the Cancer Dependency Map, an international initiative to systematically understand cancer vulnerabilities. The map allows researchers to create cell models and organoids that represent every type of human cancer, subjecting them to genome-wide CRISPR screens, systematic drug testing, and computational mapping to identify relationships between molecular features and therapeutic vulnerabilities.

There’s just one catch.

“No surprise: over 91% of today's preclinical models are derived from patients of European or East Asian ancestry,” Boehm said. “We're clearly missing enormous opportunity for genomic discovery in this area. Now, new projects are getting started to try to begin to fill in some of these gaps.”

One such project, in which Boehm is involved, is the Human Cancer Models Initiative. From over 2500 patients who gave consent, 665 successful long-term models have been developed, with 71 new organoid models from non-European donors. Boehm showed a color-coded chart identifying the ancestry of different models; green, which represented European ancestry, lit up most of the graphic. But bits of orange representing African ancestry were peeking through for key cancers.

“You can see in some of the cohorts, such as breast, colorectal, endometrial cancer, we're beginning to capture more models from patients of non-European descent, and by utilizing the local ancestry at any individual locus, the aggregate of models that we use in our experiments may be useful, even if the majority ancestry in a particular model comes from patients of European or East Asian descent,” he said. “All this information has been released and is now available to the scientific community.”

Boehm's team identified 49 putative ancestry-related dependencies in their CRISPR screens. The team was initially excited by these findings, but efforts to pursue answers uncovered a huge problem. CRISPR guide RNAs were largely designed based on Eurocentric reference genomes, and when germline SNPs exist in CRISPR targeting regions, the guides cause false negative signals, particularly in cell lines from non-European ancestries.

“We did an evaluation to try to find what were the likely causative SNPs driving the dependency for 33 of them, [and] we found a particular SNP that was above false discovery rate across the whole genome that looked like it was related to the dependency,” Boehm said.

After additional investigation, “We became suspicious that there was likely an artifact in the analysis that was precluding our ability to discover actually true ancestry-related signals. And this is actually what it turned out to be. It turned out that most of these 49 putative ancestry-related dependencies were actually a shocking artifact,” he said.

The team realized that the scope of this mismatch challenge was huge, affecting many important cancer genes, with discrepancies many times worse for patients of recent African descent due to greater genetic diversity compared with reference genomes. Boehm's team took corrective action, developing computational approaches to improve algorithms within the map and prevent this artifact from precluding analysis. This work propelled design of new CRISPR libraries in an ancestry-aware fashion, and the new pangenome reference will help address this issue going forward.
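The guide-mismatch artifact can be pictured with a simple check. The sketch below is a hypothetical illustration, not the Cancer Dependency Map's actual correction: it assumes each guide's target site is known as a genomic interval on the reference genome and that a cell line's germline variants are available as a set of positions, and it flags guides whose target site overlaps a variant. All names and coordinates are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guide:
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def guides_with_germline_mismatch(guides, variant_positions):
    """Return (guide name, overlapping positions) for guides whose target
    site contains a germline variant in this cell line.

    variant_positions: set of (chrom, position) pairs where the cell line
    differs from the reference genome the guides were designed against.
    """
    flagged = []
    for guide in guides:
        hits = sorted(
            pos
            for chrom, pos in variant_positions
            if chrom == guide.chrom and guide.start <= pos < guide.end
        )
        if hits:
            flagged.append((guide.name, hits))
    return flagged


# Hypothetical guides and germline variants for one cell line.
guides = [Guide("guide_1", "chr1", 100, 120), Guide("guide_2", "chr1", 200, 220)]
variants = {("chr1", 105), ("chr2", 210)}
print(guides_with_germline_mismatch(guides, variants))
```

A flagged guide may bind its target poorly in that cell line, so an apparent difference in dependency could reflect the guide rather than the biology; in this sketch, such guides would be set aside or redesigned before ancestry-related signals are interpreted.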

Much work remains. “There are really exciting things that remain to be discovered, when you remove the artifact, not just the few dependencies that I showed there that are related to ancestry,” he said. There were now 17,000 GWAS analyses that needed a fresh look to learn whether that particular germline SNP is related or not to ancestry.

“And so, we did this,” Boehm said, calling the process “very computationally intensive.” The team found roughly 100 cancer dependencies that are surprisingly associated with a germline variant. He walked the audience through a complex example involving genes MUS81 and EME1, which showed strong dependencies linked to a specific SNP in the GEN1 gene. The relationships within this gene complex involve tumors with 1 copy of the GEN1 risk allele that could lose the second and end up being targeted with MUS81 inhibitors.

When asked how smaller laboratories without access to thousands of cell lines can incorporate ancestry-aware design, the speakers said emphasizing genomic diversity in any setting matters. Machlus cited, as an example, ensuring that healthy control platelet samples come from age-diverse donors rather than just graduate students.

For assessing ancestry with a limited budget, genotyping arrays designed for diverse populations cost approximately $100 per sample and can be analyzed using publicly available reference genome data, de Smith said. The speakers emphasized that ancestry-aware research doesn't require enormous sample sizes—it requires intentional design from the beginning.

And how can this work thrive in a time when just using the word “diversity” can raise flags in the grant process?

“We have to be able to make the case very clearly that this is a scientific imperative. It's not just a moral imperative,” Boehm said.

Reference

  1. Saint Fleur-Lominy S, Chen L, Sun Z, et al. Inferior survival in black AML patients treated with intensive chemotherapy in ECOG-ACRIN clinical trials is independent of cytogenetic profiles. Presented at: 67th American Society of Hematology Annual Meeting & Exposition, December 6-9, 2025; Orlando, FL. Abstract 290.

© 2025 MJH Life Sciences®

All rights reserved.
