UCLA study shows accuracy of genetically based disease predictions varies from individual to individual
Polygenic scores – estimates of an individual’s predisposition for complex traits and diseases – hold promise for identifying patients at risk of disease and guiding early, personalized treatments, but UCLA experts found the scores fail to account for the wide range of genetic diversity across individuals in all ancestries.
“Polygenic scores can estimate the likelihood of an individual having a certain trait by pulling together and analyzing the small effects of thousands to millions of common genetic variants into a single score, but their performance among individuals from diverse genetic backgrounds is limited,” said Bogdan Pasaniuc, PhD, a UCLA Health expert in statistical and computational methods for understanding genetic risk factors for common diseases.
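To make the idea of "pulling together" many small effects concrete, the following is a minimal sketch that assumes a polygenic score is a weighted sum of allele counts. The function name, weights, and dosages are hypothetical toy values chosen for illustration; they do not come from the study.

```python
def polygenic_score(dosages, weights):
    """Return the weighted sum of allele dosages.

    dosages: copies of the effect allele per variant (0, 1, or 2)
    weights: per-variant effect sizes (hypothetical values here)
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Made-up toy inputs for three variants (illustration only)
toy_weights = [0.5, -0.25, 0.125]
toy_dosages = [2, 1, 0]
toy_score = polygenic_score(toy_dosages, toy_weights)
```

Real scores combine thousands to millions of variants, but the arithmetic has the same shape: each variant contributes its allele count multiplied by a small effect size.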
The researchers’ analysis, published in Nature, shows that the accuracy of polygenic scores (PGSs) varies between individuals across a continuum of genetic ancestry – and this is true even in populations traditionally considered ‘homogeneous’ (e.g., Europeans), said Pasaniuc, the paper’s senior author.
Assessing PGS performance has commonly been done at the “population” level, such as in “Europeans,” by grouping individuals of similar ancestries into a single genetic-ancestry cluster, the authors said.
“Imposing artificial boundaries onto this continuum and ignoring the diversity, or ‘heterogeneity,’ within clusters can obscure variation within a group, conceal the similarities that may exist in individuals in different groups, and leave out individuals who do not fit neatly into a particular genetic ancestry,” said Yi Ding, a graduate student in bioinformatics at UCLA, a member of the Pasaniuc Lab, and the paper’s first author.
To provide a more precise estimate of PGS accuracy, the researchers developed a method to evaluate PGS accuracy at the individual level. To test it, they applied PGSs for 84 complex traits to data from more than 35,000 individuals in the UCLA ATLAS Precision Health Biobank, one of the most diverse biobanks in the world, in part because the Los Angeles area is home to one of the most ancestrally diverse populations globally.
The new tool’s “training” data came from a subset of individuals in the UK Biobank in the United Kingdom. As a substitute for discrete genetic ancestries, a continuous metric of “genetic distance” was used to establish the position of each individual in the ATLAS database on the genetic-ancestry continuum, essentially showing how similar or dissimilar a target (ATLAS) individual’s genome was to that from the UK training population.
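One way a continuous genetic-distance metric of this kind could be computed is sketched below. The sketch assumes, as an illustration only, that each individual is represented by coordinates in a principal-component space and that distance is measured as the Euclidean distance to the center of the training data. The function names and all coordinates are hypothetical.

```python
import math


def centroid(points):
    """Return the mean point of a list of equal-length coordinate lists."""
    n = len(points)
    dims = len(points[0])
    return [sum(p[k] for p in points) / n for k in range(dims)]


def genetic_distance(target_pcs, training_center):
    """Euclidean distance from a target individual to the training center."""
    return math.sqrt(
        sum((t - c) ** 2 for t, c in zip(target_pcs, training_center))
    )


# Made-up 2-D coordinates for four training individuals (illustration only)
training_pcs = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
center = centroid(training_pcs)

# Two hypothetical target individuals: one at the center, one far from it
near = genetic_distance([1.0, 1.0], center)
far = genetic_distance([4.0, 5.0], center)
```

Because the result is a single number per person, no individual has to be assigned to a discrete ancestry group before their similarity to the training population is assessed.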
“We found that the more dissimilar – or genetically ‘distant’ – a target individual’s genome was from the UK Biobank training data, the lower the accuracy of the PGS,” Ding said.
The accuracy of PGSs declined as genetic distance became greater even when the researchers looked specifically at genetic-ancestry groupings that have been considered homogeneous, such as among individuals of European genetic ancestries. Conversely, some individuals not identified with European ancestry could have higher levels of genetic similarity, showing that PGS performance could differ between two individuals from the same ancestry but be comparable for two people from different ancestries – depending on their genetic similarity.
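A relationship like the one described above, where accuracy falls as genetic distance grows, could be summarized with a correlation coefficient. The sketch below computes a Pearson correlation on made-up toy numbers; the values are not results from the study, and the function name is hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up toy values (illustration only): accuracy drops as distance rises
toy_distances = [1.0, 2.0, 3.0, 4.0]
toy_accuracies = [0.8, 0.6, 0.4, 0.2]
r = pearson_correlation(toy_distances, toy_accuracies)
```

In this toy example the accuracy falls by the same amount at every step, so the correlation is strongly negative; real data would be noisier.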
“Our genetic-distance metric outperformed discrete clustering in identifying individuals who could benefit from PGSs,” said Pasaniuc, a researcher at the David Geffen School of Medicine at UCLA and the UCLA Health Institute for Precision Health.
The research team identified several factors – subjects for ongoing and future studies – that could affect PGS accuracy and usefulness, especially in people with “admixed” ancestries. Such individuals are usually defined as those with recent ancestry from two or more continental sources, such as African Americans and Latinos.
Pasaniuc, whose research focuses on improving genetic risk factor predictions for people with admixed ancestry, said these individuals have “mosaic” genomes, with segments of different continental ancestries at every region. With different portions contributed by different ancestries, it is extremely difficult to accurately classify these individuals using conventional ancestry labels.
“For PGSs to be equitably used,” he said, “the assessment of PGS accuracy should account for the full spectrum of genetic diversity.”
Pasaniuc and Ding are the corresponding authors. Additional authors from UCLA and the David Geffen School of Medicine at UCLA are Kangcheng Hou, Ziqi Xu, Aditya Pimplaskar, Ella Petter, Kristin Boulier, and Loes M. Olde Loohuis. Other authors are Florian Privé of Aarhus University, Denmark; and Bjarni J. Vilhjálmsson of Aarhus University, Denmark, and Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute, Cambridge, MA.
Funding: Resources were provided by the Institute for Precision Health and participating patients from the UCLA ATLAS Community Health Initiative. The UCLA ATLAS Community Health Initiative in collaboration with UCLA ATLAS Precision Health Biobank is a program of the Institute for Precision Health, which directs and supports the biobanking and genotyping of biospecimen samples from participating patients from UCLA in collaboration with the David Geffen School of Medicine, UCLA Clinical and Translational Science Institute and UCLA Health. The ATLAS Community Health Initiative is supported by UCLA Health, the David Geffen School of Medicine and a grant from the UCLA Clinical and Translational Science Institute (UL1TR001881). This research was conducted using the UKBB resource under application 33297. We thank the participants of UKBB for making this work possible. This work was financially supported in part by National Institutes of Health awards U01HG011715, R01HG009120 and R01MH115676.
Competing interests: The authors declare no competing interests.
Article: Ding, Y. et al., Polygenic scoring accuracy varies across the genetic ancestry continuum, Nature. DOI: 10.1038/s41586-023-06079-4