The common story of rare variants

NGS is helping researchers find answers about inherited complex traits and diseases, paving a path towards the ‘personalized medicine era’. Whole-exome sequencing (WES) has been deployed to identify rare variants associated with complex diseases and is providing a fillip for further research and insight.

Common variants identified through genome-wide association studies (GWASs) have not answered the founding questions well enough for the research community to account for the heritability of these traits. Learning from that experience, the search for rare variants is now helping to fill the gaps of ‘missing heritability’ that these genome-wide efforts left open.

Cirulli and Goldstein mention in their review (Uncovering the roles of rare variants in common disease through whole-genome sequencing) that common variants are being identified in Mendelian disease studies as having a key role as modifiers of the effects of rarer contributors to disease risk.

Figure: Potential frequencies of causal variants in complex traits

At conferences like ASHG in San Francisco last year, the buzzwords were ‘rare variants are common’. Recent Twitter updates from PAGXII have been talking about the same ‘rare variants’, and not just for human populations.

Scoping through the literature on common and rare variants has been interesting. There are papers that trace in great depth how thinking moved from the Common Disease Common Variant (CDCV) hypothesis to the Common Disease Rare Variant (CDRV) hypothesis. Almost 4 years ago, Schork mentioned in his paper (Common vs. Rare Allele Hypotheses for Complex Diseases) that rare genetic variants (less than 5% frequency) can play key roles in influencing complex diseases and traits.

Another interesting paper that I came across, from a decade earlier, is by Reich and Lander (Lander of the Gangnam fame): On the allelic spectrum of human disease. In this paper the authors discuss the variation in allelic spectra for common disease genes and point out that for some genes predominant disease alleles exist, while for others there is only a set of rare ones. Their theory revolves around the idea that the genes responsible for most of the risk for common diseases (hypertension, heart disease etc.) have relatively simple allelic spectra, and hence that the causal variants for a common disease can be found using GWAS.

In his paper An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People, Nelson talks about rare variants being a result of recent mutations and being clustered geographically to some extent. He also points out that the common variants capture only a small fraction of the genetic diversity in any gene.

Figure: Number of variants per kb of sequence

A decade (or less) from now, I imagine 23andMe expanding beyond the common variants it reports for ‘someone’s DNA’ to rare variants, possibly using specific input on ancestry, population and surnames to guide that route.

From the Whitehead Institute for Biomedical Research in Cambridge comes a paper on Identifying Personal Genomes by Surname Inference. Gymrek talks about how surnames can be recovered by profiling short tandem repeats on the Y chromosome and querying freely available, publicly accessible internet resources. The authors point out, though, that their aim is to see effective policies established for data sharing and better awareness among patients about participating in genetic studies, not to see data sharing recede.

Another interesting read for the statistically inclined is the SKAT paper from Harvard, Rare-Variant Association Testing for Sequencing Data with the Sequence Kernel Association Test. This study proposes the sequence kernel association test (SKAT) for studying association between a set of rare (and common) variants and continuous or dichotomous phenotypes. It aggregates the individual score test statistics of the SNPs belonging to a set and computes a p-value at the set level.
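
To make the "aggregate of score statistics" idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a continuous phenotype, no covariates (the null model is just the phenotype mean) and a linear weighted kernel, so the set-level statistic is the weighted sum of squared per-variant scores. The function names `skat_q` and `permutation_pvalue` are made up for illustration, and the p-value is obtained by permutation, a simple stand-in for the analytic null distribution used in the paper.

```python
import numpy as np

def skat_q(genotypes, phenotype, weights=None):
    """Set-level statistic Q = sum_j w_j * S_j**2 (illustrative sketch)."""
    G = np.asarray(genotypes, dtype=float)   # individuals x variants, coded 0/1/2
    y = np.asarray(phenotype, dtype=float)
    if weights is None:
        weights = np.ones(G.shape[1])        # assumption: equal weights by default
    resid = y - y.mean()                     # residuals under an intercept-only null
    scores = G.T @ resid                     # S_j: one score statistic per variant
    return float(np.sum(np.asarray(weights, dtype=float) * scores ** 2))

def permutation_pvalue(genotypes, phenotype, weights=None, n_perm=999, seed=0):
    """Permutation p-value for Q (a swapped-in technique, not the paper's method)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(phenotype, dtype=float)
    observed = skat_q(genotypes, y, weights)
    hits = sum(
        skat_q(genotypes, rng.permutation(y), weights) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)

# Toy data: 4 individuals, 2 variants in the set
G = [[1, 0], [0, 1], [0, 0], [1, 1]]
y = [1.0, 0.0, 0.0, 1.0]
print(skat_q(G, y))   # 1.0
```

In practice the weights would typically up-weight rarer variants, and covariates would be regressed out before the scores are computed.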

SKAT steps in to test for association between the variants in a region and a phenotype, and can outperform burden tests. The authors note that most rare variants are unlikely to influence the phenotype with the same magnitude, and that many variants within a sequenced region have little or no effect on the phenotype. SKAT therefore allows different variants to have different directions and magnitudes of effect.
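
A toy contrast shows why direction matters. The sketch below is an illustration under simplifying assumptions (unweighted variants, intercept-only null model, hypothetical function names), not code from the paper. A burden-style statistic sums the per-variant scores before squaring, so opposite effects cancel; a variance-component statistic of the kind SKAT uses squares each score first, so they do not.

```python
import numpy as np

def per_variant_scores(genotypes, phenotype):
    """Score for each variant: genotype column times centred phenotype."""
    G = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    return G.T @ (y - y.mean())

def burden_statistic(genotypes, phenotype):
    """Collapse first, then square: opposite-direction effects cancel."""
    return float(per_variant_scores(genotypes, phenotype).sum() ** 2)

def variance_component_statistic(genotypes, phenotype):
    """Square first, then sum: direction of effect no longer matters."""
    return float(np.sum(per_variant_scores(genotypes, phenotype) ** 2))

# Variant 1 carriers have a raised phenotype, variant 2 carriers a lowered one
G = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0]]
y = [2.0, 2.0, -2.0, -2.0, 0.0, 0.0]

print(burden_statistic(G, y))              # 0.0
print(variance_component_statistic(G, y))  # 32.0
```

When all variants in a region push the phenotype the same way, the burden-style statistic concentrates the signal and can be the better choice; the example above is the opposite extreme.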

Coming back to the conferences, here is a snippet of what some institutes are doing. At ASHG 2012, quite a few sessions revolved around rare variants and other findings from the NHLBI Exome Sequencing Project. The NHLBI GO ESP dataset characterizes multiple richly phenotyped populations, making this endeavor a variation of the 1000 Genomes Project in many ways. Some sessions, like those listed below, highlighted the idea of accelerated gene discovery for complex traits using NGS.

A GenomeWeb article from ASHG 2012 covers the details of the Rucphen population study and mentions the other ongoing studies that were in the session:

| Institution | Isolated Population |
|---|---|
| University of Miami | Midwestern Amish populations |
| University of Maryland | Amish populations in Pennsylvania |
| Cittadella Univ. di Monserrato, Italy | Sardinian population |
| Erasmus Medical Center Rotterdam | Rucphen population |

Others focussed on approaches for testing these rare variant datasets:

| Institution | Approach |
|---|---|
| Baylor College of Medicine & Fred Hutchinson Cancer Research Center | Testing for rare variant associations in the presence of missing data |
| Baylor College of Medicine & University of Washington | Rare Variant Extensions of the Transmission Disequilibrium Test Detects Associations with Autism Exome Sequence Data |
| Johannes Kepler University, Austria | Associating complex traits with rare variants identified by NGS: improving power by a position-dependent kernel approach |

So will we step into this personalized genome era someday soon? I’d hate for this bubble to burst, so I hope the CAP and CLIA certifications get done, the clinical labs are all set, and the public health and data privacy issues are kept in check. Then, as we ease into a no-nonsense world and look at our detailed genome analysis report, we can hopefully feel that it’s more or less uniquely ours or, better still, closer to our parents’ than our neighbours’.

