The Ebola outbreak of 2014 (Contd.)

Following up on the previous post, here is some more detail on the recent Science paper along with a round-up of “what do we know, what have we learned” thus far.

The Gire et al. paper in Science was a huge amount of work and a giant collaborative research effort. Being a computational biology researcher, I appreciate their in-depth and detailed evaluation using numerous bioinformatics software tools. Combing through the supplemental text, I created the flowchart below as a summary of all the analysis that went into the eventual results and interpretations. This was created using the wonderful Gliffy tool.

Flowchart of the impressive work accomplished by the Gire et al. Science paper (I made this using notes from their Supplemental Data)

A slew of articles summarizing the recent Science paper came out as the WHO’s warning that this outbreak could reach 20,000 people caught public attention. This is huge!! Peter Piot, who co-discovered the Ebola virus during the 1976 outbreak, never imagined an outbreak like this, but is confident that ‘high-income countries’ will do just fine.

The Broad Institute and Harvard University worked with the Sierra Leone Ministry of Health and Sanitation, along with other researchers, to produce the comprehensive Science paper describing the sequencing of current Ebola genomes. Meanwhile, the human trial of NIH and GSK’s investigational Ebola vaccine is to begin this week, as the vaccine performed well in primate studies. Hopefully this one will be faster than the usual 10-year turnaround observed for a vaccine trial. Although the experimental drug ZMapp is being used on some cases, results are mixed and much more still needs to be done. Interestingly, the drug is a cocktail of three monoclonal antibodies originally raised in mice, and the primate research itself was published in Nature last week. The story of how it seemed to have worked on the 2 US health care workers in the midst of this outbreak is pretty ‘miraculous’!

The major points to note thus far:

  • First case of Ebola Virus Disease in Sierra Leone confirmed on May 25, 2014
  • It seems there was a single instance of EBOV passing from the ‘natural reservoir’ to humans, and the virus has since been transmitted from human to human (implying there is a rare, though present, chance of transmission from non-human sources)
  • Substitution rate within the 2014 outbreak is roughly twice as high as between outbreaks, implying that continued progression of this epidemic could give the virus room to adapt, hence the need for rapid containment
  • The 2014 outbreak has a doubling period of about 35 days!!
  • Complicating matters, a positive diagnosis for malaria does not necessarily rule out Ebola Virus Disease
  • Senegal just became the 5th West African country with a confirmed case of Ebola
  • Breaking News: samples from the Ebola outbreak in Congo (DRC) were found to stem from a distinct and independent transmission event, likely via bushmeat consumption!
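
To put the doubling period of about 35 days in perspective, here is a rough back-of-the-envelope sketch. It assumes simple, unchecked exponential growth (a strong simplification), and the starting count of 1,000 cases is a hypothetical number chosen for illustration, not a figure from the paper.

```python
import math

DOUBLING_DAYS = 35.0  # approximate doubling period quoted in the list above


def projected_cases(initial_cases, days, doubling_days=DOUBLING_DAYS):
    """Case count after `days`, assuming unchecked exponential growth."""
    return initial_cases * 2 ** (days / doubling_days)


def days_to_reach(initial_cases, target_cases, doubling_days=DOUBLING_DAYS):
    """Days needed to grow from initial_cases to target_cases."""
    return doubling_days * math.log2(target_cases / initial_cases)


if __name__ == "__main__":
    start = 1000  # hypothetical starting count, for illustration only
    for day in (35, 70, 105):
        print(day, "days:", round(projected_cases(start, day)), "cases")
    print("days to reach 20,000:", round(days_to_reach(start, 20000)))
```

Under these assumptions, going from 1,000 cases to the 20,000 figure in the WHO warning would take about 151 days, or roughly five months.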

Hopefully this is contained sooner rather than later…

Solving the Ebola Virus Genome and Identifying Possible Diagnosis

If you have ever played the card game Killer Bunnies and your Bunnies in the Bunny circle have died because of the Level 11 Weapon of Ebola Virus, you want to read this.


The research community is making strides to understand whether the virus is adapting to its host or changing as it spreads through different populations, as more countries in West Africa get caught in its path.

5 of the 50 co-authors on this Science article were themselves infected with the deadly Zaire Ebola Virus (EBOV). Nothing short of a thriller, the events trace back to the funeral of a healer, which kick-started the spread of Ebola in the region. Also reviewed here is a paper from 2008 in which the authors point to the VP35 protein, which their experiments identified as a critical component of this hemorrhagic fever.

Ebola’s genomic sequence:

  • Linear, single-stranded genome
  • Inverse-complementary 3′ and 5′ termini
  • ~19 kb (about 19 thousand nucleotides, compared to about 3 billion in the human genome)
  • Seven genes (compared to ~20k in humans)

The current outbreak is due to EBOV, one of the five Ebola viruses known to infect humans. Research groups are trying to identify whether the genetic sequence of this virus is changing fast enough in regions that are key for the accuracy of the PCR-based diagnostic tests.

The EBOV in the 2014 epidemic has been reported to be 97% similar to the virus that first emerged in 1976. Articles across the web estimate that EBOV evolves at about 7×10⁻⁴ substitutions per site per year, suggesting that the current strain of EBOV would have accumulated many substitutions over the roughly 40 years since 1976.
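
As a crude sanity check on those numbers, the sketch below multiplies the quoted rate by genome length and elapsed time. It assumes a constant rate along a single lineage and a genome length of 19,000 nucleotides (rounded from the ~19 kb figure above); the function name is mine and not from any of the papers.

```python
RATE_PER_SITE_PER_YEAR = 7e-4  # substitution rate quoted above
GENOME_LENGTH = 19_000         # ~19 kb, rounded (assumption)
YEARS = 2014 - 1976            # 38 years between the two outbreaks


def expected_substitutions(rate, sites, years):
    """Expected substitutions under a constant per-site, per-year rate."""
    return rate * sites * years


total = expected_substitutions(RATE_PER_SITE_PER_YEAR, GENOME_LENGTH, YEARS)
fraction = total / GENOME_LENGTH

print(f"expected substitutions: {total:.1f}")
print(f"fraction of genome changed: {fraction:.2%}")
```

This works out to roughly 505 substitutions, or about 2.7% of the genome, which is in the same ballpark as the 97% similarity quoted above.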

In this article, Gire et al. use genomic data from next-generation sequencing technologies to examine whether the virus is accumulating significant mutations as it changes hosts.

  • Methods compared to ascertain choice for sequencing:
    • Library preparation: Nugen and Nextera
    • Sequencing instruments: PacBio and Illumina
  • Nextera and Illumina provided most complete genome assembly and intrahost SNV identification
  • 99 virus genomes from 78 patients in Sierra Leone, sequenced at a median coverage of >2,000x across 99.9% of EBOV coding regions
  • Intrahost and interhost genetic variation used to characterize transmission patterns
  • 341 fixed substitutions identified between previous and 2014 EBOV
    • 35 nonsynonymous, 173 synonymous, 133 noncoding
  • 55 single nucleotide polymorphisms (SNPs) among this West African outbreak
    • 15 nonsynonymous, 25 synonymous, 15 noncoding
  • Genetic similarity across sequenced 2014 samples suggests a single transmission from the natural reservoir


Josh Herr of Michigan State University and Daniel Park of the Broad Institute aim to maintain an analysis wiki (ebola-crowdsource) for solving the underlying genomic riddle by studying the different strains of the virus, and they are encouraging contributors.


In an earlier paper published in the Journal of Virology in 2008, Hartman et al. discuss how whole-genome expression profiling reveals that the virus’s inhibition of the host innate immune response can be reversed by a single amino acid change in the VP35 protein.

  • Two Ebola virus strains generated by reverse genetics
    • Encode wild-type VP35 protein or VP35 with an arginine (R)-to-alanine (A) amino acid substitution at position 312
  • Whole-genome expression profiling of infected human liver cells
  • Host cells respond differently to the two viruses, which differ by a single amino acid
  • VP35 protein plays a vital role in inhibiting immune responses of the host
  • A single amino acid change can eliminate this inhibitory effect
  • VP35 protein plays a critical role in the severity of the disease


Dr. Lipkin, professor of epidemiology at Columbia University, discusses the pertinent question of whether ‘Ebola can travel to the United States’. He explains in a matter-of-fact way that although the virus could travel to the US like anywhere else, there’s also a high likelihood of it being detected and isolated by health authorities as early as possible.

Let’s get to a round of that card game now, shall we?

How recent are rare variants?

The Department of Genome Sciences at the University of Washington in Seattle, in a multi-institutional effort, sequenced 15,336 genes from a total of 6,515 individuals of European American and African American descent for the NHLBI-sponsored Exome Sequencing Project (ESP). To identify approaches for disease-gene discovery, it’s important to understand the evolutionary history of Homo sapiens and identify the age of mutations.

The group estimated that about 73% of all protein-coding SNVs and around 86% of all SNVs predicted to be deleterious arose recently, within the past 5,000-10,000 years. European Americans had an excess of deleterious variants, consistent with weaker purifying selection, which the authors explain with the Out-of-Africa model.

The gist, you ask? Rare variants have an important role in heritable phenotypic variation, disease susceptibility and adverse drug responses. Selection has not had enough time to act on the variants introduced by rapid population growth; it’s only been 200-300 generations since these mutations came to be. This increase in mutations results in more Mendelian disorders and has increased the allelic and genetic heterogeneity of traits.

Though if there’s a positive side to it, it may well be that we as people have created a new repository of advantageous alleles that came into being fairly recently and that evolution will hopefully act upon in subsequent generations.

The common story of rare variants..

NGS comes to the aid of researchers looking for answers on inherited complex traits and diseases, paving a path towards the ‘personalized medicine era’. Whole-exome sequencing (WES) has been deployed for identifying rare variants associated with complex diseases and is providing a fillip for further research and insight.

Common variants identified through Genome Wide Association Studies (GWASs) have not answered the founding questions well enough for the research community to account for the heritability of traits. Moving forward and learning from our experiences, the identification of rare variants is helping us fill the gaps of missing heritability that these genome-wide global efforts left open.

Cirulli and Goldstein mention in their review (Uncovering the roles of rare variants in common disease through whole-genome sequencing) that common variants are being identified in Mendelian disease studies as having a key role as modifiers of the effects of rarer contributors to disease risk.

Potential frequencies of causal variants in complex traits

At conferences like ASHG in San Francisco last year, the buzz words were ‘rare variants are common’. Twitter updates of late from the PAGXII have been talking about the same ‘rare variants’, and not just for human populations.

Scanning the literature on common and rare variants has been interesting. There are papers that describe in great depth how thinking moved from the Common Disease Common Variant (CDCV) hypothesis to the Common Disease Rare Variant (CDRV) hypothesis. Almost 4 years ago, Schork mentioned in his paper (Common vs. Rare Allele Hypotheses for Complex Diseases) that rare genetic variants (less than 5% frequency) can play key roles in influencing complex disease and traits.

Another interesting paper that I came across, from a decade earlier, was On the allelic spectrum of human disease by Reich and Lander (Lander of the Gangnam fame). In this paper the authors discuss the variation in allelic spectra for common disease genes and point out that for some genes predominant disease alleles exist, while for others there is only a set of rare alleles. Their theory revolves around the idea that genes responsible for most of the risk for common diseases (hypertension, heart disease etc.) have relatively simple allelic spectra and hence the causal variants for a common disease can be found using GWAS.

In his paper An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People, Nelson talks about rare variants being a result of recent mutations and being clustered geographically to some extent. He also points out that common variants account for only a small fraction of the genetic diversity in any gene.

Number of Variants/kb of sequence

A decade (or less) from now, I imagine 23andMe expanding from reporting just the common variants in ‘someone’s DNA’ to reporting rare variants, which they could possibly do with specific input on ancestry, population and surnames to guide that route.

From the Whitehead Institute for Biomedical Research in Cambridge comes a paper on Identifying Personal Genomes by Surname Inference. Gymrek talks about how surnames can be recovered by profiling short tandem repeats on the Y chromosome and querying freely available, publicly accessible internet resources. The authors point out, though, that their aim is to see effective policies established for data sharing and to raise participants’ awareness about taking part in genetic studies, not to see data sharing recede.

Another interesting read for the statistically inclined is the SKAT paper from Harvard, Rare-Variant Association Testing for Sequencing Data with the Sequence Kernel Association Test. This study proposed the sequence kernel association test (SKAT) for studying association between a set of rare (and common) variants and continuous/dichotomous phenotypes. It aggregates the individual score test statistics of the SNPs belonging to a set and computes a p-value at the set level.

SKAT steps in to take the role of testing for association between variants in a region, surpassing burden tests. The authors note it is unlikely for most rare variants to influence the phenotype with the same magnitude. And as it is more common for variants within a sequenced region to have little or no effect on phenotype, SKAT allows different variants to have different directions and magnitude of effects.
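
To make the aggregation idea concrete, here is a minimal sketch of a SKAT-style statistic. Several things here are my assumptions rather than the authors' implementation: the null model is intercept-only (no covariates), the phenotype is continuous, the weights use a Beta(1, 25) density on minor allele frequency (which I recall as the paper's suggested default), and the p-value comes from a simple permutation test, swapped in for the analytic mixture-of-chi-square approach the paper uses. All function names and the toy data are hypothetical.

```python
import random


def beta_1_25_weight(maf):
    """Variant weight: squared Beta(1, 25) density at the minor allele frequency.

    The Beta(1, 25) density is 25 * (1 - x)^24, so rarer variants get more weight.
    """
    return (25.0 * (1.0 - maf) ** 24) ** 2


def skat_q(genotypes, phenotype):
    """Weighted sum of squared per-variant score statistics.

    genotypes: one list per sample of minor-allele counts (0, 1 or 2)
    phenotype: one continuous trait value per sample
    """
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    residuals = [y - mean_y for y in phenotype]  # intercept-only null model
    q = 0.0
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        maf = sum(column) / (2.0 * n)
        score = sum(g * r for g, r in zip(column, residuals))
        # Squaring the score means the direction of the effect does not matter.
        q += beta_1_25_weight(maf) * score ** 2
    return q


def permutation_p_value(genotypes, phenotype, n_perm=2000, seed=0):
    """Set-level p-value by shuffling phenotypes (a stand-in for the analytic test)."""
    rng = random.Random(seed)
    observed = skat_q(genotypes, phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if skat_q(genotypes, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: variant 0 raises the trait, variant 1 lowers it, variant 2 is neutral.
    toy_genotypes = [
        [1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 1],
        [1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0],
    ]
    toy_phenotype = [2.1, -1.8, 0.1, 0.3, 1.9, -2.2, -0.2, 0.0]
    print("Q =", skat_q(toy_genotypes, toy_phenotype))
    print("p =", permutation_p_value(toy_genotypes, toy_phenotype))
```

Because each variant's score is squared before summing, variants pulling the trait in opposite directions add to the statistic instead of cancelling, which is the contrast with burden tests drawn above.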

Coming back to the conferences, here is a snippet of what some institutes are doing. At ASHG 2012, quite a few sessions revolved around rare variants and other findings from the NHLBI Exome Sequencing Project. The NHLBI GO ESP is a dataset characterizing multiple samples of richly phenotyped populations, making this endeavor a variation of the 1000 Genomes Project in many ways. Some sessions, like those listed in the table below, highlighted the idea of accelerated gene discovery for complex traits using NGS:

A GenomeWeb article from ASHG 2012 talks about the details of the Rucphen population study and mentions the other ongoing studies that were in the session:

Institution | Isolated Population
University of Miami | Midwestern Amish populations
University of Maryland | Amish populations in Pennsylvania
Cittadella Univ. di Monserrato, Italy | Sardinian population
Erasmus Medical Center Rotterdam | Rucphen population

Others focused on approaches for testing these rare variant datasets:

Institution | Approach
Baylor College of Medicine & Fred Hutchinson Cancer Research Center | Testing for rare variant associations in the presence of missing data
Baylor College of Medicine & University of Washington | Rare Variant Extensions of the Transmission Disequilibrium Test Detects Associations with Autism Exome Sequence Data
Johannes Kepler University, Austria | Associating complex traits with rare variants identified by NGS: improving power by a position-dependent kernel approach

So will we step into this personalized genome era someday soon? I’d hate for this bubble to burst, so I hope the CAP & CLIA certifications are done and the clinical labs are all set, and that the public health and data privacy issues are in check. Then, as we ease into a no-nonsense world and look at our detailed genome analysis report, we can hopefully feel it’s more or less uniquely ours, or better still, closer to our parents’ than our neighbours’.

Vaccine Wars or not..

I attended a talk on ‘Vaccine Wars in the 21st Century’ by Dr. Gregory Poland from the Mayo Clinic at the University of Minnesota Rochester Campus.

The major points Poland covered were on how we live in an age where information is confused with knowledge. Broadly, he went over how immunization rates are dropping while anti-vaccine messages are being trumpeted. Quoting recent outbreaks and the short stories behind them, he covered diseases like measles and mumps. He argued that people’s mistrust, despite safe and effective vaccines, stems from biases shaped by emotions, heuristics and the data they see.

He cited examples where the majority of the audience failed to notice the extra “in” in the phrase “Paris in in the spring” printed in a diamond-shaped box, among other tidbits, along with a simple math question: a bat and a ball cost $1.10 together and the bat costs exactly $1 more than the ball, so what is the price of the ball (1c/5c/10c/20c)? With these, Poland illustrated how the public (the audience in this case) slips up in understanding data.
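
For the record, the bat-and-ball question can be worked out directly. A minimal sketch, working in cents to avoid floating-point rounding (the function name is just for illustration):

```python
def ball_price_cents(total_cents=110, difference_cents=100):
    """Solve ball + bat = total, where bat = ball + difference."""
    # ball + (ball + difference) = total  =>  ball = (total - difference) / 2
    return (total_cents - difference_cents) // 2


ball = ball_price_cents()
bat = ball + 100
print(ball, bat, ball + bat)  # 5 105 110
```

The intuitive answer of 10c fails the check: the bat would then cost $1.10 and the pair $1.20.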

He went on to cite the examples of Andrew Wakefield and Jenny McCarthy regarding the measles vaccine, and the video hype around Desiree Jennings. He concluded the talk by pointing out how people fail to see the data that they need to see to make ‘good’ decisions, instead getting vaccines out of fear or by requirement/coercion.

Did the talk motivate the audience to spread the word about getting vaccines and being ‘good’ – that’s difficult to say. But the talk was certainly entertaining.

There is so much information available, and I use the word information because what the general public sees is certainly not data. Interpreting data and getting information out of it is hard work, but forming a bead of knowledge from information is to each their own. It’s difficult to say how a ‘vaccine war’ story unfolds while you’re in the middle of it; the decision is solely yours, and the end could work for you or totally not.