Plenary Session: Genomic Studies II
John McPherson, Ontario Institute for Cancer Research, Chair
9:00 a.m. – 9:30 a.m.
Steve Scherer, The Hospital for Sick Children
“Whole Genome Sequencing Analysis in Autism”
– Autism Spectrum Disorder (ASD) – high heritability, familial clustering & ~4:1 male-to-female bias (many candidate genes are on the X chromosome)
– 100+ risk genes, ~10 not present on the capture
– WGS (at BGI, >30x) on ASD families; need for better indel callers (indel validation rate ~20%, SNV validation rate >90%)
– better and more uniform X chr and splice site coverage in WGS compared to WES
– also mentions PGP-Canada
9:30 a.m. – 10:00 a.m.
Jay Shendure, University of Washington
“Tackling Genetic Heterogeneity with Massive Multiplexing and Molecular Counting”
I missed this talk, but here is an older slide deck from Shendure that covers most of what was presented
10:00 a.m. – 10:30 a.m.
* Gabe Rudy, Golden Helix – @gabeinformatics
“Home-Brewed Personalized Genomics: The Quest for Meaningful Analysis Results of a 23andMe Exome Pilot Trio of Myself, Wife, and Son”
– $999 80x exome for the trio, mother with clinically-diagnosed idiopathic rheumatoid arthritis
– 75bp PE, SureSelect capture, BWA/GATK; delivered as BAM, VCF and PDF summary report
– goals = variant call accuracy from NGS, usefulness of 23andme risk variants, usefulness of healthy person’s exome, potential to find driver variants and genes for diagnosis
– 3 Mendelian errors, usually due to technical biases (e.g. mom and dad had a non-ref nucleotide, messing up the child's genotype)
– 8000 phantom variants (some GATK bug in that version)
– Ingenuity Variant Analysis performed on the exome trio data – look for rare variants within 1-hop of JIA gene
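For anyone wanting to try the Mendelian error check on their own trio, here is a minimal sketch of the idea. It is not what Golden Helix or 23andMe run; the genotype encoding (tuples of allele codes, 0 = reference) and the three example sites are made up for illustration.

```python
from itertools import product

def is_mendelian_consistent(child, mother, father):
    """Return True if the child's genotype can be formed by taking
    one allele from the mother and one from the father.

    Genotypes are tuples of two allele codes, e.g. (0, 1) for a het call.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((m, f))) == child_sorted
        for m, f in product(mother, father)
    )

# Hypothetical calls at three sites: (child, mother, father)
sites = {
    "site_a": ((0, 1), (0, 0), (0, 1)),  # consistent: alt allele from father
    "site_b": ((1, 1), (0, 0), (0, 1)),  # error: mother has no alt allele
    "site_c": ((0, 0), (1, 1), (0, 1)),  # error: mother can only pass alt
}

mendel_errors = [
    name for name, (c, m, f) in sites.items()
    if not is_mendelian_consistent(c, m, f)
]
print(mendel_errors)  # ['site_b', 'site_c']
```

A flagged site is only a candidate: as noted in the talk, most such errors come from technical artifacts in one of the three genotype calls rather than true de novo events.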
11:00 a.m. – 11:30 a.m.
Mark Yandell, University of Utah
“VAAST: A Probabilistic Disease-gene Finder for Personal Genomes”
– VAAST substantially improves upon existing approaches in terms of statistical power, flexibility and scope of use
– identify rare-disease causing loci using single trios of family members, and in small cohorts (n=3) where no two individuals share the same deleterious variants
– also identify genes involved in common, complex diseases using many fewer cases than traditional GWAS
– working to integrate indels, CNV and SV into VAAST, along with pedigrees, and non-human projects (pigeonomics)
11:30 a.m. – 12:00 p.m.
* Agnes Viale, Memorial Sloan Kettering Cancer Center
“RNA-sequencing Analysis Identifies Novel Leukemic Pathways in a Genetically Accurate Model of Acute Myeloid Leukemia”
Bronze Sponsor Workshops
Chad Nusbaum, Broad Institute of MIT and Harvard, Chair
Line-up of all the vendor talks – @PerkinElmer @iontorrent @NuGENInc @illumina @BCILifeSciences @QIAGEN @PacBio @dnanexus
1:40 p.m. – 2:00 p.m.
NuGen Technologies, Inc., Christine Malboeuf, Broad Institute of MIT and Harvard
“Viral RNA Genome Sequencing of Ultra-Low Copy Samples using NuGen’s Ovation RNA-Seq”
– a human cell contains ~5pg of RNA; ultra-low RNA = 5fg (1000 copies) down to 5ag, which is the range of viral RNA amounts and does not work well with qPCR, etc.
– Challenges – low quantity, host contamination, diversity (high mutation rate), technological and extraction process
– Ovation RNA-Seq v2 protocol from NuGen (500pg to 100ng input RNA) – low contamination
– West Nile virus – 50fg input, 5M reads: 31% map to virus, 48% map to host, covering 100% of viral CDS
– Dilutions starting with less material generated reproducible coverage profiles
– HIV – 50fg input RNA, 5M reads: 69% viral-aligned reads, 5% host-aligned, covering 100% of CDS
– fewer copies of input RNA meant 1-2% of reads mapping to virus and 30-40% mapping to host, but still covered ~97% of CDS with a reproducible coverage profile
– process worked on samples that failed RT-PCR-454 process; method applicable to many other viral sample types (300-75k viral copies)
– applications: surveillance of endemic/emerging viral pathogens; co-infection of multiple viruses; pathogen discovery (viral, parasite, bacterial, fungal)
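The "5fg = 1000 copies" figure above can be sanity-checked with a quick mass-to-copy-number conversion. This is a back-of-the-envelope sketch, not something shown in the talk: the molecular weight formula is a common rule of thumb for single-stranded RNA, and the 9,700 nt genome length (roughly HIV-1 sized) is my assumption.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def rna_copies(mass_grams, genome_length_nt):
    """Estimate how many genome copies a given mass of viral RNA holds."""
    # Rule-of-thumb molecular weight for ssRNA in g/mol (an assumption,
    # not a number from the talk).
    mol_weight = genome_length_nt * 320.5 + 159.0
    return mass_grams / mol_weight * AVOGADRO

GENOME_NT = 9_700  # assumed genome length, roughly HIV-1 sized

for label, grams in [("5 fg", 5e-15), ("50 fg", 50e-15), ("5 ag", 5e-18)]:
    print(f"{label}: ~{rna_copies(grams, GENOME_NT):,.0f} copies")
```

Under these assumptions 5fg works out to a little under 1000 copies and 5ag to about a single genome, which is consistent with the numbers quoted.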
Concurrent Session: Computational Biology
Mike Zody, Broad Institute of MIT and Harvard, Chair
7:30 p.m. – 7:50 p.m.
* Mark DePristo, Broad Institute of MIT and Harvard
“Overcoming Today’s Limitations in Sequencing Technology for Human Medical Genetics”
– have sequenced 40k+ samples to date, from common diseases (diabetes, autism and heart disease) to uncommon/rare ones (Crohn’s and Mendelian disorders)
– Variation among individuals in a population – 90% SNPs, 10% indels; for disease-causing variation, particularly in rare diseases, the SNP/indel split approaches 50% / 50%
– indels remain an outstanding challenge; technical and analytic reasons
– PCR-free libraries improve variant calling sensitivity & specificity
– nice visual example of data looking clean with almost everything matching reference with one SNP and some noise calls; actually a het indel!
– better error models and longer reads improve sensitivity to true indels
– sample size is a huge limitation to better calling; but the ensuing massive data aggregation becomes a challenge as well
7:50 p.m. – 8:10 p.m.
* Andrew Farrell, Boston College
“Reference-free Approach for Mutation Detection”
– De novo assembly is prohibitively expensive for most labs – deep read coverage and massive computing power
– practical approach = reference guided alignment; dependent on three factors – reference accuracy, mapper’s ability to correctly place read (uniquely), degree to which a variant allele differs from reference (indels)
– developed a novel completely reference-independent method – no mapping or de novo assembly of the genome; directly compares raw sequence data from two or more samples, and identifies groups of reads unique to a sample
– tested on small genomes but will tackle human (incl. tumor) genomes, metagenomes, transcriptomes
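To make the "compare raw reads directly" idea concrete, here is a toy sketch using k-mer sets. This is my own simplification, not the method presented in the talk: it assumes error-free reads, ignores reverse complements, and the function names and example reads are hypothetical.

```python
def kmers(read, k):
    """Yield every k-mer of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

def kmer_set(reads, k):
    """Collect the distinct k-mers seen across a set of reads."""
    return {km for read in reads for km in kmers(read, k)}

def reads_unique_to(sample_reads, other_reads, k=5):
    """Return reads from sample_reads that contain at least one k-mer
    never seen in other_reads. No reference or assembly is involved."""
    other = kmer_set(other_reads, k)
    return [
        read for read in sample_reads
        if any(km not in other for km in kmers(read, k))
    ]

# Hypothetical reads: the second mutant read carries a G -> C substitution.
parent = ["ACGTACGTTA", "CGTTAGGCAT"]
mutant = ["ACGTACGTTA", "CGTTAGCCAT"]

print(reads_unique_to(mutant, parent))  # ['CGTTAGCCAT']
```

On real data, sequencing errors also create sample-specific k-mers, so a practical version would need some minimum count threshold before trusting a difference.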
8:10 p.m. – 8:30 p.m.
* James Knight, 454 Life Sciences
“Assembling Human Sequence into Genomes”
8:30 p.m. – 8:50 p.m.
* Aaron Quinlan, University of Virginia
“LUMPY: A Probabilistic Framework for Structural Variant Discovery and Genomic Data Mining”
– structural variation (SV) needs integration of multiple alignment signals – read-pair, split-read and read-depth
– most existing SV discovery approaches utilize only one signal; poor at low sequence coverage and for smaller SVs (Hydra, DELLY, GASVPro)
– LUMPY = extremely flexible probabilistic SV discovery framework – integrates SV detection signals from read alignments or prior evidence
– 4k simulated SV – 1k each deletion, duplication, insertion, inversion – 2x, 5x, 10x, 20x coverage
– potential for a unified variant calling framework and probabilistic analyses of diverse genomic interval datasets (ENCODE)
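A toy illustration of why integrating signals helps: treat each signal as a probability distribution over where the breakpoint sits, then pool them. This is only a sketch of the general idea and not LUMPY's actual implementation; the uniform intervals, positions and function names are invented for the example.

```python
def uniform(start, end):
    """Evidence that places the breakpoint anywhere in [start, end]."""
    p = 1.0 / (end - start + 1)
    return {pos: p for pos in range(start, end + 1)}

def combine(distributions):
    """Sum per-position probabilities across signals and renormalize."""
    total = {}
    for dist in distributions:
        for pos, p in dist.items():
            total[pos] = total.get(pos, 0.0) + p
    norm = sum(total.values())
    return {pos: p / norm for pos, p in total.items()}

# Hypothetical evidence for one breakpoint:
read_pair = uniform(100, 199)   # wide interval from insert-size spread
split_read = uniform(148, 152)  # narrow interval from a split alignment
read_depth = uniform(120, 179)  # coarse interval from a coverage change

combined = combine([read_pair, split_read, read_depth])
best = max(combined, key=combined.get)
print(best, round(combined[best], 4))
```

The pooled distribution peaks inside the narrow split-read interval while still being supported by the two coarser signals, which is the intuition behind combining read-pair, split-read and read-depth evidence.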
8:50 p.m. – 9:10 p.m.
* Jeffrey Reid, Baylor College of Medicine
“Discovery of Mobile Element Variation in Ultra-deep Whole Genome Data”
9:10 p.m. – 9:30 p.m.
* Michael Schatz, Cold Spring Harbor Laboratory
“Assembling Crop Genomes with Single Molecule Sequencing”