#ASHG2013 Platform and Poster abstract tag-clouds

With more than 6000 scientists (geneticists, bioinformaticians, clinicians, statisticians, genetic counselors…) and more than 200 companies in Boston for this year’s American Society of Human Genetics conference, there is a lot of great science to catch up on.

Very quickly, I just pulled out the selected platform talk abstracts, and the poster abstracts (too many posters, so I simply picked my biased interest of ~260 Bioinformatics ones) and made these tag-clouds to get the popular keywords.

They are very similar! While the Bioinformatics posters have a lot of DATA, coverage and quality, the platform talks have a lot of CANCER, functional and mutations. The platform talks also feature neandertal and pms. Looking forward to all the excitement!!

Bioinformatics Posters

Platform abstracts

Tools & Parameters: TagCrowd generated the clouds from the text of the PDF files on the ASHG website, showing at most 77 words with a minimum frequency of 5 and excluding these keywords: “boston cambridge chr ma united university”.
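
For anyone who wants to reproduce the keyword counts without TagCrowd, here is a minimal Python sketch using the same three parameters. The function name `top_keywords` and the tokenization (lowercase alphabetic words only) are my own assumptions, not how TagCrowd works internally.

```python
import re
from collections import Counter

# Parameters taken from the post; the tokenization below is an assumption,
# not TagCrowd's actual implementation.
MAX_WORDS = 77
MIN_FREQUENCY = 5
EXCLUDED = {"boston", "cambridge", "chr", "ma", "united", "university"}


def top_keywords(text, max_words=MAX_WORDS, min_frequency=MIN_FREQUENCY,
                 excluded=EXCLUDED):
    """Return up to max_words (word, count) pairs, most frequent first."""
    # Lowercase the text and keep runs of letters as words.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in excluded)
    # Drop words below the minimum frequency, then cap the list length.
    frequent = [(w, c) for w, c in counts.most_common() if c >= min_frequency]
    return frequent[:max_words]
```

Feed it the text extracted from the abstract PDFs; the resulting counts can then be passed to any cloud renderer.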

BTW, do check out the Twitter analysis by @erlichya on #ASHG2013 tweets and keywords

AGBT 2013 Saturday sessions

Plenary Session: Genomic Technologies
Len Pennacchio, Lawrence Berkeley National Laboratory, Chair

— could not take notes on some of the talks and afternoon session

9:00 a.m. – 9:30 a.m.
Rebecca Leary, Johns Hopkins Kimmel Cancer Center
“Personalized Approaches to Non-invasive Cancer Detection”

– personalized analysis of rearranged ends (PARE) – identify structural alterations in solid tumors
– generate personalized biomarkers for the detection of circulating tumor DNA
– Tumor-derived mate-pair library -> somatic rearrangements -> confirmed by PCR in tumor & matched normal
– Application = monitor disease progression, identify residual disease (predict relapse), surgical margins
– Plasma Aneuploidy Score – clearly differentiates normals from colorectal cancer samples (just 10x physical coverage – detect rearrangements)
– 0.75% circulating tumor DNA – 90%+ sensitivity, 99%+ specificity using 1 HiSeq lane

9:30 a.m. – 9:55 a.m.
* Eric Antoniou, Cold Spring Harbor Laboratory
“Increased Read Length and Sequence Quality with Pacific Biosciences Magbead Loading System and a New DNA Polymerase”

– duckweed as biofuel (40 tonnes/acre/yr), 0.1 ton yields 0.025 tons of ethanol by weight and is ~7.5 gallons a day
– rice genome (470 Mbp) sequenced using the Pacific Biosciences RS sequencer (MagBead loading system) – hybrid de novo assembly with Illumina data
– 10kbp insert library; 9X coverage of the rice genome (mean read length – 3kb, max 21kb)
– mean accuracy mode of single pass long read – 90%, (85-87% for current C2 chemistry)
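
To put the coverage numbers above in perspective, here is a small back-of-the-envelope sketch. It uses the standard idealized relation (coverage = total sequenced bases / genome size) and assumes every read contributes its full mean length; the function names are mine, and the read count is an estimate, not a figure from the talk.

```python
def total_bases(genome_size_bp, coverage):
    """Total sequenced bases needed for a given fold coverage."""
    return genome_size_bp * coverage


def reads_needed(genome_size_bp, coverage, mean_read_length_bp):
    """Approximate number of reads, assuming all reads have the mean length."""
    return total_bases(genome_size_bp, coverage) / mean_read_length_bp


RICE_GENOME_BP = 470_000_000   # 470 Mbp, from the talk
COVERAGE = 9                   # 9X, from the talk
MEAN_READ_LENGTH_BP = 3_000    # 3 kb mean read length, from the talk

bases = total_bases(RICE_GENOME_BP, COVERAGE)
reads = reads_needed(RICE_GENOME_BP, COVERAGE, MEAN_READ_LENGTH_BP)
print(f"{bases / 1e9:.2f} Gbp in roughly {reads / 1e6:.2f} million reads")
```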

9:55 a.m. – 10:20 a.m.
* Tim Harkins, Life Technologies
“Ovarian Cancer Evolution: a Tale of Two Paths”

– ovarian cancer 9th leading cancer among women, 5th leading cause of cancer related death, high relapse rate

10:45 a.m. – 11:10 a.m.
* X. Sunney Xie, Harvard University
“Detecting Single Nucleotide and Copy Number Variations of a Single Human Cell by Whole Genome Sequencing”

– Individual cells of identical descent can have different genomes (dynamic changes in DNA) – important to many biological investigations and medical diagnoses
– Single-cell whole-genome amplification methods – exponential amplification bias => low genome coverage
– Multiple Annealing and Looping Based Amplification Cycles (MALBAC) – 93% genome coverage ≥ 1x for a single human cell at 30x mean sequencing depth
– detection of digitized CNV & SNVs – ~76% efficiency for a single cancer cell
– 2.5 single-base substitutions per mitosis in human tumor cell line identified using single cell amplification/sequencing
– circulating tumor cells (CTCs) of same patient show similar CNV; CTCs of lung cancer patients show similar CNV
– clinical trial for pre-implantation genomic screening for IVF using single polar bodies of oocytes
– male’s genome can be phased by seq sperm, female’s genome phased using polar bodies genomes
– 0.1X genome coverage is enough to determine aneuploidy (at 8-cell stage) for MALBAC’s single-cell sequencing in IVF
– anomalous transition/transversion ratio for newly acquired SNVs

11:10 a.m. – 11:35 a.m.
* Jeremy Schmutz, HudsonAlpha Institute
“Evaluating Moleculo Long Read Technology for de novo Whole Genome Sequencing”

– Moleculo Long Read technology – sequencing two complex plant genomes (inbred diploid switchgrass comparator Panicum hallii (600 Mb) and the outbred tetraploid Miscanthus sinensis (~2.3 Gb))
– include long, retrotransposon-derived repeats and diverse GC-content, and present significant challenges for short-read NGS whole genome shotgun sequencing
– Moleculo reads – 10kb reads (5kb avg), high accuracy (1.26bp error/10k), tunable to genome size/complexity, reduces computational complexity
– limitations = distribution of reads depends on local repetitive content & global repeat freq; Illumina based => localized chemistry issues; some amplification bias

11:35 a.m. – 12:00 p.m.
* Jonas Korlach, Pacific Biosciences
“Automated, Non-Hybrid De Novo Genome Assemblies and Epigenomes of Bacterial Pathogens”

AGBT 2013 Friday sessions

Plenary Session: Genomic Studies II
John McPherson, Ontario Institute for Cancer Research, Chair

9:00 a.m. – 9:30 a.m.
Steve Scherer, The Hospital for Sick Children
“Whole Genome Sequencing Analysis in Autism”

– Autism Spectrum Disorder (ASD) – high heritability, familial clustering & ~4:1 male to female bias (as many candidates on X-chr)
– 100+ risk genes, ~10 not present on the capture
– WGS (at BGI, >30x) on ASD families; need for better indel callers (indel validation rate ~20%, SNV validation rate >90%)
– better and more uniform X chr and splice site coverage in WGS compared to WES
– also mentions PGP-Canada

9:30 a.m. – 10:00 a.m.
Jay Shendure, University of Washington
“Tackling Genetic Heterogeneity with Massive Multiplexing and Molecular Counting”

Missed out on the talk, but here is an older slide-deck from Shendure which covers most of the stuff presented

10:00 a.m. – 10:30 a.m.
* Gabe Rudy, Golden Helix (@gabeinformatics)
“Home-Brewed Personalized Genomics: The Quest for Meaningful Analysis Results of a 23andMe Exome Pilot Trio of Myself, Wife, and Son”

– $999 80x exome for the trio, mother with clinically-diagnosed idiopathic rheumatoid arthritis
– 75bp PE, SureSelect capture, BWA/GATK; delivered BAM, VCF, PDF summary report
– goals = variant call accuracy from NGS, usefulness of 23andme risk variants, usefulness of healthy person’s exome, potential to find driver variants and genes for diagnosis
– 3 Mendel errors, usually due to technical biases (e.g. mom and dad had a non-ref nucleotide, messing up the child’s genotype)
– 8000 phantom variants (some GATK bug in that version)
– Ingenuity Variant Analysis performed on the exome trio data – look for rare variants within 1-hop of JIA gene
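
The Mendel-error check mentioned above is easy to sketch for a trio. This is a minimal illustration, assuming diploid autosomal sites and genotypes given as pairs of alleles; it is not the tool used in the talk, and the function names are hypothetical. A real de novo mutation would also be flagged, so a flag means "inspect this site", not "this is a technical error".

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can be formed from one maternal
    and one paternal allele. Genotypes are 2-tuples such as ("A", "G")."""
    child_sorted = tuple(sorted(child))
    for maternal, paternal in product(mother, father):
        if tuple(sorted((maternal, paternal))) == child_sorted:
            return True
    return False


def count_mendel_errors(trio_genotypes):
    """Count sites where the (child, mother, father) genotypes conflict."""
    return sum(
        1
        for child, mother, father in trio_genotypes
        if not is_mendelian_consistent(child, mother, father)
    )


sites = [
    (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    (("A", "A"), ("G", "G"), ("G", "G")),  # both parents non-ref, child hom-ref
    (("A", "G"), ("A", "G"), ("A", "G")),  # consistent
]
print(count_mendel_errors(sites))
```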

—- Illumina User Meeting Dispatch newsletter

11:00 a.m. – 11:30 a.m.
Mark Yandell, University of Utah
“VAAST: A Probabilistic Disease-gene Finder for Personal Genomes”

VAAST substantially improves upon existing approaches in terms of statistical power, flexibility and scope of use
– identify rare-disease causing loci using single trios of family members, and in small cohorts (n=3) where no two individuals share the same deleterious variants
– also identify genes involved in common, complex diseases using many fewer cases than traditional GWAS
– working to integrate indels, CNV and SV into VAAST, along with pedigrees, and non-human projects (pigeonomics)

11:30 a.m. – 12:00 p.m.
* Agnes Viale, Memorial Sloan Kettering Cancer Center
“RNA-sequencing Analysis Identifies Novel Leukemic Pathways in a Genetically Accurate Model of Acute Myeloid Leukemia”

Bronze Sponsor Workshops
Chad Nusbaum, Broad Institute of MIT and Harvard, Chair

Line-up of all the vendor talks – @PerkinElmer @iontorrent @NuGENInc @illumina @BCILifeSciences @QIAGEN @PacBio @dnanexus

1:40 p.m. – 2:00 p.m.
NuGen Technologies, Inc., Christine Malboeuf, Broad Institute of MIT and Harvard
“Viral RNA Genome Sequencing of Ultra-Low Copy Samples using NuGen’s Ovation RNA-Seq”

– a human cell contains 5pg of RNA; ultra-low RNA = 5fg (1000 copies) to 5ag = amount of viral RNA, which does not work well with qPCR, etc
– Challenges – low quantity, host contamination, diversity (high mutation rate), technological and extraction process
– Ovation RNA-Seq v2 protocol from NuGen (500pg to 100ng input RNA) – low contamination
– West Nile virus – 50fg input 5M reads 31% map to virus, 48% map to host, covering 100% of viral CDS
– Dilutions starting with less material generated reproducible coverage profiles
– HIV – 50fg input RNA – 5M reads, 69% viral aligned reads, 5% host aligned, covering 100% CDS
– fewer copies of input RNA meant 1-2% of reads mapping to virus and 30-40% mapping to host, but still covered ~97% CDS with a reproducible coverage profile
– process worked on samples that failed RT-PCR-454 process; method applicable to many other viral sample types (300-75k viral copies)
– applications: surveillance of endemic/emerging viral pathogens; co-infection of multiple viruses; pathogen discovery (viral, parasite, bacterial, fungal)

Concurrent Session: Computational Biology
Mike Zody, Broad Institute of MIT and Harvard, Chair

7:30 p.m. – 7:50 p.m.
* Mark DePristo, Broad Institute of MIT and Harvard
“Overcoming Today’s Limitations in Sequencing Technology for Human Medical Genetics”

– have sequenced 40k+ samples to date from the common (Diabetes, Autism, and Heart Disease) to the uncommon/rare (Crohn’s and Mendelian disorders)
– Variation among individuals in a population – 90% SNPs 10% indels; disease-causing variation, particularly rare diseases, SNP and indel approach 50% / 50%
– indels remain an outstanding challenge; technical and analytic reasons
– PCR-free libraries improve variant calling sensitivity & specificity
– nice visual example of data looking clean with almost everything matching reference with one SNP and some noise calls; actually a het indel!
– better error models and longer reads improve sensitivity to true indels
– sample size is a huge limitation to better calling; but the ensuing massive data aggregation becomes a challenge as well

7:50 p.m. – 8:10 p.m.
* Andrew Farrell, Boston College
“Reference-free Approach for Mutation Detection”

– De novo assembly is prohibitively expensive for most labs – deep read coverage and massive computing power
– practical approach = reference guided alignment; dependent on three factors – reference accuracy, mapper’s ability to correctly place read (uniquely), degree to which a variant allele differs from reference (indels)
– developed a novel completely reference-independent method – no mapping or de novo assembly of the genome; directly compares raw sequence data from two or more samples, and identifies groups of reads unique to a sample
– tested on small genomes but will tackle human (incl. tumor) genomes, metagenomes, transcriptomes
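
The talk did not spell out the algorithm, so the following is only a toy illustration of the general idea of comparing raw reads between samples without a reference: break reads into k-mers and keep the reads that carry a k-mer never seen in the other sample. The k-mer approach, the function names and the parameters are all my assumptions, not the authors' method, and the sketch ignores reverse complements and sequencing errors.

```python
from collections import Counter


def kmers(read, k):
    """Yield every overlapping substring of length k in a read."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def kmer_counts(reads, k):
    """Count k-mer occurrences across a list of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def reads_unique_to_sample(sample_reads, other_reads, k=5, min_count=1):
    """Return reads from sample_reads containing at least one k-mer that
    occurs >= min_count times in the sample and never in other_reads."""
    other = set(kmer_counts(other_reads, k))
    sample = kmer_counts(sample_reads, k)
    unique = {km for km, c in sample.items() if c >= min_count and km not in other}
    return [r for r in sample_reads if any(km in unique for km in kmers(r, k))]


normal = ["ACGTACGTAC", "CGTACGTACG"]
tumor = ["ACGTACGTAC", "ACGTTCGTAC"]  # second read carries an A>T change
print(reads_unique_to_sample(tumor, normal, k=5))
```

Raising `min_count` is one simple way to filter out k-mers created by random sequencing errors, at the cost of needing deeper coverage.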

8:10 p.m. – 8:30 p.m.
* James Knight, 454 Life Sciences
“Assembling Human Sequence into Genomes”

8:30 p.m. – 8:50 p.m.
* Aaron Quinlan, University of Virginia
“LUMPY: A Probabilistic Framework for Structural Variant Discovery and Genomic Data Mining”

– structural variation (SV) needs integration of multiple alignment signals – read-pair, split-read and read-depth
– most existing SV discovery approaches utilize only one signal; poor at low sequence coverage and for smaller SVs (Hydra, DELLY, GASVPro)
– LUMPY = extremely flexible probabilistic SV discovery framework – integrates SV detection signals from read alignments or prior evidence
– 4k simulated SV – 1k each deletion, duplication, insertion, inversion – 2x, 5x, 10x, 20x coverage
– potential for a unified variant calling framework and probabilistic analyses of diverse genomic interval datasets (ENCODE)

8:50 p.m. – 9:10 p.m.
* Jeffrey Reid, Baylor College of Medicine
“Discovery of Mobile Element Variation in Ultra-deep Whole Genome Data”

9:10 p.m. – 9:30 p.m.
* Michael Schatz, Cold Spring Harbor Laboratory
“Assembling Crop Genomes with Single Molecule Sequencing”

AGBT13 – Thu 2/21 Afternoon session

Missed out on a talk and parts of others (and internet was so poor!):

Plenary Session: Clinical Genomics II
Sharon Plon, Baylor College of Medicine, Chair

2:45 p.m. – 3:15 p.m.
Stephen Kingsmore, Children’s Mercy Hospital
“Two Year Experience of Pediatric Genomic Medicine at a Large Children’s Hospital”
– In genetic diseases, genome sequence is uniquely deterministic – single gene diseases are proving ground for genomic medicine
– Bioinformatics whole genome analysis GSNAP -> GATK -> RUNES in ~17hr -> VIKING analysis for reporting
– 24hr STAT-seq test in 2013 – 18hr HiSeq sequencing, ~4hr bioinformatics
– Causal gene is known in 3677 / 7334 genetic diseases – treatment for ~500
– Diagnostic Odyssey – a surprising new term I came across – defined as the weird concert of multiple doctor visits, Sanger gene tests, etc. to understand the genetic disorder – takes ~5 years!!
– Highlights inconsistent results from Sanger-seq
– 3 tests to pediatricians:

  • STAT-seq – 2 day time-to-result genome seq test
  • TaGSCAN – CLIA targeted panel for 514 disease genes – 800x coverage – ~$1350, 99% precision
  • WES

3:15 p.m. – 3:45 p.m.
Jonathan Berg, The University of North Carolina at Chapel Hill
“‘Binning’ the Genome: Practical Management of Genomic Incidental Findings in a Clinical Context”

– Incidental findings – on vast majority of genome, and with not much clinical significance; Incidentalome best considered in the context of predictive value
– 3 bins of incidental variants – clinical mgmt implications (benefit > harm), clinically valid but uncertain, no clinical utility
– Define a priori, the actionability of gene-phenotype pairs – mindful of

  • medical ethics (duty to warn and autonomy)
  • reproducible & transparent
  • scalable & flexible

– talks about crowd-sourcing this task for consistency/variability of scores

4:15 p.m. – 4:45 p.m.
Zivana Tezak, FDA
“Translating Ultra High Throughput Sequencing into Clinical Applications-Regulatory Considerations”

4:45 p.m. – 5:15 p.m.
* Michael Talkowski, Harvard Medical School
“Rapid Prenatal Diagnosis by Whole-genome Clinical Sequencing of Jumping Libraries”

– WGS pipeline for detection of balanced chromosomal abnormalities (BCA) – FISH is the standard but low-resolution method
– Using customized large-insert jumping libraries – ~200-300x coverage (single HiSeq lane)
– 13-day sequence and analytic protocol for prenatal delineation of BCAs
– The challenge of interpreting morbid genome; imbalance in some regions is tolerated

AGBT13 – Thu 2/21 Morning session

Plenary Session: Clinical Genomics I
Heidi Rehm, Partners Healthcare Center for Personalized Genetic Medicine, Chair

— useful collection of links by @nextgenseek: agbt-2013-first-day-link-roundup
AGBT13 stories (updated twice daily based on tweets)

9:00 a.m. – 9:30 a.m.
Russ Altman, Stanford University
“Pharmacogenomics (PG) in the Era of Genome Sequencing”

Founder of Personalis, and I really like his saying from elsewhere: “We have to bring ‘genome-drug’ interactions to (physicians’) attention just as we currently bring ‘drug-drug’ interactions to their attention.”
– Current PG knowledge (PharmGKB) – from common variants with large effects on metabolism or action of drugs
– Key is understanding the clinical significance of less common variants on drug response
– Need for large repositories of empirical data about drug response & genome sequence
– VIP = Very Important Pharmacogene!
– Dosing guidelines manually curated in PharmGKB
– Impressive Pharmacogenomics ‘good news’ and ‘bad news’ summaries, along with novel non-synonymous damaging SNPs in VIP genes and relevant drugs
– Also presented exome seq of dose extremes in African-Americans – found a common variant (not in CEU) with strong effect on dose
– 900 genes that have 1 variant with very well characterized drug response, and 900 drugs with 1 variant

9:30 a.m. – 10:00 a.m.
Jon Seidman, Harvard Medical School
– Inherited cardiomyopathy
– understand their genetic bases of hypertrophy (thick wall) and dilatation (thin) as precursors of heart failures
– 2 models – common disease common variant or common disease lots of rare variants
– most individuals in US with HCM (500k/yr) have novel de novo mutations rather than more rare founder mutations
– 80+ genes implicated in idiopathic DCM making this more complicated to evaluate
– 25% of DCM caused by truncating mutations in TTN (largest human gene)

10:00 a.m. – 10:30 a.m.
Christine Eng, Baylor College of Medicine
“Clinical Whole Exome Sequencing (WES): Results and Outcome of First 450 Reported Clinical Cases”

– Technical, bioinformatic and interpretive WES pipelines in CAP-CLIA lab to identify causative mutations underlying disease phenotypes (genetic disorders)
– Variable pickup rate of Sanger seq based on gene
– NIH undiagnosed disease study results: 39/160 diagnosed
– 900+ WES tests, 450 reported to physicians (medical geneticists/neurologists)
– Causative mutant allele identified in 121/450 (27% confirmed diagnostic rate)
– OMIM = 3500 genes & genetic disorders, mostly rare, phenotypically het – leads to undiagnosed individuals with various genetic syndromes – WES to the rescue

11:00 a.m. – 11:30 a.m.
Elizabeth Worthey, Medical College of Wisconsin
“Making a Definitive Diagnosis: Experiences from Our WGS Based Genomic Medicine Clinic”

Very similar to the earlier exome talk, but this is whole-genome! Way to go…
– Integrated informatics solutions supporting variant annotation, prioritization, analysis, interpretation and reporting – integration into clinical genetics lab – CAP-CLIA (= 1400+ pg docs, 220 SOPs …)
– ACMG classification of variants into reporting categories
– 100-120 variants (170x coverage) per genome flagged for in-depth review, most not clinically reportable
– trouble with poorly covered regions – options – WES, Sanger, PacBio or something else
– end-to-end time (sample-prep, sequencing, bioinformatics, interpretation) to drop from ~2 weeks (2012) to 2 days (2013)!!
– amazingly precise cost slides – $1500 WGS, $1050 whole-genome interpretation and so on..

11:30 a.m. – 12:00 p.m.
Kjersti Aagaard, Baylor College of Medicine
“The Emerging Era of Metagenomics Medicine”


– Human + their microbiome = co-evolved physiologic community
– metagenome = incredibly diverse & plastic
– Human Microbiome Project Consortium = ca. structure, function, diversity of healthy  – microbiome = “reference”
– Metagenomic medicine – needs robust body sampling & strong analytic approaches
– How pregnancy and mitochondrial genome structure the microbiome
– 16S-based metagenomics using Roche FLX and Ion; WGS using Illumina
– 16S metagenomics shows us “who is there” while WGS metagenomics shows us “what they may be doing”