The New Genomes of 2012

From the first large genome (the goat’s) to be sequenced and assembled de novo using whole-genome mapping technology, published almost a week ago, to around 1,000 rice genomes sequenced at 1x coverage, a lot has happened at the whole-genome sequencing level.

Roche/454 is still being used, and so is Sanger; not everyone has moved on to Illumina, as you’ll see in the table below. While clinical sequencing aims for 100x coverage, most of the research sequencing seems to be lower. In these papers you’re sure to spot BGI, sometimes in partnerships, such as the one with the University of Copenhagen for the bat paper and its Illumina data.
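For readers wondering where coverage figures like these come from, here is a minimal sketch in Python, assuming the usual back-of-the-envelope definition (total sequenced bases divided by genome size). The function name `mean_coverage` is my own, and the inputs are the tomato figures from the table below (21 Gb of Roche/454 plus 3.3 Gb of Sanger over a 900 Mb genome). The papers themselves may compute coverage more carefully, for example from mapped or assembled reads only.

```python
GB = 1e9  # bases in a gigabase
MB = 1e6  # bases in a megabase


def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Approximate average depth: total sequenced bases / genome size.

    This is a simplification; it ignores unmapped reads, duplicates
    and uneven coverage across the genome.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Tomato figures as listed in the table (assumed to be raw totals).
tomato = mean_coverage((21 + 3.3) * GB, 900 * MB)
print(f"Tomato: roughly {tomato:.0f}x coverage")
```

With these inputs the estimate lands on the 27x listed for tomato; other rows will not always match this neatly because genome sizes in the table are rounded.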

As this year wraps up, the ‘The class of 2012’ post provided food for thought. Here’s a snapshot of some quirky points and details that caught my eye in these ‘Whole Genome Sequencing’ papers.

Let’s see what new genomes and sequencing technologies await us in 2013!

| Organism | Sequencing | Usefulness | Size & Annotation | Software | Interesting Findings | Scope |
|---|---|---|---|---|---|---|
| Capra hircus; 3yr old female Yunnan black goat | WGS: Illumina GA IIx; 12x coverage | For the first time whole-genome mapping technology was used for de novo assembly of large genomes. Investigate genetic basis of complex traits, in transgene production of peptide medicines | ~2.66-Gb genome sequence; 22,175 protein-coding genes | Contigs & scaffolds assembled using SOAPdenovo (Release 1.05) and ABYSS; Whole genome mapping: Genome-Builder; Annotation: GLEAN | 51 genes are differentially expressed in primary and secondary follicles of cashmere goat | Markers for breeding better cashmere goats can be identified and/or potential targets for genetic or nongenetic manipulation |
| Female western lowland gorilla named Kamilah (San Diego Zoo) | WGS: Hybrid de novo assembly combining 5.4 Gb Sanger & 166.1 Gb of Illumina short reads; 57.4x coverage | Closest living relatives after chimpanzees will aid in study of human evolution | ~3 Gb genome sequence; 20,962 protein-coding genes | Assembler: ABYSS, Phusion assembler, Maq, Velvet; Chromosomal AGP files: LASTZ; Annotation: ENSEMBL | 30% of the gorilla genome is closer to human, though it’s rarer around coding genes | Deeper understanding of great ape biology and evolution |
| Pan paniscus; female bonobo individual Ulindi (Leipzig Zoo) | WGS: 454/Roche; 23x coverage (additionally 19 bonobo & chimpanzee genomes on Illumina GAIIx) | Compared it to genomes of chimpanzees and humans to study its evolutionary relationship | 2.7 Gb genome sequence | Assembler: Celera Assembler software | 25% of human genes contain parts that are more closely related to one of the two apes than the other | Illuminate population history and selective events that affected evolution |
| Solanum lycopersicum; inbred tomato cultivar ‘Heinz 1706’ | WGS: Combination of 21 Gb of Roche/454 Titanium & 3.3 Gb of Sanger; 27x coverage | Compared it with closest wild relative, Solanum pimpinellifolium, and potato genome (Solanum tuberosum) | 900 Mb genome size; 34,727 protein-coding genes | Assembler: Newbler, CABOG; Annotation based on EuGene pipeline | Tomato genome more than 8% divergence from potato; 18,320 orthologous gene pairs; Triplications | Comparing gene family evolution & understanding bottlenecks that have narrowed tomato genetic diversity |
| Denisovan, an extinct relative of Neandertals (& 11 present day individuals) | WGS; Single-stranded library preparation method; Illumina GA IIx; 31x coverage | DNA library preparation method to reconstruct a high-coverage (30×) genome sequence | ~1.86 Gb genome size; Coverage was not biased toward GC-rich sequences | Illumina Genome Analyzer RTA 1.6 software, mapped to reference using BWA, GATK (also realignment) | Denisovans share more alleles with east Asian & South American populations (Dai, Han, and Karitiana) than with European populations (French and Sardinian) | Determine how modern humans expanded in population size & cultural complexity while archaic humans became physically extinct |
| Musa acuminata (banana) | WGS; 27.5 million Roche/454 single reads and 2.1 million Sanger reads (50× of Illumina data used to correct sequence errors) | Served as a stepping-stone to finding conserved non-coding sequences conserved beyond monocotyledons | 523-Mb genome size; 36,542 protein-coding genes | Assembler: Newbler; Aligner: BLAT | Comparison of Musa, rice, sorghum, Brachypodium, date palm and Arabidopsis proteomes revealed 7,674 gene clusters common to all six species; Whole-genome duplications | Unravel complex genetics & key to identifying genes responsible for agronomic characters, such as fruit quality and pest resistance |
| Rice: 1,083 O. sativa accessions & 446 O. rufipogon accessions (China, Japan) | WGS; Illumina GA IIx 73-bp paired-end reads; O. sativa: 1x coverage; O. rufipogon: 2x coverage | Identified 55 selective sweeps that have occurred during domestication | | Aligner: Smalt; SNP caller: Ssaha Pileup | Insights into how and where rice was likely domesticated & set of domestication sweeps and putative causal genes | Important resource for rice breeders to effectively exploit diverse genetic resources for rice improvement |
| Female domestic Duroc pig (Sus scrofa) | WGS; Hybrid de novo assembly based on sequences from BAC clones; 40x coverage | Comparison with the genomes of wild and domestic pigs from Europe and Asia. Identification of putative disease-causing variants can aid the pig to be a biomedical model | ~2.6 Gb genome size; 21,640 protein-coding genes | Contigs assembler: Phrap; Assembler: SOAPdenovo, Cortex; Aligner: BLAT | Genes associated with immune response and olfaction exhibit fast evolution; 112 positions where porcine protein has same amino acid that is implicated in a human disease | Important resource for improvement in livestock species |
| Bread wheat (Triticum aestivum); Chinese Spring (CS42) | WGS; Hexaploid genome; 454/Roche, 5x coverage. SOLiD used for additional sequencing to increase accuracy of homologous SNP identification | Comparison with diploid progenitors and relatives showed overall trend of gene family size reduction in large gene families in wheat. Defined genome-wide catalog of SNPs | 17 Gb genome size; ~94k genes | MetaSim to generate reads; Orthologous genes: OrthoMCL clustering, BLASTX | Assembled gene sequences representing a complete gene set were sequenced. Powerful framework for identifying genes | Identification of extensive genetic variation can provide a resource for accelerating gene discovery and improving this crop |
| Pacific oyster Crassostrea gigas (inbred female produced by four generations of brother–sister mating) | WGS; 155x Illumina (unable to assemble due to high polymorphism & repetitive sequence); fosmid-pooling strategy; Fosmid library 10x; 60x sequencing & assembly | Combination of fosmid pooling, NGS and hierarchical assembly: new, cost-effective alternative for de novo sequencing & assembly of complex genomes | 637 Mb; 28,027 genes | Alignment: LASTZ | Expansion of genes coding for HSP70 & IAPs is probably central to adaptation to sessile life in the highly stressful intertidal zone | Valuable resources for studying molluscan biology and lophotrochozoan evolution |
| Owl limpet (Lottia gigantea), a marine polychaete (Capitella teleta) and a freshwater leech (Helobdella robusta) | WGS with Sanger dideoxy sequencing reads; 8x coverage | Compare them with other animal genomes to investigate the origin and diversification of bilaterians from a genomic perspective | ~200-300 Mb genome size; ~23,000 to 33,000 protein-coding genes | Orthologs: BLAST | ~8K bilaterian gene families likely from single progenitor genes. 231 putative spiralian-specific gene families, members aligned across all three spiralians, indicating purifying selection | Rate-stratification approach could be used to place problematic taxa when genome data becomes available |
| Wild-caught bats, fruit bat (P. alecto) and insectivorous (M. davidii) | WGS; Illumina HiSeq 2000; 109x-118x coverage | Identify genetic changes associated with the development of bat-specific traits by comparative analyses of two distantly related bat species | ~2 Gb genome size; protein-coding genes: 21,392 P. alecto & 21,705 M. davidii | Assembler: SOAPdenovo; Aligner: BWA; Repeat annotation: Tandem Repeats Finder; Gene annotation: GLEAN | Genes in DNA damage checkpoint/DNA repair pathway found to be under positive selection. So are COL3A1 (skin elasticity) and CACNA2D1 (muscle contraction). Entire locus of PYHIN gene family (sensing microbial DNA) is lost | Comparison with other species may provide new insights into bat biology and evolution |

