Bioinformatics tools for VCF files

With the ever-growing abundance of Next Generation Sequencing (NGS) data, the research community continues to face the challenge of not only standardizing best practices and analysis methods, but also embracing the array of open-source tools and file formats. I have been engaged with NGS since almost the beginning of this data outburst, as Illumina’s read lengths extended from 30 bp to beyond 250 bp, and in the file format category SAM, BAM and VCF have become well accepted, at least to some extent. I say “to some extent” because a few versions ago CLCbio still reported variants in a CSV file, but maybe I shouldn’t digress.

In part thanks to the 1000 Genomes Project and the HapMap consortium, VCF is now largely accepted as an output format, and most open-source tools report variants as VCF. This has allowed developers to build tools and parsers around the format as a standard, providing efficient functionality and data portability.

A recent discussion on Twitter about the efficiency and functionality of these tools made me compile a list of VCF parsers for future reference. You will be amazed at the variety of tools available to help parse the eventual VCF reports from NGS data, all free of cost!

Feel free to point out the pros and cons of these tools; I will keep updating this list to make it a comprehensive post. It would also be most helpful if folks could share their experiences with the different tools, along with example use cases!

Tools              Tidbits
VAWK               awk-like arithmetic on a VCF file
Bioawk             support of several common biological data formats, including optionally gzip’ed BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats
VCFFilterJS        Filters a VCF with javascript
bio-vcf            new generation VCF parser
bcftools           contains all the vcf* commands
VCFtools           provide easily accessible methods for working with complex genetic variation data in the form of VCF files (Paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3137218/)
VariantFiltration  Filters variant calls using a number of user-selectable, parameterizable criteria
PyVCF              Variant Call Format Parser for Python
vcflib             simple C++ library for parsing and manipulating VCF files
wormtable          write-once read-many table for large scale datasets (vcf2wt)
SnpSift            toolbox to filter and manipulate annotated files
gvcftools          Utilities to create and analyze gVCF files
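
For anyone new to the format, here is a rough sketch of what all of these parsers do at their core: skip the header lines, split each record on tabs, and interpret the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It uses only the Python standard library and does not reproduce the API of any tool listed above. The example records, the helper names (parse_info, read_vcf, passing) and the QUAL cutoff of 30 are my own assumptions for illustration, and sample/genotype columns are ignored.

```python
import io

# Tiny made-up VCF for illustration only (not real data).
EXAMPLE_VCF = """##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t100\t.\tA\tG\t50\tPASS\tDP=20;AF=0.5
chr1\t200\trs1\tC\tT\t10\tLowQual\tDP=5
chr2\t300\t.\tG\tA,C\t99\tPASS\tDP=40;DB
"""


def parse_info(field):
    """Turn 'DP=20;DB' into {'DP': '20', 'DB': True}; values stay strings."""
    info = {}
    if field == ".":
        return info
    for item in field.split(";"):
        key, sep, value = item.partition("=")
        # Keys without '=' are flags (e.g. DB), stored as True.
        info[key] = value if sep else True
    return info


def read_vcf(handle):
    """Yield one dict per variant record, skipping header and blank lines."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, pos, vid, ref, alt, qual, flt, info = line.split("\t")[:8]
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "id": vid,
            "ref": ref,
            "alt": alt.split(","),  # multi-allelic sites list several ALTs
            "qual": None if qual == "." else float(qual),
            "filter": flt,
            "info": parse_info(info),
        }


def passing(records, min_qual=30.0):
    """Keep records marked PASS with QUAL at or above an arbitrary cutoff."""
    return [
        r for r in records
        if r["filter"] == "PASS" and r["qual"] is not None and r["qual"] >= min_qual
    ]


records = list(read_vcf(io.StringIO(EXAMPLE_VCF)))
kept = passing(records)
for r in kept:
    print(r["chrom"], r["pos"], r["ref"], ",".join(r["alt"]), r["info"].get("DP"))
```

For real work I would still reach for one of the tools in the list, since they handle compressed input, header metadata, genotype columns and the many corner cases of the format that this sketch skips.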