With the ever-growing abundance of Next Generation Sequencing (NGS) data, the research community still faces the challenge of not only standardizing best practices and analysis methods, but also embracing the growing array of open-source tools and file formats. I have been engaged with NGS almost since the beginning of this data outburst, as Illumina’s read lengths extended from 30bp to beyond 250bp, and in that time SAM, BAM and VCF have become well-accepted file formats, at least to some extent. I say "to some extent" because a few versions ago CLCbio still reported variants in a CSV file, but maybe I shouldn’t digress.
So, in part due to the 1000 Genomes Project and the HapMap consortium, VCF is now largely accepted as an output format, and most open-source tools report variants in VCF. This has allowed developers to build tools and parsers against a standard file format, which in turn provides efficient functionality and data portability.
A recent discussion on Twitter about the efficiency and functionality of these tools made me compile a list of VCF parsers for future reference. You will be amazed at the variety of tools available for parsing the eventual VCF reports from NGS data, all free of cost!
Feel free to point out the pros and cons of these tools; I will keep updating this list to make the post comprehensive. Also, it would be most helpful if folks could share their experiences with the different tools and example use cases of their functionality!
|Tool|Description|
|---|---|
|VAWK|awk-like arithmetic on a VCF file|
|Bioawk|support of several common biological data formats, including optionally gzip’ed BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats|
|bio-vcf|new generation VCF parser|
|bcftools|contains all the vcf* commands|
|VCFtools|provide easily accessible methods for working with complex genetic variation data in the form of VCF files (Paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3137218/)|
|VariantFiltration|Filters variant calls using a number of user-selectable, parameterizable criteria|
|PyVCF|Variant Call Format Parser for Python|
|vcflib|simple C++ library for parsing and manipulating VCF files|
|wormtable|write-once read-many table for large scale datasets (vcf2wt)|
|SnpSift|toolbox to filter and manipulate annotated files|
|gvcftools|Utilities to create and analyze gVCF files|
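
To give a rough feel for what the parsers listed above do under the hood, here is a minimal sketch in plain Python using only the standard library. It is not the API of any tool in the list: the function names (`parse_info`, `parse_vcf_lines`), the example records and the QUAL cutoff of 20 are all made up for illustration, and it only handles the eight fixed VCF columns (no genotype columns, no header metadata, no escaping rules).

```python
# Illustrative only: a bare-bones reader for the eight fixed VCF columns.
# All names below are hypothetical and not taken from any listed tool.

VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info_field):
    """Turn 'DP=14;AF=0.5;DB' into {'DP': '14', 'AF': '0.5', 'DB': True}."""
    info = {}
    if info_field == ".":
        return info  # '.' means the INFO column is empty
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True  # flag entries carry no value
    return info


def parse_vcf_lines(lines):
    """Yield one dict per data line, skipping '#' header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        record = dict(zip(VCF_COLUMNS, fields[:8]))
        record["POS"] = int(record["POS"])
        record["ALT"] = record["ALT"].split(",")  # several ALT alleles are comma-separated
        record["QUAL"] = None if record["QUAL"] == "." else float(record["QUAL"])
        record["INFO"] = parse_info(record["INFO"])
        yield record


# Made-up example input in VCF layout (tab-separated).
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB\n",
    "20\t17330\t.\tT\tA,C\t3\tq10\tNS=3;DP=11\n",
]

records = list(parse_vcf_lines(example))

# Keep records that passed all filters and have QUAL of at least 20
# (the threshold is an arbitrary choice for this example).
passing = [
    r for r in records
    if r["FILTER"] == "PASS" and r["QUAL"] is not None and r["QUAL"] >= 20
]
```

For real data you would want one of the dedicated tools instead, since they deal with headers, genotype columns, compression and indexing that this sketch ignores.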