MAQ SNPfilter

MAQ is a very versatile and popular tool for mapping short-read data (Solexa, SOLiD).

This is a very curious case. I used MAQ to call SNPs on a Solexa dataset of 6.5 million 36 bp reads, running maq map .. followed by SNPfilter with all default settings. Interestingly, when I trimmed the first two bases of each read and mapped the resulting 34 bp reads, more reads aligned to the reference and fewer SNPs survived MAQ's SNP filtering. Not stopping there, I also trimmed the last two bases of every read. This left 32 bp reads: the original 36 bases with the first two and last two clipped. This run yielded even more mapped reads (~100k more than the original 36 bp mapping) and even fewer final filtered SNPs.
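The trimming step itself is easy to reproduce. Below is a minimal Python sketch that clips a fixed number of bases from the 5' end (`head`) and the 3' end (`tail`) of each read, applying the same cut to the quality string so sequence and qualities stay aligned. It assumes plain four-line FASTQ records with no line wrapping; `trim_record` and `trim_fastq` are hypothetical helper names, not part of MAQ. With `head=2, tail=0` a 36 bp read becomes 34 bp, and with `head=2, tail=2` it becomes 32 bp; the trimmed reads would then be given to MAQ in place of the originals.

```python
import io
from typing import Iterator, TextIO, Tuple


def trim_record(seq: str, qual: str, head: int, tail: int) -> Tuple[str, str]:
    """Clip `head` bases from the 5' end and `tail` bases from the 3' end.

    Sequence and quality strings are clipped identically so they stay aligned.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if head < 0 or tail < 0 or head + tail >= len(seq):
        raise ValueError("invalid trim lengths for this read")
    end = len(seq) - tail
    return seq[head:end], qual[head:end]


def trim_fastq(handle: TextIO, head: int = 2, tail: int = 0) -> Iterator[str]:
    """Yield trimmed 4-line FASTQ records read from an open text handle.

    Assumption: records are unwrapped (header, sequence, '+', quality).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        seq, qual = trim_record(seq, qual, head, tail)
        yield f"{header}\n{seq}\n{plus}\n{qual}\n"


if __name__ == "__main__":
    # Toy 36 bp read; clip two bases from each end to get a 32 bp read.
    toy = "@r1\n" + "ACGT" * 9 + "\n+\n" + "I" * 36 + "\n"
    for record in trim_fastq(io.StringIO(toy), head=2, tail=2):
        print(record, end="")
```

Streaming the file record by record keeps memory use flat, which matters for millions of reads; the two-bases-per-end defaults simply mirror the experiment described above.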

Below is the Venn diagram of the SNP calls. Interestingly, most of the calls exclusive to one dataset are present in the unfiltered SNP calls of the other datasets, but were removed there by MAQ's SNP filtering steps, which filter on quality and on indels. Can trimming a few bases off each read really make all this difference? Fortunately, I do have a set of true-positive SNPs here to compare against and decide which of these runs is best, but how would one know in a new study?
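One way to make this comparison reproducible is to key every call by (chromosome, position) and work with plain sets. The Python sketch below assumes each SNP file has the chromosome in its first column and the position in its second (my understanding of MAQ's SNP output, treated here as an assumption); all helper names and run labels are hypothetical. `venn_regions` assigns each site to the exact combination of runs that called it, `exclusive_but_seen_unfiltered` tests the observation that exclusive calls show up in the other runs' unfiltered output, and `precision_recall` scores a run against a truth set when one is available.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# A SNP site keyed by (chromosome, position).
Site = Tuple[str, int]


def load_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect (chromosome, position) keys from whitespace-delimited SNP lines.

    Assumption: column 1 is the chromosome and column 2 the position.
    Lines starting with '#' and lines with fewer than two fields are skipped.
    """
    sites: Set[Site] = set()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            sites.add((fields[0], int(fields[1])))
    return sites


def venn_regions(calls: Dict[str, Set[Site]]) -> Dict[FrozenSet[str], Set[Site]]:
    """Map each combination of run names to the sites called by exactly those runs."""
    regions: Dict[FrozenSet[str], Set[Site]] = {}
    for site in set().union(*calls.values()):
        members = frozenset(name for name, sites in calls.items() if site in sites)
        regions.setdefault(members, set()).add(site)
    return regions


def exclusive_but_seen_unfiltered(
    filtered: Dict[str, Set[Site]],
    unfiltered: Dict[str, Set[Site]],
    name: str,
) -> Set[Site]:
    """Sites kept only by run `name` after filtering, yet present in another run's raw calls."""
    other_filtered = set().union(*(s for n, s in filtered.items() if n != name))
    exclusive = filtered[name] - other_filtered
    other_raw = set().union(*(s for n, s in unfiltered.items() if n != name))
    return exclusive & other_raw


def precision_recall(called: Set[Site], truth: Set[Site]) -> Tuple[float, float]:
    """Fraction of calls that are true (precision) and of true sites recovered (recall)."""
    tp = len(called & truth)
    precision = tp / len(called) if called else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data only; real sets would come from load_sites() on each run's output.
    filtered = {
        "36bp": {("chr1", 10), ("chr1", 20)},
        "34bp": {("chr1", 10)},
        "32bp": {("chr1", 10), ("chr1", 30)},
    }
    for members, sites in venn_regions(filtered).items():
        print(sorted(members), len(sites))
```

Using exact membership combinations (rather than pairwise overlaps) gives the same partition a Venn diagram draws, so the region sizes can be checked directly against the figure.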

[Figure: MAQ filtered SNP results (Venn diagram)]