MAQ is a versatile and popular tool for mapping short-read data (Solexa, SOLiD).
This is a very curious case. I used MAQ to call SNPs on a Solexa data set of 6.5 million 36bp reads, running maq map .. followed by maq.pl SNPfilter with all defaults. I then trimmed the first two bases of each read and used the resulting 34bp reads to map and call SNPs. Interestingly, this run aligned more reads to the reference and called fewer SNPs after MAQ's SNP filtering. Not stopping there, I also trimmed the last two bases of every read, which left me with 32bp reads: the original 36bp reads with the first and last two bases clipped. This yielded even more mapped reads (~100k more than the original 36bp mapping) and even fewer final filtered SNPs.
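The trimming step itself is easy to reproduce. Here is a minimal sketch in Python, assuming plain four-line FASTQ records; the function name `trim_fastq` and its `head`/`tail` parameters are hypothetical names chosen for illustration, and this is not necessarily how the reads in this post were trimmed.

```python
from typing import Iterable, Iterator


def trim_fastq(lines: Iterable[str], head: int = 0, tail: int = 0) -> Iterator[str]:
    """Yield FASTQ lines with `head` bases clipped from the start and
    `tail` bases clipped from the end of every read and its quality string.

    Assumes four-line records (name, sequence, '+', qualities).
    """
    if head < 0 or tail < 0:
        raise ValueError("head and tail must be non-negative")
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) < 4:
            continue
        name, seq, plus, qual = record
        record = []
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {name}")
        end = len(seq) - tail
        # Clip the sequence and the qualities identically so they stay aligned.
        yield name
        yield seq[head:end]
        yield plus
        yield qual[head:end]


# Toy 10-base read (made up), clipped by two bases at each end.
example = ["@read1", "ACGTACGTAC", "+", "IIIIIHHHHG"]
trimmed = list(trim_fastq(example, head=2, tail=2))
print("\n".join(trimmed))
```

With `head=2, tail=0` a 36bp read becomes 34bp, and with `head=2, tail=2` it becomes 32bp, matching the two trimmed runs described above. The output would still need whatever format conversion the mapper expects.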
Below is the Venn diagram of the SNP calls from the three runs. Interestingly, most of the SNPs exclusive to one run also appear in the SNP calls of the other data sets, but were removed there by MAQ’s SNP filtering steps, which act on quality and indels. Can trimming a few bases off each read really make this much difference? Fortunately, I have true-positive data here to compare against and decide which of these is the best solution, but how would one know in a new study?
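The comparison behind a Venn diagram like this one, and the check against a true-positive set, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SNPs are represented as (chromosome, position) tuples, the function names `venn_regions` and `precision_recall` are hypothetical, and the toy calls below are made up rather than taken from the real runs.

```python
from itertools import combinations


def venn_regions(call_sets):
    """Map each combination of run names to the SNPs found in exactly those runs."""
    names = sorted(call_sets)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            inside = set.intersection(*(call_sets[n] for n in combo))
            outside = set().union(*(call_sets[n] for n in names if n not in combo))
            regions[combo] = inside - outside
    return regions


def precision_recall(calls, truth):
    """Return (precision, recall) of a call set against a truth set."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Made-up toy data: SNPs keyed by (chromosome, position).
runs = {
    "full36": {("chr1", 100), ("chr1", 250), ("chr2", 40)},
    "trim34": {("chr1", 100), ("chr2", 40)},
    "trim32": {("chr1", 100)},
}
truth = {("chr1", 100), ("chr2", 40), ("chr2", 900)}

regions = venn_regions(runs)
for combo, snps in regions.items():
    print(combo, len(snps))
for name, calls in runs.items():
    print(name, precision_recall(calls, truth))
```

Fewer filtered SNPs is not automatically better: in a comparison like this, a run can gain precision while losing recall, so both numbers need to be read together.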