The 1000 Genomes Project aims to sequence the genomes of a large number of people from different populations and to make the resulting dataset publicly available to the scientific community as a comprehensive resource on human genetic variation. From the whole-genome sequences it collects, the project provides a thorough characterization of human genome sequence variation, which can serve as a foundation for investigating the relationship between genotypes and their phenotypes.
It is interesting to note how a project this large works around the (still) high cost of deep whole-genome sequencing. In whole-genome sequencing, multiple copies of the same genome are sheared into fragments, the fragments are sequenced, and the resulting reads are aligned to a reference sequence to study how the sample varies from that reference.
The 1000 Genomes Project has chosen the route of “light sequencing” for the whole genomes it collects. This means about 4X coverage (as stated on the 1000 Genomes website) rather than 30X or more, which lowers the sequencing cost compared with deep sequencing. The project is designed so that data from many samples are combined to allow efficient detection of most variants in a region of interest, which explains why, from the project's point of view, light sequencing of a large sample set is more viable than deep sequencing of a smaller one.
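To make the coverage trade-off concrete, here is a minimal Python sketch. It is not the project's own calculation: it assumes that the number of reads covering a site follows a Poisson distribution with mean equal to the coverage, and that sequencing cost is proportional to the total number of bases sequenced. The function names `prob_at_least` and `genomes_for_same_budget` are made up for illustration.

```python
import math


def prob_at_least(k, depth):
    """P(X >= k) when the read count X at a site is Poisson with mean `depth`.

    Assumption: reads land uniformly at random, so per-site depth is Poisson.
    """
    below = sum(math.exp(-depth) * depth ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


def genomes_for_same_budget(n_genomes, light_depth, deep_depth):
    """Number of genomes at `deep_depth` that need the same raw sequencing as
    `n_genomes` at `light_depth` (assumes cost is proportional to total bases)."""
    return n_genomes * light_depth / deep_depth


if __name__ == "__main__":
    for depth in (4, 30):
        # Chance that a given site is covered by at least 1 read and by at least 10 reads
        print(depth, prob_at_least(1, depth), prob_at_least(10, depth))
    # Deep-sequenced genomes affordable for the raw sequencing of 2500 light ones
    print(genomes_for_same_budget(2500, 4, 30))
```

Under these assumptions, the raw sequencing spent on 2500 genomes at 4X would cover only about 333 genomes at 30X. The price is that any single 4X genome rarely has many reads at a given site, which is why the design leans on combining data across samples.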
The project also aims to detect variants with frequencies as low as 1%. Because the sample is so large, combining data across all of the whole genomes can give accurate insight into the variants and genotypes of each sample, something that light sequencing of a smaller sample might not have achieved as effectively.
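A rough back-of-the-envelope sketch in Python shows why pooling many lightly sequenced samples helps with a 1% variant. This is an illustration under simplifying assumptions, not the project's variant-calling method: chromosomes are treated as independent draws at the given allele frequency, each chromosome copy receives half of the sample's coverage on average, read counts are Poisson, and sequencing errors are ignored. All function names are hypothetical.

```python
import math


def expected_variant_copies(n_samples, allele_freq):
    """Expected number of chromosomes carrying the variant among diploid samples."""
    return 2 * n_samples * allele_freq


def prob_variant_present(n_samples, allele_freq):
    """Probability that at least one sampled chromosome carries the variant."""
    return 1.0 - (1.0 - allele_freq) ** (2 * n_samples)


def expected_variant_reads(n_samples, allele_freq, depth):
    """Expected reads supporting the variant across all samples combined.

    Assumption: each chromosome copy is covered by depth / 2 reads on average.
    """
    return expected_variant_copies(n_samples, allele_freq) * depth / 2.0


def prob_het_seen(depth, min_reads=2):
    """Probability that one heterozygous carrier shows at least `min_reads`
    variant reads, with variant-read count ~ Poisson(depth / 2)."""
    lam = depth / 2.0
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(min_reads))
    return 1.0 - below


if __name__ == "__main__":
    print(prob_het_seen(4), prob_het_seen(30))     # one carrier, light vs deep
    print(prob_variant_present(1000, 0.01))        # is the variant in the sample at all?
    print(expected_variant_reads(1000, 0.01, 4))   # pooled evidence at 4X
```

Under these assumptions, a single heterozygous carrier sequenced at 4X shows two or more variant reads only about 59% of the time, whereas 1000 samples at 4X are expected to contribute about 40 variant-supporting reads in total for a variant at 1% frequency.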
From the project description, we can conclude that this dataset, which has been expanded from the original goal of sequencing 1000 human genomes to sequencing 2500, should be valuable for research studies in which groups examine variation across large samples and draw inferences by comparison with disease samples. Selection and population structure are among the many interesting aspects that can be studied with the large amount of data that sequencing has made available.