Is Whole Exome Sequencing my best option?
Before you decide whether or not whole exome sequencing is the best sequencing method to complete your next project, let's fist take an in-depth look into what whole exome sequencing actually entails. As you well know, the human genome is made up of approximately 3 billion basepairs, but only 1% of those basepairs actually code for proteins. That 1% of our genome is what is known as the exome or "expressed genome". So the next logical question is, "Why should I be interested in sequenicng only 1% of the genome?" Well, as stated above, within the 1% of protein coding nucleotides are 85% of disease-causing variants. In short, whole exome sequencing provides a lot of bang for your buck. While it is true that next generation sequencing has driven the cost of whole genome sequencing down, there are still other factors to consider.
One of the more constraining factors when speaking of whole exome sequencing vs. whole genome sequencing is data storage. The simple fact is, whole genome sequencing, while more affordable, still requires a large amount of local storage space in order to complete any type of analysis. Take the Illumina HiSeq for example, on average, the fastq files generated for whole genome sequencing are going to require anywhere from 180GB to 220GB of data storage (30x coverage). Keep in mind that number is for one genome, so you figure for a realistic sample size of 100 genomes you are going to need ~ 20TB of storage...just for the raw data! So yes, the $1000 genome is now a reality, but so is the reality of measuring your data storage in GB a thing of the past. Additionally, once you've ironed out all of the data storage issues, then the real work begins; data analysis. The long and short of it is this; yes, whole genome sequencing is becoming more and more affordable, but that doesn't change the fact that there is a significant requirement for data storage, computing power, and time in order to utilize the data.
This brings us to whole exome sequencing. If you find yourself in the position of needing to identify genetic variants associated with a specific pathology, the odds are in your favor that whole exome sequencing will suit you just fine. In fact, because you are only sequencing 1% of the genome, you are now able to use those millions upon millions of reads to achieve a deeper coverage of the bases you are sequencing. Whole exome sequencing affords researchers the ability to identify those rarest of variants that may otherwise have been missed by whole genome sequencing. In comparison to 200GB of data storage for one genome, one exome will only cost you approximately 5Gb of data storage. If you were to sample the same 100 individuals as previously, but only sequencing the exome, you would bring your data storage needs down from 20TB to 0.5 TB. Similarly, as the amount of data is reduced, so is the computing power and time necessary to analyze the data set.