Whole Exome Sequencing (WES) is a sequencing technique that specifically targets and analyzes the regions of the genome that are responsible for coding proteins.
These regions, known as exons, make up less than 2% of the human genome, yet they are of critical importance because they are estimated to contain approximately 85% of known disease-causing mutations.
WES is a more focused and efficient approach than Whole Genome Sequencing (WGS), which sequences the entire genome, including non-coding regions that may not be directly involved in disease development.
By concentrating only on the exome, WES allows researchers and clinicians to search for disease-causing mutations at lower cost and with far less data complexity than WGS.
The technique is highly valuable in both research and clinical settings for detecting and analyzing both common and rare genetic variants.
Because a majority of disease-related mutations are located within the exome, WES serves as a powerful tool for diagnosing genetic disorders and for gaining a deeper understanding of the biological mechanisms underlying various diseases.
Principle of Whole Exome Sequencing
The principle of Whole Exome Sequencing (WES) is based on the selective capture and sequencing of exons, which are the specific regions of the genome responsible for encoding proteins.
Although exons constitute only a small portion of the entire genome, they harbor the vast majority of genetic variants that are linked to various diseases, making them critical targets for genetic analysis.
WES offers a faster and more cost-effective alternative to Whole Genome Sequencing (WGS), particularly when the goal is to identify genetic variants that may contribute to inherited disorders or complex conditions.
This targeted approach not only reduces the time and cost of sequencing but also simplifies the downstream data analysis by focusing only on the most relevant coding regions.
Exome enrichment is the central step. After the genomic DNA has been fragmented and converted into a library of adapter-ligated fragments, oligonucleotide probes selectively bind to and capture the exon sequences.
These probes isolate the protein-coding regions, so that only the most relevant parts of the genome are carried forward for analysis.
The captured exonic DNA is then sequenced on high-throughput (next-generation) sequencing platforms, which generate a dataset of the nucleotide sequences of the exons.
The resulting sequence data is analyzed to identify genetic variants—such as single nucleotide polymorphisms (SNPs) or insertions and deletions (indels)—that may alter protein structure or function.
By focusing on these functional regions, WES helps researchers and clinicians understand the genetic basis of various diseases and supports the diagnosis and study of both common and rare genetic conditions.
Process/Steps of Whole Exome Sequencing
Sample Preparation:
The Whole Exome Sequencing process begins with the extraction of high-quality genomic DNA from a variety of biological samples, which may include blood, tissue biopsies, buccal swabs, or other relevant sources.
The choice of extraction method depends on the sample type and the DNA purity required. Commonly used methods include the salting-out method and spin column-based kits, both of which isolate pure DNA efficiently.
Once extracted, the DNA undergoes quality and quantity checks to confirm that it is suitable for sequencing. Spectrophotometers (e.g., NanoDrop) estimate concentration and report purity as absorbance ratios (A260/280 and A260/230), and fluorometers (e.g., Qubit) provide more accurate DNA-specific quantification. Agarose gel electrophoresis or capillary electrophoresis systems assess DNA integrity, revealing degradation.
High-quality DNA is essential for downstream steps, as degraded or contaminated DNA can negatively affect library preparation, enrichment efficiency, and sequencing accuracy.
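The Python sketch below illustrates how spectrophotometer readings are interpreted. It converts absorbance values into a dsDNA concentration and the two purity ratios. It assumes the conventional factor of 50 ng/µL per A260 unit for double-stranded DNA. The function name `dna_qc` and the pass/fail thresholds are hypothetical choices for illustration, not requirements of any particular kit or platform.

```python
# Conventional conversion: an A260 of 1.0 (1 cm path) corresponds to ~50 ng/uL dsDNA.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dna_qc(a260: float, a280: float, a230: float, dilution: float = 1.0,
           min_conc_ng_ul: float = 10.0) -> dict:
    """Estimate dsDNA concentration and purity ratios from absorbance readings.

    The acceptance window used here is an illustrative assumption, not a kit spec.
    """
    conc = a260 * DSDNA_NG_PER_UL_PER_A260 * dilution
    ratio_260_280 = a260 / a280  # ~1.8 for pure DNA; lower suggests protein or phenol
    ratio_260_230 = a260 / a230  # ~2.0-2.2 for pure DNA; lower suggests salts or organics
    passes = (conc >= min_conc_ng_ul
              and 1.7 <= ratio_260_280 <= 2.0
              and ratio_260_230 >= 1.8)
    return {"conc_ng_ul": conc, "a260_280": ratio_260_280,
            "a260_230": ratio_260_230, "pass": passes}


if __name__ == "__main__":
    print(dna_qc(a260=0.5, a280=0.27, a230=0.24))   # clean sample
    print(dna_qc(a260=0.5, a280=0.35, a230=0.24))   # low A260/280
```

Absorbance also detects RNA and free nucleotides. For that reason, fluorometric measurements are generally preferred when the exact DNA input for library preparation matters.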
Library Preparation:
After quality control, the next major step is library preparation, which is essential for making the DNA compatible with sequencing platforms.
The first part of this process involves fragmenting the extracted DNA into smaller pieces, which can be accomplished through mechanical shearing methods such as sonication or by enzymatic digestion using DNA fragmentase enzymes.
Once the DNA is fragmented, its ends are repaired. 3' overhangs are removed and 5' overhangs are filled in, producing blunt-ended, 5'-phosphorylated fragments. In many workflows (e.g., Illumina), a single adenine is then added to the 3' ends (A-tailing) so that the fragments can ligate to adapters carrying a complementary thymine overhang.
Synthetic adapters are then ligated to both ends of the DNA fragments. These adapters contain sequences that allow the fragments to bind to the sequencing flow cell. They also provide platform-specific primer-binding sites for PCR amplification and sequencing.
The final library is a collection of adapter-ligated DNA fragments, representing the entire genome, which are now ready for exon capture (enrichment).
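The Python sketch below models fragmentation, size selection and adapter ligation on a plain string. Everything in it is simplified and hypothetical:

- the DNA is treated as a single strand;
- end repair and A-tailing are not modelled;
- cut positions are uniformly random;
- `[P5]` and `[P7]` are placeholder tokens, not real adapter sequences.

```python
import random

# Hypothetical placeholder tokens; real kits use platform-specific adapter sequences.
ADAPTER_LEFT = "[P5]"
ADAPTER_RIGHT = "[P7]"


def fragment(seq: str, mean_size: int, rng: random.Random) -> list[str]:
    """Cut seq at random, distinct positions (a stand-in for shearing)."""
    n_cuts = max(len(seq) // mean_size - 1, 0)
    cuts = sorted(rng.sample(range(1, len(seq)), n_cuts))
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def build_library(seq: str, mean_size: int, min_len: int, max_len: int,
                  seed: int = 0) -> list[str]:
    """Fragment, size-select, and attach adapters to both ends of each insert."""
    rng = random.Random(seed)
    inserts = fragment(seq, mean_size, rng)
    # Size selection: keep only inserts within the desired length window.
    selected = [f for f in inserts if min_len <= len(f) <= max_len]
    # Adapter ligation (end repair and A-tailing are not modelled).
    return [ADAPTER_LEFT + f + ADAPTER_RIGHT for f in selected]


if __name__ == "__main__":
    genome = "ACGT" * 250  # 1,000 bp toy "genome"
    library = build_library(genome, mean_size=100, min_len=50, max_len=200)
    print(len(library), library[0][:20])
```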
Exome Enrichment:
This step involves capturing only the exonic (protein-coding) regions of the genome using hybridization-based methods.
The most commonly used technique is solution-based (aqueous-phase) hybridization capture. It uses a library of biotinylated RNA or DNA probes designed to be complementary to exonic sequences.
When mixed with the prepared DNA library, these probes hybridize to their complementary exon-containing DNA fragments.
After hybridization, the DNA-probe complexes are isolated using streptavidin-coated magnetic beads, which bind to the biotin molecules attached to the probes.
The captured exonic DNA is pulled out of the solution magnetically, and all non-specific or non-exonic regions of the genome that did not hybridize with the probes are washed away.
The enriched exonic fragments are then eluted (separated from the beads) and amplified using PCR to produce enough DNA for sequencing.
Other enrichment techniques may be used instead of aqueous-phase capture. In solid-phase (array-based) hybridization, the probes are immobilized on a microarray surface. Polymerase-mediated capture selects exons through primer annealing and extension rather than through probe pull-down with beads.
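Capture performance is commonly summarized as the on-target rate: the fraction of sequenced reads that overlap the intended exonic targets. The Python sketch below computes it under these assumptions:

- reads and targets lie on a single chromosome;
- both are given as 0-based half-open `(start, end)` intervals, as in BED files;
- the targets are sorted and non-overlapping.

The coordinates in the example are made up.

```python
from bisect import bisect_right


def make_overlap_checker(targets):
    """Return a function testing whether a read overlaps any target interval.

    Assumes non-overlapping targets on one chromosome, 0-based half-open.
    """
    targets = sorted(targets)
    starts = [s for s, _ in targets]

    def overlaps(read_start, read_end):
        # Last target starting before the read ends; because targets do not
        # overlap, it also has the largest end among those candidates.
        i = bisect_right(starts, read_end - 1) - 1
        return i >= 0 and targets[i][1] > read_start

    return overlaps


def on_target_rate(reads, targets):
    overlaps = make_overlap_checker(targets)
    return sum(overlaps(s, e) for s, e in reads) / len(reads)


if __name__ == "__main__":
    exons = [(100, 200), (500, 600)]  # hypothetical target intervals
    reads = [(150, 250), (200, 300), (450, 510), (700, 800), (0, 50)]
    print(on_target_rate(reads, exons))
```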
Sequencing:
Once the exome has been enriched and amplified, the library is ready for high-throughput sequencing using platforms such as Illumina (e.g., NovaSeq, HiSeq), Thermo Fisher Ion Torrent, or BGI sequencers.
These platforms use sequencing-by-synthesis or semiconductor-based sequencing to generate millions of short reads from the DNA fragments.
Typically, paired-end sequencing is employed, where both ends of each DNA fragment are sequenced, producing two reads per fragment. This technique provides more accurate alignment, improves the detection of insertions/deletions (indels), and offers better mapping in repetitive or complex genomic regions.
The result of this sequencing step is raw sequence data, which consists of many short DNA reads that cover the exonic regions of the genome.
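The amount of sequencing needed is often planned with simple coverage arithmetic: usable bases divided by target size. The sketch below assumes paired-end reads and discounts off-target and duplicate reads. The example's inputs are illustrative assumptions, not properties of any specific kit or instrument:

- 40 million read pairs of 150 bp;
- a 50 Mb target;
- a 70% on-target rate;
- a 10% duplicate rate.

```python
def mean_target_depth(read_pairs: int, read_length: int, target_bp: int,
                      on_target_fraction: float, duplicate_fraction: float) -> float:
    """Approximate mean depth over the capture target after duplicate removal."""
    total_bases = read_pairs * 2 * read_length  # paired-end: two reads per fragment
    usable_bases = total_bases * on_target_fraction * (1 - duplicate_fraction)
    return usable_bases / target_bp


if __name__ == "__main__":
    depth = mean_target_depth(40_000_000, 150, 50_000_000, 0.70, 0.10)
    print(f"{depth:.1f}x")
```

This estimate ignores overlapping read pairs, low-quality bases and uneven coverage. In practice, the depth at individual exons varies widely around this mean.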
Data Analysis:
The raw reads from the sequencer (usually in FASTQ format) first undergo pre-processing and quality filtering, such as adapter trimming and removal of low-quality bases.
Reads are first aligned to a reference genome (e.g., GRCh38/hg38) using alignment software tools such as BWA (Burrows-Wheeler Aligner) or Bowtie, which map each read to its corresponding location in the genome.
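BWA is built on the Burrows-Wheeler transform (BWT) and the FM-index, which let an aligner find quickly where a read occurs in a large reference. The toy Python sketch below shows only the core idea: exact matching by backward search over a naively built BWT. Real aligners such as BWA-MEM also tolerate mismatches and gaps and use seeding and compressed indexes. They are not implemented this way.

```python
from collections import Counter


def build_fm_index(reference: str):
    text = reference + "$"  # '$' sorts before A, C, G, T and marks the end
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each sorted suffix
    counts = Counter(text)
    first_col = {}  # number of characters in text lexicographically smaller than ch
    total = 0
    for ch in sorted(counts):
        first_col[ch] = total
        total += counts[ch]
    return sa, bwt, first_col


def find_exact(reference: str, pattern: str) -> list[int]:
    """Return sorted 0-based positions where pattern occurs exactly in reference."""
    sa, bwt, first_col = build_fm_index(reference)
    lo, hi = 0, len(bwt)
    for ch in reversed(pattern):  # backward search: extend the match leftwards
        if ch not in first_col:
            return []
        lo = first_col[ch] + bwt[:lo].count(ch)
        hi = first_col[ch] + bwt[:hi].count(ch)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    print(find_exact("ACGTACGT", "GT"))  # [2, 6]
```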
After alignment, PCR duplicates are identified and marked or removed so that they do not bias variant calling. These are read pairs that derive from the same original DNA fragment through amplification and therefore map to identical positions.
Variant calling algorithms such as GATK (Genome Analysis Toolkit) HaplotypeCaller, FreeBayes, or SAMtools/BCFtools are then used to identify genetic variants within the captured exonic regions, including single nucleotide variants (SNVs) and small insertions/deletions (indels).
These variants are then filtered, annotated, and interpreted to assess their potential pathogenicity or clinical significance. This step uses annotation tools such as Ensembl VEP and databases such as dbSNP, ClinVar, OMIM, and gnomAD.
Annotation and interpretation help classify each variant and indicate whether it affects protein structure, function, or splicing. Under the ACMG/AMP guidelines, a variant is classified as benign, likely benign, of uncertain significance, likely pathogenic, or pathogenic.
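The Python sketch below shows a simplified version of duplicate marking. It assumes each read pair has already been reduced to fragment coordinates and an orientation. Pairs with identical coordinates are treated as copies of one original fragment, and the copy with the highest summed base quality is kept. Tools such as Picard MarkDuplicates follow a similar idea but also handle clipping, unpaired reads and optical duplicates, which this sketch ignores. The `ReadPair` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    name: str
    chrom: str
    start: int             # leftmost mapped position of the fragment
    end: int               # rightmost mapped position of the fragment
    strand: str            # orientation of the first read, '+' or '-'
    base_quality_sum: int  # used to choose which copy to keep


def mark_duplicates(pairs):
    """Return the names of read pairs flagged as PCR duplicates."""
    best = {}
    duplicates = set()
    for p in pairs:
        key = (p.chrom, p.start, p.end, p.strand)
        kept = best.get(key)
        if kept is None:
            best[key] = p
        elif p.base_quality_sum > kept.base_quality_sum:
            duplicates.add(kept.name)  # previous representative becomes a duplicate
            best[key] = p
        else:
            duplicates.add(p.name)
    return duplicates


if __name__ == "__main__":
    pairs = [
        ReadPair("a", "chr1", 100, 400, "+", 900),
        ReadPair("b", "chr1", 100, 400, "+", 950),
        ReadPair("c", "chr1", 100, 400, "-", 800),
        ReadPair("d", "chr1", 120, 400, "+", 700),
    ]
    print(mark_duplicates(pairs))
```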
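The Python sketch below illustrates the statistical core of variant calling. At a single site, it compares genotype likelihoods from counts of reads supporting the reference and alternative alleles. It assumes a diploid sample, independent reads, a fixed per-read error rate and flat priors. Production callers such as GATK HaplotypeCaller use far richer models, including local reassembly, base-quality-aware likelihoods and population priors. This is only a teaching sketch.

```python
import math


def genotype_log_likelihoods(ref_count: int, alt_count: int,
                             error_rate: float = 0.01) -> dict:
    """Log-likelihood of the read counts under each diploid genotype.

    error_rate must lie strictly between 0 and 0.5. The binomial coefficient
    is the same for every genotype, so it is omitted.
    """
    # Probability that a single read shows the alternative allele.
    p_alt = {"0/0": error_rate, "0/1": 0.5, "1/1": 1 - error_rate}
    return {
        gt: ref_count * math.log(1 - p) + alt_count * math.log(p)
        for gt, p in p_alt.items()
    }


def call_genotype(ref_count: int, alt_count: int, error_rate: float = 0.01) -> str:
    lls = genotype_log_likelihoods(ref_count, alt_count, error_rate)
    return max(lls, key=lls.get)  # maximum-likelihood genotype (flat prior)


if __name__ == "__main__":
    for counts in [(30, 0), (15, 14), (0, 25)]:
        print(counts, call_genotype(*counts))
```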
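Filtering is often a chain of simple thresholds. The sketch below keeps variants that are well supported and rare in the population. The record layout (`qual`, `depth`, `gnomad_af`) and the cut-offs are hypothetical. They would need to be adapted to the disease model and the data at hand.

```python
def filter_candidates(variants, max_af=0.01, min_qual=30.0, min_depth=10):
    """Keep variants with good call quality, adequate depth and low population AF.

    Thresholds are illustrative. A variant absent from gnomAD (AF None) is
    treated as having a population frequency of zero.
    """
    kept = []
    for v in variants:
        af = v["gnomad_af"] if v["gnomad_af"] is not None else 0.0
        if v["qual"] >= min_qual and v["depth"] >= min_depth and af < max_af:
            kept.append(v)
    return kept


if __name__ == "__main__":
    candidates = [
        {"id": "var1", "qual": 50.0, "depth": 40, "gnomad_af": None},
        {"id": "var2", "qual": 60.0, "depth": 55, "gnomad_af": 0.20},
        {"id": "var3", "qual": 12.0, "depth": 30, "gnomad_af": 0.0001},
        {"id": "var4", "qual": 45.0, "depth": 5, "gnomad_af": None},
        {"id": "var5", "qual": 80.0, "depth": 70, "gnomad_af": 0.0005},
    ]
    print([v["id"] for v in filter_candidates(candidates)])
```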
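Part of annotation is predicting how a coding change alters the protein. The Python sketch below uses the standard genetic code to classify a single-nucleotide substitution within an in-frame coding sequence. It assumes the sequence is given on the coding strand, starting at the first codon. It ignores the transcript selection, splicing and strand handling that annotators such as Ensembl VEP take care of.

```python
from itertools import product

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def classify_coding_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position pos of an in-frame coding sequence."""
    offset = pos % 3  # position of the change within its codon
    codon_start = pos - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


if __name__ == "__main__":
    cds = "ATGGCTTGGTAA"  # toy sequence: Met-Ala-Trp-stop
    print(classify_coding_snv(cds, 5, "C"))  # GCT -> GCC, both alanine
    print(classify_coding_snv(cds, 8, "A"))  # TGG -> TGA, tryptophan to stop
```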
This comprehensive analysis ultimately aids in diagnosing genetic disorders, discovering new disease-associated mutations, and contributing to clinical decision-making in both research and healthcare settings.
Advantages of Whole Exome Sequencing
Whole Exome Sequencing offers broad and detailed coverage of all protein-coding regions within the genome, which are the primary sites for mutations that cause diseases.
It generates a smaller and more manageable dataset compared to whole-genome sequencing, which significantly speeds up the analysis process and reduces the demands on data storage and computational resources.
By targeting all coding regions at once, WES enables efficient and simultaneous examination of the entire exome, making it highly effective for identifying genetic variants associated with diseases.
The technique is scalable and suitable for large-scale genetic screening, which is particularly valuable in research focused on discovering novel mutations and understanding the genetic basis of various conditions.
Whole Exome Sequencing is more affordable than whole-genome sequencing, making it a cost-effective and practical choice for both clinical diagnostics and a wide range of genetic research applications.
Limitations of Whole Exome Sequencing
Whole Exome Sequencing covers only the exonic regions of the genome, so it overlooks potentially functional non-coding elements such as regulatory regions, promoters, and introns. As a result, clinically relevant variants that lie outside the coding regions can be missed.
The method may encounter challenges related to sequencing depth and accuracy, such as uneven read coverage and errors during the alignment of reads to the reference genome, which can compromise the reliability of variant detection.
WES is less effective at detecting structural variants such as copy number variants (CNVs) and larger insertions/deletions. Coverage after capture is uneven, and the breakpoints of such alterations often fall outside the targeted exons.
The exome enrichment (capture) step can introduce technical variability and bias, making it difficult to capture all intended target regions consistently. Certain regions, such as GC-rich sequences, may hybridize poorly to the probes, leaving important genes with incomplete or missing data.
This technique may also reveal secondary or incidental findings, that is, disease-associated variants unrelated to the primary reason for testing. Such findings raise ethical concerns and complicate clinical interpretation and counseling, particularly when patients and providers must decide how to handle unexpected information.
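Per-base and per-exon read depth make two of these limitations concrete: uneven coverage and the difficulty of detecting CNVs. The Python sketch below reports two coverage figures:

- mean depth;
- the fraction of target bases at or above a depth threshold.

It also computes median-normalized log2 depth ratios between a sample and a reference. In a diploid genome, a one-copy loss gives a ratio near -1 and a one-copy gain a ratio near 0.58. The 20x threshold and the exon names are illustrative assumptions, and all depths must be non-zero.

```python
import math
from statistics import mean, median


def coverage_summary(depths, min_depth=20):
    """Mean depth and fraction of target bases covered at >= min_depth."""
    return mean(depths), sum(d >= min_depth for d in depths) / len(depths)


def exon_log2_ratios(sample_depths, reference_depths):
    """Per-exon log2 ratio of median-normalized depth, sample versus reference.

    Dividing by each sample's median exon depth cancels differences in total
    sequencing output. Assumes both dicts share keys and all depths are > 0.
    """
    s_med = median(sample_depths.values())
    r_med = median(reference_depths.values())
    return {
        exon: math.log2((sample_depths[exon] / s_med) / (reference_depths[exon] / r_med))
        for exon in sample_depths
    }


if __name__ == "__main__":
    print(coverage_summary([5, 25, 30, 40]))
    sample = {"exon1": 100, "exon2": 100, "exon3": 50, "exon4": 100}
    reference = {"exon1": 200, "exon2": 200, "exon3": 200, "exon4": 200}
    print(exon_log2_ratios(sample, reference))  # exon3 looks like a one-copy loss
```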
Applications of Whole Exome Sequencing
WES is widely used for diagnosing genetic disorders, particularly when many different genes could be responsible for a patient's condition, because it examines all of them in a single test. It helps identify the mutations responsible for clinically complex or unexplained conditions.
In prenatal diagnosis, WES can identify the genetic cause of fetal abnormalities detected by ultrasound, allowing early identification of potential developmental disorders or congenital conditions. The results also clarify how the abnormality is inherited, which informs counseling for future pregnancies.
WES is valuable in identifying inherited mutations that lead to Mendelian disorders, which are caused by mutations in a single gene, and can help in providing accurate diagnoses and family counseling.
The technique is instrumental in the study of novel and rare genetic variants linked to various diseases. This enables researchers to discover new potential therapeutic targets for drug development.
WES helps in pharmacogenomics, where it is used to investigate how genetic variations influence an individual's response to drugs. This knowledge contributes to the creation of personalized medicines tailored to specific genetic profiles.
It is extensively used in cancer research, particularly in identifying genetic mutations that drive cancer development. This allows for better understanding of tumor biology and the potential for targeted cancer therapies.
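For Mendelian disorders, one common strategy is to sequence the affected child together with both parents (a trio) and filter variants by inheritance model. The Python sketch below classifies variants from alt-allele counts (0, 1 or 2) and flags genes with a possible compound heterozygous pair. It assumes accurate genotypes and autosomal inheritance. It ignores mosaicism, phasing and X-linked inheritance, and the record fields and gene names are hypothetical.

```python
from collections import defaultdict


def inheritance_pattern(child: int, mother: int, father: int) -> str:
    """Classify one variant from alt-allele counts (0, 1 or 2) in a trio."""
    if child == 1 and mother == 0 and father == 0:
        return "de_novo"
    if child == 2 and mother == 1 and father == 1:
        return "homozygous_recessive"
    if child == 1 and mother >= 1 and father == 0:
        return "maternal_het"
    if child == 1 and father >= 1 and mother == 0:
        return "paternal_het"
    return "other"


def compound_het_genes(variants) -> list[str]:
    """Genes in which the child carries one maternal and one paternal het variant."""
    patterns = defaultdict(set)
    for v in variants:
        patterns[v["gene"]].add(inheritance_pattern(v["child"], v["mother"], v["father"]))
    return sorted(g for g, p in patterns.items() if {"maternal_het", "paternal_het"} <= p)


if __name__ == "__main__":
    trio_variants = [
        {"gene": "GENE_A", "child": 1, "mother": 1, "father": 0},
        {"gene": "GENE_A", "child": 1, "mother": 0, "father": 1},
        {"gene": "GENE_B", "child": 1, "mother": 1, "father": 0},
    ]
    print(compound_het_genes(trio_variants))
```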
Whole Exome Sequencing (WES) vs. Whole Genome Sequencing (WGS)
Sequencing Coverage
Whole Exome Sequencing (WES): Sequences only the exonic regions of the genome, which code for proteins.
Whole Genome Sequencing (WGS): Sequences the entire genome, including both coding (exons) and non-coding regions, offering a more comprehensive approach.
Data Size and Complexity
Whole Exome Sequencing (WES): Produces smaller datasets, which simplifies data processing and analysis, reducing the computational and storage requirements.
Whole Genome Sequencing (WGS): Generates larger datasets due to sequencing the entire genome, requiring more computational resources and storage capacity.
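The difference in data volume can be estimated from the number of bases that must be sequenced. The sketch below assumes a 50 Mb exome target at 100x mean depth with 70% of bases on target, and a 3.1 Gb genome at 30x. These are illustrative figures, not specifications.

```python
def bases_to_sequence(target_bp: float, mean_depth: float,
                      on_target_fraction: float = 1.0) -> float:
    """Raw bases needed for the on-target bases to reach the requested mean depth."""
    return target_bp * mean_depth / on_target_fraction


if __name__ == "__main__":
    wes = bases_to_sequence(50e6, 100, on_target_fraction=0.7)  # assumed exome setup
    wgs = bases_to_sequence(3.1e9, 30)                          # assumed genome setup
    print(f"WES ~ {wes / 1e9:.1f} Gb, WGS ~ {wgs / 1e9:.1f} Gb, ratio ~ {wgs / wes:.0f}x")
```

Under these assumptions, WGS requires roughly 13 times as many raw bases as WES. Storage and compute demands scale accordingly.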
Cost
Whole Exome Sequencing (WES): More affordable and cost-effective, making it a practical option for genetic analysis, especially in clinical settings.
Whole Genome Sequencing (WGS): More expensive than WES because it sequences the entire genome.
Advantages
Whole Exome Sequencing (WES): Focuses on the exonic regions, where most disease-causing mutations occur, making it easier to identify mutations that directly affect proteins.
Whole Genome Sequencing (WGS): Provides a more thorough analysis by detecting mutations across the entire genome, including well-studied and lesser-understood regions, without the need for prior knowledge of the disease.
Limitations
Whole Exome Sequencing (WES): Does not capture non-coding regions, which means it may miss crucial regulatory mutations that could play a role in disease development.
Whole Genome Sequencing (WGS): Generates large amounts of data, which increases costs and complicates data analysis and interpretation.
Applications
Whole Exome Sequencing (WES): Primarily used in clinical settings to identify variants in protein-coding regions associated with various diseases, especially when focusing on specific genetic conditions.
Whole Genome Sequencing (WGS): Ideal for research purposes, enabling the discovery of novel variants, studying structural variations, and analyzing non-coding regions for insights into disease mechanisms.
Use Cases
Whole Exome Sequencing (WES): Best suited for targeted studies that focus on protein-coding regions, making it ideal for diagnosing genetic disorders caused by mutations in these regions.
Whole Genome Sequencing (WGS): Recommended for comprehensive genomic projects that require in-depth analysis of both coding and non-coding regions across the entire genome, although it demands more extensive resources.