Exons are the segments of a gene that are retained in the mature RNA after splicing. The complete set of exons in a genome is known as the exome. Note that 'whole-exome sequencing' in practice targets mainly the exons of protein-coding genes, with only minimal coverage of non-coding genes.
A gene is a sequence of nucleotides in DNA that carries specific genetic information. Genes are segments of the DNA molecule with hereditary effects, serving as the fundamental units of inheritance that control biological traits. Human genes vary widely in size, from a few hundred base pairs (bp) to over 2 million bp. The Human Genome Project estimated that humans possess 20,000-25,000 protein-coding genes.
Recommended: What is the Human Genome Project (HGP)?
The 'genome' encompasses all the genetic information within an organism's DNA. It consists of gene regions and non-coding regions. The human genome is approximately 3 billion base pairs (3 Gb) in size, with non-coding regions constituting the majority, while protein-coding regions make up only about 2%.
The exome is the entirety of exons in the genome. Human exons number around 180,000 and represent about 1% of the human genome, equivalent to approximately 30 million base pairs (30 Mb).
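As a quick sanity check on the figures above, the arithmetic can be sketched in a few lines of Python. The constants are the approximate values quoted in the text; the function names are illustrative only.

```python
# Approximate figures quoted in the text (not exact measurements).
GENOME_SIZE_BP = 3_000_000_000  # ~3 Gb human genome
EXOME_SIZE_BP = 30_000_000      # ~30 Mb exome
EXON_COUNT = 180_000            # ~180,000 human exons


def exome_fraction(exome_bp: int, genome_bp: int) -> float:
    """Fraction of the genome occupied by the exome."""
    return exome_bp / genome_bp


def mean_exon_length(exome_bp: int, exon_count: int) -> float:
    """Average exon length implied by the exome size and exon count."""
    return exome_bp / exon_count


print(f"Exome share of genome: {exome_fraction(EXOME_SIZE_BP, GENOME_SIZE_BP):.1%}")
print(f"Implied mean exon length: {mean_exon_length(EXOME_SIZE_BP, EXON_COUNT):.0f} bp")
```

With these rounded inputs the exome works out to 1% of the genome, and the implied average exon length is on the order of 170 bp.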
Please read our article: Exome Sequencing Q&A for more information.
An important point about exons concerns the untranslated regions (UTRs). These comprise the 5' UTR (leader sequence) and the 3' UTR (trailer sequence), which flank the coding sequence at either end of the mRNA. Their primary function is to regulate the initiation and termination of translation, respectively. Although UTRs are exonic sequences, they are not translated into amino acids. Consequently, not every exonic sequence is translated into protein.
Please read our article The Methods of Whole Genome Sequencing for more details about WGS.
CD Genomics high-throughput sequencing and long-read sequencing platforms facilitate the robust analysis of exomes and genomes. This advanced sequencing approach allows for comprehensive and efficient examination of genetic material, providing valuable insights into the molecular landscape and potential biomarkers associated with various conditions.
Whole-exome sequencing (WES), also called exome sequencing, is a sequencing method designed to analyze the exome, that is, all exons within the genome.
In contrast, whole-genome sequencing (WGS) sequences the entire genome, providing a comprehensive view of an organism's genetic makeup. Targeted sequencing, also known as panel sequencing, sequences only selected genes, typically ranging from a few dozen to a thousand genes. Panel sequencing relies on one of two technical principles: hybridization capture or multiplex amplicon sequencing. Whole-exome sequencing is achieved through hybridization capture.
Recommended article: Whole Exome Sequencing Based on Hybridization Capture Protocol
Therefore, in terms of genome coverage, the hierarchy is as follows: whole genome sequencing > whole exome sequencing > targeted sequencing.
It's important to recognize that whole-exome sequencing can be considered a specialized form of targeted sequencing, as its focus is directed towards the entirety of exonic regions in the genome.
Table 1 Differences between WGS, WES and targeted NGS panels
Feature | WGS | WES | Panel |
Sequencing Region | Whole genome | Whole exome | Selected regions |
Region Size | 3 Gb | > 30 Mb | Tens to thousands of genes |
Sequencing Depth | > 30X | 50-150X | > 500X |
Data Volume | > 90 Gb | 5-10 Gb | - |
Detectable Variant Types | SNPs, InDels, CNV, Fusion, SV | SNPs, InDels, CNV, Fusion | SNPs, InDels, CNV, Fusion |
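The data volumes in Table 1 follow roughly from region size multiplied by sequencing depth, inflated by whatever fraction of the data is lost to off-target or duplicate reads. The sketch below illustrates that relationship. The function name is hypothetical, and the 60% on-target rate used for WES is an assumed value for illustration, not a figure from this article.

```python
def raw_data_needed_bp(target_size_bp: float,
                       mean_depth: float,
                       on_target_rate: float = 1.0,
                       duplication_rate: float = 0.0) -> float:
    """Estimate the raw sequencing output (in bases) needed to reach a mean depth.

    Simplified model: usable data = raw data * on_target_rate * (1 - duplication_rate).
    """
    if not 0 < on_target_rate <= 1:
        raise ValueError("on_target_rate must be in (0, 1]")
    if not 0 <= duplication_rate < 1:
        raise ValueError("duplication_rate must be in [0, 1)")
    usable_fraction = on_target_rate * (1 - duplication_rate)
    return target_size_bp * mean_depth / usable_fraction


# WGS: the whole genome is the target, so no capture step and no off-target loss.
wgs = raw_data_needed_bp(3_000_000_000, 30)
# WES: 30 Mb target at 100X, with an ASSUMED 60% on-target rate.
wes = raw_data_needed_bp(30_000_000, 100, on_target_rate=0.6)

print(f"WGS at 30X: about {wgs / 1e9:.0f} Gb")
print(f"WES at 100X: about {wes / 1e9:.1f} Gb")
```

Under these assumptions WGS at 30X requires about 90 Gb and WES at 100X about 5 Gb, which is consistent with the ranges in the table.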
Recommended article: How to Decide Between 100X Whole Exome Sequencing (WES) and 30X Whole Genome Sequencing (WGS)?
The Whole Exome Sequencing (WES) workflow can be broadly categorized into three main stages: library preparation, sequencing, and bioinformatics analysis.
Library Preparation
Sequencing
Sequencing is performed on high-throughput platforms, including foreign platforms such as Illumina and domestic platforms such as those manufactured by UWI.
Bioinformatics Analysis
Please read our article: Bioinformatics Workflow of Whole Exome Sequencing.
When assessing the performance of an Exome Panel Capture Probe, several criteria play a crucial role. Whether opting for off-the-shelf probes or considering customization, careful consideration is essential. Here are key aspects to evaluate:
Probe Evaluation Criteria
Additional Considerations for Off-The-Shelf Probes
Customization Options
Special Features in External Probes
Commonly Used Metrics for WES Probes Evaluation
Evaluation of WES Probes commonly involves the following metrics. Of course, given that WES is a specific type of targeted sequencing, these metrics can also be used to assess hybridization capture probes used in other targeted sequencing.
On-Target Rate
The on-target rate is the percentage of sequencing data that aligns to the target region. Although exons are the intended target, many other genomic areas, such as introns and intergenic regions, share homology with exons and can be captured along with them. In practice, any non-target (non-exonic) regions captured during hybridization are considered off-target. Off-target data cannot be used in subsequent analyses and therefore represents wasted sequencing. A higher on-target rate, and correspondingly less off-target waste, indicates a more efficient probe.
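A minimal sketch of a read-based on-target rate is shown below, assuming reads and targets are given as half-open `(chrom, start, end)` intervals. The function names are illustrative, and real pipelines may define the metric per base rather than per read, or pad the target regions, so results from production tools can differ.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def on_target_rate(reads: List[Interval], targets: List[Interval]) -> float:
    """Fraction of aligned reads that overlap any target interval."""
    if not reads:
        return 0.0
    on_target = sum(1 for r in reads if any(overlaps(r, t) for t in targets))
    return on_target / len(reads)


# Toy data: two target exons and four aligned reads.
targets = [("chr1", 100, 200), ("chr1", 500, 600)]
reads = [
    ("chr1", 90, 140),   # overlaps the first target
    ("chr1", 150, 200),  # overlaps the first target
    ("chr1", 300, 350),  # intronic, off-target
    ("chr2", 100, 150),  # different chromosome, off-target
]
print(f"On-target rate: {on_target_rate(reads, targets):.0%}")
```

The linear scan over targets keeps the example short; for real exome-scale data an interval tree or a sorted sweep would be the more practical choice.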
Coverage
Coverage, usually quoted together with a depth threshold (e.g., "10X coverage" or "30X coverage"), denotes the proportion of the target region that is covered by sequencing reads at or above that depth. For instance, "10X coverage of 90%" indicates that 90% of the bases in the target region are covered by at least 10 reads. If coverage is quoted without a depth, it is interpreted as "1X coverage," meaning the proportion of the region covered by at least one read. Higher coverage, and hence a lower percentage of missed targets, indicates a more effective probe.
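The definition above can be illustrated with a short sketch that takes a list of per-base depths over the target region and reports the fraction of bases at or above a chosen depth. The function name and the toy depth values are made up for illustration.

```python
from typing import Sequence


def coverage_at_depth(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of target bases covered by at least `min_depth` reads."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Toy per-base depths for a 10 bp target region.
depths = [0, 5, 12, 30, 10, 9, 45, 22, 15, 11]

print(f"1X coverage:  {coverage_at_depth(depths):.0%}")
print(f"10X coverage: {coverage_at_depth(depths, min_depth=10):.0%}")
```

In this toy region, 9 of 10 bases have at least one read (90% 1X coverage), while 7 of 10 reach a depth of 10 (70% 10X coverage).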
Homogeneity
Homogeneity assesses the evenness of coverage across different sites within the target region. With ideal uniformity, the depth at each site is close to the average depth. Fold-80, a common homogeneity metric, is the factor by which sequencing would have to be increased so that 80% of target bases reach the current average depth. A value close to 1 indicates nearly uniform coverage, and a lower Fold-80 signifies more even capture with less wasteful sequencing. Probes with excellent homogeneity contribute to cost-efficient and effective sequencing.
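One common way to compute Fold-80 is to divide the mean depth by the depth that 80% of target bases meet or exceed (the 20th percentile of per-base depth). The sketch below uses that simplified definition with a nearest-rank percentile; the function name is illustrative, and established tools may handle zero-coverage targets and percentile interpolation differently.

```python
import math
from typing import Sequence


def fold_80(depths: Sequence[int]) -> float:
    """Mean depth divided by the depth reached by at least 80% of target bases."""
    if not depths:
        raise ValueError("depths must not be empty")
    ordered = sorted(depths)
    mean_depth = sum(ordered) / len(ordered)
    # Nearest-rank 20th percentile: at least 80% of bases have depth >= this value.
    p20_depth = ordered[len(ordered) // 5]
    if p20_depth == 0:
        return math.inf  # more than 20% of bases are uncovered
    return mean_depth / p20_depth


uniform = [20] * 10
uneven = [2, 4, 10, 10, 20, 20, 24, 30, 40, 40]  # same mean depth of 20

print(f"Fold-80, uniform coverage: {fold_80(uniform):.1f}")
print(f"Fold-80, uneven coverage:  {fold_80(uneven):.1f}")
```

Both toy regions have a mean depth of 20, but the uneven one has a Fold-80 of 2.0: roughly twice as much sequencing would be needed to bring 80% of its bases up to that mean.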
Duplication Rate
Duplication rate is the percentage of duplicate reads among all sequenced reads. Duplicate reads carry no additional information and are removed in downstream analysis to improve the accuracy of mutation detection. A higher duplication rate reduces data utilization and wastes sequencing cost. Under the same conditions, a lower duplication rate saves cost and indicates a more efficient probe.
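As a rough illustration, the sketch below treats reads that share the same chromosome, alignment start, and strand as duplicates of one another, keeps one copy of each, and counts the rest. This key is a deliberate simplification and an assumption of this example; dedicated duplicate-marking tools typically also consider mate position and orientation for paired-end data.

```python
from collections import Counter
from typing import List, Tuple

ReadKey = Tuple[str, int, str]  # (chromosome, alignment start, strand)


def duplication_rate(reads: List[ReadKey]) -> float:
    """Fraction of reads that are extra copies of an already-seen read key."""
    if not reads:
        return 0.0
    counts = Counter(reads)
    # For each distinct key, one read is kept and the remainder are duplicates.
    duplicates = sum(count - 1 for count in counts.values())
    return duplicates / len(reads)


reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),  # duplicate
    ("chr1", 100, "+"),  # duplicate
    ("chr1", 250, "-"),
    ("chr2", 40, "+"),
]
print(f"Duplication rate: {duplication_rate(reads):.0%}")
```

Here two of the five reads are redundant copies, giving a duplication rate of 40%.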