Single-cell RNA sequencing has become a prominent and rapidly developing field. It offers insights that traditional bulk RNA sequencing cannot provide, particularly in developmental biology, tumor biology, immunology, and related fields. At the heart of single-cell analysis lie dimensionality reduction (such as t-SNE) and clustering techniques, which facilitate the exploration and analysis of the data. However, the success or failure of the entire analysis relies heavily on careful quality control performed before these steps. In this article, we provide a comprehensive overview of single-cell quality control.
Various factors introduce bias in single-cell RNA sequencing, including:
By understanding and addressing these biases, researchers can enhance the reliability and validity of single-cell RNA sequencing studies.
Prior to performing single-cell sequencing, it is essential to separate the cells effectively. Failure to do so within a limited timeframe can adversely impact cell integrity, potentially resulting in RNA leakage from the cells. Here are several important factors to bear in mind when isolating single cells from tissues:
Therefore, when analyzing clustering results, it is crucial to thoroughly examine whether there are genes that exhibit specific expression patterns in particular cell groups, which could be attributed to the cell separation experiment.
When it comes to cell sorting, we encounter several challenges, including:
To address these challenges, different strategies for sequencing single cells have been developed. It is crucial to carefully select the appropriate single-cell strategy for studying specific tissues. Furthermore, low cell quality or the presence of dead cells or cellular debris can result in multiple cells being encapsulated within droplets. During the subsequent data analysis, these droplets may either form a distinct cluster or appear enriched between two cell groups.
To determine the presence of droplets containing multiple cells, the following criteria are typically used:
Currently, several software tools are available to assist in identifying doublets, such as:
These doublet detection algorithms exhibit similarities in their approach and follow a basic principle consisting of the following steps:
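As a minimal sketch of one common form of this principle, assumed here for illustration rather than taken from any specific tool: simulate artificial doublets by summing random pairs of observed cells, normalize observed and simulated profiles together, and score each observed cell by the fraction of simulated doublets among its nearest neighbors. The function name `doublet_scores` and its parameters are hypothetical.

```python
import numpy as np

def doublet_scores(counts, n_sim=None, k=10, seed=0):
    """Score each observed cell by the fraction of simulated doublets
    among its k nearest neighbors. counts is a (cells x genes) array."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]
    n_sim = n_cells if n_sim is None else n_sim

    # 1. Simulate doublets by summing the counts of random pairs of cells.
    pairs = rng.integers(0, n_cells, size=(n_sim, 2))
    simulated = counts[pairs[:, 0]] + counts[pairs[:, 1]]

    # 2. Normalize observed and simulated profiles identically
    #    (library-size scaling followed by log1p).
    combined = np.vstack([counts, simulated])
    lib = combined.sum(axis=1, keepdims=True)
    lib[lib == 0] = 1.0
    logged = np.log1p(combined / lib * 1e4)

    # 3. Brute-force squared Euclidean distances from each observed cell
    #    to every observed and simulated profile.
    sq = (logged ** 2).sum(axis=1)
    d2 = sq[:n_cells, None] + sq[None, :] - 2.0 * logged[:n_cells] @ logged.T
    d2[np.arange(n_cells), np.arange(n_cells)] = np.inf  # exclude self
    neighbors = np.argpartition(d2, k, axis=1)[:, :k]

    # 4. Columns with index >= n_cells are simulated doublets.
    return (neighbors >= n_cells).mean(axis=1)
```

Cells with a high score sit in a neighborhood dominated by simulated doublets and are candidates for removal. Real tools typically work in a reduced space (for example, on principal components) and calibrate the score threshold, which this sketch does not attempt.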
Prior to single-cell sequencing, the cells must be lysed. The appropriate lysis conditions vary with the cell type or tissue being studied. If the lysis conditions are too harsh, they will adversely affect library preparation.
The efficiency of reverse transcription is of utmost importance, because transcripts that are not reverse transcribed go undetected (dropout). The dropout rate typically ranges from 60% to 90%. Even when the same cell line is processed in the same manner, the dropout rate may vary significantly between two separately prepared libraries.
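A rough way to compare libraries is to compute the fraction of zero entries per cell. This is only a proxy, since zeros also include genes that are truly not expressed. The sketch below assumes a (cells x genes) count matrix; the function name `zero_fraction_per_cell` is hypothetical.

```python
import numpy as np

def zero_fraction_per_cell(counts):
    """Fraction of genes with zero counts in each cell.

    counts is a (cells x genes) array. The result is an upper bound on the
    dropout rate, because unexpressed genes are also counted as zeros."""
    counts = np.asarray(counts)
    return (counts == 0).mean(axis=1)

# Example: compare the mean zero fraction of two small libraries.
library_a = np.array([[0, 0, 1, 5], [1, 2, 0, 3]])
library_b = np.array([[0, 0, 0, 4], [0, 1, 0, 0]])
mean_a = zero_fraction_per_cell(library_a).mean()
mean_b = zero_fraction_per_cell(library_b).mean()
```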
Every amplification step can introduce bias. Many single-cell transcriptome sequencing techniques use Unique Molecular Identifiers (UMIs) to help correct for amplification-induced bias. However, full-length transcriptome protocols such as Smart-seq2 lack UMIs, so amplification bias cannot be corrected with UMI-based methods.
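The idea behind UMI-based correction can be sketched as follows: reads that share the same cell barcode, gene, and UMI are assumed to be amplification copies of one original molecule and are counted once. The read tuples and the function name `count_umis` are illustrative assumptions, and the sketch ignores sequencing errors within UMIs.

```python
from collections import defaultdict

def count_umis(reads):
    """Collapse reads into molecule counts per (cell, gene).

    reads is an iterable of (cell_barcode, gene, umi) tuples. Reads with an
    identical triple are treated as PCR duplicates of a single molecule."""
    molecules = defaultdict(set)
    for cell, gene, umi in reads:
        molecules[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in molecules.items()}

reads = [
    ("cell1", "ACTB", "AAAT"),
    ("cell1", "ACTB", "AAAT"),   # duplicate of the read above
    ("cell1", "ACTB", "CCGT"),
    ("cell1", "GAPDH", "AAAT"),
    ("cell2", "ACTB", "AAAT"),
]
umi_counts = count_umis(reads)
```

Here three reads for ACTB in cell1 collapse to two molecules, so the over-amplified molecule no longer inflates the count.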
Spike-in RNAs are a collection of RNA transcripts with known sequences that are added at known concentrations during library construction. Commonly used spike-in sets include:
Applications of Spike-ins:
Limitations of Spike-ins:
Despite their utility, spike-ins still differ from endogenous genes, particularly in terms of amplification bias. This disparity must be taken into consideration when interpreting the results. Furthermore, spike-ins are generally not used in droplet-based methods such as Drop-seq.
Typically, the checkpoints for quality control (QC) include the following:
A low ratio or a low number of reads usually points to problems with library construction. In particular, a low number of reads may result from increased primer-dimer formation.
The absence of spike-in RNA sequences directly indicates failure in library construction. However, if the spike-in is normal and the cell exhibits a low number of RNA sequences, it could be due to the small size of the cell or damage to the cell before library construction.
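This decision logic can be sketched per cell as below. The `ERCC-` gene-name prefix, the `min_endogenous` cutoff, the status labels, and the function name `spike_in_qc` are all assumptions made for illustration; appropriate values depend on the spike-in set and the protocol.

```python
def spike_in_qc(cell_counts, spike_prefix="ERCC-", min_endogenous=1000):
    """Classify one cell from its spike-in and endogenous counts.

    cell_counts maps gene name -> count. Spike-in transcripts are assumed
    to be identifiable by a name prefix (hypothetical default: "ERCC-")."""
    spike = sum(n for gene, n in cell_counts.items()
                if gene.startswith(spike_prefix))
    endogenous = sum(cell_counts.values()) - spike
    total = spike + endogenous

    if spike == 0:
        status = "library_failed"   # no spike-in reads at all
    elif endogenous < min_endogenous:
        status = "low_rna"          # spike-in present, little cellular RNA
    else:
        status = "ok"

    fraction = spike / total if total else float("nan")
    return {"spike_in": spike, "endogenous": endogenous,
            "spike_fraction": fraction, "status": status}
```

A cell labeled `low_rna` may simply be small or may have been damaged before library construction, so the label is a prompt for inspection rather than a verdict.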
The number of detected genes is directly linked to the size of the cell. If an excessive number of genes (or UMIs) is detected, it is likely that multiple cells are present within the droplet. However, it cannot be ruled out that the cell itself is simply very large. As shown below, having too many or too few genes is not considered normal.
Generally, there is a positive correlation between cell size, spike-in RNA ratio, and the number of detected genes. Elevated levels of mitochondrial RNA also indicate a broken cell. When a cell breaks, cytoplasmic RNA is released, but mitochondrial RNA remains encapsulated within the mitochondrial membrane. Therefore, when the cell membrane is damaged, the percentage of mitochondrial RNA becomes elevated. Note: This phenomenon can also occur during apoptosis or necrosis.
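The per-cell metrics discussed here (total counts, number of detected genes, and percentage of mitochondrial RNA) can be computed with a few lines of NumPy. The sketch assumes a (cells x genes) count matrix and that mitochondrial genes can be recognized by an `MT-` name prefix, as in human gene symbols; both the prefix and the function name `qc_metrics` are assumptions.

```python
import numpy as np

def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Per-cell QC metrics from a (cells x genes) count matrix.

    Mitochondrial genes are assumed to start with mito_prefix; the match is
    case-insensitive so that mouse-style names ("mt-") are also caught."""
    counts = np.asarray(counts, dtype=float)
    prefix = mito_prefix.upper()
    is_mito = np.array([g.upper().startswith(prefix) for g in gene_names])

    total = counts.sum(axis=1)
    n_genes = (counts > 0).sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Avoid division by zero for empty cells.
    pct_mito = np.divide(100.0 * mito, total,
                         out=np.zeros_like(total), where=total > 0)
    return {"total_counts": total, "n_genes": n_genes, "pct_mito": pct_mito}

metrics = qc_metrics([[10, 0, 90], [5, 5, 0]], ["MT-CO1", "ACTB", "GAPDH"])
```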
High levels of ribosomal RNA may indicate increased RNA degradation within the cell. In full-length single-cell transcriptome data, 3' bias can be used to identify substantial RNA degradation within the cell.
Usually, most cells follow the same trend, and we combine multiple metrics to remove the cells that do not qualify. It is therefore important to examine the distribution of each metric before deciding which cells to filter out.
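One data-driven way to combine metrics is to flag cells that lie several median absolute deviations (MADs) from the median of each metric. This is a sketch under stated assumptions: the 3-MAD cutoff, the log transform of counts, and the names `mad_outliers` and `keep_mask` are illustrative choices, not fixed rules.

```python
import numpy as np

def mad_outliers(values, n_mads=3.0, side="both"):
    """Flag values more than n_mads MADs away from the median."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    lower, upper = med - n_mads * mad, med + n_mads * mad
    if side == "lower":
        return values < lower
    if side == "upper":
        return values > upper
    return (values < lower) | (values > upper)

def keep_mask(total_counts, n_genes, pct_mito, n_mads=3.0):
    """Combine several metrics; a cell is kept only if it passes all of them."""
    bad = (
        mad_outliers(np.log1p(total_counts), n_mads, side="lower")  # too few reads
        | mad_outliers(np.log1p(n_genes), n_mads, side="both")      # too few or too many genes
        | mad_outliers(pct_mito, n_mads, side="upper")              # high mitochondrial share
    )
    return ~bad
```

Because the thresholds are derived from the data, they only make sense when most cells are of good quality, which is why inspecting the distributions first matters.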
PCA can also be used for QC, to find cells that clearly do not cluster with the other cells. Such cells are considered not to meet the quality control standards.
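A minimal sketch of this idea, assuming PCA is run on a (cells x metrics) matrix of QC metrics rather than on gene expression: standardize the metrics, project the cells onto the first principal components, and flag cells that lie far from the bulk. The distance rule (median plus 5 MADs) and the function names are assumptions.

```python
import numpy as np

def pca_scores(matrix, n_components=2):
    """Project standardized rows onto the first principal components via SVD."""
    x = np.asarray(matrix, dtype=float)
    x = x - x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0          # leave constant columns unscaled
    x = x / std
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def pca_outliers(qc_matrix, n_components=2, n_mads=5.0):
    """Flag cells whose distance from the median position in PC space
    exceeds median + n_mads * MAD of all distances."""
    scores = pca_scores(qc_matrix, n_components)
    dist = np.linalg.norm(scores - np.median(scores, axis=0), axis=1)
    med = np.median(dist)
    mad = np.median(np.abs(dist - med))
    return dist > med + n_mads * mad
```

Flagged cells are best inspected against the individual metrics before removal, since a rare but healthy cell type can also sit apart from the majority.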
The next step is gene filtering. In the vast majority of cases, we do not use all genes for dimensionality reduction, so a gene set must be selected.
The gene set is selected based on:
Only the first few principal components (PCs) are used as input for t-SNE dimensionality reduction.
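The sequence of steps can be sketched as follows. Selecting genes by variance is used here as an assumed, commonly applied criterion; the input is taken to be a log-normalized (cells x genes) matrix, and the function names are hypothetical.

```python
import numpy as np

def top_variable_genes(log_expr, n_top=2000):
    """Indices of the n_top genes with the highest variance across cells."""
    variances = np.var(np.asarray(log_expr, dtype=float), axis=0)
    n_top = min(n_top, variances.size)
    return np.argsort(variances)[::-1][:n_top]

def first_pcs(log_expr, n_pcs=10):
    """PC scores and explained-variance ratios for the leading components."""
    x = np.asarray(log_expr, dtype=float)
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    n_pcs = min(n_pcs, s.size)
    explained = s ** 2 / np.sum(s ** 2)
    return u[:, :n_pcs] * s[:n_pcs], explained[:n_pcs]

# Example on a tiny matrix: keep the 2 most variable genes, then reduce.
expr = np.array([[0, 1, 5], [0, 2, 0], [0, 1, 5], [0, 2, 0]], dtype=float)
selected = top_variable_genes(expr, n_top=2)
pcs, explained = first_pcs(expr[:, selected], n_pcs=2)
```

The resulting `pcs` matrix, rather than the full expression matrix, would then be passed to a t-SNE implementation; the explained-variance ratios help decide how many PCs to keep.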
One of the most challenging issues in single-cell RNA sequencing revolves around batch effects. Batch effects can manifest in various scenarios, such as:
To mitigate batch effects, it is essential to establish distinct quality control standards for different sample batches. One approach involves utilizing principal component analysis (PCA) to identify any conspicuous batch effects within the obtained results.
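One simple way to make a batch effect in PCA results quantitative is to ask how much of the variance along each principal component is explained by the batch label. The sketch below computes this as a between-batch to total sum-of-squares ratio per PC; the function name `batch_r2_per_pc` and this particular statistic are assumptions chosen for illustration.

```python
import numpy as np

def batch_r2_per_pc(expr, batches, n_pcs=5):
    """Fraction of variance along each PC that is explained by batch.

    expr is a (cells x genes) matrix and batches holds one label per cell.
    Values near 1 mean the component mostly separates batches."""
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    n_pcs = min(n_pcs, s.size)
    pcs = u[:, :n_pcs] * s[:n_pcs]

    # PC scores have zero mean, so the total sum of squares is sum(pcs**2).
    total_ss = (pcs ** 2).sum(axis=0)
    between_ss = np.zeros(n_pcs)
    for label in np.unique(batches):
        members = pcs[batches == label]
        between_ss += len(members) * members.mean(axis=0) ** 2
    return np.divide(between_ss, total_ss,
                     out=np.zeros(n_pcs), where=total_ss > 0)
```

A high value on one of the leading PCs suggests that batch, rather than biology, drives the main axis of variation; this sketch cannot distinguish the two when batches and biological conditions are confounded.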