Isoform sequencing (Iso-Seq) is a PacBio method, based on single-molecule real-time (SMRT) sequencing, for the analysis and functional characterization of full-length transcripts. Because each read spans an entire cDNA molecule, Iso-Seq provides high-quality full-length transcript sequences without the need for transcriptome assembly. The method is especially suitable for gene annotation, isoform discovery, and studies of alternative splicing, alternative polyadenylation (APA), and gene fusions.
The core Iso-Seq workflow consists of the following key steps:
Conversion of RNA to cDNA: RNA samples are first reverse transcribed into cDNA, which serves as the template for library construction and sequencing.
Construction of the cDNA library: Full-length cDNA is ligated to hairpin adapters to form SMRTbell templates, the circular library format required for PacBio sequencing.
Sequencing: The library is sequenced on a Sequel system to generate high-quality long-read data.
Generation of circular consensus sequences (CCS): Because the polymerase can read around each circular template several times, PacBio's CCS algorithm combines the repeated passes into a highly accurate consensus read.
Data analysis: Dedicated software (such as SMRT Link or TAGET) is used to analyze the data, including identifying transcript isoforms, splicing variants, polyadenylation sites, and so on.
Schematic protocol of Iso-Seq (An et al., 2018)
Importance of a comprehensive guide for Iso-Seq experiments
Because an Iso-Seq experiment involves many steps, from sample preparation to data analysis, each of which requires careful execution and optimization, a comprehensive experimental guide is essential for a successful experiment and reliable results. For example, RNA quality control during sample preparation, optimization of library construction, and adjustment of sequencing parameters all need to be considered.
For data analysis, PacBio recommends SMRT Link for transcript clustering and isoform analysis, followed by further functional characterization with community tools such as SQANTI, TAMA, and LoReAn. In addition, different sample types (for example, single-cell samples) require appropriate kits and analysis workflows.
To improve the efficiency of data analysis and reduce errors, researchers need a systematic workflow for Iso-Seq data, covering quality control, clustering, alignment and annotation of the raw sequencing data, differential expression analysis, and functional annotation. As research deepens, integrating Iso-Seq data with other omics data (such as RNA-Seq and proteomics data) to obtain more comprehensive biological insights is also an important current research direction.
In outline, the Iso-Seq process is as follows: high-quality RNA is extracted from the samples and reverse transcribed into full-length cDNA; the cDNA is amplified, optionally size-selected, and ligated to sequencing adapters to build the library; the library is loaded onto a PacBio sequencer, where single-molecule real-time sequencing produces long reads that contain complete transcript information; finally, dedicated analysis software performs sequence correction, clustering, and annotation to identify the different transcript isoforms accurately.
A. Sample preparation
a) Methods and tools: RNA extraction usually relies on phenol-chloroform extraction, silica spin columns, or magnetic beads. The phenol-chloroform method is simple but prone to contamination; silica columns give high purity but at higher cost; and magnetic beads combine speed, amenability to automation, and high recovery. Commercial kits are also widely used, such as the Qiagen RNeasy kits, which are based on silica-membrane technology and can efficiently isolate total RNA from many sample types, including tissues, cells, and blood. For plant samples, the CTAB (cetyltrimethylammonium bromide) method is often employed because it copes well with the high levels of polysaccharides and polyphenols in plant tissues. Centrifuges are essential for separating the phases during extraction, and spectrophotometers are used to measure the concentration of the extracted RNA.
The workflow of RNA extraction (Griffith et al., 2015)
b) Quality evaluation: The quality of the extracted RNA is vital for downstream applications. Purity can be assessed with a spectrophotometer by measuring the A260/A280 and A260/A230 ratios, both of which should ideally be close to 2.0 (roughly 1.8 to 2.2). Integrity is commonly assessed with the RNA Integrity Number (RIN), which ranges from 1 to 10, with 10 indicating the highest quality. RIN values are determined with automated capillary electrophoresis systems such as the Agilent 2100 Bioanalyzer. An RNA sample with a RIN of 7 or higher is generally considered suitable for Iso-Seq. A lower RIN indicates degraded RNA, which can lead to truncated transcripts and incomplete or inaccurate sequencing results.
Quality control and size selection of RNA (Griffith et al., 2015)
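The purity and integrity criteria above can be expressed as a simple pass/fail check. The sketch below is illustrative only: the class and function names are hypothetical, the acceptance window of 1.8 to 2.2 for both absorbance ratios is an assumed reading of the ~2.0 target, and the RIN cutoff of 7 follows the guideline in the text. The comments give common rules of thumb for interpreting low ratios.

```python
from dataclasses import dataclass


@dataclass
class RnaQc:
    """Spectrophotometer and Bioanalyzer readings for one RNA sample."""
    a260_280: float
    a260_230: float
    rin: float


def qc_issues(sample: RnaQc,
              ratio_window: tuple = (1.8, 2.2),
              min_rin: float = 7.0) -> list:
    """Return human-readable QC problems; an empty list means the sample passes."""
    low, high = ratio_window
    issues = []
    if not low <= sample.a260_280 <= high:
        # A low A260/A280 usually points to protein or phenol carryover.
        issues.append(f"A260/A280 = {sample.a260_280:.2f} is outside {low}-{high}")
    if not low <= sample.a260_230 <= high:
        # A low A260/A230 usually points to salt, guanidine or carbohydrate carryover.
        issues.append(f"A260/A230 = {sample.a260_230:.2f} is outside {low}-{high}")
    if sample.rin < min_rin:
        # Degraded RNA yields truncated cDNA and fewer full-length reads.
        issues.append(f"RIN = {sample.rin:.1f} is below {min_rin:.1f}")
    return issues


if __name__ == "__main__":
    # Hypothetical sample names and readings.
    samples = {"leaf_rep1": RnaQc(2.05, 2.10, 8.4),
               "root_rep1": RnaQc(1.62, 1.15, 5.9)}
    for name, reading in samples.items():
        problems = qc_issues(reading)
        print(name, "PASS" if not problems else "FAIL: " + "; ".join(problems))
```

Keeping the thresholds as parameters makes it easy to adopt stricter values if a sequencing provider requires them.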
a) Reverse transcription method: cDNA is synthesized with kits such as the SMARTer Total RNA-Seq Kit (Clontech) or the KAPA HiFi RNA-to-cDNA Synthesis Kit. These workflows reverse transcribe RNA into cDNA using random or oligo(dT) primers and use KAPA HiFi polymerase to amplify the cDNA efficiently. Oligo(dT) priming, which anneals to the poly(A) tail, is the standard choice for capturing full-length polyadenylated transcripts. Random hexamer priming is a more general approach that can bind anywhere along an RNA molecule, so it covers a wide range of transcripts, including non-polyadenylated ones, while gene-specific priming is used when only particular transcripts are of interest. Reverse transcriptases such as SuperScript III are commonly used in these reactions.
Schematic of cDNA synthesis (Griffith et al., 2015)
b) cDNA quality inspection: Quality checks after cDNA synthesis are important. Running the cDNA on an agarose gel shows whether synthesis succeeded (a smear or specific bands should be visible), and quantitative PCR (qPCR) can be used to assess the quantity and integrity of the cDNA. The cDNA should be at a sufficient, consistent concentration and free of inhibitors that could affect library preparation and sequencing. Its size distribution should also be checked to confirm that it is suitable for library construction.
Different size distribution of cDNA before and after size selection (Griffith et al., 2015)
B. Library construction
a) Size selection and amplification: Size selection is an important step in PacBio library construction. It can be performed by gel extraction or with size-selection beads, with the goal of isolating cDNA molecules within a defined size range, typically about 1-10 kb for full-length transcript analysis. The cDNA is then amplified by KAPA HiFi PCR to increase its concentration, and the amplified cDNA is ligated to SMRTbell adapters to produce the library for sequencing on the PacBio platform.
b) Quality control step: Quality control of the constructed library is essential. This includes measuring the concentration of the library using methods like Qubit fluorometry, which provides a more accurate measurement of the DNA concentration compared to spectrophotometry. Additionally, the library is analyzed using capillary electrophoresis or other high-throughput methods to check for the presence of the correct size distribution of fragments. Any libraries that do not meet the quality standards may need to be reamplified or reconstructed.
The most common Iso-Seq library construction methods (Erwin et al., 2014)
a) SMRT Cell loading: The SMRT Cell is the consumable in which sequencing takes place on the PacBio system. Loading involves carefully adding the prepared library to the cell at the correct concentration and volume. Before loading, the library is bound to the sequencing polymerase and mixed with the sequencing reagents, including nucleotides; each cell is loaded with about 100-200 ng of library.
b) Sequencing parameters: Several sequencing parameters need to be set, such as the movie length, which determines how long data are collected from each SMRT Cell. Longer movies allow the polymerase to make more passes around each circular template, which increases consensus accuracy and the likelihood of obtaining full-length transcripts, but they also increase the cost and time of the experiment. Other parameters, such as the polymerase binding time and the temperature settings, are optimized to ensure accurate and efficient sequencing. The PacBio Sequel II platform is recommended, and Iso-Seq libraries on this platform are typically run with a movie time of about 24 hours to obtain high-quality long-read data.
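Why movie length matters can be seen from a back-of-the-envelope estimate of how many times the polymerase traverses a circular SMRTbell template. This is a minimal sketch under stated assumptions: each pass is modeled as the insert plus one hairpin adapter (taken here as 45 nt, an assumption), partial passes and polymerase slowdown are ignored, and the 100 kb polymerase read length in the example is hypothetical.

```python
def estimate_full_passes(polymerase_read_len: int,
                         insert_len: int,
                         adapter_len: int = 45) -> int:
    """Rough number of complete passes over a circular SMRTbell template.

    Each pass is modeled as one traversal of the insert plus one hairpin
    adapter; polymerase slowdown and partial passes are ignored.
    """
    if insert_len <= 0 or polymerase_read_len < 0:
        raise ValueError("insert_len must be positive and polymerase_read_len non-negative")
    return polymerase_read_len // (insert_len + adapter_len)


if __name__ == "__main__":
    polymerase_read_len = 100_000  # hypothetical read length from a long movie
    for insert in (1_000, 2_000, 5_000, 10_000):
        passes = estimate_full_passes(polymerase_read_len, insert)
        print(f"{insert:>6} bp insert -> ~{passes} full passes")
```

Under these assumptions, longer transcripts receive fewer passes from the same polymerase read, which is why longer movies help consensus accuracy most for long inserts.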
C. Data generation and quality control
a) After the sequencing run, the instrument converts the recorded fluorescence signals into base calls, and the raw reads (subreads) are written to BAM files. Initial checks of the raw data include assessing signal quality, the number of reads generated, and the read-length distribution. Obvious signs of poor signal quality or low read counts may indicate problems with the sequencing run or the library preparation.
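Read counts and the read-length distribution are usually summarized with the mean, the median, and N50 (the length L such that reads of length L or longer contain at least half of all sequenced bases). A minimal sketch, assuming the reads have been exported to an uncompressed FASTQ file with four lines per record; the function names are hypothetical.

```python
from statistics import mean, median


def fastq_read_lengths(path: str) -> list:
    """Read lengths from an uncompressed FASTQ with 4 lines per record (no line wrapping)."""
    lengths = []
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # the sequence line of each record
                lengths.append(len(line.rstrip("\n")))
    return lengths


def read_length_stats(lengths: list) -> dict:
    """Summary statistics, including N50, for a list of read lengths."""
    if not lengths:
        raise ValueError("no reads")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if running >= half_total:
            n50 = length  # shortest read among the longest reads holding half the bases
            break
    return {"reads": len(ordered), "total_bases": sum(ordered),
            "mean": mean(ordered), "median": median(ordered), "n50": n50}
```

For example, `read_length_stats(fastq_read_lengths("reads.fastq"))` summarizes a file; the file name is a placeholder. A bimodal or unexpectedly short distribution is worth comparing with the size-selection range used during library construction.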
a) Error correction is a necessary step in Iso-Seq data processing because raw PacBio subreads have a relatively high error rate. In the standard PacBio workflow, much of this correction is performed by the CCS step, which builds a consensus from multiple passes over the same molecule; long-read correction tools such as Canu and Falcon, which correct errors by comparing multiple reads covering the same region, have also been used. After error correction, preprocessing steps such as primer and adapter trimming and filtering of low-quality reads are carried out. Tools such as PacBio's lima or BBDuk can be used for trimming, and FastQC can be used to assess the quality of the preprocessed reads.
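The gain from combining several passes over the same molecule can be illustrated with a toy majority-vote model. This is not PacBio's CCS algorithm, which uses a probabilistic model of the full alignment including insertions and deletions; the sketch assumes independent substitution errors, a hypothetical 10% per-pass error rate, and counts ties as errors. Accuracy is reported as a Phred quality value, QV = -10 log10(error probability), the scale on which HiFi reads are conventionally defined as Q20 (99% accuracy) or better.

```python
import math


def majority_error_probability(per_pass_error: float, passes: int) -> float:
    """P(a majority vote over `passes` independent reads calls a base wrongly).

    Toy model: each pass is wrong at the position with probability
    `per_pass_error`, errors are independent, all wrong calls are assumed to
    agree with each other (worst case), and ties count as errors.
    """
    if passes < 1 or not 0.0 <= per_pass_error <= 1.0:
        raise ValueError("need passes >= 1 and 0 <= per_pass_error <= 1")
    min_wrong = (passes + 1) // 2  # wrong calls needed to win or tie the vote
    return sum(math.comb(passes, k)
               * per_pass_error ** k
               * (1.0 - per_pass_error) ** (passes - k)
               for k in range(min_wrong, passes + 1))


def phred_qv(error_probability: float) -> float:
    """Phred quality value: QV = -10 * log10(error probability)."""
    return -10.0 * math.log10(error_probability)


if __name__ == "__main__":
    for n in (1, 3, 5, 7, 9):
        p_err = majority_error_probability(0.10, n)  # 10% per-pass error: an assumption
        print(f"{n} passes: error {p_err:.2e}, ~Q{phred_qv(p_err):.1f}")
```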
D. Data analysis
a) Recommended process and tools: Because Iso-Seq reads already span complete transcripts, the main task is to cluster redundant reads into unique transcript isoforms rather than to assemble fragments. The PacBio Iso-Seq pipeline is designed specifically for this purpose, and reference-guided tools such as StringTie and Cufflinks can also be used to reconstruct transcripts from aligned reads. Isoform identification then distinguishes the different splice variants of the same gene by comparing transcript structures and locating regions of alternative splicing.
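Because Iso-Seq reads are already full length, much of isoform identification comes down to grouping aligned reads that share the same splice junctions. The sketch below shows that core idea in simplified form, in the spirit of the collapse steps in tools such as TAMA; the read representation and function names are hypothetical, junctions must match exactly, and single-exon reads are not handled properly.

```python
from collections import defaultdict

# Hypothetical layout: read_id -> (chromosome, strand, list of (start, end) exon
# blocks taken from the read's genome alignment).


def intron_chain(exons):
    """Ordered (intron_start, intron_end) pairs implied by a list of exon blocks."""
    ordered = sorted(exons)
    return tuple((ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1))


def collapse_by_intron_chain(reads):
    """Group read IDs whose alignments share chromosome, strand and exact intron chain.

    Reads that differ only in their 5'/3' ends land in the same group. All
    single-exon reads on a chromosome/strand share the empty chain here; real
    tools treat them separately and also tolerate small junction offsets.
    """
    groups = defaultdict(list)
    for read_id, (chrom, strand, exons) in reads.items():
        groups[(chrom, strand, intron_chain(exons))].append(read_id)
    return dict(groups)


if __name__ == "__main__":
    reads = {
        "r1": ("chr1", "+", [(100, 200), (300, 400), (500, 600)]),
        "r2": ("chr1", "+", [(120, 200), (300, 400), (500, 650)]),  # same introns, different ends
        "r3": ("chr1", "+", [(100, 200), (500, 600)]),              # middle exon skipped
    }
    for (chrom, strand, chain), members in collapse_by_intron_chain(reads).items():
        print(chrom, strand, chain, members)
```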
a) Bioinformatics resources and databases: Functional annotation of the identified transcripts involves assigning biological functions to them. This can be done using resources such as the Gene Ontology (GO) database, which provides information on the molecular function, biological process, and cellular component of genes. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database is also useful for understanding the metabolic pathways and biological processes in which the genes are involved. For alternative splicing analysis, databases like the Human Splicing Database (HSD) can be used to compare the identified splicing events with known splicing patterns.
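After transcripts are annotated with GO terms or KEGG pathways, a common next step is to test whether a gene set of interest (for example, genes with alternative splicing events) is over-represented for a given term. A minimal sketch of the one-sided hypergeometric test for a single term, with hypothetical counts and function name; a real analysis tests many terms and therefore needs multiple-testing correction such as Benjamini-Hochberg.

```python
from math import comb


def enrichment_p_value(hits_in_set: int, set_size: int,
                       annotated_in_background: int, background_size: int) -> float:
    """One-sided hypergeometric p-value, P(X >= hits_in_set).

    X counts genes carrying the annotation (e.g. one GO term or KEGG pathway)
    when `set_size` genes are drawn without replacement from a background of
    `background_size` genes, `annotated_in_background` of which carry it.
    """
    if not (0 <= hits_in_set
            and 0 <= annotated_in_background <= background_size
            and 0 <= set_size <= background_size):
        raise ValueError("inconsistent counts")
    upper = min(set_size, annotated_in_background)
    favorable = sum(comb(annotated_in_background, i)
                    * comb(background_size - annotated_in_background, set_size - i)
                    for i in range(hits_in_set, upper + 1))
    return favorable / comb(background_size, set_size)


if __name__ == "__main__":
    # Hypothetical: 40 of 300 spliced genes carry a term found on 500 of 20,000 genes.
    print(enrichment_p_value(40, 300, 500, 20_000))
```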
E. Data interpretation and visualization
a) Several tools are available for visualizing Iso-Seq results. Integrative Genomics Viewer (IGV) is a popular tool that can display the aligned reads, transcripts, and gene models. It allows researchers to visualize the alternative splicing events, the coverage of the reads, and the distribution of transcripts across different genes. Another tool, Circos, can be used to create circular visualizations of the data, which are useful for comparing multiple samples or for visualizing the relationships between different genes.
IGV screenshots of reads due to internal oligo-dT priming (Prech et al., 2020)
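To view isoforms in IGV alongside the aligned reads, transcript models can be written in BED12 format, which IGV loads directly. A minimal sketch, assuming exons are given as 0-based, half-open (start, end) pairs; the isoform IDs and coordinates in the example are hypothetical.

```python
def to_bed12(chrom: str, strand: str, name: str, exons: list,
             score: int = 0, rgb: str = "0,0,0") -> str:
    """Format one transcript as a BED12 line.

    `exons` are (start, end) pairs in 0-based, half-open genome coordinates,
    the convention BED uses. With no CDS information, thickStart/thickEnd
    simply span the whole transcript.
    """
    if not exons:
        raise ValueError("a transcript needs at least one exon")
    ordered = sorted(exons)
    tx_start, tx_end = ordered[0][0], ordered[-1][1]
    block_sizes = ",".join(str(end - start) for start, end in ordered)
    block_starts = ",".join(str(start - tx_start) for start, _ in ordered)  # relative to tx_start
    fields = [chrom, tx_start, tx_end, name, score, strand,
              tx_start, tx_end, rgb, len(ordered), block_sizes, block_starts]
    return "\t".join(str(field) for field in fields)


if __name__ == "__main__":
    isoforms = {
        "PB.1.1": [(1000, 1200), (1500, 1650), (2000, 2300)],
        "PB.1.2": [(1000, 1200), (2000, 2300)],  # skips the middle exon
    }
    for isoform_id, exon_list in isoforms.items():
        print(to_bed12("chr1", "+", isoform_id, exon_list))
```

Redirecting the printed lines to a file with a .bed extension gives a track that can be opened in IGV.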
a) Once the data analysis is complete, it is important to report and share the Iso-Seq data. This can be done by publishing the results in scientific journals, along with the raw data deposited in public databases such as the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). Additionally, researchers can share their analysis scripts and processed data on platforms like GitHub, which allows other researchers to reproduce the analysis and build on the results. Clear and detailed reporting of the methods, results, and data sources is essential for the reproducibility and transparency of the research.
The steps above cover the complete Iso-Seq workflow from sample preparation to data analysis, combining current mainstream technologies and tools to keep the experiment efficient and accurate. For example, RNA extraction uses commercial kits to ensure purity and integrity, cDNA synthesis uses an efficient SMARTer kit, and library construction combines size selection with amplification. For data analysis, tools such as LoRC and GMAP (a splice-aware aligner for mapping transcripts to a reference genome) are recommended for obtaining and annotating high-quality transcripts. Together, these steps provide a solid foundation for in-depth analysis of gene expression and transcriptome complexity.
Although Iso-Seq has clear advantages, several problems remain. During sample preparation, RNA extraction quality can be inconsistent and is easily compromised by contamination, degradation, or low purity, which lowers the quality of the sequencing data. During sequencing, insufficient depth is common: with limited read numbers or small sample amounts, it can be difficult to capture enough full-length transcripts. During data analysis, isoform detection remains difficult because of incomplete reference genomes and the low expression levels of many isoforms.
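The difficulty with lowly expressed isoforms is partly a matter of sampling. Under a simplified model in which each read is drawn independently and an isoform accounts for a fraction f of the library, the chance of seeing it at least once in N reads is 1 - (1 - f)^N. The sketch below (hypothetical function names; length bias and stricter read-support thresholds are ignored) uses this to estimate the depth needed for a given detection probability.

```python
import math


def detection_probability(isoform_fraction: float, total_reads: int) -> float:
    """P(at least one read from an isoform making up `isoform_fraction` of the library)."""
    if not 0.0 < isoform_fraction < 1.0:
        raise ValueError("isoform_fraction must be between 0 and 1")
    return 1.0 - (1.0 - isoform_fraction) ** total_reads


def reads_needed(isoform_fraction: float, target_probability: float = 0.95) -> int:
    """Smallest read count giving at least `target_probability` of seeing the isoform once."""
    if not 0.0 < isoform_fraction < 1.0 or not 0.0 < target_probability < 1.0:
        raise ValueError("fractions and probabilities must be between 0 and 1")
    # Solve (1 - f)^N <= 1 - target for N; log1p keeps precision for small f.
    return math.ceil(math.log1p(-target_probability) / math.log1p(-isoform_fraction))


if __name__ == "__main__":
    for fraction in (1e-3, 1e-4, 1e-5):  # hypothetical relative abundances
        print(f"fraction {fraction:.0e}: {reads_needed(fraction):,} reads for 95% detection")
```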
Common problems in Iso-Seq experiments
Best practices for successfully running Iso-Seq experiments
Following the troubleshooting and best-practice guidance above can effectively improve the success rate of Iso-Seq experiments and provide high-quality data for subsequent research.
The Iso-Seq protocol includes five main steps: converting RNA to cDNA, building a SMRTbell library from the cDNA, sequencing on a Sequel system, generating circular consensus sequences (CCS), and identifying isoforms through Iso-Seq analysis. The technology also supports multiplexed sequencing and single-cell sequencing, which gives it unique advantages for studying complex samples and rare cell types.
Sample preparation is crucial because its quality determines the accuracy of subsequent sequencing. High-quality RNA is essential; degradation or contamination can cause sequencing to fail. Depending on the RNA species of interest, additional steps such as poly(A) tailing or rRNA depletion may be needed, and these should follow the standard kit procedures.
The experimental design should be rational, taking into account sample type, sequencing depth, and analysis goals. In plant research, for example, choose tissues and time points according to the expression patterns of the genes of interest. Combining several complementary strategies can improve data coverage and accuracy.
Sequencing parameters such as the on-plate loading concentration and movie time need to be optimized according to the characteristics of the samples and the analysis requirements. Use high-quality instruments and reagents and follow the manufacturer's guidelines to improve data quality.
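Loading concentrations are specified in molar units, while library yields are usually measured as mass concentrations, for example with Qubit. A minimal sketch of the standard conversion for double-stranded DNA, assuming an average of about 660 g/mol per base pair, followed by a C1V1 = C2V2 dilution; the function names are hypothetical, and the library concentration, mean size, target concentration, and volume in the example are placeholders, not PacBio recommendations.

```python
def dsdna_nanomolar(conc_ng_per_ul: float, mean_length_bp: float,
                    g_per_mol_per_bp: float = 660.0) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM).

    1 ng/uL = 1 mg/L, so nM = conc * 1e6 / (g_per_mol_per_bp * length).
    """
    if conc_ng_per_ul < 0 or mean_length_bp <= 0:
        raise ValueError("invalid concentration or length")
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_length_bp)


def stock_volume_ul(stock_nM: float, target_pM: float, final_volume_ul: float) -> float:
    """Volume of stock for a dilution to `target_pM` in `final_volume_ul` (C1*V1 = C2*V2)."""
    if stock_nM <= 0:
        raise ValueError("stock concentration must be positive")
    return (target_pM / 1000.0) * final_volume_ul / stock_nM  # convert pM to nM first


if __name__ == "__main__":
    stock = dsdna_nanomolar(10.0, 2_000)       # placeholder: 10 ng/uL, 2 kb mean size
    volume = stock_volume_ul(stock, 100.0, 50.0)  # placeholder: 100 pM in 50 uL
    print(f"stock {stock:.2f} nM -> use {volume:.2f} uL of library")
```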
Data analysis requires specialized tools such as SMRT Link or TAGET to handle complex isoform data. Correct for potential biases and validate the results with complementary technologies such as RNA-seq. Throughout the experiment, continuously monitor sample and sequencing quality; after sequencing, check whether the data meet expectations and adjust the plan promptly if they do not. Once the experiment is finished, make sure that all data are complete.
References: