The duplication rate (dup rate) is the percentage of duplicate reads among all sequenced reads. The higher the dup rate, the lower the data utilization and the more sequencing cost is wasted.
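As a rough illustration of this definition, the sketch below computes a dup rate from a list of aligned reads. It assumes, purely for illustration, that two reads count as duplicates when they share the same chromosome, start position, and strand; the function name `duplication_rate` and the toy data are hypothetical, and real duplicate-marking tools apply more detailed rules.

```python
from collections import Counter


def duplication_rate(read_keys):
    """Return the fraction of reads that duplicate an earlier read.

    read_keys: iterable of hashable keys, here (chrom, start, strand).
    Treating identical keys as duplicates is an assumption of this sketch.
    """
    counts = Counter(read_keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    unique = len(counts)  # the first copy of each key is not a duplicate
    return (total - unique) / total


# Toy data: 6 reads, 3 distinct keys, so 3 reads are duplicates.
reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 250, "-"),
    ("chr2", 40, "+"),
    ("chr2", 40, "+"),
    ("chr2", 40, "+"),
]
rate = duplication_rate(reads)
print(f"dup rate: {rate:.1%}")
```

Only the extra copies are counted as duplicates, so one read per distinct key is always treated as usable data.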
• Comprehensive Variant Detection: Whole genome resequencing allows for the detection of various types of genetic variants, providing a comprehensive view of the genome. This includes both common and rare variants across the entire genome.
• High Resolution: Resequencing provides a high level of resolution, enabling the detection of even subtle genetic variations.
• Discovery of Novel Variants: Whole genome resequencing can reveal previously unknown or rare variants that may be associated with specific diseases or traits, allowing for novel discoveries and insights into human genetics.
Resequencing is performed with a reference genome to identify genetic variants, while de novo sequencing and assembly are used when no reference genome is available to reconstruct the entire genome from scratch.
• Dup caused by the samples themselves: low sample input, poor molecular diversity (e.g., ctDNA), etc.
• Dup introduced during library construction:
  Fragmentation: randomness, uniformity, and fragment size.
  Adapter ligation: the higher the ligation efficiency, the better the molecular diversity and the lower the dup rate.
  PCR amplification: amplification efficiency depends on the GC content of the fragment.
• Dup related to cluster generation: the cluster density must be appropriate.
• Dup caused by optical resolution: errors in signal collection.
The choice of sequencing platform depends on factors like the experiment goals, organism, and available resources. Illumina is commonly used for resequencing and variant detection, while PacBio or Oxford Nanopore are better for de novo assembly and detecting large structural variants. Illumina is cost-effective, and long-read sequencing is useful for complex genomes. Consider project requirements to make an informed decision.
The required coverage for sequencing depends on various factors such as the specific goals of your experiment, the organism being studied, and the desired level of confidence in variant detection or genome assembly. Here are some general recommendations for coverage:
NGS (e.g., Illumina):
• Germline/frequent variant analysis: 20-50x coverage.
• Somatic/rare variant analysis: 100-1000x coverage.
• Tumor vs. Normal comparison: ≥60x coverage for tumor, ≥30x coverage for normal.
• Population studies: 20-50x coverage.
• De novo assembly: 100-1000x coverage.
Long-read sequencing (e.g., PacBio):
• Gap filling and scaffolding: 10x coverage.
• Large structural variant detection: 10x coverage.
• Germline/frequent variant analysis: 20-50x coverage.
• De novo assembly: 50-100x coverage.
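To translate a target coverage from the lists above into a read count, a common approximation is coverage = (number of reads × read length) / genome size. The sketch below applies that relationship; the function names are hypothetical, and the 3.1 Gb genome size and 150 bp read length are assumed example values, not recommendations from this document.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Approximate mean coverage: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Assumed example values: a ~3.1 Gb genome and 150 bp reads.
genome_size = 3_100_000_000
read_length = 150

n = reads_needed(30, read_length, genome_size)
print(f"reads needed for 30x: {n:,}")
print(f"coverage check: {mean_coverage(n, read_length, genome_size):.1f}x")
```

For paired-end data, `n_reads` counts individual reads, so the number of read pairs is half of that. The estimate also ignores duplicates, unmapped reads, and uneven coverage, so the usable depth in practice is lower than this figure.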
On the PacBio Sequel platform, average read lengths of 10-15 kb are typically achieved, with maximum read lengths reaching up to 60 kb. It's important to note that the actual read lengths obtained can vary depending on several factors, including the specific sequencing conditions, the quality of the DNA sample, and the library preparation method used.
Variants are generally screened first by their frequency in public databases and by functional impact, then by disease phenotype information or inheritance pattern, and finally by factors such as predicted protein damage and evolutionary conservation.
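The screening order described here can be sketched as a simple filter. Everything in this example is a hypothetical illustration: the record fields (`population_af`, `impact`, `gene`, `conservation`), the 1% frequency cutoff, the impact labels, and the gene names are assumptions, not values taken from any specific database or annotation tool.

```python
def screen_variants(variants, max_af=0.01,
                    impacts=("HIGH", "MODERATE"),
                    candidate_genes=None):
    """Filter annotated variants, then rank the survivors.

    Steps: (1) drop variants common in the population,
    (2) keep only variants with a relevant functional impact,
    (3) optionally restrict to phenotype-related genes,
    (4) sort by conservation score, highest first.
    """
    kept = []
    for v in variants:
        if v["population_af"] > max_af:
            continue
        if v["impact"] not in impacts:
            continue
        if candidate_genes is not None and v["gene"] not in candidate_genes:
            continue
        kept.append(v)
    return sorted(kept, key=lambda v: v["conservation"], reverse=True)


# Toy records with made-up values.
variants = [
    {"id": "v1", "gene": "GENE_A", "population_af": 0.20, "impact": "HIGH", "conservation": 0.9},
    {"id": "v2", "gene": "GENE_A", "population_af": 0.001, "impact": "MODERATE", "conservation": 0.4},
    {"id": "v3", "gene": "GENE_B", "population_af": 0.0005, "impact": "HIGH", "conservation": 0.8},
    {"id": "v4", "gene": "GENE_C", "population_af": 0.002, "impact": "LOW", "conservation": 0.7},
]
result = screen_variants(variants, candidate_genes={"GENE_A", "GENE_B"})
print([v["id"] for v in result])
```

Applying the cheap frequency filter first usually removes most variants, which keeps the later, more interpretive steps manageable.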
Whole genome resequencing for somatic mutation analysis (tumor studies) requires paired samples, such as cancerous tissue and adjacent non-cancerous (para-cancerous) tissue from the same patient, or cancerous tissue and whole blood from the same patient.
There are two main ways to fragment DNA: physical fragmentation and enzymatic fragmentation. Physical fragmentation uses mechanical force to break the genome randomly, for example by sonication or nebulization. Enzymatic fragmentation uses the cutting activity of enzymes to fragment the genome. Enzymatic fragmentation is simpler to perform, but physical fragmentation produces more random breakpoints.
Depending on whether PCR amplification enrichment is required during library preparation, there are two types of library preparation options: PCR-amplified and PCR-free.