Genotypes called from high-depth resequencing are the most comprehensive, but the approach is currently too costly to apply routinely in plant and animal breeding, especially for species with large, complex genomes. Low-coverage whole-genome sequencing (lcWGS) is emerging as an effective alternative: it trades sequencing depth for greater genome coverage and larger sample sizes, and relies on probabilistic statistical methods to handle the resulting uncertainty.
The lcWGS strategy combines the advantages of approaches such as RAD-seq while avoiding their disadvantages. It can survey the whole genome at the population level (considering both depth and breadth) while retaining individual-level information, at a comparable cost. As a result, obtaining genome-wide genotypes by combining lcWGS with imputation algorithms has become common practice in recent years.
Features of low-coverage whole genome sequencing technology
Feature | LcWGS | WGS | Array | RAD-seq |
Sequencing depth | low | high | - | high |
Number of variants | more | more | less | less |
New variant detection | yes | yes | no | no |
Accuracy | moderate | high | high | high |
Reference genome | yes | yes | yes | yes/no |
Cost | low | high | low | low |
Low-coverage whole-genome sequencing (lcWGS) proceeds in three steps. First, every individual in a population is resequenced at low depth across the whole genome and variants are detected. Next, algorithms infer and fill in (impute) the missing genotypes based on the linkage disequilibrium (LD) between variants. The result is a set of high-density, genome-wide genetic markers for a large number of samples.
In recent years, lcWGS of large samples has been shown in theory to yield genome-wide, high-density SNP markers at very low cost, which in turn increases the accuracy of QTL mapping and helps to uncover the genetic mechanisms of various diseases. LcWGS has also been used for association analysis and population genetic studies. The benefit of imputing low-density data up to whole-genome sequence level for breeding value prediction was found to depend strongly on the frequency distribution of the causal mutations. Under a neutral model the advantage of imputed data was small, whereas the accuracy of genetic evaluation using imputed data could be improved by 30% when the minor allele frequencies of all causal mutations were low.
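Imputation relies on LD between variants, and a common way to quantify LD is the squared correlation r² between two biallelic sites. The snippet below is a minimal sketch of that statistic, assuming phased haplotypes encoded as 0/1 lists; the function name `ld_r2` and the toy input format are illustrative assumptions, not part of any imputation tool.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic variants.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    phased haplotype (a hypothetical toy encoding, not a real file format).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and of equal length")

    p_a = sum(hap_a) / n    # frequency of allele 1 at the first site
    p_b = sum(hap_b) / n    # frequency of allele 1 at the second site
    # Frequency of haplotypes carrying allele 1 at both sites.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b    # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # undefined when either site is monomorphic
    return d * d / denom
```

Sites in strong LD (r² close to 1) carry nearly the same information, which is why a genotype missing at one site can often be inferred from its neighbours.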
Pre-processing of lcWGS data is similar to that of WGS data, but with one important difference: downstream analyses, such as those based on the site frequency spectrum (SFS), need to work with genotype likelihoods (GLs) to account for genotype uncertainty. The lcWGS analysis workflow is therefore a probabilistic framework built on genotype likelihoods. The path from the SFS to diversity statistics and FST is the analysis workflow of the ANGSD software. Other tools (e.g. ATLAS) can infer these statistics directly from GLs without first estimating the SFS.
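To make the idea of working with genotype likelihoods concrete, here is a minimal sketch of a simple binomial GL model and an EM estimate of allele frequency from GLs. It assumes a diploid, biallelic site, a fixed per-read error rate, and Hardy-Weinberg proportions; the function names and the default error rate are illustrative assumptions, and this is not the model implemented in ANGSD or ATLAS.

```python
from math import comb


def genotype_likelihoods(n_ref, n_alt, error=0.01):
    """Return [L(G=0), L(G=1), L(G=2)], where G counts alt alleles.

    Assumes reads are independent and each read is wrong with
    probability `error` (a simplifying assumption for illustration).
    """
    n = n_ref + n_alt
    gls = []
    for g in (0, 1, 2):
        # Probability that a single read shows the alt allele given genotype g.
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        gls.append(comb(n, n_alt) * p_alt ** n_alt * (1 - p_alt) ** n_ref)
    return gls


def em_allele_frequency(all_gls, n_iter=100, start=0.2):
    """EM estimate of the alt allele frequency from per-individual GLs.

    Genotype priors follow Hardy-Weinberg proportions for the current
    frequency estimate; no genotypes are ever called.
    """
    f = start
    for _ in range(n_iter):
        expected_alt = 0.0
        for gl in all_gls:
            prior = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]
            post = [lik * pr for lik, pr in zip(gl, prior)]
            total = sum(post)
            # Expected number of alt alleles carried by this individual.
            expected_alt += (post[1] + 2 * post[2]) / total
        f = expected_alt / (2 * len(all_gls))
    return f
```

An individual with no reads at a site gets equal likelihoods for all three genotypes, so it contributes nothing beyond the prior. This is how the framework tolerates the missing data that low depth produces.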
Workflow of lcWGS. (Lou et al., 2021)
There is no single experimental design for low-depth resequencing that suits every study purpose. Instead, the optimal design depends on the study's objectives, system, and budget. For a given budget, the main trade-off in low-depth resequencing is between sample size and sequencing depth. For example, allele frequency estimation, population structure analysis, and estimates of genetic differentiation between populations give accurate results when more samples are sequenced, whereas the site frequency spectrum (SFS), demographic inference using ∂a∂i, Tajima's D, and absolute LD values call for higher sequencing depth. Researchers must therefore consider carefully which type of analysis matters most for the study objectives and find the appropriate balance. Synthesizing the results of our and previous studies, we provide some general guidelines for the design of low-depth resequencing experiments.
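The trade-off between sample size and depth can be explored with simple arithmetic. The sketch below assumes that per-site read depth is Poisson-distributed and that the budget is expressed as a total number of sequenced bases; the function names and the example figures are hypothetical and only illustrate the reasoning.

```python
from math import exp


def depth_per_sample(total_bases, genome_size, n_samples):
    """Mean sequencing depth per individual for a fixed sequencing budget."""
    return total_bases / (genome_size * n_samples)


def expected_breadth(depth):
    """Expected fraction of sites covered by at least one read.

    Assumes Poisson-distributed read depth with the given mean.
    """
    return 1.0 - exp(-depth)


def prob_both_alleles_seen(depth):
    """Probability that both alleles at a heterozygous site are each
    sampled at least once, assuming each allele receives an independent
    Poisson number of reads with mean depth / 2.
    """
    return (1.0 - exp(-depth / 2)) ** 2


# Hypothetical budget: 1 Tb of sequence for a 1 Gb genome.
for n in (250, 500, 1000):
    d = depth_per_sample(1e12, 1e9, n)
    print(n, d, round(expected_breadth(d), 3), round(prob_both_alleles_seen(d), 3))
```

Under these assumptions, doubling the number of samples halves the depth per individual, and the chance of observing both alleles at a heterozygous site falls quickly. This is one reason why analyses that depend on individual genotypes need more depth than analyses of population allele frequencies.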
Low-coverage genome sequencing refers to the sequencing of a genome at a relatively low depth, typically resulting in incomplete coverage of the entire genome. While low-coverage sequencing has some advantages, it also comes with certain limitations.
Advantages of low-coverage genome sequencing:
Despite the many advantages of LcWGS, there are still shortcomings in the following areas:
Reference: