In BSA (bulked segregant analysis), the individuals or lines of a segregating population are divided into two groups according to contrasting phenotypes of the target trait, and the DNA of the individuals in each group is mixed to form two contrasting DNA pools. The parents and the progeny pools are sequenced, SNPs are detected, the SNP-index of each progeny pool is calculated at loci that are homozygous and differ between the parents, and loci with large differences between the pools are selected as candidates.
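As a rough illustration of the index calculation, the sketch below assumes the commonly used SNP-index definition (the fraction of reads at a locus that carry the allele differing from a chosen reference parent). The function names and read counts are hypothetical and are not taken from any particular pipeline.

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads at a locus that carry the non-reference-parent allele."""
    if total_reads == 0:
        raise ValueError("locus has no coverage")
    return alt_reads / total_reads


def delta_snp_index(pool_a, pool_b):
    """Difference in SNP-index between two progeny pools at the same locus.

    Each pool is given as an (alt_reads, total_reads) tuple.
    """
    return snp_index(*pool_a) - snp_index(*pool_b)


# Hypothetical read counts at one locus: (alt reads, total reads) per pool.
mutant_pool = (28, 30)
wild_pool = (9, 30)
print(round(delta_snp_index(mutant_pool, wild_pool), 3))
```

Loci linked to the trait are expected to show a difference far from zero, while unlinked loci should fluctuate around zero.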
Where possible, select parents that differ in a single trait and have few heterozygous loci. The differences between the parents should be neither too large nor too small: if the parents differ too much, false positives are frequent and a false-positive region may be mapped; if the two varieties are closely related, the parental differences may be too small to locate the candidate region. Parents carrying the target trait can be obtained in various ways, such as EMS mutagenesis, UV mutagenesis, or natural individuals, and all of them can be used to locate candidate genes by BSA.
According to how they are formed, populations are usually divided into natural populations and family populations. BSA trait mapping is used to locate major-effect loci for a single trait in a family population. The source of the parents matters, and parents for the target trait can be obtained in various ways, such as EMS mutagenesis, natural individuals, UV mutagenesis, and T-DNA insertion.
There are many ways to construct a family population. F2, backcross (BC), and recombinant inbred line (RIL) populations are the ones commonly used for BSA. A conventional F1 generation cannot be used for BSA trait mapping because the trait does not segregate. Doubled haploid (DH) populations can also be used, but this is uncommon because they are difficult to construct.
(1) If the trait under study is an EMS-induced mutant trait, it is sufficient to sequence two progeny pools (wild type + mutant), or one parent (wild type) and one progeny pool (mutant); (2) if the trait under study is a quantitative trait, it is recommended to sequence both parents and two progeny pools. The effect of the number of samples on the results has been evaluated in real projects: the best results were obtained with four samples (two parents + two progeny pools), followed by one parent and two progeny pools; if only two progeny pools are sequenced, false positives increase and many more candidate SNPs are obtained.
Whether DNA taken from different parts of the material affects the sequencing results depends mainly on two factors. The first is the proportion of seed coat and endosperm: the seed coat usually derives from the maternal parent, so a sample that is mostly seed coat is affected. The second is the degree of homozygosity: if it is high, parent and offspring differ little, and the effect is not significant.
It is recommended to extract DNA from each offspring sample separately and then mix equal amounts, which can reduce background noise and avoid systematic errors.
The size of the candidate region and the number of candidate genes depend on the population size, the extent of the differences between the parental materials, the characteristics of the target trait, the sequencing depth, the state of the genome of the species analysed, and many other factors. They can be estimated from project experience or the published literature.
You can adjust the confidence level, or select SNP and InDel markers within the BSA interval and carry out fine mapping, to effectively narrow the interval.
(1) Validation of SNPs. (A) Convert the candidate SNPs into CAPS or dCAPS markers: analyse the restriction endonuclease recognition sites around the candidate SNPs, screen for SNPs that alter a recognition site, amplify the fragments containing these SNPs with the corresponding primers, then digest the amplified products and separate them by electrophoresis; finally, analyse the polymorphism of the CAPS markers to confirm that they are usable. (B) Amplify the candidate SNPs by PCR and verify the amplification products by Sanger sequencing. (2) Validate the expression of candidate genes across phenotypes by RT-PCR. (3) Carry out transcriptome-based differential expression analysis to see whether gene expression differs significantly. (4) RNAi analysis: use RNAi to specifically knock down or silence the expression of particular genes.
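The restriction-site screening in step (1)(A) can be sketched as follows. This is only an illustration under simplifying assumptions: the flanking sequence and SNP are made up, the helper names are hypothetical, and only the given strand is searched, which is sufficient for palindromic sites such as EcoRI (GAATTC) but not for non-palindromic sites, where the reverse complement would also need checking.

```python
def has_site(sequence, site):
    """Return True if the recognition site occurs in the sequence."""
    return site.upper() in sequence.upper()


def is_caps_candidate(ref_seq, snp_pos, alt_base, site):
    """True if the SNP creates or destroys the recognition site.

    snp_pos is a 0-based index into ref_seq (the flanking sequence).
    """
    alt_seq = ref_seq[:snp_pos] + alt_base + ref_seq[snp_pos + 1:]
    # Restrict the search to the window that can overlap the SNP, so that
    # other copies of the site elsewhere in the fragment do not mask the change.
    start = max(0, snp_pos - len(site) + 1)
    end = snp_pos + len(site)
    return has_site(ref_seq[start:end], site) != has_site(alt_seq[start:end], site)


# Hypothetical flanking sequence containing an EcoRI site (GAATTC).
flank = "ACGTGAATTCGGTA"
print(is_caps_candidate(flank, 6, "G", "GAATTC"))   # SNP inside the site
print(is_caps_candidate(flank, 12, "C", "GAATTC"))  # SNP outside the site
```

A SNP flagged this way changes the digestion pattern of the amplified fragment, which is what makes it usable as a CAPS marker.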
Many factors can lead to unsatisfactory mapping, mainly the following: (1) large differences in genetic background between the parents: many differences besides the target trait interfere with the analysis and make mapping difficult; (2) a complex trait: the target trait may be composed of several simpler traits, which can be scored separately and mapped again; quantitative traits are also inherently harder to map and less predictable; (3) contaminated sequencing data: this can be checked by extracting a subset of the sequencing data and running a BLAST search against the nr database; (4) an unsuitable analysis method: the ED (Euclidean distance) method and the SNP-index method can each be used for mapping and the results compared.
The parents should be as homozygous as possible, which can be achieved by selfing. The two parents should differ significantly in the target trait, but other traits should be as consistent as possible to reduce interference in the later locus analysis.
In theory, any progeny of a cross between parents that differ in the target trait can be used for BSA as long as the trait segregates, but the commonly used populations are F2 populations, backcross populations, recombinant inbred lines, and so on. For qualitative traits, the ratio of dominant to recessive individuals in the progeny may be 1:1, 3:1, etc. For quantitative traits, the progeny trait values should follow a normal distribution.
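One common way to check whether observed progeny counts agree with an expected ratio such as 3:1 is a chi-square goodness-of-fit test. The sketch below is an illustration only; the counts are hypothetical and the function name is made up. The critical value 3.841 applies to one degree of freedom at a significance level of 0.05.

```python
def chi_square_fit(observed, expected_ratio):
    """Chi-square statistic for observed counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_DF1_P05 = 3.841

# Hypothetical F2 counts: dominant and recessive individuals.
observed = [152, 48]
stat = chi_square_fit(observed, [3, 1])
print(round(stat, 4), "fits 3:1" if stat < CRITICAL_DF1_P05 else "deviates from 3:1")
```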
Yes, BSA can be carried out without parental data, but it is not recommended. The mainstream mapping algorithms at present are the ED method and the SNP-index method; the ED method can be run without parents, but the mapping result is not as good as with data from both parents. It is recommended to repeat the cross to rebuild the population and to keep the parental DNA for later use.
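To show why the ED method needs no parental data, here is a minimal sketch that assumes the usual per-locus definition: the Euclidean distance between the A, C, G and T frequencies of the two pools. The function names and base counts are hypothetical, and real pipelines typically add filtering and smoothing steps not shown here.

```python
import math

BASES = "ACGT"


def base_frequencies(counts):
    """Convert per-base read counts at a locus into frequencies (A, C, G, T)."""
    total = sum(counts.get(b, 0) for b in BASES)
    if total == 0:
        raise ValueError("locus has no coverage")
    return [counts.get(b, 0) / total for b in BASES]


def euclidean_distance(pool1_counts, pool2_counts):
    """Euclidean distance between the base frequencies of two pools at one locus."""
    f1 = base_frequencies(pool1_counts)
    f2 = base_frequencies(pool2_counts)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


# Hypothetical read counts per base at one locus in each progeny pool.
mutant_pool = {"A": 27, "G": 3}
wild_pool = {"A": 12, "G": 18}
print(round(euclidean_distance(mutant_pool, wild_pool), 3))
```

Only the two pools enter the calculation, so loci where the pools have diverged stand out regardless of which parent contributed which allele.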
Progeny sampling should follow these principles: for mapping qualitative traits, take as many recessive individuals as possible, no fewer than 20 and generally 30-50, and then the same number of dominant individuals; for mapping quantitative traits, the most extreme 5%-10% of individuals at each end of the trait distribution are generally selected, and a histogram of the progeny trait data can be plotted to guide the selection.
For many species it is difficult to raise a progeny population of more than 200 individuals, so fewer than 20 extreme individuals will be available. In that case we suggest taking fewer samples rather than including individuals with intermediate phenotypes, which only interfere with the later analysis. Of course, an experiment with only ten or so individuals per pool carries some risk of not producing satisfactory results.
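Selecting the extreme tails for a quantitative trait can be sketched as below. This is an illustration under assumptions: the individual IDs and trait values are invented, the function name is hypothetical, and the fraction is applied symmetrically to both ends of the distribution.

```python
def select_extremes(trait_values, fraction=0.1):
    """Return (low_ids, high_ids): the given fraction of individuals at each tail.

    trait_values maps an individual ID to its measured trait value.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(trait_values, key=trait_values.get)
    n = max(1, round(len(ranked) * fraction))
    return ranked[:n], ranked[-n:]


# Hypothetical population of 200 progeny with made-up trait values.
population = {f"P{i}": 50 + (i * 37 % 200) * 0.25 for i in range(200)}
low_pool, high_pool = select_extremes(population, fraction=0.1)
print(len(low_pool), len(high_pool))
```

With a small population, lowering the fraction keeps intermediate individuals out of the pools at the cost of fewer individuals per pool.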
Ideally, DNA is extracted from each progeny individual separately and then mixed in equal amounts, based on the measured DNA concentrations, into one pooled sample, so that every individual contributes the same amount of DNA. However, because sequencing projects are now priced so low, companies generally ask customers to mix equal amounts of tissue from each individual and then extract the mixture as one pooled DNA sample. Mixing equal amounts of tissue before extraction is not as effective as mixing after individual extraction, but its impact is perfectly acceptable. Although mixing first and extracting afterwards results in unequal amounts of DNA from each individual, if the sampling is good enough the genotypes of these individuals at the target loci should be consistent; what is mainly affected is the composition of the background noise, which does not prevent the target loci from being found in the later data analysis.
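Mixing equal amounts after individual extraction comes down to simple arithmetic: the volume to take from each sample is the target DNA mass divided by that sample's concentration. The sketch below uses hypothetical sample names, concentrations and target mass.

```python
def pooling_volumes(concentrations_ng_per_ul, mass_per_sample_ng):
    """Volume (in microlitres) to take from each sample for an equal-mass pool."""
    volumes = {}
    for sample, conc in concentrations_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"invalid concentration for {sample}")
        volumes[sample] = mass_per_sample_ng / conc
    return volumes


# Hypothetical concentrations (ng/uL) measured for three progeny samples.
concentrations = {"S1": 50.0, "S2": 100.0, "S3": 25.0}
for sample, volume in pooling_volumes(concentrations, mass_per_sample_ng=100).items():
    print(sample, volume, "uL")
```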
Although each company's specific requirements differ slightly, the DNA quality required for resequencing is not very demanding. What you need to do yourself is run an agarose gel to check that the main band is clear and free of protein contamination, and measure the concentration and total amount with a spectrophotometer (a concentration of no less than 20 ng/µL and a total amount of no less than 2 µg are recommended).
To ensure accurate SNP and InDel markers, sequencing must reach a certain depth. No less than 20× is recommended for each parent. The depth of a mixed pool should be set according to the number of individuals in it, so that the average depth per individual is no less than 1×. For example, if the progeny pools contain 30 + 30 individuals, the sequencing depth of each progeny pool should be no less than 30×; if funding allows, the depth can be increased.
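The depth rule above can be turned into a quick calculation. In this sketch the function names are hypothetical, and the genome size of 0.4 Gb is only a placeholder used to show how a depth translates into a data volume; substitute the genome size of the species being studied.

```python
def min_pool_depth(n_individuals, per_individual_depth=1.0):
    """Minimum pool depth so that each individual averages the given depth."""
    return n_individuals * per_individual_depth


def data_volume_gb(depth, genome_size_gb):
    """Approximate sequencing data volume (Gb) needed to reach a given depth."""
    return depth * genome_size_gb


GENOME_SIZE_GB = 0.4  # hypothetical genome size, for illustration only
PARENT_DEPTH = 20     # recommended minimum depth per parent

pool_depth = min_pool_depth(30)
print("pool depth:", pool_depth, "x")
print("data per pool:", data_volume_gb(pool_depth, GENOME_SIZE_GB), "Gb")
print("data per parent:", data_volume_gb(PARENT_DEPTH, GENOME_SIZE_GB), "Gb")
```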
For Research Use Only. Not for use in diagnostic procedures.