Request A Project Quote
Request A Project Quote

Pan Genome

The Introduction of Pan Genome Sequencing

The concept of the pan-genome encompasses the entirety of genes present across all strains within a given species, thereby offering a comprehensive perspective on genetic diversity that transcends the limitations of individual genome sequences. It includes two primary components:

  • Core Genome: This segment comprises genes ubiquitously present across all strains of the species. These core genes are typically associated with fundamental biological functions and primary phenotypic traits, thereby reflecting the evolutionary stability of the species. For instance, in Escherichia coli, core genes include those implicated in essential cellular processes such as DNA replication and transcription.
  • Variable (Accessory/Dispensable) Genome: In contrast, this segment encompasses genes that are found in only a subset of strains or individuals within the species. These accessory genes often encode for specific adaptations or unique traits, including factors such as antibiotic resistance or virulence. For example, in Streptococcus pneumoniae, accessory genes contribute to variations in pathogenicity and resistance profiles among different strains.

Through the comprehensive study of the pan-genome, researchers can gain insights into the full spectrum of genetic variability within a species, facilitating a deeper understanding of its evolutionary dynamics, functional adaptations, and potential response to environmental pressures.

Figure 1. Overview of the Solanum section Petota pangenome composition.Figure 1. Composition of a pan genome

The pursuit of pan-genome sequencing involves leveraging high-throughput sequencing technologies alongside sophisticated bioinformatics tools. This approach entails the meticulous construction of sequencing libraries followed by comprehensive sequencing of individuals, subspecies, or lineages within a species. Subsequent assembly of these sequences leads to the development of a pan-genome map. This map serves to enrich the genetic repository of the species and enables the investigation of crucial biological questions, contributing significantly to our understanding of genetic diversity and evolutionary processes.

Why is Pan-Genome Research Essential

In the extensive course of evolution, influenced by geographical and environmental factors, individual organisms develop highly unique genetic traits. A single reference genome is insufficient to encompass the complete genetic information of an entire species. In other words, relying solely on a single reference genome for studies on genetic domestication and variation can result in the loss of significant genomic content, as many unique sequences are not represented in the reference genome.

Moreover, the decreasing cost of genome sequencing has made pan-genome research increasingly viable in recent years. This reduction in cost facilitates the exploration of genetic diversity at a depth and scale that was previously unattainable. Consequently, the study of pan-genomes has become a burgeoning field, offering profound insights into the complexity of genomic information and the adaptive potential of species.

Difference Between the Pan-Genome and the Whole Genome

The concept of the whole genome pertains to the complete set of genetic material found within a single individual or strain, typically leveraged as a reference in genomic research. In contrast, the pan-genome amalgamates genetic information from multiple individuals or strains, offering a more holistic and comprehensive genetic landscape of the species.

Key Differences

  • Scope: The whole genome delineates the genetic architecture of a single strain. By contrast, the pan-genome encapsulates all genetic variants present across multiple strains, thereby capturing a broader and more inclusive spectrum of genetic diversity.
  • Resolution: Pan-genome analysis elucidates genetic variations that may remain undetected when relying solely on a single reference genome. This encompasses unique accessory genes that drive strain-specific traits and adaptations.

For instance, examination of the pan-genome of Mycobacterium tuberculosis reveals considerable genetic variability among different strains, a factor of paramount importance for understanding mechanisms of drug resistance and for the development of effective therapeutic interventions.

Advantages of Pan Genome

  • Enrich genome information of the species through sequencing the subspecies and individuals
  • Quickly find genes or structural variations of genes related to important traits based on study of variable genome
  • Study the differences within species from the perspective of unique gene sequences
  • Small population, cost and time efficient

Applications of Pan Genome

  • Species Evolution and Phylogenetics: Assists in tracing evolutionary relationships and genetic diversity among species.
  • Trait Discovery and Breeding: Identifies genes associated with desirable traits for improved crop breeding.
  • Pathogen Research: Reveals genetic variations in pathogens related to virulence and resistance, informing control measures.
  • Ecological and Environmental Studies: Provides insights into species' adaptations to different environments and ecological niches.

Pan Genome Workflow

The Workflow of GWAS.

Service Specifications

Sample Requirements
  • Sample Type: Genomic DNA;
  • Small fragment library: ≥1 μg;
  • 2 Kb to 6 Kb large fragment library: ≥20 μg;
  • 10 Kb large fragment library: ≥30 μg;
  • 20 Kb and 40 Kb large fragment libraries: ≥60 μg;
  • For whole-genome sequencing, the required amount of sample DNA is approximately 500 μg to 1 mg;
  • Sample Concentration: Small fragment library ≥30 ng/μl; Large fragment library: ≥133 ng/μl;
  • Sample Purity: OD260/280 ratio should be between 1.8 and 2.0.
  • All DNA should be RNase-treated and should show no degradation or contamination.
Note: Sample amounts are listed for reference only. For detailed information, please contact us with your customized requests.

Click
Sequencing Strategy
  • Simple genome (genome size <2Gb, heterozygosity <0.5%, repeated sequence ratio <50%);
  • Complex genome (genome size >2Gb, heterozygosity >0.5%, repeated sequence ratio >50%);
  • ONT long reads
  • ONT ultra-long reads
  • PacBio continuous long reads (CLRs)
  • PacBio HiFi reads
Bioinformatics Analysis
We provide multiple customized bioinformatics analyses:
  • Pan-genome build
  • Assembly evaluation
  • Genome annotation
  • Repeat sequence annotation
  • Gene function annotation
  • Non-coding RNA annotation
  • Comparative genomic analysis
  • Biological analysis
  • …and more
Note: Recommended data outputs and analysis contents displayed are for reference only. For detailed information, please contact us with your customized requests.

Analysis Pipeline

The Data Analysis Pipeline of GWAS.

Deliverables

  • The original sequencing data
  • Experimental results
  • Data analysis report
  • Details in Pan Genome for your writing (customization)

Partial results are shown below:

The GWAS Results Display Figure.

References

  1. Liu Y, Du H, Li P, et al. Pan-genome of wild and cultivated soybeans. Cell. 2020, 182(1):162-76.
  2. Chen J, Liu Y, Liu M, et. Pangenome analysis reveals genomic variations associated with domestication traits in broomcorn millet. Nature Genetics. 2023, 55(12):2243-54.

1. How many samples are required for pan-genome sequencing?

A minimum of two individuals or subspecies is essential to commence a pan-genome analysis. However, to achieve a more exhaustive understanding of genetic diversity within a species, a larger number of samples should be considered. The inclusion of additional samples significantly enhances the resolution of genetic variation and leads to a more comprehensive pan-genome map.

2. Is a reference genome required for pan-genome analysis?

The utilization of a reference genome in pan-genome analysis is not strictly obligatory but can be beneficial. A reference genome serves as an initial framework for gene annotation and comparative analysis. Nonetheless, the primary objective of pan-genome studies is to uncover genetic variations that transcend the scope of a single reference genome. Therefore, pan-genome analysis is designed to identify and incorporate genetic diversity beyond that represented by any single reference, ensuring a holistic capture of the species' genomic landscape.

3. What types of analyses are performed in standard pan-genome services?

Pan-Genome Assembly: Includes GC content analysis, sequencing depth analysis, and construction of super-scaffolds using reference genomes.

Pan-Genome Structural Annotation: Involves predicting core and accessory genomes, annotating repeat sequences, and identifying non-coding RNAs.

Pangenome analyses reveal impact of transposable elements and ploidy on the evolution of potato species

Journal: Proceedings of the National Academy of Sciences
Impact factor: 12.779
Published: July 24, 2023

Background

Potato (Solanum tuberosum) is a key global crop with rich genetic diversity. Originating in the Andean highlands, it spans multiple ploidy levels. This study presents the most comprehensive potato pan-genome, combining sequences from 296 accessions and identifying 132,355 pangenes. The analysis highlights genetic diversity, adaptation, and evolutionary relationships within the Solanum section Petota.

Materials & Methods

Sample Preparation

  • Potato 
  • Genome sequences of potato accessions

Method

  • Sequencing data download
  • Construction of the pangenome

Data Analysis

  • De novo assembly
  • Alignment
  • Annotation of the pangenome
  • PAV analysis
  • Gene frequency differences analysis

Results

1. The Pangenome of Solanum Section Petota

The pangenome comprises 296 potato samples with diverse origins and ploidy levels, including 154 Gbp of new and public assemblies, resulting in 514,888 contigs and 132,355 gene models.

Fig 1. Pangenome of the Solanum section Petota. (Bozan et al., 2023)Fig 1. The Solanum section Petota pangenome.

2. PAV Variation in Protein-Coding Genes

Variation in protein-coding genes across samples is influenced by ploidy and domestication, affecting gene numbers and selection.

Fig 2. Gene content variation among accessions in the Solanum section Petota pangenome, based on presence/absence variation (PAV). (Bozan et al., 2023)Fig 2. Gene content of the accessions included in the Solanum section Petota pangenome based on PAV.

3. Clustering the Accessions Based on Gene Content

Phylogenetic analysis and PCA identify distinct clades and subgroups, reflecting gene content variations.

Fig 3. Maximum likelihood phylogenetic tree derived from PAV data for the Solanum section Petota pangenome. (Bozan et al., 2023)Fig 3. A maximum likelihood phylogenetic tree constructed using PAV data for the Solanum section Petota pangenome.

4. Variation in Gene Content Distinguishes Clades and Subgroups

Gene content differences are notable between clades, highlighting evolutionary and adaptation traits.

5. TE Content in the Pangenome

The pangenome contains 75.5% transposable elements, with significant clade-specific variation and differences in TE content.

Conclusion

Future potato crop survival amid climate change relies on conserving biodiversity and integrating it into new cultivars. A pangenome of 296 accessions from 60 Solanum section Petota species reveals that PAV, especially TE variation, plays a role in speciation and shows increased TE content in in vitro propagated materials, similar to natural stress-induced TE activity.

Reference

  1. Bozan I, Achakkagari SR, Anglin NL, et. Pangenome analyses reveal impact of transposable elements and ploidy on the evolution of potato species. Proceedings of the National Academy of Sciences. 2023, 120(31):e2211117120.
For Research Use Only. Not for use in diagnostic procedures.
Featured Resources
Related Services
PDF Download
* Email Address:

CD Genomics needs the contact information you provide to us in order to contact you about our products and services and other content that may be of interest to you. By clicking below, you consent to the storage and processing of the personal information submitted above by CD Genomcis to provide the content you have requested.

×
Quote Request
! For research purposes only, not intended for personal diagnosis, clinical testing, or health assessment.
Contact CD Genomics
Terms & Conditions | Privacy Policy | Feedback   Copyright © CD Genomics. All rights reserved.
Top