Genome research can be organized along three dimensions: 1D, 2D, and 3D. In 1D, researchers use linear mapping techniques to study genomic sequences. In 2D, they turn to network analysis, focusing in particular on scale-free networks. In 3D, they examine the structural and dynamic organization of the genome.
Hi-C technology stands out as a powerful method for investigating the 3D structure of genomes. Derived from the fusion of High-Throughput Sequencing (HTS) and Chromosome Conformation Capture (3C), Hi-C offers insights into the spatial organization of chromatin within the nucleus.
Chromosome Conformation Capture (3C) involves a series of steps: fixation (cross-linking) of nuclear chromatin, restriction digestion of the cross-linked chromatin, ligation of the digested fragments, reversal of the cross-links to release bound proteins, and PCR analysis to detect interactions between DNA fragments. The method assumes that DNA fragments in physical contact are ligated to each other more frequently than distant ones, and these ligation products are identified through locus-specific PCR.
Hi-C extends this approach to the whole genome. One major application is constructing chromosome-level assemblies from fragmented genomic sequences by determining their order and orientation on each chromosome. Additionally, Hi-C can be integrated with other omics data, such as RNA-Seq and ChIP-Seq, to elucidate the gene regulatory and epigenetic networks underlying organismal traits.
Forms of complex genomic rearrangements: Chromoplexy is characterized by the exchange of larger fragments between chromosomes. (Schöpflin et al., 2022)
Cells are prepared and fixed by cross-linking with either formaldehyde or paraformaldehyde. This preserves intracellular protein-DNA and DNA-DNA interactions, and thus the 3D structure within the cell. For live samples, a typical treatment is 1-3% formaldehyde for 10-30 minutes at room temperature. Cross-linking can, however, reduce the efficiency with which restriction endonucleases digest the DNA, so the concentration and duration of this step require precise control.
DNA is enzymatically cleaved using restriction endonucleases, generating sticky ends on both sides of the crosslinks. The size of the resulting fragments determines the achievable resolution. Generally, two classes of enzyme are available: restriction endonucleases with a 6 bp recognition site or with a 4 bp recognition site. 6 bp cutters such as EcoRI or HindIII cut the genome approximately every 4000 bp, resulting in around 1 million fragments in the human genome.
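The fragment estimate above follows from simple counting: under the simplifying assumption of uniform, random base composition, a recognition site of length n occurs about once every 4^n bp. The sketch below illustrates that arithmetic and a toy in-silico digest. The function names and the 3.1 Gb genome size are illustrative assumptions, and real genomes deviate from the uniform model.

```python
def expected_fragment_length(site_length):
    """Mean spacing (bp) between sites, assuming uniform random bases."""
    return 4 ** site_length


def expected_fragment_count(genome_size_bp, site_length):
    """Rough number of fragments produced from a genome of the given size."""
    return genome_size_bp // expected_fragment_length(site_length)


def digest(sequence, site, cut_offset):
    """Cut the top strand at every occurrence of `site`.

    `cut_offset` is the cut position inside the site,
    e.g. 1 for HindIII (A^AGCTT).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


HUMAN_GENOME_BP = 3_100_000_000  # assumed approximate haploid genome size

six_cutter_spacing = expected_fragment_length(6)   # 6 bp recognition site
four_cutter_spacing = expected_fragment_length(4)  # 4 bp recognition site
six_cutter_fragments = expected_fragment_count(HUMAN_GENOME_BP, 6)

toy_fragments = digest("GGAAGCTTCCAAGCTTGG", "AAGCTT", 1)
```

Under these assumptions a 6 bp cutter yields fragments of roughly 4 kb and several hundred thousand fragments genome-wide, the same order of magnitude as the figure quoted above, while a 4 bp cutter yields far shorter fragments and therefore finer resolution.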
Main steps in the Hi-C protocol prior to sequencing. (Lun et al., 2015)
The digested DNA has either blunt or sticky ends; sticky ends are filled in to create blunt ends. During this fill-in, biotin-labeled nucleotides are incorporated to facilitate subsequent DNA purification and capture.
End-repaired DNA fragments that are held together by cross-links are joined using T4 DNA ligase, so that interacting DNA segments become ligated to each other. Subsequently, the proteins connecting the DNA fragments are digested to release the ligated DNA.
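End repair followed by blunt-end ligation leaves a characteristic sequence at each junction, which is useful when inspecting reads. The sketch below derives that junction for a palindromic enzyme that leaves a 5' overhang. It assumes HindIII (AAGCTT, cutting after the first A) and, as a 4 bp example, MboI (GATC, cutting before the G); the function names are hypothetical, and biotin labeling is not modeled.

```python
def filled_in_ends(site, cut_offset):
    """Top-strand ends of the two fragments after 5' overhang fill-in.

    For a palindromic site, the bottom strand is cut at the mirrored
    position, so fill-in extends the left fragment up to that position.
    """
    bottom_cut = len(site) - cut_offset
    left_end = site[:bottom_cut]    # left fragment, overhang filled in
    right_end = site[cut_offset:]   # right fragment starts at the top-strand cut
    return left_end, right_end


def ligation_junction(site, cut_offset):
    """Sequence created when two filled-in ends are blunt-ligated."""
    left_end, right_end = filled_in_ends(site, cut_offset)
    return left_end + right_end


hindiii_junction = ligation_junction("AAGCTT", 1)  # HindIII: A^AGCTT
mboi_junction = ligation_junction("GATC", 0)       # MboI: ^GATC
```

The junction duplicates the overhang bases, so it differs from the original recognition site; for HindIII the result contains the sequence GCTAGC.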
The DNA is de-crosslinked, purified, and sheared by sonication or a similar method into fragments of 300-700 bp. Fragments that contain a ligation junction, and therefore carry the biotin label, are then captured for library construction using streptavidin-coated magnetic beads.
Biotin-containing fragments are captured using magnetic beads, libraries are constructed, and sequencing is carried out.
The data analysis process for Hi-C sequencing entails six critical steps:
Hi-C Assembly
Hi-C assembly is typically conducted using software such as LACHESIS, which clusters, orders, and orients contigs based on the support provided by valid read pairs. The result is then manually reviewed and verified to obtain a final chromosome-level assembly.
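The clustering step can be illustrated with a toy example: contigs that share many Hi-C read pairs are merged into the same chromosome group. This is a minimal sketch of average-linkage agglomerative clustering on made-up link counts, not LACHESIS's actual implementation; the function name, contig names, and counts are all hypothetical.

```python
def cluster_contigs(contigs, links, n_groups):
    """Greedily merge the two groups with the highest average link count.

    `links` maps frozenset({contig_a, contig_b}) to a (normalized) number
    of Hi-C read pairs joining the two contigs.
    """
    groups = [[c] for c in contigs]

    def average_link(g1, g2):
        total = sum(links.get(frozenset((a, b)), 0.0) for a in g1 for b in g2)
        return total / (len(g1) * len(g2))

    while len(groups) > n_groups:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                score = average_link(groups[i], groups[j])
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return [sorted(g) for g in groups]


# Hypothetical link counts: c1/c2 and c3/c4 interact strongly.
toy_links = {
    frozenset(("c1", "c2")): 50,
    frozenset(("c3", "c4")): 40,
    frozenset(("c1", "c3")): 2,
    frozenset(("c2", "c4")): 1,
}
chromosome_groups = cluster_contigs(["c1", "c2", "c3", "c4"], toy_links, 2)
```

Here the expected number of chromosomes is supplied by the caller; ordering and orienting contigs within each group would be separate steps.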
Valid read pairs yield signals on the contact map, and the strength of the signal between two contigs decreases as their spatial and sequence distance increases: contigs that lie close together on the same chromosome share many read pairs, while distant or unrelated contigs share few. This information allows for error correction, including the identification and correction of misassembled contigs, adjustment of contig orientations, and determination of contig placement within chromosomes through clustering.
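To make the map signal concrete, the sketch below bins read-pair positions into a symmetric contact matrix and averages the counts at each bin distance. The function names and the toy coordinates are hypothetical, and a single chromosome with positions inside the binned range is assumed.

```python
def contact_matrix(read_pairs, bin_size, n_bins):
    """Count read pairs per pair of bins (symmetric matrix).

    Assumes every position is smaller than bin_size * n_bins.
    """
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in read_pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1
    return matrix


def mean_contacts_by_distance(matrix):
    """Average contact count for each bin separation d = 0, 1, 2, ..."""
    n = len(matrix)
    return [
        sum(matrix[i][i + d] for i in range(n - d)) / (n - d)
        for d in range(n)
    ]


# Toy read pairs on a 3 kb sequence, binned at 1 kb.
toy_pairs = [
    (100, 200), (300, 900), (50, 700),           # within bin 0
    (1100, 1300), (1200, 1800), (1500, 1999),    # within bin 1
    (2100, 2200), (2500, 2900), (2000, 2999),    # within bin 2
    (900, 1100), (800, 1200),                    # bin 0 to bin 1
    (1900, 2100), (1700, 2300),                  # bin 1 to bin 2
    (300, 2700),                                 # bin 0 to bin 2
]
toy_matrix = contact_matrix(toy_pairs, bin_size=1000, n_bins=3)
decay_profile = mean_contacts_by_distance(toy_matrix)
```

In a correctly ordered assembly the profile falls off with distance, which is what produces the strong diagonal on the contact map; an off-diagonal block of unexpectedly high counts points to a misplaced or misoriented contig.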
Ultimately, a refined chromosome-level genome assembly is obtained, with any remaining discrepancies fine-tuned manually to ensure accuracy. The manual adjustment process aims to achieve a clear diagonal signal, indicative of a well-assembled genome.
In recent years, Hi-C technology has played a pivotal role in advancing genome assembly and understanding the three-dimensional (3D) structure of genomes across various organisms including humans, goats, mosquitoes, yeast, barley, and wheat. The successful assembly of chromosome-level genomes in these species underscores the reliability and versatility of Hi-C-assisted genome assembly technology.
Hi-C technology unveils the intricate three-dimensional structure of genomes, elucidating the hierarchical organization of chromatin from compartments (A/B compartments) to topologically associating domains (TADs), and further to loops. This understanding is crucial for studying spatial interactions among DNA sequences, constructing high-resolution chromosome 3D structures, deciphering gene regulation mechanisms, and facilitating the construction of trans-chromosomal genomes and chromosome-spanning haplotypes. Notably, genome 3D structures have been successfully reconstructed in various organisms including humans, Drosophila, yeast, Arabidopsis thaliana, rice, and cotton species. Comparative analysis of genome 3D structures across different samples has also been accomplished, yielding evolutionary and functional insights.
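As one illustration of how domain structure can be read from a contact matrix, the sketch below computes a simplified insulation-style score: for each bin boundary it sums the contacts that cross the boundary within a small window. Boundaries between domains show a dip in this score. This technique is not described in the text above; it is a swapped-in, simplified example on synthetic data, and the function names and values are hypothetical.

```python
def block_matrix(block_sizes, inside, outside):
    """Synthetic contact matrix: `inside` within a block, `outside` between."""
    labels = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    return [
        [inside if la == lb else outside for lb in labels]
        for la in labels
    ]


def insulation_scores(matrix, window):
    """Sum of contacts crossing each boundary i (between bins i-1 and i).

    Only boundaries with a full window on both sides are scored.
    """
    n = len(matrix)
    scores = {}
    for i in range(window, n - window + 1):
        scores[i] = sum(
            matrix[a][b]
            for a in range(i - window, i)
            for b in range(i, i + window)
        )
    return scores


# Two synthetic domains of three bins each.
toy_contacts = block_matrix([3, 3], inside=10, outside=1)
scores = insulation_scores(toy_contacts, window=2)
boundary = min(scores, key=scores.get)
```

On this synthetic matrix the score is lowest at the boundary between the two blocks. Real analyses normalize the matrix first and call boundaries from local minima rather than a single global minimum.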
References: