Epigenetic research has shown that DNA methylation is widespread in mammalian genomes, primarily in the form of 5-methylcytosine (5mC). This modification arises when a methyltransferase attaches a methyl group (CH3) to the carbon atom at position 5 of cytosine (C). DNA methylation influences gene expression, thereby helping to regulate cell fate and, in turn, affecting disease onset.
During the initial phases of disease development, circulating cell-free DNA (cfDNA) can be released either through autonomous cell lysis or through the passive lysis of diseased cells recognized by the immune system. To a certain degree, the released cfDNA carries epigenetic information about the tissue in which the disease originated. Detecting these epigenetic signals in a timely manner is therefore important for early disease screening.
In this context, mining the methylation information carried by cfDNA is a central component of cfDNA research, and the ability to detect and interpret this information could help advance methods for early disease detection.
The biological significance of DNA methylation has been explored extensively through whole-genome and reduced-representation methylation sequencing. However, both approaches are limited when it comes to studying the many non-coding regulatory elements in the mammalian genome. Whole Genome Bisulfite Sequencing (WGBS) provides comprehensive coverage but is expensive and relatively inefficient. Reduced Representation Bisulfite Sequencing (RRBS), on the other hand, targets CpG-enriched regions but lacks coverage of enhancer regions and CTCF binding sites outside CpG islands.
To address these limitations, researchers have introduced a methylation sequencing approach known as cfDNA reduced representation bisulfite sequencing (cfDNA-RRBS). This method improves the coverage of promoters, enhancers, and CTCF binding sites, offering better insight into the regulatory landscape. Notably, cfDNA-RRBS is compatible with low-input samples and is applicable to single-cell analysis.
DNA was extracted from tissues or cells and then subjected to quality control: agarose gel electrophoresis was used to assess DNA integrity and detect degradation, and the total amount of DNA was quantified with a Qubit fluorometer.
(1) Library Construction
After the sample passed quality control, 5-10 ng of genomic DNA was digested with the MspI enzyme at 37°C for 3 hours. The purified digestion product was ligated to 5-methylcytosine-modified synthetic adapters using T4 DNA ligase. Subsequent bisulfite treatment converted unmethylated cytosines (C) to uracils (U), while methylated cytosines remained unchanged. The final library was obtained after amplification with random hexamer primers and PCR.
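The two sequence-level steps described here, MspI digestion and bisulfite conversion, can be mimicked in silico. The following is only an illustrative sketch, not part of the laboratory protocol: it assumes that MspI cuts at C^CGG regardless of CpG methylation and that converted uracils are read as T after PCR. The function names and the `methylated` position set are hypothetical.

```python
import re

MSPI_SITE = "CCGG"  # MspI recognition site; the cut falls between the first and second C


def mspi_digest(seq):
    """Split seq into the fragments produced by cutting at every C^CGG site."""
    seq = seq.upper()
    cuts = [m.start() + 1 for m in re.finditer(MSPI_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def bisulfite_convert(seq, methylated=frozenset()):
    """Convert every unmethylated C to T (U is read as T after PCR).

    `methylated` holds 0-based positions of cytosines that are protected
    from conversion (hypothetical input for illustration).
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


fragments = mspi_digest("AACCGGTTCCGGAA")
converted = bisulfite_convert("ACGTCC", methylated={1})
```

In this toy example the methylated cytosine at position 1 survives conversion while the other cytosines are read as T, which is the contrast that downstream methylation calling relies on.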
(2) Library Quality Control
After library construction, the library was first quantified and diluted to 1 ng/μl. The insert size was then checked on an Agilent 2100 system. Once the insert size met expectations, the effective library concentration was quantified accurately by qPCR (effective concentration >2 nM) to ensure library quality.
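To relate the mass concentration used for dilution (ng/μl) to the molar threshold used for qPCR (nM), a rough conversion can be sketched as follows. It assumes double-stranded DNA with an average mass of about 660 g/mol per base pair; the function names and the 300 bp fragment length in the example are illustrative assumptions, and the molar estimate does not replace the qPCR measurement.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def ng_per_ul_to_nm(conc_ng_per_ul, mean_fragment_bp):
    """Estimate the molar concentration (nM) of a dsDNA library."""
    if mean_fragment_bp <= 0:
        raise ValueError("mean_fragment_bp must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def passes_library_qc(effective_conc_nm, threshold_nm=2.0):
    """Apply the >2 nM effective-concentration criterion from the text."""
    return effective_conc_nm > threshold_nm


# Example: a 1 ng/ul library with a hypothetical 300 bp mean fragment length
estimate_nm = ng_per_ul_to_nm(1.0, 300)
```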
After the libraries passed quality control, they were pooled according to their effective concentrations and the amount of sequencing data required for each library. Illumina sequencing was then performed.
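One simple way to reason about pooling by effective concentration and required data amount is to make each library's molar contribution proportional to its data target. The sketch below assumes exactly that proportionality; the function name, the input layout, and the fixed total pool volume are hypothetical choices for illustration, not the facility's actual procedure.

```python
def pooling_volumes(libraries, total_volume_ul=20.0):
    """Return the volume (ul) of each library to add to the pool.

    libraries: dict mapping name -> (effective_conc_nM, target_data_gb).
    Each library contributes moles in proportion to its data target,
    so its volume is proportional to target_data_gb / effective_conc_nM.
    """
    weights = {}
    for name, (conc_nm, data_gb) in libraries.items():
        if conc_nm <= 0:
            raise ValueError(f"{name}: concentration must be positive")
        weights[name] = data_gb / conc_nm
    total = sum(weights.values())
    return {name: total_volume_ul * w / total for name, w in weights.items()}


# Two libraries with the same data target; B is half as concentrated as A,
# so twice the volume of B is needed.
volumes = pooling_volumes({"A": (4.0, 10.0), "B": (2.0, 10.0)}, total_volume_ul=30.0)
```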
The raw sequencing data contain adapter sequences introduced during library construction as well as low-quality bases, both of which must be filtered out. This step is crucial because it increases the number of reads that align to the genome, thereby maximizing the information recovered. Because of the specific structure of cfDNA-RRBS libraries, the random primer sequences are also trimmed from the reads.
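The filtering logic can be illustrated with a minimal read-trimming sketch. It assumes Phred+33 quality encoding, exact adapter matching, and simple 3' quality trimming; the adapter string, thresholds, and function name are illustrative defaults rather than the pipeline's actual settings, and production pipelines normally rely on a dedicated trimming tool.

```python
def trim_read(seq, qual, adapter="AGATCGGAAGAGC", min_q=20, min_len=20, offset=33):
    """Trim adapter and low-quality 3' bases; return None if the read is too short.

    seq, qual: read sequence and its quality string (same length).
    adapter:   assumed adapter prefix, matched exactly (placeholder value).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")

    # 1. Remove the adapter and everything after its first exact occurrence.
    idx = seq.find(adapter)
    if idx != -1:
        seq, qual = seq[:idx], qual[:idx]

    # 2. Remove trailing bases whose Phred score is below min_q.
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    seq, qual = seq[:end], qual[:end]

    # 3. Discard reads that are too short to align reliably.
    return (seq, qual) if len(seq) >= min_len else None


read_seq = "ACGTACGTAC" + "AGATCGGAAGAGC"
read_qual = "I" * 8 + "##" + "I" * 13
trimmed = trim_read(read_seq, read_qual, min_len=5)
```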
Following raw data filtration, an evaluation of sequencing error rates and examination of GC content distribution are conducted.
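Both checks can be expressed compactly. The sketch below assumes Phred+33 quality strings and uses the standard relation between a Phred score Q and the error probability, e = 10^(-Q/10); the function names are hypothetical and the code is meant only to show what is being measured.

```python
def phred_to_error(q):
    """Convert a Phred quality score into a base-calling error probability."""
    return 10 ** (-q / 10)


def mean_error_rate(qual, offset=33):
    """Mean error probability over one quality string (Phred+33 assumed)."""
    if not qual:
        return 0.0
    return sum(phred_to_error(ord(c) - offset) for c in qual) / len(qual)


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return sum(b in "GC" for b in seq) / len(seq) if seq else 0.0


def gc_by_position(reads):
    """Fraction of G/C at each read position across reads of possibly different lengths."""
    longest = max((len(r) for r in reads), default=0)
    result = []
    for pos in range(longest):
        bases = [r[pos].upper() for r in reads if len(r) > pos]
        result.append(sum(b in "GC" for b in bases) / len(bases))
    return result
```

Note that in bisulfite-converted libraries most unmethylated cytosines are read as T, so the per-position base composition is expected to differ from that of an unconverted library.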