We investigated the performance of the Infinium MethylationEPIC v2.0 microarray (EPIC v2.0) using a combination of fragmented (average sizes of 350, 230, 165, and 95 bp) and low-input (100, 50, 20, 10 ng) DNA material. Supplementary Fig. 1 shows the obtained fragment size distributions of the four DNA sample sizes. These distributions are represented by Gaussian-like curves with elongated tails extending towards longer DNA fragments (right skewed), with shorter DNA fragment samples exhibiting a narrower distribution. The obtained DIs for the DNA samples with average fragment sizes of 350, 230, 165, and 95 bp were 2.09, 3.44, 8.88, and 8008.45, respectively. Supplementary Fig. 2 illustrates the relationship between the average DNA fragment size and DI.
Quality Control of Degraded DNA Methylation DataIt was key to carry out a thorough QC procedure of the Illumina EPIC v2.0 microarray data to ensure reliable results given that samples of suboptimal DNA quality and quantity were employed.
Illumina EPIC v2.0 microarray utilizes beads with multiple copies of 50 bp oligonucleotide probes targeting specific loci in the genome. Once a DNA fragment hybridizes with its complementary probe, single-base extension (SBE) incorporates labeled nucleotides (green or red), which emits a signal. The intensities of the signals from probes targeting a given CpG site provide information about its DNAm level. Non-specific binding of negative control probes and fluorescence intrinsically associated with the microarray surface generate background signal intensities, which could interfere with the analysis if not properly removed. Figure 2A shows the relationship between the two mean background signal intensities (green and red) across all samples in the raw data. Degraded samples with average DNA fragment sizes of 350 bp, 230 bp, and 165 bp (except those with a 10 ng input) showed higher red than green background signal intensities. This imbalance was corrected after preprocessing with the pOOBAH algorithm (Supplementary Fig. 3), after which these samples clustered closely with the high-quality DNA control (grey dot). In contrast, degraded samples with an average DNA fragment size of 95 bp and 165 bp with 10 ng DNA input showed low background signal intensity before and after preprocessing (Fig. 2A and Supplementary Fig. 3), with at least one background signal (green or red) close to zero after preprocessing, indicating potential issues. Figure 2B shows the mean signal intensities of all probes across all samples in the raw data. The control sample exhibited a mean signal intensity of approximately 6,000, followed by a progressive decline in degraded samples with lower DNA quality and quantity. Degraded samples with average DNA fragment sizes of 95 bp and 165 bp with 20 ng and 10 ng input had mean signal intensities lower than 2,000, falling below the QC threshold for passing samples. However, after preprocessing, samples with average DNA fragment size of 165 bp and 20 ng input showed improved mean signal intensity (Supplementary Fig. 4A). The low mean red and green background signal intensity observed for degraded samples with a DNA fragment size of 95 bp and 165 bp with 10 ng DNA input was desirable, however, the mean signal intensity was low, which indicates poor DNA hybridization. Another important QC metric is the ratio of median red to green signal intensities, that represents the relative balance between the two fluorescent dye channels (green and red) used in the microarray. A balanced ratio (close to 1) indicates good microarray performance, while an imbalanced ratio suggests potential technical issues. In the raw data, this ratio ranged between 0.5 and 2 (QC thresholds) for all degraded samples (Fig. 2C). After preprocessing with pOOBAH, samples with 95 bp and 165 bp fragment sizes at 10 ng input deviated notably from a ratio of 1, again suggesting potential technical issues (Supplementary Fig. 4B). Additionally, we assessed the genotyping performance of the 65 SNP probes included in the EPIC v2.0 microarray for sample matching (Supplementary Table 1). When comparing the SNP genotypes of the degraded samples with the control sample, we observed an increase in discordant genotype calls with decreasing length of DNA fragment size and lower DNA input. Particularly, more than half of the SNP genotypes were discordant in samples with an average fragment size of 165 bp at 10 ng input, as well as samples with DNA fragment size of 95 bp.
Fig. 2Quality control (QC) of the obtained EPIC v2.0 data obtained. A Correlation of the mean (red and green) background signal intensities in the raw data. Diagonal dashed line represents a ratio of 1. B Mean signal intensity in raw data. The red dashed line represents the pass/fail QC threshold. C Ratio of red/green median intensity in the raw data. The red dashed lines represent the pass/fail QC thresholds. D Principal Component analysis (PCA) based on 6,160 shared CpGs among all samples (preprocessed with POOBAH algorithm), including samples that failed QC. E PCA based on 202,439 shared CpGs among samples that passed QC (preprocessed with POOBAH algorithm). F Probe detection rate and number of probes passing filters in degraded samples passing QC, preprocessed with the pOOBAH algorithm
We performed PCA using beta values of the 6,160 CpGs that were shared among all samples (including the samples that failed QC) and preprocessed with the pOOBAH algorithm (Fig. 2D). The aim was to assess potential outliers, batch effect, and degradation patterns (DNA fragment size and input amount) in the data. We observed that the control sample and the samples with average fragment sizes higher than 95 bp (except for the combination of 10 ng input and average fragment size of 165 bp DNA) clustered together, while the other samples were scattered across the PCA plane. PC1 and PC2 represented 44.41% and 19.12% of the total variance in the data, respectively, indicating that average DNA fragment size and input amount were the primary drivers of the variance. We also examined the β-value distribution in all samples. Degraded samples that clustered together with the control sample showed the expected bimodal β-value distribution shape (Supplementary Fig. 5). In contrast, degraded samples with average fragment sizes of 95 bp and 165 bp (with a DNA input of 10 ng) presented a deviation from the bimodal β-value distribution shape in both replicates. Based on these results, we excluded all DNA samples with an average DNA fragment size of 95 bp, and DNA samples with an average DNA fragment size of 165 bp and a DNA input of 10 ng. Thus, 23 out of 33 samples, including both control and degraded samples, passed QC and were included in subsequent analyses. Looking at the corresponding PCA plot considering the 202,439 common CpGs (Fig. 2E), degraded samples with an average DNA fragment size of 350 bp clustered together with the control sample, whereas samples with DNA fragment size of 230 bp and 165 bp were more spread out, indicating a different degradation pattern. PC1 and PC2 represented 20.25% and 8.94% of the total variance, respectively. Finally, considering the probe detection rate in samples passing QC and preprocessed with pOOBAH, 903,211 probes (99.9%) passed filtering for the control sample. A decrease in probe detection rate was observed with both shorter DNA fragment size and lower DNA input (Fig. 2F). Degraded samples with an average DNA fragment size of 350 bp and 100 ng DNA input had a probe detection rate of 90.3%, corresponding to 815,754 probes passing filtering. The most degraded samples to pass QC (average DNA fragment size of 165 bp and 20 ng DNA input) had the lowest probe detection rate (41.86.%) and the smallest number of probes passing filtering (378,116). We also investigated the failing probes across the different DNA fragment sizes and DNA input amounts. The SeSAMe pipeline masked 32,896 poorly designed probes, which were therefore excluded from this analysis. A total of 6,066 failing probes were common between all the samples passing QC (including the control sample), however, when considering only the degraded samples, we obtained 57,910 commonly failing probes. We calculated the number of common failing probes for each DNA fragment size and input amount. A total of 71,820, 132,622, and 173,396 common failing probes were detected in samples with average DNA fragment sizes of 350, 230, and 165 bp, respectively (Supplementary Fig. 6A). Regarding DNA input amounts, we obtained 67,563, 93,651, 130,661, and 166,795 shared failing probes for input amounts of 100, 50, 20, and 10 ng, respectively (Supplementary Fig. 6B). Lists with the names of the commonly failing probes across all samples, all degraded samples, DNA fragment sizes, and DNA input amounts are shown in Supplementary File 1.
Degraded DNA Methylation Data Correlations and Absolute DifferencesTo investigate the impact of low quality and quantity DNA samples on EPIC v2.0 microarray, we calculated the within-replicate correlation (precision) for each combination of average DNA fragment size and input amount preprocessed with the pOOBAH algorithm (Supplementary Table 2). Degraded samples with an average DNA fragment size of 350 bp and DNA input of 100 ng had within-replicate r value of 0.990 (804,732 shared CpGs). We observed a progressive decrease in r values and number of shared CpGs with shorter DNA fragment sizes and lower DNA inputs. The lowest within-replicate r value (0.851) and number of shared CpGs (259,031) was observed in degraded samples with average DNA fragment size of 165 bp and DNA input of 20 ng. All correlation tests had an adjusted p-value below 0.05. We also assessed the correlation between degraded samples and control sample (reproducibility) for each combination of average DNA fragment size and input amount. Figure 3 shows the r values obtained from 202,439 common CpGs among all samples passing QC and the control sample. First, the correlation between the control sample and sample with average DNA fragment size of 350 bp and DNA input of 100 ng was r = 0.995. For the rest of the degraded samples, we observed a progressive decrease in r values with shorter DNA fragment sizes and lower DNA inputs. The lowest correlation value, r = 0.946, was observed in sample with an average DNA fragment size of 165 bp and DNA input of 20 ng. A drop in r values (from 0.995 to 0.989) was observed in degraded samples with DNA fragment size of 350 bp when reducing the DNA input to 20 ng. Degraded samples with DNA fragment size of 165 bp showed a larger r value drop (from 0.971 to 0.946) (Supplementary Fig. 7).
Fig. 3EPIC v2.0 performance on degraded samples. A Pearson correlation coefficients (r) based on mean β-value of 202,439 common CpGs between control sample and degraded samples that passed QC. B Absolute β-value differences (|∆β|) distributions based on 202,439 common CpGs between control and degraded samples. Vertical black lines represent median values of the distributions
Furthermore, we aimed to estimate the deviation of β-values from the control sample (accuracy) in all degraded samples that passed QC. Quantifying the impact of sample degradation on DNAm measurement could provide a criterion for determining whether to proceed with DNAm analysis. Therefore, we calculated the absolute difference of β-values (|∆β|) between the control and degraded samples, using mean β-values between sample replicates that passed QC. A distribution of |∆β| is generated for each pair of degraded samples (Fig. 4). All distributions showed a peak close to 0 and elongated tails towards higher values of |∆β|. We observed a lower median |∆β| and percentage of probes with a |∆β| ≥ 0.05 and 0.10 for samples with larger average DNA fragment sizes and higher DNA inputs (Supplementary Table 3). Samples with average DNA fragment size of 350 bp and DNA input amount of 100 ng had a median |∆β| of 0.016 and 1.95% of probes with a |∆β| ≥ 0.10. While samples with average DNA fragment size of 165 bp and DNA input amount of 20 ng had a median |∆β| of 0.051 and 31.70% of probes with a |∆β| ≥ 0.10. We further investigated |∆β| in β-value intervals of 0.1 to comprehend whether EPIC v2.0 measurements of CpG sites with intermediate β-values were more affected by low quality and quantity DNA samples. We observed higher |∆β| values for intermediate β-values intervals (0.1–0.9) compared to the extreme β-values intervals (0.0–0.1 and 0.9–1.0) in all DNA fragment sizes investigated (Supplementary Fig. 8). Samples with shorter average DNA fragment sizes and lower DNA inputs exhibited higher |∆β| in intermediate β-value intervals, while extreme β-values intervals were less affected by the quality and quantity of the DNA sample. We also calculated |∆β| between sample replicates and estimated median |∆β| and percentage of probes with a |∆β| ≥ 0.05 and 0.10. We observed an increase of these parameters with shorter average DNA fragment sizes and lower DNA inputs (Supplementary Table 4). Degraded samples with average DNA fragment size of 350 bp and DNA input of 100 ng had median |∆β| of 0.018 and percentage of probes with |∆β| ≥ 0.10 of 0.068 (804,732 shared CpGs). While for the sample with average DNA fragment size of 165 bp and DNA input of 20 ng, the median |∆β| was 0.03 and 0.23% of probes had |∆β| ≥ 0.10 (259,031 shared CpGs).
Fig. 4Age prediction accuracies. Mean absolute errors (MAEs) of predicted ages from degraded samples using four epigenetic clocks (BLUP, EN, Horvath, and skinHorvath clock)
Then, we investigated the type of probes and target regions that were more robust in low quality and quantity DNA samples. Illumina EPIC v2.0 microarray uses two types of oligonucleotide probes, Type I probes target methylated and unmethylated epialleles of a CpG site using two different probes, while Type II probes use a single probe able to target both methylated and unmethylated epialleles. We calculated the proportion of Type I and II probes in probes passing filtering for each DNA sample that passed QC. The relative proportion of Type I probes that passed filtering increased in samples with shorter average DNA fragment sizes and lower DNA input (Supplementary Fig. 9A), which indicated that these probes were more efficient than the Type II probes. We also assessed the genomic (CpG island, shore, shelf, and open sea regions) and gene (promoters, gene body, and intergenic regions) distribution of probes passing filtering (Supplementary Fig. 9B, C). We observed an increase in the proportion of successfully typed probes targeting CpG islands and gene promoters in samples with shorter average DNA fragment sizes and lower DNA input when compared to the control sample. To determine whether this enrichment was simply due to the higher retention of Type I probes (which are inherently enriched in CpG islands and promoter regions), we stratified the analysis by probe type. This revealed that the increase in the proportion of probes passing filtering targeting CpG islands and promoters was evident within both Type I and Type II probes (Supplementary Figs. 10 and 11).
Age Prediction of Degraded SamplesFinally, we wanted to test the impact of DNA fragment sizes and DNA inputs on a DNAm application. We chose to estimate DNAm age because it is a potential analysis of interest in the medical and forensic fields. We calculated MAE for the control and degraded samples using four epigenetic clocks: BLUP (319,607 CpGs), EN (514 CpGs), Horvath (353 CpGs), and skinHorvath (391 CpGs) clocks. The control sample had an absolute error (AE) of 0.19, 1.83, 8.77, and 0.21 years for BLUP, EN, Horvath, and skinHorvath clocks, respectively. For the degraded samples, we obtained higher MAE in all samples, especially for the BLUP, EN, and skinHorvath clocks (Fig. 5). We observed a general increase in MAE with shorter DNA fragment size and lower DNA input amounts in all epigenetic clocks tested, indicating an impact of sample degradation on DNAm age prediction analysis. Samples with average DNA fragment size of 350 bp had a MAE < 10 years for all DNA inputs and epigenetic clocks, except for the age predicted with the Horvath clock in 20 ng DNA input samples. In contrast, degraded samples with average DNA fragment size of 165 had MAE > 10 years, except for age predicted from 20 ng DNA input samples with the BLUP clock and the 50 ng DNA input sample with the skinHorvath clock. Supplementary Table 5 shows the number of missing CpGs in each epigenetic clock across all samples that passed QC. As expected, the number of missing CpGs in the four epigenetic clocks increased with shorter DNA fragment size and lower DNA input amounts.
Fig. 5Comparison of pOOBAH and ELBAR algorithm performance on degraded samples passing QC. A Probe detection rate and number of probes passing filters. B Pearson correlation coefficients (r) between degraded samples and the control, calculated using both algorithms. C Median absolute β-value differences (|∆β|) between degraded samples and the control, calculated using both algorithms. D Mean absolute errors (MAEs) of predicted chronological ages from degraded samples preprocessed with pOOBAH and ELBAR, using the BLUP clock. E MAEs of predicted ages from degraded samples preprocessed with pOOBAH and ELBAR, using the EN clock
ELBAR Algorithm Performance on Degraded SamplesThe ELBAR algorithm is a detection method specifically designed for low-input DNA samples. It aims to exclude only those probes fully dominated by background signal while retaining probes with true biological signal. We compared ELBAR with the established pOOBAH algorithm across our set of degraded DNA samples that passed QC. ELBAR consistently achieved higher probe detection rates in all degraded samples (Fig. 5A), detecting over 99.98% of probes in samples with an average DNA fragment size of 350 bp (at 100 ng, 50 ng, and 20 ng input) and 230 bp (at 100 ng and 50 ng input) (Supplementary Table 6). These results suggest that ELBAR retains nearly all probes, even in degraded samples. To assess the quality of the retained probes, we examined the whole-array r and the median|∆β|. Similar to pOOBAH, ELBAR showed a decline in correlation with decreasing average DNA fragment size and input amount (Fig. 5B; Supplementary Table 6). Nevertheless, samples with average DNA fragment size of 350 bp and 230 bp at 100 ng and 50 ng input maintained comparable correlation values, while a slight reduction was observed in samples with shorter fragments and lower input. Notably, all samples had an r above 0.93. Median|∆β| values were generally higher in degraded samples preprocess with ELBAR (Fig. 5C; Supplementary Table 6), indicating a modest reduction in precision compared to pOOBAH, likely due to the inclusion of probes with weaker signal intensities. Finally, we evaluated whether the probes retained by ELBAR algorithm could improve age prediction. Using the BLUP clock, ELBAR outperformed pOOBAH, yielding lower or comparable MAEs across all combinations of average DNA fragment size and input amount. However, this improvement was not observed for the other three epigenetic clocks (EN, Horvath, and skinHorvath), where no consistent performance differences were found between the two algorithms (Fig. 5D; Supplementary Fig. 12A, B). This may suggest that clocks relying on fewer, highly age-correlated CpGs may not benefit from ELBAR’s broader probe retention strategy.
Comments (0)