- Open Access
Novel epigenetic clock for fetal brain development predicts prenatal age for cellular stem cell models and derived neurons
Molecular Brain volume 14, Article number: 98 (2021)
Induced pluripotent stem cells (iPSCs) and their differentiated neurons (iPSC-neurons) are a widely used cellular model in the research of the central nervous system. However, it is unknown how well they capture age-associated processes, particularly given that pluripotent cells are only present during the earliest stages of mammalian development. Epigenetic clocks utilize coordinated age-associated changes in DNA methylation to make predictions that correlate strongly with chronological age. It has been shown that the induction of pluripotency rejuvenates predicted epigenetic age. As existing clocks are not optimized for the study of brain development, we developed the fetal brain clock (FBC), a bespoke epigenetic clock trained in human prenatal brain samples in order to investigate more precisely the epigenetic age of iPSCs and iPSC-neurons. The FBC was tested in two independent validation cohorts across a total of 194 samples, confirming that the FBC outperforms other established epigenetic clocks in fetal brain cohorts. We applied the FBC to DNA methylation data from iPSCs and embryonic stem cells and their derived neuronal precursor cells and neurons, finding that these cell types are epigenetically characterized as having an early fetal age. Furthermore, while differentiation from iPSCs to neurons significantly increases epigenetic age, iPSC-neurons are still predicted as being fetal. Together our findings reiterate the need to better understand the limitations of existing epigenetic clocks for answering biological research questions and highlight a limitation of iPSC-neurons as a cellular model of age-related diseases.
Induced pluripotent stem cells (iPSCs) offer a unique cellular system to investigate disease in human-derived cells. iPSCs are obtained by treating skin or blood cells with a set of core pluripotency transcription factors that reprogram the cells to a pluripotent state . Established iPSC lines have the capacity to be further differentiated into specific cell types, including neurons, when treated with the appropriate factors [2,3,4]. This is of particular interest for neuroscience, as the only alternative cellular model for human neurons are immortalized cell lines. Because immortalized cell lines retain some physiological properties of the cancerous cells they were derived from  they do not fully recapitulate the neuronal phenotype. iPSC-derived neurons (iPSC-neurons), on the other hand, express appropriate morphological and neurophysiological properties of neurons and, when subjected to distinct differentiation protocols, they can be differentiated into a wide range of specific neuronal subtypes . iPSCs and their neuronal derivatives have been widely used to research disorders of the central nervous system, including developmental disorders such as autism and schizophrenia and age-related diseases such as Alzheimer’s disease (AD) and Parkinson’s disease. However, the extent to which iPSCs and especially iPSC-neurons capture age-associated processes is not known, which is fundamental to the study of age-related diseases. Of specific relevance is the fact that pluripotent cells only occur during the early stages of mammalian development and the effect of differentiation from iPSCs towards neurons on the developmental or aging trajectory of the cellular model  has yet to be adequately profiled.
Epigenetic mechanisms, such as DNA methylation (DNAm), are chemical processes that stably regulate gene expression, and while they are sensitive to environmental stimuli they also underpin key developmental processes [8,9,10]. There has been much interest and success in capitalizing on these patterns of epigenetic variation to derive individual age predictions from a biological sample. Age predictors based on DNAm, known as “epigenetic clocks” or “DNAm clocks”, are widely used to predict the “epigenetic age” of a sample. Epigenetic age, defined here as age predicted by an epigenetic clock, correlates strongly with chronological age, albeit not perfectly, and it has been hypothesized that the deviations from this prediction, referred to as age acceleration, are meaningful in the context of disease [11, 12]. The most well-known epigenetic clock is the Horvath multi-tissue clock (MTC) which was developed using a large number of samples (n > 8000) from 51 different tissues and cell types . Overall, the MTC generates reliable predictions of chronological age for most sample-types, although there are potential biases when using Horvath’s clock in samples derived from certain tissues, especially the brain [14, 15]. To this end, a number of new DNAm clocks have been developed for specific tissue types, including whole blood  and cortex , which demonstrate more accurate predictions within the specified tissue. A less established refinement of epigenetics clocks is the application to specific developmental stages, with prenatal samples excluded or underrepresented in most training datasets. Recently, clocks were developed to predict gestational age (GA) of newborns, derived using pre- and perinatal DNAm data from blood samples  or placental samples . While existing epigenetic clocks have been shown to accurately predict age in either postnatal brain samples (predominantly middle and older age) or non-brain prenatal samples, these tools have not been thoroughly tested on prenatal brain samples, and it is unknown whether they are able to delineate the earliest stages of brain development.
Previous analysis applying the MTC to DNAm data generated from iPSCs and their corresponding primary cells from adult donors found that the induction of pluripotency reversed the aging process, predicting the epigenetic age of iPSCs to be close to zero (i.e. birth) but still postnatal . As human pluripotent cells only occur during prenatal development, we hypothesize that existing clocks are not sensitive enough to accurately predict iPSCs at prenatal developmental stages. The inability to accurately estimate age during this crucial stage of neurodevelopment limits our ability to profile changes in epigenetic age induced by the differentiation of iPSCs into specific cell-types using already established DNAm clocks. Here we present a novel DNAm clock developed using prenatal brain samples that accurately predicts fetal age, outperforming other DNAm clocks in neurodevelopmental samples. We then apply our clock to iPSCs, iPSC-derived neuronal progenitor cells (NPCs) and iPSC-derived neurons, as well as in other cellular stem cell models and derived neuronal cells, to characterize the epigenetic age of these cellular models before and during the differentiation process.
All statistical analyses were performed using R version 3.5.2 (https://www.r-project.org/) .
Development of the fetal brain clock (FBC)
Description of fetal brain samples
To develop and profile the performance of the fetal brain clock (FBC), we collated a dataset of 258 fetal brain samples (see Additional file 1: Table S1) of which 194 were processed by our group at the University of Exeter as described previously  and 64 were a subset (age < 0 years) of a publicly available dataset downloaded from the Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/; GSE74193) . Of the samples processed in Exeter, 154 overlap with those included in  following additional outlier filtering by principal component analysis, where DNAm was quantified using the Illumina 450 K DNA methylation array. The other 40 samples represent additional samples where DNAm was quantified using the Illumina EPIC DNA methylation array using a standard protocol as previously described .
Data pre-processing and quality control
All datasets for which raw data was available were pre-processed following a standard quality control (QC) and normalization pipeline as described before  using either the R package wateRmelon  or bigmelon . Briefly, samples with low signal intensities or incomplete bisulfite conversion were excluded prior to applying the pfilter() function from the wateRmelon package, excluding samples with > 1% of probes with a detection P value > 0.05 and probes with > 1% of samples with detection P value > 0.05. This was followed by the exclusion of probes known to be affected by SNPs or known to cross-hybridize . QC was finished by quantile normalization using the dasen() function of the packages wateRmelon or bigmelon [22, 23]. To harmonize the age variable across datasets, age was converted into days post-conception (dpc), as it represents the most precise unit of age available across the datasets. Where age was provided as weeks post-conception it was transformed to days post-conception by dividing by 7, and where age was reported in (negative) years it was transformed to days post-conception by multiplying by 365 and adding 280. Of note, a few samples (15 out of 258) are actually defined as embryonic (GA < 63 dpc) and not fetal.
Fetal brain clock development
To create two separate datasets for the purpose of training and testing the FBC, 75% of the samples from each dataset were randomly assigned into a training dataset (n = 193, age range = 37–184 dpc, age median = 99 dpc), while the remaining 25% were included in the testing dataset (n = 65, age range = 23–153 dpc, age median = 99 dpc) (Additional file 3: Figure S1, Additional file 1: Table S1). There was no overlap in samples between the training and testing dataset. To simplify the FBC development, only probes available in all samples after QC were taken forward (n = 385,069 probes). To develop the fetal brain clock, we applied an elastic net (EN) regression model, using the approach described by Horvath , regressing chronological age against DNAm level of all available probes. The EN algorithm selects a subset of DNA methylation probes that together produce the optimal prediction of the outcome, in this case chronological age, by combining ridge and LASSO (Least Absolute Shrinkage and Selection Operator) regression. Briefly, ridge regression penalizes the sum of squared coefficients while LASSO penalizes the sum of the absolute values of coefficients. EN is a combination of both methods, where the user specifies the extent of the mixing of the two methods as a number between 0 and 1, in our application this was set to 0.5 . EN was implemented with the R package GLMnet . The shrinkage parameter lambda was calculated using tenfold cross-validation on the data, which resulted in a lambda of 3.27.
Statistical evaluation of FBC performance
To profile the performance of the FBC, we additionally tested three established DNAm clocks: Horvath’s multi-tissue clock (MTC) , Knight’s Gestational Age clock (GAC)  and Lee’s Control Placental epigenetic clock (CPC) . The clocks were applied using the agep() function of the wateRmelon package , where the default estimates the MTC and other clocks (here the GAC and CPC) can be estimated by providing the necessary coefficient and intercept values. The predictive accuracy of each clock was profiled in each dataset by two measures: Pearson’s correlation coefficient with reported chronological age and root mean squared error (RMSE). To investigate potential effects of sex on the predicted epigenetic age, linear models were fitted in the testing and validation datasets with FBC predicted epigenetic age as the dependent variable, chronological age and sex as main effects and an interaction between chronological age and sex.
Validation of the fetal brain clock (FBC) in additional datasets (fetal and adult cortex)
To further test the FBC, we used data from 96 additional fetal brain samples currently being assessed by our group (unpublished data), none of which overlapped with either the training or testing data described above, with DNAm quantified using the Illumina EPIC DNA methylation array. QC and normalization were performed as described above. We also included data from 33 fetal brain samples from two publicly available datasets on GEO (GSE116754 and GSE90871) [27, 28], where DNAm was quantified using the Illumina 450 K DNA methylation array. Pre-processing and QC for the publicly available datasets was not performed in our lab as no raw data was available. Age of all samples was converted to dpc as described above. The combined validation dataset has an age range of 42–280 dpc with a median of 112 dpc (Additional file 3: Figure S1, Additional file 1: Table S1). To evaluate the performance of the DNAm clocks in adult brain samples we utilized data from the Brains for Dementia Research (BDR) cohort previously generated by our group . Briefly, these data consist of 1221 samples from 632 donors (age range 41–104 years, median = 84 years), with DNA extracted from the prefrontal cortex (n = 610) and occipital cortex (n = 611). DNAm was quantified using the Illumina EPIC DNAm array, and were pre-processed using a standard QC pipeline as described in .
Statistical evaluation of FBC performance
The predictive accuracy of the FBC was profiled in each dataset by two measures: Pearson’s correlation coefficient with reported chronological age and root mean squared error (RMSE).
Testing of Fetal Brain Clock (FBC) in cellular samples
Five different DNAm datasets generated from iPSCs, iPSC-derived NPCs and iPSC-derived neurons were used to characterize epigenetic age of the neuronal cell model, details of which can be found in Additional file 1: Table S1. For two of these datasets (Imm, Price) DNAm data was generated by our lab in Exeter, where DNAm was quantified using the Illumina EPIC DNA methylation array. These were supplemented by three publicly available datasets, downloaded from GEO (Nazor, GSE31848, Sultanov, GSE105093, and Fernández-Santiago, GSE51921) consisting of Illumina 450 K DNAm array data [3, 4, 29]. References describing the origin of cell lines and the different methods used for cell culture and differentiation are listed in Additional file 1: Table S1. Pre-processing and QC for the Nazor and Fernández-Santiago datasets was not performed in our lab as no raw data was available.
iPSC—motor neuron samples
A dataset comprised of 23 cellular samples with two iPSC samples and 21 derived motor neurons was generated by our lab, with DNAm quantified using the Illumina EPIC DNAm array (Additional file 1: Table S1). These data were QC’d following the pipeline described above.
ESC – neuron samples
Two publicly available cellular datasets with NPCs and neurons derived from embryonic stem cells (ESCs) were downloaded from GEO (Nazor, GSE31848, Kim GSE38214) [29, 30]. Both datasets consist of data quantified using the Illumina 450 K DNAm array. Pre-processing and QC for both ESC – neuron datasets were not performed in our lab as no raw data was available.
Statistical comparison of cellular states
The FBC was applied to DNAm data for all cellular samples available. To test for differences in predicted epigenetic age between cell stages within each dataset, either two sample t-tests or ANOVA followed by Tukey HSD multiple comparison (when three cell stages were available), were used. To combine results across all iPSC—neuron datasets, a mixed effects linear model was fitted with predicted epigenetic age as the dependent variable, a fixed effect for cell stage represented as two dummy variables contrasting NPCs vs iPSCs and iPSC-neurons vs iPSCs as and a random effect (i.e. random intercept) for dataset.
Fetal brain clock outperforms existing DNAm clocks at predicting age of prenatal brain samples
We applied EN regression to genome wide DNAm data from a subset of available prenatal brain samples (n = 193; Additional file 1: Table S1 and Additional file 3: Figure S1) to develop the fetal brain clock (FBC). 107 DNAm probes were assigned non-zero coefficients and therefore were selected as the basis of the FBC (Additional file 3: Table S2). We found no overlap in the DNAm sites selected for the FBC and DNAm sites used in the other established clocks tested in our analysis. Testing the FBC clock in an independent test dataset of fetal brain samples (Additional file 1: Table S1 and Additional file 3: Figure S1) to evaluate its performance we found a strong linear relationship between chronological and predicted prenatal age (r = 0.80; Fig. 1A) with the majority of samples predicted within 15 days of their actual chronological age (RMSE = 14.84 dpc). To benchmark the performance of our clock, we compared it to three existing DNAm clocks: Horvath’s MTC , Knight’s GAC  and Lee’s CPC . These clocks were selected as they represent either the most well-established algorithm with the broadest applicability (MTC) or were specifically developed to predict pre- and perinatal gestational ages, albeit in non-brain tissue (GAC and CPC). Of note, the MTC only predicted 27 fetal brain samples (41.2%) as prenatal (dpc < 280) with a low correlation between chronological and predicted age (rMTC = 0.06). This correlation is much lower than those reported in the original manuscript when Horvath tested the clock in adult samples , highlighting the challenges with extrapolating clocks to samples which were not well represented in model development. By comparison, the GAC and CPC perform better than the MTC, although they have smaller correlation coefficients (rGAC = 0.52 and rCPC = 0.76) and are associated with a larger error (RMSEGAC = 21.32 and RMSECPC = 60.08) than the FBC. Interestingly, while the predictions from the GAC are more precise, it is not as effective at ranking the samples by age as the CPC. Taken together, these results demonstrate that our novel FBC outperforms existing clocks at predicting age in fetal brain samples, and therefore is the optimal tool available to profile the epigenetic age in models of neuronal development. When applying clocks to the training data, the three established clocks produce similar correlations and RMSEs as in the testing data. As expected, the predictions of the FBC in the training data are more accurate than the predictions in the testing data, reflecting overfitting of the model (Additional file 4: Figure S2).
We further tested the FBC in an independent prenatal brain dataset (n = 129, Additional file 1: Table S1), finding a stronger linear relationship between chronological age and predicted epigenetic age (r = 0.87, Fig. 2A) than in the test dataset (Fig. 1A), but a larger error rate (RMSE = 26.36 dpc). On closer inspection, we observed that this error is mainly driven by a subset of older samples (> 185dpc, Fig. 2A) in the validation dataset (Additional file 1: Table S1), that are older than any of the samples in the training data (Fig. 2). If we limit our analysis to the samples whose chronological age overlaps the range of ages used in the training data (37—185 dpc; n = 125) then the error is decreased to 18.94 dpc. The performance of the established clocks in the validation dataset is also comparable to their performance in the testing dataset, with smaller correlations and higher error compared to the FBC and no sample with a predicted prenatal age by the MTC (Fig. 2B–D).
Given our previous finding of divergent, sex-specific age trajectories at multiple DNAm sites during prenatal development , we tested whether the FBC performed differently between males and females in our testing dataset. Although this analysis initially indicated a significant difference in the correlation with age between males and females (PSex*Age = 0.0007), this relationship is likely driven by outliers. Indeed, a sensitivity analysis excluding the two samples with youngest and oldest predicted ages produced a non-significant result (PSex*Age = 0.081). Repeating the analysis in the validation dataset we find a small significant effect of sex on age in the full dataset (PSex*Age = 0.00179), which was driven by the samples older than 185 dpc, all of them being female. As samples older than 185 dpc produce inaccurate predictions, they could unfairly bias the analysis of potential sex effects and after removing the them from the analysis, there is no longer a significant effect of sex on the accuracy of the clock (PSex*Age = 0.95).
Fetal and gestational age clocks are not able to predict adult ages in adult brain tissue
All four clocks were additionally tested in an adult brain DNAm dataset (Additional file 5: Figure S3). As expected, the FBC performs poorly in this sample set, with all samples predicted as prenatal albeit at the older end of the spectrum of ages in the training data (range of predicted ages 115–170 dpc). In contrast the MTC performs the best (rMTC = 0.65, RMSEMTC = 20.11 years) as it is the only clock we considered that was developed using adult samples. As with the FBC, the GAC and CPC fail to produce predictions of postnatal age, again reflecting the fact that they were also constructed using data from pre- or perinatal samples.
Fetal brain clock captures differences in differentiation of cellular stem cell models towards neurons
Having demonstrated that our novel FBC is the optimal clock to profile age in prenatal brain samples, we applied it to DNAm data from multiple cellular studies to characterize epigenetic age in iPSCs and ESCs differentiating towards cortical neurons. All samples were estimated to have a fetal epigenetic age, regardless of cell stage, cell line origin or differentiation protocol. Furthermore, they were predicted to have a “young” fetal age (Fig. 3) with the iPSCs having a mean predicted age of 75.6 dpc (SD = 6.9 dpc; n = 59), the iPSC-NPCs having a mean age of 79.1 dpc (SD = 11.0 dpc; n = 8) and iPSC—neurons having a mean predicted age of 83.2 dpc (SD = 8.87 dpc; n = 31). To test whether the differentiation process had an effect on the epigenetic age predictions from the FBC we compared the estimated ages between iPSCs and neurons, observing significant differences in all datasets (Fig. 3A), with neurons being older than iPSCs. For the Imm dataset, which included proliferative NPCs as well as postmitotic neurons, we additionally found a significant difference between NPCs and iPSC-neurons (Δmean = 20.0 dpc, P = 0.00039), but not between iPSCs and NPCs (Δmean = 10.0 dpc, P = 0.24). In contrast, in the Nazor dataset, which only included iPSCs and NPCs, we did find a significant difference in predicted epigenetic age between iPSCs and NPCs (Δmean = 10.5 dpc, P = 0.00734). Meta-analyzing the data across the five studies including iPSCs and iPSC-derived NPCs and neurons, we found that iPSC-neurons were predicted to have a significantly advanced epigenetic age compared to iPSCs of about 2 weeks (Δmean = 13.35 dpc, P = 2.13e−11) but no significant difference was observed between iPSC and NPCs (Δmean = 4.33 dpc, P = 0.11).
As well as cortical neurons, we applied the FBC to DNAm data from a study on iPSCs and differentiated motor neurons. The motor neurons had a mean predicted epigenetic age of 79.82 dpc (SD = 6.82 dpc) which is slightly younger than the cortical neurons (meaniPSC-neurons = 83.2 dpc). When comparing the predicted ages between the motor neurons and the iPSCs they were originally derived from, we did not observe a significant difference (Δmean = 2.94 dpc, P = 0.825, Fig. 3C).
Finally, we tested for effects on epigenetic age through the process of differentiation from embryonic stem cells (ESCs) to NPCs and neurons, using two publicly available DNAm datasets (Additional file 1: Table S1). In both datasets we observed a significant increase of predicted epigenetic age from ESC to NPC (Fig. 3D). Additionally, in the Kim dataset, we were able to compare ESC derived neurons, and found a significant difference between ESCs and neurons (Δmean = 55.42 dpc, P = 0.00015, Fig. 3D) but no change in epigenetic age from NPC to neuron (Δmean = − 0.58 dpc, P = 0.95).
In this study we established a novel epigenetic clock, the fetal brain clock (FBC), to characterize the earliest stages of human neurodevelopment, and applied it to determine the epigenetic age of iPSCs and ESCs and their derived NPCs and neurons. Epigenetic clocks have been widely applied, including for the analysis of both in-vivo and in-vitro models of aging, where they have been shown to correlate with hallmarks of the aging process [31,32,33,34]. However, their application has predominantly been focused on studies involving adult samples. Given the lack of fetal brain samples in the development of existing DNAm clocks prior to this study, there was no optimal method for estimating the age of fetal brain samples from DNAm data, limiting the ability to characterize iPSC-derived neuronal models or other models of neuronal development. We show that, in two non-overlapping independent validation datasets, the FBC generates predictions that correlate strongly with chronological age in prenatal brain samples. Furthermore, it outperforms both a pan-tissue epigenetic clock (Horvath’s MTC), and epigenetic clocks focused on the same developmental stage, but based on DNAm profiled in different tissues (Knight’s GAC and Lee’s CPC) [13, 17, 18]. The FBC outperforms these clocks using both correlation and error statistics (RMSE), indicating that it is not only better at ranking the samples, but it generates more precise estimates.
As the FBC was predominantly trained on second trimester brain samples, with some first trimester samples, it made less accurate predictions when applied to third trimester samples and performed extremely poorly in adult brain samples. Altogether, this reinforces the findings of previous studies that have also demonstrated that the applicability of DNAm clocks is dependent on the characteristics of the data they were trained on, with the tissue and age range of the training samples of particular relevance [14, 15]. More specifically, we note that while the accuracy of a DNAm clock is typically decreased in tissues not represented in its training data, clocks are completely limited to predicting ages represented in the training data. If the true age of a tested sample lies outside of the age range of the training data, the clock is unable to provide an appropriate prediction with the magnitude of inaccuracy increasing as the true age of the sample becomes more extreme suggesting that, in general, age range is more critical than tissue when training a clock.
Previous epigenetic clocks have shown that the predicted epigenetic age of iPSCs is significantly lower than the predicted age of cells from which they were reprogrammed and also the chronological age of the donor at sample donation . The induction of pluripotency reprograms the epigenome, including at the loci used in the clock algorithm, ultimately leading to a younger predicted epigenetic age. However, in these analyses the predicted ages remain postnatal, which is unexpected as human pluripotent cells only occur during the early stages of human development and hence we hypothesized that, with an adequately calibrated clock, iPSCs would be expected to be estimated as being early fetal. Applying the FBC to five datasets of iPSCs and iPSC-derived NPCs and neurons, we found this to be the case. iPSCs were estimated as having a mean age of 75.6 dpc, fitting our hypothesis that they represent first trimester developmental stages. These results align with studies that have reported rejuvenation effects on the transcriptome, telomeres and mitochondria of iPSCs following reprogramming [35,36,37]. In addition, we profiled the effect on predicted epigenetic age following the differentiation of iPSCs towards neurons reporting a small but significant aging effect of 13 dpc. This developmental stage coincides with fetal neurogenesis  and suggests that while differentiation does induce an aging process, it does not accelerate iPSC-neurons to a postnatal state. Of note, Mertens and colleagues found that while iPSCs lose age related transcriptomic signatures, induced neurons (iNs; neurons directly reprogrammed from fibroblasts) keep their specific aging signatures . Therefore, it would be interesting to apply our FBC to iNs, iPSCs, iPSC-neurons and their corresponding somatic tissues to verify whether age associated DNAm differences are also preserved in iNs. In addition, we tested the FBC on ESC and ESC-derived neurons. Although our sample size was small, the results also suggested that the differentiation process induced a small aging effect as the ESCs were differentiated into neurons. Altogether, our results indicate that iPSC-neurons may have limited utility for the study of age-related brain diseases, like Alzheimer’s disease or other dementias, as many molecular processes related to an aging phenotype may not be recapitulated.
Epigenetic clocks have been utilized for a wide range of applications , with a predominant focus on exploring the biological meaning of deviations between chronological age and epigenetic age. As we have shown the cause of this deviation may result from the use of an inappropriate clock, therefore the FBC is a critical tool for assessing whether epigenetic age acceleration during neurodevelopment is associated with later life outcomes such as disease or in utero exposures (such as maternal smoking). The FBC also has utility for determining the specific developmental stage of a model of neurodevelopment recapitulates (e.g. brain organoids or cellular neuronal models) and how different exposures or genetic backgrounds may influence neurodevelopmental processes and aging.
While a strength of our study is the development of a bespoke clock to optimally profile the epigenetic age of human fetal brain samples, due to the training data predominantly containing second trimester samples, the FBC is most accurate for this period of neurodevelopment. We are confident that the FBC has correctly predicted fetal epigenetic ages for the cell lines included in our analysis as the age predictions of the vast majority of the stem cell models and their derived neurons were less than the median age in the training data. This indicates that the predictions are not confounded by saturation of the coefficients. Although we took advantage of previously published data to include all available samples appropriate for addressing our research questions, some of the group sizes, in particular the NPCs, were small and we were not powered to detect significant aging effects as a result of differentiation from iPSCs to NPCs in all datasets. Furthermore, across the different studies there was variation in the estimated prenatal age of each cell state; we hypothesize that these differences results from subtle variation in differentiation protocols, timepoints of cell collection or the definition of NPCs within the respective studies . Despite this, we are confident in the conclusions we report as these study specific effects were controlled for in our analysis.
In summary, we demonstrate that established DNAm clocks struggle to capture changes in epigenetic age during neurodevelopment and for precise predictions a bespoke clock trained on fetal brain data is required. Using the FBC to assess the epigenetic age of iPSCs and differentiated neurons, we found that iPSCs and derived NPCs and neurons reflect early prenatal developmental stages. Our findings question the suitability of the iPSC-neurons for the study of aging associated processes.
Availability of data and materials
Raw and normalized DNAm data for the human fetal brain samples used to test and train the fetal brain clock have been submitted to the NCBI Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) under accession numbers GSE58885, GSE74193, GSE157908, GSE116754 and GSE90871. DNAm data for the unpublished fetal brain are available from the corresponding author on request. DNAm data for the cellular datasets, are also available under accession numbers GSE158089, GSE105093, GSE51921, GSE31848 and GSE38214. DNAm data for the cellular dataset Bhinge is available from the corresponding author on request. The coefficients of the FBC are included within the article in Additional file 2: Table S2. The custom R scripts used for this study and to apply the FBC are available at https://github.com/LSteg/EpigeneticFetalClock.
Brains for Dementia Research
Control placental clock
Embryonic stem cell
Fetal brain clock
Gestational age clock
Gene Expression Omnibus
Induced pluripotent stem cell
Least Absolute Shrinkage and Selection Operator
Neuronal precursor cell
Root mean squared error
Takahashi K, Tanabe K, Ohnuki M, Narita M, Ichisaka T, Tomoda K, Yamanaka S. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell. 2007;131:861–72.
Telezhkin V, Schnell C, Yarova P, et al. Forced cell cycle exit and modulation of GABA A, CREB, and GSK3β signaling promote functional maturation of induced pluripotent stem cell-derived neurons. Am J Physiol Physiol. 2016;310:C520–41.
Fernández-Santiago R, Carballo-Carbajal I, Castellano G, et al. Aberrant epigenome in iPSC-derived dopaminergic neurons from Parkinson’s disease patients. EMBO Mol Med. 2015;7:1529–46.
Sultanov R, Lebedeva O, Arapidi G, Lagarkova M, Kiselev S. Methylation profile of induced pluripotent stem cells generated by integration and integration-free approaches. Data Br. 2018;17:662–6.
Gordon J, Amini S, White MK. General overview of neuronal cell culture. Methods Mol Biol. 2013;1078:1–8.
Arber C, Lovejoy C, Wray S. Stem cell models of Alzheimer’s disease: progress and challenges. Alzheimer’s Res Ther. 2017. https://doi.org/10.1186/s13195-017-0268-4.
Morgani S, Nichols J, Hadjantonakis AK. The many faces of pluripotency: in vitro adaptations of a continuum of in vivo states. BMC Dev Biol. 2017;17:1–20.
Monk D, Mackay DJG, Eggermann T, Maher ER, Riccio A. Genomic imprinting disorders: lessons on how genome, epigenome and environment interact. Nat Rev Genet. 2019;20:235–48.
Zeng Y, Chen T. DNA methylation reprogramming during mammalian development. Genes. 2019. https://doi.org/10.3390/genes10040257.
Smith ZD, Meissner A. DNA methylation: Roles in mammalian development. Nat Rev Genet. 2013;14:204–20.
Levine ME, Lu AT, Quach A, et al. An epigenetic biomarker of aging for lifespan and healthspan. Aging. 2018;10:573–91.
Hillary RF, Stevenson AJ, McCartney DL, et al. Epigenetic measures of ageing predict the prevalence and incidence of leading causes of death and disease burden. Clin Epigenetics. 2020;12:115.
Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 2013. https://doi.org/10.1186/gb-2013-14-10-r115.
Shireby GL, Davies JP, Francis PT, et al. Recalibrating the epigenetic clock: Implications for assessing biological age in the human cortex. Brain. 2020;143:3763–3775.
El Khoury LY, Gorrie-Stone T, Smart M, et al. Systematic underestimation of the epigenetic clock and age acceleration in older subjects. Genome Biol. 2019;20:283.
Zhang Q, Vallerga CL, Walker RM, et al. Improved precision of epigenetic clock estimates across tissues and its implication for biological ageing. Genome Med. 2019;11:54.
Knight AK, Craig JM, Theda C, et al. An epigenetic clock for gestational age at birth based on blood methylation data. Genome Biol. 2016;17:1–11.
Lee Y, Choufani S, Weksberg R, et al. Placental epigenetic clocks: Estimating gestational age using placental DNA methylation levels. Aging (Albany NY). 2019;11:4238–53.
R Development Core Team. R: A Language and Environment for Statistical Computing. 2012.
Spiers H, Hannon E, Schalkwyk LC, Smith R, Wong CCY, O’Donovan MC, Bray NJ, Mill J. Methylomic trajectories across human fetal brain development. Genome Res. 2015;25:338–52.
Jaffe AE, Gao Y, Deep-Soboslay A, Tao R, Hyde TM, Weinberger DR, Kleinman JE. Mapping DNA methylation across development, genotype and schizophrenia in the human frontal cortex. Nat Neurosci. 2015;19:40–7.
Pidsley R, Wong CC, Volta M, Lunnon K, Mill J, Schalkwyk LC. A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics. 2013. https://doi.org/10.1186/1471-2164-14-293.
Gorrie-Stone TJ, Smart MC, Saffari A, Malki K, Hannon E, Burrage J, Mill J, Kumari M, Schalkwyk LC. Bigmelon: tools for analysing large DNA methylation datasets. Bioinformatics. 2019;35:981–6.
Chen YA, Lemire M, Choufani S, Butcher DT, Grafodatskaya D, Zanke BW, Gallinger S, Hudson TJ, Weksberg R. Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics. 2013;8:203–9.
Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Ser B Stat Methodol. 2005;67:301–20.
Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33:1–22.
Chatterton Z, Hartley BJ, Seok MH, et al. In utero exposure to maternal smoking is associated with DNA methylation alterations and reduced neuronal content in the developing fetal brain. Epigenetics Chromatin. 2017. https://doi.org/10.1186/s13072-017-0111-y.
Colunga T, Hayworth M, Kreß S, et al. Human pluripotent stem cell-derived multipotent vascular progenitors of the mesothelium lineage have utility in tissue engineering and repair. Cell Rep. 2019;26:2566-2579.e10.
Nazor KL, Altun G, Lynch C, et al. Recurrent variations in DNA methylation in human pluripotent stem cells and their differentiated derivatives. Cell Stem Cell. 2012;10:620–34.
Kim M, Park YK, Kang TW, et al. Dynamic changes in DNA methylation and hydroxymethylation when hES cells undergo differentiation toward a neuronal lineage. Hum Mol Genet. 2014;23:657–67.
Liu Z, Leung D, Thrush K, Zhao W, Ratliff S, Tanaka T, Schmitz LL, Smith JA, Ferrucci L, Levine ME. Underlying features of epigenetic aging clocks in vivo and in vitro. Aging Cell. 2020;19:e13229.
Horvath S, Raj K. DNA methylation-based biomarkers and the epigenetic clock theory of ageing. Nat Rev Genet. 2018;19:371–84.
Wagner W. The link between epigenetic clocks for aging and senescence. Front Genet. 2019. https://doi.org/10.3389/fgene.2019.00303.
Ryan CP. “Epigenetic clocks”: Theory and applications in human biology. Am J Hum Biol. 2020;15:e23488.
Suhr ST, Chang EA, Tjong J, Alcasid N, Perkins GA, Goissis MD, Ellisman MH, Perez GI, Cibelli JB. Mitochondrial rejuvenation after induced pluripotency. PLoS ONE. 2010. https://doi.org/10.1371/journal.pone.0014095.
Mertens J, Paquola ACM, Ku M, et al. Directly reprogrammed human neurons retain aging-associated transcriptomic signatures and reveal age-related nucleocytoplasmic defects. Cell Stem Cell. 2015;17:705–18.
Lapasset L, Milhavet O, Prieur A, et al. Rejuvenating senescent and centenarian human cells by reprogramming through the pluripotent state. Genes Dev. 2011;25:2248–53.
Lui JH, Hansen DV, Kriegstein AR. Development and evolution of the human neocortex. Cell. 2011;146:18–36.
Mattis VB, Tom C, Akimov S, et al. HD iPSC-derived neural progenitors accumulate in culture and are susceptible to BDNF withdrawal due to glutamate toxicity. Hum Mol Genet. 2014;24:3257–71.
Bhinge A, Namboori SC, Zhang X, VanDongen AMJ, Stanton LW. Genetic correction of SOD1 mutant iPSCs reveals ERK and JNK activated AP1 as a driver of neurodegeneration in amyotrophic lateral sclerosis. Stem Cell Reports. 2017;8:856–69.
Cocks G, Curran S, Gami P, et al. The utility of patient specific induced pluripotent stem cells for the modelling of Autistic Spectrum Disorders. Psychopharmacology. 2014;231:1079–88.
Warre-Cornish K, Perfect L, Nagy R, et al. Interferon-γ signaling in human iPSC–derived neurons recapitulates neurodevelopmental disorder phenotypes. Sci Adv. 2020;6:9506.
This work was supported by a Simons Foundation (SFARI) grant (Grant Number: 573312) awarded to JM and a Medical Research Council (MRC) project grant (Grant Number: MR/R005176/1) to JM. The human embryonic and fetal material was provided by the Joint MRC Wellcome Trust Human Developmental Biology Resource (http://www.hdbr.org), which is funded by Medical Research Council grant G0700089 and Wellcome Trust grant GR082557. Adult brain DNAm data was generated using tissue from the Brains for Dementia Research (BDR) cohort, which is jointly funded by Alzheimer's Research UK and the Alzheimer's Society in association with the Medical Research Council. The South West Dementia Brain Bank is part of the Brains for Dementia Research program, jointly funded by Alzheimer's Research UK and Alzheimer's Society, and is also supported by BRACE (Bristol Research into Alzheimer's and Care of the Elderly) and the Medical Research Council. The Price cellular data was supported by grants from the European Autism Interventions (EU-AIMS): the Innovative Medicines Initiative Joint Undertaking under grant agreement no. 115300, resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in kind contribution (JP, DPS); StemBANCC: support from the Innovative Medicines Initiative joint undertaking under grant 115439-2, whose resources are composed of financial contribution from the European Union [FP7/2007-2013] and EFPIA companies’ in-kind contribution (JP, DPS). G.M. further acknowledges support from EU-AIMS (European Autism Interventions)/EU AIMS-2-TRIALS, an Innovative Medicines Initiative Joint Undertaking under Grant Agreement No. 777394. In addition, G.M. was supported by the National Institute for Health Research (NIHR) Mental Health Biomedical Research Centre (BRC) at South London and Maudsley NHS Foundation Trust and King’s College London. G.S. was supported by a PhD studentship from the Alzheimer’s Society. N.J.B. was supported by the Medical Research Council Grant MR/L010674/2. N.D.A. was supported by Medical Research Council grant N013255/1.
Ethics approval and consent to participate
Ethical approval for the collection of human fetal brain tissue acquired from the Human Developmental Biology Resource was granted by the Royal Free Hospital research ethics committee under reference 08/H0712/34 and Human Tissue Authority material storage license 12220. Ethical approval for collection of human fetal brain tissue for the Medical Research Council Brain Bank was granted under reference 08/MRE09/38. Ethical approval for the work with methylomic data of human brain tissue from the BDR cohort was granted by the University of Exeter, College of Medicine and Health Research Ethics Committee under reference Mar20/D/009∆5. Ethical approval for the Price cellular samples is as follows: participants were recruited and methods carried out in accordance to the ‘Patient iPSCs for Neurodevelopmental Disorders (PiNDs) study’ (REC No 13/LO/1218). Informed consent was obtained from all subjects for participation in the PiNDs study. Ethical approval for the PiNDs study was provided by the NHS Research Ethics Committee at the South London and Maudsley (SLaM) NHS R&D Office.
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1: Table S1.
Summary of fetal, adult and cellular datasets.
Additional file 2: Table S2.
Intercept and coefficients of the Fetal Brain Clock.
Additional file 3: Fig. S1.
Histogram of age distribution. Chronological age of the fetal samples measured in days post-conception in A) training data, B) testing data and C) validation data.
Additional file 4: Fig. S2.
Comparison of predictions from the four DNAm clocks in the training data (n = 193). Shown are scatterplots comparing chronological age (x-axis; days post-conception) against predicted epigenetic age (y-axis; days post-conception) for A Fetal Brain Clock (FBC); B Horvath’s Multi Tissue Clock (MTC); C Knight’s Gestational Age Clock (GAC); D Lee’s Control Placental Clock (CPC) in the data used for training of the FBC. Where necessary, predicted age was converted to days post-conception. The black line indicates the identity line of chronological and predicted epigenetic age and represents a perfect prediction. Two statistics were calculated to evaluate the precision of each DNAm clock: Pearson’s correlation coefficient (r) and the root mean squared error (RMSE).
Additional file 5: Fig. S3.
Comparison of predictions from the four DNAm clocks in adult brain samples (n = 1221). Shown are scatterplots comparing chronological age (x-axis; years) against predicted epigenetic age (y-axis; years) calculated using A Fetal Brain Clock (FBC); B Horvath’s Multi Tissue Clock (MTC); C Knight’s Gestational Age Clock (GAC); D Lee’s Control Placental Clock (CPC) in an independent adult brain dataset. Where necessary, predicted age was converted to years, where 0 indicates birth. The black line indicates the identity line of chronological and predicted epigenetic age and represents a perfect prediction. Two statistics were calculated to evaluate the precision of each DNAm clock: Pearson’s correlation coefficient (r) and the root mean squared error (RMSE).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Steg, L.C., Shireby, G.L., Imm, J. et al. Novel epigenetic clock for fetal brain development predicts prenatal age for cellular stem cell models and derived neurons. Mol Brain 14, 98 (2021). https://doi.org/10.1186/s13041-021-00810-w
- Epigenetic clock
- DNA methylation
- Induced pluripotent stem cells
- iPSC-derived neurons
- Neuronal precursor cells
- DNAm clock