Spencer Farrell, Arnold Mitnitski, Kenneth Rockwood, Andrew D. Rutenberg
We have built a computational model for individual aging trajectories of health and survival, which contains physical, functional, and biological variables, and is conditioned on demographic, lifestyle, and medical background information. We combine techniques of modern machine learning with an interpretable interaction network, where health variables are coupled by explicit pair-wise interactions within a stochastic dynamical system.
Aging is a high-dimensional process due to the enormous number of aspects of healthy functioning that can change with age across a multitude of physical scales. This complexity is compounded by the heterogeneity and stochasticity of individual aging outcomes. Strategies to simplify the complexity of aging include identifying key biomarkers that quantitatively assess the aging process or integrating many variables into simple and interpretable one-dimensional summary measures of the progression of aging, as with “Biological Age”, clinical measures such as frailty, or recent machine learning models of aging.
Materials and methods
We use waves 0–8 of the English Longitudinal Study of Aging (ELSA) dataset, with 25290 total individuals. We include both the original sample and the refreshment samples that joined the study after its start at wave 0. In Table A in S1 Text, we list all variables used. In S1 Fig, we show the number of individuals for which each variable is available at different times from their entrance wave. Each available wave is used as a baseline state for each individual, see section for details.
We extract 29 longitudinally observed continuous or discrete ordinal health variables (treated as continuous) and 19 background health variables (taken as constant with age).
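To illustrate what using each available wave as a baseline state could look like in practice, here is a minimal Python sketch. The nested-dictionary layout, the function name `baseline_examples`, and the variable name `gait_speed` are hypothetical and chosen only for illustration; they are not taken from the ELSA files or from the authors' code. The sketch also assumes that an individual's last observed wave still counts as a baseline even though no later health observations follow it.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: observations[individual_id][wave] -> {variable: value}
Observations = Dict[str, Dict[int, Dict[str, float]]]


def baseline_examples(observations: Observations) -> List[Tuple[str, int, List[int]]]:
    """Return one (individual_id, baseline_wave, follow_up_waves) tuple
    for every wave at which an individual was observed."""
    examples = []
    for pid, waves in observations.items():
        observed = sorted(waves)  # waves at which this individual has data
        for i, start in enumerate(observed):
            # Every later observed wave is a follow-up for this baseline.
            examples.append((pid, start, observed[i + 1:]))
    return examples


# Toy data: individual "a" is seen at waves 0, 2 and 4; "b" enters at wave 3.
toy = {
    "a": {0: {"gait_speed": 1.1}, 2: {"gait_speed": 1.0}, 4: {"gait_speed": 0.9}},
    "b": {3: {"gait_speed": 0.8}},
}
examples = baseline_examples(toy)
for example in examples:
    print(example)
```

Under these assumptions, an individual observed at k waves contributes k baseline examples, so a refreshment-sample member who enters late is handled in the same way as an original participant.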
We have developed a machine learning aging model, DJIN, to predict multidimensional health trajectories and survival given baseline information, and to generate realistic synthetic aging populations—while also learning interpretable network interactions that characterize the dynamics in terms of realistic physiological interactions. The DJIN model uses continuous-valued health variables from the ELSA dataset, including physical, functional, and molecular variables.
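The idea of health variables coupled by explicit pair-wise interactions within a stochastic dynamical system can be made concrete with a toy simulation. The sketch below is not the DJIN model, whose functional form is not given in this section. It assumes a purely linear drift, sum_j W[i][j] * x[j], constant additive Gaussian noise of strength `sigma`, and Euler-Maruyama time stepping; the names `simulate`, `W`, `sigma`, and `dt` are all illustrative.

```python
import math
import random


def simulate(x0, W, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama integration of dx_i = sum_j W[i][j] * x_j dt + sigma dB_i.

    W[i][j] is the (assumed linear) influence of variable j on variable i.
    Returns the list of states, including the initial state.
    """
    rng = random.Random(seed)  # seeded so that a run is reproducible
    x = list(x0)
    n = len(x)
    path = [list(x)]
    for _ in range(n_steps):
        # Deterministic part: pair-wise coupling through the network W.
        drift = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Stochastic part: independent Gaussian increments with variance dt.
        x = [
            x[i] + drift[i] * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for i in range(n)
        ]
        path.append(list(x))
    return path


# Two toy variables: variable 1 drives variable 0, and nothing drives variable 1.
W = [[0.0, 1.0],
     [0.0, 0.0]]
noiseless = simulate([0.0, 1.0], W, sigma=0.0, dt=0.5, n_steps=2)
noisy = simulate([0.0, 1.0], W, sigma=0.1, dt=0.5, n_steps=2, seed=1)
print(noiseless[-1], noisy[-1])
```

In a sketch like this the interpretability comes from the matrix itself: a nonzero entry W[i][j] is a direct, signed statement that variable j influences the rate of change of variable i.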
Citation: Farrell S, Mitnitski A, Rockwood K, Rutenberg AD (2022) Interpretable machine learning for high-dimensional trajectories of aging health. PLoS Comput Biol 18(1): e1009746. https://doi.org/10.1371/journal.pcbi.1009746
Editor: Tatiana Engel, Cold Spring Harbor Laboratory, UNITED STATES
Received: August 7, 2021; Accepted: December 11, 2021; Published: January 10, 2022.
Copyright: © 2022 Farrell et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Our code is available at https://github.com/Spencerfar/djin-aging. The English Longitudinal Study of Aging waves 0-8, 1998-2017 with identifier UKDA-SN-5050-17 is available at https://www.elsa-project.ac.uk/accessing-elsa-data. This requires registering with the UK Data Service.
Funding: ADR thanks the Natural Sciences and Engineering Research Council (NSERC) for an operating grant (RGPIN 2019-05888). KR has operational funding from the Canadian Institutes of Health Research (PJT-156114) and personal support from the Dalhousie Medical Research Foundation as the Kathryn Allen Weldon Professor of Alzheimer Research. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: Author Arnold Mitnitski was unable to confirm their authorship contributions. On their behalf, the corresponding author has reported their contributions to the best of their knowledge. The authors have declared that no other competing interests exist.