Biomedical research and clinical data are often collected from the same sample of subjects at different points in time. These data are called “longitudinal data.” (See the definition by BLS.) When performing supervised learning (e.g., SVM) on data of this kind, the impact of the time-varying correlation of the features on the outcomes or predictions may be blurred. To smooth out the temporal effect of the features, changes to the original learning algorithms are necessary.
In a study conducted by the Center for Information Technology (CIT) and the National Institute on Aging (NIA) at the National Institutes of Health (NIH), with some clinical data as the training data, a longitudinal support vector regression (LSVR) algorithm was presented and shown to outperform other machine learning methods. [Du et al. 2015] The results were published at the IEEE BIBM (Bioinformatics and Biomedicine) conference. The work is adapted from an earlier work by Chen and Bowman. [Chen & Bowman 2011] The dataset is longitudinal because it contains N patients with p features, taken at T points in time.
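Traditional support vector regression (SVR) solves the following optimization problem:
$$\min_{w, b} \; \frac{1}{2} \|w\|^2,$$
where $f(x) = w^T x + b$ is a hyperplane surface, under the constraints:
$$y_i - w^T x_i - b \le \varepsilon,$$
$$w^T x_i + b - y_i \le \varepsilon.$$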
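As a small illustration of what this optimization problem asks for, the sketch below assumes the usual textbook form of SVR: minimize $\frac{1}{2}\|w\|^2$ subject to $|y_i - (w^T x_i + b)| \le \varepsilon$ for every training point. The function names, the toy data, and the candidate values of `w` and `b` are all made up for illustration. It does not solve the problem; it only checks which candidates are feasible and compares their objective values.

```python
def svr_objective(w):
    """Return (1/2) * ||w||^2, the quantity minimized in SVR."""
    return 0.5 * sum(wj * wj for wj in w)


def predict(w, b, x):
    """Evaluate f(x) = w^T x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def is_feasible(w, b, X, y, eps):
    """Check that |y_i - f(x_i)| <= eps holds for every training point."""
    return all(abs(yi - predict(w, b, xi)) <= eps for xi, yi in zip(X, y))


# Toy data, invented for illustration: y is roughly 2 * x.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.1, 1.9, 4.1, 5.9]
eps = 0.5

# Hypothetical candidate solutions (w, b).
candidates = [([2.0], 0.0), ([1.8], 0.3), ([1.0], 1.5)]

for w, b in candidates:
    print(w, b, is_feasible(w, b, X, y, eps), svr_objective(w))
```

Among the feasible candidates, SVR prefers the one with the smaller norm of `w`, that is, the flattest function that still keeps every point inside the tube of half-width `eps`.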
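However, in LSVR, the data points are more complicated. For each patient $s$, the features at time $t$ are given by a vector $x_s^{(t)}$. The first goal of LSVR is to assign each patient a T-by-p matrix $\tilde{x}_s = [x_s^{(1)}, x_s^{(2)}, \ldots, x_s^{(T)}]^T$ and a T-by-1 vector $\tilde{y}_s = [y_s^{(1)}, y_s^{(2)}, \ldots, y_s^{(T)}]^T$, with an unknown T-by-1 parameter vector $\beta$, such that the constraints become:
$$\tilde{y}_s^T \beta - w^T \tilde{x}_s^T \beta - b \le \varepsilon + \xi_s,$$
$$w^T \tilde{x}_s^T \beta + b - \tilde{y}_s^T \beta \le \varepsilon + \xi_s^*,$$
$$\xi_s, \xi_s^* \ge 0,$$
where $w$, $b$, and $\varepsilon$ are the weight vector, offset, and tolerance of standard SVR, and the $\xi_s$'s and $\xi_s^*$'s are additional regularization (slack) parameters. The parameters $\beta$ can be found by iterative quadratic optimization. The constraints are handled with Lagrange multipliers.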
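The following is a minimal sketch of the collapsing step, assuming that each patient's T-by-p feature matrix and length-T outcome vector are combined linearly across time by the parameter vector, giving one length-p feature vector and one scalar outcome per patient. The function names, the toy numbers, and the chosen values of `beta`, `w`, `b`, and `eps` are hypothetical, and this is not the authors' implementation. Under this assumption, once `beta` is held fixed the collapsed data look like ordinary SVR training data, which is one way to read the iterative optimization described above.

```python
def collapse_patient(X_s, y_s, beta):
    """Collapse one patient's T-by-p matrix X_s and length-T vector y_s.

    Returns (X_s^T beta, y_s^T beta): a length-p feature vector and a scalar.
    """
    T = len(beta)
    if len(X_s) != T or len(y_s) != T:
        raise ValueError("X_s, y_s and beta must all cover T time points")
    p = len(X_s[0])
    x_tilde = [sum(beta[t] * X_s[t][j] for t in range(T)) for j in range(p)]
    y_tilde = sum(beta[t] * y_s[t] for t in range(T))
    return x_tilde, y_tilde


def slacks(w, b, x_tilde, y_tilde, eps):
    """Smallest non-negative (xi, xi_star) satisfying the relaxed constraints."""
    residual = y_tilde - (sum(wj * xj for wj, xj in zip(w, x_tilde)) + b)
    return max(0.0, residual - eps), max(0.0, -residual - eps)


# One toy patient with T = 2 time points and p = 2 features (invented values).
X_s = [[1.0, 2.0],
       [3.0, 4.0]]
y_s = [1.0, 2.0]
beta = [1.0, 0.5]  # hypothetical temporal weights

x_tilde, y_tilde = collapse_patient(X_s, y_s, beta)
xi, xi_star = slacks([1.0, 0.0], 0.0, x_tilde, y_tilde, eps=0.25)
print(x_tilde, y_tilde, xi, xi_star)
```

A nonzero slack indicates how far the collapsed point lies outside the tube of half-width `eps` for the given `w` and `b`.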
For details, please refer to [Du et al. 2015]. This approach decouples, or smooths out, the temporal covariation within each patient, so that a better prediction can be made.
- “What Are Longitudinal Data?” National Longitudinal Surveys, Bureau of Labor Statistics.
- W. Du, H. Cheung, C. A. Johnson, I. Goldberg, M. Thambisetty, K. Becker, “A Longitudinal Support Vector Regression for Prediction of ALS Score,” IEEE BIBM (2015). [ResearchGate Link]
- S. Chen, F. D. Bowman, “A Novel Support Vector Classifier for Longitudinal High-dimensional Data and Its Application to Neuroimaging Data,” Stat. Anal. Data Min. 4, 604-611 (2011). [ResearchGate Link]
- Division of Computational Biosciences (DCB), Center for Information Technology (CIT), National Institutes of Health (NIH).
- National Institute on Aging (NIA), National Institutes of Health (NIH).