Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?
Authors: Hervé Cardot (1), David Degras (2)
Institution: (1) Institut de Mathématiques de Bourgogne, Université de Bourgogne Franche Comté, Dijon, France; (2) Department of Mathematics, University of Massachusetts Boston, Boston, MA, USA
Abstract: Principal component analysis (PCA) is a method of choice for dimension reduction. In the current context of data explosion, online techniques that do not require storing all data in memory are indispensable for performing PCA on streaming and/or massive data. Despite the wide availability of recursive algorithms that can efficiently update the PCA when new data are observed, the literature offers little guidance on how to select a suitable algorithm for a given application. This paper reviews the main approaches to online PCA, namely perturbation techniques, incremental methods and stochastic optimisation, and compares the most widely employed techniques in terms of statistical accuracy, computation time and memory requirements using artificial and real data. Extensions of online PCA to missing data and to functional data are detailed. All studied algorithms are available in the R package onlinePCA on CRAN.
Keywords: Eigenvalue decomposition; perturbation methods; stochastic gradient; generalised Hebbian algorithm; incremental SVD
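
Illustration: to make the stochastic-optimisation family named in the abstract concrete, the following is a minimal sketch of Oja's rule, the single-component special case of the generalised Hebbian algorithm. It is not taken from the paper or from the onlinePCA package. The function names (oja_update, stream), the synthetic two-dimensional data, and the step size 1/(t + 10) are all assumptions chosen for illustration. Each observation is used once and then discarded, so memory use does not grow with the number of observations.

```python
import math
import random


def oja_update(w, x, lr):
    """One step of Oja's rule for the leading principal direction.

    w  : current unit-norm estimate of the first eigenvector
    x  : new (centred) observation
    lr : step size (learning rate)
    """
    y = sum(wi * xi for wi, xi in zip(w, x))           # projection of x on w
    w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    norm = math.sqrt(sum(wi * wi for wi in w))         # renormalise for stability
    return [wi / norm for wi in w]


def stream(n, seed=0):
    """Hypothetical centred data stream (an assumption for this sketch).

    Standard deviation 3.0 along (1, 1)/sqrt(2) and 0.5 along (1, -1)/sqrt(2).
    """
    rng = random.Random(seed)
    s = 1.0 / math.sqrt(2.0)
    for _ in range(n):
        a = rng.gauss(0.0, 3.0)
        b = rng.gauss(0.0, 0.5)
        yield [s * (a + b), s * (a - b)]


true_dir = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]

w = [1.0, 0.0]                                         # arbitrary unit-norm start
for t, x in enumerate(stream(5000), start=1):
    w = oja_update(w, x, lr=1.0 / (t + 10))            # decreasing step size (assumed schedule)

# Absolute cosine between the estimate and the true leading direction
alignment = abs(sum(wi * di for wi, di in zip(w, true_dir)))
print("estimated direction:", w)
print("alignment with true direction:", alignment)
```

The decreasing step size is one common choice for stochastic approximation schemes. In practice the schedule typically needs tuning to the scale of the data, and the observations are assumed to be centred beforehand (or centred with a running mean).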