

A machine learning approach to big data regression analysis of real estate prices for inferential and predictive purposes
Authors: Jorge Iván Pérez-Rave, Juan Carlos Correa-Morales, Favián González-Echavarría
Affiliations: 1. Grupo de investigación IDINNOV, IDINNOV S.A.S, Medellín, Colombia; 2. Escuela de Estadística, Universidad Nacional de Colombia, Medellín, Colombia; 3. Departamento de Ingeniería Industrial, Universidad de Antioquia, Medellín, Colombia
Abstract: Hedonic price regressions have mainly been used for inference, whereas machine learning applied to big data has great potential for prediction. To help integrate these two strategies, this article proposes a machine learning approach to the regression analysis of big data, specifically real estate prices, for both inferential and predictive purposes. The methodology incorporates a new variable-selection procedure called 'incremental sample with resampling' (MINREM). It is tested on two cases. The first uses data from web advertisements for used homes for sale in Colombia (61,826 observations). The second uses data (58,888 observations) from a sample of the 2011 Metropolitan American Housing Survey, obtained and prepared by a reference study. The methodology has two stages. The first selects the important variables using MINREM; the second follows the traditional machine learning training and validation procedure, with three added activities. In both test cases, the methodology yields models that are highly parsimonious and stable across different sample sizes, and the resulting regression functions can be used for both inference and prediction. This paper contributes an original methodology for big data regression analysis.
Keywords: Regression analysis; real estate; machine learning; big data; variable selection
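The abstract names MINREM but does not say how variables are scored or how stability is judged. The sketch below is a hypothetical illustration of the general idea of selecting variables over incrementally larger samples with resampling; it is not the authors' procedure. The absolute Pearson correlation screen, the 0.2 threshold, the 0.8 stability frequency, the synthetic data, and the names `pearson` and `incremental_resampling_selection` are all assumptions made for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def incremental_resampling_selection(X, y, sizes, n_resamples=30,
                                     threshold=0.2, min_freq=0.8, seed=0):
    """Hypothetical variable screen over growing bootstrap samples (not MINREM itself).

    X maps variable name -> list of values; y is the target (e.g. price).
    For each sample size in `sizes`, draw `n_resamples` bootstrap samples of
    that size and count how often each variable passes an |r| >= threshold
    screen (an assumed criterion). A variable is kept only if its selection
    frequency is >= min_freq at every sample size.
    """
    rng = random.Random(seed)
    n = len(y)
    names = list(X)
    stable = set(names)
    freq_by_size = {}
    for size in sizes:
        counts = dict.fromkeys(names, 0)
        for _ in range(n_resamples):
            # Bootstrap: draw `size` row indices with replacement.
            idx = [rng.randrange(n) for _ in range(size)]
            ys = [y[i] for i in idx]
            for name in names:
                xs = [X[name][i] for i in idx]
                if abs(pearson(xs, ys)) >= threshold:
                    counts[name] += 1
        freqs = {name: counts[name] / n_resamples for name in names}
        freq_by_size[size] = freqs
        # Keep only variables that are also frequently selected at this size.
        stable &= {name for name, f in freqs.items() if f >= min_freq}
    return sorted(stable), freq_by_size


if __name__ == "__main__":
    # Synthetic (made-up) data: price depends on area and rooms; `noise` is irrelevant.
    gen = random.Random(42)
    n = 2000
    area = [gen.uniform(40, 200) for _ in range(n)]
    rooms = [gen.randint(1, 5) for _ in range(n)]
    noise = [gen.gauss(0, 1) for _ in range(n)]
    price = [1500 * a + 40000 * r + gen.gauss(0, 20000)
             for a, r in zip(area, rooms)]
    selected, freqs = incremental_resampling_selection(
        {"area": area, "rooms": rooms, "noise": noise}, price,
        sizes=[200, 500, 1000])
    print("stable variables:", selected)
    for size, f in freqs.items():
        print(size, f)
```

Requiring a variable to pass the screen at every sample size is one way to express "stable across different sample sizes": a variable that is kept only at one size is more likely to be a sampling artifact. A real application would likely replace the correlation screen with a model-based criterion.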