Descriptive statistics of large data sets by scatter plots, an exploratory approach
Authors:W.J.J. Rey
Affiliation:Philips Research Laboratories, P.O. Box 80.000, 5600 JA Eindhoven, The Netherlands
Abstract:In the analysis of large tables of M variables on N observations, one is interested in the relations between the variables, and it is usual to inspect the M(M-1)/2 scatter plots of N points. The scatter plot approach relies on visual inspection and is to be preferred, in so far as it is applicable, for detecting simple relations, namely when M is small. Other approaches are needed for large values of M.
We consider that only the relatively few scatter plots that present a 'structure' are of interest for an exploratory analysis and, by 'structure', we mean a domain of especially high local density in the plot. Based on this concept, we propose a method built around two steps: the selection of the possibly interesting pairs of variables, and the validation of the corresponding scatter plots. The selection of the pairs results from an algorithm based on a binary partitioning tree. The validation step ensures that only those scatter plots in which a structure is found are produced; the recognition of a structure is derived from a statistic based on the length of the Minimum Spanning Tree constructed on the N points of the candidate scatter plot.
For illustration, we report on an industrial application where the method is routinely applied for exploratory purposes.
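The validation idea in the abstract, a statistic based on the length of the Minimum Spanning Tree of the N points, can be illustrated with a minimal sketch. The abstract does not give the statistic itself, so everything beyond the MST length is an assumption made here for illustration: the function names `mst_length` and `structure_ratio` are hypothetical, and the reference value (the mean MST length after randomly permuting one variable) is a stand-in, not the author's method. A ratio well below 1 suggests that the points are more tightly clustered than they would be if the two variables were unrelated.

```python
import math
import random


def mst_length(points):
    """Total edge length of the Euclidean minimum spanning tree.

    Uses Prim's algorithm on the complete graph, O(N^2) time.
    """
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # shortest distance from each point to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        # Update the distances of the remaining points to the grown tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


def structure_ratio(xs, ys, n_ref=20, seed=0):
    """Hypothetical statistic: observed MST length divided by the mean MST
    length obtained after randomly permuting ys (an assumed reference for
    'no relation'). Both variables are assumed to be on comparable scales.
    """
    rng = random.Random(seed)
    observed = mst_length(list(zip(xs, ys)))
    reference = []
    for _ in range(n_ref):
        shuffled = list(ys)
        rng.shuffle(shuffled)
        reference.append(mst_length(list(zip(xs, shuffled))))
    return observed / (sum(reference) / n_ref)


if __name__ == "__main__":
    xs = [float(i) for i in range(30)]
    ys = list(xs)  # a perfectly linear relation: a very dense 'structure'
    print(structure_ratio(xs, ys))
```

In practice the variables would need to be brought to comparable scales before computing distances, and a threshold on the ratio would have to be chosen; both choices are left open here.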
Keywords:data screening    industrial application    minimum spanning tree    recursive partitioning tree    robust methods    scatter plot