Descriptive statistics of large data sets by scatter plots, an exploratory approach |
| |
Authors: | W.J.J. Rey |
| |
Affiliation: | Philips Research Laboratories, P.O. Box 80.000, 5600 JA Eindhoven, The Netherlands |
| |
Abstract: | In the analysis of large tables of M variables on N observations, one is interested in the relations between the variables, and it is usual to inspect the M(M-1)/2 scatter plots of N points. The scatter plot approach relies on visual inspection and is preferable for detecting simple relations as long as it remains practicable, namely when M is small. Other approaches are needed for large values of M. We consider that only the relatively few scatter plots that present a 'structure' are of interest for an exploratory analysis and, by 'structure', we mean a domain of especially high local density in the plot. Based on this concept, we propose a method built around two steps: the selection of the possibly interesting pairs of variables and the validation of the corresponding scatter plots. The selection of the pairs results from an algorithm based on a binary partitioning tree. The validation step ensures that only those scatter plots in which a structure is found are produced. The recognition of a structure is derived from a statistic based on the length of the Minimum Spanning Tree constructed on the N points of the candidate scatter plot. For illustration, we report on an industrial application where the method is routinely applied for exploratory purposes. |
| |
Keywords: | data screening; industrial application; minimum spanning tree; recursive partitioning tree; robust methods; scatter plot |
|
|
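The validation step described in the abstract relies on the length of the Minimum Spanning Tree built on the N points of a scatter plot. The abstract does not give the exact statistic, so the following is only a minimal sketch of the underlying quantity: the total Euclidean edge length of the tree, computed with Prim's algorithm. The function name `mst_length` and the O(N^2) implementation are assumptions made for illustration, not the author's method.

```python
import math
import random


def mst_length(points):
    """Total Euclidean edge length of the minimum spanning tree.

    Hypothetical helper: plain O(N^2) Prim's algorithm on 2-D points.
    The paper's actual statistic is only said to be *based on* this length.
    """
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from each point to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Add the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        # Update distances of the remaining points to the grown tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.hypot(points[u][0] - points[v][0],
                               points[u][1] - points[v][1])
                if d < best[v]:
                    best[v] = d
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    # Structureless plot: points spread uniformly over the unit square.
    uniform = [(rng.random(), rng.random()) for _ in range(200)]
    # Plot with a 'structure': a domain of high local density.
    dense = [(rng.gauss(0.5, 0.02), rng.gauss(0.5, 0.02)) for _ in range(200)]
    print("uniform:", mst_length(uniform))
    print("dense:  ", mst_length(dense))
```

For the same N, a plot containing a domain of high local density tends to give a shorter tree than a plot with points spread evenly, which is the intuition behind using the tree length to recognise a structure. How the length is normalised or tested in the paper is not stated in the abstract and is left open here.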