

GenD: An Evolutionary System for Resampling in Survey Research
Authors: Cinzia Meraviglia, Giulia Massini, Daria Croce, Massimo Buscema
Institutions: (1) University of Eastern Piedmont, Alessandria, Italy; (2) Semeion Research Centre of Sciences of Communication, Rome, Italy; (3) University of Milan-Bicocca, Milan, Italy
Abstract: This paper is a preliminary research report presenting a method for generating new records with an evolutionary algorithm (close to, but different from, a genetic algorithm). The method, called Pseudo-Inverse Function (P-I Function for short), was designed and implemented at the Semeion Research Centre (Rome). P-I Function generates new ('virtual') data from a small set of observed data. It can help when budget constraints limit the number of interviewees; when a population shows some sociologically interesting trait but its small size seriously affects the reliability of estimates; or when secondary analysis is carried out on small samples. It applies to research designs with one or more dependent variables and a set of independent variables. New cases are estimated by maximizing a fitness function, and the method outputs as many 'virtual' cases as needed, which reproduce the statistical traits of the original population. The algorithm used by P-I Function is the Genetic Doping Algorithm (GenD), also designed and implemented at the Semeion Research Centre; among its features is an innovative crossover procedure that tends to select individuals with average fitness values rather than those with the best values in each 'generation'. A particularly thorough research design was set up: (1) the observed sample is split in half to obtain a training set and a testing set, which are analysed with a back propagation neural network; (2) testing is performed to assess how good the parameter estimates are; (3) a 10% sample is randomly drawn from the training set and used as a reduced training set; (4) on this narrow basis, GenD calculates the pseudo-inverse of the estimated parameter matrix; (5) the 'virtual' data are tested against the testing set (which has never been used for training).
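The data-splitting part of this research design (step 1, the half-split into training and testing sets, and step 3, the random 10% subsample of the training half) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `split_protocol`, the fixed seed, and the use of Python's `random` module are hypothetical choices, and the neural-network training and GenD steps are not shown.

```python
import random

def split_protocol(records, reduced_fraction=0.10, seed=0):
    """Hypothetical helper: half-split the observed sample, then draw a
    random reduced training set from the training half."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    # Step (1): split the shuffled sample into two halves.
    half = len(shuffled) // 2
    training, testing = shuffled[:half], shuffled[half:]
    # Step (3): a random sample (10% by default) of the training set
    # becomes the reduced training set; the testing set is never touched.
    k = max(1, round(reduced_fraction * len(training)))
    reduced_training = rng.sample(training, k)
    return training, testing, reduced_training
```

With 1,000 illustrative records (not the paper's data), this yields 500 training cases, 500 testing cases, and a reduced training set of 50 cases drawn only from the training half, so the testing set stays unseen, as step (5) requires.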
The algorithm was tested on a particularly difficult case: the data set used as the basis for generating 'virtual' cases contains only 44 respondents, randomly sampled from a broader data set taken from the General Social Survey 2002. The main result is that networks trained on the 'virtual' resample show a model fit as good as that of the observed data, although 'virtual' and observed data differ on some features. The results show that GenD 'refills' the joint distribution of the independent variables, conditioned on the dependent variable. This paper is the result of close collaboration among all the authors: Cinzia Meraviglia wrote §§ 1, 3, 4, 6, 7 and 8; Giulia Massini wrote § 5; Daria Croce carried out some of the analyses with neural networks and linear regression; Massimo Buscema wrote § 2.
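The idea of generating 'virtual' records conditioned on a value of the dependent variable can be illustrated with a toy evolutionary search. The sketch below is hypothetical and is not GenD: a fixed linear function `model` stands in for the trained back propagation network, fitness is the negative squared error between the model output and a target value of the dependent variable, and the parent-selection rule (keep the better half, then prefer survivors whose fitness is closest to the population mean) is only one possible reading of a crossover that favours average rather than best individuals. All names and parameters are assumptions.

```python
import random

def model(x):
    # Hypothetical stand-in for a trained network: a fixed linear map
    # from three independent variables to one dependent variable.
    return 0.5 * x[0] - 0.3 * x[1] + 0.8 * x[2]

def fitness(x, target):
    # Higher is better: negative squared error between model output and target.
    return -(model(x) - target) ** 2

def mid_fitness_parents(pop, scores, n):
    # Hypothetical selection rule: keep the better half of the population,
    # then choose the n survivors whose fitness is closest to the population
    # mean, so the very best individuals are not automatically the parents.
    order = sorted(range(len(pop)), key=lambda i: scores[i], reverse=True)
    upper_half = order[: len(pop) // 2]
    mean = sum(scores) / len(scores)
    upper_half.sort(key=lambda i: abs(scores[i] - mean))
    return [pop[i] for i in upper_half[:n]]

def generate_virtual_record(target, dim=3, pop_size=40, generations=200, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    best = max(pop, key=lambda x: fitness(x, target))
    for _ in range(generations):
        scores = [fitness(x, target) for x in pop]
        # Track the best individual seen so far.
        gen_best = pop[max(range(pop_size), key=lambda i: scores[i])]
        if fitness(gen_best, target) > fitness(best, target):
            best = gen_best
        parents = mid_fitness_parents(pop, scores, pop_size // 4)
        children = []
        while len(children) < pop_size - 1:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by a small Gaussian mutation.
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            children.append([g + rng.gauss(0.0, 0.05) for g in child])
        # Elitism: carry the best-so-far individual into the next generation.
        pop = children + [best]
    return best
```

For example, `generate_virtual_record(0.2)` should return a three-element input vector whose `model` output lies close to 0.2. Running it with several seeds gives several distinct records for the same target, a loose analogue of refilling the distribution of the independent variables for a given value of the dependent one.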
Keywords: evolutionary algorithm, resampling, neural networks
This article is indexed in SpringerLink and other databases.