期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Nonuniform DIF Detection using Discriminant Logistic Analysis and Multinomial Logistic Regression: A comparison for polytomous items

M. Dolores Hidalgo Juana Gómez 《Quality and Quantity》2006,40(5):805-823

This study focused on the effectiveness in nonuniform polytomous item DIF detection using Discriminant Logistic Analysis (DLA) and Multinomial Logistic Regression (MLR). A computer simulation study was conducted to compare the effect of using DLA and MLR, applying either an iterative test purification procedure or non-iterative to detect nonuniform DIF. The conditions under study were: DIF effect size (0.5, 1.0 and 1.5), sample size (500 and 1000), percentage of DIF items in the test (0, 10 and 20%) and DIF type (nonuniform). The results suggest that DLA is more accurate than MLR in detecting DIF. However, the purification process only improved the correct detection rate when MLR was applied. The false positive rates for both procedures were similar. Moreover, when the test purification procedure was used, the proportion of non-DIF items that were detected as DIF decreased for both procedures, although the false positive rates were smaller for DLA than for MLR. 相似文献

2.

Influence of equal or unequal comparison group sample sizes on the detection of differential item functioning using the Mantel–Haenszel and logistic regression techniques

Aura-Nidia Herrera Juana Gómez 《Quality and Quantity》2008,42(6):739-755

In recent decades several methods have been developed for detecting differential item functioning (DIF), and many studies have aimed to identify both the conditions under which items may or may not be adequate and the factors which affect their power and Type I error. This paper describes a Monte Carlo experiment that was carried out in order to analyse the effect of reference group sample size, focal group sample size and the interaction of the two on the power and Type I error of the Mantel–Haenszel (MH) and Logistic regression (LR) procedures. The data were generated using a three-parameter logistic model, the design was fully-crossed factorial with 12 experimental conditions arising from the crossing of the two main factors, and the dependent variables were power and the rate of false positives calculated across 100 replications. The results enabled the significant factors to be identified and the two statistics to be compared. Practical recommendations are made regarding use of the procedures by psychologists interested in the development and analysis of psychological tests. 相似文献

3.

The Performance of Some Observed and Unobserved Conditional Invariance Techniques for the Detection of Differential Item Functioning

Jerry?Welkenhuysen-Gybels Email author 《Quality and Quantity》2004,38(6):681-702

The current research investigates the performance of some conditional and unconditional invariance measures of differential item functioning (DIF), namely: the log-linear model, the logistic regression model, the signed and the unsigned area, the SOS1 and the SOS3. A simulation study is used to evaluate their ability to detect uniform as well as nondirectional nonuniform DIF under several test conditions. The factors that were subject to experimental manipulation in the simulation study are the size of the DIF, the ability distribution of the focal and the reference group, the sample size, the proportion of DIF items and the length of the test. 相似文献

4.

Erroneous detection of nonuniform DIF using the Breslow-Day test in a short test

María Ester Aguerri María Silvia Galibert Horacio Félix Attorresi Pedro Prieto Marañón 《Quality and Quantity》2009,43(1):35-44

This paper studies the Type I error rate obtained using the Breslow-Day (BD) test to detect Nonuniform Differential Item Functioning (NUDIF) in a short test when the average ability of one group is significantly higher than that of the other. The performance is compared with the logistic regression (LR) and the standard Mantel-Haenszel procedure (MH). Responses to a 20-item test were simulated without Differential Item Functioning (DIF) according to the three-parameter logistic model. The manipulated factors were sample size and item parameters. The design yielded 40 conditions that were replicated 50 times and the false positive rate at a 5% significance level obtained with the three methods was recorded for each condition. In most cases, BD performed better than LR and MH in terms of proneness to Type I error. With the BD test, the Type I error rate was similar to the nominal one when the item with the highest discrimination and difficulty parameters in the case of equally sized groups was excluded from the goodness-of-fit to the binomial distribution (number of false positives among the fifty replications of a Bernoulli variable with parameter equal to 0.05). 相似文献

5.

A Comparison of χ2, RFA and IRT Based Procedures in the Detection of DIF

Juana Gómez-benito María JosÉ Navas-ara 《Quality and Quantity》2000,34(1):17-31

Bias research began at the end of the 1960s anddeveloped rapidly in the following decades for obvioussocial and political reasons, and due to the importantimpact that this issue has on the field ofpsychological and educational measurement. Since then,several methods have been proposed for the study anddetection of item bias or differential itemfunctioning (DIF). This paper presents a simulationstudy comparing the potential of some of these methodsfor detecting DIF: two IRT-based techniques (area measures), three ²-based procedures (MantelHaenszel, Logit Model and Logistic Regression) and theRestricted Factor Analysis method. The results showthat the technique that appears to do the best job isthe Mantel Haenszel statistic. Moreover, all detectiontechniques tend to overidentify DIF items, that is,some of the items labeled with DIF may in fact bewithout DIF. This tendency is slightly reversed in theLogistic Regression procedure. 相似文献

6.

Item analysis of university-wide multiple choice objective examinations: the experience of a Nigerian private university

Jonathan A. Odukoya Olajide Adekeye Angie O. Igbinoba A. Afolabi 《Quality and Quantity》2018,52(3):983-997

Teachers and Students worldwide often dance to the tune of tests and examinations. Assessments are powerful tools for catalyzing the achievement of educational goals, especially if done rightly. One of the tools for ‘doing it rightly’ is item analysis. The core objectives for this study, therefore, were: ascertaining the item difficulty and distractive indices of the university wide courses. A range of 112–1956 undergraduate students participated in this study. With the use of secondary data, the ex-post facto design was adopted for this project. In virtually all cases, majority of the items (ranging between 65% and 97% of the 70 items fielded in each course) did not meet psychometric standard in terms of difficulty and distractive indices and consequently needed to be moderated or deleted. Considering the importance of these courses, the need to apply item analyses when developing these tests was emphasized. 相似文献

7.

The use of positively and negatively phrased items and the fit of a factor solution

Gerard Maassen 《Quality and Quantity》1991,25(1):91-101

To avoid the response set phenomenon, in handbooks it is suggested to compose item batteries of both positively and negatively formulated items. However, while doing a factor analysis, researchers usually neglect the effect this may have on the fit of the factor solution. In that case, the fit between a plausibly interpretable factor solution and the observed correlations may prove to be very unsatisfactory. In our opinion, however when both positively and negatively formulated items are used, it seems unrealistic to base the factor analysis on the content of the items only; also the difference in phrasing may influence the intercorrelations of the items. In this paper we discuss some possible additional assumptions in consequence of this consideration and we examine to what extent the deriving analysis decisions may improve the fit. 相似文献

8.

Comparison of four common data collection techniques to elicit preferences

Pasquale Anselmi Luigi Fabbris Maria Cristiana Martini Egidio Robusto 《Quality and Quantity》2018,52(3):1227-1239

We compare four common data collection techniques to elicit preferences: the rating of items, the ranking of items, the partitioning of a given amount of points among items, and a reduced form of the technique for comparing items in pairs. University students were randomly assigned a questionnaire employing one of the four techniques. All questionnaires incorporated the same collection of items. The data collected with the four techniques were converted into analogous preference matrices, and analyzed with the Bradley–Terry model. The techniques were evaluated with respect to the fit to the model, the precision and reliability of the item estimates, and the consistency among the produced item sequences. The rating, ranking and budget partitioning techniques performed similarly, whereas the reduced pair comparisons technique performed a little worse. The item sequence produced by the rating technique was very close to the sequence obtained averaging over the three other techniques. 相似文献

9.

Testing structural equation models: the impact of error variances in the data generating process

Randi Hammervold Ulf Henning Olsson 《Quality and Quantity》2012,46(5):1547-1570

Yet another paper on fit measures? To our knowledge, very few papers discuss how fit measures are affected by error variance in the Data Generating Process (DGP). The present paper deals with this. Based upon an extensive simulation study, this paper shows that the effects of increased error variance differ significantly for various fit measures. In addition to error variance the effects depend on sample size and severity of misspecification. The findings confirm the general notion that good fit as measured by the chi-square, RMSEA and GFI etc. does not necessarily mean that the model is correctly specified and reliable. One finding is that the chi square test may give support to misspecified models in situations with a high level of error variance in the DGP, for small sample sizes. Another finding is that the chi-square test looses power also for large sample sizes when the model is negligible misspecified. Other results include incremental fit indices as NFI and RFI which prove to be more informative indicators under these circumstances. At the end of the paper we formulate some guidelines for use of different fit measures. 相似文献

10.

A comparison of multiple imputation with EM algorithm and MCMC method for quality of life missing data

Ting Hsiang Lin 《Quality and Quantity》2010,44(2):277-287

This study investigated the performance of multiple imputations with Expectation-Maximization (EM) algorithm and Monte Carlo Markov chain (MCMC) method in missing data imputation. We compared the accuracy of imputation based on some real data and set up two extreme scenarios and conducted both empirical and simulation studies to examine the effects of missing data rates and number of items used for imputation. In the empirical study, the scenario represented item of highest missing rate from a domain with fewest items. In the simulation study, we selected a domain with most items and the item imputed has lowest missing rate. In the empirical study, the results showed there was no significant difference between EM algorithm and MCMC method for item imputation, and number of items used for imputation has little impact, either. Compared with the actual observed values, the middle responses of 3 and 4 were over-imputed, and the extreme responses of 1, 2 and 5 were under-represented. The similar patterns occurred for domain imputation, and no significant difference between EM algorithm and MCMC method and number of items used for imputation has little impact. In the simulation study, we chose environmental domain to examine the effect of the following variables: EM algorithm and MCMC method, missing data rates, and number of items used for imputation. Again, there was no significant difference between EM algorithm and MCMC method. The accuracy rates did not significantly reduce with increase in the proportions of missing data. Number of items used for imputation has some contribution to accuracy of imputation, but not as much as expected. 相似文献

11.

Mixture cumulative count control chart for mixture geometric process characteristics

Muhammad Younas Majeed Muhammad Aslam Muhammad Riaz 《Quality and Quantity》2013,47(4):2289-2307

A statistical process control chart named the mixture cumulative count control chart (MCCC-chart) is suggested in this study, motivated by an existing control chart named cumulative count control chart (CCC-chart). The MCCC-chart is constructed based on the distribution function of a two component mixture of geometric distributions using the number of items inspected until a defective item is observed ‘n’ as plotting statistics. We have observed that the MCCC-chart has the ability to perform equivalent to the CCC-chart when number of defective items follows geometric distribution and better than the CCC-chart when the number of defective items produced by a process follows a mixture geometric model. The MCCC-chart may be considered as a generalized version of CCC-chart. 相似文献

12.

An alternative procedure for taking a random sample

Hugo C. Hamaker 《Statistica Neerlandica》1975,29(2):63-66

Summary The procedure proposed consists in going through the population to be sampled item by item deciding each time with probability p whether the item at hand shall be incorporated in the sample. The "distances" between successive items in the sample will then form a random sample from a geometric distribution. A series of these random distances can easily be produced on a computer and can be conveniently used for taking the sample required. In some cases this method may have its advantages over the conventional use of a table of random numbers. 相似文献

13.

Using Remote Sensing for Agricultural Statistics 总被引：7，自引：0，他引：7

Elisabetta Carfagna F. Javier Gallego 《Revue internationale de statistique》2005,73(3):389-404

Remote sensing can be a valuable tool for agricultural statistics when area frames or multiple frames are used. At the design level, remote sensing typically helps in the definition of sampling units and the stratification, but can also be exploited to optimise the sample allocation and size of sampling units. At the estimator level, classified satellite images are generally used as auxiliary variables in a regression estimator or for estimators based on confusion matrixes. The most often used satellite images are LANDSAT-TM and SPOT-XS. In general, classified or photo-interpreted images should not be directly used to estimate crop areas because the proportion of pixels classified into the specific crop is often strongly biased. Vegetation indexes computed from satellite images can give in some cases a good indication of the potential crop yield. 相似文献

14.

Quality and quantity in the construction and validation of a psychological test for the assessment and selection of aspiring volunteer rescuers: the action-research in an Italian health association

Riccardo Sartori Andrea Ceschi Serena Cubico Giuseppe Favretto 《Quality and Quantity》2014,48(6):3037-3051

相似文献

15.

Goodness of fit in confirmatory factor analysis: The effects of sample size and model parsimony

Herbert W. Marsh John Balla 《Quality and Quantity》1994,28(2):185-217

The purpose of the present investigation is to examine the influence of sample size (N) and model parsimony on a set of 22 goodness-of-fit indices including those typically used in confirmatory factor analysis and some recently developed indices. For sample data simulated from two known population data structures, values for 6 of 22 fit indices were reasonably independent ofN and were not significantly affected by estimating parameters known to have zero values in the population: two indices based on noncentrality described by McDonald (1989; McDonald and Marsh, 1990), a relative (incremental) index based on noncentrality (Bentler, 1990; McDonald & Marsh, 1990), unbiased estimates of LISREL's GFI and AGFI (Joreskog & Sorbom, 1981) presented by Steiger (1989, 1990) that are based on noncentrality, and the widely known relative index developed by Tucker and Lewis (1973). Penalties for model complexity designed to control for sampling fluctuations and to address the inevitable compromise between goodness of fit and model parsimony were evaluated. 相似文献

16.

Basic Process Capability Indices: An Expository Review

Mohammed Z. Anis 《Revue internationale de statistique》2008,76(3):347-367

A review of the four basic process capability indices has been made. The interrelationship among these indices has been highlighted. Attention has been drawn to their drawbacks. The relation of these indices to the proportion nonconforming has been dwelt upon and the requirement of the adequate sample size has been emphasized. Cautionary remarks on the use of these indices in the case of nonnormal distributions, skewed distributions, and autocorrelated data are also presented. The effect of measurement error on process capability indices has been dealt with in great detail. 相似文献

17.

Bootstrap and Asymptotic Tests of Long-run Relationships in Cointegrated Systems

Stefano Fachin 《Oxford bulletin of economics and statistics》2000,62(4):543-551

Hypothesis testing on cointegrating vectors based on the asymptotic distributions of the test statistics are known to suffer from severe small sample size distortion. In this paper an alternative bootstrap procedure is proposed and evaluated through a Monte Carlo experiment, finding that the Type I errors are close to the nominal signficance levels but power might be not entirely adequate. It is then shown that a combined test based on the outcomes of both the asymptotic and the bootstrap tests will have both correct size and low Type II error, therefore improving the currently available procedures. 相似文献

18.

Hierarchical Models for the Analysis of Likert Scales in Regression and Item Response Analysis

Gerhard Tutz 《Revue internationale de statistique》2021,89(1):18-35

Appropriate modelling of Likert‐type items should account for the scale level and the specific role of the neutral middle category, which is present in most Likert‐type items that are in common use. Powerful hierarchical models that account for both aspects are proposed. To avoid biased estimates, the models separate the neutral category when modelling the effects of explanatory variables on the outcome. The main model that is propagated uses binary response models as building blocks in a hierarchical way. It has the advantage that it can be easily extended to include response style effects and non‐linear smooth effects of explanatory variables. By simple transformation of the data, available software for binary response variables can be used to fit the model. The proposed hierarchical model can be used to investigate the effects of covariates on single Likert‐type items and also for the analysis of a combination of items. For both cases, estimation tools are provided. The usefulness of the approach is illustrated by applying the methodology to a large data set. 相似文献

19.

Factors affecting the variability of IRT equating coefficients

下载免费PDF全文

Michela Battauz 《Statistica Neerlandica》2015,69(2):85-101

Knowing the effect of the factors that can influence the variability of the equating coefficients is an important tool for the development of the linkage plans. This paper explores the effect of various factors on the variability of item response theory equating coefficients. The factors studied are the sample size, the number of common items, the length of the chain, and the possibility of averaging the equating transformations related to different paths that connect the same two forms. Both asymptotic and simulations results are provided. 相似文献

20.

局部平稳性未知条件下STAR模型的线性性检验

张凌翔张晓峒《数量经济技术经济研究》2012,(1):100-117,134

本文讨论了局部随机游走STAR模型、局部随机趋势STAR模型的线性性检验问题,构造了Wald类检验统计量,推导出了这些统计量的极限分布,并分析了这些统计量有限样本下的统计特性;本文提出了在局部平稳性未知的条件下,进行STAR模型的线性性检验方法,构建了稳健的检验统计量。检验功效与检验水平分析表明,该统计量具有良好的检验水平及较高的检验功效。相似文献