1.
In this paper we review statistical methods for analyzing developmental toxicity data. Such data raise a number of challenges. Models that aim to accommodate the complex data-generating mechanism of a developmental toxicity study should account for the litter effect and describe the number of viable fetuses, malformation indicators, fetal weight and their clustering as functions of exposure. Further, the size of the litter may be related to outcomes among the live fetuses. Scientific interest may lie in inference about the dose effect, in the implications of model misspecification, in the assessment of model fit, and in the calculation of derived quantities such as safe exposure limits. We describe the relative merits of conditional, marginal and random-effects models for multivariate clustered binary data and present joint models for both continuous and discrete outcomes.
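To make the contrast between the model classes concrete, here is a minimal sketch for a binary malformation indicator Y_ij (fetus j in litter i) with dose d_i, assuming a logit link and a normal random intercept; the notation is ours, not the paper's:

```latex
% Marginal (population-averaged) model: effects averaged over litters
\operatorname{logit} P(Y_{ij}=1) = \beta_0 + \beta_1 d_i
% Random-effects (conditional) model: effects hold within a litter,
% with a litter-specific random intercept capturing the litter effect
\operatorname{logit} P(Y_{ij}=1 \mid b_i) = \beta_0^{*} + \beta_1^{*} d_i + b_i,
\qquad b_i \sim N(0,\sigma^2)
```

The population-averaged β1 is attenuated relative to the litter-specific β1*, which is why the two model classes answer different scientific questions and the review weighs their merits separately.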
2.
Sy Han Chiou, Chiung‐Yu Huang, Gongjun Xu, Jun Yan 《Revue internationale de statistique》2019, 87(1): 24-43
Panel count data arise in many applications when the event history of a recurrent event process is examined only at a sequence of discrete time points. Despite recent methodological developments, the availability of software implementations has been rather limited. Focusing on a practical setting where the effects of some time‐independent covariates on the recurrent events are of primary interest, we review semiparametric regression modelling approaches for panel count data that have been implemented in the R package spef. The methods fall into two categories depending on whether the examination times are associated with the recurrent event process after conditioning on covariates. The reviewed methods are illustrated with a subset of the data from a skin cancer clinical trial.
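For orientation, the workhorse in this literature (and, we believe, the basis of several of the reviewed methods) is the proportional means model for the counting process N(t) given time-independent covariates X; a sketch in our notation, not quoted from the paper:

```latex
E\{N(t) \mid X\} = \Lambda_0(t)\, \exp(X^{\top}\beta)
```

Here Λ0 is an unspecified baseline mean function estimated nonparametrically, and β carries the covariate effects of primary interest.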
3.
This article discusses modelling strategies for repeated measurements of multiple response variables. Such data arise in the context of categorical variables where one can select more than one of the categories as the response. We consider each of the multiple responses as a binary outcome and use a marginal (or population‐averaged) modelling approach to analyse their means. Generalized estimating equations are used to account for different correlation structures, both over time and between items. We also discuss an alternative approach using a generalized linear mixed model with conditional interpretations. We illustrate the methods using data from the Household, Income and Labour Dynamics in Australia (HILDA) panel survey.
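As a reminder of the machinery involved, the GEE approach solves estimating equations of the following generic form for the marginal mean parameters (standard notation, ours):

```latex
\sum_{i=1}^{n} D_i^{\top} V_i^{-1} \bigl( Y_i - \mu_i(\beta) \bigr) = 0
```

where Y_i stacks the repeated binary items for subject i, μ_i(β) is the marginal mean under a logit link, D_i = ∂μ_i/∂β, and V_i embodies the working correlation structure, here covering correlation both over time and between items.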
4.
This article surveys various strategies for modeling ordered categorical (ordinal) response variables when the data have some type of clustering, extending a similar survey for binary data by Pendergast, Gange, Newton, Lindstrom, Palta & Fisher (1996). An important special case is repeated measurement at several occasions for each subject, as in longitudinal studies. A much greater variety of models and fitting methods is available now than when a similar survey of repeated ordinal response data was prepared a decade ago (Agresti, 1989). The primary emphasis of the review is on two classes of models: marginal models, for which effects are averaged over all clusters at particular levels of predictors, and cluster-specific models, for which effects apply at the cluster level. We present the two types of models in the ordinal context, review the literature for each, and discuss connections between them. We then summarize alternative modeling approaches and ways of estimating parameters, including a Bayesian approach. We also discuss applications and areas likely to be popular in future research, such as ways of handling missing data and ways of modeling agreement and evaluating the accuracy of diagnostic tests. Finally, we review the current availability of software for the methods discussed in this article.
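In the ordinal setting the prototypical marginal model is the proportional-odds cumulative logit model; a minimal sketch (our notation, not the article's):

```latex
\operatorname{logit} P(Y_{it} \le j) = \alpha_j + x_{it}^{\top}\beta,
\qquad j = 1, \dots, c-1
```

with ordered cut-points α1 < ... < α(c-1). The cluster-specific analogue adds a random effect b_i inside the linear predictor, with the usual attenuation relation between the two sets of effects.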
5.
Since the work of Little and Rubin (1987), no substantial advances have been achieved in the analysis of explanatory regression models for data that are missing not at random, mainly owing to the difficulty of verifying the randomness of the unknown missing data. In practice, nonrandom missing data are analysed with techniques designed for data sets with missing-at-random or missing-completely-at-random data, such as complete-case analysis, mean imputation, regression imputation, maximum likelihood or multiple imputation. However, the data conditions required to minimize the bias arising from such an incorrect analysis have not been fully determined. In the present work, several Monte Carlo simulations were carried out to establish the best strategy, among those designed for random missingness, for analysing data sets with nonrandom missing data. The factors involved in the simulations are sample size, percentage of missing data, predictive power of the imputation model and the existence of interaction between predictors. The results show that the smallest bias is obtained with maximum likelihood and multiple imputation, although with low percentages of missing data, absence of interaction and high predictive power of the imputation model (frequent data structures in research on child and adolescent psychopathology) acceptable results are also obtained with the simpler regression imputation.
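A compressed illustration of this kind of Monte Carlo design; this is our own sketch, not the authors' code, and the MNAR mechanism, effect sizes and the pair of estimators compared are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 500, 1000
est_cc, est_reg = [], []

for _ in range(reps):
    x = rng.normal(size=n)
    y = 0.8 * x + rng.normal(scale=0.6, size=n)   # true mean of y is 0
    # MNAR: larger (unobserved) y-values are more likely to be missing
    p_miss = 1 / (1 + np.exp(-(y - 0.5)))
    obs = rng.uniform(size=n) > p_miss
    est_cc.append(y[obs].mean())                  # complete-case mean (biased)
    # regression imputation: fit y ~ x on observed cases, impute the rest
    b1, b0 = np.polyfit(x[obs], y[obs], 1)
    y_imp = np.where(obs, y, b0 + b1 * x)
    est_reg.append(y_imp.mean())                  # x is highly predictive,
                                                  # so part of the bias is recovered

print(f"complete-case bias:         {np.mean(est_cc):+.3f}")
print(f"regression-imputation bias: {np.mean(est_reg):+.3f}")
```

With a highly predictive imputation model, the imputed estimator sits visibly closer to the truth than the complete-case estimator, in line with the pattern the abstract reports.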
6.
A local maximum likelihood estimator based on Poisson regression is presented, together with its bias, variance and asymptotic distribution. This semiparametric estimator is intended as an alternative to the Poisson, negative binomial and zero-inflated Poisson regression models that does not depend on regularity conditions or on the accuracy of the model specification. Some simulation results are presented, and the use of the local maximum likelihood procedure is illustrated on an example from the literature. The procedure is found to perform well.
This research was partially supported by the Calouste Gulbenkian Foundation and PRODEP III.
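A minimal sketch of what a local (kernel-weighted) Poisson maximum likelihood fit looks like; this illustrates the generic technique rather than the authors' estimator, and the local-constant form, Gaussian kernel and bandwidth are our assumptions:

```python
import numpy as np

def local_poisson_mean(x0, x, y, h):
    """Local-constant Poisson ML estimate of E[Y | X = x0].

    Maximizing the kernel-weighted Poisson log-likelihood
        sum_i K_h(x_i - x0) * (y_i * theta - exp(theta))
    over theta has the closed-form solution
        exp(theta_hat) = sum_i w_i y_i / sum_i w_i,
    i.e. a kernel-weighted average of the counts.
    """
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)   # Gaussian kernel weights
    return np.sum(w * y) / np.sum(w)

# toy data: counts whose mean varies smoothly with x
rng = np.random.default_rng(0)
x = rng.uniform(0, 4, size=400)
y = rng.poisson(np.exp(0.5 * x))
grid = np.linspace(0.5, 3.5, 7)
print([round(local_poisson_mean(g, x, y, h=0.3), 2) for g in grid])
print([round(float(np.exp(0.5 * g)), 2) for g in grid])  # true means
```

No parametric count model is imposed globally, which is the sense in which such estimators sidestep model specification accuracy.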
7.
This paper discusses estimation and testing of the intensity of a Poisson process when some of the sample data are missing. The intensity is estimated by maximum likelihood, the method of moments and least squares: an iterative scheme is obtained for the maximum likelihood estimate, and closed-form moment and least squares estimates are derived. The moment and least squares estimates are shown to be unbiased and consistent, and the limiting distributions of the corresponding statistics are derived. Finally, a hypothesis test for the difference between two Poisson processes is given, together with an asymptotic confidence interval.
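For reference, the complete-data baseline these estimators perturb is the standard result (not from the paper) that for a homogeneous Poisson process observed over a window of total length T the maximum likelihood estimator of the intensity is

```latex
\hat{\lambda}_{\mathrm{ML}} = \frac{N(T)}{T}
```

where N(T) is the number of observed events; it is the missing stretches of data that prevent a closed form and motivate the iterative scheme described above.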
8.
A Poisson model is typically assumed for count data, but in many applications the dependent variable contains a large proportion of zeros, so its mean no longer equals its variance and the Poisson model is no longer suitable. We therefore suggest a hurdle generalized Poisson regression model. Furthermore, in such applications the response is often right-censored at some value because a few observations are very large. This paper introduces a censored hurdle generalized Poisson regression model for count data with many zeros. Estimation of the regression parameters by maximum likelihood is discussed, and the goodness of fit of the regression model is examined. An example and a simulation illustrate the effects of right censoring on the parameter estimates and their standard errors.
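The hurdle structure referred to separates the zeros from the positive counts; a sketch of the generic two-part form in our notation (the paper's exact generalized Poisson parametrization may differ in detail):

```latex
P(Y_i = 0) = \pi_i, \qquad
P(Y_i = y) = (1 - \pi_i)\,
\frac{f_{\mathrm{GP}}(y;\, \mu_i, \alpha)}{1 - f_{\mathrm{GP}}(0;\, \mu_i, \alpha)},
\quad y = 1, 2, \dots
```

Here π_i models the hurdle (zero) part, f_GP is a generalized Poisson probability mass function whose dispersion parameter α accommodates over- or under-dispersion, and right censoring replaces the exact probability by a tail probability in the likelihood contribution of a censored observation.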
9.
Estimation in the interval censoring model is considered. A class of smooth functionals is introduced, of which the mean is an example. The asymptotic information lower bound for such functionals can be represented as an inner product of two functions. In case 1, i.e. with one observation time per unobservable event time, both functions can be given explicitly. We mainly consider case 2, with two observation times for each unobservable event time, in the situation that the observation times cannot become arbitrarily close to each other. For case 2, one of the functions in the inner product can only be given implicitly, as the solution to a Fredholm integral equation. We study properties of this solution and, in a sequel to this paper, prove that the nonparametric maximum likelihood estimator of the functional asymptotically reaches the information lower bound.
10.
A broad class of generalized linear mixed models, e.g. variance components models for binary data, percentages or count data, is introduced by incorporating additional random effects into the linear predictor of a generalized linear model structure. Parameters are estimated by a combination of quasi-likelihood and iterated MINQUE (minimum norm quadratic unbiased estimation), the latter being numerically equivalent to REML (restricted, or residual, maximum likelihood). First, conditional upon the additional random effects, observations on a working variable and associated weights are derived by quasi-likelihood, using iteratively re-weighted least squares. Second, a linear mixed model is fitted to the working variable, employing the weights for the residual error terms, by iterated MINQUE. The latter may be regarded as a least squares procedure applied to squared and product terms of error contrasts derived from the working variable. No full distributional assumptions are needed for estimation, and the model may be fitted with standard software for weighted regression and REML.
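To fix ideas, the two-step scheme alternates a quasi-likelihood step with a mixed-model step. In generic IRLS notation (ours), with linear predictor η = Xβ + Zu and link g(μ) = η, the working variable and weights are

```latex
\zeta = \hat{\eta} + (y - \hat{\mu})\,\frac{d\eta}{d\mu},
\qquad
W^{-1} = \left(\frac{d\eta}{d\mu}\right)^{2} \phi\, V(\hat{\mu})
```

The linear mixed model ζ = Xβ + Zu + e, with residual weights W, is then fitted by iterated MINQUE/REML, and the two steps are cycled to convergence.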
11.
Pooling of data is often carried out to protect privacy or to save cost, with the claimed advantage that it does not lead to much loss of efficiency. We argue that this does not give the complete picture, as the estimation of different parameters is affected to different degrees by pooling. We establish a ladder of efficiency loss for estimating the mean, variance, skewness and kurtosis, and more generally multivariate joint cumulants, in powers of the pool size. The asymptotic efficiency of the pooled-data non‐parametric/parametric maximum likelihood estimator relative to the corresponding unpooled-data estimator is reduced by a factor equal to the pool size whenever the order of the cumulant to be estimated increases by one. The implications of this result are demonstrated in case–control genetic association studies with interactions between genes. Our findings provide a guideline for the discriminating use of data pooling in practice and for the assessment of its relative efficiency. As exact maximum likelihood estimates are difficult to obtain when the pool size is large, we briefly address how to obtain computationally efficient estimates from pooled data and suggest Gaussian estimation and non‐parametric maximum likelihood as two feasible methods.
12.
We study the generalized bootstrap technique under general sampling designs. We focus mainly on bootstrap variance estimation but we also investigate the empirical properties of bootstrap confidence intervals obtained using the percentile method. Generalized bootstrap consists of randomly generating bootstrap weights so that the first two (or more) design moments of the sampling error are tracked by the corresponding bootstrap moments. Most bootstrap methods in the literature can be viewed as special cases. We discuss issues such as the choice of the distribution used to generate bootstrap weights, the choice of the number of bootstrap replicates, and the potential occurrence of negative bootstrap weights. We first describe the generalized bootstrap for the linear Horvitz‐Thompson estimator and then consider non‐linear estimators such as those defined through estimating equations. We also develop two ways of bootstrapping the generalized regression estimator of a population total. We study in greater depth the case of Poisson sampling, which is often used to select samples in Price Index surveys conducted by national statistical agencies around the world. For Poisson sampling, we consider a pseudo‐population approach and show that the resulting bootstrap weights capture the first three design moments of the sampling error. A simulation study and an example with real survey data are used to illustrate the theory.
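Concretely, for the Horvitz–Thompson estimator of a total the generalized bootstrap perturbs the design weights; a sketch in standard survey notation (ours, not quoted from the paper):

```latex
\hat{Y} = \sum_{i \in s} \frac{y_i}{\pi_i},
\qquad
\hat{Y}^{*} = \sum_{i \in s} a_i \frac{y_i}{\pi_i},
\quad E_{*}(a_i) = 1
```

where the random bootstrap adjustments a_i are generated so that the low-order bootstrap moments of Ŷ* − Ŷ match the corresponding design moments of the sampling error Ŷ − Y.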
13.
In this paper, we introduce a new algorithm for estimating non-negative parameters from Poisson observations of a linear transformation of the parameters. The proposed objective function fits both a weighted least squares (WLS) and a minimum χ2 estimation framework, and results in a convex optimization problem. Unlike conventional WLS methods, the weights need not be estimated from the data but are incorporated into the objective function. The iterative algorithm is derived from an alternating projection procedure in which "distance" is measured by the chi-squared test statistic, interpreted as a measure of the discrepancy between two distributions; this may be viewed as an alternative to the Kullback–Leibler divergence, which corresponds to maximum likelihood (ML) estimation. The algorithm is similar in form to, and shares many properties with, the expectation maximization algorithm for ML estimation. In particular, we show that every limit point of the algorithm is an estimator and that the sequence of means, projected by the linear transformation into the data space, converges. Despite the similarities, we show that the new estimators are quite distinct from ML estimators, and we obtain conditions under which the two are identical.
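A natural reading of the objective described, with Poisson data y of mean Ax and a Pearson-type discrepancy, is the following (our reconstruction, not the paper's exact display):

```latex
\min_{x \ge 0} \; \sum_{i} \frac{\bigl( y_i - (Ax)_i \bigr)^{2}}{(Ax)_i}
```

Because the denominator involves (Ax)_i rather than y_i, the weights travel with the estimate instead of being estimated from the data; each term is convex in (Ax)_i, which is consistent with the convexity the abstract claims.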
14.
Estimation in the interval censoring model is considered. A class of smooth functionals is introduced, of which the mean is an example. We consider case 2, with two observation times for each unobservable event time, in the situation that the observation times cannot become arbitrarily close to each other. It is proved that the nonparametric maximum likelihood estimator of the functional asymptotically reaches the information lower bound.
15.
Estimation with a longitudinal response Y subject to nonignorable dropout is considered when the joint distribution of Y and the covariate X is nonparametric and the dropout propensity conditional on (Y, X) is parametric. We apply the generalised method of moments to estimate the parameters in the nonignorable dropout propensity, based on estimating equations constructed using an instrument Z: a component of X that is related to Y but unrelated to the dropout propensity given Y and the other covariates. Population means and other parameters in the nonparametric distribution of Y can then be estimated by inverse propensity weighting with the estimated propensity. To improve efficiency, we derive a model‐assisted regression estimator that makes use of the information provided by the covariates and previously observed Y‐values in the longitudinal setting. The model‐assisted regression estimator is protected against model misspecification and is asymptotically normal and more efficient when the working models are correct and certain other conditions are satisfied. The finite‐sample performance of the estimators is studied through simulation, and an application to the HIV‐CD4 data set is presented as an illustration.
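The inverse propensity weighting step takes the familiar form below; a generic sketch in our notation, written for a single measurement time and ignoring the longitudinal structure, with δ_i the response indicator and π̂ the estimated nonignorable propensity:

```latex
\hat{\mu}_{\mathrm{IPW}} = \frac{1}{n} \sum_{i=1}^{n}
\frac{\delta_i\, y_i}{\hat{\pi}(y_i, x_i;\, \hat{\varphi})}
```

Nonignorability means π may depend on y_i itself; the instrument Z is excluded from π, and it is this exclusion restriction that identifies φ through the GMM estimating equations.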
16.
17.
Comparing occurrence rates of events of interest in science, business and medicine is an important topic. Because count data are often under‐reported, we wish to account for this error in the response when constructing interval estimators. In this article, we derive a Bayesian interval for the difference of two Poisson rates when counts are potentially under‐reported. The under‐reporting causes a lack of identifiability, so we use informative priors to construct a credible interval for the difference of the two Poisson rate parameters. We demonstrate the efficacy of the new interval estimates using a real data example, investigate the performance of the approach via simulation, and examine the impact of various informative priors on the interval.
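The identifiability issue can be seen from the generic under-reported Poisson model; a sketch in our notation, not the authors' exact parametrization:

```latex
Y_i \mid \lambda_i, p_i \;\sim\; \mathrm{Poisson}(p_i\, \lambda_i\, t_i),
\qquad i = 1, 2
```

where λ_i is the true event rate, t_i the exposure, and p_i the reporting probability. The likelihood depends on (λ_i, p_i) only through the product p_i λ_i, so without informative priors on the reporting probabilities (or rates) no amount of data separates the two, and a credible interval for λ1 − λ2 is meaningful only with such prior input.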
18.
In this paper we investigate two-sample U-statistics in the case of clusters of repeated measurements observed on individuals from independent populations. The observations on the i-th individual in the first population are denoted by $X_{ij}$, $1 \le j \le n_i$, $1 \le i \le m$, and those on the k-th individual in the second population are denoted by $Y_{kl}$, $1 \le l \le m_k$, $1 \le k \le n$. Given the kernel $\varphi(x, y)$, we define the generalized two-sample U-statistic by
$$U_{m,n} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{k=1}^{n} \frac{1}{n_i m_k} \sum_{j=1}^{n_i} \sum_{l=1}^{m_k} \varphi(X_{ij}, Y_{kl}).$$
We derive the asymptotic distribution of $U_{m,n}$ for large sample sizes. As an application we study the generalized Mann–Whitney–Wilcoxon rank sum test for clustered data.
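A direct implementation of the statistic as reconstructed above; this is our sketch, and the symbols X_{ij}, Y_{kl} and the per-cluster averaging are our reconstruction of the garbled display rather than the paper's verbatim definition:

```python
import numpy as np

def clustered_two_sample_U(X, Y, kernel=lambda x, y: float(x < y)):
    """Generalized two-sample U-statistic for clustered data.

    X, Y: lists of 1-D arrays, one array per individual (cluster).
    The default kernel 1{x < y} gives a clustered
    Mann-Whitney-Wilcoxon-type statistic.
    """
    m, n = len(X), len(Y)
    total = 0.0
    for xi in X:                      # cluster i in the first sample
        for yk in Y:                  # cluster k in the second sample
            # average the kernel over all cross-cluster observation pairs
            pair_mean = np.mean([kernel(x, y) for x in xi for y in yk])
            total += pair_mean
    return total / (m * n)

# toy example: second population stochastically larger
rng = np.random.default_rng(1)
X = [rng.normal(0.0, 1.0, size=rng.integers(2, 6)) for _ in range(30)]
Y = [rng.normal(0.5, 1.0, size=rng.integers(2, 6)) for _ in range(40)]
print(round(clustered_two_sample_U(X, Y), 3))   # exceeds 0.5 under the shift
```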
19.
Julia Plass, Marco E.G.V. Cattaneo, Thomas Augustin, Georg Schollmeyer, Christian Heumann 《Revue internationale de statistique》2019, 87(3): 580-603
In most surveys, one is confronted with missing or, more generally, coarse data. Traditional methods for dealing with such data require strong, untestable and often doubtful assumptions, for example, coarsening at random. Because of the resulting, potentially severe bias, there is growing interest in approaches that include only tenable knowledge about the coarsening process, leading to imprecise but reliable results. In this spirit, we study regression analysis with a coarse categorical dependent variable and precisely observed categorical covariates. Our (profile) likelihood‐based approach can incorporate weak knowledge about the coarsening process and thus offers a synthesis of traditional methods and cautious strategies that refrain from any coarsening assumptions. This also allows a discussion of the uncertainty about the coarsening process, alongside sampling uncertainty and model uncertainty. Our procedure is illustrated with data from the panel study ‘Labour Market and Social Security’ conducted by the Institute for Employment Research, whose questionnaire design produces coarse data.
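The likelihood such a profile approach works with has the generic coarse-data form; a sketch in our notation (the paper's cautious version avoids pinning down the coarsening probabilities q):

```latex
L(\theta, q) \;=\; \prod_{i=1}^{n} \;\sum_{y \in \mathcal{Y}_i}
p_{\theta}(y \mid x_i)\; q(\mathcal{Y}_i \mid y, x_i)
```

where 𝒴_i is the (possibly coarse) set reported for unit i. Coarsening at random amounts to assuming q(𝒴_i | y, x_i) does not depend on which y ∈ 𝒴_i is true, whereas letting only weak, tenable restrictions on q into the analysis bounds the profile likelihood for θ, yielding the imprecise but reliable results the abstract describes.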
20.
Vanessa Didelez 《Statistica Neerlandica》2002, 56(3): 330-345
ML estimation of regression parameters with incomplete covariate information usually requires a distributional assumption about the covariates concerned, which introduces a potential source of misspecification. Semiparametric procedures avoid such assumptions at the expense of efficiency. In this paper a simulation study with small sample sizes is carried out to assess the performance of the ML estimator under misspecification and to compare it with the semiparametric procedures when the former is based on a correct assumption. The results show only a small gain from correct parametric assumptions, which does not justify the potentially large bias when the assumptions are not met. Additionally, a simple modification of the complete-case estimator appears to be nearly semiparametric efficient.