Journal of the Royal Statistical Society. Series C: Applied Statistics
Featured scientific publications
* Data are for reference only
A parametric model is developed and fitted to English league and cup football data from 1992 to 1995. The model is motivated by an aim to exploit potential inefficiencies in the association football betting market, and this is examined using bookmakers’ odds from 1995 to 1996. The technique is based on a Poisson regression model but is complicated by the data structure and the dynamic nature of teams’ performances. Maximum likelihood estimates are shown to be computationally obtainable, and the model is shown to have a positive return when used as the basis of a betting strategy.
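As a rough illustration of how a Poisson scoring model can feed a betting decision, the sketch below treats home and away goals as independent Poisson counts and sums the score grid into win, draw and loss probabilities. The attack, defence and home-advantage values are hypothetical, and the independence assumption is a simplification; the abstract says the fitted model is more complicated than this.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def outcome_probs(lam_home, lam_away, max_goals=15):
    """Sum an independent-Poisson score grid into (home win, draw, away win)."""
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        px = poisson_pmf(x, lam_home)
        for y in range(max_goals + 1):
            p = px * poisson_pmf(y, lam_away)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    return home, draw, away

def expected_return(prob, decimal_odds):
    """Expected profit per unit stake at the given decimal odds."""
    return prob * decimal_odds - 1.0

# Hypothetical team parameters, chosen only for illustration.
attack_home, defence_home = 1.3, 0.9
attack_away, defence_away = 1.0, 1.1
home_advantage = 1.3

lam_home = attack_home * defence_away * home_advantage  # expected home goals
lam_away = attack_away * defence_home                   # expected away goals
probs = outcome_probs(lam_home, lam_away)
```

A bet would only be considered when `expected_return` is positive for the model probability and the bookmaker's quoted odds; in practice the parameters would come from maximum likelihood on past results rather than being set by hand.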
We present a survival analysis of Soay sheep mark-recapture-recovery data. Unlike previous conditional analyses, it is not necessary to assume equality of recovery and recapture probabilities; instead these are estimated by maximum likelihood. Male and female sheep are treated separately, with the higher numbers and survival probabilities of the females resulting in a more complex model than that used for the males. In both cases, however, age and time aspects need to be included and there is a strong indication of a reduction in survival for sheep aged 7 years or more. Time variation in survival is related to the size of the population and selected weather variables, by using logistic regression. The size of the population significantly affects the survival probabilities of male and female lambs, and of female sheep aged 7 or more years. March rainfall and a measure of the North Atlantic oscillation are found to influence survival significantly for all age groups considered, for both males and females. Either of these weather variables can be used in a model. Several phenotypic and genotypic individual covariates are also fitted. The only covariate which is found to influence survival significantly is the type of horn of first-year female sheep. There is a substantial variation in the recovery probabilities over time, reflecting in part the increased effort when a population crash was expected. The goodness of fit of the model is checked by using graphical procedures.
A two-phase design has been widely used in epidemiological studies of dementia. The first phase assesses a large sample with screening tests. The second, based on the screening test results and possibly on other observed patient factors, selects a subset of the study sample for a more definitive disease verification assessment. In comparing the accuracies of two screening tests in a two-phase study of dementia, inferences are commonly made from a sample of verified cases. The omission of non-verified cases can seriously bias comparison results. To correct for this bias, we derive the maximum likelihood (ML) estimators for the accuracies of two screening tests and their corresponding correlation. The p-values and confidence intervals are computed using the asymptotic normality of the ML estimators. Our method is used to compare the accuracies of two screening tests in a two-phase epidemiological study of dementia. We found that, although the sensitivities of the new and standard screening tests in detecting a diseased subject are not different, the new screening test performs better in detecting a non-diseased subject.
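To see why dropping non-verified cases biases accuracy estimates, the following sketch applies the standard single-test correction (often attributed to Begg and Greenes), which assumes that verification depends only on the screening result. It is a simpler stand-in for illustration, not the two-test estimator with correlation derived in the paper, and all counts are hypothetical.

```python
def naive_accuracy(ver_pos_dis, ver_pos_nondis, ver_neg_dis, ver_neg_nondis):
    """Sensitivity and specificity computed from verified subjects only."""
    sens = ver_pos_dis / (ver_pos_dis + ver_neg_dis)
    spec = ver_neg_nondis / (ver_pos_nondis + ver_neg_nondis)
    return sens, spec

def corrected_accuracy(n_pos, n_neg,
                       ver_pos_dis, ver_pos_nondis,
                       ver_neg_dis, ver_neg_nondis):
    """Correct for verification bias, assuming verification depends only on
    the screening result (n_pos and n_neg count all screened subjects)."""
    # P(disease | test result), estimated from the verified subsample.
    p_dis_pos = ver_pos_dis / (ver_pos_dis + ver_pos_nondis)
    p_dis_neg = ver_neg_dis / (ver_neg_dis + ver_neg_nondis)
    # P(test result), estimated from everyone screened in phase one.
    p_pos = n_pos / (n_pos + n_neg)
    p_neg = 1.0 - p_pos
    # Bayes' theorem gives P(test result | disease status).
    sens = p_dis_pos * p_pos / (p_dis_pos * p_pos + p_dis_neg * p_neg)
    spec = ((1 - p_dis_neg) * p_neg
            / ((1 - p_dis_pos) * p_pos + (1 - p_dis_neg) * p_neg))
    return sens, spec

# Hypothetical study: 1000 screened, 200 positive (all verified),
# 800 negative (only 100 verified).
naive = naive_accuracy(80, 120, 5, 95)
corrected = corrected_accuracy(200, 800, 80, 120, 5, 95)
```

With these made-up counts the naive sensitivity is far higher than the corrected one, because screen-negative subjects, who contain most of the missed cases, are under-represented among the verified.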
A modelling approach to optimize a multiresponse system is presented. The approach aims to identify the setting of the input variables to maximize the degree of overall satisfaction with respect to all the responses. An exponential desirability functional form is suggested to simplify the desirability function assessment process. The approach proposed does not require any assumptions regarding the form or degree of the estimated response models and is robust to the potential dependences between response variables. It also takes into consideration the difference in the predictive ability as well as relative priority among the response variables. Properties of the approach are revealed via two real examples—one classical example taken from the literature and another that the authors have encountered in the steel industry.
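The abstract does not give the functional form, so the sketch below assumes one common exponential desirability function, in which `z` is the standardized distance of a predicted response from its target and a single shape parameter `t` controls curvature. Combining responses by taking the minimum desirability is likewise an assumption made here for illustration, and every name and number is hypothetical.

```python
import math

def desirability(z, t):
    """Assumed exponential desirability: 1 at the target (z = 0), 0 at the
    limits (|z| = 1). t = 0 gives a linear form; other t values bend it."""
    a = abs(z)
    if t == 0:
        d = 1.0 - a
    else:
        d = (math.exp(t) - math.exp(t * a)) / (math.exp(t) - 1.0)
    return max(0.0, d)  # responses beyond the limits get zero desirability

def overall_satisfaction(zs, ts):
    """Assumed aggregation: the least satisfied response sets the score."""
    return min(desirability(z, t) for z, t in zip(zs, ts))

def best_setting(candidates, predict, ts):
    """Pick the candidate input setting with the highest overall satisfaction.
    `predict` maps a setting to standardized responses (hypothetical)."""
    return max(candidates, key=lambda x: overall_satisfaction(predict(x), ts))

# Hypothetical one-input example with two competing responses.
settings = [i / 10 for i in range(11)]
predict = lambda x: (x - 0.3, 0.8 - x)   # standardized distances from target
chosen = best_setting(settings, predict, ts=(0.0, 0.0))
```

Under the minimum rule, the chosen setting balances the two responses rather than favouring either; different shape parameters per response would be one way to express their relative priority.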
A general class of statistical models for a univariate response variable is presented which we call the generalized additive model for location, scale and shape (GAMLSS). The model assumes independent observations of the response variable y given the parameters, the explanatory variables and the values of the random effects. The distribution for the response variable in the GAMLSS can be selected from a very general family of distributions including highly skew or kurtotic continuous and discrete distributions. The systematic part of the model is expanded to allow modelling not only of the mean (or location) but also of the other parameters of the distribution of y, as parametric and/or additive nonparametric (smooth) functions of explanatory variables and/or random-effects terms. Maximum (penalized) likelihood estimation is used to fit the (non)parametric models. A Newton–Raphson or Fisher scoring algorithm is used to maximize the (penalized) likelihood. The additive terms in the model are fitted by using a backfitting algorithm. Censored data are easily incorporated into the framework. Five data sets from different fields of application are analysed to emphasize the generality of the GAMLSS class of models.
Treatment response heterogeneity poses serious challenges for selecting treatment for many diseases. To understand this heterogeneity better and to help in determining the best patient-specific treatments for a given disease, many clinical trials are collecting large amounts of patient level data before administering treatment in the hope that some of these data can be used to identify moderators of treatment effect. These data can range from simple scalar values to complex functional data such as curves or images. Combining these various types of baseline data to discover ‘biosignatures’ of treatment response is crucial for advancing precision medicine. Motivated by the problem of selecting optimal treatment for subjects with depression based on clinical and neuroimaging data, we present an approach that both identifies covariates associated with differential treatment effect and estimates a treatment decision rule based on these covariates. We focus on settings where there is a potentially large collection of candidate biomarkers consisting of both scalar and functional data. The validity of the approach proposed is justified via extensive simulation experiments and illustrated by using data from a placebo-controlled clinical trial investigating antidepressant treatment response in subjects with depression.