Statistical Methods in Medical Research
Featured scientific publications
* Data provided for reference only
Marginal structural models are causal models designed to adjust for time-dependent confounders in observational studies with dynamically adjusted treatments. They are robust tools for assessing causality in complex longitudinal data. In this paper, a marginal structural model is proposed with an innovative dose-delay joint-exposure model for Inverse-Probability-of-Treatment Weighted estimation of the causal effect of alterations to the therapy intensity. The model is motivated by a precise clinical question concerning the possibility of reducing dosages in a regimen, and it is applied to data from a randomised trial of chemotherapy in osteosarcoma, an aggressive primary bone tumour. Chemotherapy data are complex because their longitudinal nature encompasses many clinical details, such as the composition and organisation of multi-drug regimens and dynamic therapy adjustments. This manuscript focuses on the dynamic clinical process of adjusting the therapy according to the patient's toxicity history, and on the causal effect of such therapy modifications on the outcome of interest. Depending on a patient's toxicity levels, physicians may vary the therapy intensity by allocating either a reduction or a delay of the next planned dose. Thus, a negative feedback loop exists between exposure to cytotoxic agents and toxicity levels, with toxicity acting as a time-dependent confounder. The construction of the model is illustrated, highlighting the high complexity and entanglement of chemotherapy data. Although built to address dosage reductions, the model also shows that delays in therapy administration should be avoided. The latter finding makes sense from the cytological point of view, but it is seldom addressed in the literature.
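To illustrate the inverse-probability-of-treatment weighting idea sketched in the abstract above, the following minimal Python example fits a simple marginal structural model on simulated longitudinal data. It is a sketch under stated assumptions, not the authors' dose-delay joint-exposure model; the variables `toxicity`, `treat` (dose reduction) and `outcome`, and the simple stabilized weights, are hypothetical choices made for illustration only.

```python
# Minimal sketch of inverse-probability-of-treatment weighting (IPTW) for a
# simple marginal structural model; all data and variable names are simulated
# and hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)
n, n_visits = 500, 3

toxicity = rng.normal(size=(n, n_visits))      # time-dependent confounder
treat = np.zeros((n, n_visits))                # 1 = dose reduction at this visit
for t in range(n_visits):
    p_treat = 1 / (1 + np.exp(-(-0.5 + 1.0 * toxicity[:, t])))
    treat[:, t] = rng.binomial(1, p_treat)
# Toy outcome: depends on cumulative exposure and on the confounder
outcome = 1.0 * treat.sum(axis=1) - 0.5 * toxicity.sum(axis=1) + rng.normal(size=n)

# Stabilized weights: product over visits of P(A_t = a_t) / P(A_t = a_t | toxicity_t)
sw = np.ones(n)
for t in range(n_visits):
    denom_model = LogisticRegression().fit(toxicity[:, [t]], treat[:, t])
    p_denom = denom_model.predict_proba(toxicity[:, [t]])[:, 1]
    p_num = treat[:, t].mean()                 # marginal treatment probability
    a = treat[:, t]
    sw *= np.where(a == 1, p_num / p_denom, (1 - p_num) / (1 - p_denom))

# Weighted outcome regression on the cumulative number of dose reductions
msm = LinearRegression().fit(treat.sum(axis=1).reshape(-1, 1), outcome, sample_weight=sw)
print("IPTW-estimated effect per dose reduction:", round(msm.coef_[0], 3))
```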
Binary logistic regression is one of the most frequently applied statistical approaches for developing clinical prediction models. Developers of such models often rely on an Events Per Variable criterion (EPV), notably EPV ≥ 10, to determine the minimal sample size required and the maximum number of candidate predictors that can be examined. We present an extensive simulation study examining the influence of EPV, events fraction, number of candidate predictors, the correlations and distributions of candidate predictor variables, area under the ROC curve, and predictor effects on the out-of-sample predictive performance of prediction models. The out-of-sample performance (calibration, discrimination and probability prediction error) of the developed prediction models was studied before and after regression shrinkage and variable selection. The results indicate that EPV does not have a strong relation with metrics of predictive performance and is not an appropriate criterion for (binary) prediction model development studies. We show that out-of-sample predictive performance is better approximated by considering the number of predictors, the total sample size and the events fraction. We propose that the development of new sample size criteria for prediction models be based on these three parameters, and we provide suggestions for improving sample size determination.
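The kind of simulation described in this abstract can be sketched in a few lines: generate development data with a given sample size, events fraction and number of candidate predictors, fit a logistic model, and score it on a large validation sample. The settings, effect sizes and performance metric below are illustrative assumptions, not those of the original study.

```python
# Illustrative simulation: out-of-sample discrimination (c-statistic) of a
# logistic prediction model as a function of development sample size, number of
# candidate predictors and events fraction. All settings are assumptions of this sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

def one_run(n_dev, n_pred, events_fraction, n_val=5000):
    beta = rng.normal(scale=0.3, size=n_pred)
    intercept = np.log(events_fraction / (1 - events_fraction))

    def draw(n):
        X = rng.normal(size=(n, n_pred))
        p = 1 / (1 + np.exp(-(intercept + X @ beta)))
        return X, rng.binomial(1, p)

    X_dev, y_dev = draw(n_dev)
    X_val, y_val = draw(n_val)
    model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
    return roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

n_pred, events_fraction = 10, 0.2
for n_dev in (100, 400, 1600):
    epv = n_dev * events_fraction / n_pred
    auc = np.mean([one_run(n_dev, n_pred, events_fraction) for _ in range(20)])
    print(f"n={n_dev:5d}  EPV={epv:5.1f}  mean validation c-statistic={auc:.3f}")
```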
Health economic evaluations have recently become an important part of the clinical and medical research process and increasingly build upon more advanced statistical decision-theoretic foundations. In some contexts, it is officially required that uncertainty about both parameters and observable variables be properly taken into account, increasingly often by means of Bayesian methods. Among these, probabilistic sensitivity analysis has assumed a predominant role. The objective of this article is to review the problem of health economic assessment from the standpoint of Bayesian statistical decision theory, with particular attention to the philosophy underlying the procedures for sensitivity analysis.
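As a hedged illustration of probabilistic sensitivity analysis in a Bayesian decision-theoretic framing, the toy Monte Carlo sketch below propagates parameter uncertainty through a two-arm cost-effectiveness comparison and reports the probability of cost-effectiveness across willingness-to-pay thresholds. All distributions and values are hypothetical and unrelated to the article itself.

```python
# Toy probabilistic sensitivity analysis: parameter uncertainty is propagated by
# Monte Carlo simulation and summarized via the incremental net benefit.
# All distributions and values below are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
n_sim = 10_000

# Uncertain model parameters (illustrative distributions)
effect_new = rng.beta(60, 40, n_sim)                     # success probability, new therapy
effect_std = rng.beta(50, 50, n_sim)                     # success probability, standard therapy
cost_new = rng.gamma(shape=100, scale=12, size=n_sim)    # expected cost, new therapy
cost_std = rng.gamma(shape=100, scale=10, size=n_sim)    # expected cost, standard therapy

delta_e = effect_new - effect_std                        # incremental effectiveness
delta_c = cost_new - cost_std                            # incremental cost

# Probability the new therapy is cost-effective at several willingness-to-pay values
for wtp in (500, 1000, 2000, 5000):
    inb = wtp * delta_e - delta_c                        # incremental net benefit
    print(f"WTP={wtp:5d}: P(cost-effective) = {np.mean(inb > 0):.2f}")
```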
The positivity assumption, also known as the experimental treatment assignment assumption, requires that observed treatment levels vary within confounder strata. This article discusses the positivity assumption in the context of assessing model- and parameter-specific identifiability of causal effects. Positivity violations occur when certain subgroups in a sample rarely or never receive some treatments of interest. The resulting sparsity in the data may increase bias, with or without an increase in variance, and can threaten valid inference. The parametric bootstrap is presented as a tool to assess the severity of such threats, and its utility as a diagnostic is explored using simulated and real data. Several approaches for improving the identifiability of parameters in the presence of positivity violations are reviewed. Potential responses to data sparsity include restriction of the covariate adjustment set, use of an alternative projection function to define the target parameter within a marginal structural working model, restriction of the sample, and modification of the target intervention. All of these approaches can be understood as trading off proximity to the initial target of inference for identifiability; we advocate approaching this tradeoff systematically.
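The parametric-bootstrap diagnostic mentioned above can be sketched as follows: fit working treatment and outcome models to the observed data, simulate new datasets from those fits (so the identifying assumptions hold by construction), re-apply the estimator, and compare the bootstrap distribution with the effect implied by the fitted models. The data-generating setup and the truncated inverse-probability-weighted estimator below are illustrative assumptions, not the article's exact implementation.

```python
# Illustrative parametric-bootstrap check for positivity-related bias: fit
# working models, simulate data from them (so they hold by construction),
# re-apply a truncated IPW estimator, and compare with the effect implied by
# the fitted models. All data-generating choices are assumptions of this sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(3)
n = 1000

# Simulated data with a near positivity violation:
# treatment is almost deterministic in the confounder W
W = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-4.0 * W)))
Y = 1.0 * A + 0.5 * W + rng.normal(size=n)

def ipw_ate(W, A, Y):
    ps = LogisticRegression().fit(W.reshape(-1, 1), A).predict_proba(W.reshape(-1, 1))[:, 1]
    ps = np.clip(ps, 0.01, 0.99)               # truncate extreme propensities
    return np.mean(A * Y / ps) - np.mean((1 - A) * Y / (1 - ps))

# Working models used to generate the parametric-bootstrap datasets
ps_fit = LogisticRegression().fit(W.reshape(-1, 1), A)
out_fit = LinearRegression().fit(np.column_stack([A, W]), Y)
resid_sd = np.std(Y - out_fit.predict(np.column_stack([A, W])))
implied_effect = out_fit.coef_[0]              # effect under the fitted outcome model

boot = []
for _ in range(200):
    Wb = rng.choice(W, size=n, replace=True)
    Ab = rng.binomial(1, ps_fit.predict_proba(Wb.reshape(-1, 1))[:, 1])
    Yb = out_fit.predict(np.column_stack([Ab, Wb])) + rng.normal(scale=resid_sd, size=n)
    boot.append(ipw_ate(Wb, Ab, Yb))

print("effect implied by fitted models:", round(implied_effect, 3))
print("mean parametric-bootstrap IPW estimate:", round(np.mean(boot), 3))
```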
Propensity score methods are a part of the standard toolkit for applied researchers who wish to ascertain causal effects from observational data. While they were originally developed for binary treatments, several researchers have proposed generalizations of the propensity score methodology for non-binary treatment regimes. Such extensions have widened the applicability of propensity score methods and are indeed becoming increasingly popular themselves. In this article, we closely examine two methods that generalize propensity scores in this direction, namely, the propensity function (PF) and the generalized propensity score (GPS), along with two extensions of the GPS that aim to improve its robustness. We compare the assumptions, theoretical properties, and empirical performance of these methods. On a theoretical level, the GPS and its extensions are advantageous in that they are designed to estimate the full dose response function rather than the average treatment effect that is estimated with the PF. We compare the GPS with a new PF method, both of which estimate the dose response function. We illustrate our findings and proposals through simulation studies, including one based on an empirical study about the effect of smoking on healthcare costs. While our proposed PF-based estimator performs well, we generally advise caution in that all available methods can be biased by model misspecification and extrapolation.
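As a rough illustration of a generalized propensity score analysis for a continuous treatment, the sketch below models the treatment given covariates, evaluates the estimated conditional density (the GPS), fits a flexible outcome model in treatment and GPS, and averages over covariates to trace a dose-response curve. The simulated data and the particular parametric choices are assumptions made for illustration, not the estimators compared in the article.

```python
# Minimal sketch of a generalized propensity score (GPS) analysis for a
# continuous treatment: treatment model, GPS evaluation, outcome model in
# (treatment, GPS), and averaging to obtain a dose-response curve.
# Data and model forms are illustrative assumptions.
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n = 2000

X = rng.normal(size=(n, 2))                                 # covariates
T = 1.0 + X @ np.array([0.8, -0.5]) + rng.normal(size=n)    # continuous treatment/dose
Y = 2.0 * T - 0.3 * T**2 + X @ np.array([1.0, 1.0]) + rng.normal(size=n)

# Step 1: treatment model T | X and the GPS = estimated conditional density at T
t_model = LinearRegression().fit(X, T)
sigma = np.std(T - t_model.predict(X))
gps = norm.pdf(T, loc=t_model.predict(X), scale=sigma)

# Step 2: flexible outcome model in (T, GPS)
design = np.column_stack([T, T**2, gps, gps**2, T * gps])
y_model = LinearRegression().fit(design, Y)

# Step 3: dose-response function, averaging over the covariate distribution
for t in (0.0, 1.0, 2.0):
    gps_t = norm.pdf(t, loc=t_model.predict(X), scale=sigma)
    d = np.column_stack([np.full(n, t), np.full(n, t**2), gps_t, gps_t**2, t * gps_t])
    print(f"estimated mean outcome at dose {t:.1f}: {y_model.predict(d).mean():.2f}")
```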
The time-dependent receiver operating characteristic curve is often used to study the diagnostic accuracy of a single continuous biomarker, measured at baseline, on the onset of a disease condition when the disease onset may occur at different times during the follow-up and hence may be right censored. Due to right censoring, the true disease onset status prior to the pre-specified time horizon may be unknown for some patients, which causes difficulty in calculating the time-dependent sensitivity and specificity. We propose to estimate the time-dependent sensitivity and specificity by weighting the censored data by the conditional probability of disease onset prior to the time horizon given the biomarker, the observed time to event, and the censoring indicator, with the weights calculated nonparametrically through a kernel regression on time to event. With this nonparametric weighting adjustment, we derive a novel, closed-form formula to calculate the area under the time-dependent receiver operating characteristic curve. We demonstrate through numerical study and theoretical arguments that the proposed method is insensitive to misspecification of the kernel bandwidth, produces unbiased and efficient estimators of time-dependent sensitivity and specificity, the area under the curve, and other estimands from the receiver operating characteristic curve, and outperforms several other published methods currently implemented in R packages.
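A simplified sketch of the weighting idea is given below: each subject censored before the horizon contributes as a case with a weight equal to an estimated conditional probability of disease onset before the horizon. For brevity, the sketch conditions that probability only on the observed follow-up time through a Kaplan-Meier estimate, rather than on the biomarker through a kernel regression as the paper proposes, so it should be read as an illustration of the weighting principle only.

```python
# Illustrative weighting scheme for time-dependent sensitivity/specificity under
# right censoring: subjects censored before the horizon tau contribute as cases
# with weight equal to an estimated conditional probability of onset before tau.
# This sketch uses a marginal Kaplan-Meier estimate for that probability, not the
# kernel-regression estimate proposed in the paper; the data are simulated.
import numpy as np

rng = np.random.default_rng(5)
n, tau = 1000, 2.0

marker = rng.normal(size=n)
event_time = rng.exponential(scale=np.exp(-0.8 * marker))   # higher marker -> earlier onset
cens_time = rng.exponential(scale=3.0, size=n)
time = np.minimum(event_time, cens_time)
event = (event_time <= cens_time).astype(int)

def km_survival(t, time, event):
    """Kaplan-Meier estimate of P(T > t)."""
    s = 1.0
    for u in np.unique(time[event == 1]):
        if u > t:
            break
        s *= 1 - np.sum((time == u) & (event == 1)) / np.sum(time >= u)
    return s

# Case weight = estimated probability of disease onset by tau
w_case = np.zeros(n)
w_case[(time <= tau) & (event == 1)] = 1.0                  # observed cases
S_tau = km_survival(tau, time, event)
for i in np.where((time <= tau) & (event == 0))[0]:         # censored before tau
    S_ci = km_survival(time[i], time, event)
    w_case[i] = (S_ci - S_tau) / S_ci                       # conditional onset probability
w_ctrl = 1.0 - w_case                                       # complementary control weight

# Weighted time-dependent sensitivity and specificity at a few biomarker cutoffs
for c in (-0.5, 0.0, 0.5):
    sens = np.sum(w_case * (marker > c)) / np.sum(w_case)
    spec = np.sum(w_ctrl * (marker <= c)) / np.sum(w_ctrl)
    print(f"cutoff {c:+.1f}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```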
In this paper, we propose a generalized functional linear regression model for a binary outcome indicating the presence/absence of a cardiac disease, with multivariate functional data among the relevant predictors. In particular, the motivating aim is the analysis of electrocardiographic traces of patients whose pre-hospital electrocardiogram (ECG) has been sent to the 118 Dispatch Center of Milan (the Italian toll-free number for emergencies) by life support personnel of the basic rescue units. The statistical analysis starts with a preprocessing of ECGs treated as multivariate functional data. The signals are reconstructed from noisy observations, and the biological variability is then removed by a nonlinear registration procedure based on landmarks. Then, in order to perform a data-driven dimensional reduction, a multivariate functional principal component analysis is carried out on the variance-covariance matrix of the reconstructed and registered ECGs and their first derivatives. We use the scores of the principal component decomposition as covariates in a generalized linear model to predict the presence of the disease in a new patient. Hence, a new semi-automatic diagnostic procedure is proposed to estimate the risk of infarction (in the case of interest, the probability of being affected by Left Bundle Branch Block). The performance of this classification method is evaluated and compared with other methods proposed in the literature. Finally, the robustness of the procedure is checked via leave-j-out techniques.
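A minimal sketch of the final modelling step, with the ECG preprocessing and landmark registration replaced by simulated curves, is given below: the curves are reduced by principal component analysis (a discrete approximation of functional PCA) and the resulting scores enter a logistic regression classifier. The simulated signals, number of components and evaluation scheme are illustrative assumptions, not the paper's pipeline.

```python
# Sketch of dimension reduction of curves by PCA followed by a logistic
# regression on the component scores. The ECG-like curves are simulated;
# smoothing and landmark registration are not reproduced here.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
n, n_grid = 300, 100
t = np.linspace(0, 1, n_grid)

labels = rng.binomial(1, 0.5, size=n)                     # 1 = disease present
curves = np.array([
    np.sin(2 * np.pi * t) + 0.5 * y * np.sin(4 * np.pi * t) + 0.3 * rng.normal(size=n_grid)
    for y in labels
])                                                        # simulated ECG-like traces

# Functional PCA approximated by PCA on the discretized curves
pca = PCA(n_components=5)
scores = pca.fit_transform(curves)                        # principal component scores

# Generalized linear model (logistic regression) on the scores
clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, scores, labels, cv=5).mean()
print("cross-validated classification accuracy:", round(acc, 3))
```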