An evaluation of DistillerSR’s machine learning-based prioritization tool for title/abstract screening – impact on reviewer-relevant outcomes
BMC Medical Research Methodology - Volume 20 - Pages 1-14 - 2020
C. Hamel, S. E. Kelly, K. Thavorn, D. B. Rice, G. A. Wells, B. Hutton
Systematic reviews often require substantial resources, partially due to the large number of records identified during searching. Although artificial intelligence may not be ready to fully replace human reviewers, it may accelerate screening and reduce the screening burden. Using DistillerSR (May 2020 release), we evaluated the performance of the prioritization simulation tool to determine the reduction in screening burden and time savings. Using a true recall @ 95%, response sets from 10 completed systematic reviews were used to evaluate: (i) the reduction of screening burden; (ii) the accuracy of the prioritization algorithm; and (iii) the hours saved when a modified screening approach was implemented. To account for variation in the simulations, and to introduce randomness (through shuffling the references), 10 simulations were run for each review. Means, standard deviations, medians and interquartile ranges (IQR) are presented. Among the 10 systematic reviews, using true recall @ 95% there was a median reduction in screening burden of 47.1% (IQR: 37.5 to 58.0%). A median of 41.2% (IQR: 33.4 to 46.9%) of the excluded records needed to be screened to achieve true recall @ 95%. The median title/abstract screening hours saved using a modified screening approach at a true recall @ 95% was 29.8 h (IQR: 28.1 to 74.7 h). This increased to a median of 36 h (IQR: 32.2 to 79.7 h) when considering the time saved by not retrieving and screening full texts of the remaining 5% of records not yet identified as included at title/abstract. Among the 100 simulations (10 simulations per review), none of these 5% of records was a final included study in the systematic review. Compared with true recall @ 100%, achieving true recall @ 95% reduced the screening burden by a median of 40.6% (IQR: 38.3 to 54.2%).
The prioritization tool in DistillerSR can reduce screening burden. A modified or stop screening approach once a true recall @ 95% is achieved appears to be a valid method for rapid reviews, and perhaps systematic reviews. This needs to be further evaluated in prospective reviews using the estimated recall.
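As a rough illustration of how a screening-burden figure at true recall @ 95% can be derived from a prioritized list, the sketch below counts how many records must be screened, in ranked order, before 95% of the truly included records have been found. The function names and the 0/1 label encoding are assumptions made for this example; the abstract does not describe DistillerSR's internal calculation, and the paper's exact definitions may differ.

```python
def records_to_reach_recall(ranked_labels, target_percent=95):
    """Number of records screened, in ranked order, to find the target
    percentage of all included records (labels: 1 = include, 0 = exclude)."""
    total_includes = sum(ranked_labels)
    if total_includes == 0:
        raise ValueError("the response set contains no included records")
    # Integer ceiling of total_includes * target_percent / 100.
    needed = -(-total_includes * target_percent // 100)
    found = 0
    for position, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            return position
    return len(ranked_labels)


def screening_burden_reduction(ranked_labels, target_percent=95):
    """Share of records that would not need screening once the target is met."""
    screened = records_to_reach_recall(ranked_labels, target_percent)
    return 1 - screened / len(ranked_labels)


# Hypothetical ranked response set: 100 records, 20 of them included.
ranked = [1] * 19 + [0] * 30 + [1] + [0] * 50
print(records_to_reach_recall(ranked, 95), screening_burden_reduction(ranked, 95))
print(records_to_reach_recall(ranked, 100), screening_burden_reduction(ranked, 100))
```

In this made-up list the last included record is ranked far below the others, so moving the target from 95% to 100% recall raises the number of records to screen considerably.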
A cautionary tale: an evaluation of the performance of treatment switching adjustment methods in a real world case study
BMC Medical Research Methodology
Nicholas R. Latimer, Alice Dewdney, Marco Campioni
Abstract
Background
Treatment switching in randomised controlled trials (RCTs) is a problem for health technology assessment when substantial proportions of patients switch onto effective treatments that would not be available in standard clinical practice. Often statistical methods are used to adjust for switching: these can be applied in different ways, and performance has been assessed in simulation studies, but not in real-world case studies. We assessed the performance of adjustment methods described in National Institute for Health and Care Excellence Decision Support Unit Technical Support Document 16, applying them to an RCT comparing panitumumab to best supportive care (BSC) in colorectal cancer, in which 76% of patients randomised to BSC switched onto panitumumab. The RCT resulted in intention-to-treat hazard ratios (HR) for overall survival (OS) of 1.00 (95% confidence interval [CI] 0.82–1.22) for all patients, and 0.99 (95% CI 0.75–1.29) for patients with wild-type KRAS (Kirsten rat sarcoma virus).
Methods
We tested several applications of inverse probability of censoring weights (IPCW), rank preserving structural failure time models (RPSFTM) and simple and complex two-stage estimation (TSE) to estimate treatment effects that would have been observed if BSC patients had not switched onto panitumumab. To assess the performance of these analyses we ascertained the true effectiveness of panitumumab based on: (i) subsequent RCTs of panitumumab that disallowed treatment switching; (ii) studies of cetuximab that disallowed treatment switching; and (iii) analyses demonstrating that only patients with wild-type KRAS benefit from panitumumab. These sources suggest the true OS HR for panitumumab is 0.76–0.77 (95% CI 0.60–0.98) for all patients, and 0.55–0.73 (95% CI 0.41–0.93) for patients with wild-type KRAS.
Results
Some applications of IPCW and TSE provided treatment effect estimates that closely matched the point-estimates and CIs of the expected truths. However, other applications produced estimates towards the boundaries of the expected truths, with some TSE applications producing estimates that lay outside the expected true confidence intervals. The RPSFTM performed relatively poorly, with all applications providing treatment effect estimates close to 1, often with extremely wide confidence intervals.
Conclusions
Adjustment analyses may provide unreliable results. How each method is applied must be scrutinised to assess reliability.
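For readers unfamiliar with the RPSFTM, the sketch below shows only its core counterfactual-time relation in its commonly stated form, U = T_off + exp(psi) * T_on, where a negative psi corresponds to a treatment that extends survival. This is an assumption-laden illustration: the function name is hypothetical, the abstract does not give the model equations, and a real analysis also requires g-estimation of psi and recensoring, which are omitted here.

```python
import math


def counterfactual_untreated_time(time_off_treatment, time_on_treatment, psi):
    """Counterfactual untreated survival time under the usual RPSFTM form:
    U = T_off + exp(psi) * T_on (illustrative only, no recensoring)."""
    if time_off_treatment < 0 or time_on_treatment < 0:
        raise ValueError("times must be non-negative")
    return time_off_treatment + math.exp(psi) * time_on_treatment


# Hypothetical switcher: 4 months untreated, then 6 months on treatment.
# psi = log(0.5) treats time on treatment as passing at half speed.
print(counterfactual_untreated_time(4.0, 6.0, math.log(0.5)))
```

With psi equal to zero the treatment has no effect and the counterfactual time equals the observed time, which is consistent with the RPSFTM estimates close to 1 reported above.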
Laplace approximation, penalized quasi-likelihood, and adaptive Gauss–Hermite quadrature for generalized linear mixed models: towards meta-analysis of binary outcome with sparse data
BMC Medical Research Methodology - Volume 20 - Pages 1-11 - 2020
Ke Ju, Lifeng Lin, Haitao Chu, Liang-Liang Cheng, Chang Xu
In meta-analyses of a binary outcome, double zero events in some studies cause a critical methodology problem. The generalized linear mixed model (GLMM) has been proposed as a valid statistical tool for pooling such data. Three parameter estimation methods, the Laplace approximation (LA), penalized quasi-likelihood (PQL) and adaptive Gauss–Hermite quadrature (AGHQ), are frequently used in the GLMM. However, the performance of the GLMM via these estimation methods is unclear in meta-analysis with zero events. A simulation study was conducted to compare the performance. We fitted five random-effects GLMMs and estimated the results through the LA, PQL and AGHQ methods, respectively. Each scenario comprised 20,000 simulation iterations. Data from the Cochrane Database of Systematic Reviews were collected to form the simulation settings. The estimation methods were compared in terms of the convergence rate, bias, mean square error, and coverage probability. Our results suggested that when the total events were insufficient in either of the arms, the GLMMs did not show good point estimation to pool studies of rare events. The AGHQ method did not show better properties than the LA estimation in terms of convergence rate, bias, coverage, and possibility of producing very large odds ratios. In addition, although the PQL had some advantages, it was not the preferred option due to its low convergence rate in some situations, and the suboptimal point and variance estimation compared to the LA. The GLMM is an alternative for meta-analysis of rare events and is especially useful in the presence of zero-events studies, while at least 10 total events in both arms is recommended when employing GLMM for meta-analysis. The penalized quasi-likelihood and adaptive Gauss–Hermite quadrature are not superior to the Laplace approximation for rare events and thus they are not recommended.
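The event-count recommendation can be checked before any model is fitted. The sketch below is illustrative only: the function name, the tuple layout, and the reading of "at least 10 total events in both arms" as "at least 10 events in each arm, summed over studies" are assumptions, not details taken from the paper. It fits no GLMM.

```python
def summarise_sparse_meta_analysis(studies, min_total_events=10):
    """studies: list of (events_treatment, n_treatment, events_control, n_control).
    Returns arm totals, zero-event study indices, and a threshold flag."""
    total_treatment = sum(s[0] for s in studies)
    total_control = sum(s[2] for s in studies)
    # Double-zero: no events in either arm; single-zero: no events in exactly one arm.
    double_zero = [i for i, s in enumerate(studies) if s[0] == 0 and s[2] == 0]
    single_zero = [i for i, s in enumerate(studies) if (s[0] == 0) != (s[2] == 0)]
    return {
        "total_events_treatment": total_treatment,
        "total_events_control": total_control,
        "double_zero_studies": double_zero,
        "single_zero_studies": single_zero,
        "meets_event_threshold": (total_treatment >= min_total_events
                                  and total_control >= min_total_events),
    }


# Hypothetical data set of four small studies.
example = [(0, 50, 0, 50), (1, 100, 0, 100), (4, 80, 6, 80), (5, 120, 3, 120)]
print(summarise_sparse_meta_analysis(example))
```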
Immortal time bias for life-long conditions in retrospective observational studies using electronic health records
BMC Medical Research Methodology - 2022
Freya Tyrer, Krishnan Bhaskaran, Mark J. Rutherford
Abstract
Background
Immortal time bias is common in observational studies but is typically described for pharmacoepidemiology studies where there is a delay between cohort entry and treatment initiation.
Methods
This study used the Clinical Practice Research Datalink (CPRD) and linked national mortality data in England from 2000 to 2019 to investigate immortal time bias for a specific life-long condition, intellectual disability. Life expectancy (Chiang’s abridged life table approach) was compared for 33,867 exposed and 980,586 unexposed individuals aged 10+ years using five methods: (1) treating immortal time as observation time; (2) excluding time before date of first exposure diagnosis; (3) matching cohort entry to first exposure diagnosis; (4) excluding time before proxy date of inputting first exposure diagnosis (by the physician); and (5) treating exposure as a time-dependent measure.
Results
When not considered in the design or analysis (Method 1), immortal time bias led to disproportionately high life expectancy for the exposed population during the first calendar period (additional years expected to live: 2000–2004: 65.6 [95% CI: 63.6,67.6]) compared to the later calendar periods (2005–2009: 59.9 [58.8,60.9]; 2010–2014: 58.0 [57.1,58.9]; 2015–2019: 58.2 [56.8,59.7]). Date of entry of diagnosis (Method 4) was unreliable in this CPRD cohort. The remaining methods (Methods 2, 3 and 5) appeared to solve the main theoretical problem but residual bias may have remained.
Conclusions
We conclude that immortal time bias is a significant issue for studies of life-long conditions that use electronic health record data and requires careful consideration of how clinical diagnoses are entered onto electronic health record systems.
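To make the difference between the handling options concrete, the sketch below allocates one ever-exposed person's follow-up under Methods 1, 2 and 5 as described in the Methods paragraph. It is a simplified illustration: the function name and the use of decimal years are assumptions, and Methods 3 and 4 (matching cohort entry and the proxy date of data entry) are not represented.

```python
def allocate_person_time(entry, first_diagnosis, exit_year, method):
    """Return (unexposed_years, exposed_years) for one ever-exposed individual.
    method 1: immortal time counted as exposed observation time
    method 2: time before the first exposure diagnosis is excluded
    method 5: exposure treated as time-dependent"""
    if exit_year < entry:
        raise ValueError("exit must not precede entry")
    # Exposure cannot start before cohort entry or after exit.
    exposure_start = min(max(entry, first_diagnosis), exit_year)
    total = exit_year - entry
    after_diagnosis = exit_year - exposure_start
    if method == 1:
        return (0.0, total)
    if method == 2:
        return (0.0, after_diagnosis)
    if method == 5:
        return (total - after_diagnosis, after_diagnosis)
    raise ValueError("only methods 1, 2 and 5 are sketched here")


# Hypothetical individual: enters in 2000, diagnosis recorded in 2006, exits in 2015.
for m in (1, 2, 5):
    print(m, allocate_person_time(2000.0, 2006.0, 2015.0, m))
```

Under Method 1 the six years before the recorded diagnosis, during which the person could not have died and still been classified as exposed, are credited to the exposed group, which is the mechanism behind the inflated life expectancy in the earliest calendar period.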
Modelling seizure rates rather than time to an event within clinical trials of antiepileptic drugs
BMC Medical Research Methodology - 2020
Laura J. Bonnett, Jane L. Hutton, Anthony G. Marson
Predictive models within epilepsy are frequently developed via Cox’s proportional hazards models. These models estimate risk of a specified event such as 12-month remission. They are relatively simple to produce, have familiar output, and are useful to answer questions about short-term prognosis. However, the Cox model only considers time to first event rather than, for example, all seizures after starting treatment. This makes assessing change in seizure rates over time difficult. Variants of the Cox model exist that enable recurrent events to be modelled. One such variant is the Prentice, Williams and Peterson – Total Time (PWP-TT) model. An alternative is the negative binomial model for event counts. This study aims to demonstrate the differences between the three approaches, and to consider the benefits of the PWP-TT approach for assessing change in seizure rates over time. Time to 12-month remission and time to first seizure after randomisation were modelled using the Cox model. Risk of seizure recurrence was modelled using the PWP-TT model, including all seizures across the whole follow-up period. Seizure counts were modelled using negative binomial regression. Differences between the approaches were demonstrated using participants recruited to the UK-based multi-centre Standard versus New Antiepileptic Drug (SANAD) study. Results from the PWP-TT model were similar to those from the conventional Cox and negative binomial models. In general, the direction of effect was consistent although the variables included in the models and the significance of the predictors varied. The confidence intervals obtained via the PWP-TT model tended to be narrower due to the increase in statistical power of the model.
The Cox model is useful for determining the initial response to treatment and potentially informing when the next intervention may be required. The negative binomial model is useful for modelling event counts. The PWP-TT model extends the Cox model to all included events. This is useful in determining the longer-term effects of treatment policy. Such a model should be considered when designing future clinical trials in medical conditions typified by recurrent events to improve efficiency and statistical power as well as providing evidence regarding changes in event rates over time.
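The three approaches differ mainly in how one participant's seizure history is turned into analysis rows. The sketch below builds those rows for a single participant; it is a data-layout illustration under assumed conventions (time measured in days from randomisation, a hypothetical function name), not the SANAD analysis code, and it fits no model.

```python
def recurrent_event_layouts(event_times, follow_up_end):
    """Build the data rows each approach would use for one participant."""
    events = sorted(t for t in event_times if 0 < t <= follow_up_end)

    # Cox model to first event: a single (time, event indicator) pair.
    first_event = (events[0], 1) if events else (follow_up_end, 0)

    # PWP-TT: counting-process rows (start, stop, event, stratum) on the
    # total-time scale, with the stratum equal to the event number.
    pwp_tt_rows = []
    start = 0
    for stratum, stop in enumerate(events, start=1):
        pwp_tt_rows.append((start, stop, 1, stratum))
        start = stop
    if start < follow_up_end:
        # Censored interval after the last observed event.
        pwp_tt_rows.append((start, follow_up_end, 0, len(events) + 1))

    # Negative binomial: event count with follow-up time as exposure.
    count_row = (len(events), follow_up_end)
    return {"cox_first_event": first_event, "pwp_tt": pwp_tt_rows, "count": count_row}


# Hypothetical participant with seizures on days 30, 75 and 200 of a 365-day follow-up.
print(recurrent_event_layouts([30, 75, 200], 365))
```

The Cox layout discards everything after day 30, while the PWP-TT layout keeps every seizure and its timing, which is why it can speak to changes in seizure rate over time.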
Using web conferencing to engage Aboriginal and Torres Strait Islander young people in research: a feasibility study
BMC Medical Research Methodology - Volume 21 - Pages 1-8 - 2021
Kate Anderson, Alana Gall, Tamara Butler, Brian Arley, Kirsten Howard, Alan Cass, Gail Garvey
While web conferencing technologies are being widely used in communication and collaboration, their uptake in conducting research field work has been relatively slow. The benefits that these technologies offer researchers for engaging with hard-to-reach populations are beginning to be recognised; however, the acceptability and feasibility of using web conferencing technology to engage Aboriginal and Torres Strait Islander young people in research is unknown. This study aims to evaluate whether the use of web conferencing to engage Aboriginal and Torres Strait Islander young people in research is an acceptable and feasible alternative to conventional face-to-face methods. Aboriginal and Torres Strait Islander young people aged between 18 and 24 years were recruited via emails, flyers and snowballing to participate in an Online Yarning Circle (OYC) about wellbeing conducted via web conferencing. Five young Aboriginal and Torres Strait Islander Australians were trained as peer facilitators and each conducted one or more OYCs with support from an experienced Aboriginal and Torres Strait Islander researcher. The OYCs were recorded and the researchers conducted post-OYC interviews with the facilitators. OYC recordings, facilitator interviews and researchers’ reflections about the method were analysed to assess acceptability and feasibility for use with this population. Eleven OYCs were conducted with 21 participants. The evaluation focused on (a) acceptability of the method for participants and facilitators and (b) feasibility of data collection method and procedures for use in research. Our evaluation revealed good acceptability and feasibility of the method, with only minor challenges experienced, which were predominantly logistical in nature and related to scheduling, obtaining documentation of consent, and technical issues. These challenges were offset by the greater control over the level of engagement that was comfortable for individual participants and the greater ease with which they felt they could withdraw from participating. This shift in the traditional researcher-participant power dynamic was recognised by both participants and peer facilitators and was regarded as a support for Aboriginal and Torres Strait Islander young people’s participation in research.
The use of web conferencing to engage Aboriginal and Torres Strait Islander young people in research offers an acceptable and feasible alternative to face-to-face research methods. The benefits conferred by these technologies, in yielding greater control and power to the research participant, have broad relevance to research with marginalised populations.
A demonstration of using formal consensus methods within guideline development; a case study
BMC Medical Research Methodology - 2021
Patrice Carter, Katriona O’Donoghue, Katharina Dworzynski, Laura E. O’Shea, Victoria Louise Roberts, Tim Reeves, Anastasios Bastounis, M. A. Mugglestone, Joe Fawke, Stephen Pilling
Abstract
Background
Recommendations within guidelines are developed by synthesising the best available evidence; when limited evidence is identified recommendations are generally based on informal consensus. However, there are potential biases in group decision making, and formal consensus methods may help reduce these.
Methods
We conducted a case study using formal consensus to develop one set of recommendations within the Neonatal Parenteral Nutrition guideline being produced for the National Institute for Health and Care Excellence. Statements were generated through identification of published guidelines on several topics relating to neonatal parenteral nutrition. Ten high quality guidelines were included, and 28 statements were generated; these statements were rated by the committee via two rounds of voting. The statements which resulted in agreement were then used to develop the recommendations.
Results
The approach was systematic and provided transparency. Additionally, a number of lessons were learnt, including the value of selecting the appropriate topic, giving adequate time to the process, and ensuring that the committee understands the value and relevance of the methodologies.
Conclusion
Formal consensus is a valuable option for use within guideline development when specific criteria are met. The approach provides transparent methodology, ensuring clarity on how recommendations are developed.
Comparison of statistical models for estimating intervention effects based on time-to-recurrent-event in stepped wedge cluster randomized trial using open cohort design
BMC Medical Research Methodology - Volume 22 - Pages 1-18 - 2022
Shunsuke Oyamada, Shih-Wei Chiu, Takuhiro Yamaguchi
There are currently no methodological studies on the performance of the statistical models for estimating intervention effects based on the time-to-recurrent-event (TTRE) in stepped wedge cluster randomised trials (SWCRT) using an open cohort design. This study aims to address this by evaluating the performance of these statistical models under an open cohort design with Monte Carlo simulation in various settings, and by applying them to an actual example. Using Monte Carlo simulations, we evaluated the performance of the existing extended Cox proportional hazard models, i.e., the Andersen-Gill (AG), Prentice-Williams-Peterson Total-Time (PWP-TT), and Prentice-Williams-Peterson Gap-time (PWP-GT) models, using the settings of several event generation models and true intervention effects, with and without stratification by clusters. Unidirectional switching in SWCRT was represented using time-dependent covariates. Across the simulation settings described, in situations where inter-individual variability did not exist, the PWP-GT model with stratification by clusters showed the best performance in most settings and reasonable performance in the others. The only situation in which the performance of the PWP-TT model with stratification by clusters was not inferior to that of the PWP-GT model with stratification by clusters was when there was a certain amount of follow-up period, and the timing of the trial entry was random within the trial period, including the follow-up period. In situations where inter-individual variability existed, the PWP-GT model consistently underperformed compared to the PWP-TT model. The AG model performed well only in a specific setting. In the analysis of the actual example, almost all the statistical models suggested that the risk of events during the intervention condition may be somewhat higher than in the control, although the difference was not statistically significant.
When estimating the TTRE-based intervention effects of SWCRT in various settings using an open cohort design, the PWP-GT model with stratification by clusters performed most reasonably in situations where inter-individual variability was not present. However, if inter-individual variability was present, the PWP-TT model with stratification by clusters performed best.
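The sketch below illustrates how one participant's follow-up in a stepped wedge trial might be laid out for the gap-time and total-time models, with the unidirectional switch from control to intervention encoded as a time-dependent covariate. It rests on several assumptions that the abstract does not specify: the function name, the use of trial calendar time for inputs, the total-time clock starting at the participant's own entry, and the splitting of any interval that straddles the cluster's crossover. A cluster identifier for stratification would be one further column and is omitted.

```python
def swcrt_rows(entry, event_times, exit_time, crossover, scale="gap"):
    """Rows (start, stop, event, event_number, intervention) for one participant.
    scale="gap": clock restarts after each event (PWP-GT style).
    scale="total": clock runs from the participant's entry (PWP-TT style)."""
    if scale not in ("gap", "total"):
        raise ValueError("scale must be 'gap' or 'total'")
    boundaries = [(t, 1) for t in sorted(event_times)]
    if not boundaries or boundaries[-1][0] < exit_time:
        boundaries.append((exit_time, 0))  # censored tail after the last event
    rows = []
    last = entry
    for event_number, (stop, event) in enumerate(boundaries, start=1):
        # Split the interval where the cluster crosses over to the intervention.
        pieces = [(last, stop)]
        if last < crossover < stop:
            pieces = [(last, crossover), (crossover, stop)]
        offset = last if scale == "gap" else entry
        for a, b in pieces:
            rows.append((a - offset, b - offset,
                         event if b == stop else 0,  # event only closes the final piece
                         event_number,
                         1 if a >= crossover else 0))
        last = stop
    return rows


# Hypothetical participant: enters at month 2, events at months 5 and 12,
# leaves at month 20; the cluster crosses over at month 8.
print(swcrt_rows(2, [5, 12], 20, 8, scale="gap"))
print(swcrt_rows(2, [5, 12], 20, 8, scale="total"))
```

An AG-style layout would use the same total-time rows but ignore the event number rather than stratifying on it.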