Q-Learning: Flexible Learning About Useful Utilities

Statistics in Biosciences - 2014

Erica E. M. Moodie¹, Nema Dean², Yue Sun³

¹Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada

²School of Mathematics and Statistics, University of Glasgow, Glasgow, Scotland, UK

³Department of Mathematics and Statistics, School of Computer Science, McGill University, Montreal, Quebec, Canada

Tóm tắt

Từ khóa

Tài liệu tham khảo

Chakraborty B (2011) Dynamic treatment regimes for managing chronic health conditions: A statistical perspective. Am J Publ Health 101(1):40–45

Chakraborty B, Laber EB, Zhao Y (2013) Inference for optimal dynamic treatment regimes using an adaptive m-out-of-n bootstrap scheme (submitted)

Chakraborty B, Moodie EEM (2013) Estimating optimal dynamic treatment regimes with shared decision rules across stages: An extension of Q-learning (submitted)

Chakraborty B, Murphy SA, Strecher V (2010) Inference for non-regular parameters in optimal dynamic treatment regimes. Stat Methods Med Res 19(3):317–343

Fava M, Rush AJ, Trivedi MH, Nierenberg AA, Thase ME, Sackeim HA, Quitkin FM, Wisniewski S, Lavori PW, Rosenbaum JF, Kupfer DJ (2003) Background and rationale for the sequenced treatment alternatives to relieve depression (STAR*D) study. Psychiatr Clin North Am 26(2):457–494

Golub G, Heath M, Wahba G (1979) Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21:215–224

Hastie T, Tibshirani R (1986) Generalized additive models. Stat Sci 1(3):297–318

Hastie T, Tibshirani R (1990) Generalized additive models. Chapman & Hall, London

Huang X, Ning J (2012) Analysis of multi-stage treatments for recurrent diseases. Stat Med 31:2805–2821

Li KC (1987) Asymptotic optimality of C p , C L , cross-validation and generalized cross-validation: Discrete index set. Ann Stat 15:958–975

Moodie EEM, Chakraborty B, Kramer MS (2012) Q-learning for estimating optimal dynamic treatment rules from observational data. Can J Stat 40:629–645

Moodie EEM, Richardson TS (2010) Estimating optimal dynamic regimes: Correcting bias under the null. Scand J Stat 37:126–146

Murphy SA, Oslin DW, Rush AJ, Zhu J (2007) Methodological challenges in constructing effective treatment sequences for chronic psychiatric disorders. Neuropsychopharmacology 32:257–262

Murphy SA (2005) A generalization error for Q-learning. J Mach Learn Res 6:1073–1097

Nahum-Shani I, Qian M, Almirall D, Pelham WE, Gnagy B, Fabiano GA, Waxmonsky JG, Yu J, Murphy SA (2012) Q-Learning: A data analysis method for constructing adaptive interventions. Psychol Methods 17:478–494

R Core Team (2012) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. ISBN 3-900051-07-0

Robins JM, Hernán MA, Brumback B (2000) Marginal structural models and causal inference in epidemiology. Epidemiology 11:550–560

Robins JM (2004) Optimal structural nested models for optimal sequential decisions. In: Lin DY, Heagerty P (eds) Proceedings of the second Seattle symposium on biostatistics. Springer, New York, pp 189–326

Rosenbaum PR, Rubin DB (1983) The central role of the propensity score in observational studies for causal effects. Biometrika 70:41–55

Rosthoj S, Fullwood C, Henderson R, Stewart S (2006) Estimation of optimal dynamic anticoagulation regimes from observational data: A regret-based approach. Stat Med 25:4197–4215

Schneider LS, Tariot PN, Lyketsos CG, Dagerman KS, Davis KL, Davis S (2001) National institute of mental health clinical antipsychotic trials of intervention effectiveness (CATIE): Alzheimer disease trial methodology. Am J Geriatr Psychiatry 9:346–360

Shortreed SM, Moodie EEM (2012) Estimating the optimal dynamic antipsychotic treatment regime: Evidence from the sequential-multiple assignment randomized CATIE schizophrenia study. J R Stat Soc, Ser B, Stat Methodol 61:577–599

Song R, Wang W, Zeng D, Kosorok MR (2013) Penalized Q-learning for dynamic treatment regimes (submitted)

Sutton RS, Barto AG (1998) Reinforcement learning: An introduction. MIT Press, Cambridge

Thall PF, Millikan RE, Sung HG (2000) Evaluating multiple treatment courses in clinical trials. Stat Med 30:1011–1128

Thall PF, Sung HG, Estey EH (2002) Selecting therapeutic strategies based on efficacy and death in multicourse clinical trials. J Am Stat Assoc 97(457):29–39

Topol E (2012) Creative destruction of medicine: How the digital revolution and personalized medicine will create better health care. Basic Books, New York

Wood SN (2004) Stable and efficient multiple smoothing parameter estimation for generalized additive models. J Am Stat Assoc 99(467):673–686

Wood SN (2006) Generalized additive models: An introduction with R. Chapman & Hall, London

Wood SN (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J R Stat Soc B 73(1):3–36

Xin J, Chakraborty B, Laber EB (2012) qLearn: Estimation and inference for Q-learning. R package version 1.0

Zhao Y, Kosorok MR, Zeng D (2009) Reinforcement learning design for cancer clinical trials. Stat Med 28:3294–3315

Zhao Y, Zeng D, Socinski MA, Kosorok MR (2011) Reinforcement learning strategies for clinical trials in non-small cell lung cancer. Biometrics 67(4):1422–1433

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA